JP7323098B2 - Dialogue support device, dialogue support system, and dialogue support program

Info

Publication number: JP7323098B2
Application number: JP2019076447A
Authority: JP
Inventors: 友裕黒木; 幹雄高橋; 勇志 ▲高▼井; 貴弘大塚; 隼人内出; 友哉澤田; 啓吾川島; 由佳津田; 哲郎志田; 諒吉田; 美穂石川; 隆義飯田
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2019-04-12
Filing date: 2019-04-12
Publication date: 2023-08-08
Anticipated expiration: 2039-04-12
Also published as: JP2020173714A

Description

The present invention relates to a dialogue support device, a dialogue support system, and a dialogue support program.

In everyday conversation, it is not uncommon to fail to grasp the other party's emotions or to fail to convey one's own intentions accurately. The problem is more pronounced in remote conversations using video conferencing systems, Skype (registered trademark), and the like, because the information is conveyed as degraded video and audio.

Meanwhile, in settings such as a discussion among several people, it is difficult to control the dialogue appropriately while keeping an overview of the whole. Such control includes, for example, keeping contributions from becoming one-sided, ensuring that those who have opinions are given a proper opportunity to speak, and keeping the discussion from becoming overheated.

As a technique that can be applied to solve such problems, Patent Document 1 discloses a technique comprising: facial expression analysis means for analyzing a facial expression from a moving image of a person; simplified image creation means for creating simplified images of the person's face from the moving image and accumulating them in a storage device; simplified image selection means for selecting, according to the result of the facial expression analysis, the corresponding simplified image from those accumulated in the storage device; and special effect processing means for applying to the selected simplified image a special effect according to the facial expression analyzed by the facial expression analysis means.

Patent Document 2 discloses a network system to which first and second devices are connected, comprising: acquisition means for acquiring user information and image data at the first device; extraction means for extracting a face region contained in the acquired image data; identification means for identifying the facial expression of the face region extracted by the extraction means; storage means for storing, for each facial expression, expression identification information indicating the facial expression in association with display information; specifying means for specifying, from among the display information, the display information stored in association with the expression identification information corresponding to the facial expression identified by the identification means; and display means for displaying the specified display information on a display unit of the second device in association with the user information acquired by the acquisition means.

Patent Document 1: Japanese Patent Application Laid-Open No. 2004-64102
Patent Document 2: Japanese Patent Application Laid-Open No. 2015-165407

However, the techniques described in Patent Documents 1 and 2 do not take into account the overall situation of the participants in the dialogue, and therefore cannot always stimulate the dialogue effectively.

The present invention has been made in view of the above circumstances, and an object thereof is to provide a dialogue support device, a dialogue support system, and a dialogue support program capable of effectively stimulating dialogue.

The dialogue support device according to the invention of claim 1 comprises: an acquisition unit that acquires a physical quantity from which the situation of a participant in a dialogue can be derived; a derivation unit that derives the situation of the participant in the dialogue using the physical quantity acquired by the acquisition unit; and a processing unit that performs at least one of a display process of displaying situation information corresponding to the situation derived by the derivation unit and a storage process of storing the situation information. The situation information is information representing an emotion of the participant, the information representing the emotion is image information representing the emotion, and the image information is photographed face image information obtained by photographing the face of the corresponding participant when the degree of the emotion is at its maximum.

According to the dialogue support device of claim 1, the dialogue can be effectively stimulated by performing at least one of displaying and storing the situation information corresponding to the situation in the dialogue.

Further, according to the dialogue support device of claim 1, using information representing the participants' emotions as the situation information promotes stimulation of the dialogue more effectively.

Further, according to the dialogue support device of claim 1, using image information representing the emotion as the information representing the emotion allows the participants' emotions to be grasped more intuitively.

Further, according to the dialogue support device of claim 1, using as the image information the photographed face image information obtained by photographing the face of the corresponding participant when the degree of the emotion is at its maximum allows the participants' emotions to be grasped more effectively.
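The selection of the face image at the moment of maximum emotion can be illustrated with a minimal Python sketch. It is not taken from the publication: the function name `face_image_at_peak`, the representation of a frame as an `(image, levels)` pair, and the emotion names used as dictionary keys are all assumptions made for illustration.

```python
def face_image_at_peak(frames, emotion):
    """Return the image whose level for `emotion` is highest.

    frames: list of (image, levels) pairs, where `levels` maps an emotion
    name (hypothetical keys such as "joy") to a numeric emotion level.
    Returns None when no frames have been collected.
    """
    if not frames:
        return None
    # A frame that lacks the emotion entirely is treated as level 0.
    image, _ = max(frames, key=lambda frame: frame[1].get(emotion, 0))
    return image
```

When several frames share the maximum level, `max` returns the first of them, so the earliest peak is kept.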

The dialogue support device according to the invention of claim 2 is the dialogue support device of claim 1, wherein the image information includes, in addition to the photographed face image information, information that exaggerates the emotion.

According to the dialogue support device of claim 2, using image information that includes, in addition to the photographed face image information, information that exaggerates the emotion allows the participants' emotions to be grasped more effectively.

The dialogue support device according to the invention of claim 3 is the dialogue support device of claim 1, wherein the physical quantity is at least one of an image obtained by photographing the participant and audio of the participant's speech.

According to the dialogue support device of claim 3, using at least one of an image obtained by photographing the participant and audio of the participant's speech as the physical quantity makes it possible to promote stimulation of the dialogue without any special equipment.

The dialogue support device according to the invention of claim 4 is the dialogue support device of claim 3, wherein the situation is at least one of a physical quantity representing the degree of the participant's emotion and a physical quantity representing the participant's motion.

According to the dialogue support device of claim 4, using at least one of a physical quantity representing the degree of the participant's emotion and a physical quantity representing the participant's motion as the situation promotes stimulation of the dialogue more effectively.

The dialogue support device according to the invention of claim 5 is the dialogue support device of claim 4, wherein the situation is at least one of: a physical quantity, obtained from the image, indicating the frequency of the participant's nodding; a physical quantity, obtained from the image, indicating the degree of the participant's facial expression; and a physical quantity, obtained from at least one of the image and the audio, indicating the degree of the participant's speech.

According to the dialogue support device of claim 5, using at least one of these three physical quantities as the situation makes it possible to promote stimulation of the dialogue more simply.

The dialogue support device according to the invention of claim 6 is the dialogue support device of claim 5, wherein the degree of speech is a degree according to the speech speed over the most recent predetermined period.

The dialogue support device according to the invention of claim 7 is the dialogue support device of claim 5, wherein the physical quantity indicating the frequency of nodding is information indicating the rate of decrease of that frequency.

The dialogue support device according to the invention of claim 8 is the dialogue support device of claim 1, which learns the information representing the emotion.

The dialogue support device according to the invention of claim 9 is the dialogue support device of claim 1, wherein the situation information is information representing the emotions of all participants in the dialogue.

The dialogue support system according to the invention of claim 10 includes: the dialogue support device of any one of claims 1 to 9; and a terminal comprising a transmission unit that transmits, to the acquisition unit of the dialogue support device, the physical quantity from which the situation in the dialogue can be derived, and a display unit that serves as the display target of the display process when the processing unit of the dialogue support device performs the display process.

According to the dialogue support system of claim 10, the dialogue can be effectively stimulated by performing at least one of displaying and storing the situation information corresponding to the situation in the dialogue.

The dialogue support program according to the invention of claim 11 causes a computer to execute processing of: acquiring a physical quantity from which the situation of a participant in a dialogue can be derived; deriving the situation of the participant in the dialogue using the acquired physical quantity; and performing at least one of a display process of displaying situation information corresponding to the derived situation and a storage process of storing the situation information. The situation information is information representing an emotion of the participant, the information representing the emotion is image information representing the emotion, and the image information is photographed face image information obtained by photographing the face of the corresponding participant when the degree of the emotion is at its maximum.

According to the dialogue support program of claim 11, the dialogue can be effectively stimulated by performing at least one of displaying and storing the situation information corresponding to the situation in the dialogue.

As described above, according to the present invention, dialogue can be effectively stimulated.

FIG. 1 is a block diagram showing an example of the hardware configuration of the dialogue support system according to the embodiment.
FIG. 2 is a block diagram showing an example of the functional configuration of the dialogue support system according to the embodiment.
FIG. 3 is a time chart for explaining the speech degree according to the embodiment.
FIG. 4 is a schematic diagram showing an example of the configuration of the situation correspondence information database according to the embodiment.
FIG. 5 is a schematic diagram for explaining the method of learning correspondence information according to the embodiment.
FIG. 6 is a schematic diagram showing an example of the configuration of the dialogue information database according to the embodiment.
FIG. 7 is a schematic diagram showing an example of the configuration of the emoticon/exaggeration information database according to the embodiment.
FIG. 8 is a flowchart showing an example of the dialogue support processing according to the embodiment.
FIG. 9 is a front view showing an example of the configuration of the dialogue support image according to the embodiment.
FIG. 10 is a flowchart showing an example of the faction information display processing according to the embodiment.
FIG. 11 is a front view showing an example of the configuration of the faction relationship display image according to the embodiment.
FIG. 12 is a time chart for explaining another method of determining correspondence information according to the embodiment.

Embodiments for carrying out the present invention are described in detail below with reference to the drawings. This embodiment describes the case where the present invention is applied to a dialogue support system that includes a dialogue support device, which comprehensively supports dialogue (remarks made in a meeting) when a meeting is held among several people, and a plurality of terminals, each used individually by a participant in the dialogue. This embodiment also assumes that the participants in the dialogue are dispersed across different remote locations.

First, the configuration of the dialogue support system 90 according to this embodiment is described with reference to FIGS. 1 and 2. As shown in FIG. 1, the dialogue support system 90 includes a dialogue support device 10 and a plurality of terminals 20, each of which can access a network 80. Examples of the dialogue support device 10 include information processing devices such as personal computers and server computers. Examples of the terminal 20 include desktop and notebook personal computers and portable terminals such as smartphones and tablets.

The terminals 20 according to this embodiment are assigned one to each participant in the dialogue of a meeting held using the dialogue support system 90 (hereinafter simply "participant"). Each terminal 20 includes a CPU (Central Processing Unit) 21, a memory 22 serving as a temporary storage area, a nonvolatile storage unit 23, an input unit 24 such as a touch panel, a display unit 25 such as a liquid crystal display, and a medium read/write device (R/W) 26. The terminal 20 also includes a camera 28, a microphone 29, and a wireless communication unit 27. The CPU 21, memory 22, storage unit 23, input unit 24, display unit 25, medium read/write device 26, camera 28, microphone 29, and wireless communication unit 27 are connected to one another via a bus B1. The medium read/write device 26 reads information written on a recording medium 96 and writes information to the recording medium 96.

The storage unit 23 is implemented by an HDD (Hard Disk Drive), an SSD (Solid State Drive), flash memory, or the like. In the dialogue support system 90 according to this embodiment, each terminal 20 is positioned so that the face of the participant using it falls within the angle of view of its camera 28 and the participant's speech falls within the sound collection range of its microphone 29.

The dialogue support device 10, on the other hand, is a device that centrally stores and manages the various kinds of information handled by the dialogue support system 90. The dialogue support device 10 includes a CPU 11, a memory 12 serving as a temporary storage area, a nonvolatile storage unit 13, an input unit 14 such as a keyboard and mouse, a display unit 15 such as a liquid crystal display, a medium read/write device 16, and a communication interface (I/F) unit 18. The CPU 11, memory 12, storage unit 13, input unit 14, display unit 15, medium read/write device 16, and communication I/F unit 18 are connected to one another via a bus B2. The medium read/write device 16 reads information written on a recording medium 17 and writes information to the recording medium 17.

The storage unit 13 is implemented by an HDD, an SSD, flash memory, or the like. The storage unit 13, as a storage medium, stores a dialogue support program 13A. The dialogue support program 13A is stored in the storage unit 13 when the recording medium 17 on which the program is written is set in the medium read/write device 16 and the medium read/write device 16 reads the program from the recording medium 17. The CPU 11 reads the dialogue support program 13A from the storage unit 13, loads it into the memory 12, and sequentially executes the processes of the dialogue support program 13A.

The storage unit 13 also stores a situation correspondence information database 13B, a dialogue information database 13C, and an emoticon/exaggeration information database 13D, each of which is described in detail later.

Next, the functional configurations of the dialogue support device 10 and the terminal 20 according to this embodiment are described with reference to FIG. 2. As shown in FIG. 2, the dialogue support device 10 includes an acquisition unit 11A, a derivation unit 11B, and a processing unit 11C. The CPU 11 of the dialogue support device 10 functions as the acquisition unit 11A, the derivation unit 11B, and the processing unit 11C by executing the dialogue support program 13A.

The acquisition unit 11A according to this embodiment acquires physical quantities from which the situation of a participant in the dialogue can be derived. In this embodiment, two kinds of physical quantity are used: an image obtained by photographing the participant (hereinafter "captured image") and audio of the participant's speech (hereinafter "speech audio"). This is not limiting, however; for example, only one of the captured image and the speech audio may be used as the physical quantity.

The derivation unit 11B derives the situation of each participant in the dialogue using the physical quantities acquired by the acquisition unit 11A. In this embodiment, the derivation unit 11B derives two kinds of physical quantity per participant as the situation: a physical quantity representing the degree of the participant's emotion (hereinafter "emotion level") and a physical quantity representing the participant's motion (hereinafter "motion amount"). More specifically, as the motion amount, the derivation unit 11B derives the number of times a listener nods per predetermined period (10 seconds in this embodiment), obtained from the captured image (hereinafter "nodding frequency") N, and a physical quantity indicating the degree of a speaker's speech, obtained from the speech audio (hereinafter "speech degree") H. As the emotion level, the derivation unit 11B derives a physical quantity indicating the degree of the participant's facial expression, obtained from the captured image (hereinafter "expression level").

More specifically, as the expression levels, the derivation unit 11B derives an anger level I indicating the degree of the corresponding participant's anger, a disgust level K indicating the degree of disgust, and a fear level O indicating the degree of fear. The derivation unit 11B also derives, as expression levels, a joy level Y indicating the degree of the corresponding participant's joy, a sadness level S indicating the degree of sadness, and a surprise level B indicating the degree of surprise.

In this embodiment, these six emotion levels are derived from the captured image obtained by the camera 28 of the terminal 20 used by the corresponding participant, by applying a known technique such as that described in "Real-time face detection and emotion/gender classification", Internet <URL: https://github.com/oarriaga/face_classification>.

This technique is based on Keras, a neural network library, and extracts facial features with a CNN (Convolutional Neural Network) to recognize each emotion. For a smile (joy level Y), for example, a database of smile features is prepared, and similarity is determined from the facial elements in the target captured image (for example, the shapes of individual parts such as the eyes, nose, and mouth). In this embodiment, this similarity is used as the emotion level. Also in this embodiment, the six emotion levels are derived as values normalized to a common range (0 to 100 in this embodiment).
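The normalization to a common 0 to 100 range might look like the following Python sketch. It assumes, purely for illustration, that the classifier returns one similarity score per emotion in the range 0.0 to 1.0; the function name, the dictionary keys, and the clamping of out-of-range scores are not specified in the publication.

```python
# Hypothetical keys for the six expression levels I, K, O, Y, S, B.
EMOTIONS = ("anger", "disgust", "fear", "joy", "sadness", "surprise")


def normalize_emotion_levels(similarities):
    """Map per-emotion similarity scores (assumed 0.0-1.0) onto 0-100."""
    levels = {}
    for name in EMOTIONS:
        score = similarities.get(name, 0.0)   # missing emotion counts as 0
        clamped = min(max(score, 0.0), 1.0)   # guard against out-of-range scores
        levels[name] = round(clamped * 100)
    return levels
```

Putting all six levels on the same scale lets them be compared directly, for example to find which emotion is strongest for a participant at a given moment.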

Each emotion level can also be derived with many other known techniques, such as the Emotion API (Application Programming Interface), a service provided on Microsoft's Azure (registered trademark), so further explanation is omitted here.

Although this embodiment uses the six emotion levels above, this is not limiting; one of the six, or a combination of two to five of them, may be used instead.

Meanwhile, the derivation unit 11B according to this embodiment calculates the speech degree H by the following Equation (1). In Equation (1), s(t) represents the speech speed of the target speaker (= number of spoken characters per second).

That is, Equation (1) calculates, as the speech degree H, the value obtained by accumulating the speech speed s(t) over the most recent 10 minutes (600 seconds), with a larger weight for speech closer to the time of calculation. In this embodiment, the number of spoken characters used in deriving the speech speed s(t) is the number of characters in the text obtained by recognizing the corresponding speech audio with a known speech recognition technique, but this is not limiting.
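Because the expression for Equation (1) does not appear in this text, the following is only one form consistent with the description above, not the equation of the publication. The calculation time T, the weight function w, and the linear example weight are all assumptions introduced for illustration; the only property taken from the description is that w does not increase as the elapsed time T − t grows.

```latex
H(T) = \int_{T-600}^{T} w(T - t)\, s(t)\, dt,
\qquad \text{for example } w(\tau) = 1 - \frac{\tau}{600}
```

With this example weight, speech at the time of calculation contributes with weight 1 and speech 600 seconds earlier contributes with weight 0.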

For example, in a meeting each participant's remarks usually come in bursts between the remarks of others, as shown by way of example in FIG. 3. In this embodiment, the speech speed s(t) is given more weight the closer it is to the time of calculation. This allows the speech degree H to be calculated so that it more accurately represents the speaker's level of understanding of the topic currently under discussion.

The formula for calculating the speech degree H is not limited to Equation (1). For example, the most recent 10 minutes used in Equation (1) is only an example, and another period may of course be used. Equation (1) gives a larger weight to speech closer to the time of calculation, but the speech degree H may also be calculated without this weighting. Although this embodiment derives the speech degree H with a formula, it may instead be derived by, for example, table conversion. Furthermore, although this embodiment derives the speech degree H from the speech audio, this is not limiting. For example, the speech speed s(t) may be derived from the movement of each participant's mouth in the captured image, and the speech degree H calculated by substituting that speech speed s(t) into Equation (1).
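The weighted and unweighted variants described above can be compared with a small Python sketch. It is an illustration under stated assumptions rather than the method of the publication: speech speed is assumed to be sampled at discrete times, the weight is assumed to fall linearly from 1 at the time of calculation to 0 at the edge of the window, and the name `speech_degree` and its parameters are hypothetical.

```python
def speech_degree(samples, now, window=600.0, weighted=True):
    """Accumulate speech speed over the most recent `window` seconds.

    samples: iterable of (t, s) pairs, where t is a time in seconds and
    s is the speech speed (characters per second) observed at t.
    now: time of calculation, in the same units as t.
    """
    total = 0.0
    for t, s in samples:
        age = now - t
        if age < 0 or age > window:
            continue  # outside the most recent window
        # Assumed linear weight: recent speech counts more than old speech.
        weight = 1.0 - age / window if weighted else 1.0
        total += weight * s
    return total
```

Passing a different `window` corresponds to choosing a period other than 10 minutes, and `weighted=False` corresponds to the variant without weighting.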

また、本実施形態に係る導出部１１Ｂは、うなずき頻度Ｎを次の式（２）により算出する。式（２）におけるｎ（ｔ）は、対象とする受話者の所定時間当たりのうなずき回数（＝うなずき回数／秒）を表す。 Further, the derivation unit 11B according to the present embodiment calculates the nodding frequency N using the following equation (2). n(t) in equation (2) represents the number of nods of the target listener per predetermined time (=the number of nods/second).

即ち、式（２）は、直近の１０分間（６００秒間）の所定時間当たりのうなずき回数ｎ（ｔ）を、算出時点に近いうなずきほど重み値を大きくして積算して得られる値をうなずき頻度Ｎとして算出する。本実施形態では、撮影画像に含まれる受話者の顔画像が、当該受話者から見て前方で、かつ、下方に傾斜したことに引き続いてほぼ元の位置に復帰した場合に、１回うなずいたと判断している。なお、本実施形態では、顔画像の傾斜及び復帰の検出を、顔画像の所定部位の画像（本実施形態では、目の画像）が下方に所定距離（本実施形態では、３ｍｍ）以上移動した後に、ほぼ元の位置に戻ったことを検出することにより行っているが、これに限るものではないことは言うまでもない。 That is, Equation (2) calculates the nodding frequency N as the value obtained by accumulating the number of nods per predetermined time n(t) over the most recent 10 minutes (600 seconds), with larger weights for nods closer to the calculation time. In the present embodiment, one nod is counted when the listener's face image in the captured image tilts forward and downward, as seen from the listener, and then returns to approximately its original position. The tilt and return of the face image are detected by detecting that the image of a predetermined part of the face image (the eyes, in the present embodiment) moves downward by a predetermined distance (3 mm, in the present embodiment) or more and then returns to approximately its original position, although the detection is of course not limited to this method.
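
The nod detection rule described above (eyes move down by at least 3 mm, then return to roughly the original position) can be sketched as a small state machine. The function name `count_nods`, the use of the first frame as the resting position, and the return tolerance `return_tol_mm` are assumptions introduced for illustration; the source specifies only the 3 mm downward threshold.

```python
def count_nods(eye_y_mm, drop_mm=3.0, return_tol_mm=1.0):
    """Count nods from per-frame vertical eye positions (mm, larger = lower).

    Assumption: the first frame gives the resting eye position.
    """
    if not eye_y_mm:
        return 0
    rest = eye_y_mm[0]
    nods = 0
    lowered = False
    for y in eye_y_mm[1:]:
        offset = y - rest
        if not lowered and offset >= drop_mm:
            # Eyes have moved down by the threshold distance or more.
            lowered = True
        elif lowered and abs(offset) <= return_tol_mm:
            # Eyes are back near the resting position: one complete nod.
            nods += 1
            lowered = False
    return nods


# Two down-and-back movements in this hypothetical trace.
trace = [0, 1, 3.5, 4, 0.5, 0, 0, 3, 0.2]
nod_count = count_nods(trace)
```

A downward movement that never returns is not counted, which matches the requirement that the face return to approximately its original position.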

なお、うなずき頻度Ｎを算出する数式は、式（２）には限らない。例えば、式（２）において適用した直近の１０分間は一例であり、他の期間としてもよいことは言うまでもない。また、式（２）では、算出時点に近いうなずき回数ｎ（ｔ）ほど重み値を大きくしているが、この重み付けを行うことなくうなずき頻度Ｎを算出する形態としてもよい。また、本実施形態では、うなずき頻度Ｎの導出に数式を用いる場合について説明したが、この形態に限らず、例えば、テーブル変換によりうなずき頻度Ｎを導出する形態としてもよい。 Note that the formula for calculating the nodding frequency N is not limited to formula (2). For example, the most recent 10 minutes applied in equation (2) is an example, and it goes without saying that other periods may be used. In addition, in equation (2), the weight value increases as the number of times n(t) of nodding is closer to the time of calculation, but the nodding frequency N may be calculated without this weighting. Further, in the present embodiment, the case where a formula is used to derive the nodding frequency N has been described, but the present invention is not limited to this form, and the nodding frequency N may be derived by table conversion, for example.

そして、処理部１１Ｃは、導出部１１Ｂによって導出された上記状況に対応する状況情報を端末２０の表示部２５に表示する表示処理、及び上記状況情報を記憶部１３に記憶する記憶処理の双方の処理を行う。但し、この形態に限らず、上記表示処理及び上記記憶処理の何れか一方の処理を行う形態としてもよい。 Then, the processing unit 11C performs both a display process of displaying situation information corresponding to the situation derived by the derivation unit 11B on the display unit 25 of the terminal 20 and a storage process of storing the situation information in the storage unit 13. However, the present invention is not limited to this form, and only one of the display process and the storage process may be performed.

本実施形態では、上記状況情報として、対応する参加者の感情を表す情報を適用している。より具体的には、本実施形態では、上記感情を表す情報として、当該感情を表すテキスト情報、画像情報、及び各参加者の相互間における感情の関係を示す情報を適用している。 In this embodiment, information representing the emotion of the corresponding participant is applied as the situation information. More specifically, in this embodiment, text information and image information representing the emotion, and information representing the emotional relationship between the participants are applied as the information representing the emotion.

なお、本実施形態では、上記テキスト情報として、対応する参加者のうちの何れかの発言者による発言に対する他者の感情を表す情報を適用している。また、本実施形態では、上記画像情報として、顔文字を適用している。また、本実施形態では、上記画像情報として、感情の度合いが最大となった場合における、対応する参加者の顔を撮影して得られた顔撮影画像情報及び感情を誇張する情報が含まれる画像情報も適用している。 In the present embodiment, the text information is information representing how the other participants feel about a statement made by one of the participants. Emoticons are used as the image information. In addition, the image information also includes image information that contains face image information, obtained by photographing the corresponding participant's face when the degree of emotion was at its maximum, together with information exaggerating the emotion.

一方、本実施形態に係る端末２０は、制御部２１Ａを含む。端末２０のＣＰＵ２１が、記憶部２３に予め記憶された図示しない対話支援アプリケーション・プログラムを実行することで、制御部２１Ａとして機能する。 On the other hand, the terminal 20 according to this embodiment includes a control section 21A. CPU 21 of terminal 20 functions as control section 21A by executing a dialogue support application program (not shown) pre-stored in storage section 23 .

本実施形態に係る制御部２１Ａは、送信部としての無線通信部２７を介して、対話支援装置１０の取得部１１Ａに、上記対話における状況を導出可能な物理量を送信する。また、制御部２１Ａは、対話支援装置１０の処理部１１Ｃが上記表示処理を行う場合に、当該表示処理の表示対象となる表示部２５を制御する。 The control unit 21A according to the present embodiment transmits the physical quantity from which the situation in the dialogue can be derived to the acquisition unit 11A of the dialogue support device 10 via the wireless communication unit 27 as a transmission unit. Further, when the processing unit 11C of the dialogue support device 10 performs the display processing, the control unit 21A controls the display unit 25 to be displayed in the display processing.

次に、図４を参照して、本実施形態に係る状況対応情報データベース１３Ｂについて説明する。図４に示すように、本実施形態に係る状況対応情報データベース１３Ｂは、状況を示す情報と、対応する状況において、対応する参加者の感情を示すものとして当該参加者に対応付けて表示するテキスト情報である対応情報とが関連付けられて記憶されている。上記対応情報が、本発明の感情を表すテキスト情報に相当する。 Next, the situation correspondence information database 13B according to this embodiment will be described with reference to FIG. As shown in FIG. 4, the situation correspondence information database 13B according to the present embodiment includes information indicating a situation and a text displayed in association with the participant as indicating the emotion of the corresponding participant in the corresponding situation. Correspondence information, which is information, is stored in association therewith. The corresponding information corresponds to the text information representing emotion of the present invention.

ここで、上記状況を示す情報には、図４に示すように、発話者による発言度Ｈ及び受話者による６種類の感情度に加えて、受話者によるうなずき頻度Ｎの所定時間（本実施形態では、６０秒）前からの低下率を示す、うなずき頻度低下率Ｕが含まれる。 Here, as shown in FIG. 4, the information indicating the situation includes, in addition to the speaker's utterance level H and the listener's six types of emotion levels, a nodding frequency decrease rate U, which indicates the rate of decrease in the listener's nodding frequency N relative to a predetermined time (60 seconds, in the present embodiment) earlier.

また、対応する状況に対応する上記対応情報は、一例として以下のように導出する。即ち、まず、一例として図５に示すように、会議の場で想定される「ＩＦ（状況）ＴＨＥＮ（対応情報）」を予め仮説として多数用意する。図５に示す例では、発話者の発言度Ｈが１８０以上であり、かつ、受話者のうなずき頻度低下率Ｕが５０％以上であり、かつ、受話者の怒り度Ｉが５０以上である状況の場合、受話者の感情を示す対応情報として「ちょっと話についていけないなぁ」を仮説としている。 The correspondence information for each situation is derived, for example, as follows. First, as shown in FIG. 5, a large number of "IF (situation) THEN (correspondence information)" rules expected to arise in meetings are prepared in advance as hypotheses. In the example shown in FIG. 5, for the situation in which the speaker's utterance level H is 180 or more, the listener's nodding frequency decrease rate U is 50% or more, and the listener's anger level I is 50 or more, the hypothesized correspondence information indicating the listener's emotion is "I can't quite keep up with this."
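
The IF (situation) THEN (correspondence information) hypotheses above can be represented as a simple rule table. The sketch below encodes only the single example given for FIG. 5; the `Rule` class, its field names, and the `match_rule` function are hypothetical, and the actual layout of the situation correspondence information database 13B is not specified at this level of detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    min_h: float          # speaker's utterance level H threshold
    min_u: float          # listener's nodding decrease rate U threshold (%)
    emotion: str          # listener emotion symbol, e.g. "I" for anger
    min_emotion: float    # threshold for that emotion level
    text: str             # correspondence information to display


# The one hypothesis described for FIG. 5.
RULES = [Rule(180, 50, "I", 50, "I can't quite keep up with this.")]


def match_rule(h, u, emotions, rules=RULES):
    """Return the text of the first rule whose conditions all hold, else None."""
    for rule in rules:
        if (h >= rule.min_h
                and u >= rule.min_u
                and emotions.get(rule.emotion, 0) >= rule.min_emotion):
            return rule.text
    return None


message = match_rule(200, 60, {"I": 55, "O": 10})
```

Returning `None` when no rule matches corresponds to the case where no correspondence information is displayed.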

そして、本実施形態では、仮説として用意した多数の状況及び対応情報の組み合わせを実際の会議の場で適用して、状況の条件を満足する対応情報を端末２０に表示させ、当該表示が有効であったか否かを繰り返し評価することにより学習する。そして、この学習によって得られた対応情報を状況対応情報データベース１３Ｂに反映させる。なお、ここで行う評価は、受話者の主観による評価でもよいし、対応情報を表示した後の実際の改善効果（例えば、笑顔が増える、発言が増える等）といった客観的な評価でもよい。 Then, in the present embodiment, a combination of a large number of situations and response information prepared as hypotheses is applied in an actual meeting, and response information that satisfies the conditions of the situation is displayed on the terminal 20, and the display is effective. It learns by repeatedly evaluating whether or not there was. Then, the correspondence information obtained by this learning is reflected in the situation correspondence information database 13B. The evaluation performed here may be a subjective evaluation by the receiver, or may be an objective evaluation such as an actual improvement effect after the response information is displayed (for example, more smiles, more remarks, etc.).

このように、本実施形態では、状況に対応する対応情報を学習させているが、必ずしも学習を行う必要はなく、予め仮説として用意した状況及び対応情報そのものを状況対応情報データベース１３Ｂに選択的に適用する形態としてもよい。 As described above, the present embodiment learns the correspondence information for each situation, but learning is not strictly necessary; the situations and correspondence information prepared in advance as hypotheses may instead be selectively applied to the situation correspondence information database 13B as they are.

次に、図６を参照して、本実施形態に係る対話情報データベース１３Ｃについて説明する。図６に示すように、本実施形態に係る対話情報データベース１３Ｃは、端末ＩＤ（IDentification）、画像データ、音声データ、テキストデータ、時刻、及び最大感情度の各情報が関連付けられて記憶される。 Next, the dialogue information database 13C according to this embodiment will be described with reference to FIG. As shown in FIG. 6, the dialogue information database 13C according to the present embodiment stores information such as a terminal ID (IDentification), image data, voice data, text data, time, and maximum emotional level in association with each other.

上記端末ＩＤは、各参加者が用いる端末２０を識別するために割り振られた情報である。なお、本実施形態では、端末ＩＤと、当該端末ＩＤが割り振られた端末２０を用いる参加者を示す情報（本実施形態では、名前）とが対応付けられて記憶部１３に記憶されている。従って、対話支援装置１０は、何れかの端末２０との間で通信を行う場合に、当該端末２０を用いる参加者を特定することができる。 The terminal ID is information assigned to identify the terminal 20 used by each participant. Note that in the present embodiment, the terminal ID and the information (name in the present embodiment) indicating the participant using the terminal 20 to which the terminal ID is assigned are associated and stored in the storage unit 13 . Therefore, the dialogue support device 10 can identify the participant using the terminal 20 when communicating with any terminal 20 .

また、上記画像データは、対応する端末２０から取得された撮影画像を示す情報であり、上記音声データは、対応する端末２０から取得された発言音声を示す情報であり、上記テキストデータは、対応する発言音声をテキスト化した情報である。なお、本実施形態では、上記テキストデータを、対応する音声データを、既知の音声認識技術を用いてテキストデータに変換することで得ている。 The image data is information indicating the captured image acquired from the corresponding terminal 20, the audio data is information indicating the speech audio acquired from the corresponding terminal 20, and the text data is a text transcription of the corresponding speech audio. In the present embodiment, the text data is obtained by converting the corresponding audio data into text using a known speech recognition technique.

また、上記時刻は、対応する画像データ及び音声データが取得された日時を示す情報であり、上記最大感情度は、対応する参加者の、対応する画像データが得られている期間内における最大値となる感情度の種類を示す情報である。 The time is information indicating the date and time at which the corresponding image data and audio data were acquired, and the maximum emotion level is information indicating which type of emotion level had the largest value for the corresponding participant during the period in which the corresponding image data was obtained.

なお、本実施形態では、図６に示すように、最大感情度における各参加者を示す情報として、当該参加者が用いる端末２０の端末ＩＤを適用しているが、これに限らないことは言うまでもない。また、図６では、最大感情度の種類を符号のみで表しているが、例えば、‘Ｉ’は怒り度Ｉを表し、‘Ｏ’は恐れ度Ｏを表している。更に、図６では、最大感情度を発話者のみについて対話情報データベース１３Ｃに記憶している場合を例示しているが、これに限らず、対応する期間における受話者の最大感情度も対話情報データベース１３Ｃに記憶する形態としてもよい。 In the present embodiment, as shown in FIG. 6, the terminal ID of the terminal 20 used by a participant is used as the information identifying that participant in the maximum emotion level, but needless to say this is not a limitation. Also, in FIG. 6 the type of the maximum emotion level is represented by a symbol only; for example, 'I' represents the anger level I and 'O' represents the fear level O. Furthermore, although FIG. 6 illustrates the case where the maximum emotion level is stored in the dialogue information database 13C for the speaker only, the listeners' maximum emotion levels in the corresponding period may also be stored in the dialogue information database 13C.

次に、図７を参照して、本実施形態に係る顔文字・誇張情報データベース１３Ｄについて説明する。図７に示すように、本実施形態に係る顔文字・誇張情報データベース１３Ｄは、最大感情度、顔文字、及び誇張情報の各情報が関連付けられて記憶されている。 Next, the emoticon/exaggeration information database 13D according to the present embodiment will be described with reference to FIG. As shown in FIG. 7, the emoticon/exaggerated information database 13D according to the present embodiment stores each piece of information of maximum emotional level, emoticon, and exaggerated information in association with each other.

上記最大感情度は上述した対話情報データベース１３Ｃの最大感情度と同一の情報であり、上記顔文字は、対応する最大感情度に対応する顔文字を示すデータであり、上記誇張情報は、対応する最大感情度に対応する誇張の内容を示す情報である。 The maximum emotion level is the same information as the maximum emotion level of the dialogue information database 13C, the emoticon is data indicating the emoticon corresponding to the corresponding maximum emotion level, and the exaggerated information is the corresponding This is information indicating the content of the exaggeration corresponding to the maximum degree of emotion.

例えば、図７に示す顔文字・誇張情報データベース１３Ｄでは、最大感情度となる感情度の種類が恐れ度Ｏである場合に対応する顔文字が「(^_^;)」であることを示している。また、図７に示す例では、最大感情度となる感情度の種類が恐れ度Ｏである場合に対応する誇張情報が示す誇張の内容が、「ガーン」とのテキスト情報、及び恐れを示す画像であることを示している。なお、上記恐れを示す画像は、例えば、後述する図９に示す、対応する参加者の顔画像の額付近に複数の縦線が重畳された画像２５Ｇ等が例示される。 For example, the emoticon/exaggeration information database 13D shown in FIG. 7 indicates that the emoticon corresponding to the case where the maximum emotion level is the fear level O is "(^_^;)". The example in FIG. 7 also indicates that, for the same case, the exaggeration content indicated by the exaggeration information is the text "ガーン" ("gaan", a Japanese expression of shock) and an image indicating fear. An example of the image indicating fear is the image 25G shown in FIG. 9, described later, in which a plurality of vertical lines are superimposed near the forehead of the corresponding participant's face image.

次に、図８～図１１を参照して、本実施形態に係る対話支援システム９０の作用を説明する。まず、図８及び図９を参照して、対話支援処理を実行する場合の対話支援装置１０の作用を説明する。会議の各参加者が用いる端末２０によって上述した対話支援アプリケーション・プログラムの実行が開始されることに応じて、対話支援装置１０のＣＰＵ１１が対話支援プログラム１３Ａを実行することにより、図８に示す対話支援処理が実行される。なお、ここでは、錯綜を回避するために、複数の参加者による対話が時間的に重複することなく進められる場合について説明する。また、ここでは、錯綜を回避するために、状況対応情報データベース１３Ｂ及び顔文字・誇張情報データベース１３Ｄが構築済みである場合について説明する。 Next, the operation of the dialogue support system 90 according to the present embodiment will be described with reference to FIGS. 8 to 11. First, the operation of the dialogue support device 10 when executing the dialogue support process will be described with reference to FIGS. 8 and 9. When the terminals 20 used by the conference participants start executing the dialogue support application program described above, the CPU 11 of the dialogue support device 10 executes the dialogue support program 13A, whereby the dialogue support process shown in FIG. 8 is executed. To avoid complication, the description here assumes that the participants' utterances proceed without overlapping in time, and that the situation correspondence information database 13B and the emoticon/exaggeration information database 13D have already been constructed.

対話支援アプリケーション・プログラムの実行が開始されると、各参加者が用いる端末２０は、自身のカメラ２８による撮影及びマイク２９の作動を開始し、これによって得られた撮影画像を示す画像データ及び発言音声を示す音声データの対話支援装置１０への送信を開始する。 When execution of the dialogue support application program starts, the terminal 20 used by each participant starts capturing images with its camera 28 and activates its microphone 29, and starts transmitting the resulting image data (the captured images) and audio data (the speech audio) to the dialogue support device 10.

そこで、図８のステップ２００で、取得部１１Ａは、各端末２０から送信された画像データ及び音声データの受信、及び受信した各データの記憶部１３への記憶を開始する。なお、取得部１１Ａは、受信した各データを記憶部１３に記憶する際に、対応するデータの送信元の端末２０に割り振られた端末ＩＤ及び取得した時点の時刻を関連付けて記憶する。 Therefore, in step 200 of FIG. 8, the acquisition unit 11A starts receiving image data and audio data transmitted from each terminal 20 and storing each received data in the storage unit 13 . When storing each piece of received data in the storage unit 13, the acquisition unit 11A associates and stores the terminal ID assigned to the terminal 20 that transmitted the corresponding data and the time at which the data was acquired.

ステップ２０２で、取得部１１Ａは、各端末２０から受信している音声データによる発言音声が所定期間（本実施形態では、５秒間）途切れるまで待機することにより、対話の各参加者の一連の発言（以下、「一連発言」という。）が終了するまで待機する。 At step 202, the acquiring unit 11A waits until the speech data of the speech data received from each terminal 20 is interrupted for a predetermined period (five seconds in this embodiment), thereby obtaining a series of speeches of each participant in the dialogue. (hereinafter referred to as "a series of statements") is completed.
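
The rule in step 202, where a series of utterances ends once speech has been interrupted for 5 seconds, can be sketched as grouping voiced intervals by the silence between them. This is an illustration under assumptions: the source does not say how speech activity is detected, so the `(start, end)` voiced intervals and the function name `split_series` are hypothetical inputs and names.

```python
def split_series(voiced_intervals, gap=5.0):
    """Group (start, end) voiced intervals into series of utterances.

    A new series starts whenever the silence since the previous
    interval is `gap` seconds or longer.
    """
    series = []
    for start, end in sorted(voiced_intervals):
        if series and start - series[-1][-1][1] < gap:
            # Short pause: still part of the current series.
            series[-1].append((start, end))
        else:
            # First interval, or silence of `gap` seconds or more.
            series.append([(start, end)])
    return series


groups = split_series([(0, 2), (3, 4), (10, 12)])
# groups == [[(0, 2), (3, 4)], [(10, 12)]]
```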

ステップ２０４で、導出部１１Ｂは、各参加者の直近の一連発言分の画像データ及び音声データを記憶部１３から読み出す。ステップ２０６で、導出部１１Ｂは、読み出した音声データを既知の音声認識技術を用いて各参加者別にテキストデータに変換する。 At step 204, the derivation unit 11B reads from the storage unit 13 the image data and audio data for each participant's most recent series of utterances. At step 206, the derivation unit 11B converts the read audio data into text data for each participant using a known speech recognition technique.

ステップ２０８で、導出部１１Ｂは、読み出した画像データを用いて、各参加者別に上記６種類の感情度（本実施形態では、怒り度Ｉ、嫌悪度Ｋ、恐れ度Ｏ、喜び度Ｙ、悲しみ度Ｓ、驚き度Ｂ）を上述したように導出する。なお、本実施形態では、感情度を、参加者毎で、かつ、感情度毎に、読み出した直近の一連発言分の画像データにおける最大値を導出する。但し、この形態に限らず、例えば、読み出した直近の一連発言分の画像データにおける時系列順の中央の画像データを用いて導出する形態や、読み出した直近の一連発言分の画像データにおける時系列順の最後の画像データを用いて導出する形態等を適用してもよい。 At step 208, the derivation unit 11B uses the read image data to derive, for each participant, the six types of emotion levels (in the present embodiment, anger level I, disgust level K, fear level O, joy level Y, sadness level S, and surprise level B) as described above. In the present embodiment, each emotion level is derived, per participant and per emotion type, as its maximum value over the read image data for the most recent series of utterances. However, this is not a limitation; for example, the emotion levels may be derived from the image data at the chronological midpoint of the read image data, or from the chronologically last image data.

ステップ２１０で、導出部１１Ｂは、各参加者別の６種類の感情度のうち、最大値となった感情度（最大感情度）の導出対象の時点に対応する画像データ（静止画像データ）を各参加者別に特定する。ステップ２１２で、処理部１１Ｃは、ステップ２１０の処理によって特定した静止画像データが示す撮影画像、及びステップ２０６の処理によって得られたテキストデータを用いて、対話を支援するための画像（以下、「対話支援画像」という。）を構成する。この際、処理部１１Ｃは、一例として図９に示すように、対応する参加者の撮影画像２５Ｃに対して、テキストデータにより示されるテキスト２５Ｄを、所謂吹き出しの形態で表示されるように対話支援画像３０を構成する。 At step 210, the derivation unit 11B extracts image data (still image data) corresponding to the derivation target time point of the maximum emotional level (maximum emotional level) among the six types of emotional levels for each participant. Identify for each participant. In step 212, the processing unit 11C uses the photographed image indicated by the still image data specified by the process of step 210 and the text data obtained by the process of step 206 to generate an image for supporting dialogue (hereinafter referred to as " (referred to as “dialogue support image”). At this time, as shown in FIG. 9 as an example, the processing unit 11C supports dialogue by displaying text 25D indicated by the text data in the form of a so-called speech balloon on the photographed image 25C of the corresponding participant. An image 30 is constructed.
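
Steps 208 and 210 can be sketched together: find which emotion level reached the largest value during the series of utterances, and note the frame at which it did so. The per-frame score dictionaries and the function name `peak_emotion` are assumptions; the source does not describe how the emotion levels are produced from each frame at this point.

```python
def peak_emotion(frames):
    """frames: list of (timestamp, {emotion_symbol: level}).

    Returns (emotion_symbol, level, timestamp) for the largest level
    found in any frame, or None if there are no scores.
    """
    best = None
    for timestamp, levels in frames:
        for emotion, level in levels.items():
            if best is None or level > best[1]:
                best = (emotion, level, timestamp)
    return best


# Hypothetical scores for one participant over three frames.
frames = [
    (0.0, {"I": 10, "O": 20}),
    (1.0, {"I": 70, "O": 5}),
    (2.0, {"I": 30, "O": 40}),
]
emotion, level, still_time = peak_emotion(frames)
```

The returned timestamp identifies the still image used for the participant's picture in the dialogue support image.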

ステップ２１４で、導出部１１Ｂは、ステップ２０６の処理によって得られたテキストデータを用いて、上述したように、式（１）を用いて各参加者別の発言度Ｈを算出する。ステップ２１６で、導出部１１Ｂは、ステップ２０４の処理によって読み出した画像データを用いて、上述したように、うなずき頻度低下率Ｕを算出する。この際、読み出した画像データでは、うなずき頻度低下率Ｕを算出する際に適用する所定時間（本実施形態では、６０秒）前のうなずき頻度Ｎ（以下、「起算頻度」という。）が得られない場合がある。この場合、本実施形態では、起算頻度として、各参加者別の過去のうなずき頻度Ｎの平均値を適用する。但し、この形態に限らず、例えば、各参加者別の直近のうなずき頻度Ｎを起算頻度として適用する形態等としてもよい。 At step 214, the derivation unit 11B uses the text data obtained at step 206 to calculate the utterance level H for each participant with Equation (1), as described above. At step 216, the derivation unit 11B uses the image data read at step 204 to calculate the nodding frequency decrease rate U, as described above. In some cases, the read image data does not yield the nodding frequency N from the predetermined time (60 seconds, in the present embodiment) earlier that is needed to calculate the nodding frequency decrease rate U (hereinafter, the "reference frequency"). In that case, the present embodiment uses the average of each participant's past nodding frequencies N as the reference frequency. However, this is not a limitation; for example, each participant's most recent nodding frequency N may be used as the reference frequency.
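
The calculation of the nodding frequency decrease rate U in step 216, including the fallback when the value from 60 seconds earlier is unavailable, might look like the sketch below. The source does not give a formula for U, so expressing it as the percentage drop relative to the earlier value is an assumption, as are the function and parameter names.

```python
def nod_decrease_rate(current_n, n_60s_ago=None, past_ns=()):
    """Percentage decrease of nodding frequency N (assumed definition).

    If the value from 60 seconds earlier is missing, fall back to the
    average of the participant's past nodding frequencies.
    """
    base = n_60s_ago
    if base is None:
        if not past_ns:
            return 0.0  # nothing to compare against
        base = sum(past_ns) / len(past_ns)
    if base <= 0:
        return 0.0  # avoid division by zero when there were no nods
    return (base - current_n) / base * 100.0


u_direct = nod_decrease_rate(5, n_60s_ago=10)       # 50.0
u_fallback = nod_decrease_rate(3, past_ns=[4, 8])   # 50.0
```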

ステップ２１８で、処理部１１Ｃは、ステップ２１０の処理において用いた各参加者別の最大感情度に、顔文字・誇張情報データベース１３Ｄに顔文字が登録されている最大感情度が含まれるか否かを判定し、肯定判定となった場合はステップ２２０に移行する。 At step 218, the processing unit 11C determines whether or not the maximum emotional level for each participant used in the processing at step 210 includes the maximum emotional level whose emoticons are registered in the emoticon/exaggerated information database 13D. is determined, and if the determination is affirmative, the process proceeds to step 220 .

ステップ２２０で、処理部１１Ｃは、ステップ２１８の処理において含まれると判定された最大感情度に対応する顔文字を顔文字・誇張情報データベース１３Ｄから読み出す。ステップ２２２で、処理部１１Ｃは、一例として図９に示すように、読み出した顔文字２５Ｅが、対応する参加者に対応するテキスト２５Ｄに含めて吹き出し内に表示されるように対話支援画像３０を更新し、その後にステップ２２４に移行する。 At step 220, the processing unit 11C reads the emoticon corresponding to the maximum emotional level determined to be included in the processing at step 218 from the emoticon/exaggeration information database 13D. At step 222, the processing unit 11C converts the dialogue support image 30 so that the read emoticon 25E is included in the text 25D corresponding to the corresponding participant and displayed in a balloon, as shown in FIG. 9 as an example. update and then go to step 224 .

一方、ステップ２１８において否定判定となった場合は、ステップ２２０及びステップ２２２の処理を実行することなくステップ２２４に移行する。 On the other hand, if the determination in step 218 is negative, the process proceeds to step 224 without executing the processes of steps 220 and 222 .

ステップ２２４で、処理部１１Ｃは、ステップ２１０の処理において用いた各参加者別の最大感情度に、顔文字・誇張情報データベース１３Ｄに誇張情報が登録されている最大感情度が含まれるか否かを判定し、肯定判定となった場合はステップ２２６に移行する。 At step 224, the processing unit 11C determines whether the maximum emotional level for each participant used in the process at step 210 includes the maximum emotional level whose exaggerated information is registered in the emoticon/exaggerated information database 13D. is determined, and if the determination is affirmative, the process proceeds to step 226 .

ステップ２２６で、処理部１１Ｃは、ステップ２２４の処理において含まれると判定された最大感情度に対応する誇張情報を顔文字・誇張情報データベース１３Ｄから読み出す。ステップ２２８で、処理部１１Ｃは、一例として図９に示すように、読み出した誇張情報が示す情報を、対応する参加者に対応されて表示されるように対話支援画像３０を更新し、その後にステップ２３０に移行する。なお、図９に示す対話支援画像３０の例では、上記誇張情報が示す情報として、対応する参加者の撮影画像の上部に「ガーン」とのテキスト２５Ｆが表示され、対応する参加者の撮影画像における顔の額付近に複数の縦線が重畳された画像２５Ｇが表示される。 At step 226, the processing unit 11C reads from the emoticon/exaggeration information database 13D the exaggeration information corresponding to the maximum emotion level determined at step 224 to be included. At step 228, as shown in FIG. 9 as an example, the processing unit 11C updates the dialogue support image 30 so that the information indicated by the read exaggeration information is displayed in association with the corresponding participant, and then proceeds to step 230. In the example of the dialogue support image 30 shown in FIG. 9, the information indicated by the exaggeration information consists of the text 25F "ガーン" ("gaan"), displayed above the corresponding participant's captured image, and the image 25G, in which a plurality of vertical lines are superimposed near the forehead of the face in that captured image.

一方、ステップ２２４において否定判定となった場合は、ステップ２２６及びステップ２２８の処理を実行することなくステップ２３０に移行する。 On the other hand, if the determination in step 224 is negative, the process proceeds to step 230 without executing the processing of steps 226 and 228 .
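
Steps 218 to 228 amount to two optional lookups in the emoticon/exaggeration information database 13D, keyed by the maximum emotion level. A minimal sketch follows, populated only with the fear level O entry described for FIG. 7; the dictionary layout, the `overlay` label, and the function name `decorate` are hypothetical.

```python
# Entry for fear level "O" as described for FIG. 7; layout is assumed.
EMOTICON_DB = {
    "O": {
        "emoticon": "(^_^;)",
        "exaggeration": {"text": "ガーン", "overlay": "vertical_lines_on_forehead"},
    },
}


def decorate(bubble_text, max_emotion, db=EMOTICON_DB):
    """Add the emoticon and exaggeration registered for `max_emotion`, if any."""
    entry = db.get(max_emotion, {})
    result = {"bubble": bubble_text, "caption": None, "overlay": None}
    if "emoticon" in entry:
        # Steps 220-222: append the emoticon inside the speech balloon.
        result["bubble"] = f"{bubble_text} {entry['emoticon']}"
    exaggeration = entry.get("exaggeration")
    if exaggeration:
        # Steps 226-228: caption above the image plus an overlay on the face.
        result["caption"] = exaggeration["text"]
        result["overlay"] = exaggeration["overlay"]
    return result


decorated = decorate("Is that really the schedule?", "O")
```

When nothing is registered for the maximum emotion level, the balloon text is returned unchanged, mirroring the negative branches at steps 218 and 224.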

ステップ２３０で、処理部１１Ｃは、以上の処理によって算出した発言度Ｈ、うなずき頻度低下率Ｕ、及び感情度の各参加者別の組み合わせに合致する条件が状況対応情報データベース１３Ｂに含まれるか否かを判定し、肯定判定となった場合はステップ２３２に移行する。 At step 230, the processing unit 11C determines whether or not the situation correspondence information database 13B includes a condition that matches the combination of the remark level H, the nodding frequency decrease rate U, and the emotional level calculated by the above process for each participant. If the determination is affirmative, the process proceeds to step 232 .

ステップ２３２で、処理部１１Ｃは、ステップ２３０の処理において含まれると判定された条件に対応する対応情報を状況対応情報データベース１３Ｂから読み出す。ステップ２３４で、処理部１１Ｃは、一例として図９に示すように、読み出した対応情報２５Ｈが所定の位置（図９に示す例では、対話支援画像３０の上端部近傍の位置）に表示されるように対話支援画像３０を更新し、その後にステップ２３６に移行する。 At step 232, the processing unit 11C reads correspondence information corresponding to the conditions determined to be included in the processing at step 230 from the situation correspondence information database 13B. In step 234, the processing unit 11C causes the read correspondence information 25H to be displayed at a predetermined position (in the example shown in FIG. 9, the position near the upper end of the dialogue support image 30), as shown in FIG. 9 as an example. Then, the dialog support image 30 is updated as follows, and then the process proceeds to step 236 .

一方、ステップ２３０において否定判定となった場合は、ステップ２３２及びステップ２３４の処理を実行することなくステップ２３６に移行する。 On the other hand, if the determination in step 230 is negative, the process proceeds to step 236 without executing the processes of steps 232 and 234 .

ステップ２３６で、処理部１１Ｃは、対話を支援するための他の支援情報が表示されるように対話支援画像３０を更新する。なお、本実施形態では、上記支援情報として、一例として図９に示すように、各参加者の撮影画像、発言度Ｈ（図９では「発言」と表記。）、うなずき頻度Ｎ（図９では「肯定」と表記。）及び顔文字（図９では「気分」と表記。）を含む支援情報２５Ｉが表示されるように対話支援画像３０を更新する。また、本実施形態では、上記他の支援情報として、対応する音声の再生の指示を受け付けるための音声ボタン２５Ｊが表示されるように対話支援画像３０を更新する。更に、本実施形態では、上記他の支援情報として、表示している対話支援画像３０の上下方向へのスクロールの指示を受け付けるためのスクロールボタン２５Ｋが表示されるように対話支援画像３０を更新する。なお、その他の支援情報として、図９に示すように、発話者が発言している際の受話者の撮影画像２５Ｐを当該発話者における各吹き出しの近傍に並べて表示する形態としてもよい。 At step 236, the processing unit 11C updates the dialogue support image 30 so that other support information for supporting the dialogue is displayed. In the present embodiment, as shown in FIG. 9 as an example, the dialogue support image 30 is updated to display support information 25I, which includes each participant's captured image, utterance level H (labeled "発言" (speech) in FIG. 9), nodding frequency N (labeled "肯定" (agreement) in FIG. 9), and emoticon (labeled "気分" (mood) in FIG. 9). The dialogue support image 30 is also updated to display, as other support information, a voice button 25J for accepting an instruction to play back the corresponding audio. Furthermore, the dialogue support image 30 is updated to display, as other support information, a scroll button 25K for accepting an instruction to scroll the displayed dialogue support image 30 vertically. As still other support information, as shown in FIG. 9, captured images 25P of the listeners taken while the speaker is speaking may be displayed side by side near each of that speaker's speech balloons.

ステップ２３８で、処理部１１Ｃは、以上の処理によって得られた各種情報を対話情報データベース１３Ｃに登録（記憶）する。このステップ２３８の処理により、対話情報データベース１３Ｃが逐次構築されることになる。 At step 238, the processing unit 11C registers (stores) the various information obtained by the above processing in the dialogue information database 13C. Through the process of step 238, the dialogue information database 13C is constructed sequentially.

ステップ２４０で、処理部１１Ｃは、以上の処理によって得られた対話支援画像３０を示す画像情報を各端末２０に送信するように通信Ｉ／Ｆ部１８を制御する。この処理により、上述した対話支援アプリケーション・プログラムにより、一例として図９に示す対話支援画像３０が各端末２０の表示部２５に表示される。各参加者は、自身が用いる端末２０に表示された対話支援画像３０を参照し、音声を再生させたい場合は対応する音声ボタン２５Ｊを指定し、対話支援画像３０を上下方向にスクロールさせたい場合はスクロールボタン２５Ｋを所望の方向に移動させる。これに応じて、各端末２０で実行されている対話支援アプリケーション・プログラムは、参加者によって音声ボタン２５Ｊ及びスクロールボタン２５Ｋの少なくとも一方が操作された場合に、操作された状態を示す状態情報を対話支援装置１０に送信する。 At step 240, the processing unit 11C controls the communication I/F unit 18 so as to transmit image information representing the dialogue support image 30 obtained by the above processing to each terminal 20. As a result, the dialogue support application program described above displays the dialogue support image 30, shown in FIG. 9 as an example, on the display unit 25 of each terminal 20. Each participant refers to the dialogue support image 30 displayed on his or her own terminal 20, designates the corresponding voice button 25J to play back audio, and moves the scroll button 25K in the desired direction to scroll the dialogue support image 30 vertically. When a participant operates at least one of the voice button 25J and the scroll button 25K, the dialogue support application program running on that terminal 20 transmits state information indicating the operation to the dialogue support device 10.

そこで、ステップ２４２で、処理部１１Ｃは、何れかの端末２０から音声ボタン２５Ｊが指定された旨を示す状態情報が受信されたか否かを判定し、否定判定となった場合はステップ２４６に移行する一方、肯定判定となった場合はステップ２４４に移行する。 Therefore, in step 242, the processing unit 11C determines whether or not state information indicating that the voice button 25J has been specified has been received from any terminal 20. If the determination is negative, the process proceeds to step 246. On the other hand, if the determination is affirmative, the process proceeds to step 244 .

ステップ２４４で、処理部１１Ｃは、指定された音声ボタン２５Ｊに対応する音声データを記憶部１３から読み出して、対応する状態情報の送信元の端末２０に送信し、その後にステップ２４６に移行する。ステップ２４４の処理により、音声ボタン２５Ｊが指定された旨を示す状態情報を送信した端末２０では、対話支援アプリケーション・プログラムによって参加者が指定した音声が再生される。 At step 244, the processing unit 11C reads out the voice data corresponding to the designated voice button 25J from the storage unit 13 and transmits it to the terminal 20 that is the source of the corresponding status information, and then proceeds to step 246. By the processing of step 244, the terminal 20 that has transmitted the state information indicating that the voice button 25J has been designated reproduces the voice designated by the participant by the dialogue support application program.

ステップ２４６で、処理部１１Ｃは、何れかの端末２０からスクロールボタン２５Ｋが操作された旨を示す状態情報が受信されたか否かを判定し、否定判定となった場合はステップ２５０に移行する一方、肯定判定となった場合はステップ２４８に移行する。 At step 246, the processing unit 11C determines whether or not state information indicating that the scroll button 25K has been operated has been received from any terminal 20. If the determination is negative, the processing unit 11C proceeds to step 250. , and when the determination is affirmative, the process proceeds to step 248 .

ステップ２４８で、処理部１１Ｃは、スクロールボタン２５Ｋが上方向に移動された場合には、スクロールボタン２５Ｋの移動量に応じた量だけ対話支援画像３０を上方向にスクロールさせるための情報を、対応する状態情報の送信元の端末２０に送信する。また、処理部１１Ｃは、スクロールボタン２５Ｋが下方向に移動された場合には、スクロールボタン２５Ｋの移動量に応じた量だけ対話支援画像３０を下方向にスクロールさせるための情報を、対応する状態情報の送信元の端末２０に送信する。そして、処理部１１Ｃは、以上の処理を行った後にステップ２５０の処理に移行する。ステップ２４８の処理により、スクロールボタン２５Ｋが操作された状態を示す状態情報を送信した端末２０では、対話支援アプリケーション・プログラムによって、表示部２５で表示されている対話支援画像３０が上記操作に応じてスクロールされる。 At step 248, when the scroll button 25K has been moved upward, the processing unit 11C transmits, to the terminal 20 that sent the corresponding state information, information for scrolling the dialogue support image 30 upward by an amount corresponding to the amount of movement of the scroll button 25K. Likewise, when the scroll button 25K has been moved downward, the processing unit 11C transmits, to the terminal 20 that sent the corresponding state information, information for scrolling the dialogue support image 30 downward by an amount corresponding to the amount of movement of the scroll button 25K. The processing unit 11C then proceeds to step 250. As a result of step 248, on the terminal 20 that sent the state information indicating that the scroll button 25K was operated, the dialogue support application program scrolls the dialogue support image 30 displayed on the display unit 25 in accordance with the operation.

ステップ２５０で、処理部１１Ｃは、本対話支援処理の終了タイミングが到来したか否かを判定し、否定判定となった場合はステップ２０２に戻る一方、肯定判定となった時点でステップ２５２に移行する。なお、本実施形態では、対話支援処理の終了タイミングを、本対話支援処理が対象としている会議に参加している全ての参加者の端末２０で実行されている対話支援アプリケーション・プログラムが終了されるタイミングとしているが、これに限らない。例えば、対象としている会議が所定時間（例えば、１０分）以上停止したタイミング、対象としている会議に予め設定された時間（例えば、１時間）が経過したタイミング等を対話支援処理の終了タイミングとしてもよい。 At step 250, the processing unit 11C determines whether the end timing of the dialogue support process has arrived. If the determination is negative, the process returns to step 202; once the determination is affirmative, the process proceeds to step 252. In the present embodiment, the end timing of the dialogue support process is the point at which the dialogue support application program has been terminated on the terminals 20 of all participants in the target conference, but this is not a limitation. For example, the end timing may be the point at which the target conference has been inactive for a predetermined time (for example, 10 minutes) or more, or the point at which a time preset for the target conference (for example, 1 hour) has elapsed.

ステップ２５２で、処理部１１Ｃは、ステップ２００の処理によって開始した、各端末２０から送信された画像データ及び音声データの受信、及び受信した各データの記憶部１３への記憶を終了した後、本対話支援処理を終了する。 In step 252, the processing unit 11C ends the reception of the image data and audio data transmitted from each terminal 20 and the storage of the received data in the storage unit 13, both of which were started by the processing of step 200, and then ends the dialogue support processing.

一方、本実施形態に係る対話支援システム９０では、何れかの参加者が対話支援画像３０における派閥情報表示ボタン２５Ａを指定すると、各参加者の相互間における感情の関係をグラフィカルに示す情報である派閥関係表示画像を表示する派閥関係表示機能を有している。 Meanwhile, the dialogue support system 90 according to the present embodiment has a faction relationship display function: when any participant designates the faction information display button 25A in the dialogue support image 30, the system displays a faction relationship display image, which is information that graphically shows the emotional relationships among the participants.

次に、図１０～図１１を参照して、派閥関係表示機能の実行時における対話支援システム９０の作用を説明する。なお、図１０は、対象としている会話に参加している何れかの参加者の端末２０から、派閥情報表示ボタン２５Ａが指定された旨を示す情報が受信された場合に、対話支援装置１０のＣＰＵ１１により実行される派閥情報表示処理の流れを示すフローチャートである。 Next, the operation of the dialogue support system 90 when executing the faction relationship display function will be described with reference to FIGS. 10 and 11. FIG. 10 is a flowchart showing the flow of the faction information display processing executed by the CPU 11 of the dialogue support device 10 when information indicating that the faction information display button 25A has been designated is received from the terminal 20 of any participant in the target conversation.

図１０のステップ３００で、処理部１１Ｃは、その時点から所定時間（本実施形態では、１０分間）遡った時間から、その時間までに記憶した画像データを、対応する端末ＩＤと共に対話情報データベース１３Ｃから読み出す。ステップ３０２で、処理部１１Ｃは、読み出した画像データを用いて、予め定められた構成とされた派閥関係表示画像を構成する。ステップ３０４で、処理部１１Ｃは、構成した派閥関係表示画像を示す情報を、派閥情報表示ボタン２５Ａが指定された旨を示す情報の送信元の端末２０に送信する。派閥関係表示画像を示す情報を受信した端末２０では、一例として図１１に示す派閥関係表示画像３２を表示部２５に表示する。図１１に示すように、本実施形態に係る派閥関係表示画像３２では、対象としている会議の参加者間で相互に抱いている感情がグラフィカルに表示される。 In step 300 of FIG. 10, the processing unit 11C reads, from the dialogue information database 13C, the image data stored during the period from a predetermined time (10 minutes in the present embodiment) before the current time up to the current time, together with the corresponding terminal IDs. In step 302, the processing unit 11C uses the read image data to compose a faction relationship display image having a predetermined configuration. In step 304, the processing unit 11C transmits information indicating the composed faction relationship display image to the terminal 20 that sent the information indicating that the faction information display button 25A was designated. The terminal 20 that receives the information indicating the faction relationship display image displays, as an example, the faction relationship display image 32 shown in FIG. 11 on the display unit 25. As shown in FIG. 11, the faction relationship display image 32 according to the present embodiment graphically displays the feelings that the participants in the target conference hold toward one another.
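The read operation of step 300 amounts to selecting the records that fall inside a look-back window and grouping them by terminal ID. The following is a minimal sketch; the record layout `(terminal_id, timestamp, image)` and the function name `read_recent_frames` are assumptions, since the structure of the dialogue information database 13C is not detailed here.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, Iterable, List, Tuple

Record = Tuple[str, datetime, Any]  # (terminal ID, storage time, image data)


def read_recent_frames(records: Iterable[Record],
                       now: datetime,
                       window: timedelta = timedelta(minutes=10),
                       ) -> Dict[str, List[Any]]:
    """Return the image data stored within the look-back window, keyed by terminal ID."""
    start = now - window
    recent: Dict[str, List[Any]] = {}
    for terminal_id, timestamp, image in records:
        # Keep only data stored between (now - window) and now, inclusive.
        if start <= timestamp <= now:
            recent.setdefault(terminal_id, []).append(image)
    return recent
```

The 10-minute default mirrors the predetermined time of the present embodiment; any other window can be passed in.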

なお、本実施形態では、各参加者間で相互に抱いている感情を示す情報として、次の式（３）で算出される相互近接度ＳＫ_ｘｙを適用している。なお、式（３）におけるｘ及びｙは各々異なる参加者を表し、ｎ_ｘは参加者ｙが発言している際の参加者ｘのうなずき回数を表し、ｎ_ｙは参加者ｘが発言している際の参加者ｙのうなずき回数を表す。ここで、うなずき回数ｎ_ｘ及びうなずき回数ｎ_ｙは、読み出した画像データが示す撮影画像を用いて、上述した式（２）に用いるうなずき回数ｎ（ｔ）と同様に導出する。 In the present embodiment, the mutual proximity SK_xy calculated by the following equation (3) is used as the information indicating the feelings that the participants hold toward one another. In equation (3), x and y represent different participants, n_x represents the number of times participant x nods while participant y is speaking, and n_y represents the number of times participant y nods while participant x is speaking. The nod counts n_x and n_y are derived from the captured images indicated by the read image data, in the same way as the nod count n(t) used in equation (2) above.

そして、本実施形態に係る派閥関係表示画像３２では、算出した相互近接度ＳＫ_ｘｙの逆数を離間距離として各参加者の撮影画像を配置する。この際、うなずき回数ｎ_ｘとうなずき回数ｎ_ｙとの差分が所定値より大きな場合、一例として図１１に示すように、うなずき回数が少ない方から多い方に向けて矢印を表示し、かつ、敵対視を示す画像２５Ｌを表示可能とする。また、この場合、うなずき回数が多い方から少ない方に向けて矢印を表示し、かつ、好感を示す画像２５Ｍを表示可能とする。また、上記離間距離が所定距離未満である場合、対応する参加者間を他よりも太い直線で結び、かつ、直線上に好感を示す画像２５Ｍを表示可能とする。更に、上記離間距離が上記所定距離以上である場合、対応する参加者間を直線で結び、かつ、直線上に衝突感を示す画像２５Ｎを表示可能とする。 In the faction relationship display image 32 according to the present embodiment, the captured images of the participants are arranged using the reciprocal of the calculated mutual proximity SK_xy as the separation distance. At this time, when the difference between the nod count n_x and the nod count n_y is greater than a predetermined value, an arrow pointing from the participant with fewer nods to the participant with more nods can be displayed together with an image 25L indicating hostility, as shown in FIG. 11 as an example. In this case, an arrow pointing from the participant with more nods to the participant with fewer nods can also be displayed together with an image 25M indicating a favorable impression. In addition, when the separation distance is less than a predetermined distance, the corresponding participants can be connected by a straight line thicker than the others, with the image 25M indicating a favorable impression displayed on the line. Furthermore, when the separation distance is equal to or greater than the predetermined distance, the corresponding participants can be connected by a straight line, with an image 25N indicating a sense of conflict displayed on the line.
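The display rules above can be sketched for a single pair of participants. This is only an illustration under several assumptions: the mutual proximity SK_xy is passed in as a value because equation (3) is not reproduced in this text, the thresholds `nod_gap` and `near_distance` are hypothetical values, a non-positive proximity is treated as an infinite separation distance, and the arrow rule and the line rule are evaluated independently.

```python
import math
from typing import Dict, List, Tuple


def describe_pair(x: str, y: str, sk_xy: float, n_x: int, n_y: int,
                  nod_gap: int = 3, near_distance: float = 1.0) -> Dict:
    """Derive the layout elements for participants x and y.

    n_x: nods of x while y speaks; n_y: nods of y while x speaks.
    """
    # Separation distance is the reciprocal of the mutual proximity.
    distance = math.inf if sk_xy <= 0 else 1.0 / sk_xy

    arrows: List[Tuple[str, str, str]] = []
    if abs(n_x - n_y) > nod_gap:
        fewer, more = (x, y) if n_x < n_y else (y, x)
        arrows.append((fewer, more, "25L"))  # fewer -> more: hostility image
        arrows.append((more, fewer, "25M"))  # more -> fewer: favorable image

    if distance < near_distance:
        line = ("thick", "25M")   # close pair: thicker line, favorable image
    else:
        line = ("normal", "25N")  # distant pair: ordinary line, conflict image

    return {"distance": distance, "line": line, "arrows": arrows}
```

With these assumed thresholds, a pair in which C nods once while D speaks and D nods six times while C speaks produces an arrow from C to D marked with the hostility image 25L and an arrow from D to C marked with the favorable image 25M.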

図１１に示す例では、例えば、Ａさんと、他の参加者（Ｂさん、Ｃさん、Ｄさん）との間は相互に衝突感を抱いていることを示しており、また、例えば、ＣさんはＤさんに対して敵対視しているが、ＤさんはＣさんに対して好感を抱いていることを示している。更に、図１１に示す例では、ＢさんとＤさんとは互いに好感を抱いていることを示している。 The example in FIG. 11 shows, for instance, that participant A and each of the other participants (B, C, and D) feel a sense of conflict toward each other, and that participant C regards participant D with hostility whereas participant D has a favorable impression of participant C. The example in FIG. 11 further shows that participants B and D have a favorable impression of each other.

この派閥関係表示画像３２を参照することにより、対話の参加者は、他者が自分に抱いている感情の推定結果を把握することができるため、その場に応じた、より効果的な発言を行ったり、態度をとったりすることができる。また、派閥関係表示画像３２を参照することにより、対話の参加者は、自身の他者に対する感情の推定結果が妥当か否かを判断することができるため、推定結果が誤っている場合に是正することが可能となる。 By referring to the faction relationship display image 32, a participant in the dialogue can grasp the estimated feelings that others hold toward him or her, and can therefore speak or behave more effectively in a manner suited to the situation. In addition, by referring to the faction relationship display image 32, a participant can judge whether the estimate of his or her own feelings toward others is appropriate, which makes it possible to correct the estimate when it is wrong.

派閥関係表示画像３２が表示部２５に表示されると、参加者は、当該派閥関係表示画像３２を参照した後、終了ボタン２５Ｂを指定する。これに応じて、対応する端末２０の制御部２１Ａは、派閥関係表示画像３２の表示を終了する旨を示す情報（以下、「表示終了情報」という。）を対話支援装置１０に無線通信部２７を介して送信する。 When the faction relationship display image 32 is displayed on the display unit 25, the participant refers to it and then designates the end button 25B. In response, the control unit 21A of the corresponding terminal 20 transmits information indicating that the display of the faction relationship display image 32 is to be ended (hereinafter referred to as "display end information") to the dialogue support device 10 via the wireless communication unit 27.

そこで、次のステップ３０６で、処理部１１Ｃは、表示終了情報が指定されるまで待機した後、本派閥情報表示処理を終了する。 Therefore, in the next step 306, the processing section 11C waits until the display end information is specified, and then ends the faction information display process.

以上説明したように、本実施形態によれば、対話の参加者の対話における状況を導出可能な物理量を取得する取得部１１Ａと、取得部１１Ａによって取得された物理量を用いて、参加者の対話における状況を導出する導出部１１Ｂと、導出部１１Ｂによって導出された状況に対応する状況情報を表示する表示処理、及び状況情報を記憶する記憶処理の双方の処理を行う処理部１１Ｃと、を備えている。従って、対話を効果的に活性化することができる。 As described above, the present embodiment includes: the acquisition unit 11A, which acquires physical quantities from which the situation of a dialogue participant in the dialogue can be derived; the derivation unit 11B, which derives the situation of the participant in the dialogue using the physical quantities acquired by the acquisition unit 11A; and the processing unit 11C, which performs both display processing for displaying situation information corresponding to the situation derived by the derivation unit 11B and storage processing for storing the situation information. The dialogue can therefore be effectively activated.

また、本実施形態によれば、上記状況情報を、参加者の感情を表す情報としている。従って、より効果的に対話の活性化を促すことができる。 Further, according to the present embodiment, the situation information is information representing the emotions of the participants. Therefore, activation of dialogue can be promoted more effectively.

また、本実施形態によれば、上記感情を表す情報を、感情を表すテキスト情報としている。従って、より具体的に参加者の感情を把握することができる。 Further, according to the present embodiment, the information representing emotion is text information representing emotion. Therefore, it is possible to grasp the emotions of the participants more specifically.

また、本実施形態によれば、上記テキスト情報を、参加者のうちの何れかの発言者による発言に対する他者の感情を表す情報としている。従って、発言を聞いている参加者の感情を把握することができる。 In addition, according to the present embodiment, the text information is information representing the feelings of others regarding the statement by one of the speakers among the participants. Therefore, it is possible to grasp the emotion of the participant listening to the speech.

また、本実施形態によれば、上記感情を表す情報を、感情を表す画像情報としている。従って、より直感的に参加者の感情を把握することができる。 Further, according to the present embodiment, the information representing emotion is image information representing emotion. Therefore, it is possible to grasp the emotions of the participants more intuitively.

また、本実施形態によれば、上記画像情報を、顔文字としている。従って、より直感的に参加者の感情を把握することができる。 Further, according to the present embodiment, the image information is emoticons. Therefore, it is possible to grasp the emotions of the participants more intuitively.

また、本実施形態によれば、上記画像情報を、感情の度合いが最大となった場合における、対応する参加者の顔を撮影して得られた顔撮影画像情報としている。従って、より効果的に参加者の感情を把握することができる。 Further, according to the present embodiment, the image information is photographed face image information obtained by photographing the face of the corresponding participant when the degree of emotion is at its maximum. Therefore, it is possible to grasp the emotions of the participants more effectively.

また、本実施形態によれば、上記画像情報を、顔撮影画像情報に加えて、感情を誇張する情報が含まれる画像情報としている。従って、より効果的に参加者の感情を把握することができる。 Further, according to the present embodiment, the image information is image information including information exaggerating emotions in addition to face photographed image information. Therefore, it is possible to grasp the emotions of the participants more effectively.

また、本実施形態によれば、上記感情を表す情報を、参加者の相互間における感情の関係を示す情報としている。従って、より効果的に対話の活性化を促すことができる。 Further, according to the present embodiment, the information representing the emotion is information representing the relationship of emotions between the participants. Therefore, activation of dialogue can be promoted more effectively.

また、本実施形態によれば、上記物理量を、参加者を撮影して得られた画像、及び参加者の発言を示す音声としている。従って、より低コストで対話の活性化を促すことができる。 Further, according to the present embodiment, the physical quantity is an image obtained by photographing the participant and a sound indicating the speech of the participant. Therefore, activation of dialogue can be promoted at a lower cost.

また、本実施形態によれば、上記状況を、参加者の感情の度合いを表す物理量、及び参加者の動作を表す物理量としている。従って、より簡易に対話の活性化を促すことができる。 Further, according to the present embodiment, the situation is a physical quantity representing the degree of emotion of the participant and a physical quantity representing the motion of the participant. Therefore, activation of dialogue can be promoted more easily.

更に、本実施形態によれば、上記状況を、画像から得られる参加者のうなずきの頻度を示す物理量、画像から得られる参加者の表情の度合いを示す物理量、音声から得られる参加者の発言の度合いを示す物理量としている。従って、より簡易に対話の活性化を促すことができる。 Furthermore, according to the present embodiment, the situation is represented by a physical quantity indicating the frequency of the participant's nodding obtained from the image, a physical quantity indicating the degree of the participant's facial expression obtained from the image, and a physical quantity indicating the degree of the participant's speech obtained from the audio. Therefore, activation of dialogue can be promoted more easily.

なお、上記実施形態では、各参加者の個別の感情度を用いて、端末２０に表示する対応情報を決定する場合について説明したが、これに限定されない。例えば、参加者全員の感情度を用いて対応情報を決定する形態としてもよい。例えば、一例として図１２に示すように、各参加者の喜び度Ｙが同時に所定値（一例として、５０）以上となった場合、参加者全員が一体的に喜んでいると想定できる。この場合、対応情報として、一例として「一体感があり、良い状況です。」といった表示を各端末２０で行うことで、より効果的に対話を活性化することができる。 In the above-described embodiment, a case has been described in which correspondence information to be displayed on the terminal 20 is determined using each participant's individual emotional level, but the present invention is not limited to this. For example, it is also possible to adopt a form in which correspondence information is determined using the degree of emotion of all participants. For example, as shown in FIG. 12, it can be assumed that all the participants are happy together when the degree of joy Y of each participant reaches a predetermined value (eg, 50) at the same time. In this case, by displaying on each terminal 20, for example, "there is a sense of unity and the situation is good" as correspondence information, the dialogue can be activated more effectively.
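The variation using the emotion levels of all participants can be sketched as follows. The function name `collective_message` and the dictionary layout mapping each participant to a joy level Y are assumptions for illustration; the threshold of 50 and the message text follow the example given above.

```python
from typing import Dict, Optional


def collective_message(joy_levels: Dict[str, float],
                       threshold: float = 50) -> Optional[str]:
    """Return correspondence information when every participant's joy level Y
    is at or above the threshold at the same time, otherwise None."""
    # An empty meeting has no shared mood to report.
    if joy_levels and all(level >= threshold for level in joy_levels.values()):
        return "There is a sense of unity, and the situation is good."
    return None
```

The returned text would then be sent to each terminal 20 for display, while `None` means that no collective message applies at that moment.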

また、上記実施形態では、本発明を、対話の参加者が互いに異なる場所に分散して会議を行っている形態に適用した場合について説明したが、これに限定されない。例えば、対話の各参加者が同一の会議室等で会議を行う形態に本発明を適用してもよい。この場合、各端末２０に設けられたカメラ２８及びマイク２９に代えて、端末２０とは別体として構成された１つ又は複数のカメラ及びマイクを用いて、会議の参加者全員の画像及び音声を収集する形態としてもよい。 Further, in the above embodiment, the case where the present invention is applied to a form in which the participants of the dialogue hold a conference while distributed across different locations has been described, but the present invention is not limited to this. For example, the present invention may be applied to a form in which the participants of the dialogue hold a conference in the same conference room or the like. In this case, instead of the camera 28 and microphone 29 provided in each terminal 20, one or more cameras and microphones configured separately from the terminals 20 may be used to collect the images and audio of all conference participants.

また、上記実施形態では、対話支援装置１０において対話支援処理を実行する場合について説明したが、これに限定されない。例えば、少なくとも１台の端末２０によって対話支援処理を実行する形態としてもよい。この形態の場合、本発明の対話支援装置が該当する端末２０に含まれることになる。また、例えば、各参加者の発言度Ｈ、うなずき頻度低下率Ｕ、及び各感情度の少なくとも１つを、対応する参加者が用いる端末２０で導出する形態としてもよい。 Also, in the above-described embodiment, the case where the dialogue support processing is executed in the dialogue support device 10 has been described, but the present invention is not limited to this. For example, at least one terminal 20 may execute the dialogue support process. In this form, the corresponding terminal 20 includes the dialogue support device of the present invention. Further, for example, at least one of the utterance level H, the nodding frequency decrease rate U, and each emotion level of each participant may be derived by the terminal 20 used by the corresponding participant.

また、上記実施形態では、各端末２０において対話支援画像３０を表示する場合について説明したが、これに限定されない。例えば、対話支援画像３０を対話支援装置１０において表示する形態としてもよい。 Further, in the above embodiment, the case where the dialogue support image 30 is displayed on each terminal 20 has been described, but the present invention is not limited to this. For example, the dialogue support image 30 may be displayed on the dialogue support device 10.

また、上記実施形態では、本発明を会議に適用した場合について説明したが、これに限定されない。例えば、人事面接、商談等といった会議以外の複数人で行う対話の場に本発明を適用する形態としてもよい。 Also, in the above embodiment, the case where the present invention is applied to a conference has been described, but the present invention is not limited to this. For example, the present invention may be applied to a dialogue between a plurality of people other than a meeting, such as a personnel interview or a business negotiation.

また、上記実施形態では、本発明の感情を表す画像情報として顔文字を適用した場合について説明したが、これに限定されない。例えば、顔文字に加えて、絵文字、アイコン（Icon）の少なくとも１つを適用する形態としてもよい。 Further, in the above-described embodiment, a case where emoticons are applied as image information representing emotions of the present invention has been described, but the present invention is not limited to this. For example, in addition to emoticons, at least one of pictograms and icons may be applied.

また、上記実施形態では、上記状況として、参加者の感情の度合いを表す物理量、及び参加者の動作を表す物理量の双方を適用した場合について説明したが、これに限定されない。例えば、参加者の感情の度合いを表す物理量、及び参加者の動作を表す物理量の何れか一方のみを適用する形態としてもよい。 Further, in the above embodiment, a case has been described in which both a physical quantity representing the degree of emotion of the participant and a physical quantity representing the motion of the participant are applied as the situation, but the present invention is not limited to this. For example, only one of the physical quantity representing the degree of emotion of the participant and the physical quantity representing the motion of the participant may be applied.

また、上記実施形態では、上記状況として、画像から得られる参加者のうなずきの頻度を示す物理量、画像から得られる参加者の表情の度合いを示す物理量、音声から得られる参加者の発言の度合いを示す物理量、の全てを適用した場合について説明したが、これに限定されない。例えば、これらの物理量の１つ、又は全てを除く複数の組み合わせを適用する形態としてもよい。 In the above-described embodiment, the physical quantity indicating the frequency of nodding of the participant obtained from the image, the physical quantity indicating the degree of expression of the participant obtained from the image, and the degree of speech of the participant obtained from the voice are used as the above-described situations. Although the case where all of the indicated physical quantities are applied has been described, the present invention is not limited to this. For example, one of these physical quantities or a plurality of combinations excluding all of them may be applied.

また、上記実施形態では、うなずき頻度低下率Ｕを用いて対応情報を決定する場合について説明したが、これに限定されない。例えば、うなずき頻度Ｎそのものを用いて対応情報を決定する形態としてもよい。 Further, in the above-described embodiment, the case where the correspondence information is determined using the nodding frequency decrease rate U has been described, but the present invention is not limited to this. For example, the correspondence information may be determined using the nodding frequency N itself.

また、上記実施形態では、状況対応情報データベース１３Ｂとして、発話者と受話者の双方に関する情報が混在しているデータベースを適用した場合について説明したが、これに限定されない。例えば、発話者と受話者の各々別に異なるデータベースを構築して適用する形態としてもよい。 Further, in the above-described embodiment, a case has been described in which a database in which information on both the speaker and the receiver are mixed is applied as the situation correspondence information database 13B, but the present invention is not limited to this. For example, a form may be adopted in which different databases are constructed and applied for each of the speaker and the receiver.

その他、式（１）～式（３）は何れも一例であり、本発明の主旨を逸脱しない範囲内において、適宜変更して適用することができることは言うまでもない。 In addition, Formulas (1) to (3) are all examples, and needless to say, they can be appropriately modified and applied without departing from the gist of the present invention.

また、上記実施形態において、例えば、取得部１１Ａ、導出部１１Ｂ、処理部１１Ｃの各処理を実行する処理部（processing unit）のハードウェア的な構造としては、次に示す各種のプロセッサ（processor）を用いることができる。上記各種のプロセッサには、前述したように、ソフトウェア（プログラム）を実行して処理部として機能する汎用的なプロセッサであるＣＰＵに加えて、ＦＰＧＡ（Field-Programmable Gate Array）等の製造後に回路構成を変更可能なプロセッサであるプログラマブルロジックデバイス（Programmable Logic Device：PLD）、ＡＳＩＣ（Application Specific Integrated Circuit）等の特定の処理を実行させるために専用に設計された回路構成を有するプロセッサである専用電気回路等が含まれる。 Further, in the above embodiment, the following various processors can be used as the hardware structure of the processing units that execute the processes of, for example, the acquisition unit 11A, the derivation unit 11B, and the processing unit 11C. As described above, these processors include, in addition to a CPU, which is a general-purpose processor that executes software (a program) to function as a processing unit, a programmable logic device (PLD), which is a processor whose circuit configuration can be changed after manufacture, such as an FPGA (Field-Programmable Gate Array), and a dedicated electric circuit, which is a processor having a circuit configuration designed specifically to execute particular processing, such as an ASIC (Application Specific Integrated Circuit).

処理部は、これらの各種のプロセッサのうちの１つで構成されてもよいし、同種又は異種の２つ以上のプロセッサの組み合わせ（例えば、複数のＦＰＧＡの組み合わせや、ＣＰＵとＦＰＧＡとの組み合わせ）で構成されてもよい。また、処理部を１つのプロセッサで構成してもよい。 The processing unit may be configured with one of these various processors, or with a combination of two or more processors of the same type or different types (for example, a combination of multiple FPGAs, or a combination of a CPU and an FPGA). The processing unit may also be configured with a single processor.

処理部を１つのプロセッサで構成する例としては、第１に、クライアント及びサーバ等のコンピュータに代表されるように、１つ以上のＣＰＵとソフトウェアの組み合わせで１つのプロセッサを構成し、このプロセッサが処理部として機能する形態がある。第２に、システムオンチップ（System On Chip：SoC）等に代表されるように、処理部を含むシステム全体の機能を１つのＩＣ（Integrated Circuit）チップで実現するプロセッサを使用する形態がある。このように、処理部は、ハードウェア的な構造として、上記各種のプロセッサの１つ以上を用いて構成される。 As an example of configuring the processing unit with one processor, first, one processor is configured by combining one or more CPUs and software, as typified by computers such as clients and servers. There is a form that functions as a processing unit. Secondly, there is a form of using a processor that implements the functions of the entire system including the processing unit with a single IC (Integrated Circuit) chip, as typified by a System On Chip (SoC). In this way, the processing unit is configured using one or more of the above various processors as a hardware structure.

更に、これらの各種のプロセッサのハードウェア的な構造としては、より具体的には、半導体素子などの回路素子を組み合わせた電気回路（circuitry）を用いることができる。 Furthermore, as the hardware structure of these various processors, more specifically, an electric circuit in which circuit elements such as semiconductor elements are combined can be used.

１０対話支援装置
１１ＣＰＵ
１１Ａ取得部
１１Ｂ導出部
１１Ｃ処理部
１２メモリ
１３記憶部
１３Ａ対話支援プログラム
１３Ｂ状況対応情報データベース
１３Ｃ対話情報データベース
１３Ｄ顔文字・誇張情報データベース
１４入力部
１５表示部
１６媒体読み書き装置
１７記録媒体
１８通信Ｉ／Ｆ部
２０端末
２１ＣＰＵ
２２メモリ
２３記憶部
２４入力部
２５表示部
２５Ａ派閥情報表示ボタン
２５Ｂ終了ボタン
２５Ｃ撮影画像
２５Ｄテキスト
２５Ｅ顔文字
２５Ｆテキスト
２５Ｇ画像
２５Ｈ対応情報
２５Ｉ支援情報
２５Ｊ音声ボタン
２５Ｋスクロールボタン
２５Ｌ、２５Ｍ、２５Ｎ画像
２５Ｐ撮影画像
２６媒体読み書き装置
２７無線通信部
２８カメラ
２９マイク
３０対話支援画像
３２派閥関係表示画像
８０ネットワーク
９０対話支援システム
９６記録媒体
10 Dialogue support device
11 CPU
11A Acquisition unit
11B Derivation unit
11C Processing unit
12 Memory
13 Storage unit
13A Dialogue support program
13B Situation correspondence information database
13C Dialogue information database
13D Emoticon/exaggeration information database
14 Input unit
15 Display unit
16 Medium read/write device
17 Recording medium
18 Communication I/F unit
20 Terminal
21 CPU
22 Memory
23 Storage unit
24 Input unit
25 Display unit
25A Faction information display button
25B End button
25C Photographed image
25D Text
25E Emoticon
25F Text
25G Image
25H Correspondence information
25I Support information
25J Voice button
25K Scroll button
25L, 25M, 25N Image
25P Photographed image
26 Medium read/write device
27 Wireless communication unit
28 Camera
29 Microphone
30 Dialogue support image
32 Faction relationship display image
80 Network
90 Dialogue support system
96 Recording medium

Claims

A dialogue support device comprising:
an acquisition unit that acquires a physical quantity from which a situation of a dialogue participant in the dialogue can be derived;
a derivation unit that derives the situation of the participant in the dialogue using the physical quantity acquired by the acquisition unit; and
a processing unit that performs at least one of display processing for displaying situation information corresponding to the situation derived by the derivation unit and storage processing for storing the situation information,
wherein the situation information is information representing the emotion of the participant,
the information representing the emotion is image information representing the emotion, and
the image information is photographed face image information obtained by photographing the face of the corresponding participant when the degree of the emotion is at its maximum.

The image information is image information that includes information exaggerating the emotion in addition to the photographed face image information.
A dialogue support device according to claim 1.

The physical quantity is at least one of an image obtained by photographing the participant and a sound indicating the participant's remarks,
A dialogue support device according to claim 1.

The situation is at least one of a physical quantity representing the degree of emotion of the participant and a physical quantity representing the behavior of the participant,
A dialogue support device according to claim 3.

The situation is at least one of a physical quantity indicating the frequency of nodding of the participant obtained from the image, a physical quantity indicating the degree of expression of the participant obtained from the image, and a physical quantity indicating the degree of utterance of the participant obtained from at least one of the image and the voice,
A dialogue support device according to claim 4.

The degree of utterance is a degree according to the utterance speed of the most recent predetermined period,
A dialogue support device according to claim 5.

The physical quantity indicating the frequency of nodding is information indicating the rate of decrease of the frequency,
A dialogue support device according to claim 5.

learning the emotional information;
A dialogue support device according to claim 1.

The situation information is information representing the emotions of all participants in the dialogue.
A dialogue support device according to claim 1.

A dialogue support system comprising:
the dialogue support device according to any one of claims 1 to 9; and
a terminal comprising a transmission unit that transmits, to the acquisition unit of the dialogue support device, the physical quantity from which the situation in the dialogue can be derived, and a display unit that serves as the display target of the display processing when the processing unit of the dialogue support device performs the display processing.

A dialogue support program that causes a computer to execute processing comprising:
acquiring a physical quantity from which a situation of a dialogue participant in the dialogue can be derived;
deriving the situation of the participant in the dialogue using the acquired physical quantity; and
performing at least one of display processing for displaying situation information corresponding to the derived situation and storage processing for storing the situation information,
wherein the situation information is information representing the emotion of the participant,
the information representing the emotion is image information representing the emotion, and
the image information is photographed face image information obtained by photographing the face of the corresponding participant when the degree of the emotion is at its maximum.