JP7036046B2 - Information processing apparatus, information processing method, and information processing program

Info

Publication number: JP7036046B2
Application number: JP2019005363A
Authority: JP
Inventors: 慎江上; 一希笠井; 純一和田
Original assignee: Omron Corp
Current assignee: Omron Corp
Priority date: 2019-01-16
Filing date: 2019-01-16
Publication date: 2022-03-15
Anticipated expiration: 2039-01-16
Also published as: WO2020148920A1; JP2020113197A

Description

The present invention relates to an information processing apparatus, an information processing method, and an information processing program.

Services that support smooth communication between users are known in the prior art. Patent Document 1 describes a telephone voice monitoring and evaluation system that recognizes emotions from the voices in a conversation between a call center operator and a customer, and evaluates the operator's performance by analyzing the voice in combination with the recognized emotions. Patent Document 2 describes an emotion matching device that recognizes the emotion indicated by a chat sentence entered by a user and obtains the degree of similarity of emotions between users.

Patent Document 1: Japanese Unexamined Patent Publication No. 2017-135642 (published August 3, 2017)
Patent Document 2: Japanese Unexamined Patent Publication No. 2005-284822 (published October 13, 2005)

However, the prior art described above recognizes a speaker's emotion based only on the voice in a conversation, or based only on an entered chat sentence, and therefore cannot recognize emotions from multiple perspectives.

An object of one aspect of the present invention is to provide a communication support technique that recognizes the emotions of each user during a conversation from multiple perspectives and reports an evaluation of the conversation based on the recognized emotions.

To solve the above problem, an information processing apparatus according to one aspect of the present invention includes: a facial expression information acquisition unit that acquires first facial expression information on the facial expression of a first participant among a plurality of participants and second facial expression information on the facial expression of a second participant among the plurality of participants; a voice information acquisition unit that acquires first utterance information on the utterances of the first participant and second utterance information on the utterances of the second participant; a facial expression relationship information generation unit that refers to the first facial expression information and the second facial expression information to generate facial expression relationship information indicating the relationship between the first participant and the second participant with respect to facial expressions; an utterance relationship information generation unit that refers to the first utterance information and the second utterance information to generate utterance relationship information indicating the relationship between the first participant and the second participant with respect to utterances; and a relationship information generation unit that refers to the facial expression relationship information and the utterance relationship information to generate relationship information, which is information indicating the relationship between the first participant and the second participant.

According to this configuration, the relationship between participants during a meeting can be evaluated based on both the voice information and the facial expression information of each participant.

In the information processing apparatus according to the above aspect, the relationship information is real-time or time-course information indicating the relationship between the first participant and the second participant.

According to this configuration, the relationship between participants during a meeting can be evaluated in real time based on both the voice information and the facial expression information of each participant.

In the information processing apparatus according to the above aspect, the first facial expression information includes a plurality of first indexes expressing the facial expression of the first participant, the second facial expression information includes a plurality of second indexes expressing the facial expression of the second participant, and the facial expression relationship information generation unit generates facial expression difference information on the difference between the first indexes and the second indexes and includes the generated facial expression difference information in the facial expression relationship information.

According to this configuration, because a plurality of indexes are used to express the participants' facial expressions that are referred to when generating the facial expression relationship information, the facial expressions of the participants can be expressed more accurately.
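As an illustration only, the facial expression difference information could be computed as a per-index signed difference. The sketch below assumes that each participant's indexes are held as a dictionary of scores keyed by the expression names listed later in the description, and that a missing index counts as 0.0. The function name `expression_difference` and this data layout are hypothetical and are not specified by the source.

```python
from typing import Dict

# Index names follow the expression indexes listed in the description.
EXPRESSION_INDEXES = (
    "anger", "contempt", "disgust", "fear",
    "happiness", "neutral", "sadness", "surprise",
)


def expression_difference(
    first: Dict[str, float], second: Dict[str, float]
) -> Dict[str, float]:
    """Return the signed difference (first minus second) for every index.

    Assumption: an index missing from either input is treated as 0.0.
    """
    return {
        name: first.get(name, 0.0) - second.get(name, 0.0)
        for name in EXPRESSION_INDEXES
    }


if __name__ == "__main__":
    first_scores = {"happiness": 0.75, "neutral": 0.25}
    second_scores = {"happiness": 0.25, "neutral": 0.5, "sadness": 0.25}
    for name, value in expression_difference(first_scores, second_scores).items():
        print(f"{name}: {value:+.2f}")
```

A signed difference keeps the direction of the gap (which participant scores higher on an index); an absolute difference would be an equally plausible choice if only the size of the gap matters.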

In the information processing apparatus according to the above aspect, the first facial expression information includes first line-of-sight information on the line-of-sight direction of the first participant, the second facial expression information includes second line-of-sight information on the line-of-sight direction of the second participant, and the facial expression relationship information generation unit refers to the first line-of-sight information and the second line-of-sight information to generate line-of-sight relationship information and includes the generated line-of-sight relationship information in the facial expression relationship information.

According to this configuration, because the facial expression information referred to when generating the facial expression relationship information also includes the participants' line-of-sight information, the facial expressions of the participants can be expressed more accurately.

In the information processing apparatus according to the above aspect, the utterance relationship information generation unit generates utterance time relationship information indicating the relationship between the utterance time of the first participant indicated by the first utterance information and the utterance time of the second participant indicated by the second utterance information, and includes the generated utterance time relationship information in the utterance relationship information.

According to this configuration, because the utterance relationship information also includes the utterance time relationship information derived from the participants' utterance information, the utterance relationship information of the participants can be generated more accurately.
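One possible form of utterance time relationship information is each participant's share of the total speaking time. The sketch below assumes that every utterance is available as a (start, end) pair in seconds; the names `total_speaking_time` and `speaking_time_share` are hypothetical, and the source does not state how the relationship is actually expressed.

```python
from typing import List, Tuple

# One utterance as (start_seconds, end_seconds); this layout is an assumption.
Segment = Tuple[float, float]


def total_speaking_time(segments: List[Segment]) -> float:
    """Sum the durations of all utterance segments."""
    return sum(end - start for start, end in segments)


def speaking_time_share(
    first: List[Segment], second: List[Segment]
) -> Tuple[float, float]:
    """Return each participant's fraction of the combined speaking time.

    Returns (0.0, 0.0) when nobody has spoken yet.
    """
    t1 = total_speaking_time(first)
    t2 = total_speaking_time(second)
    total = t1 + t2
    if total == 0:
        return (0.0, 0.0)
    return (t1 / total, t2 / total)


if __name__ == "__main__":
    first_segments = [(0.0, 30.0), (50.0, 80.0)]
    second_segments = [(30.0, 50.0)]
    print(speaking_time_share(first_segments, second_segments))
```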

In the information processing apparatus according to the above aspect, the utterance relationship information generation unit determines whether at least one of the first utterance information and the second utterance information contains utterance content belonging to a specific category, and includes information corresponding to the result of the determination in the utterance relationship information.

According to this configuration, because the utterance relationship information also includes information corresponding to the result of determining whether utterance content belonging to a specific category is present, the relationship information between the participants can be generated more accurately.
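The source does not say which categories are used or how membership is decided. As a minimal sketch under those open points, the check below uses a keyword lexicon: the categories "gratitude" and "refusal", their keywords, and the function name `categories_in_utterance` are all hypothetical placeholders.

```python
from typing import Dict, Set

# Hypothetical categories and keywords, chosen only for illustration.
CATEGORY_KEYWORDS: Dict[str, Set[str]] = {
    "gratitude": {"thanks", "thank", "appreciate"},
    "refusal": {"no", "cannot", "impossible"},
}


def categories_in_utterance(
    text: str, lexicon: Dict[str, Set[str]] = CATEGORY_KEYWORDS
) -> Dict[str, bool]:
    """Report, per category, whether the utterance contains any of its keywords."""
    # Naive whitespace tokenization with punctuation stripped (an assumption;
    # Japanese text would need a morphological analyzer instead).
    words = {word.strip(".,!?").lower() for word in text.split()}
    return {
        category: bool(words & keywords)
        for category, keywords in lexicon.items()
    }


if __name__ == "__main__":
    print(categories_in_utterance("Thank you, I appreciate it."))
```

A production system would more likely use a trained text classifier; the lexicon lookup is shown only because it makes the "determine, then record the result" step easy to see.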

In the information processing apparatus according to the above aspect, the utterance relationship information generation unit extracts, from at least one of the first utterance information and the second utterance information, words that appear relatively frequently within a predetermined time, and includes the extracted words in the utterance relationship information.

According to this configuration, because the utterance relationship information also includes information on frequently occurring words, the relationship information between the participants can be generated more accurately.
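The extraction of frequent words within a predetermined time can be sketched with a simple counter over a time window. The example assumes that utterances arrive as (timestamp in seconds, text) pairs and that splitting on whitespace is an acceptable tokenizer; both are assumptions, and `frequent_words` is a hypothetical name.

```python
from collections import Counter
from typing import List, Tuple


def frequent_words(
    utterances: List[Tuple[float, str]],
    window_start: float,
    window_end: float,
    top_n: int = 3,
) -> List[Tuple[str, int]]:
    """Count words spoken in [window_start, window_end) and return the top_n."""
    counts: Counter = Counter()
    for timestamp, text in utterances:
        if window_start <= timestamp < window_end:
            # Strip basic punctuation and ignore tokens that become empty.
            tokens = (word.strip(".,!?").lower() for word in text.split())
            counts.update(token for token in tokens if token)
    return counts.most_common(top_n)


if __name__ == "__main__":
    log = [
        (1.0, "budget review budget"),
        (5.0, "the budget schedule"),
        (70.0, "schedule schedule schedule"),
    ]
    print(frequent_words(log, 0.0, 60.0, top_n=1))
```

In practice a stop-word list would probably be applied so that function words do not dominate the result.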

In the information processing apparatus according to the above aspect, the relationship information generation unit refers to the relationship information to generate presentation information to be presented to at least one of the first participant and the second participant.

According to this configuration, presenting the presentation information to a participant allows the participant to recognize the relationship information.

The presentation information includes information indicating the ratio between the utterance time of the first participant and the utterance time of the second participant, and information on the change over time in the match rate between the line-of-sight direction of the first participant and the line-of-sight direction of the second participant.

According to this configuration, presenting the presentation information to a participant allows the participant to recognize the ratio of each participant's utterance time and the change over time in the match rate of the participants' line-of-sight directions.
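The change over time in the line-of-sight match rate could be derived by bucketing samples into fixed time windows. The sketch below assumes that an earlier step has already decided, for each time-stamped sample, whether the two line-of-sight directions match; how that decision is made is not described in the source, and `gaze_match_rate_over_time` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def gaze_match_rate_over_time(
    samples: List[Tuple[float, bool]], window_seconds: float
) -> Dict[int, float]:
    """Return window index -> fraction of samples in which the gazes matched.

    samples: (timestamp_seconds, matched) pairs; `matched` is assumed to be
    precomputed by some upstream comparison of the two gaze directions.
    """
    matched: Dict[int, int] = {}
    total: Dict[int, int] = {}
    for timestamp, is_match in samples:
        index = int(timestamp // window_seconds)
        total[index] = total.get(index, 0) + 1
        matched[index] = matched.get(index, 0) + (1 if is_match else 0)
    return {index: matched[index] / total[index] for index in sorted(total)}


if __name__ == "__main__":
    data = [
        (0.0, True), (10.0, False), (20.0, True), (30.0, True),
        (65.0, False), (70.0, True),
    ]
    for window, rate in gaze_match_rate_over_time(data, 60.0).items():
        print(f"window {window}: {rate:.2f}")
```

The resulting per-window rates are the kind of series that could be plotted as a line chart in the presentation information.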

In the information processing apparatus according to the above aspect, the facial expression relationship information generation unit and the utterance relationship information generation unit further refer to participant information indicating attributes of the first and second participants to generate the facial expression relationship information and the utterance relationship information.

According to this configuration, because the attributes of the participants are also referred to when generating the facial expression relationship information and the utterance relationship information, more accurate facial expression relationship information and utterance relationship information can be generated.

To solve the above problem, an information processing method according to one aspect of the present invention includes: a facial expression information acquisition step of acquiring first facial expression information on the facial expression of a first participant among a plurality of participants and second facial expression information on the facial expression of a second participant among the plurality of participants; a voice information acquisition step of acquiring first utterance information on the utterances of the first participant and second utterance information on the utterances of the second participant; a facial expression relationship information generation step of referring to the first facial expression information and the second facial expression information to generate facial expression relationship information indicating the relationship between the first participant and the second participant with respect to facial expressions; an utterance relationship information generation step of referring to the first utterance information and the second utterance information to generate utterance relationship information indicating the relationship between the first participant and the second participant with respect to utterances; and a relationship information generation step of referring to the facial expression relationship information and the utterance relationship information to generate relationship information, which is real-time or time-course information indicating the relationship between the first participant and the second participant.

According to this configuration, the relationship between participants during a meeting can be evaluated based on the voice information and the facial expression information of each participant.

To solve the above problem, an information processing program according to one aspect of the present invention is an information processing program for causing a computer to function as any of the information processing apparatuses described above, and causes the computer to function as the facial expression information acquisition unit, the voice information acquisition unit, the facial expression relationship information generation unit, the utterance relationship information generation unit, and the relationship information generation unit.

According to one aspect of the present invention, the relationship between participants during a meeting can be evaluated based on the voice information and the facial expression information of each participant.

A block diagram showing an example of the components of an information processing system including an information processing apparatus according to one embodiment of the present invention.
A diagram showing an outline of an information processing system including an information processing apparatus according to one embodiment of the present invention.
A diagram showing an outline of the data flow in an information processing system including an information processing apparatus according to one embodiment of the present invention.
A diagram showing an example of the information presented by an information processing system including an information processing apparatus according to one embodiment of the present invention.
A diagram showing another example of the information presented by an information processing system including an information processing apparatus according to one embodiment of the present invention.

[Embodiment 1]
An embodiment of the present invention is described in detail below. FIG. 1 is a diagram showing an outline of an information processing system 100 including the information processing apparatus 10 of this embodiment. As shown in FIG. 1, the information processing system 100 includes the information processing apparatus 10, a first terminal device 20, and a second terminal device 30. The number of terminal devices does not limit this embodiment and may be three or more.

FIG. 2 is a diagram showing an outline of the information processing system 100 including the information processing apparatus 10 according to one embodiment of the present invention. As shown in FIG. 2, the information processing system 100 evaluates the relationship between a first participant 200, who uses the first terminal device 20, and a second participant 201, who uses the second terminal device 30.

In the information processing system 100, the information processing apparatus 10 evaluates the relationship between the first participant 200 and the second participant 201 during a meeting, based on the facial expression information and utterance information of the first participant 200 and the second participant 201 obtained during the meeting from the first terminal device 20 and the second terminal device 30. The information processing system 100 displays the result of the evaluation on at least one of the first terminal device 20 and the second terminal device 30, thereby feeding the evaluation result back in real time to at least one of the first participant 200 and the second participant 201 and encouraging them to improve the state of communication during the meeting.

In this embodiment, a "meeting" is not limited to a meeting in the narrow sense, and also includes one-on-one meetings, job interviews, counseling, medical interviews, customer service, visitation interviews, consultations, and the like. Examples include:
・A one-on-one meeting between a supervisor and a subordinate
・A medical interview of a patient by a doctor
・Counseling of a client by a counselor
・Customer service or counter consultation provided to a customer by a store clerk
・Remote communication such as web meetings
・Communication with video images, such as e-learning

FIG. 3 is a diagram showing an outline of the data flow in the information processing system 100.

[First terminal device 20]
As shown in FIG. 1, the first terminal device 20 includes a camera 21, a microphone 22, a display unit 23, a control unit 24, a speaker 25, and a communication unit 26.

<Video acquisition process>
The camera 21 captures images of the first participant and supplies the captured images to the control unit 24. The images captured by the camera 21 are preferably moving images. In that configuration, as shown in FIG. 3, the camera 21 supplies the control unit 24 with at least one of a video file and a video file list, which is a list of the images included in the video file. The camera 21 also supplies the control unit 24 with time stamps indicating the capture time of each image included in the video file.

The first terminal device 20 may include a plurality of cameras. In that configuration, the control unit 24 can identify the camera 21 by referring to camera device identification information.

<Image recognition process>
The control unit 24 performs image recognition processing with reference to the video file, the video file list, and the time stamps supplied from the camera 21.

As an example, as shown in FIG. 3, the control unit 24 calculates time-series facial expression values, time-series face part coordinates, and time-series line-of-sight coordinates by performing image recognition processing with reference to the video file, the video file list, and the time stamps. The time-series facial expression values, time-series face part coordinates, and time-series line-of-sight coordinates are examples of the first facial expression information on the facial expression of the first participant.

<Voice acquisition process>
The microphone 22 mainly collects the voice uttered by the first participant, and supplies the control unit 24 with an audio file representing the collected voice and with time stamps for identifying the time points of utterances in the audio file.

The first terminal device 20 may include a plurality of microphones. In that configuration, the control unit 24 can identify the microphone 22 by referring to audio device identification information.

<Utterance recognition process>
The control unit 24 performs utterance recognition processing with reference to the audio file and the time stamps supplied from the microphone 22.

As an example, as shown in FIG. 3, the control unit 24 generates section time-series text data by performing utterance recognition processing with reference to the audio file and the time stamps. The section time-series text data is information that represents, as text data in chronological order, mainly the content uttered by the first participant. The section time-series text data is an example of the first utterance information indicating the utterances of the first participant.
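The source does not define the layout of the section time-series text data. One plausible shape, shown here purely as an assumption, is a list of recognized text sections that each carry a start time and an end time. The class `TextSection` and the helpers `in_time_order` and `transcript` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextSection:
    """One recognized utterance section (field names are hypothetical)."""
    start_seconds: float
    end_seconds: float
    text: str


def in_time_order(sections: List[TextSection]) -> List[TextSection]:
    """Return the sections sorted by start time."""
    return sorted(sections, key=lambda section: section.start_seconds)


def transcript(sections: List[TextSection]) -> str:
    """Render the sections as one line per utterance, in chronological order."""
    return "\n".join(
        f"[{s.start_seconds:.1f}-{s.end_seconds:.1f}] {s.text}"
        for s in in_time_order(sections)
    )


if __name__ == "__main__":
    recognized = [
        TextSection(12.0, 15.5, "I agree."),
        TextSection(3.0, 7.5, "Shall we start?"),
    ]
    print(transcript(recognized))
```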

[Second terminal device 30]
The second terminal device 30 includes a camera 31, a microphone 32, a display unit 33, a control unit 34, a speaker 35, and a communication unit 36.

<Video acquisition process>
The camera 31 captures images of the second participant and supplies the captured images to the control unit 34. The images captured by the camera 31 are preferably moving images. In that configuration, as shown in FIG. 3, the camera 31 supplies the control unit 34 with at least one of a video file and a video file list, which is a list of the images included in the video file. The camera 31 also supplies the control unit 34 with time stamps indicating the capture time of each image included in the video file.

The second terminal device 30 may include a plurality of cameras. In that configuration, the control unit 34 can identify the camera 31 by referring to camera device identification information.

<Image recognition process>
The control unit 34 performs image recognition processing with reference to the video file, the video file list, and the time stamps supplied from the camera 31.

As an example, as shown in FIG. 3, the control unit 34 calculates time-series facial expression values, time-series face part coordinates, and time-series line-of-sight coordinates by performing image recognition processing with reference to the video file, the video file list, and the time stamps. The time-series facial expression values, time-series face part coordinates, and time-series line-of-sight coordinates are examples of the second facial expression information on the facial expression of the second participant.

<Voice acquisition process>
The microphone 32 mainly collects the voice uttered by the second participant, and supplies the control unit 34 with an audio file representing the collected voice and with time stamps for identifying the time points of utterances in the audio file.

The second terminal device 30 may include a plurality of microphones. In that configuration, the control unit 34 can identify the microphone 32 by referring to audio device identification information.

<Utterance recognition process>
The control unit 34 performs utterance recognition processing with reference to the audio file and the time stamps supplied from the microphone 32.

As an example, as shown in FIG. 3, the control unit 34 generates section time-series text data by performing utterance recognition processing with reference to the audio file and the time stamps. The section time-series text data is information that represents, as text data in chronological order, mainly the content uttered by the second participant. The section time-series text data is an example of the second utterance information indicating the utterances of the second participant.

[Information processing apparatus 10]
The information processing apparatus 10 includes a facial expression information acquisition unit 13, a voice information acquisition unit 14, a facial expression relationship information generation unit 15, an utterance relationship information generation unit 16, and a relationship information generation unit 17. The information processing apparatus 10 further includes a communication unit 11. The facial expression information acquisition unit 13, the voice information acquisition unit 14, the facial expression relationship information generation unit 15, the utterance relationship information generation unit 16, and the relationship information generation unit 17 are included in a calculation unit 12.

(Facial expression information acquisition unit 13)
The facial expression information acquisition unit 13 acquires, via the communication unit 11, first facial expression information on the facial expression of a first participant among a plurality of meeting participants and second facial expression information on the facial expression of a second participant among the plurality of meeting participants.

<Numerical data cleaning process>
As an example, the facial expression information acquisition unit 13 refers, via the communication unit 11, to the time-series facial expression values, time-series face part coordinates, and time-series line-of-sight coordinates, which are the time-series numerical data included in the facial expression information on the first participant, and cleans that time-series numerical data by performing, for example, the following processing.
・Delete invalid data sections
・Average the data in each valid data section
・Convert the data into a variance and a number of terms
By performing this numerical data cleaning process, the facial expression information acquisition unit 13 generates section time-series numerical data on the first participant. The section time-series numerical data includes the time-series facial expression values, time-series face part coordinates, and time-series line-of-sight coordinates in the valid sections.
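The three cleaning steps can be sketched for a single numeric series as follows. The source does not define what makes a section invalid, how sections are delimited, or which variance is meant, so this example makes three assumptions: a sample of `None` marks a failed measurement, sections are fixed-length runs of samples, and the population variance is used. The function name `clean_sections` is hypothetical.

```python
from statistics import fmean, pvariance
from typing import Dict, List, Optional, Union


def clean_sections(
    values: List[Optional[float]], section_length: int
) -> List[Dict[str, Union[int, float]]]:
    """Split a series into sections and summarize each valid section.

    Assumptions: None marks an invalid sample, and a section with no valid
    samples is an invalid section that is deleted.
    """
    cleaned: List[Dict[str, Union[int, float]]] = []
    for start in range(0, len(values), section_length):
        section = values[start:start + section_length]
        valid = [v for v in section if v is not None]
        if not valid:
            continue  # Step 1: delete the invalid data section.
        cleaned.append({
            "start_index": start,
            "mean": fmean(valid),          # Step 2: average the valid data.
            "variance": pvariance(valid),  # Step 3: variance ...
            "count": len(valid),           # ... and number of terms.
        })
    return cleaned


if __name__ == "__main__":
    series = [1.0, 3.0, None, None, 2.0, None]
    for row in clean_sections(series, section_length=2):
        print(row)
```

Keeping the variance and the number of terms alongside the mean lets later stages judge how reliable each section's average is.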

The facial expression information acquisition unit 13 performs the same processing on the facial expression information on the second participant and generates section time-series numerical data on the second participant.

<Facial expression detection>
The facial expression information acquisition unit 13 refers to the section time-series numerical data on the first participant to calculate a plurality of first indexes expressing the facial expression of the first participant. Likewise, the facial expression information acquisition unit 13 refers to the section time-series numerical data on the second participant to calculate a plurality of second indexes expressing the facial expression of the second participant.

Examples of indexes expressing a facial expression include the following.
・ Anger
・ Contempt
・ Disgust
・ Fear
・ Happiness
・ Neutral
・ Sadness
・ Surprise
An index expressing a facial expression can therefore also be regarded as an index expressing the emotion that the facial expression indicates.

Note that the facial expression information acquisition unit 13 may use the time-series facial expression values included in the interval time-series numerical data regarding the first participant, as they are, as the plurality of first indexes expressing the facial expression of the first participant. Similarly, the facial expression information acquisition unit 13 may use the time-series facial expression values included in the interval time-series numerical data regarding the second participant, as they are, as the plurality of second indexes expressing the facial expression of the second participant.

The facial expression of the first participant and the facial expression of the second participant can also each be expressed as a vector whose components are the above indexes. These vectors are sometimes called facial expression vectors.

The technique for detecting the facial expression of each participant and the technique for expressing the emotion indicated by a detected facial expression as indexes do not limit the present embodiment; known techniques can be used, for example.

<Line-of-sight detection>
The facial expression information acquisition unit 13 also acquires, via the communication unit 11, information regarding the line-of-sight directions of the first participant and the second participant from the first terminal device 20 and the second terminal device 30. Specifically, as an example, the facial expression information acquisition unit 13 acquires, as information regarding the line-of-sight direction of the first participant, the time-series line-of-sight coordinates included in the above-described interval time-series numerical data regarding the first participant. Similarly, the facial expression information acquisition unit 13 acquires, as information regarding the line-of-sight direction of the second participant, the time-series line-of-sight coordinates included in the above-described interval time-series numerical data regarding the second participant.

The method of acquiring the line-of-sight coordinates is not particularly limited. One example is to provide a point light source (not shown) in each of the first terminal device 20 and the second terminal device 30 and to capture the corneal reflection image of the light from the point light source with the camera 21 and the camera 31 for a predetermined time, thereby acquiring the user's line-of-sight coordinates. The type of point light source is not particularly limited, and examples include visible light and infrared light; using an infrared LED, for example, allows the line-of-sight coordinates to be acquired without causing the user discomfort.

<Distance detection>
The facial expression information acquisition unit 13 may also acquire the time-series face part coordinates of the first participant included in the interval time-series numerical data and calculate the distance between the first participant and the imaging means (camera 21). Similarly, the facial expression information acquisition unit 13 may acquire the time-series face part coordinates of the second participant included in the interval time-series numerical data and calculate the distance between the second participant and the imaging means (camera 31). The distance between a participant and the imaging means can be calculated, for example, as the reciprocal of the outer-eye-corner distance, where the outer-eye-corner distance is the distance between the outer corners of the eyes in the captured image, obtained from the face part coordinates and corrected for face angle.
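As a non-limiting illustration, the reciprocal relationship described above can be sketched as follows. The function name `relative_distance`, the use of a yaw angle, and the correction by dividing the apparent width by the cosine of that angle are assumptions of this sketch; the embodiment does not specify the form of the face angle correction. The result is a relative, unitless value, not a calibrated distance.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def relative_distance(left_corner: Point, right_corner: Point, yaw_deg: float = 0.0) -> float:
    """Return a relative camera distance from the outer eye corner coordinates.

    Assumption: a face turned by yaw_deg appears narrower by cos(yaw_deg),
    so the apparent width is divided by that factor before taking the reciprocal.
    """
    apparent = math.hypot(right_corner[0] - left_corner[0],
                          right_corner[1] - left_corner[1])
    corrected = apparent / math.cos(math.radians(yaw_deg))
    return 1.0 / corrected
```

A larger corrected eye corner distance in the image yields a smaller value, which corresponds to a participant who is closer to the camera.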

(Voice information acquisition unit 14)
The voice information acquisition unit 14 acquires first utterance information regarding the utterances of the first participant and second utterance information regarding the utterances of the second participant among the plurality of conference participants. That is, the voice information acquisition unit 14 acquires, via the communication unit 11, information regarding the utterances of the first participant and the second participant from the first terminal device 20 and the second terminal device 30.

As an example, the voice information acquisition unit 14 acquires the time-series utterance text included in the above-described interval time-series text data regarding the first participant. Similarly, as an example, the voice information acquisition unit 14 acquires the time-series utterance text included in the above-described interval time-series text data regarding the second participant.

As a further example, the voice information acquisition unit 14 acquires the time-series utterance text regarding the first participant and the time-series face part coordinates at the time that text was uttered. Referring to the time-series face part coordinates, the voice information acquisition unit 14 includes the time-series utterance text in the first utterance information regarding the utterances of the first participant if the first participant's mouth was open at the time of the utterance. Similarly, the voice information acquisition unit 14 acquires the time-series utterance text regarding the second participant and the time-series face part coordinates at the time that text was uttered. Referring to the time-series face part coordinates, the voice information acquisition unit 14 includes the time-series utterance text in the second utterance information regarding the utterances of the second participant if the second participant's mouth was open at the time of the utterance. As a result, the person who spoke can be identified even when a simple non-directional microphone is used as the microphone 22 or the microphone 32.
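As a non-limiting illustration, attributing utterances to a participant by checking whether the mouth is open can be sketched as follows. The names `is_mouth_open` and `attribute_utterances`, the lip coordinate representation, and the threshold value are hypothetical; the embodiment does not specify how an open mouth is detected from the face part coordinates.

```python
from typing import Dict, List, Tuple

MOUTH_OPEN_THRESHOLD = 5.0  # hypothetical lip gap in pixels


def is_mouth_open(upper_lip_y: float, lower_lip_y: float) -> bool:
    """Treat the mouth as open when the vertical lip gap exceeds the threshold."""
    return (lower_lip_y - upper_lip_y) > MOUTH_OPEN_THRESHOLD


def attribute_utterances(
    utterances: List[Tuple[int, str]],
    lips_by_time: Dict[int, Tuple[float, float]],
) -> List[Tuple[int, str]]:
    """Keep only the utterances during which this participant's mouth was open.

    utterances: (time, text) pairs; lips_by_time: time -> (upper_lip_y, lower_lip_y).
    """
    kept = []
    for time, text in utterances:
        lips = lips_by_time.get(time)
        if lips is not None and is_mouth_open(*lips):
            kept.append((time, text))
    return kept
```

Running the same function once per participant, each time with that participant's face part coordinates, splits a shared transcript into the first and second utterance information.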

(Facial expression relationship information generation unit 15)
The facial expression relationship information generation unit 15 refers to the first facial expression information and the second facial expression information and generates facial expression relationship information indicating the relationship between the facial expressions of the first participant and the second participant.

A conference participant's satisfaction with a conference depends not only on the content and conclusions of the conference but also on whether there was good communication between the participants. The state of communication between participants is represented by the relationship between the participants during the conference, and that relationship can be evaluated by the degree to which their emotions match. The facial expression relationship information generation unit 15 acquires the first facial expression information and the second facial expression information from the facial expression information acquisition unit 13 and, based on them, evaluates the degree of emotional matching between the participants during the conference from the facial expressions of both participants, thereby evaluating the state of communication between the participants in real time.

Each piece of facial expression information that the facial expression relationship information generation unit 15 acquires from the facial expression information acquisition unit 13 is calculated from the interval time-series numerical data, that is, from information on each participant's facial expression in real time or over time. Because the facial expression relationship information generation unit 15 generates the facial expression relationship information from each participant's real-time or over-time facial expression information, the generated facial expression relationship information represents the relationship between the participants' facial expressions in real time or over time.

<Facial expression match rate determination>
As described above, the first facial expression information includes a plurality of first indexes expressing the facial expression of the first participant, and the second facial expression information includes a plurality of second indexes expressing the facial expression of the second participant.

The facial expression relationship information generation unit 15 may generate facial expression difference information regarding the difference between the first indexes and the second indexes and include the generated facial expression difference information in the facial expression relationship information.

As an example, the facial expression relationship information generation unit 15 calculates a facial expression mismatch amount using the absolute value of the difference between a first facial expression vector, whose elements are the plurality of indexes expressing the facial expression of the first participant, and a second facial expression vector, whose elements are the plurality of indexes expressing the facial expression of the second participant. The calculated facial expression mismatch amount can also be regarded as an index of how well the participants' emotions are in harmony during the conference. The facial expression relationship information generation unit 15 may also calculate a facial expression match rate as an index of the proportion of time, from the start of the conference to the present, during which the facial expressions matched. The facial expression match rate is obtained, for example, by subtracting the time during which the facial expressions did not match from the time elapsed since the start of the conference and dividing the result by the time elapsed since the start of the conference.
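As a non-limiting illustration, the facial expression mismatch amount and the facial expression match rate can be sketched as follows. Summing the absolute differences of the components, judging a mismatch by comparing the mismatch amount with a threshold, and sampling at a fixed interval are assumptions of this sketch; the names `mismatch_amount` and `match_rate` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def mismatch_amount(first: Vector, second: Vector) -> float:
    """Sum of absolute component differences between two facial expression vectors."""
    return sum(abs(a - b) for a, b in zip(first, second))


def match_rate(samples: List[Tuple[Vector, Vector]], threshold: float) -> float:
    """(elapsed time - mismatched time) / elapsed time, with one sample per time step.

    Assumption: a time step counts as mismatched when the mismatch amount
    exceeds the threshold.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    total = len(samples)
    mismatched = sum(1 for a, b in samples if mismatch_amount(a, b) > threshold)
    return (total - mismatched) / total
```

With four samples of which one exceeds the threshold, the match rate is (4 - 1) / 4.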

<Line-of-sight match rate determination>
The first facial expression information may include first line-of-sight information regarding the line-of-sight direction of the first participant, and the second facial expression information may include second line-of-sight information regarding the line-of-sight direction of the second participant. The facial expression relationship information generation unit 15 may generate line-of-sight relationship information by referring to the first line-of-sight information and the second line-of-sight information and include the generated line-of-sight relationship information in the facial expression relationship information.

As an example, the facial expression relationship information generation unit 15 calculates, as the line-of-sight relationship information, a line-of-sight match rate between the first participant and the second participant. The calculated line-of-sight match rate can also be regarded as an index of the degree to which a participant is paying attention to the other participants during the conference. More specifically, the control unit 24 or the facial expression relationship information generation unit 15 first identifies the position of the first participant's eyes in the conference room by analyzing the image captured by the camera 21, and the control unit 34 or the facial expression relationship information generation unit 15 identifies the position of the second participant's eyes in the conference room by analyzing the image captured by the camera 31.

Then, at each point in time, the facial expression relationship information generation unit 15 determines whether the line-of-sight direction of the first participant indicated by the first line-of-sight information is directed toward the eyes of the second participant, and whether the line-of-sight direction of the second participant indicated by the second line-of-sight information is directed toward the eyes of the first participant, thereby determining at each point in time whether the lines of sight of the first participant and the second participant match.

As an example, the facial expression relationship information generation unit 15 sets the line-of-sight flag of the first participant to 1 when it determines that the line of sight of the first participant is directed toward the eyes of the second participant. Likewise, the facial expression relationship information generation unit 15 sets the line-of-sight flag of the second participant to 1 when it determines that the line of sight of the second participant is directed toward the eyes of the first participant. The facial expression relationship information generation unit 15 determines that the lines of sight match when both line-of-sight flags are 1.

The facial expression relationship information generation unit 15 then calculates the line-of-sight match rate as an index of the proportion of time, from the start of the conference to the present, during which the lines of sight matched. The line-of-sight match rate is obtained, for example, by dividing the time during which the lines of sight matched by the time elapsed since the start of the conference.
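As a non-limiting illustration, the line-of-sight match rate can be computed from the two series of line-of-sight flags as follows. The function name `gaze_match_rate` and the assumption that both flag series are sampled at the same fixed interval from the start of the conference are specific to this sketch.

```python
from typing import Sequence


def gaze_match_rate(first_flags: Sequence[int], second_flags: Sequence[int]) -> float:
    """Fraction of time steps at which both line-of-sight flags are 1."""
    if len(first_flags) != len(second_flags) or not first_flags:
        raise ValueError("flag series must be non-empty and of equal length")
    matched = sum(1 for a, b in zip(first_flags, second_flags) if a == 1 and b == 1)
    return matched / len(first_flags)
```

A time step counts as matched only when both participants are looking at each other, so one participant looking at the other without the gaze being returned does not raise the rate.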

Note that the determination of whether the lines of sight are directed toward each other's eyes may further refer to position information indicating the relative positional relationship between the first terminal device 20 and the second terminal device 30.

The lines of sight may also be determined to match when each participant's line of sight is directed not necessarily toward the other's eyes but toward the other's face or body.

When the participants hold the conference via the Internet or the like, the line-of-sight match rate between the participants is calculated through the screens of the terminal devices. More specifically, as an example, the position of the second participant's face displayed on the display screen of the first terminal device 20 is identified as coordinates on that display screen, and the line-of-sight flag of the first participant is set to 1 when the line of sight of the first participant is directed toward the identified coordinates. Similarly, the position of the first participant's face displayed on the display screen of the second terminal device 30 is identified as coordinates on that display screen, and the line-of-sight flag of the second participant is set to 1 when the line of sight of the second participant is directed toward the identified coordinates.
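As a non-limiting illustration, setting the line-of-sight flag for a conference held through display screens can be sketched as follows. Representing the displayed face as an axis-aligned rectangle in screen coordinates, and the names `Rect` and `gaze_flag`, are assumptions of this sketch.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    """Region of the display screen occupied by the other participant's face."""
    left: float
    top: float
    right: float
    bottom: float


def gaze_flag(gaze_xy: Tuple[float, float], face: Rect) -> int:
    """Return 1 when the line-of-sight coordinates fall inside the face region, else 0."""
    x, y = gaze_xy
    inside = face.left <= x <= face.right and face.top <= y <= face.bottom
    return 1 if inside else 0
```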

<Forward-lean rate determination>
The facial expression relationship information generation unit 15 may also calculate forward-lean rates for the first participant and the second participant and include the calculated forward-lean rates in the facial expression relationship information. The calculated forward-lean rate can also be regarded as an index of the degree to which a participant is showing interest in the utterances of other participants during the conference. As an example, the facial expression relationship information generation unit 15 determines that the first participant or the second participant is in a forward-leaning state when that participant's distance from the respective imaging means falls below a threshold within a preset fixed time.

The facial expression relationship information generation unit 15 then specifies, for the time from the start of the conference to the present, the proportion of time during which the first participant is leaning forward as the forward-lean rate of the first participant, and the proportion of time during which the second participant is leaning forward as the forward-lean rate of the second participant.
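As a non-limiting illustration, the forward-lean rate of one participant can be sketched as follows. For simplicity, this sketch judges the forward-leaning state at each sample rather than over a preset fixed time window, and it assumes distances sampled at a fixed interval; the function name `forward_lean_rate` is hypothetical.

```python
from typing import Sequence


def forward_lean_rate(distances: Sequence[float], threshold: float) -> float:
    """Fraction of samples at which the distance to the camera is below the threshold."""
    if not distances:
        raise ValueError("at least one distance sample is required")
    leaning = sum(1 for d in distances if d < threshold)
    return leaning / len(distances)
```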

The facial expression relationship information generation unit 15 may also calculate the posture of each participant during the conference from the change, within a preset fixed time, in the face image size obtained from the distance of each of the first participant and the second participant from the respective imaging means, and include the posture in the facial expression relationship information. The calculated posture can also be regarded as an index of whether the participant has an attitude suitable for listening to other participants' utterances during the conference.

Furthermore, the facial expression relationship information generation unit 15 may calculate the correlation between the change in the posture of the first participant and the change in the second facial expression vector of the second participant and include that correlation in the facial expression relationship information. The correlation between a change in posture and a change in a facial expression vector can also be regarded as an index of the influence that one participant's posture has on another participant's facial expression. Similarly, the facial expression relationship information generation unit 15 may calculate the correlation between the change in the posture of the second participant and the change in the first facial expression vector of the first participant and include that correlation in the facial expression relationship information.
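As a non-limiting illustration, the correlation between a change in posture and a change in facial expression can be sketched as follows. The embodiment does not specify the correlation measure; this sketch assumes the Pearson correlation coefficient of step-to-step differences and assumes that the facial expression vector has been reduced to a single scalar series (for example, one index). The names `deltas` and `pearson` are hypothetical.

```python
import math
from typing import List, Sequence


def deltas(series: Sequence[float]) -> List[float]:
    """Step-to-step changes of a time series."""
    return [b - a for a, b in zip(series, series[1:])]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)
```

For example, `pearson(deltas(posture), deltas(expression_index))` yields a value between -1 and 1, where `posture` and `expression_index` are hypothetical series for the first and second participant respectively.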

The facial expression relationship information generation unit 15 may also refer to the posture of the first participant and the posture of the second participant, calculate the degree of similarity between the posture states of the first participant and the second participant, and include the calculated degree of similarity in the facial expression relationship information. The degree of similarity between posture states represents a mirroring state and can also be regarded as an index of the degree to which a participant is showing interest in the utterances of other participants during the conference.

Note that the facial expression relationship information generation unit 15 may generate the facial expression relationship information by further referring to participant information indicating attributes of the first and second participants. The participant information indicating the attributes of a participant includes at least one of the participant's age, gender, blood type, personality, birthplace, family relationships, job title, years of service, number of job changes, job history, and the like. The participant information also includes the participant's usage history of the system.

As an example, when the facial expression relationship information generation unit 15 refers to the participant information and determines that a participant tends to show a specific facial expression, it may correct that participant's facial expression vector by multiplying the index corresponding to the specific facial expression by a weighting coefficient smaller than 1, and generate the facial expression relationship information using the corrected facial expression vector.
For example, when the participant information indicating the attributes of the first participant indicates that the first participant is shy, the facial expression relationship information generation unit 15 may correct the first participant's facial expression vector by, for instance, multiplying the "neutral" index by a weight of 0.8 and distributing the remaining weight of 0.2 proportionally among the other indexes, and may then generate the facial expression relationship information using the corrected facial expression vector.
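As a non-limiting illustration, the correction of the "neutral" index can be sketched as follows. This sketch interprets "distributing the remaining weight proportionally" as adding the amount removed from the damped index to the other indexes in proportion to their current values, so that the sum of the vector is preserved; that interpretation and the function name `damp_index` are assumptions.

```python
from typing import Dict


def damp_index(vector: Dict[str, float], key: str, weight: float = 0.8) -> Dict[str, float]:
    """Scale one index by `weight` and spread the removed amount over the others.

    Assumption: the removed amount is shared in proportion to the other
    indexes' current values.
    """
    removed = vector[key] * (1.0 - weight)
    others_total = sum(v for k, v in vector.items() if k != key)
    corrected = {}
    for name, value in vector.items():
        if name == key:
            corrected[name] = value * weight
        elif others_total > 0:
            corrected[name] = value + removed * value / others_total
        else:
            corrected[name] = value  # nothing to distribute to
    return corrected
```

With a hypothetical vector of neutral 0.5, happiness 0.3, and sadness 0.2, the neutral index becomes 0.4 and the removed 0.1 is split 0.06 to happiness and 0.04 to sadness.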

The information processing device 10 may be configured to further acquire biological information of the participants, such as pulse waves and brain waves, and environmental information around the participants, such as temperature, humidity, carbon dioxide concentration, and illuminance, and the facial expression relationship information generation unit 15 may generate the facial expression relationship information by further referring to the biological information and the environmental information.

As an example, the facial expression relationship information generation unit 15 refers to the stress state of the first participant, determined from the first participant's pulse wave or respiration, and to the second indexes expressing the facial expression of the second participant immediately before or at that point in time, and estimates the facial expression of the second participant that causes the first participant stress. The facial expression relationship information generation unit 15 may then recognize the estimated facial expression of the second participant as an NG facial expression for the first participant and include that information in the facial expression relationship information. One participant's NG facial expression for another participant can also be regarded as an index of the influence that one participant's facial expression has on another participant's stress state. Similarly, the facial expression relationship information generation unit 15 may estimate the facial expression of the first participant that causes the second participant stress and recognize it as an NG facial expression for the second participant.
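As a non-limiting illustration, estimating an NG facial expression can be sketched as follows. The embodiment does not specify the estimation method; this sketch assumes that the stress state is already available as one boolean per time step and picks the other participant's dominant index that most often coincides with a stressed time step. The function name `ng_expression` is hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Sequence


def ng_expression(
    stressed: Sequence[bool],
    partner_vectors: Sequence[Dict[str, float]],
) -> Optional[str]:
    """Return the partner's dominant expression seen most often during stress.

    stressed[i] is True when the participant is judged to be under stress at
    step i; partner_vectors[i] holds the other participant's indexes at step i.
    """
    counts: Counter = Counter()
    for is_stressed, vector in zip(stressed, partner_vectors):
        if is_stressed and vector:
            dominant = max(vector, key=vector.get)  # index with the largest value
            counts[dominant] += 1
    return counts.most_common(1)[0][0] if counts else None
```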

The facial expression relationship information generation unit 15 may also calculate the correlation between the change in the environmental information around the participants within a predetermined fixed period and the change in the average of the first facial expression vector of the first participant and the second facial expression vector of the second participant, and include that correlation in the facial expression relationship information. The correlation between a change in the environmental information and a change in the average of the facial expression vectors can also be regarded as an index of the influence that the participants' surroundings have on the state of communication between the participants.

<Dialogue management process>
As an example, the facial expression relationship information generation unit 15 performs dialogue management processing by referring, via the communication unit 11, to the user IDs of the first participant and the second participant and to time stamps indicating the times at which the participant represented by each user ID started and ended the conference. For the data at a given point in time in the interval time-series numerical data regarding one participant, the facial expression relationship information generation unit 15 may extract the user IDs of the other participants in dialogue at that point in time, determine which participant the data was obtained in dialogue with, and include the result in the facial expression relationship information.
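As a non-limiting illustration, finding the dialogue partners for a data point can be sketched as follows. The representation of each participant's conference start and end times as a `(start, end)` pair keyed by user ID, and the function name `partners_at`, are assumptions of this sketch.

```python
from typing import Dict, List, Tuple


def partners_at(user_id: str, time: float, sessions: Dict[str, Tuple[float, float]]) -> List[str]:
    """User IDs of the other participants whose conference interval contains `time`.

    sessions maps a user ID to the (start, end) time stamps of that participant.
    """
    return sorted(
        other
        for other, (start, end) in sessions.items()
        if other != user_id and start <= time <= end
    )
```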

(Utterance relationship information generation unit 16)
The utterance relationship information generation unit 16 refers to the first utterance information and the second utterance information and generates utterance relationship information indicating the relationship between the utterances of the first participant and the second participant. The utterance relationship information generation unit 16 acquires the first utterance information and the second utterance information from the voice information acquisition unit 14 and, based on them, evaluates the degree of emotional matching between the participants during the conference from the utterances of both participants, thereby evaluating the state of communication between the participants.

Each piece of utterance information that the utterance relationship information generation unit 16 acquires from the voice information acquisition unit 14 is calculated from the interval time-series text data, that is, from information on each participant's utterances in real time or over time. Because the utterance relationship information generation unit 16 generates the utterance relationship information from each participant's real-time or over-time utterance information, the generated utterance relationship information represents the relationship between the participants' utterances in real time or over time.

<Utterance ratio determination>
The utterance relationship information generation unit 16 may generate utterance time relationship information indicating the relationship between the utterance time of the first participant indicated by the first utterance information and the utterance time of the second participant indicated by the second utterance information, and may include the generated utterance time relationship information in the utterance relationship information.

As an example, the utterance relationship information generation unit 16 calculates the utterance ratio between the utterance time of the first participant and the utterance time of the second participant within a predetermined fixed time, and includes it in the utterance relationship information. The calculated utterance ratio can also be regarded as an index of how equal the relationship between the participants is.
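The utterance ratio calculation can be illustrated with a minimal Python sketch. The `Utterance` record, the function names, and the choice to express the ratio as the first participant's share of the combined utterance time are assumptions made for illustration; the specification does not fix a data layout or a formula.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str  # hypothetical user ID
    start: float  # seconds from the start of the meeting
    end: float


def speaking_time(utterances, speaker, window_start, window_end):
    """Total seconds `speaker` talks inside [window_start, window_end]."""
    total = 0.0
    for u in utterances:
        if u.speaker != speaker:
            continue
        # Clip each utterance to the window before adding its duration.
        overlap = min(u.end, window_end) - max(u.start, window_start)
        if overlap > 0:
            total += overlap
    return total


def utterance_ratio(utterances, first, second, window_start, window_end):
    """Share of the combined utterance time taken by `first` (0.5 means equal)."""
    t1 = speaking_time(utterances, first, window_start, window_end)
    t2 = speaking_time(utterances, second, window_start, window_end)
    if t1 + t2 == 0:
        return None  # neither participant spoke inside the window
    return t1 / (t1 + t2)
```

Under this assumed definition, a value near 0.5 corresponds to an equal exchange, and values near 0 or 1 indicate that one participant dominates the conversation.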

<Utterance frequency determination>
The utterance relationship information generation unit 16 may also determine whether at least one of the first utterance information and the second utterance information includes utterance content belonging to a specific category, and may include information corresponding to the result of the determination in the utterance relationship information.

Examples of the specific categories of utterance content include open questions, action-prompting words ("And then?", "I see", "Indeed"), parroting, interruptions of an utterance, overlapping utterances, and negative words ("but", "however"). As an example, the utterance relationship information generation unit 16 calculates the frequency with which utterance content belonging to such a specific category is uttered within a predetermined fixed time, and includes information on the calculated frequency in the utterance relationship information.

Specifically, as an example, the specific category of utterance content is set to open questions, and the utterance relationship information generation unit 16 extracts, from the section time-series text data of the first participant, the text data representing open questions contained within a fixed time. The utterance relationship information generation unit 16 then divides the number of words in the extracted text data by the number of words in all the text data within that fixed time, thereby calculating the frequency with which open questions were uttered as an open question rate. Similarly, the utterance relationship information generation unit 16 calculates an open question rate from the section time-series text data of the second participant. The utterance relationship information generation unit 16 then compares the open question rate of the first participant with that of the second participant to calculate an open question ratio, and includes it in the utterance relationship information. The calculated open question ratio can also be regarded as an index of how equal the relationship between the participants is. The utterance relationship information generation unit 16 may also include the open question rate of the first participant and the open question rate of the second participant in the utterance relationship information.
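The word-count division described above can be sketched in Python as follows. How an utterance is recognized as an open question is not specified, so the `is_open_question` rule (an English sentence ending in "?" and starting with "what", "how", or "why") is a purely hypothetical stand-in, and `rate_ratio` is only one assumed way of comparing the two rates.

```python
OPEN_STARTERS = ("what", "how", "why")  # hypothetical detection rule


def is_open_question(sentence):
    """Assumed rule: a question that starts with an open interrogative."""
    words = sentence.lower().split()
    return bool(words) and sentence.rstrip().endswith("?") and words[0] in OPEN_STARTERS


def category_rate(sentences, in_category):
    """Words in category sentences divided by all words in the window."""
    total_words = sum(len(s.split()) for s in sentences)
    if total_words == 0:
        return 0.0
    category_words = sum(len(s.split()) for s in sentences if in_category(s))
    return category_words / total_words


def rate_ratio(rate_first, rate_second):
    """Assumed comparison: the first participant's share of the two rates."""
    total = rate_first + rate_second
    return None if total == 0 else rate_first / total
```

Because `category_rate` takes the classifier as an argument, the same function could serve other categories, such as action-prompting words, by passing a different predicate.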

Similarly, the specific category of utterance content is set to action-prompting words, and the utterance relationship information generation unit 16 extracts, from the section time-series text data of the first participant, the text data representing action-prompting words contained within a fixed time. The utterance relationship information generation unit 16 then divides the number of words in the extracted text data by the number of words in all the text data within that fixed time, thereby calculating the frequency with which action-prompting words were uttered as a prompting question rate. Similarly, the utterance relationship information generation unit 16 calculates a prompting question rate from the section time-series text data of the second participant. The utterance relationship information generation unit 16 then compares the prompting question rate of the first participant with that of the second participant to calculate a prompting question ratio, and includes it in the utterance relationship information. The calculated prompting question ratio can also be regarded as an index of how equal the relationship between the participants is. The utterance relationship information generation unit 16 may also include the prompting question rate of the first participant and the prompting question rate of the second participant in the utterance relationship information.

<Word-based evaluation>
The utterance relationship information generation unit 16 may also extract, from at least one of the first utterance information and the second utterance information, words that appear relatively frequently within a predetermined time, and may include the extracted words in the utterance relationship information.

As an example, the utterance relationship information generation unit 16 counts, for each participant, the number of occurrences of each word contained within a predetermined fixed time in the section time-series text data of the first participant and of the second participant, ranks the words, and extracts several of the most frequent words from the top of the ranking. The utterance relationship information generation unit 16 then includes the extracted high-frequency words in the utterance relationship information as frequent words. The utterance relationship information generation unit 16 may also determine whether the frequent words of the first participant and the second participant, and their ranks, match, and may include the determination result in the utterance relationship information.
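The ranking and comparison of frequent words can be sketched with Python's standard `collections.Counter`. The sketch assumes that the text has already been split into words (for Japanese text this would require morphological analysis); the function names and the default of three words are illustrative assumptions.

```python
from collections import Counter


def frequent_words(words, top_n=3):
    """The `top_n` most frequent words, most frequent first."""
    return [word for word, _ in Counter(words).most_common(top_n)]


def same_ranking(first_words, second_words, top_n=3):
    """True when both participants share the same top words in the same order."""
    return frequent_words(first_words, top_n) == frequent_words(second_words, top_n)
```

Words with equal counts are returned in the order in which they were first encountered, so a stricter comparison might need an explicit tie-breaking rule.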

The utterance relationship information generation unit 16 may also extract, for each participant, the words contained within a predetermined fixed time in the section time-series text data of the first participant and of the second participant, calculate the match rate of the extracted words, and include it in the utterance relationship information. The word match rate can be calculated as the proportion of words that match between the first participant and the second participant among all the words contained in the section time-series text data within the predetermined fixed time. The calculated word match rate can also be regarded as an index of whether parroting is taking place.
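One possible reading of the word match rate is sketched below in Python: the number of distinct words used by both participants divided by the number of distinct words used by either. Counting distinct words rather than every occurrence is an assumption; the text only states that the rate is the proportion of matching words among all words in the window.

```python
def word_match_rate(first_words, second_words):
    """Shared distinct words divided by all distinct words (assumed definition)."""
    first, second = set(first_words), set(second_words)
    all_words = first | second
    if not all_words:
        return 0.0  # no words were spoken inside the window
    return len(first & second) / len(all_words)
```

For example, the word lists `["deadline", "is", "tight"]` and `["the", "deadline", "is", "friday"]` share two of five distinct words, giving a rate of 0.4.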

The utterance relationship information generation unit 16 may also extract utterance timings from the section time-series text data of the first participant and of the second participant and calculate the temporal overlap of the utterance timings. The utterance relationship information generation unit 16 may then calculate the number of overlaps within a predetermined fixed time as an overlap frequency and include it in the utterance relationship information. The calculated overlap frequency can also be regarded as an index of how often a participant interrupts the utterances of another participant.
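The overlap count can be sketched in Python as follows, assuming that each utterance timing is available as a `(start, end)` pair in seconds. That representation and the function name are assumptions for illustration.

```python
def count_overlaps(first_spans, second_spans, window_start, window_end):
    """Count pairs of utterances that overlap in time inside the window."""
    count = 0
    for s1, e1 in first_spans:
        for s2, e2 in second_spans:
            # The overlapping interval, clipped to the window.
            start = max(s1, s2, window_start)
            end = min(e1, e2, window_end)
            if start < end:
                count += 1
    return count
```

The nested loop is quadratic in the number of utterances, which is acceptable for a short window; a sweep over sorted intervals would be a natural refinement for long recordings.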

Furthermore, the utterance relationship information generation unit 16 may generate the utterance relationship information by additionally referring to participant information indicating the attributes of the first and second participants. The participant information indicating the attributes of a participant includes at least one of the participant's age, gender, blood type, personality, place of origin, family relationships, job title, years of service, number of job changes, job history, and the like. The participant information also includes the participant's usage history of the system.

The information processing device 10 may be configured to further acquire biological information of the participants, such as pulse waves and brain waves, and environmental information around the participants, such as temperature, humidity, carbon dioxide concentration, and illuminance, and the utterance relationship information generation unit 16 may generate the utterance relationship information by additionally referring to the biological information and the environmental information.

As an example, the utterance relationship information generation unit 16 refers to the stress state of the first participant, determined from that participant's pulse wave or respiration, and to the section time-series text data of the second participant immediately before or at that time point, and estimates the text data of the second participant that causes stress to the first participant. The utterance relationship information generation unit 16 may then identify the estimated text data of the second participant as NG words for the first participant and include that information in the utterance relationship information. The NG words of one participant for another participant can also be regarded as an index of the influence that one participant's remarks have on the stress state of the other participant. Similarly, the utterance relationship information generation unit 16 may estimate the text data of the first participant that causes stress to the second participant and identify NG words for the second participant.
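A very rough Python sketch of the NG word estimation is given below. It assumes that the times at which the first participant entered a stressed state have already been derived from biological information, and that the second participant's utterances are available as `(time, text)` pairs; the five-second lookback and the simple word counting are hypothetical choices, not the method of the specification.

```python
from collections import Counter


def candidate_ng_words(stress_times, partner_utterances, lookback=5.0):
    """Count partner words spoken within `lookback` seconds before each stress onset."""
    counts = Counter()
    for stress_time in stress_times:
        for spoken_at, text in partner_utterances:
            if stress_time - lookback <= spoken_at <= stress_time:
                counts.update(text.lower().split())
    return counts
```

Words that recur across several stress onsets would be the stronger candidates; a single co-occurrence is weak evidence, so some threshold on the count would normally be applied before treating a word as an NG word.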

As another example, the utterance relationship information generation unit 16 refers to the level of activity of the first participant's thinking, determined from that participant's brain waves, and to the section time-series text data of the second participant immediately before or at that time point, and estimates the text data of the second participant that activates the thinking of the first participant. The utterance relationship information generation unit 16 may then identify the estimated text data of the second participant as important words for the first participant and include that information in the utterance relationship information. The important words of one participant for another participant can also be regarded as an index of the influence that one participant's remarks have on activating the thinking of the other participant. Similarly, the utterance relationship information generation unit 16 may estimate the text data of the first participant that activates the thinking of the second participant and identify important words for the second participant.

The utterance relationship information generation unit 16 may also calculate the degree of agreement between the participants in tone of voice, speaking speed, volume, and the like.

Furthermore, the utterance relationship information generation unit 16 may extract logs representing the utterances of the first participant and of the second participant from the accumulated section time-series text data, acquire their morphological analysis data, list the words that appeared frequently in past utterances, and include the list in the utterance relationship information. Presenting the frequent words from past utterances to both participants during the meeting can help them decide on the theme of the meeting.

<Dialogue management processing>
As an example, the utterance relationship information generation unit 16 performs dialogue management processing by referring, via the communication unit 11, to the user IDs of the first participant and the second participant and to time stamps indicating the times at which the participant represented by each user ID started and ended the meeting. For the data at a given time point in the section time-series text data of one participant, the utterance relationship information generation unit 16 may extract the user ID of the other participant in dialogue at that time point, determine with which participant the data was obtained, and include the result in the utterance relationship information.
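The lookup performed in the dialogue management processing can be sketched in Python as follows, assuming that each participant's attendance is recorded as a `(user_id, start, end)` tuple built from the time stamps. The tuple layout and the function name are assumptions for illustration.

```python
def partners_at(sessions, user_id, timestamp):
    """User IDs of the other participants in the meeting at `timestamp`."""
    def present(uid):
        return any(
            sid == uid and start <= timestamp <= end
            for sid, start, end in sessions
        )

    if not present(user_id):
        return []  # the data point falls outside this participant's session
    others = {sid for sid, _, _ in sessions if sid != user_id}
    return sorted(uid for uid in others if present(uid))
```

Each data point in one participant's time series can then be tagged with the returned IDs before it is used to generate the relationship information.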

(Relationship information generation unit 17)
The relationship information generation unit 17 refers to the facial expression relationship information and the utterance relationship information and generates relationship information, which is real-time or temporal information indicating the relationship between the first participant and the second participant. Evaluating both the facial expressions and the utterances of both participants during the meeting makes it possible to evaluate the state of communication between the participants in more detail. In addition, because the relationship information generation unit 17 generates the relationship information from real-time or temporal facial expression information and utterance information of the participants, the real-time or temporal state of communication between the participants can be evaluated.

The relationship information generation unit 17 may generate presentation information to be presented to at least one of the first participant and the second participant. The presentation information may include, for example, the degree of emotional agreement between the two participants as evaluated comprehensively from the facial expression relationship information and the utterance relationship information (for example, the degree of emotional agreement is rated higher when the gaze match rate is high and the utterance ratio is balanced).
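The parenthetical example above can be turned into a toy Python scoring function. The equal weighting and the linear balance term are assumptions chosen only to make the example concrete; the specification does not give a formula for the degree of emotional agreement.

```python
def emotional_agreement(gaze_match_rate, speaking_share):
    """Toy score in [0, 1]: higher for matching gaze and balanced talk time.

    `gaze_match_rate` is a value in [0, 1]; `speaking_share` is one
    participant's share of the combined utterance time, in [0, 1].
    """
    # 1.0 when talk time is split evenly, 0.0 when one participant does all the talking.
    balance = 1.0 - 2.0 * abs(speaking_share - 0.5)
    return 0.5 * gaze_match_rate + 0.5 * balance
```

A gaze match rate of 1.0 with an even split of talk time yields the maximum score of 1.0, while a one-sided conversation lowers the score even when the gaze match rate stays high.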

Presenting the presentation information generated by the relationship information generation unit 17 to the participants feeds the relationship between the participants back to them. If the presentation information is presented in real time, the participants can check their relationship during the conversation, which makes it possible to encourage improvements in communication as the conversation proceeds.

The presentation information may be presented to both the first participant and the second participant, or to only one of them. The relationship information may present the same content to the first participant and the second participant, or may present different content to each. Generating relationship information that presents the same content to both participants can be expected to help build a flat relationship between them. The participants may also be allowed to select the presentation information themselves, or the presentation information to be presented may be changed according to a rule or an agreement between the participants.

The relationship information may include information indicating the ratio between the utterance time of the first participant and the utterance time of the second participant, and information on the change over time in the match rate between the gaze direction of the first participant and the gaze direction of the second participant. The relationship information may also include information on the change over time in the facial expression match rate or facial expression mismatch rate, the forward-leaning rate, the text of the utterance content, frequent words, and the like. Furthermore, the relationship information may include a participant's ID, the participant's own face image, an avatar image representing the facial expression of another participant, a display of recommended agenda items or recommended words extracted from accumulated data based on the utterance content, and the like.

The presentation information may also include evaluation results intended to improve the communication skills of the meeting participants. As an example, the facial expression match rate may be presented together with information encouraging the participant to improve reflective listening skills by raising the facial expression match rate, or the gaze match rate may be presented together with information encouraging the participant to adopt a posture and attitude suited to communication by raising the gaze match rate. Recommended words and question content may also be presented to encourage improvement of the dialogue level and the question level.

Specific examples of methods of presenting the presentation information include: displaying it on the display unit of each meeting participant (the display unit 23 and the display unit 33, respectively); displaying it on a common display unit that all meeting participants can see; presenting it to people other than the meeting participants through network distribution or the like; presenting it through a physical action (vibration, electrical stimulation, etc.) from a wearable device such as a wristwatch-type device; presenting it through a physical action from environmental equipment (lighting, air conditioning, music, etc.), such as lighting the room red when the discussion becomes heated; presenting it through an image corresponding to an index representing an emotion (such as an erupting volcano representing anger); and presenting it through the facial expression of an avatar corresponding to an index representing an emotion.

Examples of screens on which the presentation information is presented on at least one of the display unit 23 and the display unit 33 will be described with reference to FIGS. 4 and 5. FIG. 4 is a diagram showing an example of information presented by an information processing system including an information processing device according to an embodiment of the present invention, and FIG. 5 is a diagram showing another example of information presented by an information processing system including an information processing device according to an embodiment of the present invention.

As shown in FIG. 4, on the screen 400, the user ID of the meeting participant is displayed in the area 401 and the face image of the meeting participant is displayed in the area 402, identifying the person to whom the presentation information is presented. The utterance ratio is displayed in the area 403 as a talk ratio, for example as a pie chart, the facial expression of an avatar corresponding to an index representing the emotion of the participant in dialogue is displayed in the area 404, and the change over time in the facial expression match rate is displayed as a graph in the area 405, so that the current state of communication can be checked at a glance during the meeting. In the area 406, text representing the utterance content is displayed as a TalkStream, together with conversation themes and words recommended during the meeting. The facial expression match rate displayed in the area 405 is an example of presentation information generated from the facial expression relationship information based on temporal information from the past to the present. The facial expression of the avatar displayed in the area 404 is an example of presentation information generated from the facial expression relationship information based on real-time information.

As shown in FIG. 5, on the screen 500, as on the screen 400, the user ID may be displayed in the area 501, the face image in the area 502, the talk ratio in the area 503, the avatar facial expression in the area 504, and the utterance content, recommended themes, and the like in the area 505, while the area 505 displays the gaze match rate instead of the facial expression match rate.

<Appendix 1>
Part or all of the processing in the control unit of a terminal device may be performed in the arithmetic unit 12 of the information processing device 10. For example, the arithmetic unit 12 may acquire the image captured by the camera 21 via the communication unit 11, and the facial expression information acquisition unit 13 may generate the first facial expression information regarding the facial expression of the first participant and the second facial expression information regarding the facial expression of the second participant.

<Appendix 2>
The above examples describe a meeting between two people, the first participant and the second participant, but the present embodiment is not limited to this. Naturally, the invention described in this specification can also be applied to a meeting of N people (where N is 3 or more). In that case, the configuration described in this specification can be applied individually to any pair of two people among the N. For example, for a meeting of three people (A, B, C), the invention described in this specification can be applied individually to the three pairs (A, B), (A, C), and (B, C).
In this way, the invention described in the present embodiment can also be expressed as one that generates "relationship information" indicating the relationship between some or all of N participants by using data representing the states of the N people, history data of the states of the N people, and environmental information of the N people.
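The pairwise application to a meeting of N people described in Appendix 2 can be illustrated in Python with the standard library function `itertools.combinations`. The function name `participant_pairs` is an assumption for illustration.

```python
from itertools import combinations


def participant_pairs(participants):
    """All unordered pairs of participants, in input order."""
    return list(combinations(participants, 2))


# For three participants this yields (A, B), (A, C), and (B, C).
pairs = participant_pairs(["A", "B", "C"])
```

A meeting of N people has N(N-1)/2 such pairs, so the per-pair processing grows quadratically with the number of participants.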
<Appendix 3>
The above examples describe a case in which the first participant and the second participant are both human, but this does not limit the present embodiment.
For example, the second participant may be, instead of a human, a pseudo-human represented by a computer, such as a preset avatar or a BOT. In such a configuration, the second terminal device is not essential, and the facial expression information acquisition unit 13 and the voice information acquisition unit 14 may be configured to acquire the facial expressions and utterance content represented by the BOT, created in advance, as the facial expression information and voice information of the second participant.
The facial expressions and utterance content represented by the BOT may use data created in advance before the meeting, or may be changed adaptively according to the facial expressions and utterances of the first participant during the meeting.

[Example of implementation by software]
The control block of the information processing device 10 (in particular, the arithmetic unit 12) may be realized by a logic circuit (hardware) formed in an integrated circuit (IC chip) or the like, or may be realized by software.

In the latter case, the information processing device 10 includes a computer that executes the instructions of a program, which is software that realizes each function. This computer includes, for example, one or more processors and a computer-readable recording medium that stores the program. The object of the present invention is achieved when, in the computer, the processor reads the program from the recording medium and executes it. As the processor, for example, a CPU (Central Processing Unit) can be used. As the recording medium, a "non-transitory tangible medium" can be used, for example a ROM (Read Only Memory), as well as a tape, a disk, a card, a semiconductor memory, a programmable logic circuit, or the like. A RAM (Random Access Memory) or the like into which the program is loaded may also be provided. The program may be supplied to the computer via any transmission medium capable of transmitting the program (a communication network, a broadcast wave, or the like). One aspect of the present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission.

The present invention is not limited to the embodiments described above, and various modifications are possible within the scope of the claims. Embodiments obtained by appropriately combining the technical means disclosed in different embodiments are also included in the technical scope of the present invention.

100 Information processing system
10 Information processing device
20 First terminal device
30 Second terminal device
13 Facial expression information acquisition unit
14 Voice information acquisition unit
15 Facial expression relationship information generation unit
16 Utterance relationship information generation unit
17 Relationship information generation unit

Claims

An information processing apparatus comprising:
a facial expression information acquisition unit that acquires first facial expression information regarding the facial expression of a first participant among a plurality of participants and second facial expression information regarding the facial expression of a second participant among the plurality of participants;
a voice information acquisition unit that acquires first utterance information regarding an utterance of the first participant and second utterance information regarding an utterance of the second participant among the plurality of participants;
a facial expression relationship information generation unit that refers to the first facial expression information and the second facial expression information and generates facial expression relationship information indicating a relationship between the first participant and the second participant;
an utterance relationship information generation unit that refers to the first utterance information and the second utterance information and generates utterance relationship information indicating a relationship between the first participant and the second participant; and
a relationship information generation unit that refers to the facial expression relationship information and the utterance relationship information and generates relationship information, which is information indicating the relationship between the first participant and the second participant.

The information processing apparatus according to claim 1, wherein the relationship information is real-time or temporal information indicating the relationship between the first participant and the second participant.

The information processing apparatus according to claim 1 or 2, wherein
the first facial expression information includes a plurality of first indexes expressing the facial expression of the first participant,
the second facial expression information includes a plurality of second indexes expressing the facial expression of the second participant, and
the facial expression relationship information generation unit generates facial expression difference information regarding a difference between the first indexes and the second indexes, and includes the generated facial expression difference information in the facial expression relationship information.
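
As an illustration only, the facial expression difference information described in this claim could be sketched as a per-index subtraction. The claim does not specify how the indexes are represented or how the difference is calculated, so the dictionary format, the index names (`joy`, `surprise`, `anger`), and the function name `expression_difference` are all assumptions made for this sketch.

```python
from typing import Dict


def expression_difference(first: Dict[str, float],
                          second: Dict[str, float]) -> Dict[str, float]:
    """Return first minus second for every index present in both inputs.

    Assumption: each participant's facial expression information is a
    mapping from an index name to a score. The index names are hypothetical.
    """
    shared = sorted(first.keys() & second.keys())
    return {name: first[name] - second[name] for name in shared}


# Hypothetical scores for the first and second participants.
first_indexes = {"joy": 0.75, "surprise": 0.25, "anger": 0.0}
second_indexes = {"joy": 0.25, "surprise": 0.25, "anger": 0.5}

print(expression_difference(first_indexes, second_indexes))
# {'anger': -0.5, 'joy': 0.5, 'surprise': 0.0}
```

A signed difference keeps the direction of the gap between the two participants; an absolute difference would be an equally plausible choice if only the size of the gap matters.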

The information processing apparatus according to any one of claims 1 to 3, wherein
the first facial expression information includes first line-of-sight information regarding the line-of-sight direction of the first participant,
the second facial expression information includes second line-of-sight information regarding the line-of-sight direction of the second participant, and
the facial expression relationship information generation unit generates line-of-sight relationship information with reference to the first line-of-sight information and the second line-of-sight information, and includes the generated line-of-sight relationship information in the facial expression relationship information.

The information processing apparatus according to any one of claims 1 to 4, wherein the utterance relationship information generation unit
generates utterance time relationship information indicating a relationship between the utterance time of the first participant indicated by the first utterance information and the utterance time of the second participant indicated by the second utterance information, and includes the generated utterance time relationship information in the utterance relationship information.
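
One possible reading of the utterance time relationship information is a comparison of each participant's total speaking time. The sketch below assumes that utterance information is available as a list of (start, end) intervals in seconds; that format, and the names `total_utterance_time` and `utterance_time_relationship`, are hypothetical and are not taken from the claims.

```python
from typing import Dict, List, Optional, Tuple

# Assumed format: one (start_seconds, end_seconds) pair per utterance.
Interval = Tuple[float, float]


def total_utterance_time(intervals: List[Interval]) -> float:
    """Sum the durations of all utterance intervals."""
    return sum(end - start for start, end in intervals)


def utterance_time_relationship(
    first: List[Interval], second: List[Interval]
) -> Dict[str, Optional[float]]:
    """Compare the total speaking times of two participants.

    'first_share' is the first participant's fraction of the combined
    speaking time, or None when neither participant spoke.
    """
    t1 = total_utterance_time(first)
    t2 = total_utterance_time(second)
    combined = t1 + t2
    share = t1 / combined if combined > 0 else None
    return {"first_seconds": t1, "second_seconds": t2, "first_share": share}


# Hypothetical utterance intervals for the two participants.
first_utterances = [(0.0, 30.0), (50.0, 60.0)]
second_utterances = [(30.0, 50.0)]

result = utterance_time_relationship(first_utterances, second_utterances)
print(result["first_seconds"], result["second_seconds"])  # 40.0 20.0
```

Returning `None` for the share when nobody has spoken avoids a division by zero and leaves the choice of a default to the caller.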

The information processing apparatus according to any one of claims 1 to 5, wherein the utterance relationship information generation unit
determines whether at least one of the first utterance information and the second utterance information includes utterance content belonging to a specific category, and includes information according to the determination result in the utterance relationship information.

The information processing apparatus according to any one of claims 1 to 6, wherein the utterance relationship information generation unit
extracts, from at least one of the first utterance information and the second utterance information, words having a relatively high frequency of appearance within a predetermined time, and includes the extracted words in the utterance relationship information.
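
The word extraction in this claim can be illustrated with a simple frequency count over a time window. The sketch assumes that utterances have already been transcribed into timestamped text, uses a naive regular-expression tokenizer suited only to English-like text (Japanese speech would need a morphological analyzer), and interprets "relatively high frequency" as the top `top_k` words. All of these choices, and the name `frequent_words`, are assumptions.

```python
import re
from collections import Counter
from typing import List, Tuple


def frequent_words(utterances: List[Tuple[float, str]],
                   window_start: float,
                   window_end: float,
                   top_k: int = 3) -> List[str]:
    """Return the top_k most frequent words spoken in [window_start, window_end).

    Assumption: each utterance is a (timestamp_seconds, transcript) pair.
    No stop-word filtering is applied in this sketch.
    """
    counts: Counter = Counter()
    for timestamp, text in utterances:
        if window_start <= timestamp < window_end:
            counts.update(re.findall(r"[a-z']+", text.lower()))
    return [word for word, _ in counts.most_common(top_k)]


# Hypothetical transcripts; the last one falls outside the window.
transcripts = [
    (5.0, "budget budget schedule"),
    (20.0, "Budget schedule"),
    (100.0, "holiday holiday holiday holiday"),
]

print(frequent_words(transcripts, 0.0, 60.0, top_k=2))
# ['budget', 'schedule']
```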

The information processing apparatus according to any one of claims 1 to 7, wherein the relationship information generation unit refers to the relationship information and generates presentation information to be presented to at least one of the first participant and the second participant.

The information processing apparatus according to claim 8, wherein the presentation information includes
information indicating the ratio between the utterance time of the first participant and the utterance time of the second participant,
and
information on a change over time in a matching rate between the line-of-sight direction of the first participant and the line-of-sight direction of the second participant.
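
The change over time in the line-of-sight matching rate could, for example, be computed per fixed-size window of samples. The sketch below assumes that each participant's line-of-sight direction has been reduced to one discrete label per sample, taken at the same instants for both participants. The labels (`screen`, `partner`, `away`), the window size, and the name `gaze_match_rate_over_time` are hypothetical; the claim does not state how the directions are encoded or compared.

```python
from typing import List, Sequence


def gaze_match_rate_over_time(first: Sequence[str],
                              second: Sequence[str],
                              window: int) -> List[float]:
    """Return the fraction of matching gaze labels in each consecutive window.

    Assumption: first[i] and second[i] were sampled at the same instant.
    The final window may contain fewer than `window` samples.
    """
    if len(first) != len(second):
        raise ValueError("both sequences must have the same number of samples")
    if window <= 0:
        raise ValueError("window must be a positive number of samples")
    rates = []
    for start in range(0, len(first), window):
        a = first[start:start + window]
        b = second[start:start + window]
        matches = sum(x == y for x, y in zip(a, b))
        rates.append(matches / len(a))
    return rates


# Hypothetical gaze labels, one per sample, for the two participants.
first_gaze = ["screen", "screen", "partner", "partner",
              "away", "screen", "screen", "screen"]
second_gaze = ["screen", "partner", "partner", "partner",
               "screen", "screen", "away", "screen"]

print(gaze_match_rate_over_time(first_gaze, second_gaze, window=4))
# [0.75, 0.5]
```

Each element of the returned list corresponds to one window, so plotting the list against the window index gives a simple view of how the matching rate changes during the conversation.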

The information processing apparatus according to any one of claims 1 to 9, wherein the facial expression relationship information generation unit and the utterance relationship information generation unit generate the facial expression relationship information and the utterance relationship information, respectively, by further referring to participant information indicating attributes of the first and second participants.

An information processing method performed by a computer under the control of software, in which the computer performs:
a facial expression information acquisition step of acquiring first facial expression information regarding a facial expression of a first participant among a plurality of participants and second facial expression information regarding a facial expression of a second participant among the plurality of participants;
a voice information acquisition step of acquiring first utterance information regarding an utterance of the first participant and second utterance information regarding an utterance of the second participant;
a facial expression relationship information generation step of generating, with reference to the first facial expression information and the second facial expression information, facial expression relationship information indicating a relationship between the first participant and the second participant;
an utterance relationship information generation step of generating, with reference to the first utterance information and the second utterance information, utterance relationship information indicating a relationship between the first participant and the second participant; and
a relationship information generation step of generating, with reference to the facial expression relationship information and the utterance relationship information, relationship information that is real-time or temporal information indicating the relationship between the first participant and the second participant.

An information processing program for operating a computer as the information processing apparatus according to any one of claims 1 to 10, the program causing the computer to function as the facial expression information acquisition unit, the voice information acquisition unit, the facial expression relationship information generation unit, the utterance relationship information generation unit, and the relationship information generation unit.