JP6901190B1

JP6901190B1 - Remote dialogue system, remote dialogue method and remote dialogue program

Info

Publication number: JP6901190B1
Application number: JP2021029659A
Authority: JP
Inventors: 宏哉籾倉; 蔵人香月
Original assignee: PocketRD Inc
Current assignee: PocketRD Inc
Priority date: 2021-02-26
Filing date: 2021-02-26
Publication date: 2021-07-14
Anticipated expiration: 2041-02-26
Also published as: JP2022130967A

Abstract

PROBLEM TO BE SOLVED: To provide a technique for easily confirming the identity of participants during a remote dialogue such as a web conference. SOLUTION: A management server 2 includes: an information input unit 13 that receives participation information output from a participating terminal 1; a participant video generation unit 14 that generates a video of each participant according to the participation information; a dialogue video generation unit 15 that generates a dialogue video in which the participant videos and the content of remarks are displayed in a virtual dialogue space set as the stage of the remote dialogue; a behavior information database 16 that records past behavior information of each participant; a behavior information comparison unit 19 that compares the past behavior information with the current behavior; an identity determination unit 20 that determines the identity of a participant based on the comparison result of the behavior information comparison unit 19; a warning information generation unit 21 that generates warning information as needed based on the determination result of the identity determination unit 20; and an information output unit 22 that outputs the dialogue video and the warning information. [Selected drawing] FIG. 1

Description

The present invention relates to a technique for conducting a remote dialogue between participants who are not in the same place.

In recent years, techniques have been proposed for two-way communication among many people, such as management meetings at commercial enterprises and lectures at universities and other educational institutions. Instead of gathering the participants in a single venue, these techniques connect participants in separate places over a network and hold the meeting remotely as if the participants were in the same venue.

For example, Patent Document 1 discloses a technique for holding a remote conference in which actual participants and fictitious characters created by an information processing device enter and leave a web conference room formed in a virtual space. Unlike a case where the participants are merely connected by communication, this improves the sense of reality of the conference and reduces the discomfort felt by the participants.

Patent Document 2 discloses a technique that divides the image on the screen and corrects distortion in the participants' videos so that the videos of the participants in a remote conference are displayed appropriately. In meetings, lectures, and the like, not only the content of a speaker's remarks but also nonverbal communication, such as gestures made while speaking, is important information, as are the nonverbal reactions of the listeners. By displaying the participants' videos appropriately, Patent Document 2 discloses a technique with which the participants in a remote conference can share nonverbal information as well as verbal information such as the content of remarks, just as in a conference held in a single venue.

Patent Document 1: Japanese Unexamined Patent Publication No. 2020-194343
Patent Document 2: Japanese Unexamined Patent Publication No. 2020-160538

However, with the techniques of Patent Documents 1 and 2, it is difficult to determine whether a person taking part in a remote conference or the like is truly the specific person who is entitled to participate. In a remote conference, for example, an industrial spy posing as an officer may infiltrate a management meeting of a commercial enterprise, or someone other than an enrolled student may attend a university lecture while pretending to be that student in order to answer the roll call on the student's behalf (so-called proxy attendance). The techniques of Patent Documents 1 and 2 have difficulty properly excluding such fraud.

Moreover, in view of recent technological developments, information consisting only of video showing the face or whole body and of the spoken voice is becoming insufficient to tell whether a purported participant is the true participant or a third party posing as that participant. For example, person image synthesis and voice synthesis techniques based on deep learning, known as deepfakes, make it technically possible to create a moving image of a specific person. If such techniques are abused, it is very difficult to determine from the video and audio of a participant displayed on the remote conference screen whether the purported participant is the true participant.

Identification by entering an ID and a password is likewise of no use, not only when the information has leaked through hacking or the like, but also when a person who is legitimately entitled to participate colludes with a third party, as in the case of proxy attendance at a lecture.

The present invention has been made in view of the above problems, and its object is to provide a technique for easily confirming the identity of participants during a remote dialogue such as a web conference.

To achieve the above object, the remote dialogue system according to claim 1 is a remote dialogue system that realizes a dialogue between a plurality of participants located in different places, and comprises: reference behavior information generating means for detecting one or more characteristic behaviors based on information on the past behavior of a participant and generating reference behavior information, which is information on the frequency of occurrence of each characteristic behavior and on the duration of the characteristic behavior when it occurs; current behavior information generating means for generating, for the one or more characteristic behaviors of the participant during the current dialogue, current behavior information, which is information on the frequency of occurrence of each characteristic behavior and on the duration of the characteristic behavior when it occurs; behavior information comparison means for comparing the reference behavior information with the current behavior information; identity determination means for determining the identity of the participant based on the comparison result of the behavior information comparison means; and output means for outputting the determination result of the identity determination means to one or more participants other than the participant who was the subject of the determination by the identity determination means.
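As one non-limiting illustration of the reference behavior information described above, the following sketch reduces the observed occurrences of a single characteristic behavior to a frequency and a duration. The names `BehaviorStats` and `summarize_behavior`, the unit of frequency (occurrences per minute), and the use of the mean as the duration are assumptions made for this example; the source does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BehaviorStats:
    frequency: float  # occurrences per minute of observation (assumed unit)
    duration: float   # mean seconds per occurrence (assumed aggregation)


def summarize_behavior(intervals, observed_seconds):
    """Summarize one characteristic behavior from (start, end) times in seconds."""
    if observed_seconds <= 0:
        raise ValueError("observed_seconds must be positive")
    if not intervals:
        return BehaviorStats(frequency=0.0, duration=0.0)
    total = sum(end - start for start, end in intervals)
    return BehaviorStats(
        frequency=len(intervals) / (observed_seconds / 60.0),
        duration=total / len(intervals),
    )


# Hypothetical data: a "touches chin" gesture seen three times in a 10-minute recording.
reference = summarize_behavior([(12.0, 14.0), (95.0, 99.0), (400.0, 403.0)], 600.0)
```

The same function could be applied to past recordings to obtain reference behavior information and to the ongoing session to obtain current behavior information, so that both are expressed in the same terms before comparison.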

To achieve the above object, in the remote dialogue system according to claim 2, in the above invention, the reference behavior information generating means generates, as the reference behavior information, information including a reference occurrence frequency, which is numerical information on the frequency of occurrence, and a reference duration, which is numerical information on the duration; the current behavior information generating means generates, as the current behavior information, information including a current occurrence frequency, which is numerical information on the frequency of occurrence, and a current duration, which is numerical information on the duration; the behavior information comparison means derives, as the comparison result, for the same characteristic behavior, the value obtained by dividing the absolute value of the difference between the reference occurrence frequency and the current occurrence frequency by the reference occurrence frequency, and the value obtained by dividing the absolute value of the difference between the reference duration and the current duration by the reference duration; and the identity determination means determines that there is identity when the sum of the comparison results is less than a first threshold, that there is a possibility of identity when the sum is equal to or greater than the first threshold and less than a second threshold, and that there is no identity when the sum is equal to or greater than the second threshold.
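The comparison and determination of claim 2 can be sketched as follows. For each characteristic behavior, the relative deviations |reference - current| / reference of the occurrence frequency and of the duration are summed, and the sum is judged against two thresholds. The function names, the dictionary layout, the treatment of a behavior missing from the current dialogue as (0, 0), and the threshold values 0.6 and 1.5 are assumptions for illustration only.

```python
def relative_deviation(reference, current):
    """|reference - current| / reference, as recited for frequency and duration."""
    if reference <= 0:
        raise ValueError("reference value must be positive")
    return abs(reference - current) / reference


def comparison_score(reference, current):
    """Sum of relative deviations; inputs map behavior name -> (frequency, duration)."""
    score = 0.0
    for name, (ref_freq, ref_dur) in reference.items():
        # Assumption: a behavior not observed in the current dialogue counts as (0, 0).
        cur_freq, cur_dur = current.get(name, (0.0, 0.0))
        score += relative_deviation(ref_freq, cur_freq)
        score += relative_deviation(ref_dur, cur_dur)
    return score


def judge_identity(score, first_threshold, second_threshold):
    """Three-way determination using the first and second thresholds."""
    if not first_threshold < second_threshold:
        raise ValueError("first_threshold must be smaller than second_threshold")
    if score < first_threshold:
        return "identical"
    if score < second_threshold:
        return "possibly identical"
    return "not identical"


# Hypothetical values: frequency in occurrences per minute, duration in seconds.
reference = {"chin_touch": (0.3, 3.0), "nod": (2.0, 1.0)}
current = {"chin_touch": (0.24, 3.3), "nod": (2.2, 0.9)}
score = comparison_score(reference, current)
verdict = judge_identity(score, first_threshold=0.6, second_threshold=1.5)
```

With these hypothetical numbers the four relative deviations are 0.2, 0.1, 0.1 and 0.1, so the score is about 0.5 and the verdict is "identical". Because each term is normalized by its reference value, behaviors measured on different scales contribute comparably to the sum.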

To achieve the above object, the remote dialogue system according to claim 3, in the above invention, further comprises: display means for displaying, to a specific participant, a dialogue video that is a video of the plurality of participants taking part in the remote dialogue; photographing means for capturing video of the specific participant; attention target information generating means for generating, based on the positional relationship among the display means, the photographing means, and the specific participant, attention target information including information on which of the plurality of participants the specific participant is looking at when captured by the photographing means; and dialogue video generating means for generating the dialogue video so that the video of the specific participant displayed on the display means shows the specific participant looking toward the participant indicated by the attention target information.
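One conceivable way to derive the attention target information is sketched below: the gaze ray of the specific participant is intersected with the display plane, and the participant whose on-screen region contains the intersection point is taken as the attention target. The source does not describe this geometry. The coordinate convention (display plane at z = 0, eye position and gaze direction already converted from camera coordinates into display coordinates), the rectangular regions, and all names are assumptions.

```python
def gaze_point_on_display(eye, direction):
    """Intersect the gaze ray with the display plane z = 0.

    eye: (x, y, z) with z > 0 in front of the display (assumed convention).
    direction: (dx, dy, dz); the ray reaches the display only if dz < 0.
    """
    ex, ey, ez = eye
    dx, dy, dz = direction
    if dz >= 0:
        return None  # looking away from, or parallel to, the display
    t = -ez / dz
    return (ex + t * dx, ey + t * dy)


def attention_target(point, regions):
    """Return the id of the participant whose on-screen region contains the point."""
    if point is None:
        return None
    x, y = point
    for participant_id, (x0, y0, x1, y1) in regions.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            return participant_id
    return None


# Hypothetical layout in metres: participant A on the left, participant B on the right.
regions = {"A": (-0.30, -0.10, -0.05, 0.10), "B": (0.05, -0.10, 0.30, 0.10)}
point = gaze_point_on_display(eye=(0.0, 0.0, 0.6), direction=(0.25, 0.0, -1.0))
target = attention_target(point, regions)
```

In this example the gaze lands in the region assigned to participant B, so `target` is "B"; the dialogue video generating means could then orient the specific participant's video toward B.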

To achieve the above object, the remote dialogue method according to claim 4 is a remote dialogue method that realizes a dialogue between a plurality of participants located in different places, and includes: a reference behavior information generation step of detecting one or more characteristic behaviors based on information on the past behavior of a participant and generating reference behavior information, which is information on the frequency of occurrence of each characteristic behavior and on the duration of the characteristic behavior when it occurs; a current behavior information generation step of generating, for the one or more characteristic behaviors of the participant during the current dialogue, current behavior information, which is information on the frequency of occurrence of each characteristic behavior and on the duration of the characteristic behavior when it occurs; a behavior information comparison step of comparing the reference behavior information with the current behavior information; an identity determination step of determining the identity of the participant based on the comparison result of the behavior information comparison step; and an output step of outputting the determination result of the identity determination step to one or more participants other than the participant who was the subject of the determination in the identity determination step.

To achieve the above object, the remote dialogue method according to claim 5, in the above invention, further includes: a display step of displaying, to a specific participant, a dialogue video that is a video of the plurality of participants taking part in the remote dialogue; a photographing step of capturing video of the specific participant; an attention target information generation step of generating, based on the positional relationship among the display location in the display step, the photographing point in the photographing step, and the specific participant, attention target information including information on which of the plurality of participants the specific participant is looking at when captured in the photographing step; and a dialogue video generation step of generating the dialogue video so that the video of the specific participant displayed in the display step shows the specific participant looking toward the participant indicated by the attention target information.

To achieve the above object, the remote dialogue program according to claim 6 is a remote dialogue program that causes a computer to realize a dialogue between a plurality of participants located in different places, and causes the computer to realize: a reference behavior information generation function of detecting one or more characteristic behaviors based on information on the past behavior of a participant and generating reference behavior information, which is information on the frequency of occurrence of each characteristic behavior and on the duration of the characteristic behavior when it occurs; a current behavior information generation function of generating, for the one or more characteristic behaviors of the participant during the current dialogue, current behavior information, which is information on the frequency of occurrence of each characteristic behavior and on the duration of the characteristic behavior when it occurs; a behavior information comparison function of comparing the reference behavior information with the current behavior information; an identity determination function of determining the identity of the participant based on the comparison result of the behavior information comparison function; and an output function of outputting the determination result of the identity determination function to one or more participants other than the participant who was the subject of the determination by the identity determination function.

According to the present invention, the identity of participants can be easily confirmed during a remote dialogue such as a web conference.

FIG. 1 is a schematic diagram showing the configuration of the remote dialogue system according to Embodiment 1.
FIG. 2 is a flowchart for explaining the operation of the remote dialogue system according to Embodiment 1.
FIG. 3 is a schematic diagram showing the configuration of the remote dialogue system according to Embodiment 2.
FIG. 4 is a flowchart for explaining the operation of the remote dialogue system according to Embodiment 2.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. The following embodiments describe the examples considered most appropriate as embodiments of the present invention, and the content of the present invention should of course not be construed as limited to the specific examples shown in these embodiments. Any configuration that provides the same operation and effect falls within the technical scope of the present invention even if it differs from the specific configurations shown in the embodiments.

(Embodiment 1)

First, the remote dialogue system according to Embodiment 1 will be described. As shown in FIG. 1, the remote dialogue system according to Embodiment 1 includes: a participating terminal 1 that generates and outputs participation information, which includes information on the content of a participant's speech and behavior, and that displays a dialogue video of the participants; a management server 2 that generates a dialogue video based on the participation information output from the participating terminals 1 and outputs it to the participating terminals 1, and that determines, based on the participant's behavior information included in the participation information output from a participating terminal 1, whether the person to whom the participation information relates is the same person as the participant registered in advance; and an information communication network 3 that connects the participating terminals 1 and the management server 2.

A participating terminal 1 is provided for each participant in the remote dialogue; in Embodiment 1, as a rule, as many participating terminals 1 are used as there are participants. The participating terminal 1 includes: a photographing unit 4 that captures video of the participant; a speech recording unit 5 that records the content of the participant's remarks; a behavior information generation unit 6 that generates behavior information from the video captured by the photographing unit 4; a participation information generation unit 7 that generates participation information including the behavior information and the speech data recorded by the speech recording unit 5; an information output unit 8 that outputs the participation information to the management server 2; an information input unit 9 that receives the dialogue video transmitted from the management server 2; and a display unit 10 that displays the received dialogue video.

The photographing unit 4 acquires video of the participant while the remote dialogue is being held. In Embodiment 1, the photographing unit 4 is specifically a 3D scanner capable of acquiring a three-dimensional video of the participant's whole body.

The speech recording unit 5 records the content of the participant's remarks. Specifically, the speech recording unit 5 has a microphone function for voice input and a speech recognition function that converts the input voice into text. The configuration of the speech recording unit 5 need not be limited to this, however. For example, in the simplest configuration, the voice data input through the microphone function may be output to the management server 2 as it is. Alternatively, an input mechanism such as a keyboard or a pointing device may be provided instead of the microphone function so that text is entered directly as the content of the remarks.

The behavior information generation unit 6 generates the participant's behavior information based on the video of the participant acquired by the photographing unit 4. Specifically, the behavior information generation unit 6 has a function of generating information on how the positions of predetermined feature points in the participant's video change over time.

A feature point is a characteristic location on the outer surface or in the skeletal structure of the photographed subject. Feature points corresponding to the outer surface are specific locations on parts such as the eyes, nose, and mouth (the outer corner of the eye, the pupil, the tip of the nose, the corner of the mouth, and so on). Feature points corresponding to the skeletal structure are joints and the like, such as the neck, shoulders, elbows, wrists, fingertips, hips, knees, and ankles. The behavior information generation unit 6 has a function of deriving, for each feature point of the participant extracted from the video acquired by the photographing unit 4, its meaning (for example, that feature point A corresponds to the outer corner of the right eye) and its change in position over time, and of generating behavior information that includes these.
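A minimal sketch of how such behavior information might be held in memory is given below: each feature point carries a label stating its meaning and a time series of positions, from which the movement between consecutive samples can be derived. The class name `FeatureTrack`, the sampling format, and the use of Euclidean distance as the measure of position change are assumptions for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class FeatureTrack:
    label: str  # meaning of the feature point, e.g. "outer corner of right eye"
    samples: list = field(default_factory=list)  # (time in seconds, (x, y, z))

    def add(self, time_s, position):
        """Record the position of the feature point at the given time."""
        self.samples.append((time_s, position))

    def displacements(self):
        """(time, distance moved since the previous sample) for each later sample."""
        moved = []
        for (_, p0), (t1, p1) in zip(self.samples, self.samples[1:]):
            moved.append((t1, math.dist(p0, p1)))
        return moved


# Hypothetical samples for one skeletal feature point.
track = FeatureTrack(label="right wrist")
track.add(0.0, (0.0, 0.0, 0.0))
track.add(0.1, (3.0, 4.0, 0.0))
steps = track.displacements()
```

Here `steps` holds a single entry recording that the right wrist moved a distance of 5.0 by time 0.1.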

The participation information generation unit 7 generates information on the participant who takes part in the remote dialogue using the participating terminal 1. Specifically, the participation information generation unit 7 has a function of generating participation information, which is information including the speech information recorded by the speech recording unit 5 and the behavior information generated by the behavior information generation unit 6. The participation information may be generated with the speech information and the behavior information treated as separate, independent pieces of information. From the viewpoint of generating a more realistic dialogue video and determining behavior accurately, however, it is desirable to use a format in which the speech information and the behavior information are synchronized in time.
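As a hedged illustration of the time-synchronized format mentioned above, the following sketch merges time-stamped speech information and behavior information into a single timeline ordered by time. The function name `build_participation_info`, the tuple layouts, and the dictionary structure are assumptions; the source does not define a concrete format.

```python
import heapq


def build_participation_info(participant_id, speech, behavior):
    """Merge speech and behavior records into one time-ordered timeline.

    speech:   list of (time_s, text), sorted by time.
    behavior: list of (time_s, feature_label, position), sorted by time.
    """
    tagged_speech = ((t, "speech", text) for t, text in speech)
    tagged_behavior = ((t, "behavior", (label, pos)) for t, label, pos in behavior)
    # heapq.merge interleaves the two already-sorted streams by timestamp.
    timeline = list(heapq.merge(tagged_speech, tagged_behavior, key=lambda e: e[0]))
    return {"participant": participant_id, "timeline": timeline}


# Hypothetical records from the speech recording unit and the behavior information generation unit.
info = build_participation_info(
    "participant-1",
    speech=[(1.0, "Hello")],
    behavior=[(0.5, "right wrist", (0.1, 0.2, 0.3)), (1.5, "right wrist", (0.1, 0.4, 0.3))],
)
```

Keeping both kinds of records on one timeline lets the receiving side replay a gesture at the moment the corresponding remark was made.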

The information output unit 8 outputs the participation information generated by the participation information generation unit 7 to the management server 2. The specific configuration of the information output unit 8 depends on the form of the information communication network 3 and the connection protocol; the connection may be wired or wireless, and in the case of a wireless connection any format may be adopted, such as wireless LAN or Bluetooth (registered trademark).

The information input unit 9 receives the dialogue video generated by the management server 2. The specific configuration of the information input unit 9 depends on the form of the information communication network 3 and the connection protocol; the connection may be wired or wireless, and in the case of a wireless connection any format may be adopted, such as wireless LAN or Bluetooth (registered trademark).

The display unit 10 displays the dialogue video received from the management server 2. Each participant outputs his or her own remarks and behavior while viewing the displayed dialogue video, which is generated from the remarks and behavior of all the participants, and can thereby take part in the discussion and other exchanges in the remote dialogue.

Next, the management server 2 will be described. The management server 2 includes: an information input unit 13 that receives the participation information output from the participating terminals 1; a participant video generation unit 14 that generates a video of each participant according to the received participation information; a dialogue video generation unit 15 that generates a dialogue video in which the participant videos and the content of remarks are displayed in a virtual dialogue space set as the stage of the remote dialogue; a behavior information database 16 that records past behavior information of each participant; a behavior information comparison unit 19 that compares the past behavior information with the current behavior; an identity determination unit 20 that determines the identity of a participant based on the comparison result of the behavior information comparison unit 19; a warning information generation unit 21 that generates warning information as needed based on the determination result of the identity determination unit 20; and an information output unit 22 that outputs the dialogue video and the warning information.

The information input unit 13 receives the participation information generated by the participating terminals 1. The specific configuration of the information input unit 13 depends on the form of the information communication network 3 and the connection protocol; the connection may be wired or wireless, and in the case of a wireless connection any format may be adopted, such as wireless LAN or Bluetooth (registered trademark).

The participant video generation unit 14 generates the participant video to be displayed in the virtual dialogue space, based on the participation information received through the information input unit 13 and on character video information set in advance for each participant. The character video information is information on the appearance and other attributes of the character that represents the participant. It is structured as three-dimensional computer graphics data that includes the form appearing on the character's outer surface (the skin and the like), the internal skeletal structure (bones, joints, and the like), and information defining the relationship between the form and the skeletal structure (weights and the like). The character video information also records information on the feature points used for processing in the behavior information generation unit 6.

That is, the character video information records, as feature points, the positions on the participant's character that correspond to specific locations on the outer surface, such as the eyes, nose, and mouth (the outer corner of the eye, the pupil, the tip of the nose, the corner of the mouth, and so on), and to specific locations in the skeletal structure, such as the joints of the neck, shoulders, elbows, wrists, fingertips, hips, knees, and ankles. The participant's character may be a three-dimensional computer graphic modeled on the participant, or it may be an entirely different person, an animal, a character from an animated work, or the like. Preferably, however, it includes information on feature points that correspond to those used in the behavior information generation unit 6.

Using character video information configured in this way, the participant video generation unit 14 applies the position changes of the feature points contained in the behavior information input from the participating terminal 1 to the corresponding feature points in the character video information. This makes the character video perform the same behavior as that of the participant captured by the photographing unit 4.

The participant video generation unit 14 also synchronizes the content of a participant's remarks with the participant video, based on the remark information input from the participating terminal 1. The remark content to be synchronized may be in the form of voice data or text data. Voice data may carry the vocal characteristics of the participant, or may be a separate voice entirely different from the participant's own.

The dialogue video generation unit 15 generates a dialogue video, that is, a virtual video in which the participants appear to have gathered in the same space to hold a dialogue. Specifically, the dialogue video generation unit 15 places the participant videos generated by the participant video generation unit 14 in a virtual dialogue space prepared in advance, producing a dialogue video in which the participants appear to be meeting and discussing face to face in that space. The dialogue video may be identical for all participating terminals 1, or may differ from one participating terminal 1 to another. The basic video may be kept the same while the display angle is changed by an operation on the participating terminal 1, or switched to an angle matching the field of view of the participant (more precisely, of the participant's character video placed in the virtual dialogue space). Conversely, the virtual dialogue space may be kept simple and the participant videos merely displayed side by side.

The behavior information database 16 records the past behavior of the participants. As past behavior, it may store the behavior video captured when a participant took part in earlier dialogues as is, or it may store the position changes of the feature points derived from that video.

The reference behavior information generation unit 17 generates reference behavior information from the information on the participants' past behavior recorded in the behavior information database 16. Specifically, based on the information on past behavior, the reference behavior information generation unit 17 detects one or more characteristic behaviors of a participant (identified by classifying the position changes of the feature points into types). For each detected characteristic behavior it generates reference behavior information that includes the form of the behavior (the position changes of the corresponding feature points) together with a reference occurrence frequency and a reference duration.

Characteristic behaviors include any behavior that can be identified from position changes of feature points. Examples are: raising the index finger of the left hand and touching its tip to the tip of the nose (identifiable from position changes of the feature points corresponding to the left index finger, left wrist, left elbow, etc.); placing both elbows on a desk and joining the palms in front of the face (identifiable from position changes of the feature points corresponding to the elbows and the parts of the palms of both arms); and changes of facial expression such as shutting both eyes tightly while the corners of the mouth rise (identifiable from position changes of the feature points corresponding to the parts of the face). To improve the accuracy of the identity determination, it is desirable to store multiple types of characteristic behavior for each participant. On the other hand, a "characteristic behavior" need not be a unique behavior that no other participant shows. Even for similar behaviors, the occurrence frequency, duration, and so on usually differ between people, and when the identity determination is based on multiple characteristic behaviors it is rare for every behavior to match. Highly accurate identity determination is therefore possible even when the behaviors are not unique.

The reference occurrence frequency and the reference duration may be simple averages of the occurrence frequencies and durations recorded in the behavior information database 16 for past dialogues. Preferably, however, they are weighted averages in which values from dialogues held closer to the present are given greater importance.
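The recency weighting described above can be sketched as follows. The specification does not state how the weights are chosen, so the geometric decay factor `decay` and the function name `recency_weighted_mean` are assumptions made purely for illustration.

```python
def recency_weighted_mean(values, decay=0.8):
    """Weighted average of per-session values, ordered oldest to newest.

    The newest session gets weight 1, the one before it `decay`,
    the one before that `decay ** 2`, and so on (an assumed scheme).
    With decay=1.0 this reduces to the simple average.
    """
    if not values:
        raise ValueError("at least one past session is required")
    n = len(values)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Occurrence frequency (times per hour) of one behavior in three past sessions.
reference_frequency = recency_weighted_mean([6.0, 5.0, 4.0], decay=0.5)
```

The same function could be applied to the per-session durations to obtain the reference duration.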

The current behavior information generation unit 18 generates current behavior information based on the behavior information currently being input from the participating terminal 1. Specifically, it checks the current behavior information for the same characteristic behaviors as those handled by the reference behavior information generation unit 17. For each characteristic behavior detected, it generates current behavior information that includes the form of the behavior (the position changes of the corresponding feature points) together with information on its occurrence frequency and duration. In Embodiment 1, the current behavior information includes a current occurrence frequency, which is a numerical value for the occurrence frequency, and a current duration, which is a numerical value for the duration.

When judging whether a characteristic behavior has occurred at the present time, the current behavior information generation unit 18 treats the behavior as having occurred not only when the position changes of the corresponding group of feature points match exactly those detected by the reference behavior information generation unit 17, but also when the sum of the differences in position change over the corresponding feature points is less than a predetermined threshold. In other words, some deviation is tolerated: for the behavior of joining the palms in front of the face, for example, the same characteristic behavior is judged to have occurred even if the height of the palms differs somewhat from that in the past.
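A minimal sketch of this tolerance test is given below. It assumes each position change is represented as a 3D displacement vector per feature point and that the "difference" is the Euclidean distance between the reference and observed vectors; the specification fixes neither choice, and the names `behavior_expressed`, `reference`, and `observed` are hypothetical.

```python
import math


def behavior_expressed(reference, observed, threshold):
    """Return True if the observed motion counts as the reference behavior.

    reference, observed: dict mapping feature point name to a displacement
    vector (dx, dy, dz). The per-point differences are summed and compared
    against the threshold with a strict "less than", as in the text.
    """
    total = 0.0
    for name, ref_vec in reference.items():
        total += math.dist(ref_vec, observed[name])
    return total < threshold


# Palms joined in front of the face, slightly higher/lower than in the past.
reference = {"left_palm": (0.0, 10.0, 0.0), "right_palm": (0.0, 10.0, 0.0)}
observed = {"left_palm": (0.0, 11.0, 0.0), "right_palm": (0.0, 9.0, 0.0)}
matched = behavior_expressed(reference, observed, threshold=3.0)
```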

The behavior information comparison unit 19 compares the reference behavior information with the current behavior information. Specifically, for the occurrence frequency and duration of each characteristic behavior, it first calculates the absolute value of the difference between the reference occurrence frequency and the current occurrence frequency divided by the reference occurrence frequency, and likewise the absolute value of the difference between the reference duration and the current duration divided by the reference duration. It then adds up the calculated values over all characteristic behaviors (more preferably, after multiplying each value by a coefficient reflecting its importance). The resulting sum is output to the identity determination unit 20 as the comparison result.
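The comparison can be illustrated with the short sketch below. The data layout (a dict from behavior name to a `(frequency, duration)` pair) and the name `comparison_score` are assumptions; for brevity a single importance coefficient is applied per behavior, whereas the text allows a coefficient per individual value. A behavior that does not appear at all in the current dialogue is treated here as having frequency and duration zero, which is also an assumption.

```python
def comparison_score(reference, current, weights=None):
    """Sum of relative deviations in frequency and duration.

    reference, current: dict mapping behavior name to (frequency, duration).
    Reference values are assumed to be positive, since a reference behavior
    has by definition been observed in the past.
    """
    score = 0.0
    for name, (ref_freq, ref_dur) in reference.items():
        cur_freq, cur_dur = current.get(name, (0.0, 0.0))
        w = 1.0 if weights is None else weights.get(name, 1.0)
        score += w * abs(ref_freq - cur_freq) / ref_freq
        score += w * abs(ref_dur - cur_dur) / ref_dur
    return score


reference = {"finger_to_nose": (4.0, 2.0), "palms_together": (10.0, 5.0)}
current = {"finger_to_nose": (3.0, 2.5), "palms_together": (10.0, 5.0)}
score = comparison_score(reference, current)
```

A score of zero means every behavior appeared with exactly the reference frequency and duration; larger scores indicate larger deviation.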

The identity determination unit 20 makes a determination on the identity of a participant (whether the person currently participating is the same as the "participant" registered in advance) according to the sum obtained by the behavior information comparison unit 19. In Embodiment 1, the result is "identical" when the sum is less than a first threshold, "possibly not identical" when the sum is equal to or greater than the first threshold and less than a second threshold, and "not identical" when the sum is equal to or greater than the second threshold.
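The three-way determination maps directly onto two comparisons, as sketched below. The function name, the verdict strings, and the numerical threshold values are illustrative assumptions; the specification gives no concrete thresholds.

```python
def judge_identity(score, first_threshold, second_threshold):
    """Classify a comparison score into one of three verdicts.

    score < first_threshold                     -> "identical"
    first_threshold <= score < second_threshold -> "possibly_not_identical"
    score >= second_threshold                   -> "not_identical"
    """
    if first_threshold > second_threshold:
        raise ValueError("second threshold must not be below the first")
    if score < first_threshold:
        return "identical"
    if score < second_threshold:
        return "possibly_not_identical"
    return "not_identical"


verdict = judge_identity(0.8, first_threshold=0.5, second_threshold=1.5)
```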

The warning information generation unit 21 generates warning information on the identity of a participant according to the determination result of the identity determination unit 20. Specifically, when the identity determination unit 20 determines "identical", no warning information is generated. When it determines "possibly not identical", warning information stating "Confirmation of identity is recommended." is generated, addressed to the participants other than the one being judged. When it determines "not identical", warning information stating "Disconnecting this participant is recommended." is generated, addressed to the person responsible for operating the remote dialogue.
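One way to picture the addressing rule is the sketch below, which returns a warning together with its recipients. The dict layout, the verdict strings, and the identifiers in the usage lines are hypothetical; only the message wording and the choice of recipients follow the text.

```python
def build_warning(verdict, target, participants, operator):
    """Return None or a dict with the recipients and the warning message."""
    if verdict == "identical":
        return None  # no warning is generated
    if verdict == "possibly_not_identical":
        return {
            "recipients": [p for p in participants if p != target],
            "message": "Confirmation of identity is recommended.",
        }
    if verdict == "not_identical":
        return {
            "recipients": [operator],
            "message": "Disconnecting this participant is recommended.",
        }
    raise ValueError(f"unknown verdict: {verdict}")


warning = build_warning(
    "possibly_not_identical", "alice", ["alice", "bob", "carol"], "admin"
)
```

Carrying the recipients inside the warning itself is consistent with the description of the information output unit 22, which outputs a warning only to the terminals designated as its destination.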

The information output unit 22 outputs the dialogue video generated by the dialogue video generation unit 15 and the warning information generated by the warning information generation unit 21 to the participating terminals 1. Because the warning information itself specifies its destination, it is output only to the participating terminals 1 designated as the destination. The specific configuration of the information output unit 22 depends on the form of the information communication network 3 and on the connection protocol. The connection may be wired or wireless, and for a wireless connection any format, such as wireless LAN or Bluetooth (registered trademark), may be adopted.

Next, the warning information generation processing for a specific participant in the remote dialogue system according to Embodiment 1 will be described with reference to FIG. 2. First, the reference behavior information generation unit 17 detects the presence of a specific characteristic behavior in the past behavior information and calculates the reference occurrence frequency, which is the frequency with which the behavior occurred, and the reference duration, which is how long it lasted when it occurred (step S101). The current behavior information generation unit 18 then calculates, for the same characteristic behavior as in step S101, the current occurrence frequency and the current duration, that is, the occurrence frequency and duration in the dialogue currently being held (step S102). Specifically, it analyzes the behavior information input from the participating terminal 1. When the position changes of the participant's feature points are similar to those of the specific characteristic behavior detected by the reference behavior information generation unit 17, it judges that the same characteristic behavior has occurred and calculates the occurrence frequency and the duration at the time of occurrence.

The behavior information comparison unit 19 then calculates the absolute value of the difference between the reference occurrence frequency and the current occurrence frequency and divides the result by the reference occurrence frequency (step S103). Similarly, it calculates the absolute value of the difference between the reference duration and the current duration and divides the result by the reference duration (step S104). If processing has not been completed for all characteristic behaviors (step S105, No), the process returns to step S102 and the same processing is performed for another characteristic behavior. If processing has been completed for all characteristic behaviors (step S105, Yes), the calculated values are added together and the sum is output (step S106).

Identity determination processing is then performed by the identity determination unit 20. First, the identity determination unit 20 determines whether the sum obtained in step S106 is less than the first threshold (step S107). If it is (step S107, Yes), the characteristic behaviors are regarded as having appeared with about the same frequency and duration as in the past, the identity of the participant is affirmed, and the processing ends without any warning information being generated.

If the sum obtained in step S106 is equal to or greater than the first threshold (step S107, No), it is further determined whether the sum is less than the second threshold, which is a value equal to or greater than the first threshold (step S108). If the sum is less than the second threshold (step S108, Yes), information to that effect is output to the warning information generation unit 21, which generates warning information stating "Confirmation of identity is recommended." addressed to the participants other than the specific participant being judged (step S109).

If the sum is equal to or greater than the second threshold (step S108, No), information to that effect is output to the warning information generation unit 21, which generates warning information stating "The participant is highly likely not to be the same person; cutting off the connection is recommended." addressed to the person responsible for operating the remote dialogue (step S110). The warning information generated in step S109 or S110 is output to its addressees through the information output unit 22, and all processing ends.

Next, the advantages of the remote dialogue system according to Embodiment 1 will be described. First, the remote dialogue system according to Embodiment 1 uses behavior information to confirm the identity of participants, which enables highly reliable identity confirmation.

People often perform seemingly meaningless behaviors while active, such as putting the right hand to the forehead at a casual moment, or raising the left index finger and wagging it from side to side while speaking. It is normally impossible for the form, frequency, and duration of such behaviors all to match those of another person exactly. Therefore, by recording behavior information in advance for those scheduled to take part in a dialogue (for example, for a dialogue held over multiple sessions, recording the behavior information from the sessions immediately after the start) and comparing it with the current behavior, it is possible to determine whether the person currently taking part under the name of a specific participant really is that participant.

Conventional methods of verifying identity in remotely used services include entering an ID and password and using face images or voice data. However, ID and password entry does not function as identity verification when the information has leaked through hacking or the like, or when the participant is colluding with a third party and has handed over the ID and password. Face images are also easy to forge with current technology. For example, by estimating a three-dimensional shape from a single two-dimensional image and extracting feature points, it is possible to create a video whose facial expression changes to match the spoken content. In such cases, identity verification by conventional face recognition technology is meaningless. The same applies to voice data. With current technology, about 20 seconds of a person uttering keywords is enough to create voice data that sounds as if that person had spoken. In practice, sufficient identity verification is difficult even with voice recognition technology.

In contrast, Embodiment 1 confirms the identity of participants using behavior information. It is difficult for a third party to obtain in advance behavior information consisting of specific forms of behavior, occurrence frequencies, and durations (whereas a face image can be synthesized from a single photograph and voice data from about 20 seconds of audio). Moreover, even if the behavior information could be obtained, it would not be technically easy to work the specific behaviors into a conversation unfolding in real time with the frequency and duration the behavior information prescribes. It is therefore almost impossible for a third party who joins the dialogue posing as a participant to fake the behavior information. The remote dialogue system according to Embodiment 1 thus has the advantage of determining the identity of participants efficiently and accurately.

The remote dialogue system according to Embodiment 1 also has the advantage that generating behavior information for multiple characteristic behaviors and using it as the basis for the identity determination makes the determination even more accurate. With a single characteristic behavior, for instance when that behavior is a commonplace one, the differences in occurrence frequency and duration may happen to be small, and there is some risk of wrongly determining identity even though the person is actually someone else. Because Embodiment 1 generates and uses behavior information for multiple characteristic behaviors, it reduces the likelihood of such misjudgment and achieves a more accurate identity determination.

Furthermore, in addition to the two determinations "identical" and "not identical", the remote dialogue system according to Embodiment 1 also makes the determination that identity is in doubt, in which case it notifies the participants other than the one concerned with warning information stating "Confirmation of identity is recommended." In a management meeting handling confidential information, for example, it is desirable to refrain from disclosing confidential information as a precaution even when a lack of identity cannot be asserted. By issuing warning information for doubtful cases as well, Embodiment 1 has the advantage of alerting each participant to matters such as safeguarding information. On receiving such a warning, each participant can also verify identity by improvised means, for example by raising a fact only the real participant could know and observing the reaction of the "participant" whose identity is in doubt.

As a further operational advantage, the remote dialogue system according to Embodiment 1 can confirm the identity of participants without using personal information. Information such as IDs, passwords, faces, and voices is either protected as personal information or, even where it does not qualify as personal information, requires careful handling from a privacy standpoint, which makes it cumbersome to manage. Behavior information, by contrast, is merely information about a participant's "behavior" and is not subject to privacy protection, so it can be managed under an ordinary information management regime, which is an operational advantage.

(Embodiment 2)

Next, the remote dialogue system according to Embodiment 2 will be described. In Embodiment 2, components given the same name and reference numeral as in Embodiment 1 perform the same functions as the corresponding components in Embodiment 1 unless otherwise noted.

As shown in FIG. 3, the remote dialogue system according to Embodiment 2 comprises a participating terminal 23 and a management server 25. In addition to the components of Embodiment 1, the participating terminal 23 newly includes an attention target information generation unit 24 that generates attention target information about the target a participant is paying attention to. In addition to the components of Embodiment 1, the management server 25 newly includes a participant video generation unit 26 that displays the participants' behavior with emphasis and a dialogue video generation unit 27 that generates a dialogue video consistent with the attention target information.

The attention target information generation unit 24 generates attention target information indicating which of the other participants a participant in the dialogue is paying attention to. Specifically, the attention target information generation unit 24 ascertains in advance the relative positional relationship among each item displayed on the display unit 10, the photographing unit 4, and the participant being photographed. From the video captured by the photographing unit 4, it then generates attention target information that includes line-of-sight direction information, indicating which of the other participants displayed on the display unit 10 the participant is looking at, and arm direction information, indicating which of them each of the participant's arms is pointed toward. The attention target information generated by the attention target information generation unit 24 is output to the management server 25 via the information output unit 8.

Like the participant video generation unit 14 in Embodiment 1, the participant video generation unit 26 in principle generates the participant video by applying the position changes of the feature points contained in the behavior information output from the participating terminal 23 to the feature points in the character video information. In addition, for the feature points corresponding to both arms and the face, the participant video generation unit 26 amplifies the amount of position change per unit time indicated in the behavior information by a predetermined ratio (for example, 10%), converts it into the corresponding position information, and applies that to the feature points in the character video information.
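The amplification can be sketched as scaling each feature point's displacement over one unit of time. The representation of positions as 3D tuples, the feature point names, and the choice to measure the displacement from the previously observed position are all assumptions made for illustration; `gain=1.10` corresponds to the 10% example in the text.

```python
def amplify_motion(prev_positions, curr_positions, amplified, gain=1.10):
    """Return displayed positions with motion exaggerated for selected points.

    prev_positions, curr_positions: dict mapping feature point name to
    (x, y, z) at the previous and current time step.
    amplified: set of feature point names (arms and face) to exaggerate.
    Other points keep their observed position (gain of 1).
    """
    result = {}
    for name, curr in curr_positions.items():
        prev = prev_positions[name]
        g = gain if name in amplified else 1.0
        result[name] = tuple(p + g * (c - p) for p, c in zip(prev, curr))
    return result


prev = {"left_wrist": (0.0, 0.0, 0.0), "hip": (0.0, 0.0, 0.0)}
curr = {"left_wrist": (10.0, 0.0, 0.0), "hip": (10.0, 0.0, 0.0)}
shown = amplify_motion(prev, curr, amplified={"left_wrist"})
```

In this example the wrist is drawn about 11 units from its previous position while the hip moves the observed 10 units.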

Like the dialogue video generation unit 15 in Embodiment 1, the dialogue video generation unit 27 generates a dialogue video in which the participant videos are placed in a virtual dialogue space prepared in advance. In addition, the dialogue video generation unit 27 generates the dialogue video after changing the direction of the line of sight and the direction of both arms in the participant video so that they are consistent with the attention target information output from the participating terminal 1. Specifically, the dialogue video generation unit 27 rotates the entire participant video (for example, about the bone corresponding to the spine as the axis of rotation) so that the line of sight of the participant video in the dialogue video points toward the position of the target person specified by the line-of-sight direction information included in the attention target information. The dialogue video generation unit 27 also changes the positions of both arms so that the arms of the participant video point toward the position of the target person specified by the arm direction information included in the attention target information.
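Assuming the spine bone is vertical, the rotation described above reduces to a yaw angle on the floor plane of the virtual dialogue space. The sketch below computes the signed angle by which a character would need to turn to face a target; the 2D coordinates, the angle convention (degrees, counterclockwise from the +x axis), and the name `yaw_to_face` are illustrative assumptions rather than details from the specification.

```python
import math


def yaw_to_face(position, facing_deg, target):
    """Signed rotation in degrees, in [-180, 180), that turns the character
    at `position` with heading `facing_deg` to look at `target`.

    position, target: (x, z) coordinates on the floor plane.
    A result of 0 means the character already faces the target.
    """
    desired = math.degrees(
        math.atan2(target[1] - position[1], target[0] - position[0])
    )
    return (desired - facing_deg + 180.0) % 360.0 - 180.0


# Character at the origin facing +x; the target stands at (0, 5).
turn = yaw_to_face((0.0, 0.0), 0.0, (0.0, 5.0))
```

Wrapping the result into a half-open 360 degree range keeps the turn as short as possible, so a character never spins the long way round.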

Next, the dialogue video generation process performed by the management server 25 of the remote dialogue system according to the second embodiment will be described with reference to FIG. 4. First, for the feature points corresponding to the face and both arms, the participant video generation unit 26 multiplies the positional change per unit time derived from the behavior information by a predetermined coefficient greater than 1, and calculates the position of each feature point so that it corresponds to the resulting positional change (step S201). The participant video generation unit 26 then generates the participant video by moving the feature points in the character video information to the positions determined in step S201 (step S202).
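The amplification of steps S201 and S202 can be illustrated with a minimal Python sketch. It is not the implementation described in the specification: the feature-point names, the `amplify_positions` function, and the choice to measure displacement from the previous measured frame are all assumptions made for illustration. Only the idea of scaling the per-unit-time displacement of the face and arm feature points by a coefficient greater than 1 comes from the text.

```python
from typing import Dict, Tuple

Point = Tuple[float, float, float]

# Hypothetical feature-point names; the text only says "face and both arms".
AMPLIFIED_POINTS = {"face", "left_arm", "right_arm"}


def amplify_positions(
    prev: Dict[str, Point],
    curr: Dict[str, Point],
    gain: float = 1.1,  # 1.1 corresponds to the "for example, 10%" amplification
) -> Dict[str, Point]:
    """Scale the displacement (curr - prev) of selected feature points by gain."""
    if gain <= 1.0:
        raise ValueError("the coefficient is expected to be greater than 1")
    out: Dict[str, Point] = {}
    for name, p in curr.items():
        if name in AMPLIFIED_POINTS:
            q = prev[name]
            # Step S201: new position = previous position + gain * displacement.
            out[name] = (
                q[0] + gain * (p[0] - q[0]),
                q[1] + gain * (p[1] - q[1]),
                q[2] + gain * (p[2] - q[2]),
            )
        else:
            # Other feature points follow the behavior information unchanged.
            out[name] = p
    return out
```

The returned dictionary would then be applied to the feature points of the character model (step S202). A real system would also have to decide whether displacement is measured from the previous measured position or the previous displayed position; this sketch assumes the former.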

The dialogue video generation unit 27 then places the generated participant video in the virtual dialogue space prepared in advance (step S203). When steps S201 to S203 have been completed for all participants (step S204, Yes), the process proceeds to step S205.

Next, the dialogue video generation unit 27 determines whether the line-of-sight direction of the placed participant video matches the line-of-sight direction information generated by the attention target information generation unit 24 (step S205). If they match (step S205, Yes), the process proceeds to step S207. If they do not match (step S205, No), the participant video is rotated by a predetermined angle about the bone corresponding to its spine so that its line-of-sight direction matches the direction indicated by the line-of-sight direction information (step S206).
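The check and rotation of steps S205 and S206 can be sketched as follows. This is only an illustration under stated assumptions: the spine bone is taken to be vertical, so the rotation reduces to a yaw on the floor plane of the virtual dialogue space; the line-of-sight direction information is assumed to resolve to a target position; and the function names and the tolerance `tol` are hypothetical.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]  # (x, z) coordinates on the floor plane (assumed)


def yaw_toward(avatar_pos: Vec2, gaze_dir: Vec2, target_pos: Vec2) -> float:
    """Signed angle in radians that turns gaze_dir toward target_pos."""
    desired = math.atan2(target_pos[1] - avatar_pos[1], target_pos[0] - avatar_pos[0])
    current = math.atan2(gaze_dir[1], gaze_dir[0])
    delta = desired - current
    # Wrap the difference into [-pi, pi] so the shorter turn is chosen.
    return math.atan2(math.sin(delta), math.cos(delta))


def rotate(v: Vec2, angle: float) -> Vec2:
    """Rotate a floor-plane vector about the vertical (spine) axis."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def align_gaze(avatar_pos: Vec2, gaze_dir: Vec2, target_pos: Vec2, tol: float = 1e-6) -> Vec2:
    """Return the gaze direction after steps S205 and S206."""
    angle = yaw_toward(avatar_pos, gaze_dir, target_pos)
    if abs(angle) <= tol:  # S205, Yes: already looking at the target
        return gaze_dir
    return rotate(gaze_dir, angle)  # S206: rotate about the spine axis
```

In a full avatar pipeline the same angle would be applied to every feature point of the participant video, not only to the gaze vector, since the text rotates the entire participant video.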

The dialogue video generation unit 27 then determines whether the directions of both arms of the placed participant video match the arm direction information generated by the attention target information generation unit 24 (step S207). If they do not match (step S207, No), it changes the position of each arm so that they match (step S208). When the position change is finished, or when the directions matched in the first place (step S207, Yes), it determines whether all participants have been processed (step S209). If not (step S209, No), the process returns to step S205 and the same processing is repeated. If so (step S209, Yes), the dialogue video generation process ends, and the dialogue video is output to the participating terminal 23 of each participant via the information output unit 22.

Next, the advantages of the remote dialogue system according to the second embodiment will be described. First, the system converts the position information of the feature points of the face and both arms so that the resulting positions reflect a positional change amplified (for example, by about 10%) relative to that indicated in the behavior information, which makes it possible to generate a participant video that emphasizes the participant's behavior. With this configuration, other participants viewing the remote dialogue video on their participating terminals can easily recognize that participant's behavior.

In a remote dialogue in which the dialogue video is viewed on a screen, it is difficult to perceive the behavior of other participants as vividly as when people gather in person, and this can hinder nonverbal communication based on a speaker's behavior. In the second embodiment, emphasizing the participants' behavior allows each participant to understand the behavior of others to the same degree as in an in-person gathering. This facilitates not only verbal communication through speech but also nonverbal communication through behavior, and thereby enables lively discussion.

In addition, the remote dialogue system according to the second embodiment generates attention target information indicating a participant's attention to other participants, and adjusts the line of sight and the direction of both arms of the participant video displayed in the dialogue video so that they match this attention target information. The video of a participant captured by the photographing unit is merely taken from the viewpoint of the photographing unit, and from the video alone it is difficult to understand what the participant is actually looking at while speaking or what a given behavior is directed at. In view of this, the second embodiment generates line-of-sight direction information and arm direction information indicating which of the other participants displayed on the display unit 10 the participant is looking at while speaking, and toward which person the participant is making a gesture such as pointing, at the time of photographing by the photographing unit 4, and adjusts the participant video in the virtual dialogue space so that it is consistent with this information. As a result, each participant viewing the dialogue video through the display unit 10 can easily recognize to whom a given participant's remarks and gestures are directed, which further facilitates nonverbal as well as verbal communication.

The present invention has been described above by way of embodiments. The technical scope of the present invention should not, however, be construed as limited to the specific configurations described in the embodiments; various modifications and applications of the above embodiments that can realize the functions of the present invention also fall within its technical scope.

For example, although the first and second embodiments describe the present invention using a so-called remote dialogue as an example, the invention can also be used for remote conferences at commercial companies, online classes at educational institutions such as universities, videophone systems for a small number of people (including one-to-one), and the like. The form of "dialogue" is not limited to voice dialogue; the technique of the present invention can also be used for "dialogue" through the exchange of text information, such as so-called "chat". Likewise, "remote" does not exclude, for example, dialogue between adjacent rooms in the same building, to which the present invention can also be applied.

The specific configuration of the photographing unit 4 may be something other than a 3D scanner, and the captured video of the participant need not be limited to three-dimensional video. It may be two-dimensional video, and it may be a series of still images (for example, still images taken at one-second intervals) rather than a moving image. Extracting feature points from two-dimensional video is technically possible, and a three-dimensional participant video (as well as a two-dimensional one, of course) can be generated based on the positional change of feature points in two-dimensional video.

Furthermore, although the behavior information generation unit 6 generates behavior information, that is, information on behavior, in a form that expresses it by the positional change of feature points, it need not be limited to this form. Behavior may be detected directly from the video by deep learning or the like and converted into information, or the outline of the participant shown in the video may be extracted and behavior information generated according to the deformation of that outline.

Similarly, the participant video generation units 14 and 26 need not generate the participant video as three-dimensional video; two-dimensional video may be used. The same applies to the dialogue video generated by the dialogue video generation units 15 and 27.

The comparison method used by the behavior information comparison unit 19 may be any method that compares a reference value with the current value, and need not be the method of dividing the absolute value of the difference between the reference value and the current value by the reference value. For example, the difference between the reference value and the current value alone may be used, or the comparison may be based on the ratio of the reference value to the current value.
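The three comparison options mentioned above can be written out as a short Python sketch. The function names are hypothetical, and the sign convention of the plain difference and the orientation of the ratio (current divided by reference) are assumptions, since the text does not fix them.

```python
def relative_abs_diff(reference: float, current: float) -> float:
    """|reference - current| / reference, the method used in the embodiments."""
    if reference == 0:
        raise ValueError("reference value must be non-zero")
    return abs(reference - current) / reference


def plain_diff(reference: float, current: float) -> float:
    """Difference only (assumed here to be current minus reference)."""
    return current - reference


def ratio(reference: float, current: float) -> float:
    """Ratio of the two values (assumed here to be current / reference)."""
    if reference == 0:
        raise ValueError("reference value must be non-zero")
    return current / reference
```

For a reference value of 4.0 and a current value of 5.0, the three functions return 0.25, 1.0, and 1.25 respectively. Only the first is normalized and unsigned, so comparison results for behaviors measured in different units can be summed in a meaningful way.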

Furthermore, as its identity determination method, the identity determination unit 20 compares the sum of a plurality of comparison results (in the first and second embodiments, each being the absolute value of the difference between the reference value and the current value divided by the reference value) with a first threshold and a second threshold. Alternatively, a weighted average of the comparison results may be compared with a threshold, or a threshold may be set for each comparison result and the presence or absence of identity determined from the number of comparison results that exceed (or fall below) their thresholds. In addition, although the identity determination unit 20 uses two thresholds to produce three determination results (identity, no identity, and possibly no identity), one threshold, or three or more thresholds, may be used instead. The determination result may likewise be one of two outcomes (identity or no identity), or one of four or more outcomes according to the probability that identity exists (or does not exist).
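The determination variants described above can be sketched in Python as follows. The names `Identity`, `judge_by_sum`, `judge_by_weighted_mean`, and `count_exceeding` are hypothetical, and the threshold values used in any call are arbitrary. The ordering in `judge_by_sum` (a sum below the first threshold means identity, a sum at or above the second means no identity) follows the system claim that depends on claim 1.

```python
from enum import Enum
from typing import Sequence


class Identity(Enum):
    SAME = "identity"
    POSSIBLY_NOT_SAME = "possibly no identity"
    NOT_SAME = "no identity"


def judge_by_sum(results: Sequence[float], first: float, second: float) -> Identity:
    """Three-way determination from the sum of comparison results."""
    total = sum(results)
    if total < first:
        return Identity.SAME
    if total < second:
        return Identity.POSSIBLY_NOT_SAME
    return Identity.NOT_SAME


def judge_by_weighted_mean(
    results: Sequence[float], weights: Sequence[float], threshold: float
) -> bool:
    """Variant: True (identity) if the weighted average is below one threshold."""
    mean = sum(r * w for r, w in zip(results, weights)) / sum(weights)
    return mean < threshold


def count_exceeding(results: Sequence[float], thresholds: Sequence[float]) -> int:
    """Variant: number of comparison results above their own thresholds."""
    return sum(1 for r, t in zip(results, thresholds) if r > t)
```

With thresholds 0.5 and 1.0, comparison results of 0.4 and 0.3 sum to about 0.7, so `judge_by_sum` returns `Identity.POSSIBLY_NOT_SAME`. How many exceeding results from `count_exceeding` should count as "no identity" is left to the caller, as the text does not specify it.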

The content and form of the warning information generated by the warning information generation unit 21 are also not limited to those described in the first embodiment. For example, when "no identity" is determined, instead of providing warning information to the person responsible for the remote dialogue, an instruction to cut off communication may be sent directly to the component that controls the communication function of the devices constituting the dialogue system.

Furthermore, in the second embodiment, when matching the line-of-sight direction of the participant video to the line-of-sight direction information, the dialogue video generation unit 27 is not limited to rotation about the bone corresponding to the spine; it may match the line-of-sight direction by turning the neck of the participant video, or by changing only the part corresponding to the eyes. Similarly, when matching the arm direction of the participant video to the arm direction information, other parts of the body may be moved together with the arm rather than the arm alone. Depending on the positional relationship in the virtual dialogue space, the arm may have to point behind the participant video; in that case, the upper body may be turned to look backward instead of changing the position of the arm alone.

The present invention can be used as a technique for remote dialogue between participants who are not in the same place.

1, 23 Participating terminal
2, 25 Management server
3 Information communication network
4 Photographing unit
5 Speech recording unit
6 Behavior information generation unit
7 Participation information generation unit
8, 22 Information output unit
9, 13 Information input unit
10 Display unit
14, 26 Participant video generation unit
15, 27 Dialogue video generation unit
16 Behavior information database
17 Reference behavior information generation unit
18 Current behavior information generation unit
19 Behavior information comparison unit
20 Identity determination unit
21 Warning information generation unit
24 Attention target information generation unit

Claims

A remote dialogue system that enables dialogue between a plurality of participants located in different places, the system comprising:
a reference behavior information generation means for detecting one or more characteristic behaviors based on information on a participant's past behavior, and generating reference behavior information, which is information on the occurrence frequency of each characteristic behavior and on the duration of the characteristic behavior when it occurs;
a current behavior information generation means for generating, for the one or more characteristic behaviors of the participant during the current dialogue, current behavior information, which is information on the occurrence frequency of each characteristic behavior and on the duration of the characteristic behavior when it occurs;
a behavior information comparison means for comparing the reference behavior information with the current behavior information;
an identity determination means for determining the identity of the participant based on the comparison result of the behavior information comparison means; and
an output means for outputting the determination result of the identity determination means to one or more participants other than the participant who was the subject of the determination by the identity determination means.

The remote dialogue system according to claim 1, wherein:
the reference behavior information generation means generates, as the reference behavior information, information including a reference occurrence frequency, which is numerical information on the occurrence frequency, and a reference duration, which is numerical information on the duration;
the current behavior information generation means generates, as the current behavior information, information including a current occurrence frequency, which is numerical information on the occurrence frequency, and a current duration, which is numerical information on the duration;
the behavior information comparison means derives, as comparison results for the same characteristic behavior, the absolute value of the difference between the reference occurrence frequency and the current occurrence frequency divided by the reference occurrence frequency, and the absolute value of the difference between the reference duration and the current duration divided by the reference duration; and
the identity determination means determines that there is identity when the sum of the comparison results is less than a first threshold, determines that there is a possibility of identity when the sum is equal to or greater than the first threshold and less than a second threshold, and determines that there is no identity when the sum is equal to or greater than the second threshold.

The remote dialogue system according to claim 1 or 2, further comprising:
a display means for displaying, to a specific participant, a dialogue video, which is a video of the plurality of participants taking part in the remote dialogue;
a photographing means for capturing video of the specific participant;
an attention target information generation means for generating, based on the positional relationship among the display means, the photographing means, and the specific participant, attention target information including information on which of the plurality of participants the specific participant is looking at during photographing by the photographing means; and
a dialogue video generation means for generating the dialogue video so that the video of the specific participant to be displayed on the display means directs its line of sight toward the participant indicated by the attention target information.

A remote dialogue method that enables dialogue between a plurality of participants located in different places, the method comprising:
a reference behavior information generation step of detecting one or more characteristic behaviors based on information on a participant's past behavior, and generating reference behavior information, which is information on the occurrence frequency of each characteristic behavior and on the duration of the characteristic behavior when it occurs;
a current behavior information generation step of generating, for the one or more characteristic behaviors of the participant during the current dialogue, current behavior information, which is information on the occurrence frequency of each characteristic behavior and on the duration of the characteristic behavior when it occurs;
a behavior information comparison step of comparing the reference behavior information with the current behavior information;
an identity determination step of determining the identity of the participant based on the comparison result of the behavior information comparison step; and
an output step of outputting the determination result of the identity determination step to one or more participants other than the participant who was the subject of the determination in the identity determination step.

The remote dialogue method according to claim 4, further comprising:
a display step of displaying, to a specific participant, a dialogue video, which is a video of the plurality of participants taking part in the remote dialogue;
a photographing step of capturing video of the specific participant;
an attention target information generation step of generating, based on the positional relationship among the display location in the display step, the photographing point in the photographing step, and the specific participant, attention target information including information on which of the plurality of participants the specific participant is looking at during photographing in the photographing step; and
a dialogue video generation step of generating the dialogue video so that the video of the specific participant to be displayed in the display step directs its line of sight toward the participant indicated by the attention target information.

A remote dialogue program that causes a computer to enable dialogue between a plurality of participants located in different places, the program causing the computer to realize:
a reference behavior information generation function of detecting one or more characteristic behaviors based on information on a participant's past behavior, and generating reference behavior information, which is information on the occurrence frequency of each characteristic behavior and on the duration of the characteristic behavior when it occurs;
a current behavior information generation function of generating, for the one or more characteristic behaviors of the participant during the current dialogue, current behavior information, which is information on the occurrence frequency of each characteristic behavior and on the duration of the characteristic behavior when it occurs;
a behavior information comparison function of comparing the reference behavior information with the current behavior information;
an identity determination function of determining the identity of the participant based on the comparison result of the behavior information comparison function; and
an output function of outputting the determination result of the identity determination function to one or more participants other than the participant who was the subject of the determination by the identity determination function.