JP2016048855A

JP2016048855A - Remote communication device and program

Info

Publication number: JP2016048855A
Application number: JP2014173171A
Authority: JP
Inventors: 建鋒徐; Kenho Jo; 茂之酒澤; Shigeyuki Sakasawa
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2014-08-27
Filing date: 2014-08-27
Publication date: 2016-04-07
Anticipated expiration: 2034-08-27
Also published as: JP6410346B2

Abstract

PROBLEM TO BE SOLVED: To provide a remote communication device capable of avoiding influences of delay in a remote communication. SOLUTION: An analysis part 2 acquires a sound and an action from one user as data. In accordance with data of an action transmitted from a device of another user, a presentation part 5 displays the action of the other user. On the basis of the data acquired by the analysis part 2, a predictive action part 3 discriminates whether the one user starts talking and, in the case where it is discriminated that the one user starts talking, performs control so as to replace the action display of the other user on the presentation part 5 with a predetermined predictive action. A reaction control part 4 performs control in such a manner that, after the action display of the other user on the presentation part 5 is replaced with the predetermined predictive action, the action display is returned to display according to the data of the action transmitted from the device of the other user. SELECTED DRAWING: Figure 4

Description

The present invention relates to a remote communication device and program capable of avoiding the effects of delay in communication performed remotely over a network.

Remote conferencing by videophone has become increasingly common in the workplace because it offers benefits such as environmental protection through reduced carbon dioxide emissions and lower costs such as travel expenses. However, most commercially available videophone systems simply display the video captured at each remote location, together with the recorded audio, on the other party's side. This leads to two problems: it is difficult for participants to make eye contact, and participants cannot speak anonymously. One situation in which anonymous speech is preferable is a group interview that a client company conducts about its product: if panelists can take part remotely, the interview can be organized more flexibly, and anonymous participation is also desirable so that the client company itself can join the discussion.

Patent Document 1 is prior art related to these two problems; it enables anonymous participation in a remote conference while reflecting the participants' lines of sight. In Patent Document 1, three-dimensional computer models (avatars) representing the attendees are animated according to the movements of the attendees in the real world. Hereinafter, a remote conference based on such animation is referred to as an avatar remote conference.

Specifically, in an avatar remote conference, each user station displays an image of a three-dimensional computer model that includes avatars representing the attendees. A pair of cameras captures a user who wears body markers and headphones fitted with lights, and the image data is processed to determine the user's movements and the point in the displayed image at which the user is gazing. This information is transmitted to the other user stations, where the avatar is animated so that the movement of its head corresponds to the converted movement of the user's head.

On the other hand, real face-to-face communication, that is, communication between people who are physically facing each other, proceeds while each party watches the other's feedback, such as backchannel responses (aizuchi). As examined in Non-Patent Document 1, when communication takes place remotely, as in a remote conference, network transmission and computation introduce delays, so feedback and other reactions arrive late. Such delay is said to impair smooth communication, producing impressions such as "the communication feels unnatural", "the communication is being disrupted", "the other party is hard to follow", and "the conversation does not become lively".

To reduce the extent to which delay hinders the smooth progress of a remote conference, Patent Document 2 proposes the following: when a reaction signal is detected on the other party's side of a videophone call, that side sends only a message indicating that a reaction signal is present, and the side that receives the message interrupts the streaming image to display a reaction image prepared in advance.

Patent Document 1: JP 2000-244886 A
Patent Document 2: JP 2010-154387 A
Patent Document 3: JP 2013-089170 A

Non-Patent Document 1: Karen Ruhleder, Brigitte Jordan, Meaning-Making Across Remote Sites: How Delays in Transmission Affect Interaction, ECSCW'99, pp. 411-429, 1999
Non-Patent Document 2: Thibaut Weise, Sofien Bouaziz, Hao Li, and Mark Pauly, Realtime performance-based facial animation, ACM Trans. Graph. 30, 4, Article 77 (July 2011), 10 pages

Here, an analysis of the delay in an avatar remote conference such as that of Patent Document 1 shows that the delay is the sum of the network transmission time in both directions, the time needed to detect a reaction such as a nod, and the time needed to display the avatar. FIG. 1 illustrates this delay schematically.

FIG. 1 shows, on the time axes of participants A and B, that the total delay DL7 is the sum of the component delays DL1 to DL6. DL7 runs from the moment t1 at which participant A at location LA starts speaking to the moment t9 at which the reaction of participant B at the remote location LB to that start of speech is displayed to, and thereby reaches, participant A at location LA. The details are as follows.

First, participant A starts speaking at time t1 and continues the utterance SA1 until time t3. Converting the state of participant A at the moment t1 into data for network transmission (data for displaying the avatar at location LB) incurs a reaction detection delay DL1, so transmission of the data starts at time t2. As indicated by arrow AR2, a network delay DL2 arises while the data travels from location LA to location LB, and the data for the moment t1 at which participant A started speaking arrives at location LB at time t4.

Next, a display delay DL3 arises in rendering the arrived data as an avatar at location LB, so it is only at time t5 that participant B at location LB sees, as an avatar, how participant A at location LA looked at the moment t1 of starting to speak. Participant B therefore responds to participant A's utterance SA1 with a backchannel or other response RB1 from time t5 to time t7. Converting the data for the moment t5 at which the response RB1 starts into data for avatar display at location LA incurs a reaction detection delay DL4, and transmission of the data toward location LA starts at time t6.

Furthermore, as indicated by arrow AR5, the data transmitted at time t6 incurs a network delay DL5 and arrives at location LA at time t8, after which a display delay DL6 for avatar display follows. As a result, it is only at time t9 that participant B's response to the moment t1 at which participant A started speaking is displayed to, and reaches, participant A at location LA. The total delay DL7, which is the sum of the delays DL1 to DL6 above, therefore spans the whole period from time t1 to time t9.
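The decomposition of FIG. 1 can be written out as a small worked example. The sketch below only restates the relation DL7 = DL1 + ... + DL6 and t9 = t1 + DL7; the millisecond values and the function names are illustrative assumptions and do not come from the source.

```python
# Illustrative per-stage delays in milliseconds.
# The numbers are assumed for the example; the source gives no values.
delays_ms = {
    "DL1": 50,  # reaction detection at location LA
    "DL2": 80,  # network transfer from LA to LB
    "DL3": 30,  # avatar display at location LB
    "DL4": 50,  # reaction detection at location LB
    "DL5": 80,  # network transfer from LB to LA
    "DL6": 30,  # avatar display at location LA
}


def total_round_trip_delay(stage_delays):
    """Return DL7, the sum of the six stage delays DL1..DL6."""
    return sum(stage_delays[f"DL{i}"] for i in range(1, 7))


def response_visible_time(t1_ms, stage_delays):
    """Return t9, the time at which A first sees B's response."""
    return t1_ms + total_round_trip_delay(stage_delays)


dl7 = total_round_trip_delay(delays_ms)
t9 = response_visible_time(0, delays_ms)
print(dl7, t9)
```

With these assumed values, the two network legs (DL2 and DL5) make up half of the total, which is the part that no change to the payload alone can remove.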

Note that FIG. 1 draws participant A's utterance SA1 with a length such that it has already ended by time t9, but the length can vary freely with the actual utterance. The utterance SA1 may therefore still be continuing at time t9, when participant A first sees participant B's response to it.

Analyzing the delay in an avatar remote conference such as that of Patent Document 1 as in FIG. 1 reveals a problem: applying the technique of Patent Document 2 to an avatar remote conference does little to reduce the delay. When applied to a videophone that uses live-action images, the technique of Patent Document 2 reduces delay by greatly cutting the amount of transmitted data and by eliminating the time for image encoding and decoding, but in an avatar remote conference this effect is insufficient.

The effect is insufficient because the only data transmitted in an avatar remote conference is the motion data of a person's joints, which is far smaller than image data, and because no image encoding or decoding takes place. If the technique of Patent Document 2 were applied to an avatar remote conference, the only change would be that the data sent from location LB to location LA is replaced: instead of the motion data of participant B's joints, data indicating that participant B has reacted is sent. The effect is therefore limited to a slight shortening of the network delay DL5 in FIG. 1 through a slight reduction in data volume, which does little to shorten the total delay.
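A rough calculation makes the argument concrete. The sketch below compares the one-way delay for a joint-motion payload with that for a one-byte reaction flag. The skeleton size, link bandwidth, propagation latency, and the simple "propagation plus serialization" delay model are all assumptions chosen for illustration, not figures from the source.

```python
import struct

NUM_JOINTS = 20             # assumed skeleton size
BANDWIDTH_BPS = 10_000_000  # assumed link bandwidth (10 Mbit/s)
PROPAGATION_MS = 40.0       # assumed one-way propagation latency


def one_way_delay_ms(payload: bytes) -> float:
    """Propagation latency plus the time to put the payload on the wire."""
    serialization_ms = len(payload) * 8 / BANDWIDTH_BPS * 1000.0
    return PROPAGATION_MS + serialization_ms


# One frame of joint motion: three float32 values per joint.
joint_payload = struct.pack(f"<{NUM_JOINTS * 3}f", *([0.0] * NUM_JOINTS * 3))
# Reaction flag in the style of Patent Document 2: a single byte.
flag_payload = struct.pack("<B", 1)

saving_ms = one_way_delay_ms(joint_payload) - one_way_delay_ms(flag_payload)
print(len(joint_payload), len(flag_payload), round(saving_ms, 4))
```

Under these assumptions the smaller payload saves a fraction of a millisecond, while the propagation latency is unchanged, which matches the observation that DL5 is shortened only slightly.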

Moreover, even if the technique of Patent Document 2 were applied to an avatar remote conference, the network delay DL2 in FIG. 1 for transmitting the motion data of participant A's joints cannot be avoided, even though that data is small compared with an image.

In other words, the technique of Patent Document 2 exchanges reaction signals with a small data volume instead of sending and receiving large video data in both directions, and substitutes a pre-prepared reaction image of the user for the display until the actual reaction arrives, thereby reducing the effect of delay compared with the video case. However, some signal (namely the small reaction signal) must still be exchanged between the mutually remote locations LA and LB shown in FIG. 1, so the technique of Patent Document 2 cannot avoid the same kind of delay as in the video case (namely the network delays DL2 and DL5).

In view of the above problems of the prior art, an object of the present invention is to provide a remote communication device and program capable of avoiding the effects of delay in remote communication such as an avatar remote conference.

To achieve the above object, the present invention is a remote communication device that mediates communication between another user and the own user by sending and receiving voice and motion data bidirectionally to and from the other user's device, the remote communication device comprising: an analysis unit that acquires the own user's voice and motion as data; a presentation unit that displays the other user's motion according to the motion data transmitted from the other user's device; a prediction operation unit that determines, based on the data acquired by the analysis unit, whether the own user has started speaking and, when it determines that the own user has started speaking, performs control so that the display of the other user's motion on the presentation unit is replaced with a predetermined prediction operation; and a reaction control unit that performs control so that, after the replacement with the predetermined prediction operation, the display of the other user's motion on the presentation unit is returned to a display that follows the motion data transmitted from the other user's device.

The present invention is also a program that causes a computer to function as the remote communication device.

According to the present invention, when the own user is determined to have started speaking, the display of the other user's motion can be switched to a predetermined prediction operation. Whether the own user has started speaking can be determined by the own user's remote communication device alone, without sending or receiving information over the network, so the effect of delay can be avoided.

FIG. 1 is a schematic diagram analyzing the delay in the avatar remote conference of Patent Document 1.
FIG. 2 is a diagram showing a configuration example of a remote communication system according to one embodiment.
FIG. 3 is a diagram showing an example of the screen presented by the remote communication device used by a participant.
FIG. 4 is a functional block diagram of a remote communication device according to one embodiment.
FIG. 5 is a flowchart of the interrupt processing that the prediction operation unit and the reaction control unit apply to the display of another user's motion on the presentation unit.
FIG. 6 is a diagram conceptually showing, on a time axis, an example of another user's motion data used in the interrupt processing.
FIG. 7 is a functional block diagram that includes the data exchanged bidirectionally between two remote communication devices.
FIG. 8 is a diagram showing, in table form, an example in which a plurality of prediction operations are defined according to the content of an utterance.
FIG. 9 is a conceptual diagram for explaining the blending process.

FIG. 2 is a diagram showing a configuration example of a remote communication system according to one embodiment. The remote communication system 10 includes a plurality of remote communication devices 1A to 1D with which users A to D communicate with one another remotely, for example in a remote conference. As illustrated, users A to D are at remote locations LA to LD, respectively, and communicate with one another over the network N by using their own remote communication devices 1A to 1D.

FIG. 2 shows, as one example, four users A to D using remote communication devices 1A to 1D, respectively, but the remote communication system 10 can be realized with any number of remote communication devices. A single remote communication device at one remote location may also be shared by two or more users.

To realize this remote mutual communication, each of the remote communication devices 1A to 1D continuously transmits the own user's voice and motion information to the other users' devices and receives the other users' voice and motion information from those devices. For example, as illustrated, the remote communication device 1A used by user A transmits the voice and motion information of its own user A to the remote communication devices 1B to 1D of the other users B to D, and receives the voice and motion information of each of the other users B to D from the remote communication devices 1B to 1D.

FIG. 3 shows an example of the screen that the remote communication device 1A of user A in FIG. 2 displays to user A in order to realize remote communication with the other users B to D. As illustrated, the screen D[A] displays the remotely communicating users A to D as avatars AB[A] to AB[D], respectively. On the screen D[A], the avatar AB[A] of the own user A moves according to the movement of user A acquired by the remote communication device 1A, and the avatars AB[B] to AB[D] of the other users B to D move according to the movements of users B to D acquired and transmitted by their respective remote communication devices 1B to 1D.

In this way, user A's own avatar AB[A] and the avatars AB[B] to AB[D] of the remote users B to D are gathered together on the screen D[A], and user A can communicate remotely while watching them move according to each user's actual movements. The other users B to D are provided with screens similar to the screen D[A] and can likewise communicate remotely. As described with reference to FIG. 2, audio (each user's speech) is also played back on each user's device together with the screen, which makes remote communication possible.

As described with reference to FIG. 1, if each remote communication device in a remote communication system configured as in FIG. 2 were built with the prior art, the motion and voice transmitted from remote users would suffer unavoidable delay. For example, on a screen D[A] such as that in FIG. 3, the avatar AB[A] linked to the own user A has started speaking, yet the avatars AB[B] to AB[D] of the other users B to D show no reaction at all to the utterance, and the own user A watching the screen D[A] feels that something is wrong, among other impediments to communication. The present invention makes it possible to avoid this inconvenience. The details are described below.

FIG. 4 is a functional block diagram of the remote communication device 1 according to one embodiment. As described with reference to FIG. 2, one remote communication device 1 exists for each user, or for each group of two or more users, who uses it; FIG. 4 describes an arbitrary one of these devices as the "remote communication device 1". The remote communication device 1 includes an analysis unit 2, a prediction operation unit 3, a reaction control unit 4, and a presentation unit 5. Each unit is outlined below.

The analysis unit 2 includes a microphone, a camera, sensors, and the like, with which it acquires the voice, body motion, and so on of the own user (the user of the remote communication device 1) as data. It passes this data to the prediction operation unit 3, to the presentation unit 5, and to the other users' devices (the other users' remote communication devices). The data is passed to the other users' devices by transmission over the network N, as described with reference to FIG. 2.

The presentation unit 5 receives the own user's voice, motion, and other data from the analysis unit 2, together with the voice, motion, and other data of one or more other users from their remote communication devices, and presents the information with which the own user communicates remotely with the other users.

When the remote communication takes the form of a remote conference, the presentation unit 5 can present to the own user the screen, audio, and so on for an avatar remote conference as described with reference to FIG. 3. In this case, the presentation unit 5 performs control so that each avatar on the screen moves according to the motion data of the corresponding user (the own user or another user). The presentation unit 5 also plays back the voice of each speaking user in step with the screen display. In the following, the present invention is described using the avatar remote conference as an example.

When the display of another user's motion on the presentation unit 5, that is, the avatar motion display, is determined to be affected by delay from the own user's standpoint, the prediction operation unit 3 and the reaction control unit 4 control the presentation unit 5's display processing for that user so that a predetermined motion display, in which the own user does not perceive the effect of the delay, is inserted as an interrupt. While this interrupt processing is not in effect, the presentation unit 5 displays each avatar according to the motion data of the own user and the other users available at each point in time, as described above.

The interrupt processing is outlined as follows. First, the prediction operation unit 3 determines whether the display of another user's motion on the presentation unit 5 is affected by delay from the own user's standpoint. Specifically, it examines the own user's data sent from the analysis unit 2, and when it determines that the own user is starting to speak, it treats the subsequent predetermined period as one that is affected by delay from the own user's standpoint.

More specifically, this determination detects in advance the occurrence of the situation that impedes communication, as described with reference to FIG. 1. That is, if the presentation unit 5 displayed the other user's motion using the delayed data as is, then for an initial predetermined period the screen would show the other user not reacting at all to the own user's utterance, even though the own user has started speaking. The determination of whether the own user has started speaking is described in detail later.

When the prediction operation unit 3 determines that the own user has started speaking as described above, it further controls the presentation unit 5's display processing for the other user so that a predetermined prediction operation is displayed temporarily, instead of the motion sent from the other user's remote communication device being displayed as is. From the own user's standpoint, as soon as the own user starts speaking, the other user displayed on the presentation unit 5 performs the predetermined prediction operation, so the own user sees a response to the utterance and can continue speaking without any sense of unnaturalness.

In particular, the predetermined prediction operation can be displayed immediately, without waiting for an actual response from the other user and without exchanging any specific information between the other user's remote communication device and the own user's remote communication device 1, such as the reaction signal of Patent Document 2 cited above. The present invention can therefore avoid the effect of network delay. That is, whether the own user has started speaking can be determined within the own user's remote communication device 1 alone; the determination requires no exchange with the other user's remote communication device over the network. Because the present invention uses this information as the trigger for displaying the prediction operation, it can avoid the effect of network delay.
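To illustrate why this trigger needs no network round trip, the sketch below detects a speech start purely from locally captured audio frames. This section defers the actual determination method to a later part of the description, so the energy-threshold rule used here, together with the names `rms` and `detect_speech_start` and the parameter values, is a stand-in assumption and not the method of the source.

```python
import math


def rms(frame):
    """Root-mean-square level of one non-empty audio frame (list of samples)."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech_start(frames, threshold=0.1, min_active_frames=2):
    """Return the index of the first frame of the first run of at least
    min_active_frames consecutive frames whose RMS level reaches threshold,
    or None if no such run exists.

    Only locally captured frames are inspected; nothing is sent or received.
    """
    run = 0
    for i, frame in enumerate(frames):
        if rms(frame) >= threshold:
            run += 1
            if run == min_active_frames:
                return i - min_active_frames + 1
        else:
            run = 0
    return None


silence = [0.0, 0.0, 0.0, 0.0]
loud = [0.5, -0.5, 0.5, -0.5]
frames = [silence, loud, silence, loud, loud, loud]
print(detect_speech_start(frames))
```

Requiring a short run of loud frames is one simple way to ignore isolated noise; any detector that uses only the own user's data would serve the same role as the trigger.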

Next, once the other user's actual response to the own user's start of speech begins to be received (after the network delay and so on), the reaction control unit 4 controls the presentation unit 5 so that the other user's motion, which the prediction operation unit 3 has been driving according to the predetermined prediction operation, is switched to the other user's actual motion as now received. After the switch, the presentation unit 5 displays the other user's motion according to the other user's actual motion. At the switch, processing is performed to connect the prediction operation and the actual motion smoothly; the details are described later.
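The division of roles between the prediction operation unit 3 and the reaction control unit 4 can be pictured as a two-state switch on the source of the other user's displayed motion. The sketch below is a minimal model under that reading; the class name, method names, and the string poses are hypothetical, and the smooth connection at the switch is deliberately left out.

```python
from enum import Enum


class Source(Enum):
    LIVE = "live"            # follow motion data received from the other user
    PREDICTED = "predicted"  # play the predetermined prediction operation


class OtherUserDisplay:
    """Chooses which motion the presentation unit shows for one other user."""

    def __init__(self):
        self.source = Source.LIVE

    def on_own_speech_start(self):
        # Role of the prediction operation unit 3: switch at once,
        # without waiting for anything from the network.
        self.source = Source.PREDICTED

    def on_actual_response_received(self):
        # Role of the reaction control unit 4: return to the received data.
        if self.source is Source.PREDICTED:
            self.source = Source.LIVE

    def frame(self, live_pose, predicted_pose):
        """Return the pose to draw for the current time step."""
        if self.source is Source.PREDICTED:
            return predicted_pose
        return live_pose


display = OtherUserDisplay()
shown = [display.frame("live", "predicted")]
display.on_own_speech_start()
shown.append(display.frame("live", "predicted"))
display.on_actual_response_received()
shown.append(display.frame("live", "predicted"))
print(shown)
```

How the device recognizes that the actual response has started to arrive is not specified in this section, so `on_actual_response_received` is simply called by hand here.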

FIG. 5 is a flowchart of the interrupt processing described above, which the prediction operation unit 3 and the reaction control unit 4 apply to the display of another user's motion on the presentation unit 5. FIG. 6 is a diagram conceptually showing, on a time axis, an example of another user's motion data used in the interrupt processing.

Each step of FIG. 5 is described below with reference to FIG. 6 as appropriate. For the purposes of explanation, FIG. 6 takes the own user to be user A and the other user whose motion display is subject to the interrupt processing to be user B. In FIG. 6, (1) is the time series D1, D2, P1, D7, D8 of the data of the other user B as displayed to the own user A, and (2) is the time series D1 to D8 of the actual motion data of the other user B on which that display is based.
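The relation between the two time series can be reproduced with a few lines of code. The sketch assumes that P1 occupies the interval that D3 to D6 would otherwise have occupied, which is inferred from the two listings rather than stated; the function name and the use of string labels for data segments are also illustrative choices.

```python
def displayed_series(actual, predicted_label, first_replaced, last_replaced):
    """Replace actual[first_replaced..last_replaced] (1-based, inclusive)
    with a single predicted segment and return the resulting series."""
    head = actual[: first_replaced - 1]
    tail = actual[last_replaced:]
    return head + [predicted_label] + tail


# (2) actual motion data of the other user B: D1 .. D8
actual_b = [f"D{i}" for i in range(1, 9)]

# (1) what is displayed to the own user A once P1 is substituted
shown_to_a = displayed_series(actual_b, "P1", 3, 6)
print(shown_to_a)
```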

In FIG. 6, the data displayed by the interrupt processing is P1 in (1). The data P1 is a simplified example for explaining the interrupt processing: the motion display is switched immediately at the interrupt timings (times T5 and T9), and the processing that makes the motion of the other user B appear natural and continuous to the own user A is omitted. The data P10 in (3) is the data P1 partially modified by applying that processing rather than omitting it. From the standpoint of natural-looking motion, the data P10 is what should preferably be displayed by the actual interrupt processing.
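One common way to avoid an abrupt jump at such a switch is to insert a few in-between frames that cross-fade from the outgoing pose to the incoming pose. The sketch below shows a plain linear cross-fade over joint values. It is offered only as an assumption about what a smoothing step could look like; the processing that actually turns P1 into P10 is described elsewhere in the source and may differ.

```python
def crossfade(outgoing, incoming, num_frames):
    """Return num_frames intermediate poses between two poses.

    Each pose is a list of joint values of equal length. The blend weight
    rises evenly from the outgoing pose toward the incoming pose, and the
    two end poses themselves are not included in the result.
    """
    frames = []
    for k in range(1, num_frames + 1):
        w = k / (num_frames + 1)
        frames.append([(1 - w) * a + w * b for a, b in zip(outgoing, incoming)])
    return frames


last_actual_pose = [0.0, 10.0]     # e.g. final pose of D2 (values assumed)
first_predicted_pose = [4.0, 2.0]  # e.g. first pose of P1 (values assumed)
transition = crossfade(last_actual_pose, first_predicted_pose, 3)
print(transition)
```

The same helper could be applied at both interrupt timings, once when entering the prediction operation and once when returning to the received data.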

In the flow of FIG. 5, time is referenced as time t by means of a counter variable t. The overall structure of the flow is as follows. At each time t, the situation at that time determines which of steps S11, S20, and S30 is executed; in that step the presentation unit 5 displays the frame image corresponding to the other user's motion to be shown at time t. This is repeated while t is updated to the next latest time point.

When the flow starts in step S1, the prediction operation unit 3 first analyzes, in step S10, the own user's data obtained by the analysis unit 2 at time t and determines whether the own user has started speaking. If so, the flow proceeds to step S20; if not, it proceeds to step S11. The determination in step S10 is described in detail later.

In step S11, the presentation unit 5 displays the other user using the other user's actual data that has been received and is available for display at the current time t, and the flow proceeds to step S12. In step S12, time t is updated to the next latest time point, and the flow returns to step S10.

Thus, as long as the flow stays in the loop of steps S10, S11, and S12 and the other user is displayed continuously at each time t, the presentation unit 5 keeps displaying the actual motion transmitted from the other user as is. In an avatar teleconference, the actual motion continues to be displayed through the avatar.

In the example of FIG. 6, the state in (1) in which the other user's motion is displayed using data D1 and D2 over the interval from time T3 to T5 corresponds to staying in the loop of steps S10, S11, and S12 in FIG. 5. As shown in (2), the other user actually performed the motion of D1 and D2 between times T1 and T3, before the interval T3 to T5, so the presentation unit 5 displays the other user's motion with a delay such as network delay. However, the own user has not started speaking and does not need to see an immediate response from the other user, so no particular communication problem arises.

Returning to FIG. 5, in step S20 the presentation unit 5, under control of the prediction operation unit 3, displays the other user's frame image at time t (an avatar image in the case of an avatar teleconference) either while transitioning to the predicted motion or, if the transition has already been made, in accordance with the predicted motion after the transition. The flow then proceeds to step S21. The display processing in step S20 is described in detail later.

In step S21, the reaction control unit 4 determines whether a response from the other user has been received at time t. If a response has been received, the flow proceeds to step S30; otherwise it proceeds to step S22. In step S22, time t is updated to the next latest time point, and the flow returns to step S20.

The reaction control unit 4 can determine in step S21 whether a response from the other user has been received as follows. When the prediction operation unit 3 determines in the most recent step S10 that the own user has started speaking, flag information indicating that a speech start was determined for the own user is added to the own user's motion data and related data that the analysis unit 2 transmits to the other user's remote communication device at that time t100. In the other user's remote communication device, the own user's motion data (the data of the "other user" from that device's point of view) is displayed at time t100+ΔT1, with a delay ΔT1. If that data carries the flag information, the device adds, to the other user's own motion data at time t100+ΔT1, reply flag information indicating that the other user has begun responding to the own user's speech start, and sends it back to the own user's remote communication device 1. After a further delay ΔT2, the own user's reaction control unit 4 receives, at time t100+ΔT1+ΔT2, the other user's motion data carrying the response-start flag and can then make the affirmative determination in step S21 at that time.

This exchange of flag information is illustrated in FIG. 6 by arrows AR10 and AR11. At time T5, own user A makes the affirmative determination in step S10, that is, determines that speech has started, and adds a speech start flag to the own motion data at time T5 that is transmitted to other user B. As shown by arrow AR10, other user B's device receives own user A's flagged motion data and actually displays it at time T7. It therefore adds a response-start flag to other user B's own motion data D7 transmitted at time T7. As shown by arrow AR11, own user A's device displays, at time T9, other user B's motion data D7 carrying the response-start flag and can thus make the affirmative determination in step S21 at time T9.
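The flag round trip can be traced with a small sketch. The `MotionPacket` class, its field names, and the two helper functions are hypothetical and only illustrate the timing t100 + ΔT1 + ΔT2; they are not part of the described device.

```python
from dataclasses import dataclass

@dataclass
class MotionPacket:
    sent_at: float
    speech_start: bool = False    # set by the speaker's device (step S10 = yes)
    response_start: bool = False  # set by the listener's device in its reply

def reply_to(packet, shown_at):
    """Listener side: build the outgoing packet for the time `shown_at`.

    The reply carries the response-start flag only if the displayed
    packet carried the speech-start flag.
    """
    return MotionPacket(sent_at=shown_at, response_start=packet.speech_start)

def round_trip(t_start, delay_ab, delay_ba):
    """Return (time at which A sees B's reply, whether it is flagged)."""
    outgoing = MotionPacket(sent_at=t_start, speech_start=True)
    shown_at_b = outgoing.sent_at + delay_ab   # arrow AR10
    reply = reply_to(outgoing, shown_at_b)
    shown_at_a = reply.sent_at + delay_ba      # arrow AR11
    return shown_at_a, reply.response_start
```

With the time labels of FIG. 6 read as numbers, `round_trip(5, 2, 2)` corresponds to the speech start at T5, display at user B at T7, and the affirmative determination of step S21 at T9.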

Returning to FIG. 5, in step S30 the presentation unit 5, under control of the reaction control unit 4, displays the other user's frame image at time t while gradually returning from the display based on the predicted motion, as produced in the most recent loop of steps S20, S21, and S22, to a display that follows the other user's actual motion data sequentially transmitted from the other user's remote communication device. The flow then proceeds to step S31. The display processing in step S30 is described in detail later.

In step S31, the reaction control unit 4 determines whether the gradual return to the other user's actual motion performed in the most recent step S30 has been fully completed. If so, the flow returns to step S10; if not, it proceeds to step S32. In step S32, time t is updated to the next latest time point, and the flow returns to step S30.

The steps of FIG. 5 have been described above. The period during which the flow stays in the loop of steps S20, S21, and S22 or in the loop of steps S30, S31, and S32 corresponds (roughly) to the state in which the presentation unit 5's display of the other user's motion has been replaced with the predetermined predicted motion by the interrupt processing. The presentation unit 5 is controlled by the prediction operation unit 3 while in the former loop and by the reaction control unit 4 while in the latter loop.
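The three loops of FIG. 5 can be read as a small state machine. The following is a minimal sketch under that reading; the names `Mode`, `step`, and `run` are hypothetical, and one frame is shown per time step, so when a response arrives the predicted frame is still shown for that step and the return begins at the next one (a simplification of the S21 to S30 transition).

```python
from enum import Enum

class Mode(Enum):
    REAL = 1        # loop S10, S11, S12: show the other user's actual data
    PREDICTED = 2   # loop S20, S21, S22: show the predicted motion
    RETURNING = 3   # loop S30, S31, S32: blend back to the actual data

def step(mode, speech_started, response_received, return_done):
    """One time step t. Returns (what to display at t, mode for the next t)."""
    if mode is Mode.REAL:
        if not speech_started:                 # S10: no -> S11
            return "real", Mode.REAL
        mode = Mode.PREDICTED                  # S10: yes -> S20
    if mode is Mode.PREDICTED:                 # S20, then S21
        nxt = Mode.RETURNING if response_received else Mode.PREDICTED
        return "predicted", nxt
    nxt = Mode.REAL if return_done else Mode.RETURNING   # S30, then S31
    return "blend", nxt

def run(events):
    """events: iterable of (speech_started, response_received, return_done)."""
    mode, shown = Mode.REAL, []
    for ev in events:
        source, mode = step(mode, *ev)
        shown.append(source)
    return shown
```

Feeding `run` a sequence in which speech starts, a response arrives later, and the return then completes yields a run of "real", "predicted", "blend", and "real" frames, mirroring D1, D2, P10, D7, D8 in FIG. 6.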

Although not preferred from the standpoint of displaying the other user's motion smoothly and naturally, an embodiment is possible in which the display is switched from the other user's actual motion data to the predicted motion immediately upon going from step S10 to step S20 and, conversely, from the predicted motion back to the other user's actual motion data immediately upon going from step S21 to step S30. In that case, data P1 in (1) of FIG. 6 is the other user's motion data (predicted motion data) displayed by the interrupt processing.

However, the motion display is likely to become discontinuous in the embodiment that switches the display data instantaneously. Data P10 in (3) of FIG. 6 is an example of the display data in an embodiment that switches smoothly instead.

With data P10, the display is not switched immediately at time T5 (the time of the affirmative determination in step S10 of FIG. 5) as with data P1. Instead, a time T51 after T5 at which a smooth switch is possible is detected, and the predicted motion starts at T51. Likewise, the display is not switched immediately at time T9 (the time of the affirmative determination in step S21 of FIG. 5) as with data P1. Instead, a hybrid of the predicted motion and the other user's actual motion D7 is displayed until a later time T91, at which the display has fully returned to motion D7. The construction of data P10 is described in detail later.

FIG. 7 is a functional block diagram of remote communication devices 1A and 1B, including the data exchanged when the two devices communicate bidirectionally between users A and B. As shown in FIG. 7, each of devices 1A and 1B has the same configuration as shown in FIG. 4, with A or B appended to the reference numerals of the functional units to distinguish user A's device 1A from user B's device 1B. The analysis unit 2A, having acquired own user A's voice and motion data, passes the data to its own presentation unit 5A and prediction operation unit 3A, and also to the prediction operation unit 3B and reaction control unit 4B of the other user B. As illustrated, other user B's analysis unit 2B exchanges data symmetrically.

For example, the flag information described with arrows AR10 and AR11 in FIG. 6 is added to the motion data and related data of the analysis unit 2A and transmitted to the corresponding functional units of remote communication device 1B, and is added to the motion data and related data of the analysis unit 2B and transmitted to the corresponding functional units of remote communication device 1A.

Remote communication such as a teleconference according to the present invention can be carried out with one or more other users as counterparts of the own user. With a plurality of other users, bidirectional data exchange as shown in FIG. 7 takes place with the remote communication device of each of them. For example, when a total of four users hold a teleconference, each using one remote communication device, the presentation unit 5 of each user's remote communication device 1 can display a screen such as that of FIG. 3.

The overall operation of the remote communication device 1 has been described above. The following describes each unit of FIG. 4 with regard to the details of the underlying techniques that realize this overall operation, in particular the details of the motion data used to produce a screen such as that of FIG. 3.

<About Analysis Unit 2>
The analysis unit 2 obtains the user's voice and motion by means of a device. When there are multiple users, the data is obtained for each user. Usable devices include a camera and microphone, Kinect (registered trademark), and Google Glass (registered trademark).

For example, an embodiment using Kinect estimates the head pose, facial expression, upper-body motion, presence or absence of voice, and whether the subject is speaking. At the same time, video, audio, and depth data are recorded with Kinect. The SDK (software development kit) provided by Microsoft (registered trademark) tracks, from the Kinect data, the head pose as pitch, yaw, and roll and the facial action units (AU0 to AU5). The facial expression is then determined from the action units. For example, AU4 (Lip Corner Depressor) can be used to classify the expression into three types: 0 = neutral, -1 = pos, +1 = neg. Tracking of the head pose and facial action units is described at the following URL.
[URL] http://msdn.microsoft.com/en-us/library/jj130970.aspx

The SDK provided by Microsoft is also used to obtain human motion from the Kinect data. In a meeting the lower-body motion is hard to capture but is not important, so it suffices to obtain only the upper-body motion in Seated mode. Details of this motion acquisition are given at the following URL.
[URL] http://msdn.microsoft.com/en-us/library/hh973077.aspx

The Kinect audio data is also used to determine whether the subject is speaking. If the volume exceeds a threshold, speech is determined to be present; if it is at or below the threshold, speech is determined to be absent. When one remote communication device 1 is shared by multiple users, a well-known speaker verification technique based on voiceprint matching is additionally applied when speech is present, to determine which user is speaking. Each user's voiceprint data and the like are registered in advance.

As another embodiment for identifying which user is speaking, each user may be given a microphone and the speaker identified from the relative difference in volume. That is, the recordings of the multiple microphones may be analyzed at each time, and the user of the microphone with the highest volume may be determined to be the speaker at that time.
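The threshold test and the loudest-microphone rule can be sketched as follows. The function names are hypothetical, and combining the two rules (returning no speaker when even the loudest microphone is at or below the threshold) is an assumption made for this illustration.

```python
def has_speech(volume, threshold):
    """Speech is present only when the volume exceeds the threshold."""
    return volume > threshold

def active_speaker(volumes, threshold):
    """volumes: dict mapping user -> microphone volume at one time instant.

    Returns the user of the loudest microphone, or None if no microphone
    exceeds the threshold (assumed combination of the two rules).
    """
    user, level = max(volumes.items(), key=lambda kv: kv[1])
    return user if has_speech(level, threshold) else None
```

For instance, `active_speaker({"A": 0.1, "B": 0.6}, 0.3)` selects user B, while a volume exactly equal to the threshold counts as no speech.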

<About Presentation Unit 5>
The presentation unit 5 receives the data from the analysis unit 2 described above and animates the avatars of the own user and the other users. When a predetermined condition is satisfied, as described with FIGS. 5 and 6, the animation of another user's avatar follows the control of the prediction operation unit 3 and the reaction control unit 4, which are specific to the present invention. Whether or not that control applies, however, well-known techniques can be used for animating the avatar itself.

For example, an avatar's facial expression and motion can be animated in real time using Kinect. A known technique for animating an avatar's facial expression in real time with Kinect is, for example, the method of Non-Patent Document 2 cited above. An SDK for animating an avatar's upper body in real time with Kinect is provided by Microsoft and is described at, for example, the following URL.
[URL] http://msdn.microsoft.com/en-us/library/hh973077.aspx

Note that the prediction operation unit 3 and the reaction control unit 4 control what data is used as the "real-time" data for the other users' avatars animated on the own user's presentation unit 5. The own user's avatar on the own user's presentation unit 5 is always animated with data that is "real-time" from the own user's standpoint.

<About Prediction Operation Unit 3>
As described with FIGS. 5 and 6, when it is determined that the own user has started speaking, the prediction operation unit 3 shifts the other user's avatar displayed on the presentation unit 5 to an animation that follows a predetermined predicted motion, so that it does not look unnatural from the own user's standpoint.

The speech start can be determined from the presence or absence of speech obtained by the threshold judgment in the analysis unit 2 described above. For example, a speech start may be determined at the point when speech is detected after speech has been judged absent for at least a predetermined period, or at the point when speech has been judged present continuously for at least a predetermined period.
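The first of the two rules (speech detected after a sufficiently long silence) can be sketched over a sequence of per-frame speech flags. The function name, the frame-count unit for the period, and the choice to treat the session as preceded by silence are assumptions of this sketch.

```python
def speech_start_times(speech_flags, min_silence):
    """Return the frame indices at which a speech start is determined.

    speech_flags: per-frame booleans from the volume threshold test.
    min_silence:  number of consecutive silent frames required beforehand.
    """
    starts = []
    silent_run = min_silence  # assumption: the session begins in silence
    for t, speaking in enumerate(speech_flags):
        if speaking:
            if silent_run >= min_silence:
                starts.append(t)
            silent_run = 0
        else:
            silent_run += 1
    return starts
```

A short pause inside an utterance does not trigger a new speech start, because the silent run before the next speech frame is shorter than `min_silence`.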

The data of the predetermined predicted motion is data of the same type as that acquired by the analysis unit 2 and is extracted in advance from a predetermined motion that does not look unnatural from the standpoint of the own user who has started speaking. The prediction operation unit 3 uses this pre-extracted data to control the animation of the other user on the presentation unit 5.

Specifically, the following predetermined predicted motions can be adopted. For example, a predicted motion may be used that automatically makes the displayed avatar of the other user nod or take a thinking pose after the speech start, so that the avatar looks natural even when the other user is neither speaking nor deliberately gazing.

However, the concrete motion data for nodding or a thinking pose reflects the individual's personality, so it is preferable to acquire such individual motions in advance for each other user. For example, if the other user took part in a past meeting, nodding and thinking poses can be cut out manually from the motion data acquired then. If no such meeting data exists, the participant can be instructed to perform nodding and thinking poses, and the data can be recorded.

After the prediction operation unit 3 determines a speech start, the timing at which the other user's avatar motion on the presentation unit 5 is switched from following the actual data to following the predicted motion data may be determined as follows.

In one embodiment, the frame closest in distance to the start frame of the predicted motion data is searched for within the other user's motion data that has already been received at that time but is awaiting display on the presentation unit 5 (that is, within the buffered data), and the display is switched to the predicted motion data from the time of the frame found.

For example, in (3) of FIG. 6, suppose the buffered data at the speech start time T5 is D3. The search finds that, among the frames of D3, the frame closest to the start frame of the predicted motion is the one at time T51, so the display P10 based on the predicted motion starts at time T51.

In another embodiment, the display may be switched to the predicted motion as soon as the prediction operation unit 3 determines a speech start. In that case, the playback start position within the predicted motion is chosen as the frame whose pose is closest to the pose of the other user being displayed at the time of the determination. In the example of FIG. 6, the display P1 based on the predicted motion then starts immediately at the determination time T5, as in (1).

In each of the above embodiments, the distance between the pose in a frame of the predicted motion data and the pose in a frame of the other user's actual data is calculated. This distance can be calculated in the same way as equation (1) in the description of the reaction control unit 4 below.
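Both embodiments reduce to a nearest-frame search, differing only in which sequence is searched. A minimal sketch follows; `best_switch_index` is a hypothetical name, and the pose distance is passed in as a function because its concrete form is given by equation (1).

```python
def best_switch_index(candidate_frames, target_frame, distance):
    """Index of the candidate frame whose pose is closest to target_frame.

    First embodiment:  candidates = buffered actual frames,
                       target     = start frame of the predicted motion.
    Second embodiment: candidates = frames of the predicted motion,
                       target     = pose currently being displayed.
    """
    return min(range(len(candidate_frames)),
               key=lambda n: distance(candidate_frames[n], target_frame))
```

In the first embodiment the returned index gives the switch time (T51 in FIG. 6); in the second it gives the playback start position within the predicted motion.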

Only one type of predicted motion may be prepared for each other user, or two or more types may be prepared so that an appropriate one can be chosen according to the content of the own user's speech. For example, as shown in table form in FIG. 8, two predicted motions, "think" and "nod", may be prepared; the "think" predicted motion is used when the own user's speech is a question, and the "nod" predicted motion is used otherwise.

The content of the speech in FIG. 8, such as whether it is a question, can be determined by having the prediction operation unit 3 apply a well-known speech content analysis method (speech analysis and text analysis) to the own user's speech acquired by the analysis unit 2.

<About Reaction Control Unit 4>
When a response signal from the actual listener (the other user) is obtained, the reaction control unit 4 controls the display of the other user's motion on the presentation unit 5 so that the predicted motion connects smoothly to the actual response motion. The response signal can be obtained as the response flag information of arrow AR11 described with FIG. 6.

This smooth connection is described first for the case where, as described for the prediction operation unit 3 with FIG. 8, multiple predicted motions are set according to the speech content.

When the response signal is received, two cases are distinguished: the predicted motion selected from the multiple candidates differs from the actual response motion (case 1: the prediction operation unit 3 guessed wrong), or the two are of the same kind (case 2: the prediction operation unit 3 guessed right).

Specifically, the distance between frames is calculated by equation (1) below for the motion over a fixed time from the present in the predicted motion played back on the presentation unit 5 under control of the prediction operation unit 3 (hereinafter, motion data M1) and the other user's actual response motion received from the other user's analysis unit (hereinafter, motion data M2), and the smallest distance is found. For motion data M2, a segment of predetermined length starting from the time the response signal was obtained may be used.

Here, the distance d(F_B^i, F_B^j) between a frame F_B^i of motion data M1 and a frame F_B^j of motion data M2 can be calculated by the following equation (1).

where q_{i,k} is the quaternion of the k-th joint in frame F_B^i and represents the pose as disclosed in Patent Document 3 cited above, and w_k is the weight for the k-th joint. The weights w_k are set in advance.
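Equation (1) itself is not reproduced in this text. As a sketch under that caveat, one common way to compare two poses given per-joint unit quaternions q_{i,k} and weights w_k is a weighted sum of squared rotation angles between corresponding joints. The function names and the specific angle metric below are assumptions for illustration, not the patent's formula.

```python
import math

def joint_angle(q1, q2):
    """Rotation angle (radians) between two unit quaternions (w, x, y, z).

    The absolute value makes q and -q, which encode the same rotation,
    compare as identical.
    """
    dot = abs(sum(a * b for a, b in zip(q1, q2)))
    return 2.0 * math.acos(min(1.0, dot))

def frame_distance(frame_i, frame_j, weights):
    """Weighted sum over joints k of the squared joint angle.

    frame_i, frame_j: lists of per-joint unit quaternions.
    weights:          per-joint weights w_k, set in advance.
    """
    return sum(w * joint_angle(qi, qj) ** 2
               for w, qi, qj in zip(weights, frame_i, frame_j))
```

Identical poses give a distance of zero, and a larger weight makes a joint contribute more to the comparison, which matches the role the text gives to w_k.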

If the minimum distance over all (i, j) combinations in equation (1) is greater than or equal to a fixed value, the situation is judged to be case 1. In case 1, motion data M1 continues the predicted motion for a while and, once it reaches a neutral pose, motion data M2 is played back starting from a neutral pose.

The neutral pose can likewise be defined in advance as predetermined joint quaternion values, so that the frames of motion data M1 and M2 that can be regarded as a neutral pose are each found as the frame at minimum distance, in the same way as with equation (1).

If the minimum distance over all (i, j) combinations in equation (1) is less than the fixed value, the situation is judged to be case 2. In case 2, the frame pair (i, j) with the smallest distance in equation (1) is chosen as the optimal branch point (connection point), and the frames over a fixed time centered on that branch point are subjected to motion blending (hereinafter, blending). Blending is a well-known technique, disclosed in Patent Document 3 cited above among others, for connecting different motions smoothly.
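The case decision and the choice of branch point can be sketched as a single exhaustive search over frame pairs. The function name `classify` and the return format are hypothetical; the distance function stands in for equation (1) and the threshold for the "fixed value".

```python
def classify(m1, m2, distance, threshold):
    """Return (case, (i, j), d_min) for motion data M1 and M2.

    case 1: minimum distance >= threshold (prediction was wrong).
    case 2: minimum distance <  threshold; (i, j) is the branch point.
    """
    d_min, i, j = min((distance(f1, f2), i, j)
                      for i, f1 in enumerate(m1)
                      for j, f2 in enumerate(m2))
    case = 1 if d_min >= threshold else 2
    return case, (i, j), d_min
```

In case 2 the returned pair (i, j) is where blending is centered; in case 1 the pair is not used, since M1 runs on to a neutral pose before M2 starts.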

FIG. 9 is a conceptual diagram for explaining the blending processing. For motion data M1 containing frame i and motion data M2 containing frame j, the blending processing generates interpolated data (blending data) MB1_2 by mixing the connecting portions of the two sets of motion data so that the transition between the motions does not look unnatural.

In one embodiment, the connecting portion can be interpolated by spherical linear interpolation of quaternions using frames over a fixed time. Specifically, as shown in FIG. 9, the blending data MB1_2 for the connection section joining motion data M1 and motion data M2 (section length m, where m is a predetermined value) is generated from data M1_m, the section of length m of motion data M1 centered on frame i, and data M2_m, the section of length m of motion data M2 centered on frame j. (Data M1_m and M2_m are not shown in FIG. 9.)

Here, frame i of data M1_m corresponding to distance u and frame j of data M2_m corresponding to distance u are mixed according to the ratio (u/m) of the distance u from the start of the connection section to the section length m. Specifically, each frame of the blending data MB1_2 is generated by equations (2) and (3) below. Equation (2) is written for a single bone.

Here, m is the total number (a predetermined value) of frames (blending frames) constituting the blending motion data MB1_2; u is the position of a blending frame counted from the first (1 ≤ u ≤ m); q(k, u) is the quaternion of the k-th bone in the u-th blending frame; q(k, i) is the quaternion of the k-th bone in frame i; and q(k, j) is the quaternion of the k-th bone in frame j. Blending is not applied to the root. Equation (3) is the formula for slerp (spherical linear interpolation).
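Equations (2) and (3) are not reproduced in this text. Reading the definitions above, each blended quaternion is presumably obtained by slerp between the k-th bone's quaternions in the two source frames with interpolation ratio u/m. The following is a minimal sketch under that assumption. It uses the standard sine-weighted form of slerp, which may differ in notation from equation (3). The function names, the (w, x, y, z) component order, and the representation of a frame as a list of non-root bone quaternions are choices made for the example. The root is simply left out of the frames, since the text says only that it is not blended.

```python
import math

def slerp(q1, q2, t):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q1, q2))
    if dot < 0.0:
        # q and -q encode the same rotation; flip to take the shorter arc.
        q2 = tuple(-c for c in q2)
        dot = -dot
    if dot > 0.9995:
        # Nearly identical rotations: normalized linear interpolation
        # avoids dividing by a sine that is close to zero.
        mixed = tuple(a + t * (b - a) for a, b in zip(q1, q2))
        norm = math.sqrt(sum(c * c for c in mixed))
        return tuple(c / norm for c in mixed)
    theta = math.acos(dot)
    w1 = math.sin((1.0 - t) * theta) / math.sin(theta)
    w2 = math.sin(t * theta) / math.sin(theta)
    return tuple(w1 * a + w2 * b for a, b in zip(q1, q2))

def blend(m1_window, m2_window):
    """Blend two windows of m frames each (stand-ins for M1_m and M2_m).
    Each frame is a list of non-root bone quaternions."""
    m = len(m1_window)
    if len(m2_window) != m:
        raise ValueError("both windows must contain m frames")
    blended = []
    for u in range(1, m + 1):
        ratio = u / m  # weight shifts from M1 toward M2 as u approaches m
        frame1, frame2 = m1_window[u - 1], m2_window[u - 1]
        blended.append([slerp(a, b, ratio) for a, b in zip(frame1, frame2)])
    return blended

# Usage: one bone, blending from the identity toward a 90-degree
# rotation about the z axis over m = 2 frames.
identity = (1.0, 0.0, 0.0, 0.0)
qz90 = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
mb = blend([[identity], [identity]], [[qz90], [qz90]])
```

With the ratio u/m, the last blending frame (u = m) coincides with the corresponding frame of the second motion, so the blended section hands over to the second motion without a jump.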

The blending data MB1_2 serves as the data for the connecting portion between motion data M1 and motion data M2. The data finally played back thus consists of M1_[front], MB1_2, and M2_[rear] in FIG. 9.

Supplementary matters concerning the present invention are described below.

(1) Audio data sent from another user may be played back on the user's own presentation unit 5 simultaneously with the display of the motion data sent from that other user for the same time. Even while the predictive motion unit 3 and the reaction control unit 4 are controlling the display of the other user's motion, the audio data being received at that moment may always be played back independently of that control. Alternatively, a predetermined predicted sound may be played back, in the same way that a predetermined predictive motion is interposed in the motion display.

(2) The present invention has been described using an avatar remote conference as an example, but it is applicable to any remote communication in which the unnaturalness caused by delay in displaying another user's response to the user's own utterance needs to be eliminated. That is, the invention is applicable not only to business and other meetings but also to communication between teachers and students in remote education over a network, to games played among remotely located participants, and so on.

(3) Likewise, the present invention is applicable not only to avatar display based on animation but also to cases where users are shown as live-action video. In that case, only while the predictive motion unit 3 and the reaction control unit 4 are performing interrupt control of the display on the presentation unit 5, a "live-action avatar" of the other user is displayed using the motion information used for avatar display; at all other times, the other user's live-action video is displayed. For switching to and from the interrupt display, motion information as an avatar is associated with the live-action video as well, so that the various switching processes using equation (1) described above can be carried out.

Thus, for example, when relaying from a remote location during a live broadcast, displaying this "live-action avatar" when the remote party is shown on screen may eliminate the unnaturalness that arises when a question is put to that party.

When live-action display is used, image information is transmitted from the other party, and receiving that image information means that the other party's motion data is also being received. For the interrupt display, motion data for animating the avatar must be used in a form associated with the image information. The analysis unit 2 therefore performs this association before transmitting the user's own motion data to the other user's device.

(4) In the avatar remote conference example, the presentation unit 5 always displays the user's own avatar in motion in real time, but displaying the user on screen may be omitted depending on the application of the invention. For example, in the on-screen display of the remote party in the live broadcast described above, the user need not be displayed.

(5) The "remote" communication realized by the remote communication device 1 does not depend on the physical distance involved. The remote communication device 1 of the present invention can be used in any case where audio and motion data are exchanged bidirectionally between devices via a network. For example, it can be used to mediate communication between users even when the physical distance is quite small, as when adjacent rooms in the same building communicate over a LAN.

(6) The predictive motion data used by the predictive motion unit 3 is preferably prepared so that a predetermined unit motion can be played back repeatedly. For a "nodding" motion, for example, it is preferable to prepare data in which one or several nods form the unit motion and can be repeated smoothly.

Similarly, predictive motion data may be prepared spanning a predetermined maximum length for which the user's utterance is expected to continue. In that case, it is preferable that a pose judged to be neutral appears periodically in the predictive motion data.
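One reason periodic neutral poses are useful is that playback of a looping unit motion can be handed back to the received motion at the next neutral pose. The sketch below illustrates such a lookup. It is an illustration only: the neutrality test (Euclidean distance to a reference pose within a tolerance) and the names `is_neutral`, `next_neutral_frame`, `neutral_pose`, and `tol` are assumptions, since the criterion for judging a pose neutral is not given in this passage.

```python
import math

def is_neutral(pose, neutral_pose, tol):
    # Assumed criterion: the pose lies within `tol` of a reference pose.
    return math.dist(pose, neutral_pose) <= tol

def next_neutral_frame(unit_motion, start, neutral_pose, tol):
    """Return how many frames after `start` the looping unit motion next
    reaches a neutral pose (0 if it is neutral at `start`), or None if
    the unit motion never contains a neutral pose."""
    n = len(unit_motion)
    for step in range(n):
        index = (start + step) % n  # wrap around: the unit motion repeats
        if is_neutral(unit_motion[index], neutral_pose, tol):
            return step
    return None

# Usage: a one-dimensional "nod" that starts and ends at the neutral pose.
nod = [[0.0], [0.5], [1.0], [0.5]]
wait = next_neutral_frame(nod, start=1, neutral_pose=[0.0], tol=0.1)
```

Here `wait` is the number of frames to keep playing the predictive motion before a switch at a neutral pose becomes possible.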

(7) The present invention can also be provided as a program that causes a computer to function as the remote communication device 1. The computer can be built from well-known hardware such as a CPU (central processing unit), memory, and various interfaces, and the CPU that loads and executes the program functions as each unit of the remote communication device 1.

1…remote communication device, 2…analysis unit, 3…predictive motion unit, 4…reaction control unit, 5…presentation unit

Claims

A remote communication device that mediates communication between a user and another user by bidirectionally exchanging voice and motion data with the other user's device, comprising:
an analysis unit that acquires voice and motion from the user as data;
a presentation unit that displays the motion of the other user according to motion data transmitted from the other user's device;
a predictive motion unit that determines, based on the data acquired by the analysis unit, whether the user has started speaking and, when it determines that the user has started speaking, performs control to replace the display of the other user's motion on the presentation unit with a predetermined predictive motion; and
a reaction control unit that, after the replacement with the predetermined predictive motion, performs control to return the display of the other user's motion on the presentation unit to a display according to the motion data transmitted from the other user's device.

The remote communication device according to claim 1, wherein a plurality of types of the predetermined predictive motion are prepared in advance,
and the predictive motion unit, upon making the determination, selects from the plurality of types a predetermined predictive motion according to the content of the user's utterance determined to have started, and then performs control to replace the display of the other user's motion on the presentation unit with the selected predetermined predictive motion.

The remote communication device according to claim 1, wherein the predetermined predictive motion is prepared in advance for each other user,
and the predictive motion unit, upon making the determination, performs control to replace the display of another user's motion on the presentation unit with the predetermined predictive motion corresponding to that other user.

The remote communication device according to claim 1, wherein the predictive motion unit, upon making the determination, performs control to replace the display of the other user's motion on the presentation unit with the predetermined predictive motion such that playback of the predetermined predictive motion starts from the state determined to be closest to the posture of the other user displayed by the presentation unit at the time of the determination.

The remote communication device according to claim 1, wherein the reaction control unit, from the time point at which it receives motion data transmitted from the other user's device that is associated with response information indicating that the other user has responded to the start of the user's utterance, returns the display of the other user's motion on the presentation unit to a display according to the motion data transmitted from the other user's device.

The remote communication device according to claim 5, wherein the reaction control unit compares each frame of the predictive motion data for a fixed period that would be displayed under the control of the predictive motion unit after the time point at which the response information is received with each frame of the motion data received from the other user's device for a fixed period after that time point, identifies the pair of frames closest to each other, and uses the identified frames as a connection point when performing the control to switch the display.

The remote communication device according to claim 6, wherein the reaction control unit performs motion blending, centered on the connection point, of the predictive motion data for the fixed period and the motion data received from the other user's device for the fixed period after the time point.

The remote communication device according to claim 5, wherein the reaction control unit identifies, in the predictive motion data that would be displayed under the control of the predictive motion unit after the time point at which the response information is received, a time point at which the posture is determined to be neutral, and from the identified time point returns the display of the other user's motion on the presentation unit to a display according to the motion data transmitted from the other user's device.

The remote communication device according to claim 1, wherein the presentation unit displays the motion of the other user as an avatar according to the motion data transmitted from the other user's device.

A program for causing a computer to function as the remote communication device according to any one of claims 1 to 9.