JP2003108502A

JP2003108502A - Physical media communication system

Info

Publication number: JP2003108502A
Application number: JP2001302341A
Authority: JP
Inventors: Tomio Watanabe; 富夫渡辺; Hiromoto Ogawa; 浩基小川
Original assignee: INTERROBOT Inc
Current assignee: INTERROBOT Inc
Priority date: 2001-09-28
Filing date: 2001-09-28
Publication date: 2003-04-11

Abstract

PROBLEM TO BE SOLVED: To provide a communication system that allows closer communication when a person and a partner separated in time or distance have a conversation. SOLUTION: In this physical media communication system, the person's and the partner's communication terminals 101 and 102 exchange the person's and the partner's voices while displaying the person's pseudo-personality from behind in the foreground of a virtual space screen 3 and simultaneously displaying the partner's pseudo-personality from the front in the background of the virtual space screen 3. The person's pseudo-personality is moved according to a speaker body motion based on the person's voice, or a listener body motion based on the partner's voice input from a voice input means 5 via a telecommunication line 4. The partner's pseudo-personality is moved according to a speaker body motion based on the partner's voice received via the telecommunication line 4, or a listener body motion based on the person's voice input from the voice input means 5, and the partner's voice is output.

Description

Detailed Description of the Invention

[0001]

Background of the Invention

1. Field of the Invention

The present invention relates to a communication system used for intimate communication between a person and a partner who are separated by distance, using communication terminals connected via a communication line. Here, "the person" and "the partner" are relative designations. The numerical relationship between the person and the partner is basically one-to-one, but one-to-many, many-to-one, and many-to-many cases are also included.

[0002]

2. Description of the Related Art

For example, communication systems called "chat" are recognized as a simple means of communication over computer networks and have spread in proportion to Internet use. In a basic chat communication system on the Internet, a chat application for sending and receiving is installed on each individual's computer, which acts as a client, and text or voice is exchanged in real time with partners to whom the person has permitted a connection.

[0003] In recent years, chat communication systems have appeared that display the person or the partner as an avatar (pseudo-personality) on each other's displays so that the exchange feels closer to an actual conversation. At first these merely attached speech balloons to motionless characters, but recently some make the avatar perform body motions (nodding, blinking, opening and closing the mouth, gesturing, and so on) according to the text or voice being exchanged. Examples include Japanese Patent Laid-Open No. 2001-160021, "Communication system in virtual space," and Japanese Patent Laid-Open No. 2001-230801, "Communication system and its method, communication service server and communication terminal device."

[0004] Japanese Patent Laid-Open No. 2001-160021 aims to provide a communication system in a virtual space that can accurately convey the emotions of a message's sender to the other user with a sense of presence, as in an actual conversation. Specifically, when the avatar management server detects a character string such as "Hello" in a chat message from the speaking user, it makes the listening user's avatar bow; a user whose typing speed on the client exceeds a predetermined value is judged to be in an excited conversational state, and a glow is displayed around the avatar; and when a predetermined time has elapsed since operation of the client stopped, the user is judged to be bored and the avatar is made to yawn. In this way, users can check each other's emotions and mental and physical activity on the display with a sense of presence, and can converse while understanding each other's emotions and physical state.

[0005] Japanese Patent Laid-Open No. 2001-230801 aims to provide a communication system that, in a chat environment, realizes lively dialogue expressing the individuality and character of the participants, and also offers services such as translation and the provision of related information. Specifically, an input message is delivered to each terminal via a message receiving unit and a message transmitting unit, and an avatar motion control unit controls the avatar's movement based on an input motion instruction signal or message, referring to the personality information stored in an avatar information storage unit and to a personality-specific action database. In addition, a keyword extraction unit extracts keywords from the dialogue, an advertisement processing unit acquires related advertisement information and delivers it to each terminal, and, as needed, a translation processing unit obtains a translation of the message and sends the translation result to the terminal.

[0006]

Problems to Be Solved by the Invention

As computer performance has improved, chat communication systems have moved from the mere exchange of text or voice to the use of avatars as described above, and further to giving those avatars body motions. This rests on an intuition many people share: conversation is not a mere exchange of text or voice, and text or voice together with body motion makes a conversation feel more real. However, body motions entirely unrelated to the text or voice would instead impair the sense of reality, so the choice of body motions is important.

[0007] Viewed in terms of the relationship between text or voice and body motion, Japanese Patent Laid-Open No. 2001-160021 tries to associate the operations of the person or the partner with the avatar's body motions: (1) it maps specific text to specific body motions, (2) it infers the excitement of the person or the partner from the speed at which text is typed, and (3) it judges boredom of the person or the partner from the time since operation stopped and makes the avatar yawn. Japanese Patent Laid-Open No. 2001-230801 controls movement in an avatar motion control unit by referring to the personality information stored in the avatar information storage unit and to the personality-specific action database, and this too can be said to associate the operations of the person or the partner with the avatar's body motions. Given that speech recognition technology is still imperfect, understanding the meaning of a conversation from text or voice and selecting appropriate body motions remains difficult, so the means adopted in these prior arts can be regarded as reasonable in scope.

[0008] The sense of unity or reality between the person and the partner in a chat communication system is brought about by the "physical pull-in phenomenon" (entrainment; hereinafter, the pull-in phenomenon), which arises as the conversational rhythms of the person and the partner become synchronized. That is, when the conversational rhythms synchronize, the pull-in phenomenon appears, and once it appears, the rhythms synchronize further. In this state, the person and the partner each find the content of the conversation easier to understand, and this understanding also contributes to the sense of unity and reality. The issue is therefore how to bring about the pull-in phenomenon, that is, how to strengthen the connection between body motion and the pull-in phenomenon. The inventors therefore studied a communication system, for example a chat communication system, that enables more intimate communication when a person and a partner separated in time or distance converse, focusing on the selection of the body motions given to avatars.

[0009]

Means for Solving the Problems

What was developed as a result of this study is a physical media communication system: a communication system used for intimate communication between a person and a partner who are separated by distance, using communication terminals connected via a communication line. The person's and the partner's communication terminals exchange the person's and the partner's voices while displaying the person's pseudo-personality from behind in the foreground of a virtual space screen and simultaneously displaying the partner's pseudo-personality from the front in the background of the virtual space screen. The person's pseudo-personality is moved according to a speaker body motion based on the person's voice, or a listener body motion based on the partner's voice input from the voice input means via the communication line. The partner's pseudo-personality is moved according to a speaker body motion based on the partner's voice received via the communication line, or a listener body motion based on the person's voice input from the voice input means, and the partner's voice is output.

[0010] The communication terminal may be a dedicated device or a computer. In the latter case, the computer serves as the voice input/output means and as the means for generating and controlling the pseudo-personality's body motions, and the computers are connected to one another over a communication line such as a local network, a telephone line, or the Internet. Here, the pseudo-personality in the present invention means an avatar displayed on a screen such as the computer's display, and the display or the like serves as the virtual space screen. The pseudo-personality could also be embodied as a robot connected to the computer, but considering cases in which several people chat at once, a pseudo-personality shown on a display is preferable because the total number of pseudo-personalities can easily be changed.

[0011] The communication system of the present invention uses the pull-in phenomenon described above to realize more intimate communication between the person and the partner. As a first means of bringing about the pull-in phenomenon, (A) the person's pseudo-personality is displayed from behind in the foreground of the virtual space screen, and the partner's pseudo-personality is simultaneously displayed from the front in the background, giving the person and the partner the sense that they virtually share a space. That is, the virtual space screen displays the person's and the partner's pseudo-personalities facing each other. Displaying the partner's pseudo-personality from the front in the background has been done before; the present invention differs in displaying the person's own pseudo-personality from behind in the foreground. Letting the person view objectively the pseudo-personality that represents him or her encourages the person to be drawn into the virtual space screen as a whole. In addition, because the person visually perceives the body motions of both pseudo-personalities together and how they relate, the pull-in phenomenon is brought about more effectively overall.

[0012] Specifically, each of the person's and the partner's communication terminals comprises a pseudo-personality control unit that generates the body motions of the person's and the partner's pseudo-personalities from voice, a transmission/reception processing unit that exchanges data or voice between the person's and the partner's communication terminals, and a voice input/output unit for the person's or the partner's communication terminal. The person's pseudo-personality on the virtual space screen is moved according to a speaker body motion that the pseudo-personality control unit generates from the person's voice input from the voice input unit of the person's terminal, or a listener body motion that it generates from the partner's voice input from the reception processing unit of the person's terminal, and the transmission processing unit sends the person's voice to the reception processing unit of the partner's terminal. The partner's pseudo-personality on the virtual space screen is moved according to a speaker body motion that the pseudo-personality control unit generates from the partner's voice input from the reception processing unit of the person's terminal, or a listener body motion that it generates from the person's voice input from the voice input unit of the person's terminal, and the voice output unit outputs the partner's voice.

[0013] In the example above in which a computer serves as the communication terminal, the pseudo-personality control unit is the computer itself, the transmission/reception processing unit is the computer's communication interface, and the voice input and output units are respectively a microphone and a speaker connected to or built into the computer. The communication terminal of the present invention can thus easily be built from a computer, and is basically provided as a program that makes the computer perform specific processing. Because chat applications for conventional chat communication systems are likewise provided as programs, the communication system of the present invention uses the same form of distribution.

[0014] The body motions of the two pseudo-personalities are the essence of bringing about the pull-in phenomenon. As a second means of bringing it about, (B) the case where each pseudo-personality behaves as a speaker is distinguished from the case where it behaves as a listener, and speaker body motions and listener body motions are tied to the person's or the partner's voice, so that the body motions of the person's and the partner's pseudo-personalities are related to the voices in the conversation. For example, the person's pseudo-personality behaves as a speaker when it follows the person's voice and as a listener when it follows the partner's voice; at the same time, the partner's pseudo-personality behaves as a listener when it follows the person's voice and as a speaker when it follows the partner's voice. When a speaker body motion and a listener body motion are generated for the same pseudo-personality at the same time, the pseudo-personality may be moved according to a preset priority, or the body motions may simply be combined, either by blending them or by selecting among them exclusively. Even in real human conversation, few body motions are clearly tied to speech, so slight differences in nuance and combinations of body motions actually help make conversation through the communication system feel real.
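The role assignment and conflict handling described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the names `Role`, `roles_for`, and `resolve`, the string labels for avatars and motions, and the default priority are all assumptions made for illustration.

```python
from enum import Enum


class Role(Enum):
    SPEAKER = "speaker"
    LISTENER = "listener"


def roles_for(voice_source: str) -> dict[str, Role]:
    """Return the role each avatar plays for one voice stream.

    voice_source is "self" (voice from the local voice input) or
    "partner" (voice received over the communication line).
    """
    if voice_source == "self":
        return {"self_avatar": Role.SPEAKER, "partner_avatar": Role.LISTENER}
    if voice_source == "partner":
        return {"self_avatar": Role.LISTENER, "partner_avatar": Role.SPEAKER}
    raise ValueError(f"unknown voice source: {voice_source!r}")


def resolve(speaker_motions: list[str],
            listener_motions: list[str],
            priority: Role = Role.SPEAKER,  # assumed default, not from the source
            combine: bool = False) -> list[str]:
    """Decide what one avatar does when both kinds of motion arrive together."""
    if combine:
        # Simple combination: keep every motion once, in order of arrival.
        return list(dict.fromkeys(speaker_motions + listener_motions))
    if not speaker_motions:
        return list(listener_motions)
    if not listener_motions:
        return list(speaker_motions)
    # Both are present: follow the preset priority.
    return list(speaker_motions if priority is Role.SPEAKER else listener_motions)
```

With these assumptions, `resolve(["head_swing"], ["nod"])` keeps only the speaker motion, while passing `combine=True` plays both.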

[0015] The person's and the partner's communication terminals may additionally be provided with a specific motion control unit that generates body motions of the person's and the partner's pseudo-personalities based on designations, a body motion synthesis unit that synthesizes the body motions generated by the pseudo-personality control unit and the specific motion control unit, and a designated motion input unit for the person's or the partner's communication terminal. In that case, a speaker body motion that the specific motion control unit generates from the person's designation input from the designated motion input unit of the person's terminal, or a listener body motion that it generates from the partner's designation input from the reception processing unit of the person's terminal, is synthesized by the body motion synthesis unit with the speaker or listener body motion generated by the pseudo-personality control unit to move the person's pseudo-personality on the virtual space screen. Likewise, a speaker body motion that the specific motion control unit generates from the partner's designation input from the reception processing unit of the person's terminal, or a listener body motion that it generates from the person's designation input from the designated motion input unit of the person's terminal, is synthesized by the body motion synthesis unit with the speaker or listener body motion generated by the pseudo-personality control unit to move the partner's pseudo-personality on the virtual space screen.

[0016] Although few body motions are clearly tied to speech, some specific utterances (for example, "goodbye") do correspond to a body motion (for example, waving a hand). For body motions with such a clear correspondence, it is better to move the pseudo-personality by an explicit motion designation from the person or the partner rather than by voice. In the example above, the specific motion control unit and the body motion synthesis unit can be the computer itself, and the designated motion input unit can be an existing input device commonly found on computers, such as a keyboard, mouse, or joystick. For example, if a particular key on the keyboard is assigned the meaning "goodbye" and the person presses it, the person's pseudo-personality waves on the virtual space screen of the person's terminal, and the partner's pseudo-personality waves on the virtual space screen of the partner's terminal (from the partner's point of view, the person is the partner). When the body motion synthesis unit synthesizes the body motions generated by the pseudo-personality control unit and the specific motion control unit, it may move the pseudo-personality according to only one of the body motions, chosen by a preset priority, or it may simply combine the body motions.

[0017] The body motions of a pseudo-personality that bring about the pull-in phenomenon differ depending on whether the pseudo-personality behaves as a listener or as a speaker. On the view that body motions synchronized with the conversational rhythm matter most for bringing about the pull-in phenomenon, the inventors worked out an algorithm that determines nod timing, and constructed the listener and speaker body motions on the basis that the other body motions follow a similar algorithm or an algorithm driven by the nodding motion. First, (C) the listener body motion generated by the pseudo-personality control unit consists of a selective combination of a nodding motion of the head, a blinking motion of the eyes, and a gesturing motion of the body: (a) the nodding motion is executed at a nod timing at which a nod prediction value estimated from the ON/OFF of the voice exceeds a nod threshold; (b) the blinking motion is executed at blink timings distributed exponentially over time starting from the nod timing; and (c) the gesturing motion is executed at a gesture timing at which a gesture prediction value estimated from the ON/OFF of the voice exceeds a gesture threshold. Other body motions are not excluded, but combinations of these make up the bulk of listener body motion. A gesturing motion is a movement of the whole body (in practice, the upper body), a so-called body action, such as moving the arms or twisting the waist.

[0018] (a) The nodding motion is executed at a nod timing at which the nod prediction value estimated from the ON/OFF of the voice exceeds the nod threshold. The nod prediction value is a time-varying variable calculated from a prediction model, such as an MA (moving-average) model or a neural network model, that relates the voice linearly or nonlinearly to the nodding or gesturing motion. In the present invention, the nod timing is the point at which the nod prediction value, which can be calculated continuously from the start of voice input, exceeds a predetermined nod threshold. Specifically, for example, the voice is treated as an electrical signal that switches ON and OFF over time, and the nod prediction value obtained from this ON/OFF sequence (for example, the voltage of the signal judged to be ON) is compared with the nod threshold to derive the nod timing. Transmitting and processing the voice simply as an ON/OFF electrical signal in this way, without analyzing its meaning, keeps the computation at each communication terminal small, with the advantage that even a personally owned computer can generate body motions in real time. When the voice is treated as an ON/OFF electrical signal, prosody and intonation, which reflect how the signal changes over time, may also be taken into account.
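The nod-timing idea can be sketched in Python as a moving-average (MA) prediction over a binarized voice signal. This is only an illustration under stated assumptions: the frame-based representation, the ON level, the MA weights, and the function names `binarize`, `ma_prediction`, and `nod_timings` are invented here, and taking the nod timing as the frame where the prediction first rises above the threshold is one possible reading of "exceeds the threshold". The source gives no model coefficients.

```python
def binarize(levels: list[float], on_level: float) -> list[int]:
    """Turn per-frame voice levels into an ON/OFF (1/0) sequence."""
    return [1 if v >= on_level else 0 for v in levels]


def ma_prediction(on_off: list[int], weights: list[float]) -> list[float]:
    """Moving-average model: pred[i] = sum over j of weights[j] * on_off[i - j]."""
    preds = []
    for i in range(len(on_off)):
        total = 0.0
        for j, w in enumerate(weights):
            if i - j >= 0:
                total += w * on_off[i - j]
        preds.append(total)
    return preds


def nod_timings(preds: list[float], threshold: float) -> list[int]:
    """Return the frame indices where the prediction rises above the threshold."""
    timings = []
    above = False
    for i, p in enumerate(preds):
        if p > threshold and not above:
            timings.append(i)  # the nod starts at the upward crossing
        above = p > threshold
    return timings


# Illustrative values only; none of these numbers come from the source.
levels = [0.0, 0.8, 0.9, 0.7, 0.0, 0.0, 0.6, 0.1]
on_off = binarize(levels, on_level=0.5)
preds = ma_prediction(on_off, weights=[0.5, 0.3, 0.2])
nods = nod_timings(preds, threshold=0.7)
```

Because each frame costs only a handful of multiplications and no speech recognition is involved, a loop like this is cheap enough to run in real time, which matches the low-computation advantage the text describes.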

[0019] (b) The blinking motion is executed at blink timings distributed exponentially over time starting from the nod timing. The algorithm that determines the blink timing thus operates off the algorithm that determines the nod timing. Linking the nodding and blinking motions in this way promotes the pull-in phenomenon. (c) The gesturing motion is executed at a gesture timing at which the gesture prediction value estimated from the ON/OFF of the voice exceeds the gesture threshold. The algorithm that determines the gesture timing is similar to the one that determines the nod timing. Because it is desirable for the gesturing motion to occur more often than the nodding motion, the gesture prediction value that is calculated or the gesture threshold that is preset should be set relatively lower than for the nod timing, so that gesture timings are determined more frequently than nod timings. The blink prediction value may double as the gesture prediction value, and the blink threshold may double as the gesture threshold.
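One way to read "blink timings distributed exponentially starting from the nod timing" is to draw each blink delay from an exponential distribution measured from a nod. The Python sketch below takes that reading; the mean delay, the use of seconds as the time unit, one blink per nod, and the function name `blink_timings` are assumptions for illustration, not values or names from the source.

```python
import random


def blink_timings(nod_times: list[float],
                  mean_delay: float,
                  rng: random.Random) -> list[float]:
    """Schedule one blink after each nod, delayed by an exponential random time.

    nod_times are nod start times in seconds; mean_delay is the mean of the
    exponential delay (an assumed tuning parameter).
    """
    if mean_delay <= 0:
        raise ValueError("mean_delay must be positive")
    # random.Random.expovariate takes the rate, which is 1 / mean.
    return [t + rng.expovariate(1.0 / mean_delay) for t in nod_times]


rng = random.Random(0)  # seeded so that a run is repeatable
blinks = blink_timings([1.0, 4.0], mean_delay=0.5, rng=rng)
```

Tying the blink schedule to nod times rather than to the voice signal directly keeps the two motions linked, which is the relationship the paragraph emphasizes.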

[0020] In contrast to the listener body motion, (D) the speaker body motion generated by the pseudo-personality control unit consists of a selective combination of a head swinging motion, a mouth opening and closing motion, an eye blinking motion, and a body gesturing motion. Whereas the nodding motion moves the head back and forth, the head swinging motion moves the head in any direction, which relatively speaking amounts to side-to-side movement. The mouth opening and closing motion may be driven in proportion to the loudness of the voice, as in conventional systems. What matters in the present invention is the pseudo-personality's body motion that brings about the pull-in phenomenon, so there is no need to force the mouth to open and close in exact agreement with the words spoken. Accordingly, among the speaker body motions other than the mouth opening and closing motion: (a) the head swinging motion is executed at a swing timing at which a swing prediction value estimated from the ON/OFF of the voice exceeds a swing threshold; (b) the blinking motion is executed at a blink timing at which a blink prediction value estimated from the ON/OFF of the voice exceeds a blink threshold; and (c) the gesturing motion is executed at a gesture timing at which a gesture prediction value estimated from the ON/OFF of the voice exceeds a gesture threshold.

【００２１】振り予測値、瞬き予測値及び身振り予測値
は、上述の聞き手身体動作における瞬き動作タイミング
の決定アルゴリズム同様のアルゴリズムに従って算出す
る。また、振り閾値、瞬き閾値及び身振り閾値も、上述
の聞き手身体動作における瞬き動作タイミングの決定に
用いる瞬き閾値と同様の意味である。ここで、聞き手身
体動作の身振り予測値及び身振り閾値と、話し手身体動
作の身振り予測値及び身振り閾値とは、同じでも違って
もよい。また、話し手身体動作は、頭の振り動作、瞬き
動作及び身振り動作は類似のアルゴリズムを用いるた
め、すべての予測値又は閾値を同一にして兼用してもよ
い。The swing predicted value, the blink predicted value, and the gesture predicted value are calculated by an algorithm similar to the one used to determine the blinking motion timing in the listener's body motion described above. Likewise, the swing threshold, blink threshold, and gesture threshold have the same meaning as the blink threshold used to determine the blinking motion timing in the listener's body motion. The gesture predicted value and gesture threshold of the listener's body motion may be the same as or different from those of the speaker's body motion. Furthermore, because the head swinging, blinking, and gesturing motions of the speaker's body motion use similar algorithms, a single predicted value or threshold may be shared among all of them.

【００２２】[0022]

【発明の実施の形態】以下、本発明の実施形態について
図を参照しながら説明する。図１は本発明を適用したチ
ャット通信システムのシステム構成図、図２は利用者Ａ
の通信端末１のディスプレイ２の正面図(仮想空間画面
３)、図３は利用者Ｃの通信端末１のディスプレイ２の
正面図(仮想空間画面３)、図４は利用者Ａ及びＢの各通
信端末１,１を対照表示して各部の作動を表した通信端
末構成図、図５は各通信端末１の疑似人格の聞き手制御
を表す制御フローチャートであり、図６は各通信端末１
の疑似人格の話し手制御を表す制御フローチャートであ
る。本例は、本発明の適用により親密なコミュニケーシ
ョンを図る効果が最も顕著に現れるチャット通信システ
ムに関する。BEST MODE FOR CARRYING OUT THE INVENTION Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a system configuration diagram of a chat communication system to which the present invention is applied. FIG. 2 is a front view of the display 2 of the communication terminal 1 of user A (virtual space screen 3). FIG. 3 is a front view of the display 2 of the communication terminal 1 of user C (virtual space screen 3). FIG. 4 is a communication terminal configuration diagram that shows the communication terminals 1, 1 of users A and B side by side to illustrate the operation of each part. FIG. 5 is a control flowchart of the listener control of the pseudo-personality in each communication terminal 1, and FIG. 6 is a control flowchart of the speaker control of the pseudo-personality in each communication terminal 1. This example concerns a chat communication system, where the effect of the present invention in fostering intimate communication is most pronounced.

【００２３】本発明によるチャット通信システムの概要
は、図１に見られるように既存のインターネット４を利
用した各種通信システムと変わりはない。インターネッ
ト４を介して相互に接続した各通信端末１,１,１,…
を、利用者Ａ,Ｂ,Ｃ,…が利用して、図２又は図３に見
られるディスプレイ２に表示した仮想空間画面３を見な
がら音声による会話を楽しむ。図１に示すシステム構成
では、通信端末１相互がインターネット４を介して直接
接続する例を図示している。しかし、チャット通信シス
テムとしての規模が大きくなる場合、各通信端末を仲介
するサーバを設け、通信端末相互がやりとりする音声の
経路特定を図ったり、場合によっては後述するような各
通信端末による疑似人格の身体動作生成をサーバに集約
して担わせ、サーバから各通信端末へ疑似人格の制御デ
ータを送信するようにしてもよい。As shown in FIG. 1, the chat communication system according to the present invention is, in outline, no different from the various communication systems that use the existing Internet 4. Users A, B, C, ... use the communication terminals 1, 1, 1, ... interconnected via the Internet 4 and enjoy voice conversation while watching the virtual space screen 3 shown on the display 2, as seen in FIG. 2 or FIG. 3. The system configuration in FIG. 1 illustrates an example in which the communication terminals 1 connect to each other directly via the Internet 4. When the chat communication system grows in scale, however, a server that mediates between the communication terminals may be provided to route the voice exchanged between terminals. In some cases the generation of pseudo-personality body motions, which each communication terminal performs as described later, may instead be consolidated on the server, which then transmits pseudo-personality control data to each communication terminal.

【００２４】本発明における従来チャット通信システム
との違いは、まず各通信端末１のディスプレイ２に表示
する仮想空間画面３に現れる。図１中、利用者Ａ,Ｂ及
びＣが互いにチャット通信している場合、利用者Ａの通
信端末１では、図２に見られるように、自分を表す本人
疑似人格ａを背面表示かつ相対的拡大表示し、相手(利
用者Ｂ,Ｃ)を表す相手疑似人格ｂ及びｃを正面表示かつ
相対的縮小表示する。これは、遠近手法を利用した仮想
空間画面３の奥行き表現であり、本人又は相手に対して
コミュニケーション空間を感得させる手段である。本例
では、更に相手疑似人格ｂ及びｃの間でも大きさを変え
ている。これは、本人疑似人格ａに対する相手疑似人格
ｂ及びｃの大きさを違えることにより、例えばチャット
通信に参加した順や、利用者Ａの親密度に基づく違いを
表現する。The present invention differs from conventional chat communication systems first of all in the virtual space screen 3 shown on the display 2 of each communication terminal 1. When users A, B, and C in FIG. 1 are chatting with one another, the communication terminal 1 of user A displays, as shown in FIG. 2, the own pseudo-personality a representing user A from behind and relatively enlarged, and displays the partner pseudo-personalities b and c representing the partners (users B and C) from the front and relatively reduced. This expresses the depth of the virtual space screen 3 by means of perspective and lets the user and the partners perceive a shared communication space. In this example, the partner pseudo-personalities b and c also differ in size from each other. Varying the sizes of the partner pseudo-personalities b and c relative to the own pseudo-personality a can express, for example, the order in which the users joined the chat or how close each is to user A.

【００２５】各通信端末１は、図４に見られるように、
音声に基礎を置くチャット通信システムであることから
音声入出力部５,６を設けているほか、本例では疑似人
格に指定動作を実行させるための指定動作入力部７を設
けている。本例では、疑似人格制御部８、特定動作制御
部９、身体動作合成部10及び送受信処理部11,12を備え
たコンピュータ13を中核として、仮想空間画面３を表示
するディスプレイ２、音声入力部５としてコンピュータ
13に接続したマイク、指定動作入力部７としてコンピュ
ータ13に接続したキーボード又はマウス、そして音声出
力部６としてコンピュータ13に接続したスピーカから構
成する全体を、通信端末としている。As shown in FIG. 4, each communication terminal 1 is provided with voice input and output units 5 and 6, since the chat communication system is based on voice, and in this example also with a designated action input unit 7 for making a pseudo-personality perform a designated action. In this example the communication terminal as a whole consists of: a computer 13 at its core, comprising a pseudo-personality control unit 8, a specific action control unit 9, a body motion synthesis unit 10, and transmission and reception processing units 11 and 12; the display 2, which shows the virtual space screen 3; a microphone connected to the computer 13 as the voice input unit 5; a keyboard or mouse connected to the computer 13 as the designated action input unit 7; and a speaker connected to the computer 13 as the voice output unit 6.

【００２６】コンピュータ13は、疑似人格制御部８、特
定動作制御部９、身体動作合成部10を、コンピュータ付
属の記憶装置(図示略)に記憶させて用いるソフトウェア
によりCPU等(図示略)を制御して機能的に構成する。こ
れら制御部８,９及び合成部10は、専用IC等を用いてハ
ード的な処理を担わせることができるが、機能の拡張又
は修正といったメンテナンスの観点から、汎用のコンピ
ュータ13をソフトウェア的に制御して構成する方がよ
い。また、コンピュータ13に接続又は内蔵するモデム、
TA又はネットワーク機器等が送受信処理部11,12を構成
する。マウス、キーボード又はスピーカはコンピュータ
が内蔵している場合(コンピュータがノート型の場合)
や、マウスに代えて各種入力デバイス(トラックパッ
ド、トラックポイント等)を用いることもある。The computer 13 functionally implements the pseudo-personality control unit 8, the specific action control unit 9, and the body motion synthesis unit 10 by having software, stored in a storage device attached to the computer (not shown), control the CPU and related hardware (not shown). The control units 8 and 9 and the synthesis unit 10 could be implemented in hardware using dedicated ICs or the like, but implementing them in software on a general-purpose computer 13 is preferable from the standpoint of maintenance, such as extending or modifying functions. A modem, TA, or network device connected to or built into the computer 13 constitutes the transmission and reception processing units 11 and 12. The mouse, keyboard, or speaker may be built into the computer (as in a notebook computer), and various other input devices (trackpad, TrackPoint, etc.) may be used in place of the mouse.

【００２７】本発明は、本人又は相手の音声を基礎にし
て、各通信端末１の仮想空間画面３に表示する疑似人格
を話し手又は聞き手として振る舞わせ、仮想空間画面３
を見ている本人及び相手を会話リズムに引込む引込現象
を発現させる。この疑似人格は、簡素な３次元モデルで
表現する人体で構わない。本発明では、本人及び相手疑
似人格の対面表示と、各疑似人格の身体動作及び特定動
作が重要であり、疑似人格自体の画像のリアルさは特に
問わないからである。また、疑似人格の顔の表情又は顔
色を変化させる方が好ましいが、本発明に必須の要素で
はない。これから、チャット通信システムとして本発明
を利用する場合、疑似人格は動物や無生物を模した３次
元又は２次元モデルであってもよい。リアルタイムな身
体動作の生成、実行を鑑みた場合は簡素な疑似人格が好
ましいが、例えば通信端末の画像処理能力が高ければ、
本人及び相手の実写モデルとして疑似人格を構成し、前
記顔の表情や顔色を変化させることもできる。Based on the voice of the user or the partner, the present invention makes the pseudo-personalities shown on the virtual space screen 3 of each communication terminal 1 behave as speaker or listener, bringing about the entrainment (pull-in) phenomenon that draws the user and the partner watching the virtual space screen 3 into the rhythm of the conversation. A human body represented by a simple three-dimensional model suffices as the pseudo-personality. What matters in the present invention is the face-to-face display of the own and partner pseudo-personalities together with the body motion and specific action of each pseudo-personality; how realistic the image of the pseudo-personality itself looks is not particularly important. Changing the facial expression or complexion of the pseudo-personality is preferable but is not essential to the present invention. Accordingly, when the present invention is used as a chat communication system, the pseudo-personality may be a three-dimensional or two-dimensional model of an animal or an inanimate object. A simple pseudo-personality is preferable in view of generating and executing body motions in real time, but if, for example, the image processing capability of the communication terminal is high, the pseudo-personalities may be built as photorealistic models of the user and the partner, with the facial expression and complexion changed as mentioned above.

【００２８】次に、利用者Ａが利用者Ｃに対して話す場
合を例に取り、各通信端末１の身体動作生成及び各疑似
人格の操作について説明する。利用者Ａの通信端末１を
本人通信端末101、利用者Ｃの通信端末を相手通信端末1
02とし、各通信端末101,102はサーバを仲介せず直接音
声をやり取りする。また、利用者Ａは本人通信端末101
では背面表示の本人疑似人格ａ(話し手、図２参照)、相
手通信端末102では正面表示の相手疑似人格ａ(話し手、
図３参照)として表示、利用者Ｃは本人通信端末101では
正面表示の相手疑似人格ｃ(聞き手、図２参照)、相手通
信端末102では背面表示の本人疑似人格ｃ(聞き手、図３
参照)として表示する。Next, taking as an example the case where user A speaks to user C, the generation of body motions in each communication terminal 1 and the operation of each pseudo-personality are described. The communication terminal 1 of user A is referred to as the own communication terminal 101 and the communication terminal of user C as the partner communication terminal 102; the communication terminals 101 and 102 exchange voice directly without going through a server. User A is displayed on the own communication terminal 101 as the own pseudo-personality a shown from behind (speaker, see FIG. 2) and on the partner communication terminal 102 as the partner pseudo-personality a shown from the front (speaker, see FIG. 3). User C is displayed on the own communication terminal 101 as the partner pseudo-personality c shown from the front (listener, see FIG. 2) and on the partner communication terminal 102 as the own pseudo-personality c shown from behind (listener, see FIG. 3).

【００２９】音声入力部５から入力された利用者Ａの本
人音声(太破線矢印表示、以下同じ)は、本人通信端末10
1の疑似人格制御部８及び送信処理部11へ送られる。こ
の利用者Ａの本人音声は、本人通信端末101の疑似人格
制御部８では、(1)相手疑似人格ｃの聞き手身体動作
と、(2)本人疑似人格ａの話し手身体動作とを決定する
基礎となる。また、送信処理部11からインターネット４
を介して相手通信端末102へ送られた利用者Ａの本人音
声は、相手通信端末102の疑似人格制御部８で、(3)本人
疑似人格ｃの聞き手身体動作と、(4)相手疑似人格ａの
話し手身体動作を決定する基礎となる。このように、本
発明は一つの音声(例示では利用者Ａの本人音声)を基礎
として、本人及び相手の各通信端末101.102で計４体の
本人及び相手疑似人格ａ,ｃの話し手及び聞き手身体動
作を決定する。実際には、各利用者相互で本人及び相手
の関係が入れ代わりながら会話が進むため、各疑似人格
については話し手身体動作及び聞き手身体動作が同時に
決定される。この場合、疑似人格制御部８は聞き手身体
動作及び話し手身体動作の複合的身体動作を作り出し、
前記複合的身体動作に従って疑似人格を動かす。The own voice of user A input from the voice input unit 5 (shown by thick broken-line arrows; the same applies below) is sent to the pseudo-personality control unit 8 and the transmission processing unit 11 of the own communication terminal 101. In the pseudo-personality control unit 8 of the own communication terminal 101, this voice is the basis for determining (1) the listener's body motion of the partner pseudo-personality c and (2) the speaker's body motion of the own pseudo-personality a. The same voice, sent from the transmission processing unit 11 via the Internet 4 to the partner communication terminal 102, is the basis on which the pseudo-personality control unit 8 of the partner communication terminal 102 determines (3) the listener's body motion of the own pseudo-personality c and (4) the speaker's body motion of the partner pseudo-personality a. In this way, the present invention uses a single voice (here, the own voice of user A) to determine the speaker's and listener's body motions of a total of four pseudo-personalities, namely the own and partner pseudo-personalities a and c on the communication terminals 101 and 102. In practice the roles of user and partner keep switching between the users as the conversation proceeds, so a speaker's body motion and a listener's body motion are determined simultaneously for each pseudo-personality. In that case the pseudo-personality control unit 8 produces a composite body motion from the listener's and speaker's body motions and moves the pseudo-personality according to that composite body motion.
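
By way of illustration only, the way one voice drives four pseudo-personalities across two terminals can be sketched in Python as follows. The function name `motions_from_voice` and the labels "speaker", "listener", "back", and "front" are hypothetical names chosen for this sketch and do not appear in the specification.

```python
from typing import Dict, Sequence, Tuple


def motions_from_voice(
    speaker: str, participants: Sequence[str], terminal_owner: str
) -> Dict[str, Tuple[str, str]]:
    """For one incoming voice, return {user: (role, view)} on one terminal.

    role: "speaker" for the user whose voice it is, "listener" otherwise.
    view: "back" for the terminal owner's own pseudo-personality,
          "front" for every partner pseudo-personality.
    """
    result = {}
    for user in participants:
        role = "speaker" if user == speaker else "listener"
        view = "back" if user == terminal_owner else "front"
        result[user] = (role, view)
    return result


# User A speaks; terminal 101 belongs to A and terminal 102 to C.
on_terminal_101 = motions_from_voice("A", ["A", "C"], terminal_owner="A")
on_terminal_102 = motions_from_voice("A", ["A", "C"], terminal_owner="C")
```

Under these assumptions the two calls together yield the four motions numbered (1) to (4) above: two on terminal 101 and two on terminal 102.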

【００３０】利用者Ａに対して親密なコミュニケーショ
ンを感じさせる引込現象を発現させるには、本人通信端
末101の仮想空間画面３に写し出す相手疑似人格ｃを聞
き手身体動作に従って振る舞わせればよい。この相手疑
似人格ｃの聞き手としての振る舞いは、頷き動作を中心
とする聞き手身体動作となり、頷き動作タイミングの決
定が重要となる。この頷き動作タイミングの決定は次の
手順に従う。まず、疑似人格制御部８へ入力した本人音
声をON/OFFの電気信号とみなして、図５に見られるよう
に、まずMAモデルを利用した頷き推定から頷き予測値Ｎ
0を算出する。こうした頷き予測値Ｎ0の計算は、非常に
簡単であり、比較的処理能力の低いコンピュータ13であ
ってもリアルタイムに実施できる。これにより、本発明
は従来のチャット通信システムと異なり、特にサーバを
介さずに利用者相互を直接結んで利用できることが、利
点に繋がっている。To bring about the entrainment (pull-in) phenomenon that makes user A feel intimate communication, the partner pseudo-personality c shown on the virtual space screen 3 of the own communication terminal 101 is made to behave according to the listener's body motion. The listener behavior of the partner pseudo-personality c is a listener's body motion centered on the nodding motion, so determining the nodding motion timing is important. The nodding motion timing is determined by the following procedure. First, the own voice input to the pseudo-personality control unit 8 is treated as an ON/OFF electrical signal, and, as shown in FIG. 5, a nod predicted value N0 is calculated by nod estimation using an MA model. This calculation of the nod predicted value N0 is very simple and can be carried out in real time even on a computer 13 of relatively low processing power. As a result, unlike conventional chat communication systems, the present invention has the advantage that users can be connected directly to one another without going through a server.
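
As an illustration only, the nod estimation described above might be sketched in Python as follows. The specification does not give the MA coefficients, the sampling interval, or the way the voice is binarized, so the `gate` level, the `coeffs` values, and both function names are assumptions made for this sketch.

```python
from typing import List, Sequence


def binarize(levels: Sequence[float], gate: float) -> List[int]:
    """Treat the voice as an ON/OFF signal: 1 when the level reaches the gate."""
    return [1 if level >= gate else 0 for level in levels]


def nod_prediction(on_off: Sequence[int], coeffs: Sequence[float]) -> float:
    """Moving-average (MA) estimate of the nod predicted value N0.

    on_off: ON/OFF samples, oldest first.
    coeffs: hypothetical MA coefficients; coeffs[0] weights the newest sample.
    """
    newest_first = list(on_off)[-len(coeffs):][::-1]
    return sum(c * x for c, x in zip(coeffs, newest_first))


signal = binarize([0.6, 0.1, 0.7], gate=0.3)        # ON, OFF, ON
n0 = nod_prediction(signal, coeffs=[0.5, 0.3, 0.2])
```

The estimate is a single weighted sum per sample, which is consistent with the remark above that the calculation is light enough to run in real time on a modest computer.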

【００３１】頷き動作タイミングは、前記頷き予測値Ｎ
0を予め定めた頷き閾値Ｎaと比較して、Ｎ0≧Ｎaであれ
ば、頷き予測値Ｎ0の時間タイミングとして決定する。
そして、前記頷き動作タイミングに頷き動作を実行す
る。本例では、毎回同じ頷き動作では自然な動作として
見られない虞れがあることを考慮して、頭の動く大きさ
や動く速度等を変えた複数の動作パターンを設定し、ラ
ンダム又は他の身体動作等の関係、例えば音声の強弱に
従って適宜違う動作パターンを選択する。音声の強弱は
感情の起伏を表すバロメータであり、この音声の強弱に
従って動作パターンの強弱等を決定しておくと好まし
い。また、音声の強弱に従って、同一パターンの動きに
大小をつけてもよい。あくまで動作パターンが複数ある
のみで、頷き動作タイミングは、先に決定した時点とす
る。こうした動作パターンの多様性は、引込現象の発現
にも有効である。The nodding motion timing is determined by comparing the nod predicted value N0 with a predetermined nod threshold Na: if N0 ≧ Na, the time of that nod predicted value N0 is taken as the nodding motion timing. The nodding motion is then executed at that timing. In this example, because repeating the identical nodding motion every time may not look natural, several motion patterns that differ in, for example, the amplitude and speed of head movement are prepared, and a different pattern is selected as appropriate, either at random or in relation to other factors such as body motions or the strength of the voice. The strength of the voice is a barometer of emotional ups and downs, so it is preferable to determine the strength of the motion pattern according to it. The magnitude of a single pattern may also be scaled according to the strength of the voice. Only the motion pattern varies; the nodding motion timing remains the time determined above. This variety of motion patterns also helps bring about the entrainment (pull-in) phenomenon.
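
The threshold comparison and the choice of motion pattern described above can be sketched as follows, for illustration only. The pattern names, the normalization of voice strength to the range 0 to 1, and the rule that a louder voice selects a larger pattern are assumptions of this sketch; the specification only states that a pattern is chosen at random or according to factors such as voice strength.

```python
from typing import Sequence


def is_nod_timing(n0: float, na: float) -> bool:
    """A nod is triggered when the predicted value reaches the threshold (N0 >= Na)."""
    return n0 >= na


def pick_pattern(strength: float, patterns: Sequence[str]) -> str:
    """Pick a motion pattern by voice strength (assumed normalized to 0..1).

    patterns are ordered from weakest to strongest; louder speech picks a
    later entry. Only the pattern changes, never the timing.
    """
    index = int(strength * len(patterns))
    index = max(0, min(index, len(patterns) - 1))
    return patterns[index]


NOD_PATTERNS = ["small_slow", "medium", "large_fast"]  # hypothetical names
```

For example, `pick_pattern(0.5, NOD_PATTERNS)` selects the middle pattern, while the decision of when to nod is left entirely to `is_nod_timing`.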

【００３２】相手疑似人格ｃの聞き手身体動作では、瞬
き動作を頷き動作に係らせて瞬き動作タイミングを決定
する。具体的には、上述のように決定した頷き動作タイ
ミングを基礎として、経時的な指数分布計算により瞬き
動作タイミングを決定する。この瞬き動作においても、
画一的な瞬き動作では自然な感じが損なわれるので、複
数の動作パターンを設定しておき、ランダム又は他の身
体動作等、例えば音声の強弱に従って適宜違う動作パタ
ーンを選択する。瞬き動作は、身体動作全体から見て付
随的なものであるため、例えば本人音声を単純なON/OFF
信号とみなして、実行させてもよい。In the listener's body motion of the partner pseudo-personality c, the blinking motion is tied to the nodding motion when the blinking motion timing is determined. Specifically, taking the nodding motion timing determined as described above as the starting point, the blinking motion timing is determined by an exponential distribution calculation over time. For the blinking motion too, a uniform blink would look unnatural, so several motion patterns are prepared and a different pattern is selected as appropriate, either at random or according to other factors such as body motions or the strength of the voice. Because the blinking motion is incidental to the body motion as a whole, it may instead be executed by, for example, simply treating the own voice as an ON/OFF signal.
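
One possible reading of the "exponential distribution calculation over time" is that each blink follows a nod after an exponentially distributed delay. The following Python sketch illustrates that reading only; the function name and the `mean_delay` parameter are hypothetical, and the specification does not state the distribution's parameter.

```python
import random


def blink_time_after_nod(nod_time: float, mean_delay: float, rng: random.Random) -> float:
    """Return a blink time that follows the nod by an exponential delay.

    mean_delay is the assumed average delay in seconds; the rate passed to
    expovariate is its reciprocal.
    """
    if mean_delay <= 0:
        raise ValueError("mean_delay must be positive")
    return nod_time + rng.expovariate(1.0 / mean_delay)


rng = random.Random(0)  # seeded only to make the sketch repeatable
blink_times = [blink_time_after_nod(10.0, 0.5, rng) for _ in range(100)]
```

Every generated blink time is at or after the nod time, so blinks stay linked to nods while their exact timing varies.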

【００３３】相手疑似人格ｃの身振り動作は、基本的に
は頷き動作と同様な手順により身振り動作タイミングを
決定する。動作態様そのものは頷き動作及び身振り動作
で異なるが、各動作タイミングが同期するとやはり自然
な感じが損なわれる。そこで、頷き動作タイミングの決
定と同じ頷き予測値Ｎ0を身振り閾値として用いなが
ら、比較する身振り閾値Ｇaを頷き閾値Ｎ0と異ならせる
ことで、実際の身振り動作タイミングを頷き動作タイミ
ングとずらすようにしている。更に、この身振り動作タ
イミングに実行する具体的な身振り動作の動作パターン
を複数設定しておき、ランダム又は他の身体動作等、例
えば音声の強弱に従って適宜違う動作パターンを選択す
ることで多様性を実現している。The gesture motion timing of the partner pseudo-personality c is basically determined by the same procedure as for the nodding motion. Although the nodding motion and the gesturing motion differ in form, the result again looks unnatural if their timings coincide. Therefore, the same nod predicted value N0 used to determine the nodding motion timing is used as the gesture predicted value, while the gesture threshold Ga against which it is compared is made different from the nod threshold Na, so that the actual gesture motion timing is shifted from the nodding motion timing. In addition, several motion patterns are prepared for the concrete gesturing motion executed at this gesture motion timing, and variety is achieved by selecting a different pattern as appropriate, either at random or according to other factors such as body motions or the strength of the voice.
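
The effect of comparing one predicted value against two different thresholds can be illustrated with the following Python sketch. Treating a motion as triggered at the sample where the predicted value first rises to the threshold (a rising edge) is an assumption of this sketch, as are the sample values and the function name `onsets`.

```python
from typing import List, Sequence


def onsets(series: Sequence[float], threshold: float) -> List[int]:
    """Return the sample indices where the series rises to the threshold."""
    result = []
    above = False
    for index, value in enumerate(series):
        now_above = value >= threshold
        if now_above and not above:
            result.append(index)
        above = now_above
    return result


n0_series = [0.2, 0.6, 0.9, 0.4, 0.7]      # hypothetical N0 samples
nod_onsets = onsets(n0_series, 0.5)        # nod threshold Na = 0.5
gesture_onsets = onsets(n0_series, 0.8)    # gesture threshold Ga = 0.8
```

With these sample values the nod is triggered at indices 1 and 4 and the gesture at index 2, so the two motions do not start at the same moment even though they share the same predicted value.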

【００３４】本発明では、同一仮想空間画面３内に相手
疑似人格ｃを正面表示するのみならず、本人疑似人格ａ
を背面表示することで、仮想空間画面３全体を親密なコ
ミュニケーションの空間として感得させ、引込現象の発
現を促している。この話し手としての本人疑似人格ａの
振る舞いは、上記聞き手制御手順に類似した話し手制御
に従うが、話し手では頷き動作に代えて頭の振り動作を
用いるほか、口の開閉動作を伴う点が異なっている。ま
ず、疑似人格制御部へ入力した本人音声をON/OFFの電気
信号とみなして、図６に見られるように、MAモデルを利
用した頭の振り推定から振り予測値Ｍ0を算出する。前
記MAモデルは、当然聞き手の場合と話し手の場合とでは
異なることが望ましいが、同一であっても差し支えな
い。単純に頭の振り動作と頷き動作とを比較することは
できないが、頭の振り動作は頷き動作よりも頻度を相対
的に高くするとよい。In the present invention, not only is the partner pseudo-personality c displayed from the front on the virtual space screen 3, but the own pseudo-personality a is also displayed from behind on the same screen, so that the whole virtual space screen 3 is perceived as a space of intimate communication, which encourages the entrainment (pull-in) phenomenon. The behavior of the own pseudo-personality a as speaker follows a speaker control similar to the listener control procedure above, except that the speaker uses a head swinging motion in place of the nodding motion and also performs a mouth opening/closing motion. First, the own voice input to the pseudo-personality control unit is treated as an ON/OFF electrical signal, and, as shown in FIG. 6, a swing predicted value M0 is calculated by head swing estimation using an MA model. The MA model should preferably differ between the listener and the speaker, but the same model may be used for both. Although the head swinging motion and the nodding motion cannot be compared directly, the head swinging motion should be made relatively more frequent than the nodding motion.

【００３５】本人疑似人格ａの話し手身体動作では、振
り動作と瞬き動作とは基本的に無関係であり、瞬き動作
は振り動作と類似手順で別途瞬き動作タイミングを決定
する。すなわち、瞬き推定により瞬き予測値Ｅ0を算出
し、この瞬き予測値Ｅ0を瞬き閾値Ｍaと比較することに
より、瞬き動作タイミングを決定する。この瞬き動作に
おいても、画一的な瞬き動作では自然な感じが損なわれ
るので、複数の動作パターンを設定しておき、ランダム
又は他の身体動作等、例えば音声の強弱に従って適宜違
う動作パターンを選択する。瞬き動作は、身体動作全体
から見て付随的なものであるため、例えば本人音声を単
純なON/OFF信号とみなしてもよい。この瞬き動作は、本
人通信端末101では本人疑似人格ａが背面表示される関
係から本人が見ることはなく、省略できるが、相手通信
端末102の仮想空間画面３に表示した話し手としての相
手疑似人格ａには必要となる(図２及び図３比較対照)。In the speaker's body motion of the own pseudo-personality a, the swinging motion and the blinking motion are basically unrelated, and the blinking motion timing is determined separately by a procedure similar to that for the swinging motion. That is, a blink predicted value E0 is calculated by blink estimation, and the blinking motion timing is determined by comparing this blink predicted value E0 with a blink threshold Ma. For this blinking motion too, a uniform blink would look unnatural, so several motion patterns are prepared and a different pattern is selected as appropriate, either at random or according to other factors such as body motions or the strength of the voice. Because the blinking motion is incidental to the body motion as a whole, the own voice may, for example, simply be treated as an ON/OFF signal. On the own communication terminal 101 this blinking motion can be omitted, since the own pseudo-personality a is displayed from behind and the user never sees it, but it is needed for the partner pseudo-personality a displayed as the speaker on the virtual space screen 3 of the partner communication terminal 102 (compare FIG. 2 and FIG. 3).

【００３６】話し手としての本人疑似人格ａに特有な口
の開閉動作は、話し手の本人音声と密接な関係にあり、
最も簡易には本人音声をON/OFFの電気信号とみなして従
わせる方がよい。そこで、本例では、リアルタイムな本
人音声のON/OFFの変化に従って口の開閉動作を実施して
いる。この開閉動作においても、画一的な瞬き動作では
自然な感じが損なわれるので、複数の動作パターンを設
定しておき、ランダム又は他の身体動作等、例えば音声
の強弱に従って適宜違う動作パターンを選択する。ま
た、音声の強弱に従って、同一パターンの動きに大小を
つけてもよい。この開閉動作は、本人通信端末101では
本人疑似人格ａが背面表示される関係から本人が見るこ
とはなく、省略できるが、相手通信端末102の仮想空間
画面３に表示した話し手としての相手疑似人格ａには必
要となる(図２及び図３比較対照)。The mouth opening/closing motion, which is peculiar to the own pseudo-personality a as speaker, is closely tied to the speaker's own voice, and the simplest approach is to treat the own voice as an ON/OFF electrical signal and have the mouth follow it. In this example, therefore, the mouth is opened and closed according to the real-time ON/OFF changes of the own voice. For this opening/closing motion too, a uniform motion would look unnatural, so several motion patterns are prepared and a different pattern is selected as appropriate, either at random or according to other factors such as body motions or the strength of the voice. The magnitude of a single pattern may also be scaled according to the strength of the voice. On the own communication terminal 101 this opening/closing motion can be omitted, since the own pseudo-personality a is displayed from behind and the user never sees it, but it is needed for the partner pseudo-personality a displayed as the speaker on the virtual space screen 3 of the partner communication terminal 102 (compare FIG. 2 and FIG. 3).
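
A minimal Python sketch of a mouth that follows the ON/OFF state of the voice, with its opening scaled by voice strength, is given below for illustration. The `gate` level, the cap `max_open`, and the function name are assumptions of this sketch and are not values from the specification.

```python
def mouth_opening(level: float, gate: float, max_open: float = 1.0) -> float:
    """Return the mouth opening for one voice sample.

    The mouth is closed (0.0) while the voice is OFF (level below the gate).
    While the voice is ON, the opening follows the level, capped at max_open.
    """
    if level < gate:
        return 0.0
    return min(level, max_open)


# Hypothetical voice levels for five consecutive samples.
openings = [mouth_opening(v, gate=0.3) for v in [0.1, 0.6, 2.0, 0.3, 0.0]]
```

Because the opening depends only on the current sample, the mouth tracks the voice in real time without any attempt to match individual words.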

【００３７】話し手としての本人疑似人格ａの身振り動
作はボディーアクションであり、本例では振り予測値Ｍ
0を身振り予測値として利用し、振り動作タイミングの
決定と同様な手順で身振り動作タイミングを決定してい
る。しかし、振り動作と身振り動作とが同期すると不自
然な感じを与えやすいので、頷き予測値Ｎ0を身振り予
測値として用いながら、比較する身振り閾値Ｇaを頷き
閾値Ｎaと異ならせている。更に、この身振り動作タイ
ミングに実行する具体的な身振り動作の動作パターンを
複数設定しておき、ランダム又は他の身体動作等、例え
ば音声の強弱に従って適宜違う動作パターンを選択する
ことで多様性を実現している。また、音声の強弱に従っ
て、同一パターンの動きに大小をつけてもよい。The gesturing motion of the own pseudo-personality a as speaker is a body action; in this example the swing predicted value M0 is used as the gesture predicted value, and the gesture motion timing is determined by the same procedure as the swinging motion timing. However, because the result tends to look unnatural if the swinging motion and the gesturing motion coincide, the nod predicted value N0 is used as the gesture predicted value while the gesture threshold Ga against which it is compared is made different from the nod threshold Na. In addition, several motion patterns are prepared for the concrete gesturing motion executed at this gesture motion timing, and variety is achieved by selecting a different pattern as appropriate, either at random or according to other factors such as body motions or the strength of the voice. The magnitude of a single pattern may also be scaled according to the strength of the voice.

【００３８】ここで、利用者Ａが指定動作入力部７から
特定動作を指定した場合、特定動作制御部９が入力され
た指定信号(図４中太線矢印表示)に対応した特定動作を
本人疑似人格ａに与える。この特定動作の指定は利用者
Ａの本人音声と並列に入力されるから、疑似人格制御部
８が作り出す身体動作と特定動作制御部９が作り出す特
定動作とは依存する。そこで、身体動作合成部10が両動
作を合成し、一体の動作として話し手としての本人疑似
人格ａを動かす。身体動作と特定動作とは、優先度を設
けて択一的に選択又は重み付けして合成したりできる。
実際には、特定の疑似人格に対して同時に聞き手身体動
作及び話し手身体動作が作り出され、疑似人格制御部８
で複合的身体動作として合成しているので、身体動作合
成部10では前記複合的身体動作と特定動作とを合成する
ことになる。When user A designates a specific action from the designated action input unit 7, the specific action control unit 9 gives the own pseudo-personality a the specific action corresponding to the input designation signal (shown by thick arrows in FIG. 4). Because the designation of the specific action is input in parallel with the own voice of user A, the body motion produced by the pseudo-personality control unit 8 and the specific action produced by the specific action control unit 9 depend on each other. The body motion synthesis unit 10 therefore combines the two and moves the own pseudo-personality a, as speaker, with a single integrated motion. The body motion and the specific action may be combined by assigning priorities and selecting one of them, or by weighting them. In practice, a listener's body motion and a speaker's body motion are produced simultaneously for a given pseudo-personality and are combined into a composite body motion in the pseudo-personality control unit 8, so the body motion synthesis unit 10 combines that composite body motion with the specific action.
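
The two ways of combining a body motion with a specific action mentioned above, selection by priority and weighted synthesis, can be sketched as follows for illustration. Representing a motion as a mapping from joint name to angle, and the names `Pose`, `select_by_priority`, and `blend`, are assumptions of this sketch; the specification does not define a data format.

```python
from typing import Dict

Pose = Dict[str, float]  # hypothetical format: joint name -> angle in degrees


def select_by_priority(body: Pose, specific: Pose, specific_first: bool = True) -> Pose:
    """Use exactly one of the two motions, according to a fixed priority."""
    if specific and specific_first:
        return dict(specific)
    return dict(body)


def blend(body: Pose, specific: Pose, weight: float) -> Pose:
    """Weighted synthesis; weight is the share given to the specific action."""
    joints = set(body) | set(specific)
    return {
        joint: (1.0 - weight) * body.get(joint, 0.0) + weight * specific.get(joint, 0.0)
        for joint in joints
    }


nod = {"head": 10.0}                   # body motion from the voice
wave = {"head": 0.0, "arm": 40.0}      # designated specific action
combined = blend(nod, wave, weight=0.5)
```

A joint missing from one motion is treated here as being at angle 0.0, which is a simplification made only to keep the sketch short.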

【００３９】利用者Ａの本人音声は、そのまま送信処理
部11からインターネット４を介して利用者Ｃの相手通信
端末102における受信処理部12へ送られる。相手通信端
末102では、受信処理部12から利用者Ａの本人音声を受
けた疑似人格制御部８が、相手疑似人格ａの話し手身体
動作と、本人疑似人格ｃの聞き手身体動作を上記話し手
制御手順又は聞き手制御手順に従って作り出す。そし
て、本人音声と同時に送信されてきた特定動作の指定信
号に従って、特定動作制御部９が相手疑似人格ａの特定
動作を作り出し、前記話し手身体動作と合成して相手疑
似人格ａを動かすことになる。このように、相手通信端
末102では仮想空間画面３に表示する本人及び相手疑似
人格ａ,ｃの位置関係が変わり、正面表示及び背面表示
が逆転する違いはあるものの、身体動作の生成等は上述
したところと変わるところはない。このため、相手通信
端末102における具体的な身体動作や身体動作及び特定
動作の組合わせについての説明は省略する。The own voice of user A is sent as is from the transmission processing unit 11 via the Internet 4 to the reception processing unit 12 of user C's partner communication terminal 102. In the partner communication terminal 102, the pseudo-personality control unit 8, receiving user A's voice from the reception processing unit 12, produces the speaker's body motion of the partner pseudo-personality a and the listener's body motion of the own pseudo-personality c according to the speaker control procedure and the listener control procedure described above. The specific action control unit 9 then produces the specific action of the partner pseudo-personality a according to the specific action designation signal transmitted together with the voice, and this is combined with the speaker's body motion to move the partner pseudo-personality a. Thus, although on the partner communication terminal 102 the positions of the own and partner pseudo-personalities a and c on the virtual space screen 3 change and the front and rear displays are reversed, the generation of body motions and so on is no different from what is described above. A description of the concrete body motions and of the combination of body motion and specific action in the partner communication terminal 102 is therefore omitted.

【００４０】このように、本発明では、本人音声を基礎
として、本人通信端末101における本人疑似人格ａの話
し手身体動作、相手疑似人格ｃの聞き手身体動作と、相
手通信端末102における相手疑似人格ａの話し手身体動
作、本人疑似人格ｃの聞き手身体動作とを決定し、それ
ぞれの通信端末101,102における仮想空間画面３内で各
疑似人格を動かす。実際には、本人と相手とは相対的な
関係にあり、交互に話し手と聞き手とが入れ代わりなが
ら会話が進められるので、前記本人音声を基礎とするほ
かに、相手音声を基礎として、本人通信端末101におけ
る本人疑似人格ａの聞き手身体動作、相手疑似人格ｃの
話し手身体動作と、相手通信端末102における相手疑似
人格ａの聞き手身体動作、本人疑似人格ｃの話し手身体
動作とが加わることになる。各疑似人格の聞き手身体動
作は、相手が複数である場合、更に複数生成される。こ
のような話し手身体動作と複数の聞き手身体動作が同時
に作られた場合、疑似人格制御部８が各身体動作の協調
を図り、複合的身体動作として身体動作合成部10へ制御
データ又は信号を出力するとよい。前記身体動作の協調
は、身体動作合成部10で実施してもよい。Thus, in the present invention, the own voice is the basis for determining the speaker's body motion of the own pseudo-personality a and the listener's body motion of the partner pseudo-personality c on the own communication terminal 101, and the speaker's body motion of the partner pseudo-personality a and the listener's body motion of the own pseudo-personality c on the partner communication terminal 102, and each pseudo-personality is moved within the virtual space screen 3 of the respective communication terminal 101, 102. In practice the user and the partner are relative roles, and the conversation proceeds with speaker and listener alternating. Therefore, in addition to the motions based on the own voice, the partner voice gives rise to the listener's body motion of the own pseudo-personality a and the speaker's body motion of the partner pseudo-personality c on the own communication terminal 101, and to the listener's body motion of the partner pseudo-personality a and the speaker's body motion of the own pseudo-personality c on the partner communication terminal 102. When there are several partners, several listener's body motions are generated for each pseudo-personality. When a speaker's body motion and several listener's body motions are produced simultaneously in this way, the pseudo-personality control unit 8 should coordinate them and output control data or a signal to the body motion synthesis unit 10 as a composite body motion. This coordination of body motions may alternatively be performed in the body motion synthesis unit 10.

【００４１】特定動作は、話し手である本人又は相手の
意識的な動作指定によって、本人疑似人格の話し手身体
動作に付加する動作であり、相手疑似人格の聞き手身体
動作は基本的に無関係である。しかし、特定動作によっ
ては本人疑似人格の話し手特定動作に対応する相手疑似
人格の聞き手特定動作を予め定めておき、相手疑似人格
の聞き手身体動作に前記聞き手特定動作を合成してもよ
い。例えば、図２及び図３では、利用者Ｃがチャット通
信を終了するに当たって離別を示唆する手を振る行為
(「さようなら」を示唆する行為)を指定し、話し手特定
動作として話し手身体動作に合成して相手疑似人格ｃ
(図２)及び本人疑似人格ｃ(図３)を動かしている。前記
話し手特定動作に対して、本人疑似人格ａ(図２)及び相
手疑似人格ａ(図３)は、応答する聞き手特定動作とし
て、若干手を挙げるようにしている。特定動作は、特定
の相手のみに発する場合と、通信相手全員に発する場合
とが考えられ、基本となる聞き手身体動作を抑えた大き
な動きは好ましくない。このため、例示での本人疑似人
格ａ(図２)及び相手疑似人格ａ(図３)の動きは控えめに
しているわけである。A specific action is an action added to the speaker's body motion of the own pseudo-personality through a deliberate designation by the user or partner who is speaking, and it is basically unrelated to the listener's body motion of the partner pseudo-personality. For some specific actions, however, a listener-specific action of the partner pseudo-personality corresponding to the speaker-specific action of the own pseudo-personality may be defined in advance and combined with the listener's body motion of the partner pseudo-personality. For example, in FIG. 2 and FIG. 3, user C, on ending the chat, designates a hand-waving action suggesting parting (an action meaning "goodbye"), which is combined with the speaker's body motion as a speaker-specific action to move the partner pseudo-personality c (FIG. 2) and the own pseudo-personality c (FIG. 3). In response to this speaker-specific action, the own pseudo-personality a (FIG. 2) and the partner pseudo-personality a (FIG. 3) raise a hand slightly as the responding listener-specific action. A specific action may be directed at a particular partner only or at all communication partners, and a large movement that suppresses the underlying listener's body motion is undesirable. This is why the movements of the own pseudo-personality a (FIG. 2) and the partner pseudo-personality a (FIG. 3) in the example are kept modest.

【００４２】本発明は、上述の例示のように、音声チャ
ット通信システムでの利用が最も相応しいが、本人又は
相手のいずれか一方にのみ本発明を適用する利用形態を
想定すれば、例えば通信システムを利用した企業のコー
ルセンタのサービスシステムや、メールの音声読上装置
としても利用できる。コールセンタのサービスシステム
としては、顧客に応対するサービス員が本発明適用の通
信端末を用い、サービス員及び顧客からの音声を基礎と
してサービス員の受付端末で疑似人格を動かす。従来、
サービス員の対応が問題とされることもあるが、本発明
を利用することでサービス員の対応改善が期待されるほ
か、引込現象の発現によりサービス員が顧客の話をより
よく理解できるようになる。As in the example above, the present invention is best suited to a voice chat communication system. However, assuming a usage form in which the invention is applied to only one of the person and the partner, it can also be used, for example, as a corporate call-center service system built on a communication system, or as a device that reads mail aloud. In a call-center service system, the service agent who handles customers uses a communication terminal to which the invention is applied, and the pseudo-personalities are moved on the agent's reception terminal on the basis of the voices of the agent and the customer. The way service agents handle customers has sometimes been regarded as a problem. Using the invention can be expected to improve that handling, and the pull-in phenomenon also lets the agent understand what the customer is saying better.

【００４３】メールの音声読上装置として構成する場合
は、音声合成するソフトウェアを用いてメールの文章を
音声化し、本発明の通信端末をスタンドアローンで利用
することになる。この場合、本人となる利用者(メール
受信者)が音声を発しなくても、仮想空間画面内におい
てはメールを読み上げる相手疑似人格に応答して本人疑
似人格が聞き手身体動作に従って動くため、仮想空間画
面を見る利用者に親密なコミュニケーションを感得させ
ることができる。この場合、相手(メール送信者)が発す
る相手音声とは異なり、明確な言葉を基礎とした合成音
声を利用するため、言葉の判別が容易である。これを利
用して、音声認識による言葉の意味を特定動作の指定に
利用し、合成音性を基礎とした身体動作に前記特定動作
を合成するようにしてもよい。こうした音声認識による
言葉の意味から特定動作を指定する方法は、上記チャッ
ト通信システムやコールセンタのサービスシステムでも
利用できる。When the invention is configured as a device that reads mail aloud, speech-synthesis software converts the text of the mail into voice, and the communication terminal of the invention is used stand-alone. In this case, even if the user who is the person (the mail recipient) utters no voice, the person's pseudo-personality moves on the virtual space screen according to the listener body motion in response to the partner's pseudo-personality that reads the mail aloud, so the user watching the virtual space screen gets a sense of intimate communication. Unlike partner voice uttered by the partner (the mail sender), the synthesized voice is based on clearly defined words, so the words are easy to identify. Taking advantage of this, the meaning of words obtained by speech recognition may be used to designate a specific motion, and that specific motion may be combined with the body motion generated from the synthesized voice. This method of designating a specific motion from the meaning of recognized words can also be used in the chat communication system and the call-center service system described above.
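
Designating a specific motion from the meaning of recognized words can be sketched as a simple keyword lookup. The keyword table, the motion names, and the function `designate_specific_motions` are hypothetical; the source does not list which words map to which motions, and a real system would take its input from a speech recognizer rather than a plain string.

```python
import re
from typing import Dict, List

# Hypothetical table: recognized word -> designated specific motion.
KEYWORD_TO_MOTION: Dict[str, str] = {
    "hello": "raise_hand",
    "goodbye": "wave_hand",
    "thanks": "bow",
}


def designate_specific_motions(recognized_text: str) -> List[str]:
    """Return the specific motions designated by the words, in spoken order."""
    words = re.findall(r"[a-z']+", recognized_text.lower())
    return [KEYWORD_TO_MOTION[w] for w in words if w in KEYWORD_TO_MOTION]


motions = designate_specific_motions("Hello, see you tomorrow. Goodbye!")
```

Each returned motion would then be passed to the specific motion control and combined with the voice-based body motion.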

【００４４】[0044]

【発明の効果】本発明により、時間的又は距離的に隔て
られた本人と相手とが会話をする場合に、より親密なコ
ミュニケーションを可能にする通信システムを提供でき
るようになる。近年、特定の言葉の意味を理解させたり
して、より親密なコミュニケーションを実現しようとす
る流れがあるが、そうした人間レベルの知的判断を伴う
通信システムでは、通信端末に求められる性能が高くな
り、通信環境も高速な通信速度を要求しかねない。本発
明は、こうした現状を踏まえ、親密なコミュニケーショ
ンは会話リズムによりもたらされることを解明し、この
会話リズムを作り出す引込現象を発現する点に注力し
て、比較的低性能の通信端末と低速な通信環境を利用し
ながらも、十分親密なコミュニケーションを実現した点
に大きな効果がある。[Effects of the Invention] The present invention makes it possible to provide a communication system that enables more intimate communication when a person and a partner who are separated in time or distance converse. In recent years there has been a trend toward achieving more intimate communication by, for example, having the system understand the meaning of specific words. A communication system that involves such human-level intelligent judgment, however, demands higher performance of the communication terminal and may also require a high-speed communication environment. In light of this situation, the present invention establishes that intimate communication arises from conversational rhythm and concentrates on producing the pull-in phenomenon that creates this rhythm. Its major effect is that it achieves sufficiently intimate communication while using relatively low-performance communication terminals and a low-speed communication environment.

【００４５】通信端末の性能を過大に要求しない理由
は、各疑似人格の身体動作を生成する際に用いる制御手
順が、簡単な処理で済む点に負うところが大きい。音声
を基礎として、会話のリズムを作り出す頷き動作又は頭
の振り動作、瞬き動作、口の開閉動作及び身振り動作
は、音声を単なるON/OFFの電気信号として扱うため、処
理時間及び負担が非常に小さいわけである。また、低速
な通信環境しか要求しない理由は、各通信端末間でやり
とりするものがデータ量の少ない音声のみで、この音声
を基礎としてすべての身体動作を各通信端末上で生成で
きるからである。これは、サーバを介さないピアツーピ
アのチャット通信システムでも本発明が容易に利用でき
る意味を有する。The main reason the invention does not demand excessive terminal performance is that the control procedure used to generate the body motion of each pseudo-personality requires only simple processing. The nodding or head-swinging motion, blinking motion, mouth opening/closing motion, and gesture motion that create the rhythm of conversation are generated from voice treated as a simple ON/OFF electrical signal, so the processing time and load are very small. The reason only a low-speed communication environment is required is that the communication terminals exchange only voice, which has a small data volume, and every body motion can be generated on each terminal from that voice. This means the invention can readily be used even in a peer-to-peer chat communication system that does not go through a server.
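
The low processing load can be illustrated with a minimal sketch that reduces voice to an ON/OFF sequence and triggers a nod when a predicted value exceeds a threshold. The source states only that the predicted value is estimated from the ON/OFF of the voice; the weighted sum over recent frames used here, along with the names `to_on_off`, `nod_timings`, `level`, and `weights`, is an assumption for illustration.

```python
from typing import List, Sequence


def to_on_off(amplitudes: Sequence[float], level: float) -> List[int]:
    """Binarize per-frame voice amplitude: 1 (ON) if at or above level, else 0 (OFF)."""
    return [1 if abs(a) >= level else 0 for a in amplitudes]


def nod_timings(on_off: Sequence[int], weights: Sequence[float],
                threshold: float) -> List[int]:
    """Return frame indices at which the predicted nod value exceeds the threshold.

    Assumed estimator: a weighted sum of the current and preceding
    ON/OFF frames (weights[0] applies to the current frame).
    """
    timings = []
    for t in range(len(on_off)):
        predicted = sum(w * on_off[t - k]
                        for k, w in enumerate(weights) if t - k >= 0)
        if predicted > threshold:
            timings.append(t)
    return timings


frames = to_on_off([0.0, 0.4, 0.5, 0.6, 0.0, 0.0], level=0.2)
nods = nod_timings(frames, weights=[0.5, 0.25, 0.25], threshold=0.8)
```

Each frame costs only a handful of multiplications and one comparison, which is consistent with the small processing time and load described above. The same pattern would apply to the other threshold-triggered motions with their own weights and thresholds.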

【００４６】本発明は、本人及び相手双方が同等の通信
端末を用いてするチャット通信システムでの利用が最も
相応しいが、既述したように、本人又は相手一方にのみ
本発明を適用しても、本発明を利用する本人に、より親
密なコミュニケーションを図る効果をもたらし、間接的
に相手にも会話リズムを改善するという効果を与えるこ
とができる。これは、本発明が単独での利用が可能であ
ることを意味する。音声を基礎とした疑似人格の身体動
作の生成が可能であれば本発明が利用でき、前記音声は
必ずしも本人又は相手の肉声でなくてもよいわけであ
る。これから、例えば音楽を音声に代えて用い、より音
楽を楽しむ装置としての応用例も考えられる。このよう
に、本発明は直接的な会話又は広く音を伴うコミュニケ
ーションの改善をもたらすシステムを提供する。The present invention is best suited to a chat communication system in which the person and the partner both use equivalent communication terminals. As already described, however, even when the invention is applied to only one of the person and the partner, it gives the person using it the effect of more intimate communication and indirectly gives the partner the effect of an improved conversational rhythm. This means the invention can be used on its own. The invention can be used wherever body motions of a pseudo-personality can be generated from voice, and that voice need not be the natural voice of the person or the partner. It follows that the invention could also be applied, for example, as a device that makes music more enjoyable by using music in place of voice. In this way, the invention provides a system that improves direct conversation and, more broadly, communication that involves sound.

[Brief description of drawings]

【図１】本発明を適用したチャット通信システムのシス
テム構成図である。FIG. 1 is a system configuration diagram of a chat communication system to which the present invention is applied.

【図２】利用者Ａの通信端末のディスプレイの正面図
(仮想空間画面)である。FIG. 2 is a front view (virtual space screen) of the display of user A's communication terminal.

【図３】利用者Ｃの通信端末のディスプレイの正面図
(仮想空間画面)である。FIG. 3 is a front view (virtual space screen) of the display of user C's communication terminal.

【図４】利用者Ａ及びＢの各通信端末を対照表示して各
部の作動を表した通信端末構成図である。FIG. 4 is a communication terminal configuration diagram that shows the communication terminals of users A and B side by side to illustrate the operation of each unit.

【図５】各通信端末の疑似人格の聞き手制御を表す制御
フローチャートである。FIG. 5 is a control flowchart showing listener control of a pseudo personality of each communication terminal.

【図６】各通信端末の疑似人格の話し手制御を表す制御
フローチャートである。FIG. 6 is a control flowchart showing speaker control of a pseudo personality of each communication terminal.

[Explanation of symbols]

1 通信端末: Communication terminal
2 ディスプレイ: Display
3 仮想空間画面: Virtual space screen
4 インターネット: Internet
5 音声入力部(マイク): Voice input unit (microphone)
6 音声出力部(スピーカ): Voice output unit (speaker)
7 指定動作入力部(キーボード又はマウス): Designated motion input unit (keyboard or mouse)
8 疑似人格制御部: Pseudo-personality control unit
9 特定動作制御部: Specific motion control unit
10 身体動作合成部: Body motion synthesis unit
11 送信処理部: Transmission processing unit
12 受信処理部: Reception processing unit
13 コンピュータ: Computer
101 本人通信端末: Person's communication terminal
102 相手通信端末: Partner's communication terminal
Ａ, Ｂ, Ｃ 利用者: Users A, B, C
ａ, ｂ, ｃ 疑似人格: Pseudo-personalities a, b, c

フロントページの続き (51)Int.Cl.⁷ 識別記号 ＦＩ テーマコード(参考) Ｈ０４Ｍ 1/00 Ｇ１０Ｌ 3/00 Ｑ
Continuation of front page: (51) Int.Cl.⁷, identification symbol, FI, theme code (reference): H04M 1/00, G10L 3/00 Q

Claims

1. A physical media communication system, being a communication system used for intimate communication between a person and a partner who are separated from each other, using communication terminals connected via a communication line, wherein the communication terminals of the person and the partner exchange the voices of the person and the partner while displaying the person's pseudo-personality in the foreground of a virtual space screen, seen from behind, and simultaneously displaying the partner's pseudo-personality at the back of the virtual space screen, facing front; the person's pseudo-personality is moved according to a speaker body motion based on the person's voice input from voice input means, or a listener body motion based on the partner's voice received via the communication line; the partner's pseudo-personality is moved according to a speaker body motion based on the partner's voice received via the communication line, or a listener body motion based on the person's voice input from the voice input means; and the partner's voice is output.

2. The physical media communication system according to claim 1, wherein each of the person's and the partner's communication terminals comprises a pseudo-personality control unit that creates body motions of the person's and the partner's pseudo-personalities based on voice, a transmission/reception processing unit that exchanges data or voice between the person's and the partner's communication terminals, and a voice input/output unit of the person's or the partner's communication terminal; the person's pseudo-personality on the virtual space screen is moved according to a speaker body motion created by the pseudo-personality control unit based on the person's voice input from the voice input unit of the person's communication terminal, or a listener body motion created by the pseudo-personality control unit based on the partner's voice input from the reception processing unit of the person's communication terminal; the transmission processing unit transmits the person's voice to the reception processing unit of the partner's communication terminal; and the partner's pseudo-personality on the virtual space screen is moved according to a speaker body motion created by the pseudo-personality control unit based on the partner's voice input from the reception processing unit of the person's communication terminal, or a listener body motion created by the pseudo-personality control unit based on the person's voice input from the voice input unit of the person's communication terminal, while the voice output unit outputs the partner's voice.

3. The physical media communication system according to claim 2, wherein each of the person's and the partner's communication terminals further comprises a specific motion control unit that creates body motions of the person's and the partner's pseudo-personalities based on designation, a body motion synthesis unit that combines the body motions created by the pseudo-personality control unit and the specific motion control unit, and a designated motion input unit of the person's or the partner's communication terminal; the person's pseudo-personality on the virtual space screen is moved by combining, in the body motion synthesis unit, a speaker body motion created by the specific motion control unit based on the person's designation input from the designated motion input unit of the person's communication terminal, or a listener body motion created by the specific motion control unit based on the partner's designation input from the reception processing unit of the person's communication terminal, with the speaker body motion or listener body motion created by the pseudo-personality control unit; and the partner's pseudo-personality on the virtual space screen is moved by combining, in the body motion synthesis unit, a speaker body motion created by the specific motion control unit based on the partner's designation input from the reception processing unit of the person's communication terminal, or a listener body motion created by the specific motion control unit based on the person's designation input from the designated motion input unit of the person's communication terminal, with the speaker body motion or listener body motion created by the pseudo-personality control unit.

4. The physical media communication system according to claim 1, wherein the listener body motion created by the pseudo-personality control unit consists of a selective combination of a head nodding motion, an eye blinking motion, or a body gesture motion, and the nodding motion is executed at a nodding motion timing at which a nod predicted value estimated from the ON/OFF of the voice exceeds a nod threshold.

5. The physical media communication system according to claim 4, wherein the blinking motion is executed at a blinking motion timing that is exponentially distributed over time, with the nodding motion timing as the starting point.

6. The physical media communication system according to claim 4, wherein the gesture motion is executed at a gesture motion timing when a gesture prediction value estimated from ON / OFF of voice exceeds a gesture threshold value.

7. The physical media communication system according to claim 1, 2 or 3, wherein the speaker body motion created by the pseudo-personality control unit consists of a selective combination of a head swinging motion, a mouth opening/closing motion, an eye blinking motion, or a body gesture motion, and the swinging motion is executed at a swinging motion timing at which a swing predicted value estimated from the ON/OFF of the voice exceeds a swing threshold.

8. The physical media communication system according to claim 7, wherein the blinking operation is executed at a blinking operation timing at which the eye blink predicted value estimated from ON / OFF of the voice exceeds the blinking threshold value.

9. The physical media communication system according to claim 7, wherein the gesture motion of the body is executed at a gesture motion timing when a gesture prediction value estimated from ON / OFF of voice exceeds a gesture threshold value.