JP2003248841A

JP2003248841A - Virtual television intercom

Info

Publication number: JP2003248841A
Application number: JP2002362596A
Authority: JP
Inventors: Yoshiyuki Mochizuki; 義幸望月; Katsunori Orimoto; 勝則折本; Toshinori Hijiri; 利紀樋尻; Naotake Otani; 尚毅大谷; Toshiya Naka; 俊弥中; Goji Yamamoto; 剛司山本; Shigeo Asahara; 重夫浅原
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2001-12-20
Filing date: 2002-12-13
Publication date: 2003-09-05

Abstract

PROBLEM TO BE SOLVED: To provide a communication terminal with a display function that displays the communication partner as a virtual three-dimensional CG character selected by the receiver and enables voice conversation through the CG character.

SOLUTION: A communication part 1 performs voice communication, and a CG character corresponding to the communication partner is selected through a character background selection input part 2. A voice processing part 5 performs the voice processing needed for a telephone call. A voice conversion part 6 performs voice conversion, and a voice output part 7 outputs the voice. A voice input part 8 acquires the voice. A voice analysis part 9 performs voice analysis, and a feeling estimation part 10 estimates a feeling from the voice analysis result. A lip movement control part 11, a body movement control part 12, and a facial expression control part 13 send control information to a three-dimensional drawing part 14 to generate an image, which is displayed on a display part 15.

COPYRIGHT: (C)2003,JPO

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to virtual video calling using a communication terminal device equipped with a display device, intended to let users enjoy voice conversation visually through a virtual three-dimensional CG (computer graphics) character.

[0002]

[Prior Art] Conventionally, what has been called video communication is a system in which telephone devices equipped with a camera and a display let the parties talk while viewing each other's face image captured by the camera. To reduce the amount of transmitted data, the captured face image data is generally compressed, multiplexed with the audio data, and sent to the receiver. On the receiving side, the multiplexed data is separated into audio data and compressed image data, the image data is decompressed, and audio output and image display are performed in synchronization with the audio data. Recently, mobile phones called videophones have been developed for next-generation mobile communication (IMT-2000), based on the MPEG-4 (Moving Picture Experts Group Phase 4) standard for image compression (see Non-Patent Document 1).

[0003] On the other hand, multiplexing and transmitting images as described above requires a broadband communication standard that goes beyond the framework of conventional voice communication, together with the infrastructure to realize it. For this reason, there has been an attempt to realize a function resembling video communication in a pseudo manner using voice data communication alone, without relying on such image compression techniques (see Patent Document 1). In that invention, the telephone holds in advance a still image of the other party's face processed so that it has no mouth, and still images of mouth shapes pronouncing the vowels "a", "i", "u", and so on. Vowel analysis is performed on the voice data sent from the other party using speech recognition technology, and an image in which the mouth shape data corresponding to the analysis result is composited with the face image is displayed as needed, thereby showing the other party speaking. The advantage of that invention is that pseudo video communication can be realized within the framework of ordinary voice communication. However, the user may find a still image in which nothing but the mouth moves unnatural, and it is doubtful whether the user would feel as if they were talking with the actual person.

[0004] There is also an invention that departs from the framework of voice communication but is combined with image recognition technology, on the grounds that it requires less data than transmitting images (see Patent Document 2). In that invention, facial expressions and mouth shapes are recognized by image recognition, parameterized, and transmitted together with the voice data. The receiving side holds a three-dimensional model of the other party in advance and, when outputting the voice, deforms and displays the three-dimensional model according to the received parameters.

[0005] All three of the above techniques aim to let users converse while looking at the other party's face; none aims to make the conversation itself more entertaining. The above concerns so-called telephone technology, but with the spread of the Internet, conversation via personal computers has also become possible, although it is mainly text-based. In this context, there is also a technique in which a user has a proxy CG (computer graphics) character join a shared virtual space and, through that proxy character, enjoys conversation with the proxy CG characters of other people who have joined the space (see Patent Document 3). The purpose of that invention is to let users converse with others anonymously. Because users participate in a state detached from their real selves, they often enjoy fictitious conversations, including falsehoods.

[0006] Such a proxy CG character is called an avatar because it acts as a stand-in chosen by the person communicating. The avatar is selected by the participant, and the conversation partner cannot change its character. Moreover, for the other participants the avatar serves only to identify the other party, so there is no need to change it. In terms of implementation, in addition to the participants' terminal computers (client computers), a server computer is needed to manage and control the shared virtual space where participants gather, control the participants' states, and send notifications of that information.

[0007] Technology for conversing with a virtual CG character has also been made public, for example on the website of Extempo Systems on the Internet. There, users converse with specialist characters on the network, but the conversation is text-based, not spoken.

[0008] Technically, a dictionary in which conversational sentences are classified by keyword is prepared in advance. The other party's utterance is analyzed for how well it matches the classified keywords, and the best-matching conversational sentence is displayed, thereby establishing a conversation between the CG character and a human. Because human comprehension is high, even a loosely fitting sentence can pass as conversation, but the number of sentences is limited to the number registered, so sentences gradually begin to repeat after several exchanges. Although conversing with a virtual CG character offers a new kind of entertainment, it differs greatly from conversation with a real person in flexibility, diversity, appropriateness, and individuality. The goal of such technology is to come as close as possible to the conversational ability of a real human.

[0009]

[Patent Document 1] Japanese Unexamined Patent Publication No. Sho 62-274962

[0010]

[Patent Document 2] Japanese Unexamined Patent Publication No. Hei 05-153581

[0011]

[Patent Document 3] U.S. Patent No. 5,880,731

[0012]

[Non-Patent Document 1] NIKKEI ELECTRONICS 1999.11.1 (No. 756), pp. 99-117

[0013]

[Problems to Be Solved by the Invention] To summarize the features of the conventional techniques above, the first three were all invented in response to the desire to converse while seeing the other party's face, and their purpose is to let users converse while checking the other party's facial expression and demeanor. They are therefore not intended to let the receiver independently apply some operation to process the displayed video or the voice for greater entertainment, nor do they disclose any technique for doing so.

[0014] Next, in the fourth conventional example, a user has a CG character of their own choosing join a virtual community space and, being anonymous, and indeed precisely because of that anonymity, enjoys frank, unreserved conversation as well as fictitious or false conversation. The conversation partner's CG character therefore only needs to be identifiable; the technique does not let the user apply some operation to the partner's CG character or voice to enjoy a more entertaining conversation. The fifth conventional example does offer the enjoyment of conversing with a virtual CG character that has an artificial-intelligence-like conversation function, but this only superficially resembles conversation with a real person and differs greatly from it in flexibility, diversity, appropriateness, and individuality.

[0015] In view of the above problems, an object of the present invention is to provide a communication terminal with a display function that displays the communication partner as a virtual three-dimensional CG character selected by the receiver and, by using the partner's speech, enables voice conversation with the virtual three-dimensional CG character. This makes it possible to realize a new communication terminal that makes voice conversation more entertaining by an approach different from the functions of "seeing the other party's face, or an image resembling it" and "posing as a fictitious character".

[0016] Another object of the present invention is to provide a calling device with a display that realizes conversation in a virtual space without using a device such as the server required by the above prior art. A further object of the present invention is to provide a new calling device in which the three-dimensional CG character expresses emotions suited to the conversation during the call.

[0017]

[Means for Solving the Problems] To achieve the above objects, the virtual video call device of the present invention comprises: communication means for performing voice communication; character selection means for selecting CG character shape data for at least one of the user and the communication partner; voice input means for inputting the user's voice; voice output means for outputting the communication partner's voice; voice analysis means for performing voice analysis on the communication partner's voice data received by the communication means, or on both that received voice data and the user's voice data input through the voice input means; emotion estimation means for estimating the emotional state of the communication partner, or of both the communication partner and the user, using the voice analysis result of the voice analysis means; motion control means for controlling the motion of the CG character based on the emotion estimation means; drawing means for generating an image by performing drawing processing using the CG character shape data and motion data generated based on control information from the motion control means; and display means for displaying the image generated by the drawing means.

[0018] Further, in the virtual video call device of the present invention, the emotion estimation means notifies the motion control means of its estimation result, and the motion control means identifies the motion data based on the notified result.

[0019] The present invention can be realized not only as the virtual video call device described above, but also as a virtual video communication method whose steps correspond to the means of that device, or as a virtual video communication system using those steps.

[0020] Needless to say, the virtual video communication method can also be realized as a program executed by a computer or the like, and the program can be distributed via a recording medium such as a CD-ROM or a transmission medium such as a communication network.

[0021]

[Embodiments of the Invention] (First Embodiment) A virtual video call device according to a first embodiment of the present invention is described below with reference to the drawings.

[0022] FIG. 1 shows the configuration of the virtual video call device according to the first embodiment of the present invention. The device includes a communication unit 1, a character background selection input unit 2, a data management unit 3, a voice selection input unit 4, a voice processing unit 5, a voice conversion unit 6, a voice output unit 7, a voice input unit 8, a voice analysis unit 9, an emotion estimation unit 10, a lip motion control unit 11, a body motion control unit 12, a facial expression control unit 13, a three-dimensional drawing unit 14, a display unit 15, a motion and expression input unit 16, a viewpoint change input unit 17, a character shape data storage unit 18, a character motion data storage unit 19, a background data storage unit 20, a texture data storage unit 21, and a music data storage unit 22.

[0023] The virtual video call device of the first embodiment configured as above is described in detail below. The operation of the first embodiment can be divided into operation at setup time and operation during outgoing and incoming calls, and each is described in turn. First, however, as matters common to both, the data stored in the device and its management are described.

[0024] (Stored data and its management) The character shape data storage unit 18 stores, managed by address, the shape data of the CG characters and the corresponding thumbnail data (image data showing the CG character's appearance). Character shape data generally consists of parts such as the head, upper limbs, trunk, and lower limbs, and each part in turn consists of sub-parts: for example, eyes, nose, mouth, and hair for the head, and hands, forearms, and upper arms for the upper limbs. For a more detailed character shape, the hand is further composed of sub-parts such as fingers and the palm. This hierarchical relationship represents the structure of the character shape and is generally called a scene graph.
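The part and sub-part hierarchy described above can be pictured as a small tree. The following is only a minimal sketch: the `Part` class, its fields, and the English part names are hypothetical and are not taken from the specification.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Part:
    """One node of a character scene graph (class and field names are hypothetical)."""
    name: str
    children: list[Part] = field(default_factory=list)

    def walk(self, depth: int = 0):
        """Yield (depth, name) for this part and every sub-part, parent first."""
        yield depth, self.name
        for child in self.children:
            yield from child.walk(depth + 1)


# Hierarchy following the parts named in the text; the English names are illustrative.
character = Part("character", [
    Part("head", [Part("eyes"), Part("nose"), Part("mouth"), Part("hair")]),
    Part("upper_limb", [
        Part("upper_arm"),
        Part("forearm"),
        Part("hand", [Part("fingers"), Part("palm")]),
    ]),
    Part("trunk"),
    Part("lower_limb"),
])

# Print the hierarchy with one indentation step per level.
for depth, name in character.walk():
    print("  " * depth + name)
```

A tree of this kind lets a transformation applied to a parent (for example the upper limb) carry over to its sub-parts (forearm, hand, fingers) when the graph is traversed.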

[0025] Each part and sub-part is usually represented by what is called a surface model, in which only the object's surface is approximated by polygons and expressed as a collection of faces. It consists of indexed point-sequence data giving the vertex coordinates in three-dimensional space, the normal vector components at each vertex (required for light source luminance calculation), and the texture coordinates (required for texture mapping), together with topological data describing how the points are connected (for example, vertex indices written in the order 1, 2, 3 denote a triangle whose vertices are points 1, 2, and 3). It also includes attribute data such as each face's reflectance (diffuse reflectance, specular reflectance), ambient light intensity, and object color. When clothing or the like worn by the CG character is expressed by texture mapping, the relevant part of the character's shape data specifies the address in the texture data storage unit 21 of the texture to be used, or the ID of the corresponding identifier.
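As an illustration of such an indexed surface model, the sketch below stores per-vertex data in arrays and describes faces as index triples. The `SurfacePart` class, its field names, and the default attribute values are assumptions made for this example, and the indices are 0-based here whereas the example in the text counts from 1.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class SurfacePart:
    """Indexed polygon surface for one part (hypothetical layout)."""
    positions: List[Vec3]                 # vertex coordinates in 3D space
    normals: List[Vec3]                   # per-vertex normals, used for lighting
    uvs: List[Tuple[float, float]]        # per-vertex texture coordinates
    faces: List[Tuple[int, int, int]]     # topology: vertex-index triples
    diffuse: float = 0.8                  # attribute data (made-up defaults)
    specular: float = 0.2
    ambient: float = 0.1
    color: Vec3 = (1.0, 1.0, 1.0)
    texture_id: Optional[int] = None      # ID of the texture to map, if any

    def triangle(self, face_index: int) -> List[Vec3]:
        """Resolve one face's indices into its three vertex positions."""
        return [self.positions[i] for i in self.faces[face_index]]


# A unit square split into two triangles that share vertices 0 and 2.
quad = SurfacePart(
    positions=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)],
    normals=[(0.0, 0.0, 1.0)] * 4,
    uvs=[(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    faces=[(0, 1, 2), (0, 2, 3)],
    texture_id=7,
)
print(quad.triangle(1))
```

Indexing lets shared vertices be stored once, and the texture reference is kept as an ID rather than as the texture itself, in line with the separate texture storage described in the text.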

[0026] The character motion data storage unit 19 stores, managed by address, the CG character's body motion data and body motion pattern data (transition graph data for body motions), facial expression data and expression pattern data, and lip motion data and lip motion pattern data.

[0027] As in ordinary CG character animation, body motion data is time-series data consisting of: the translation of the root, the representative point of the body in three-dimensional space, which represents movement of the whole body; the rotation angles about the three coordinate axes of three-dimensional space, or the components of a vector representing the rotation axis together with the rotation angle about that vector, which represent the posture of the whole body; and the rotation angles about the coordinate axes of the local coordinate system defined at each joint. The CG character shape data is transformed by these transformations at the root position and in the local coordinate systems at the joints to generate the character's position, orientation, and body pose at each time, and three-dimensional drawing is performed; doing this continuously in time realizes CG animation. When keyframe animation is used, body motion data is not held for every frame. Instead, the time-series data is sparse in time and the motion state at intermediate times is obtained by interpolation, so in that case the body motion data is the sparse time series of the translation and angle values described above.
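The keyframe case can be sketched as follows. The text says only that intermediate states are obtained by interpolation; linear interpolation is assumed here for simplicity, and the function name and data layout are hypothetical. Interpolating rotation angles component by component is itself a simplification.

```python
from bisect import bisect_right


def interpolate(keys, t):
    """Return the motion values at time t from sparse keyframes.

    keys: list of (time, values) sorted by time, where values is a list of
          numbers (for example root translation followed by joint angles).
    Times outside the keyframe range are clamped to the first or last key.
    Linear interpolation is an assumption made for this sketch.
    """
    times = [k[0] for k in keys]
    if t <= times[0]:
        return list(keys[0][1])
    if t >= times[-1]:
        return list(keys[-1][1])
    i = bisect_right(times, t)            # index of the first key after t
    (t0, v0), (t1, v1) = keys[i - 1], keys[i]
    a = (t - t0) / (t1 - t0)              # blend factor between the two keys
    return [x0 + a * (x1 - x0) for x0, x1 in zip(v0, v1)]


# Two keyframes: [root x translation, elbow angle in degrees] (made-up values).
keys = [(0.0, [0.0, 0.0]), (1.0, [2.0, 90.0])]
print(interpolate(keys, 0.5))
```

Storing only the keys and computing the in-between frames at draw time is what allows the motion data to be sparse in time.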

[0028] Body motion pattern data is finite-state graph data, as shown in FIG. 6(b), consisting of the relationships that define which motions can follow a given motion, together with entity motion information (motion ID, data type, the address and frame count of each actual body motion, and the probability of each transition). For example, FIG. 6(b) shows that transitions are possible from the body motion data representing the standard state to motion A, motion C, motion D, and motion E. When some predetermined event occurs while the CG character is in the standard state, a motion is selected from motions A, C, D, and E by a selection process based on the transition probabilities recorded in the entity motion information, and the actual motion data is retrieved by its address.
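The probability-based selection can be illustrated with a weighted random choice over the outgoing edges of the current state. The probabilities, addresses, and frame counts below are made-up values, and the table layout is an assumption; only the idea of choosing among motions A, C, D, and E by transition probability and then looking the motion up by address comes from the text.

```python
import random

# Outgoing transitions per state: (motion ID, transition probability).
# The probabilities are illustrative and sum to 1.
TRANSITIONS = {
    "standard": [("A", 0.4), ("C", 0.3), ("D", 0.2), ("E", 0.1)],
}

# Motion ID -> (address of the actual motion data, frame count); made-up values.
MOTION_TABLE = {
    "A": (0x1000, 30),
    "C": (0x2000, 45),
    "D": (0x3000, 20),
    "E": (0x4000, 60),
}


def next_motion(state, rng=random):
    """Pick the next motion from `state` according to the transition probabilities."""
    targets, weights = zip(*TRANSITIONS[state])
    motion_id = rng.choices(targets, weights=weights, k=1)[0]
    address, frames = MOTION_TABLE[motion_id]
    return motion_id, address, frames


# Example: an event arrives while the character is in the standard state.
rng = random.Random(0)
print(next_motion("standard", rng))
```

Passing a seeded `random.Random` instance makes the selection reproducible, which can be convenient when testing a transition graph.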

[0029] In this embodiment, transitions in the body motion pattern data after a call starts are described as being triggered by events consisting of estimation results from the emotion estimation unit 10, such as a normal state, laughing state, crying state, angry state, troubled state, or convinced state, and of inputs from the motion and expression input unit 16. The invention can be implemented in the same way when transitions are triggered by more complex estimation results or by events from other input units.

[0030] Body motions depend on the structure of the shape data (skeletal structure, hierarchical structure); for example, the motion of a six-legged insect cannot be applied to a bipedal human. Because a body motion cannot be applied to every shape, the data type in the entity motion information is used to classify motions according to the shape data to which they apply. Body motion pattern data can also be consolidated: a new body motion pattern is provided in an upper layer, and this upper-layer pattern manages the addresses of multiple body motion patterns, so that they are handled as a single upper-layer body motion pattern. This is very effective, for example, when switching between body motion patterns as in a scene change.

[0031] Facial expression motion data is data for generating the CG character's facial expressions, as in FIG. 6(a). Expressions are generated with commonly used facial animation techniques, for example by deforming the shape of the face or by swapping the face texture. When the face shape is deformed, the expression motion data is time-series data of the displacements of the vertex coordinates in the face shape data that correspond to end points of the eyebrows, eyes, mouth, and other features that form the expression. These displacements can also be computed by simulation based on a facial muscle model. When a vertex to be transformed spans multiple transformation systems, an envelope technique is also used: the vertex is given a weight for each transformation, it is transformed once by each transformation system to produce multiple candidate vertices, and the result is their weighted average.
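The envelope technique amounts to a weighted average of the same vertex transformed by each influencing transformation. The sketch below is a minimal illustration under that reading; the function name is hypothetical, transformations are modeled as plain Python callables, and the weights are normalized so they need not sum to 1.

```python
def envelope_blend(vertex, transforms, weights):
    """Blend one vertex across several transformations by weighted average.

    vertex:     (x, y, z) tuple in the rest pose
    transforms: callables mapping an (x, y, z) tuple to a transformed tuple
    weights:    one non-negative weight per transformation
    """
    total = sum(weights)
    blended = [0.0, 0.0, 0.0]
    for transform, weight in zip(transforms, weights):
        moved = transform(vertex)          # vertex as seen by this transformation
        for axis in range(3):
            blended[axis] += (weight / total) * moved[axis]
    return tuple(blended)


def shift_x(v):
    """Example transformation: translate by 2 along x."""
    return (v[0] + 2.0, v[1], v[2])


def identity(v):
    """Example transformation: leave the vertex unchanged."""
    return v


# A vertex influenced equally by a moving and a stationary transformation
# ends up halfway between the two candidate positions.
print(envelope_blend((1.0, 0.0, 0.0), [shift_x, identity], [0.5, 0.5]))
```

With equal weights the vertex lands midway between its two transformed positions, which is what smooths the surface where two transformation systems meet.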

[0032] In FIG. 6(a), each emotion is expressed by deforming shapes such as the shape of the eyes, the size of the nose, the ears, and the outline of the face. When expressions are produced by swapping textures, the expression data consists of textures of a laughing face, a crying face, and so on, along with textures for the intermediate stages. Expression pattern data is transition graph data for these expression data. Like the transition graph data for body motions, it consists of a finite-state graph defining which expression data can follow which, together with entity expression information (expression ID, data type, the address and frame count of each actual expression motion data, and the probability of each transition). The example in FIG. 6(a) shows that a transition to another face is possible only by way of the normal face, and the destination is selected based on the transition probabilities in the entity expression information. As with body motions, the data type in the entity expression information identifies whether the data is an expression motion or a texture, and the shape to which it applies. For example, the first digit of the data type is used to distinguish expression motion from texture, and the second and higher digits are an identification number for the shape. As with body motion pattern data, multiple expression patterns can be consolidated into one by providing upper-layer expression pattern data.
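The digit-based data type can be decoded with integer arithmetic. This sketch assumes that the "first digit" is the ones place of a decimal number and that 0 means expression motion and 1 means texture; neither the digit order nor the code values are stated in the text, and the function name is hypothetical.

```python
# Assumed meaning of the first (ones) digit; the actual codes are not given in the text.
KIND_BY_CODE = {0: "expression_motion", 1: "texture"}


def decode_data_type(data_type):
    """Split a data type into (kind, shape ID).

    Ones digit        -> expression motion or texture (assumed coding)
    Remaining digits  -> identification number of the applicable shape
    """
    kind_code = data_type % 10
    shape_id = data_type // 10
    return KIND_BY_CODE[kind_code], shape_id


print(decode_data_type(231))
print(decode_data_type(50))
```

Packing both fields into one number keeps the entity expression information compact, at the cost of fixing how many kinds a single digit can distinguish.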

[0033] In this embodiment, transitions in the expression pattern data after a call starts are described as being triggered by events consisting of the emotion estimation unit 10's estimate of a normal, laughing, crying, angry, or troubled state and of inputs from the motion and expression input unit 16. The invention can be implemented in the same way when transitions are triggered by more complex estimation results or by events from other input units.

[0034] Lip motion data is handled in the same way as expression motion data and expression pattern data: it can be realized either by deforming the shape of the mouth or by swapping textures. However, lip motion data depends on the voice analysis performed. If lip motion is generated from the result of sound intensity analysis, as described later, the stored motion data simply corresponds to the degree of mouth opening (see FIG. 5(a)). If processing up to phoneme analysis is available, for example vowel analysis and analysis of the "n" sound, then shape deformation data for generating the lip shape matching each sound, or texture data for those lips, is stored as the motion data (see FIG. 5(b)).

【００３５】口唇パターンデータは、以上のような何種
類かの口唇動作データの集合を表すもので、実体口唇情
報（各口唇ID、データ種別、各実体口唇動作アドレスと
フレーム数）からなるデータである。各実体口唇IDは、
例えば、図５（ａ）のように音強度によって制御を行う
場合ならば、レベルに相当するものを識別子としたもの
で、０をレベル０、・・・、３をレベル３のように与え
た識別子、図５（ｂ）のように音素解析に基づくなら
ば、「ん」、「あ」、・・・、「お」に相当する識別子
を各々０、１、・・・、５と与えた識別子である。さら
に、音強度解析と音素解析を組み合わせることも可能
で、同じ「あ」音でも音強度の大きな「あ」や小さな
「あ」を設ける。この場合、図５（ｂ）の縦方向に図５
（ａ）のレベルが並んだものになり、口唇IDは２次元の
識別子として定義すれば良い。The lip pattern data represents a set of several kinds of lip motion data as described above, and consists of actual lip information (each lip ID, the data type, and the address and number of frames of each actual lip motion).
Each actual lip ID is assigned as follows. When control is based on sound intensity as in FIG. 5(a), the identifier corresponds to the level: 0 for level 0, ..., 3 for level 3. When it is based on phoneme analysis as in FIG. 5(b), the identifiers 0, 1, ..., 5 are assigned to "n", "a", ..., "o", respectively. Sound intensity analysis and phoneme analysis can also be combined, so that the same "a" sound has both a loud "a" and a quiet "a". In that case the levels of FIG. 5(a) are arranged along the vertical direction of FIG. 5(b), and the lip ID can be defined as a two-dimensional identifier.
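The two-dimensional lip ID can be illustrated with a short sketch that pairs a phoneme identifier with a sound-intensity level. The names below are hypothetical, and the order of the elided middle phonemes (i, u, e) is an assumption based on the usual Japanese vowel order; the source only gives "n" = 0, "a" = 1, and "o" = 5.

```python
# Phoneme IDs 0..5 as in FIG. 5(b); the i, u, e positions are assumed.
PHONEMES = ("n", "a", "i", "u", "e", "o")
# Sound-intensity levels 0..3 as in FIG. 5(a).
LEVEL_COUNT = 4


def lip_id(phoneme, level):
    """Return the two-dimensional lip ID (phoneme_id, level)."""
    if not 0 <= level < LEVEL_COUNT:
        raise ValueError(f"level out of range: {level}")
    return (PHONEMES.index(phoneme), level)


def flat_index(lip):
    """Map a two-dimensional lip ID to a row-major index into a motion table."""
    phoneme_id, level = lip
    return phoneme_id * LEVEL_COUNT + level
```

A flat index like this is only one possible way to look up the stored lip motion data; a dictionary keyed by the (phoneme_id, level) pair would serve equally well.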

【００３６】背景データ保存部２０は、CGキャラクタを
表示した時の背景のデータとして、背景の形状データも
しくは背景の画像と、それに対応するサムネール画像を
アドレス管理して保存する。背景の形状データは、CGキ
ャラクタの形状データと同様で、形状として背景となる
物体である。背景の画像データは、例えば、空や遠景の
画像データで、背景の物体と組合せて用いることもでき
る。なお、背景の物体の形状データにテクスチャマッピ
ングで模様などを付ける場合には、使用するテクスチャ
データ保存部２１でのアドレス又はそれに対応する識別
子のIDが明示されている。The background data storage unit 20 stores, under address management, background shape data or background images together with their corresponding thumbnail images, as the background data used when the CG character is displayed. Background shape data is like the CG character shape data: it describes an object that forms the background. Background image data is, for example, an image of the sky or a distant view, and can be combined with background objects. When a pattern is applied to the shape data of a background object by texture mapping, the address in the texture data storage unit 21 of the texture to be used, or the ID corresponding to it, is specified.

【００３７】テクスチャデータ保存部２１は、３次元描
画部１４でテクスチャマッピングを行う際に用いる、CG
キャラクタが身にまとっている衣類などのテクスチャの
画像データや、背景で使う物体のテクスチャマッピング
用の画像データがアドレス管理されて保存されている。The texture data storage unit 21 stores, under address management, the image data used when the three-dimensional drawing unit 14 performs texture mapping: textures such as the clothing worn by the CG character, and textures for objects used in the background.

【００３８】音楽データ保存部２２は、音楽データがア
ドレス管理されて保存されている。これは、送信相手か
らの着信時に鳴動させて、合図として用いるものであ
る。データ管理部３は、保存データの管理と、設定デー
タの保存管理、設定データの通知を行うものである。こ
こでは、まず、キャラクタ形状データ保存部１８、キャ
ラクタ動作データ保存部１９、背景データ保存部２０、
テクスチャデータ保存部２１、音楽データ保存部２２に
保存されたデータの管理について説明する。The music data storage unit 22 stores music data under address management. The music is played when a call arrives from the other party and serves as the ringing signal. The data management unit 3 manages the stored data, stores and manages the setting data, and notifies other units of the setting data. First, the management of the data stored in the character shape data storage unit 18, the character motion data storage unit 19, the background data storage unit 20, the texture data storage unit 21, and the music data storage unit 22 is described.

【００３９】図３はデータ管理部３が保持しているテー
ブルの一つで、CGキャラクタデータ管理テーブル３ａを
示したものである。CGキャラクタデータは、CGキャラク
タの名前、CGキャラクタ形状データの実体のあるキャラ
クタ形状データ保存部１８でのアドレス、CGキャラクタ
形状データに明示された衣類などのテクスチャに対し
て、利用者の指定に基づき交換を行う際の衣類テクスチ
ャデータのテクスチャデータ保存部２１での交換前の衣
類テクスチャのアドレスと交換後の衣類テクスチャデー
タのアドレス（複数記述可能）、キャラクタ動作データ
保存部１９に保存された表情パターンデータの通話開始
前と通話開始後の２つアドレスと、口唇動作パターンの
アドレス、キャラクタ形状データ保存部１８に保存され
たサムネール画像のアドレスからなり、それらをCGキャ
ラクタIDによる識別子でテーブル化したものがCGキャラ
クタデータ管理テーブル３ａである。FIG. 3 shows one of the tables held by the data management unit 3, the CG character data management table 3a. Each CG character data entry consists of: the name of the CG character; the address in the character shape data storage unit 18 where the CG character shape data is located; for textures such as clothing that are specified in the CG character shape data and are to be exchanged at the user's request, the address in the texture data storage unit 21 of the clothing texture before exchange and the address of the clothing texture data after exchange (several may be listed); the two addresses of the facial expression pattern data stored in the character motion data storage unit 19, one for before the call starts and one for after; the address of the lip motion pattern; and the address of the thumbnail image stored in the character shape data storage unit 18. The CG character data management table 3a tabulates these entries using the CG character ID as the identifier.

【００４０】その他の保存データの管理用のテーブルと
しては、背景データ管理テーブル、動作パターン管理テ
ーブル、音声管理テーブルの３種類あり、CGキャラクタ
データ管理テーブル３ａを加えて合計４種類ある。背景
データ管理テーブルは、背景の物体や遠景の画像データ
の名前と、背景データ保存部２０でのアドレスを背景ID
による識別子でテーブル化したものである。動作パター
ン管理テーブルは、身体動作パターンデータの名前と、
キャラクタ動作データ保存部１９でのアドレスを動作パ
ターンIDによる識別子でテーブル化したものである。音
楽データ管理テーブルは音楽データの名前と、音楽デー
タ保存部２２でのアドレスを音楽IDによる識別子でテー
ブル化したものである。There are three other tables for managing stored data, namely the background data management table, the motion pattern management table, and the voice management table, making four in total with the CG character data management table 3a. The background data management table tabulates the names of background objects and distant-view image data and their addresses in the background data storage unit 20, using the background ID as the identifier. The motion pattern management table tabulates the names of body motion pattern data and their addresses in the character motion data storage unit 19, using the motion pattern ID as the identifier. The music data management table tabulates the names of music data and their addresses in the music data storage unit 22, using the music ID as the identifier.

【００４１】（設定時の動作）通信部１には、図４
（ａ）に示したように、通信者管理テーブル１ａが保存
されている。通信者管理テーブル１ａは、通信相手を送
信者ID、電話番号、氏名、表示モードの内容を管理する
ものである。表示モードは、CGキャラクタを表示しない
で通常の音声通信で通話する場合の、非表示モード、通
信相手のみをCGキャラクタとして表示して、バーチャル
テレビ通信として通話を行う相手表示モード、相手だけ
でなく利用者自身もCGキャラクタとして表示して、バー
チャルテレビ通信として通話を行う本人同時表示モード
があり、これを識別子により管理する。本実施例では、
非表示モードを０、相手表示モードを１、本人同時表示
モードを２として識別子を割り当てるものとして説明す
る。(Operation at the time of setting)
As shown in FIG. 4(a), the communication unit 1 stores the correspondent management table 1a. The correspondent management table 1a manages each communication partner by sender ID, telephone number, name, and display mode. There are three display modes: a non-display mode, in which no CG character is displayed and the call proceeds as ordinary voice communication; a partner display mode, in which only the communication partner is displayed as a CG character and the call proceeds as virtual television communication; and a simultaneous display mode, in which the user as well as the partner is displayed as a CG character during virtual television communication. The modes are managed by identifier.
In this embodiment, the identifiers are assigned as 0 for the non-display mode, 1 for the partner display mode, and 2 for the simultaneous display mode.
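A minimal sketch of one row of the correspondent management table 1a and the three display-mode identifiers follows. The class and field names and the helper characters_shown are hypothetical and chosen only for illustration; only the numeric values 0, 1, and 2 come from the description.

```python
from dataclasses import dataclass
from enum import IntEnum


class DisplayMode(IntEnum):
    NON_DISPLAY = 0    # ordinary voice call, no CG character
    PARTNER = 1        # only the partner is shown as a CG character
    SIMULTANEOUS = 2   # partner and user are both shown


@dataclass
class Correspondent:
    sender_id: int     # 0 is reserved for the user
    phone_number: str
    name: str
    display_mode: DisplayMode


def characters_shown(mode):
    """Return which CG characters are drawn for a given display mode."""
    mode = DisplayMode(mode)  # accepts either the enum member or its integer value
    if mode == DisplayMode.NON_DISPLAY:
        return []
    if mode == DisplayMode.PARTNER:
        return ["partner"]
    return ["partner", "user"]
```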

【００４２】なお、送信者IDの番号０は、本人を示すも
のとして予め定められているものとする。なお、本実施
例では電話通信を基本として考えているため、電話番号
により送着信の管理が行われているものとして話を進め
るが、例えば、インターネットならばTCP/IPに基づくIP
アドレスや利用者に対するメールアドレスなどでも良
い。これらは通信インフラに依存して決まる、通信者特
定を行うための識別子なので、このような条件を満たす
識別子なら全てに対応可能である。The sender ID number 0 is predetermined to denote the user. Because this embodiment assumes telephone communication, the description proceeds on the basis that outgoing and incoming calls are managed by telephone number. On the Internet, for example, an IP address under TCP/IP or the user's e-mail address may be used instead. These are identifiers for specifying a correspondent that depend on the communication infrastructure, so any identifier satisfying this condition can be supported.

【００４３】図４（ａ）のCGデータ管理テーブル３ｂ
は、データ管理部３に保存されたテーブルで通信相手に
対するCGデータの設定を保存管理するためのテーブルで
ある。送信者に対して決定した、CGキャラクタデータ管
理テーブル３ａにおけるCGキャラクタID、背景データ管
理テーブルにおける背景ID、動作パターン管理テーブル
における通話開始前と通話開始後の身体動作パターンID
からなる項目を送信者IDによって管理する。The CG data management table 3b in FIG. 4(a) is a table stored in the data management unit 3 for storing and managing the CG data settings for each communication partner. It manages by sender ID the items decided for each sender: the CG character ID in the CG character data management table 3a, the background ID in the background data management table, and the body motion pattern IDs in the motion pattern management table for before and after the call starts.

【００４４】図４（ａ）の音声管理テーブル３ｃも、デ
ータ管理部３に保存されたテーブルで、送信者に対して
決定した、通信相手に対する音声変換数値パラメータ、
着信時の音楽データIDからなる項目を送信者IDによって
管理するためのものである音声変換数値パラメータは、
音声変換部６で用いるもので、バンドパスフィルタによ
って音声変換を掛ける場合は、各バンドパスフィルタに
割り振った識別子である。例えば、０はフィルタなし、
１は１kHz以下のフィルタ、２は１〜５kHzのフィルタ、
３は５kHz以上のフィルタというように識別子を割り振
る。The voice management table 3c in FIG. 4(a) is also a table stored in the data management unit 3. It manages by sender ID the items decided for each communication partner: the voice conversion numerical parameter and the music data ID used on incoming calls. The voice conversion numerical parameter is used by the voice conversion unit 6. When voice conversion is performed with band-pass filters, it is the identifier assigned to each band-pass filter. For example, 0 means no filter, 1 a filter for 1 kHz and below, 2 a filter for 1 to 5 kHz, and 3 a filter for 5 kHz and above.
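The identifier-to-filter mapping in this example can be sketched as a lookup table. The sketch assumes each band is a pass band with inclusive edges, which the description does not state; the names FILTER_BANDS_HZ and passes are hypothetical, and no actual filtering of audio samples is shown.

```python
import math

# Voice conversion parameter ID -> assumed pass band in Hz (None = no filter).
FILTER_BANDS_HZ = {
    0: None,
    1: (0.0, 1000.0),
    2: (1000.0, 5000.0),
    3: (5000.0, math.inf),
}


def passes(filter_id, frequency_hz):
    """Return True if a component at frequency_hz lies in the selected band."""
    band = FILTER_BANDS_HZ[filter_id]
    if band is None:
        return True
    low, high = band
    return low <= frequency_hz <= high
```

Because only the ID is stored per sender, the same table could hold parameter sets for a different conversion method, such as pitch conversion, without changing the rest of the data layout.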

【００４５】このように、変換に必要なパラメータを識
別子化したものなので、変換方法（例えば、ピッチ変換
によって音声変換を行う場合でも、変換に必要なパラメ
ータの組を識別子化しておけば良い）には依存しない。
尚、前記音声変換数値パラメータは、音声の高低を決定
する識別子であり利用者が設定を変更することによりボ
イスチェンジャーのような効果を有する。また音楽デー
タIDは、いわゆる着信メロディーを決定する識別子とな
る。Because the parameters needed for conversion are expressed as identifiers in this way, the scheme does not depend on the conversion method (for example, even when voice conversion is performed by pitch conversion, the set of parameters needed for the conversion need only be assigned an identifier). The voice conversion numerical parameter is an identifier that determines the pitch of the voice, and by changing the setting the user obtains an effect like that of a voice changer. The music data ID is the identifier that determines the so-called ringtone melody.

【００４６】設定時の動作について図４（ｂ）に基づい
て説明する。キャラクタ背景選択入力部２に備えられた
当該設定状態移行入力部を利用者が操作すると、設定可
能状態に移行することがデータ管理部３に通知され、デ
ータ管理部３は通信部１に保存された通信者管理テーブ
ル１ａの内容を読み出し、３次元描画部１４へ送る（Ｓ
４０１）。３次元描画部１４では予め保持した設定画面
データに基づき、送られてきた通信者管理テーブル１ａ
の内容を反映した設定画面を生成して表示部１５に設定
画面を表示する。キャラクタ背景選択入力部２で、通信
者の選択をおこない（Ｓ４０２）、その通信者に対する
前述の識別子に従った表示モードを入力する。選択が非
表示モードを表す０の場合（Ｓ４０３）は、設定は終了
する。The operation at the time of setting is described with reference to FIG. 4(b). When the user operates the setting state transition input section provided in the character background selection input unit 2, the data management unit 3 is notified of the transition to the settable state. The data management unit 3 reads the contents of the correspondent management table 1a stored in the communication unit 1 and sends them to the three-dimensional drawing unit 14 (S401). Based on setting screen data held in advance, the three-dimensional drawing unit 14 generates a setting screen reflecting the contents of the correspondent management table 1a and displays it on the display unit 15. The user selects a correspondent with the character background selection input unit 2 (S402) and inputs the display mode for that correspondent using the identifiers described above. If the selection is 0, the non-display mode (S403), the setting ends.

【００４７】次に、表示モードが相手のみをCGキャラク
タとして表示する表示モード１、又は利用者本人もCGキ
ャラクタとして表示する表示モード２の場合は、その結
果がデータ管理部３を介して通信部１と３次元描画部１
４に通知される。通信部１では、通信者管理テーブル１
ａに選択結果の表示モードを記入保存する。３次元描画
部１４では、図３に示したような予め定めておいたキャ
ラクタ選択設定画面や、衣類テクスチャ設定画面、身体
動作パターン設定画面を順次生成して表示部１５で表示
する。If the display mode is display mode 1, in which only the partner is displayed as a CG character, or display mode 2, in which the user is also displayed as a CG character, the result is notified to the communication unit 1 and the three-dimensional drawing unit 14 via the data management unit 3. The communication unit 1 records the selected display mode in the correspondent management table 1a. The three-dimensional drawing unit 14 sequentially generates the predetermined character selection setting screen, as shown in FIG. 3, the clothing texture setting screen, and the body motion pattern setting screen, and displays them on the display unit 15.

【００４８】なお、キャラクタ選択画面では、CGキャラ
クタデータ管理テーブル３ａに示されたサムネールのア
ドレスや名前を基に、図３のようにCGキャラクタの画像
や名前を描画する。そして、CGキャラクタ選択設定画面
や、衣類テクスチャ設定画面、身体動作パターン設定画
面が順次表示されるが、そのうちキャラクタ背景選択入
力部２で選択入力したデフォルトや特定通信者に対する
前記CGキャラクタ選択設定画面での選択結果、前記身体
動作パターン設定画面での選択結果は、データ管理部３
に保存されたCGデータ管理テーブル３ｂの該当欄にその
IDが記録される。また、前記衣類テクスチャ設定画面で
の選択結果は、データ管理部３に保存されたCGキャラク
タ管理テーブル３ａの該当欄に記録される。On the character selection screen, the images and names of the CG characters are drawn as in FIG. 3, based on the thumbnail addresses and names in the CG character data management table 3a. The CG character selection setting screen, the clothing texture setting screen, and the body motion pattern setting screen are displayed in sequence. The selections made with the character background selection input unit 2 on the CG character selection setting screen and on the body motion pattern setting screen, whether for the default or for a specific correspondent, are recorded as IDs in the corresponding fields of the CG data management table 3b stored in the data management unit 3. The selection made on the clothing texture setting screen is recorded in the corresponding field of the CG character data management table 3a stored in the data management unit 3.

【００４９】なお、身体動作パターンの選択は、通話開
始前のものと通話開始後のものの２種類を選択し、その
際には、動作パターン管理テーブルに記載された名前を
設定画面に表示することも可能である。この表示によ
り、利用者は身体動作のイメージが掴み易くなるので選
択がし易い。例えば、マンボダンスとか、ワルツダン
ス、アナウンサーの動き、有名タレントの動きなどであ
る（Ｓ４０４）。Two body motion patterns are selected, one for before the call starts and one for after. The names listed in the motion pattern management table can be displayed on the setting screen at this point. The names help the user picture the body motion and so make the selection easier; examples are a mambo dance, a waltz, an announcer's movements, or a famous celebrity's movements (S404).

【００５０】同様に、音声選択入力部４によって、音声
変換パラメータや音楽データの設定入力を行うが、その
入力モードへの移行は音声選択入力部４に予め定められ
た当該設定状態移行入力部を利用者が操作すると、その
移行が通信部１を経由してデータ管理部３を介して、３
次元描画部１４に通知される。３次元描画部１４は、予
め定められた設定画面生成して表示部１５に表示する。
表示された設定画面に基づき、利用者は音声選択入力部
４によって、音声変換パラメータや音楽データを選択入
力する。入力された選択結果は、データ管理部３に保存
された音声管理テーブル３ｃに記録される（Ｓ４０
４）。Similarly, the voice conversion parameter and music data are set with the voice selection input unit 4. When the user operates the setting state transition input section provided in the voice selection input unit 4, the transition to this input mode is notified to the three-dimensional drawing unit 14 by way of the communication unit 1 and the data management unit 3. The three-dimensional drawing unit 14 generates a predetermined setting screen and displays it on the display unit 15. Using the displayed setting screen, the user selects and inputs the voice conversion parameter and music data with the voice selection input unit 4. The selection is recorded in the voice management table 3c stored in the data management unit 3 (S404).

【００５１】相手表示モードの場合は次に背景の選択設
定に移行する（Ｓ４０５）。また、本人同時表示モード
が選択された場合は、利用者本人に対するCGキャラク
タ、衣類テクスチャ、動作パターンの選択入力をキャラ
クタ背景選択入力部２によって上記と同様に行った後
（Ｓ４０６）、背景の選択に移行する。In the partner display mode, the process then moves to background selection (S405). If the simultaneous display mode has been selected, the CG character, clothing texture, and motion pattern for the user are first selected with the character background selection input unit 2 in the same way as above (S406), and the process then moves to background selection.

【００５２】背景の選択についても、予め定められた背
景設定画面が表示され、キャラクタ背景選択入力部２に
よって背景を選択する（Ｓ４０７）。選択結果は、デー
タ管理部３に保存されたCGデータ管理テーブル３ｂに記
憶される。Regarding the selection of the background, a predetermined background setting screen is displayed, and the background is selected by the character background selection input unit 2 (S407). The selection result is stored in the CG data management table 3b stored in the data management unit 3.

【００５３】最後に、上記のCGキャラクタの設定および
身体動作パターンの設定の際に、表情パターンデータの
中の特定の表情動作データのアドレス、身体動作パター
ンデータの中の特定の身体動作データアドレスを動作表
情入力部１６に通知する。動作表情入力部１６では、通
知された身体動作データのアドレスと表情動作データの
アドレスを保持し、動作表情入力部１６に予め用意され
た入力ボタンと対応づけを行う。その入力ボタンを利用
者が押したならば、それに対応する身体動作データ又は
表情データのアドレスがデータ管理部３に通知され、そ
の通知結果は身体動作データのアドレスならば身体動作
制御部１２に、表情動作データのアドレスならば表情制
御部１３に通知される。入力ボタンを複数用意すること
で、保持できる身体動作データのアドレス、表情動作デ
ータのアドレスは複数保持できる。Finally, when the CG character and the body motion pattern are set as above, the address of specific facial expression motion data within the facial expression pattern data and the address of specific body motion data within the body motion pattern data are notified to the motion facial expression input unit 16. The motion facial expression input unit 16 holds the notified addresses and associates each with an input button provided on the unit in advance. When the user presses an input button, the corresponding body motion data or facial expression motion data address is notified to the data management unit 3, which forwards it to the body motion control unit 12 if it is a body motion data address, or to the facial expression control unit 13 if it is a facial expression motion data address. Providing several input buttons allows several body motion data addresses and facial expression motion data addresses to be held.
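The button-to-address association can be sketched as a small dispatch table. This is an illustration under assumptions: the class and method names, the "body" and "expression" tags, and the destination labels are hypothetical, and the relay through the data management unit 3 is collapsed into a single return value.

```python
class MotionExpressionInput:
    """Holds addresses bound to buttons and reports where each press is routed."""

    def __init__(self):
        self._bindings = {}  # button name -> (kind, address)

    def bind(self, button, kind, address):
        """Associate a button with a body motion or facial expression data address."""
        if kind not in ("body", "expression"):
            raise ValueError(f"unknown kind: {kind}")
        self._bindings[button] = (kind, address)

    def press(self, button):
        """Return (destination unit, address) for the pressed button."""
        kind, address = self._bindings[button]
        if kind == "body":
            return "body_motion_control_12", address
        return "expression_control_13", address
```

With two buttons bound, for example one to a body motion address and one to a facial expression address, each press is routed to the matching control unit.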

【００５４】また、通話開始前と通話開始後の身体動作
データのアドレス、表情動作データのアドレスは、明示
的に判るようにしておく。なお、本実施例ではボタン入
力として記述したが、特定できる入力部（例えばキーボ
ード、マウスなど）ならば、いかなるものでも良い。従
って、利用者は自身のキャラクタを選択できると共に、
通話相手のキャラクタをも自由に選択することができ、
また、利用者側の通話装置がバーチャルテレビ通話に必
要なデータを備えているため、通話相手が必ずしもバー
チャル通話装置を用いていなくても利用者はバーチャル
テレビ通話を行うことができる。The body motion data addresses and facial expression motion data addresses for before and after the call starts are kept explicitly distinguishable. Although this embodiment describes button input, any input unit that can identify the selection (for example, a keyboard or a mouse) may be used. The user can therefore choose his or her own character and can also freely choose the character of the other party. In addition, because the user's own device holds the data needed for a virtual television call, the user can make a virtual television call even if the other party is not using a virtual television call device.

【００５５】尚、以上のような、グラフィカルな設定
は、PCでは一般に行われることで、既存のソフト技術に
よって実現可能である。（送着信時の動作）送信時には、通信部１で電話番号を
入力し、保存された通信者管理テーブル１ａに記録され
た電話番号の欄の内容と照合することで、送信者のIDと
表示モードを特定する。着信時は、通常、着信相手の電
話番号が通話前に通知されるので、その電話番号と通信
者管理テーブル１ａの電話番号の欄を照合することで、
送信者IDと表示モードを特定する。なお、通信部１は通
常の音声通信機能は保持しているものとする（携帯電話
の場合ならば、いわゆるベースバンド処理など）。Graphical settings like those described above are common on PCs and can be realized with existing software technology. (Operation at the time of sending and receiving) When placing a call, the telephone number is entered on the communication unit 1 and checked against the telephone number field of the stored correspondent management table 1a to identify the sender ID and the display mode. When receiving a call, the caller's telephone number is normally notified before the call begins, so that number is checked against the telephone number field of the correspondent management table 1a to identify the sender ID and the display mode. The communication unit 1 is assumed to have the ordinary voice communication functions (for a mobile phone, so-called baseband processing and the like).
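The telephone-number lookup described here can be sketched as follows. The table rows, names, and numbers are made up for illustration, and stripping non-digit characters before comparison is an added assumption, not something the description specifies.

```python
# Hypothetical rows of the correspondent management table 1a:
# (sender_id, phone_number, name, display_mode)
CORRESPONDENTS = [
    (1, "0312345678", "Alice", 1),
    (2, "0698765432", "Bob", 0),
]


def normalize(number):
    """Keep digits only so that '03-1234-5678' matches '0312345678' (assumed rule)."""
    return "".join(ch for ch in number if ch.isdigit())


def identify(phone_number, table=CORRESPONDENTS):
    """Return (sender_id, display_mode) for a known number, or None if unregistered."""
    wanted = normalize(phone_number)
    for sender_id, number, _name, display_mode in table:
        if normalize(number) == wanted:
            return sender_id, display_mode
    return None
```

The same function serves outgoing and incoming calls, since in both cases the only input is the partner's telephone number.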

【００５６】特定した表示モードが非表示モードの場合
は、一般に行われている音声通話処理を行う。つまり、
通信相手との通信承認が得られた後、音声データが送信
者から送られてきた場合は、音声処理部５により、デコ
ード処理などの通常行われる音声処理を行って、音声変
換部６を通過して、音声出力部７に送り、音声を出力す
る。また、利用者本人の音声は、音声入力部８から入力
して、音声処理部５で、通常行われる音声データの圧縮
などの音声処理を行って、通信部１を介して通信相手に
送信する。When the identified display mode is the non-display mode, ordinary voice call processing is performed. That is, after communication with the partner has been established, voice data received from the sender undergoes the usual voice processing, such as decoding, in the voice processing unit 5, passes through the voice conversion unit 6, and is sent to the voice output unit 7, which outputs the voice. The user's own voice is input from the voice input unit 8, undergoes the usual voice processing, such as voice data compression, in the voice processing unit 5, and is transmitted to the partner via the communication unit 1.

【００５７】特定した表示モードが相手のみをCGキャラ
クタとして表示する、相手表示モードの場合についてそ
の動作を説明するが、動作は通話開始前と通話開始後に
分けられ、通話開始は通信部１によってその開始をデー
タ管理部３に知らせる。The operation is now described for the case where the identified display mode is the partner display mode, in which only the partner is displayed as a CG character. The operation is divided into the period before the call starts and the period after, and the communication unit 1 notifies the data management unit 3 when the call starts.

【００５８】送着信時の通話開始前において、前述のよ
うに送信相手の電話番号が特定できるので、通信者管理
テーブル１ａから通信相手の送信者IDを通信部１で特定
し、送信者IDをデータ管理部３に送る。データ管理部３
は保存しているCGデータ管理テーブル３ｂから、送信者
IDに対応するCGキャラクタID、背景ID、動作パターンID
（通話前と後の身体動作パターンの２つのID）を特定す
る。送られてきた送信者IDに対応するものが、CGデータ
管理テーブル３ｂにない場合は、デフォルト設定された
CGキャラクタID、背景ID、動作パターンID（通話前と後
の身体動作パターンの２つのID）を特定する。Before the call starts, whether outgoing or incoming, the partner's telephone number can be identified as described above, so the communication unit 1 identifies the partner's sender ID from the correspondent management table 1a and sends it to the data management unit 3. From the stored CG data management table 3b, the data management unit 3 identifies the CG character ID, background ID, and motion pattern IDs (two IDs, for the body motion patterns before and after the call starts) corresponding to the sender ID. If the CG data management table 3b has no entry for the received sender ID, the default CG character ID, background ID, and motion pattern IDs (again two, for before and after the call starts) are used.
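The lookup with a default fallback can be sketched with a dictionary keyed by sender ID. The record type CGSetting, its field names, and all ID values are hypothetical.

```python
from typing import NamedTuple


class CGSetting(NamedTuple):
    character_id: int
    background_id: int
    pre_call_pattern_id: int   # body motion pattern before the call starts
    in_call_pattern_id: int    # body motion pattern after the call starts


# Hypothetical contents of the CG data management table 3b, keyed by sender ID.
CG_DATA_TABLE = {
    1: CGSetting(character_id=3, background_id=2, pre_call_pattern_id=5, in_call_pattern_id=6),
}
DEFAULT_SETTING = CGSetting(character_id=0, background_id=0, pre_call_pattern_id=0, in_call_pattern_id=0)


def resolve_setting(sender_id, table=CG_DATA_TABLE):
    """Return the sender's CG setting, or the default when no entry exists."""
    return table.get(sender_id, DEFAULT_SETTING)
```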

【００５９】データ管理部３では、特定したCGキャラク
タIDによりCGキャラクタデータ管理テーブル３ａから、
CGキャラクタ形状データのアドレス、交換前の衣類テク
スチャのアドレスおよび交換後の衣類テクスチャのアド
レス、通話開始前と通話開始後の２つの表情パターンデ
ータのアドレス、口唇動作パターンのアドレスを特定す
る。保存されている背景データ管理テーブルにより、特
定された背景IDから背景データのアドレスを特定する。
また、保存されている動作パターン管理テーブルによ
り、動作パターンID（通話前と後の身体動作パターンの
２つのID）から通話開始前と通話開始後の２つの身体動
作パターンのアドレスを特定する。Using the identified CG character ID, the data management unit 3 identifies from the CG character data management table 3a the address of the CG character shape data, the addresses of the clothing textures before and after exchange, the two addresses of the facial expression pattern data for before and after the call starts, and the address of the lip motion pattern. From the stored background data management table, it identifies the address of the background data for the identified background ID. From the stored motion pattern management table, it identifies the addresses of the two body motion patterns for before and after the call starts from the motion pattern IDs (the two IDs for the body motion patterns before and after the call).

【００６０】データ管理部３は、特定した、CGキャラク
タ形状データのアドレス、交換前の衣類テクスチャのア
ドレスと交換後の衣類テクスチャのアドレス、背景デー
タのアドレスを３次元描画部１４に通知する。また、デ
ータ管理部３は、キャラクタ動作データ保存部１９か
ら、特定した通話開始前と通話開始後の２つの身体動作
パターンのアドレス、通話開始前と通話開始後の２つの
表情パターンデータのアドレス、口唇動作パターンデー
タのアドレスにより、通話開始前と通話開始後の２つの
身体動作パターンデータを読み出して身体動作制御部１
２に送り、通話開始前と通話開始後の２つの表情パター
ンデータを読み出して表情制御部１３に送り、口唇動作
パターンデータを読み出して口唇動作制御部１１に送
る。The data management unit 3 notifies the three-dimensional drawing unit 14 of the identified addresses of the CG character shape data, the clothing textures before and after exchange, and the background data. Using the identified addresses of the two body motion patterns, the two facial expression pattern data, and the lip motion pattern data, the data management unit 3 also reads from the character motion data storage unit 19 the two body motion pattern data for before and after the call starts and sends them to the body motion control unit 12, reads the two facial expression pattern data for before and after the call starts and sends them to the facial expression control unit 13, and reads the lip motion pattern data and sends it to the lip motion control unit 11.

【００６１】口唇動作制御部１１では、口唇動作パター
ンデータ中から適当な口唇動作データのアドレスを選
び、３次元描画部１４にそのアドレスとともにフレーム
番号０からフレーム数分まで順次通知する。口唇動作パ
ターンデータ中から適当な口唇動作データのアドレスを
選ぶ方法としては、乱数を用いて行う方法があるが、等
確率で選択する他に、口唇重み付を行って選択制御を行
う。この処理を通話開始まで繰り返す。なお、乱数を使
わずに固定的な遷移を予め規定しておき、その遷移のシ
ーケンスに従って、口唇動作データのアドレスとフレー
ム番号を３次元描画部１４に通知することもできる。但
し、この場合、利用者は規則的な繰り返し口唇動作を見
ることになる。例えば、「電話だよ」言葉に合わせた口
唇動作を繰り返し表示することもできる。The lip motion control unit 11 selects the address of a suitable lip motion data item from the lip motion pattern data and notifies the three-dimensional drawing unit 14 of that address together with the frame numbers, in order from frame 0 up to the number of frames. The address can be selected with random numbers, either with equal probability or with weights assigned to the lip motions to control the selection. This processing is repeated until the call starts. Alternatively, a fixed transition sequence can be defined in advance without random numbers, and the lip motion data addresses and frame numbers notified to the three-dimensional drawing unit 14 according to that sequence. In that case the user sees a regularly repeating lip motion; for example, a lip motion matching the words "You have a call" can be displayed repeatedly.
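The two selection strategies, weighted random choice and a fixed repeating sequence, can be sketched as interchangeable generators feeding one notification loop. The function names, addresses, and frame counts are hypothetical, and the loop is cut off after a fixed number of motions rather than by a real call-start event.

```python
import itertools
import random


def random_selector(addresses, weights=None, rng=None):
    """Yield lip motion addresses at random; weights=None means equal probability."""
    rng = rng or random.Random()
    while True:
        yield rng.choices(addresses, weights=weights, k=1)[0]


def fixed_selector(sequence):
    """Yield lip motion addresses by cycling through a predefined sequence."""
    return itertools.cycle(sequence)


def notifications(selector, frame_counts, motion_limit):
    """Collect the (address, frame_number) pairs sent to the drawing unit."""
    sent = []
    for address in itertools.islice(selector, motion_limit):
        for frame in range(frame_counts[address]):
            sent.append((address, frame))
    return sent
```

For example, notifications(fixed_selector([10, 20]), {10: 2, 20: 1}, 3) plays motion 10, then 20, then 10 again, which corresponds to the regularly repeating lip motion mentioned above.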

【００６２】身体動作制御部１２は、最初、通話開始前
の身体動作パターンデータの中から、図６（ｂ）に示し
たように、標準状態に相当する身体動作データのアドレ
スとフレーム番号を０から順次フレーム数分３次元描画
部１４に通知する。フレーム数分通知後、各遷移の移行
確率に基づく乱数を発生して次の身体動作データを選択
し、その移行先の身体動作データのアドレスとフレーム
番号を０からフレーム数分３次元描画部１４に通知す
る。終了後は、再び各移行確率に基づく乱数を発生して
遷移を行う。この処理を通話開始まで繰り返す。The body motion control unit 12 first takes, from the body motion pattern data for before the call starts, the body motion data corresponding to the standard state as shown in FIG. 6(b), and notifies the three-dimensional drawing unit 14 of its address and of the frame numbers in order from 0 up to the number of frames. Once all frames have been notified, it generates a random number according to the transition probabilities to select the next body motion data, and notifies the three-dimensional drawing unit 14 of the address of that body motion data and of the frame numbers from 0 up to the number of frames. When that finishes, it again generates a random number according to the transition probabilities and makes the next transition. This processing is repeated until the call starts.
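The pre-call playback loop can be sketched as a generator that emits every frame of the current motion, then draws a random number and walks the cumulative transition probabilities to choose the next motion. The names pick_next and play_until_call, the motion table layout, and the call_started callback are hypothetical; in this sketch the call-start condition is checked only between motions, which is an assumption.

```python
import random


def pick_next(transitions, u):
    """Choose a target from [(target, probability), ...] given u in [0, 1)."""
    cumulative = 0.0
    for target, probability in transitions:
        cumulative += probability
        if u < cumulative:
            return target
    return transitions[-1][0]  # guard against floating-point rounding


def play_until_call(motions, start, call_started, rng):
    """Yield (address, frame_number) pairs until call_started() returns True.

    motions maps a motion name to (address, frame_count, transitions).
    """
    current = start
    while not call_started():
        address, frame_count, transitions = motions[current]
        for frame in range(frame_count):
            yield (address, frame)
        current = pick_next(transitions, rng.random())
```

The same loop applies to facial expression motion data, with the normal face as the starting state instead of the standard body state.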

【００６３】なお、乱数を使わずに固定的な遷移を予め
身体動作パターンに規定しておき、その遷移のシーケン
スに従って、身体動作データのアドレスとフレーム番号
を３次元描画部１４に通知することもできる。但し、こ
の場合、利用者は規則的な繰り返し身体動作を見ること
になる。例えば、「電話の受話器を取る」といような身
体動作を繰り返し表示することもできる。Alternatively, a fixed transition sequence can be defined in the body motion pattern in advance without random numbers, and the body motion data addresses and frame numbers notified to the three-dimensional drawing unit 14 according to that sequence. In that case the user sees a regularly repeating body motion; for example, a motion such as "picking up the telephone receiver" can be displayed repeatedly.

【００６４】表情制御部１３は、最初、通話開始前の表
情動作パターンデータの中から、図６（ａ）に示したよ
うに、通常顔に相当する表情動作データのアドレスとフ
レーム番号を０から順次フレーム数分３次元描画部１４
に通知する。フレーム数分通知後、各遷移の移行確率に
基づく乱数を発生して次の表情動作データを選択し、そ
の移行先の表情動作データのアドレスとフレーム番号を
０からフレーム数分３次元描画部１４に通知する。終了
後は、再び各移行確率に基づく乱数を発生して遷移を行
う。この処理を通話開始まで繰り返す。The facial expression control unit 13 first takes, from the facial expression motion pattern data for before the call starts, the facial expression motion data corresponding to the normal face as shown in FIG. 6(a), and notifies the three-dimensional drawing unit 14 of its address and of the frame numbers in order from 0 up to the number of frames. Once all frames have been notified, it generates a random number according to the transition probabilities to select the next facial expression motion data, and notifies the three-dimensional drawing unit 14 of the address of that facial expression motion data and of the frame numbers from 0 up to the number of frames. When that finishes, it again generates a random number according to the transition probabilities and makes the next transition. This processing is repeated until the call starts.

【００６５】なお、乱数を使わずに固定的な遷移を予め
表情動作パターンに規定しておき、その遷移のシーケン
スに従って、表情動作データのアドレスとフレーム番号
を３次元描画部１４に通知することもできる。但し、こ
の場合、利用者は規則的な繰り返し表情動作を見ること
になる。例えば、「通常の顔と困った顔」というような
表情動作を繰り返し表示することもできる。Alternatively, a fixed transition sequence can be defined in the facial expression motion pattern in advance without random numbers, and the facial expression motion data addresses and frame numbers notified to the three-dimensional drawing unit 14 according to that sequence. In that case the user sees a regularly repeating facial expression motion; for example, an alternation such as "normal face and troubled face" can be displayed repeatedly.

【００６６】３次元描画部１４での基本的な３次元描画
の動作について説明をする。３次元描画部１４は、デー
タ管理部３から通知された、CGキャラクタ形状データの
アドレス、交換前の衣類テクスチャのアドレスと交換後
の衣類テクスチャのアドレス、背景データのアドレスに
より、キャラクタ形状データ保存部１８から描画を行う
CGキャラクタの形状データと、テクスチャデータ保存部
２１から衣類テクスチャデータ、背景データ保存部２０
から背景データをまずロードしておく。The basic three-dimensional drawing operation of the three-dimensional drawing unit 14 is as follows. Using the addresses notified by the data management unit 3 (the CG character shape data, the clothing textures before and after exchange, and the background data), the three-dimensional drawing unit 14 first loads the shape data of the CG character to be drawn from the character shape data storage unit 18, the clothing texture data from the texture data storage unit 21, and the background data from the background data storage unit 20.

【００６７】次に、口唇動作制御部１１から通知され
る、口唇動作データのアドレスとフレーム番号、身体動
作制御部１２から通知される、身体動作データアドレス
とフレーム番号、表情制御部１３から通知される、表情
動作データのアドレスとフレーム番号を受けとる。受け
とった口唇動作データのアドレス、身体動作データのア
ドレス、表情動作データのアドレスにより、キャラクタ
動作データ保存部から口唇動作データ、身体動作デー
タ、表情動作データをロードする。このロードは、口唇
動作制御部１１、身体動作制御部１２、表情制御部１３
から通知される各動作のアドレスが更新されない限り、
通知の最初に一度だけ行う。尚、特定の通信相手に対応
するキャラクタを着信時に画面に表示するため、利用者
は画面に表示されたキャラクタを見るだけで誰からの着
信かを理解することができる。Next, it receives the lip motion data address and frame number from the lip motion control unit 11, the body motion data address and frame number from the body motion control unit 12, and the facial expression motion data address and frame number from the facial expression control unit 13. Using the received addresses, it loads the lip motion data, body motion data, and facial expression motion data from the character motion data storage unit. Each load is performed only once, when an address is first notified, and is not repeated unless the address notified by the lip motion control unit 11, the body motion control unit 12, or the facial expression control unit 13 changes. Because the character associated with a specific communication partner is displayed on the screen when a call arrives, the user can tell who is calling just by looking at the displayed character.

[0068] The motion data for the frame number notified by the lip motion control unit 11 is generated from the loaded lip motion data. When the lip motion data is shape deformation data, it is generated by interpolating the key motion data, as in ordinary key-frame animation; when it is texture data, it is likewise generated by interpolating the key textures. In the shape deformation case, the generated motion data for that frame number is used to deform the mouth of the CG character shape data. In the texture case, the texture is mapped onto the mouth by ordinary texture mapping, which is carried out during the three-dimensional drawing process.
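The key-frame interpolation mentioned above can be sketched as follows. This is only an illustration under assumptions: the patent does not specify the interpolation scheme, so linear interpolation is used here, and the function name `interpolate_frame` and the data layout (a sorted list of `(frame_number, values)` pairs) are hypothetical.

```python
from bisect import bisect_right


def interpolate_frame(keys, frame):
    """Return motion values at `frame` by linear interpolation between keys.

    keys: list of (frame_number, [values]) sorted by frame_number (assumed layout).
    Frames outside the key range are clamped to the first or last key.
    """
    frames = [f for f, _ in keys]
    if frame <= frames[0]:
        return list(keys[0][1])
    if frame >= frames[-1]:
        return list(keys[-1][1])
    i = bisect_right(frames, frame)          # index of the first key after `frame`
    (f0, v0), (f1, v1) = keys[i - 1], keys[i]
    t = (frame - f0) / (f1 - f0)             # 0.0 at the earlier key, 1.0 at the later one
    return [a + t * (b - a) for a, b in zip(v0, v1)]
```

The same routine could be applied to mouth vertex offsets, facial deformation parameters, or joint values, since each is just a list of numbers per key frame.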

[0069] Facial expression motion data is handled in the same way: the motion data for the notified frame number is generated and, in the shape deformation case, the face is deformed based on it. In the texture case, the face is drawn by texture mapping, which is carried out during the three-dimensional drawing process. The body motion data for the notified frame number is generated by interpolating the key body motion data, and the transformation described earlier is applied to the CG character based on it to determine the position and body state of the CG character.

[0070] After that, an image is generated by the ordinary three-dimensional drawing process using the background data, the clothing texture data, the lip texture if the lip motion data is a texture, and the facial expression texture if the facial expression motion data is a texture. The process is performed in the order of modeling transformation, viewing transformation, perspective transformation, screen transformation, and pixel processing onto the screen, with texture mapping carried out during the pixel processing. The camera data (the camera position and direction and the angle of view, which are needed for the viewing and screen transformations) initially takes default values. For example, the default can be set so that the camera faces the front of the CG character and the whole body lies at the center of the generated image. Such a setting can be obtained by finding the smallest rectangular parallelepiped that contains the CG character, taking an optical axis that points opposite to the front direction given by the direction vector of the CG character's root, placing the centroid of the parallelepiped on the optical axis, and choosing the angle of view so that every vertex falls within the screen.
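The default camera setting described above can be sketched roughly as below. Several points are assumptions and not from the patent: the bounding box is axis-aligned, the camera distance is supplied by the caller, and a circular view cone is used as a conservative stand-in for a rectangular screen. The function name `default_camera` is hypothetical.

```python
import math
from itertools import product


def default_camera(vertices, front, distance):
    """Return (position, optical_axis, angle_of_view_in_radians).

    vertices: iterable of (x, y, z) character vertices.
    front:    front direction of the character root (need not be normalized).
    distance: camera distance from the box centroid (assumed parameter).
    """
    xs, ys, zs = zip(*vertices)
    lo = (min(xs), min(ys), min(zs))
    hi = (max(xs), max(ys), max(zs))
    center = tuple((a + b) / 2 for a, b in zip(lo, hi))      # centroid of the box
    norm = math.sqrt(sum(c * c for c in front))
    f = tuple(c / norm for c in front)
    position = tuple(c + distance * fc for c, fc in zip(center, f))
    axis = tuple(-fc for fc in f)                            # looks back at the character
    half_angle = 0.0
    for corner in product(*zip(lo, hi)):                     # the 8 box vertices
        d = tuple(c - p for c, p in zip(corner, position))
        depth = sum(a * b for a, b in zip(d, axis))
        lateral = math.sqrt(max(sum(a * a for a in d) - depth * depth, 0.0))
        half_angle = max(half_angle, math.atan2(lateral, depth))
    return position, axis, 2 * half_angle
```

Because the centroid lies on the optical axis, the character ends up centered, and the widest corner angle fixes the angle of view.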

[0071] Camera data can also be input from the viewpoint change input unit 17 and notified to the three-dimensional drawing unit 14; performing the three-dimensional drawing process based on this camera data generates an image with a changed viewpoint. Alternatively, preset camera data can be prepared in the viewpoint change input unit 17 and notified to the three-dimensional drawing unit 14 to change the viewpoint.

[0072] With the motion/facial expression input unit 16, when the user presses one of the input buttons set up in advance as described above, the address of the body motion data or of the facial expression motion data is notified via the data management unit 3: a body motion data address goes to the body motion control unit 12, and a facial expression motion data address goes to the facial expression control unit 13. On receiving a body motion data address, the body motion control unit 12 waits until it has notified the last frame number of the body motion data it is currently sending to the three-dimensional drawing unit 14. At that point it would normally select the next body motion data as described above, but instead it notifies the three-dimensional drawing unit 14 of the address and frame number of the forcibly specified body motion data. Likewise, once the facial expression control unit 13 has finished notifying the facial expression motion data it is currently sending, it notifies the three-dimensional drawing unit 14 of the address and frame number of the facial expression motion data forcibly specified via the data management unit 3. In this way, although the animation is normally selected automatically, the user can force the display of a motion of his or her own choosing.

[0073] The image produced by the three-dimensional drawing described above is transferred to the display unit 15 and displayed. The three-dimensional drawing process in the three-dimensional drawing unit 14 is normally performed in step with the refresh rate of the display unit 15. The motion addresses and frame numbers from the lip motion control unit 11, the body motion control unit 12, and the facial expression control unit 13 are notified while the three-dimensional drawing unit 14 is drawing and are set as the data to be used next. When the next frame is drawn, the addresses and frame numbers set in this way are used. The notifications from the lip motion control unit 11, the body motion control unit 12, and the facial expression control unit 13 are thereby synchronized.

[0074] Music data is handled as follows. The data management unit 3 uses the voice management table 3c to identify the voice conversion numerical parameter values and the music data ID corresponding to the sender ID. If the voice management table 3c has no entry for the received sender ID, the default voice conversion numerical parameters and music data ID are used. The address of the music data is obtained from the music data management table using the music data ID. The music data is then loaded from the music data storage unit 22 using that address and transferred to the voice processing unit 5.

[0075] The voice processing unit 5 decompresses the music data if it is compressed or, for encoded music data such as MIDI data, generates sound from the stored sound source data, and outputs the music from the voice output unit 7 via the voice conversion unit 6. Outputting an incoming-call melody associated with the communication partner's character from the voice output unit 7 when a call arrives thus makes it easy to identify the partner.

[0076] With the above operations the CG character can be displayed while music is playing, but the music and the motion of the CG character are basically not synchronized. (If the motion data is created in advance to match the music data, the two can be synchronized, so at least the initial output can be synchronized.)

[0077] Synchronization of the music and the CG character is described next. Music data that contains time management data equivalent to the time stamps used in video data is used. MPEG-4 (Moving Picture Experts Group Phase 4) audio contains time stamps, and for MIDI data the time increment data called delta time can be accumulated to serve the same purpose. When transferring music data to the voice output unit 7, the voice processing unit 5 manages the time stamps and sends the time stamp of the music that is about to be output, as a time synchronization signal, to the lip motion control unit 11, the body motion control unit 12, and the facial expression control unit 13. The lip motion data, facial expression motion data, and body motion data also carry time stamps starting from 0, assigned in advance to match the music.
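As a minimal sketch of using MIDI delta times in place of time stamps, the increments can be accumulated into absolute times. The assumptions here are a constant tempo and a resolution of 480 ticks per quarter note (both illustrative defaults, not from the patent); the function name `delta_times_to_timestamps` is hypothetical. A tempo of 500,000 microseconds per quarter note corresponds to 120 beats per minute.

```python
def delta_times_to_timestamps(delta_ticks, ticks_per_beat=480,
                              tempo_us_per_beat=500_000):
    """Accumulate MIDI delta times (in ticks) into absolute time stamps in seconds.

    Assumes the tempo stays constant for the whole sequence.
    """
    seconds_per_tick = tempo_us_per_beat / 1_000_000 / ticks_per_beat
    timestamps = []
    elapsed_ticks = 0
    for delta in delta_ticks:
        elapsed_ticks += delta               # running sum of the increments
        timestamps.append(elapsed_ticks * seconds_per_tick)
    return timestamps
```

A real sequence with tempo changes would need the tick-to-seconds factor updated at each tempo event; that is left out of this sketch.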

[0078] The lip motion control unit 11, the body motion control unit 12, and the facial expression control unit 13 compare the received time stamp with the time stamp numbers of the motion data that each of them controls. The comparison relies on the fact that adding the cumulative time stamp count of the motion data drawn so far to the time stamp held by each motion gives the music time stamp. The frame number and motion data address that match are sent to the three-dimensional drawing unit 14 at the same time. This processing allows motion control synchronized with the music data.
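The matching rule above (cumulative count of already-drawn motion data plus the motion's own time stamp equals the music time stamp) might look like this in code. Integer time stamps and the name `find_frame` are assumptions made for the illustration.

```python
def find_frame(music_ts, played_total, frame_timestamps):
    """Return the frame index that matches the music time stamp, or None.

    music_ts:         time stamp received from the voice processing unit.
    played_total:     cumulative time stamp count of motion data already drawn.
    frame_timestamps: time stamps of the current motion data, starting from 0.
    """
    local = music_ts - played_total          # position inside the current motion data
    for frame, ts in enumerate(frame_timestamps):
        if ts == local:
            return frame
    return None
```

For example, if earlier motions covered 100 time units and the current motion has frames stamped 0, 10, 20, 30, 40, a music time stamp of 130 selects frame index 3.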

[0079] Operation after the start of a call is described next. The communication unit 1 determines that a call with the communication partner has been established. In ordinary telephone communication, establishment is recognized as follows: when the user places the call, an accept signal is returned when the partner picks up the receiver; when the call comes from the partner, an accept signal is returned to the partner when the user picks up the receiver. The basic mechanism is the same for wireless communication such as mobile phones and for communication over the Internet, so establishment of the call can be recognized there as well. The communication unit 1 notifies the data management unit 3 that the call has been established.

[0080] On receiving this notification, the data management unit 3 stops transferring music data to the voice processing unit 5 and notifies it that the call is starting. The data management unit 3 also reads the voice conversion numerical parameters from the voice management table 3c and notifies them to the voice conversion unit 6 via the voice processing unit 5. In parallel, it notifies the lip motion control unit 11, the body motion control unit 12, and the facial expression control unit 13 that the call is starting.

[0081] On receiving the notification, the lip motion control unit 11, the body motion control unit 12, and the facial expression control unit 13 stop the transfer to the three-dimensional drawing unit 14. If the voice analysis unit 9, described later, performs only sound intensity analysis, the lip motion control unit 11 sends the three-dimensional drawing unit 14 the address and frame number of the lip motion data for the level 0 state shown in FIG. 5(a). If the voice analysis unit 9 performs only phoneme analysis, or both sound intensity analysis and phoneme analysis, the lip motion control unit 11 sends the address and frame number of the lip motion data for the "n" (ん) sound shown in FIG. 5(b).

[0082] The body motion control unit 12 sends the three-dimensional drawing unit 14 the address and frame number of the body motion data for the standard state in the body motion pattern data used after the start of a call. The facial expression control unit 13 sends the three-dimensional drawing unit 14 the address and frame number of the facial expression motion data for the normal face in the facial expression motion pattern data used after the start of a call. On receiving the motion data addresses and frame numbers from the lip motion control unit 11, the body motion control unit 12, and the facial expression control unit 13, the three-dimensional drawing unit 14 performs three-dimensional drawing in the same way as described above and sends the generated image to the display unit 15 for display.

[0083] On receiving the call-start notification, the voice processing unit 5 applies the voice processing appropriate to the communication medium (such as decoding of the voice data and noise cancellation) to the voice data sent from the communication unit 1, and sends the processed data to the voice conversion unit 6 and the voice analysis unit 9.

[0084] The voice conversion unit 6 converts the voice based on the voice conversion numerical parameters it has received (for example, when conversion is done by filtering as described above, it applies the corresponding filter) and sends the result to the voice output unit 7. The caller's voice is therefore output converted into a different voice.

[0085] The voice analysis unit 9 performs sound intensity analysis, phoneme analysis, or both on the received voice data. In sound intensity analysis, as shown in FIG. 5(a), the absolute value of the amplitude of the voice data is integrated (the sampled values are summed) over a predetermined period (for example, the display rate period), and a level value is determined by comparing the integral with predetermined threshold values.
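The sound intensity analysis above can be sketched as a sum of absolute sample values followed by a threshold lookup. The threshold values and the name `intensity_level` are assumptions for illustration; the patent does not give concrete values.

```python
from bisect import bisect_right


def intensity_level(samples, thresholds):
    """Map one analysis period of audio samples to an intensity level.

    samples:    sampled amplitudes for one period (e.g. one display frame).
    thresholds: ascending boundaries; N thresholds give levels 0..N.
    """
    total = sum(abs(s) for s in samples)     # integral of |amplitude| over the period
    return bisect_right(thresholds, total)   # number of thresholds at or below the total
```

With thresholds `[100, 400, 900]`, a silent period maps to level 0 and a loud one to level 3.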

[0086] Phoneme analysis applies the processing normally used in speech recognition and either classifies each phoneme as one of "n" (ん), "a" (あ), ..., "o" (お) or outputs the proportion of each. Basically, normalized voice data for the sounds ん, あ, い, ..., お, collected statistically, serve as templates. The input voice data is decomposed into phonemes, normalized, and matched against the templates, and either the best-matching template is selected or the matching proportions are output. The degree of matching is defined by a suitable distance function (Euclidean distance, Hilbert distance, or Mahalanobis distance): either the template with the smallest distance is selected, or the proportion for each sound is calculated as its distance divided by the sum of the distances measured to all of the sounds ん, あ, ..., お. The voice analysis result is sent to the emotion estimation unit 10. A lip ID is also determined from the voice analysis result as described earlier and sent to the lip motion control unit 11.
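A minimal sketch of the template matching step, using the Euclidean distance. The feature vectors are placeholders: the patent does not describe the normalized phoneme representation, so the two-element lists and the name `match_phoneme` are purely hypothetical.

```python
import math


def match_phoneme(features, templates):
    """Return (best_label, ratios) for one normalized input vector.

    templates: dict mapping a sound label to its template vector.
    ratios:    each distance divided by the sum of all distances, as in the text.
    """
    distances = {label: math.dist(features, t) for label, t in templates.items()}
    best = min(distances, key=distances.get)          # smallest distance wins
    total = sum(distances.values())
    ratios = {label: d / total for label, d in distances.items()} if total else {}
    return best, ratios
```

Note that with this definition a smaller ratio means a closer match, since the ratio is built from distances rather than similarities.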

[0087] Using the lip ID sent from the voice analysis unit 9, the lip motion control unit 11 determines the address of the corresponding lip motion data from the lip motion pattern data and sends the address and frame number of the lip motion data to the three-dimensional drawing unit 14.

[0088] The emotion estimation unit 10 stores the voice analysis results sent from the voice analysis unit 9 for a predetermined period and estimates the caller's emotional state from the stored results. For example, let the emotions to be classified be "normal", "laughter", "anger", "crying", and "worry". For sound intensity, a level pattern covering the fixed period is held as the template for each emotion. If the fixed period is, for example, three voice analysis results, the templates are "level 2, level 2, level 2" for "normal", "level 3, level 2, level 3" for "laughter", "level 3, level 3, level 3" for "anger", "level 1, level 2, level 1" for "crying", and "level 0, level 1, level 0" for "worry".

[0089] The three stored voice analysis results are compared with these templates by computing the sum of the absolute differences of the level values (Hilbert distance) or the sum of the squared differences of the level values (Euclidean distance), and the emotion of the closest template is taken as the emotional state at that time. Alternatively, the sum of the distances to all emotions is computed, and the emotional state is expressed as proportions, each being the distance to an emotion divided by that sum. When phoneme analysis results are sent, keywords are held as dictionary templates and the emotion is determined by template matching against the keywords.
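The level-pattern templates and the two distance measures described above can be put together as a short sketch. The template values are the ones given in the text; the function names are hypothetical. The sum of absolute differences, which the text calls the Hilbert distance, is more commonly known as the L1 or Manhattan distance.

```python
# Level patterns for three consecutive analysis results, as listed in the text.
TEMPLATES = {
    "normal": (2, 2, 2),
    "laughter": (3, 2, 3),
    "anger": (3, 3, 3),
    "crying": (1, 2, 1),
    "worry": (0, 1, 0),
}


def l1(a, b):
    """Sum of absolute level differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def squared_l2(a, b):
    """Sum of squared level differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def estimate_emotion(levels, distance=l1):
    """Return (closest_emotion, distances) for three stored level values."""
    distances = {e: distance(levels, t) for e, t in TEMPLATES.items()}
    return min(distances, key=distances.get), distances
```

When two templates are equally close, `min` returns whichever comes first in `TEMPLATES`; the patent does not say how ties should be broken, so that behavior is an artifact of this sketch.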

[0090] In this embodiment, however, the phoneme analysis is vowel analysis only, so the following method is used. For anger, for example, words that express anger such as 怒っている (angry), 憤り (indignation), and 殴る (hit) are written as vowel sequences: いあえいう, いいおい, and あうう. If the fixed period is three voice analysis results, a dictionary is built from the first three characters of each sequence. Dictionaries are built in the same way for the other emotional states. The same sequence will naturally appear in more than one of these dictionaries; everyday conversation is analyzed and the sequence is placed in the dictionary of the emotional state in which it occurs more frequently, and the dictionary templates are generated in advance. With a fixed period of three results there are 216 possible combinations (six sounds, ん, あ, い, う, え, お, in each of three positions), so the dictionary template consists of these 216 entries classified by emotional state.
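The dictionary construction above can be sketched as follows. The vowel strings are the ones given in the text; the frequency numbers in the usage example are made up for illustration, and the names `build_dictionary` and `ALL_KEYS` are hypothetical.

```python
from itertools import product

SOUNDS = "んあいうえお"   # the six sounds distinguished by the vowel analysis

# Every possible three-sound key: 6 ** 3 = 216 entries.
ALL_KEYS = ["".join(p) for p in product(SOUNDS, repeat=3)]


def build_dictionary(entries):
    """Build a {three_sound_prefix: emotion} dictionary.

    entries: iterable of (vowel_string, emotion, frequency). When the same
    prefix occurs under several emotions, the most frequent one is kept.
    """
    best = {}
    for vowels, emotion, freq in entries:
        key = vowels[:3]                      # first three sounds of the word
        if len(key) == 3 and (key not in best or freq > best[key][1]):
            best[key] = (emotion, freq)
    return {key: emotion for key, (emotion, _) in best.items()}
```

For instance, `build_dictionary([("いあえいう", "anger", 5), ("あうう", "anger", 3), ("あうう", "crying", 1)])` keeps "anger" for the prefix あうう because of its higher (hypothetical) frequency.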

[0091] The three stored phoneme analysis results are matched against the dictionary templates to determine the emotional state. When the sound intensity result and the phoneme result are combined, the emotional state is chosen as follows: if both yield the same emotional state, that state is used; if they differ, one of the two is selected probabilistically using a random number. The emotional state determined in this way is sent to the body motion control unit 12 and the facial expression control unit 13.
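The combination rule above is simple enough to sketch directly. The selection probability `p_intensity` is an assumed parameter (the patent only says the choice is made probabilistically with a random number), and the name `combine_estimates` is hypothetical.

```python
import random


def combine_estimates(intensity_emotion, phoneme_emotion,
                      p_intensity=0.5, rng=random):
    """Combine the intensity-based and phoneme-based emotion estimates.

    If both agree, return that emotion. Otherwise pick the intensity-based
    result with probability p_intensity, else the phoneme-based result.
    """
    if intensity_emotion == phoneme_emotion:
        return intensity_emotion
    return intensity_emotion if rng.random() < p_intensity else phoneme_emotion
```

Passing a seeded `random.Random` instance as `rng` makes the choice reproducible, which is convenient when testing.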

[0092] Meanwhile, speech uttered by the user is input to the voice input unit 8, which sends the input voice data to the voice processing unit 5. A microphone is used as the voice input unit 8. The voice processing unit 5 applies the usual processing, such as noise cancellation and echo removal, to the input voice data and sends the processed voice data to the voice analysis unit 9. The processed voice data also undergoes processing that depends on the communication method, such as encoding, streaming, and packetization, and is transmitted to the communication partner via the communication unit 1. The voice analysis unit 9 performs the sound intensity analysis and phoneme analysis described above on the input voice data as well, and sends the result to the emotion estimation unit 10 together with an identifier indicating that it belongs to the input voice.

[0093] The emotion estimation unit 10 stores the voice analysis results for a fixed period, as described above, in a storage area dedicated to the input voice and performs the same emotion estimation on the stored results. Here, however, the estimation also includes states specific to a listener, such as an "agreement" state. In other words, emotion estimation may differ between the partner's voice data and the user's own voice data. The emotion estimation result is sent to the body motion control unit 12 and the facial expression control unit 13.

[0094] Another emotion estimation method uses the frequency signal of the voice data, such as prosody, amplitude, and stress. FIG. 9 is a flowchart showing the processing procedure of the emotion estimation method using the frequency signal. The description of this method assumes that four emotions are estimated, "anger", "sadness", "joy", and "standard", which are the most basic classification of emotions.

[0095] First, the user's own voice is input to the voice input unit 8 as voice data and then sent to the voice processing unit 5, while the partner's voice is input to the voice processing unit 5 via the communication unit 1 (S901). The voice processing unit 5 applies the usual processing, such as noise cancellation and echo removal, to the received voice data and sends the processed voice data to the voice analysis unit 9.

[0096] The voice analysis unit 9 extracts feature quantities by processing that uses the frequency signal of the voice data, such as prosody, amplitude, and stress. The feature quantities are based on the fundamental frequency, which reflects differences between emotions well. For example, F0max (maximum fundamental frequency (F0) during the utterance [Hz]), Amax (maximum amplitude during the utterance [Hz]), T (duration from the start to the end of the utterance [sec]), F0init (fundamental frequency immediately after the start of the utterance [Hz]), and F0range (maximum fundamental frequency minus minimum fundamental frequency during the utterance [Hz]) are used. Other parameters, such as correction for gender differences, can also be added to the feature quantities.
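Given a fundamental frequency contour, the feature quantities listed above reduce to a few lines. This sketch assumes a per-frame contour in which 0 marks unvoiced frames and a fixed frame length in seconds; neither convention comes from the patent, and `prosodic_features` is a hypothetical name. At least one voiced frame is assumed.

```python
def prosodic_features(f0_hz, amplitudes, frame_sec):
    """Compute utterance-level features from per-frame F0 and sample amplitudes.

    f0_hz:      fundamental frequency per frame, 0 for unvoiced frames (assumed).
    amplitudes: waveform samples of the utterance.
    frame_sec:  duration of one F0 frame in seconds (assumed).
    """
    voiced = [f for f in f0_hz if f > 0]      # ignore unvoiced frames
    return {
        "F0max": max(voiced),                 # maximum fundamental frequency
        "F0init": voiced[0],                  # F0 right after the utterance starts
        "F0range": max(voiced) - min(voiced), # maximum minus minimum F0
        "Amax": max(abs(a) for a in amplitudes),
        "T": len(f0_hz) * frame_sec,          # utterance duration
    }
```

The resulting dictionary is the kind of feature vector on which a statistical classifier could then operate.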

[0097] To extract the fundamental frequency, the voice analysis unit 9 uses a method based on DP matching that takes the continuity of the whole utterance into account. Briefly, the voice data input to the voice input unit 8 is first converted to frequency-domain data in the voice analysis unit 9 and then converted to time-domain data by a predetermined operation. A fixed number of peaks are selected from this data in descending order of peak value, and the fundamental frequency is extracted by connecting these peaks through predetermined processing (S902).

[0098] Next, the emotion estimation unit 10 calculates statistics based on the feature quantities extracted by the voice analysis unit 9 (S903) and thereby estimates which emotion group each piece of voice data belongs to (S904). This emotion estimation method makes it possible to estimate the speaker's emotion with high probability. The emotion estimation unit 10 then sends the emotion estimation result to the lip motion control unit 11, the body motion control unit 12, and the facial expression control unit 13.

[0099] Consequently, the character displayed on the screen of the virtual video call device moves according to the estimated emotions of the user and the partner, which realizes a more entertaining virtual video call device.

[0100] The body motion control unit 12 then sets the next motion transition to the body motion data corresponding to the received emotion estimation result (the correspondence is determined in advance). Once it has sent the address and frame numbers of the body motion data it is currently sending to the three-dimensional drawing unit 14 for all of its frames, it sends the address and frame number of the newly determined body motion data to the three-dimensional drawing unit 14. When the transition of the body motion data is controlled probabilistically, the probability of making (or of not making) the transition corresponding to the emotion estimation result is fixed in advance (the distribution is binomial, so fixing one probability necessarily determines the other), and the transition is decided using a random number drawn according to that distribution. The facial expression control unit 13 decides transitions by the same processing and sends the address and frame number of the facial expression motion data to the three-dimensional drawing unit 14.
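The probabilistic transition control above amounts to a single two-outcome draw per decision. In this sketch the mapping from emotions to motion names and the transition probability are illustrative assumptions, as is the function name `next_motion`.

```python
import random


def next_motion(current, emotion, emotion_to_motion, p_transition, rng=random):
    """Decide the next body (or facial) motion after the current one finishes.

    With probability p_transition, move to the motion associated with the
    estimated emotion; otherwise keep the current motion. Unknown emotions
    leave the motion unchanged.
    """
    target = emotion_to_motion.get(emotion, current)
    return target if rng.random() < p_transition else current
```

Fixing `p_transition` also fixes the probability of staying (`1 - p_transition`), which is the point the text makes about needing only one of the two probabilities.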

[0101] The three-dimensional drawing unit 14 uses the address and frame number of the lip motion data sent from the lip motion control unit 11, the address and frame number of the body motion data sent from the body motion control unit 12, and the address and frame number of the facial expression motion data sent from the facial expression control unit 13 to generate an image by the same processing as before the start of the call, and sends it to the display unit 15. The display unit 15 displays the received image.

[0102] When there is input from the motion/facial expression input unit 16 or the viewpoint change input unit 17, the motion or facial expression corresponding to the input is reflected in the CG character, or the viewpoint is changed, in the same way as before the start of the call.

[0103] In the simultaneous display mode the basic operation is the same as above, except that data for the user must be added. That is, the user's own data is added to the data notified by the data management unit 3 before and after the start of the call. In addition, the lip motion control unit 11, the body motion control unit 12, and the facial expression control unit 13 send the three-dimensional drawing unit 14 not only the address and frame number of the motion data for the partner's CG character but also those for the user's own CG character, together with identifiers indicating partner or user.

[0104] Based on that identifier, the three-dimensional drawing unit 14 determines the body state, facial expression, and lip state of the partner's CG character and of the user's own CG character, generates an image by the same processing as described above, and sends the generated image to the display unit 15 for display. Voice data sent from the voice processing unit 5 to the voice analysis unit 9 carries an identifier indicating whether it is the partner's or the user's. The voice analysis unit 9 performs the same processing as described above, but attaches the partner/user identifier to the voice analysis result before sending it to the lip motion control unit 11 and the emotion estimation unit 10.

[0105] The lip motion control unit 11 uses the partner/user identifier to determine the address and frame number of the lip motion data from the lip motion transition or lip motion pattern of the partner or the user. The emotion estimation unit 10 performs the same emotion estimation as described above, but separately for the partner and for the user, and sends each result together with the partner/user identifier to the body motion control unit 12 and the facial expression control unit 13. The body motion control unit 12 uses the identifier to determine the transition destination of the partner's body motion and that of the user's body motion, and sends the address and frame number of each set of body motion data, together with the identifier, to the three-dimensional drawing unit 14. Likewise, the facial expression control unit 13 determines the transition destination of the partner's facial expression motion and that of the user's facial expression motion, and sends the address and frame number of each set of facial expression motion data, together with the identifier, to the three-dimensional drawing unit 14.
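One way to picture the identifier-tagged control information used in the simultaneous display mode is as small records that carry an owner tag alongside the address and frame number. The following is a minimal sketch under that assumption; the names `Owner`, `MotionRef`, and `collect_states`, and the channel labels, are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from enum import Enum


class Owner(Enum):
    """Identifier telling the drawing unit whose character a message is for."""
    PARTNER = "partner"
    SELF = "self"


@dataclass(frozen=True)
class MotionRef:
    owner: Owner    # partner or user
    channel: str    # "lip", "body", or "face" (hypothetical labels)
    address: int    # address of the motion data
    frame: int      # frame number within that motion data


def collect_states(messages):
    """Keep the latest (address, frame) per owner and channel.

    This mirrors how a drawing unit could separate the partner's state
    from the user's state before rendering both characters.
    """
    states = {owner: {} for owner in Owner}
    for msg in messages:
        states[msg.owner][msg.channel] = (msg.address, msg.frame)
    return states


if __name__ == "__main__":
    msgs = [
        MotionRef(Owner.PARTNER, "lip", 0x100, 3),
        MotionRef(Owner.SELF, "body", 0x200, 7),
        MotionRef(Owner.PARTNER, "face", 0x300, 1),
    ]
    for owner, channels in collect_states(msgs).items():
        print(owner.value, channels)
```

Tagging each message, rather than using separate connections per character, lets the same three control units serve both characters with no change to the drawing pipeline.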

[0106] Because conversation basically alternates between the two parties, emotion estimation in the emotion estimation unit 10 proceeds as follows. The emotions of the partner and of the user in response to what the partner said are estimated, and the result is reflected in the body motions and facial expression motions of both CG characters. Next, the emotion estimation result for what the user says in reply is likewise reflected in the body motions and facial expression motions of both CG characters. This cycle then repeats in alternation.

[0107] When input is made through the viewpoint change input unit 17, an image with the changed viewpoint is generated and displayed on the display unit 15, as described earlier. As for the motion/facial expression input unit 16, this embodiment has described the operation for changing the partner's motion and facial expression. However, if separate input buttons are provided for the partner and for the user, and the partner/user identifier is attached when a button is pressed, with the processing from the data management unit 3 onward otherwise carried out in the same way, then both the partner's CG character and the user's own CG character can be changed according to input from the motion/facial expression input unit 16.

[0108] FIG. 7 shows the series of operations from voice input to image display arranged as a pipeline. The processing result of the voice processing unit 5 is shown as the voice conversion output, and double buffering is used for drawing. As FIG. 7 shows, the lip motion of the displayed CG character lags the voice conversion output by two frames at the display rate. At a display rate of, for example, 30 frames/second, this is about 66 ms, which is not visually noticeable. The emotion estimation result lags by the fixed period for which voice analysis results are stored plus one frame. If the storage period is three frames, as in FIG. 7, the delay is four frames (about 134 ms at 30 frames/second). However, even a real person takes considerable time to form an emotion in response to something said (depending on what is recognized, this is estimated at around several hundred milliseconds after the other person's words are understood), so this delay is not a problem unless the storage period is made very long.
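The delay figures quoted above follow directly from the frame counts and the display rate. The short sketch below reproduces that arithmetic; the function names are illustrative assumptions, and the exact values (about 66.7 ms and 133.3 ms) agree with the approximate figures in the text.

```python
def delay_ms(frames, fps):
    """Delay in milliseconds for a given number of frames at a display rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frames * 1000.0 / fps


def lip_delay_ms(fps):
    """Lip motion lags the voice output by two frames (per FIG. 7)."""
    return delay_ms(2, fps)


def emotion_delay_ms(storage_frames, fps):
    """Emotion result lags by the storage period plus one frame."""
    return delay_ms(storage_frames + 1, fps)


if __name__ == "__main__":
    fps = 30
    print(f"lip delay:     {lip_delay_ms(fps):.1f} ms")         # 66.7 ms
    print(f"emotion delay: {emotion_delay_ms(3, fps):.1f} ms")  # 133.3 ms
```

Raising the storage period to, say, 15 frames at the same rate would give a delay above half a second, which is the kind of "very long" storage period the text warns about.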

[0109] (Second Embodiment) A virtual video call device according to a second embodiment of the present invention is described below with reference to the drawings. FIG. 2 shows the configuration of the virtual video call device according to the second embodiment. It includes a communication unit 101, a data download unit 102, a communication data determination unit 103, a character background selection input unit 2, a data management unit 104, a voice selection input unit 4, a voice processing unit 5, a voice conversion unit 6, a voice output unit 7, a voice input unit 8, a voice analysis unit 9, an emotion estimation unit 10, a lip motion control unit 11, a body motion control unit 12, a facial expression control unit 13, a three-dimensional drawing unit 14, a display unit 15, a motion/facial expression input unit 16, a viewpoint change input unit 17, a character shape data storage unit 18, a character motion data storage unit 19, a background data storage unit 20, a texture data storage unit 21, and a music data storage unit 22.

[0110] The virtual video call device of the second embodiment configured as above is described in detail below. It differs from the first embodiment only in that CG data can be downloaded, so only the CG data download operation is described.

[0111] In this embodiment, the data to be downloaded are CG character data (shape data, clothing texture data, facial expression pattern data and facial expression motion data, lip motion pattern data and lip motion data, and thumbnail image data), body motion pattern data and body motion data, background data, and music data. Each kind of data can also be downloaded individually in the same way.

[0112] The data download unit 102 accesses the data storage server via the communication unit 101. This access is the same as that performed when downloading on an ordinary mobile phone or personal computer. For example, the server is identified by its IP address, the server machine is notified of the access, and the procedure follows the TCP/IP protocol.

[0113] Next, the server is asked to send a list of the above data that it stores, using the HTTP or FTP protocol, and the data download unit 102 receives the result. The user selects the data to download from that list. For example, the list is sent via the communication unit 101 to the communication data determination unit 103, which identifies the data as a list and sends it to the three-dimensional drawing unit 14 via the data management unit 104. The three-dimensional drawing unit 14 renders the list as an image and sends it to the display unit 15 for display, so that the user can check its contents.

[0114] The user selects data through the data download unit 102. The communication unit 101 sends the name or identifier of the selected data to the server in accordance with the protocol mentioned above. The server sends the selected data file to the communication unit 101 in accordance with the same protocol. The communication data determination unit 103 determines that the received content is a data file and sends it to the data management unit 104.
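The role of the communication data determination unit 103 can be pictured as a dispatch step that inspects the kind of incoming content and picks the unit that receives it next. The sketch below is an assumption-laden illustration: the `kind` labels and the destination strings are hypothetical, and the specification does not say how the content type is actually detected.

```python
# Hypothetical mapping from content kind to the next receiving unit.
# A list and a data file both go to the data management unit 104;
# voice data goes to the voice processing unit 5.
ROUTES = {
    "list": "data_management_unit_104",
    "file": "data_management_unit_104",
    "voice": "voice_processing_unit_5",
}


def route(kind):
    """Return the name of the unit that should receive content of this kind."""
    try:
        return ROUTES[kind]
    except KeyError:
        raise ValueError(f"unknown content kind: {kind!r}") from None


if __name__ == "__main__":
    for kind in ("list", "file", "voice"):
        print(kind, "->", route(kind))
```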

[0115] The data management unit 104 determines whether the data is CG character data, body motion pattern data and body motion data, background data, or music data, and identifies the data size. This determination is unnecessary when the selection made in the data download unit 102 has been notified to the data management unit 104 via the communication unit 101 and the communication data determination unit 103, because the type is then known in advance. Next, depending on the data content, the data management unit 104 queries the character shape data storage unit 18, the character motion data storage unit 19, the background data storage unit 20, the texture data storage unit 21, or the music data storage unit 22 for free space, and if free space is available, sends the data file to the relevant storage unit. That storage unit saves the data file and returns the address at which it was saved to the data management unit 104.

[0116] The data management unit 104 adds to the management table the entries appropriate to the data content. For example, for the CG character data of FIG. 3, it adds 4 as a CG character ID and enters the address returned from the storage unit in the corresponding column. Other data are handled in the same way. When the management table has been updated, a completion notification is sent to the data download unit 102 via the communication data determination unit 103 and the communication unit 101, a data download end notification is sent to the server via the communication unit 101, and the download processing ends.

[0117] When there is no data storage space, the data download unit 102 is notified of this via the communication data determination unit 103 and the communication unit 101. The data download unit 102 notifies the user that there is no storage space (for example, by showing a message on the display unit 15 as described above). Then, as in the case above, a completion notification is sent to the data download unit 102 via the communication data determination unit 103 and the communication unit 101, a data download end notification is sent to the server via the communication unit 101, and the download processing ends.
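The storage steps above (query for free space, save the file, record the returned address in the management table, or report that no space is left) can be sketched as follows. This is a simplified model under stated assumptions: the class names `StorageUnit` and `DataManager`, the byte-count capacity model, and the sequential ID assignment are all hypothetical and are not taken from the specification.

```python
class StorageUnit:
    """Toy stand-in for one storage unit (e.g. the background data storage unit)."""

    def __init__(self, capacity):
        self.capacity = capacity  # total bytes available (assumed model)
        self.used = 0
        self.files = {}           # address -> stored bytes

    def has_space(self, size):
        return self.used + size <= self.capacity

    def save(self, data):
        """Store the data and return the address at which it was saved."""
        address = self.used
        self.files[address] = data
        self.used += len(data)
        return address


class DataManager:
    """Toy stand-in for the data management unit and its management table."""

    def __init__(self, stores):
        self.stores = stores      # data kind -> StorageUnit
        self.table = {}           # data kind -> {id: address}

    def store_download(self, kind, data):
        """Return (new_id, address), or None when there is no free space."""
        store = self.stores[kind]
        if not store.has_space(len(data)):
            return None           # caller then notifies the user
        address = store.save(data)
        ids = self.table.setdefault(kind, {})
        new_id = max(ids, default=0) + 1  # next unused ID (assumed scheme)
        ids[new_id] = address
        return new_id, address


if __name__ == "__main__":
    manager = DataManager({"background": StorageUnit(capacity=10)})
    print(manager.store_download("background", b"12345"))   # (1, 0)
    print(manager.store_download("background", b"123456"))  # None
    print(manager.store_download("background", b"12345"))   # (2, 5)
```

In either outcome the download then ends the same way, with the completion and end-of-download notifications described in the text.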

[0118] During voice data communication, the communication data determination unit 103 determines that the data is voice data and sends it to the voice processing unit 5. The first and second embodiments of the present invention can be implemented as a program for a device having a voice communication unit, a display unit, a voice input/output unit, a central processing unit, and memory. Examples include mobile phones, pocket computers, stationary telephones with a display, in-vehicle terminals with a communication function, and personal computers. However, processing is faster on a device that has dedicated three-dimensional processing, voice input/output, and voice processing hardware. For a personal computer, one equipped with a 3D graphics board and a Sound Blaster board is effective. The display unit 15 may be a CRT, liquid crystal, organic EL, or other display; the type does not matter.

[0119] FIGS. 8(a) and 8(b) are overviews of the virtual television communication of the present invention. With the above configuration, the CG character corresponding to the selected receiving party is displayed, and the user can enjoy a conversation with that CG character. The user can also be displayed at the same time to enjoy a conversation in a virtual space. The setting operations can be performed both before and after a call starts.

[0120] FIG. 10(a) shows a personal computer (hereinafter, PC) 1001 having the virtual video call function of the present invention. It includes a speaker 1002 and a microphone 1003.

[0121] When the user selects a character for at least one of the user and the call partner and starts a call, the emotion estimation unit 10 estimates emotion based on the voice during the call. The CG character displayed on the screen 1004 changes its motion and facial expression according to this emotion estimation, which makes the virtual video call device more entertaining. In addition, because the user of the PC 1001 can freely choose the partner's character and voice, the PC 1001 can offer a virtual video call function with enhanced entertainment value. For example, the settings for the user's boss could use a forest as the background, a bear as the character, and a cute voice.

[0122] FIG. 10(b) shows a mobile phone 1005 having the virtual video call function of the present invention. The mobile phone 1005 has a hands-free function, and the selected character is displayed on the screen 1006 while performing motions based on the estimated emotion. The mobile phone 1005 thus provides a virtual video call function with enhanced entertainment value.

[0123] To improve the emotion estimation function of the present invention, a new sensor unit can also be added to the virtual video call device. FIG. 11 is a block diagram in which a sensor unit 1101 is added to the functional block diagram of the virtual video call device of FIG. 1 or FIG. 2. The sensor unit 1101 is a processing unit that detects changes in the user's body temperature, heart rate, grip strength on the portable device, and the like, and conveys those changes to the emotion estimation unit 10. For example, the sensor unit 1101 may detect a change in the user's body temperature with a thermistor and pass the result to the emotion estimation unit 10, which can then estimate emotion more reliably by using the body temperature change as an additional parameter.

[0124] FIG. 12(a) shows a usage example of a mobile phone equipped with various sensor units for emotion estimation. It includes a grip strength measurement unit 1201 that detects changes in the user's grip strength. FIG. 12(b) is a reference diagram of a mobile phone equipped with various sensor units for emotion estimation. It includes the grip strength measurement unit 1201 and a thermistor 1202 for measuring changes in the user's body temperature. Using such new parameters in addition to the voice data is expected to make emotion estimation more reliable.

[0125] The present invention is not limited to the embodiments described above and can be practiced within its applicable scope. The above embodiments describe a virtual video call device that displays the character of at least one of the user and the communication partner on the screen. However, the device could also, for example, perform emotion estimation in communication among many participants, such as PC communication, and display many characters on the screen, each reflecting its estimated emotion.

[0126] It is also conceivable to reflect the emotion estimation result in the music data, outputting dark, bright, cheerful, rhythmic, or other music while controlling the facial expression motion and body motion of the CG character.

[0127]

[Effects of the Invention] With the above configuration, the present invention displays the communication partner as a virtual three-dimensional CG character selected by the receiving user and uses the partner's speech, so that the user can hold a voice conversation with a virtual three-dimensional CG character. This makes it possible to realize a new communication terminal that makes voice conversation more entertaining by a method different from the functions of "seeing the partner's face or a similar image" and "posing as a fictional character". The present invention also realizes a call device with a display that enables conversation in a virtual space without using a device such as a server, unlike the conventional technology described above. In addition, because downloading is possible, the CG data can be updated with new data. Even when talking to the same person, the user can enjoy conversations with a variety of CG characters by swapping the CG character or changing the voice through voice conversion.

[0128] Furthermore, the receiving user can select both his or her own character and the call partner's character, and the characters use the emotion estimation function to express emotions suited to the conversation in progress. The result is a new virtual video call device with enhanced entertainment value.

[0129] From the above, the effect of the present invention is very great, and it is expected to bring new enjoyment and pleasure to conversation over voice communication devices.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of a virtual video call device according to a first embodiment of the present invention.

FIG. 2 is a block diagram showing the configuration of a virtual video call device according to a second embodiment of the present invention.

FIG. 3 is an explanatory diagram of the CG character data management table and the CG character selection screen of the present invention.

FIG. 4(a) is an explanatory diagram of the communication management table, the CG data management table, and the voice management table of the present invention. FIG. 4(b) is a flowchart of the setting operation of the present invention.

FIG. 5(a) is an explanatory diagram of sound intensity analysis and lip motion in the present invention. FIG. 5(b) is an explanatory diagram of phoneme analysis and lip motion in the present invention.

FIG. 6(a) is an explanatory diagram of facial expression motion transitions in the present invention. FIG. 6(b) is an explanatory diagram of body motion transitions in the present invention.

FIG. 7 is an explanatory diagram of the pipeline processing and delay of the present invention.

FIG. 8(a) is an overview of the virtual television communication of the present invention. FIG. 8(b) is an overview of the virtual television communication of the present invention.

FIG. 9 is a flowchart showing the processing procedure of an emotion estimation method using a frequency signal.

FIG. 10(a) is a reference diagram showing another mode of use of the first and second embodiments of the present invention. FIG. 10(b) is a reference diagram showing another mode of use of the first and second embodiments of the present invention.

FIG. 11 is a block diagram in which a sensor unit is added to the functional block diagram of the virtual video call device of the present invention.

FIG. 12(a) is a diagram showing a usage example of a mobile phone equipped with various sensor units for emotion estimation. FIG. 12(b) is a reference diagram showing a mobile phone equipped with various sensor units for emotion estimation.

[Explanation of symbols]

1 Communication unit
2 Character background selection input unit
3 Data management unit
4 Voice selection input unit
5 Voice processing unit
6 Voice conversion unit
7 Voice output unit
8 Voice input unit
9 Voice analysis unit
10 Emotion estimation unit
11 Lip motion control unit
12 Body motion control unit
13 Facial expression control unit
14 Three-dimensional drawing unit
15 Display unit
16 Motion/facial expression input unit
17 Viewpoint change input unit
18 Character shape data storage unit
19 Character motion data storage unit
20 Background data storage unit
21 Texture data storage unit
22 Music data storage unit
101 Communication unit
102 Data download unit
103 Communication data determination unit
104 Data management unit

Continuation of the front page

(51) Int.Cl.⁷ Identification code FI Theme code (reference): G10L 3/00 551G

(72) Inventor: Toshinori Hijiri, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma-shi, Osaka
(72) Inventor: Naotake Otani, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma-shi, Osaka
(72) Inventor: Toshiya Naka, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma-shi, Osaka
(72) Inventor: Goji Yamamoto, c/o Matsushita Communication Industrial Co., Ltd., 4-3-1 Tsunashima-higashi, Kohoku-ku, Yokohama-shi, Kanagawa
(72) Inventor: Shigeo Asahara, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma-shi, Osaka

F-terms (reference):
5B050 AA08 BA08 BA09 BA12 CA08 EA24 EA28 FA02 FA05
5D015 AA05 AA06 KK02
5E501 AC16 BA17 CA08 CB02 CB03 CB09 CB14 CB15 DA11 EA21 EB05 FA14 FA27 FA32

Claims

[Claims]

1. A communication means for performing voice communication, a character selection means for selecting CG character shape data of at least one of a user himself or a communication partner, and a voice input means for inputting a voice of the user himself. The voice output means for outputting the voice of the communication partner, the voice data of the communication partner received by the communication means or both the voice data of the communication partner received and the voice data of the user himself / herself input by the voice input means. A voice analysis unit that performs voice analysis on the other hand, an emotion estimation unit that estimates the emotional state of the communication partner or the communication partner and the user himself using the voice analysis result of the voice analysis unit, and control of the motion of the CG character Motion control means for performing the motion estimation based on the emotion estimation means, and motion data generated based on the CG character shape data and the control information of the motion control means. Virtual video call device, it characterized in that it comprises a drawing means for generating an image by performing the drawing process had, and display means for displaying the image generated by the drawing means.

2. The emotion estimating means notifies the motion control means of the estimation result of the emotion estimating means, and the motion control means specifies the motion data based on the notification result. The virtual video call device according to claim 1.

3. The motion control means includes a lip motion control means for generating lip motion control information of the CG character data based on a voice analysis result of the voice analysis means, and the drawing means has the CG character shape. 2. The virtual video call apparatus according to claim 1, wherein an image is generated by performing a drawing process using the lip movement data generated based on the data and the control information of the lip movement control means.

4. The emotion estimation means notifies the lip movement control means of the estimation result of the emotion estimation means, and the lip movement control means specifies the lip movement data based on the notification result. The virtual video call device according to claim 3, which is characterized.

5. The movement control means includes a body movement control means for controlling a body movement of the CG character, and the drawing means is a body movement based on the body movement control information generated by the body movement control means. The virtual video call device according to claim 1, wherein drawing processing is performed using data.

6. The emotion estimation means notifies the physical movement control means of the estimation result of the emotion estimation means, and the physical movement control means specifies the physical movement data based on the notification result. The virtual video call device according to claim 5, which is characterized.

7. The virtual video call device further comprises a body motion pattern data selecting unit that defines a specific body motion, and the body motion controlling unit sets the body motion pattern data selected by the selecting unit. The virtual video call device according to claim 5, wherein the physical control is performed based on the physical control.

8. The virtual video call device according to claim 1, wherein the motion control means includes facial expression control means for controlling a facial expression motion of the CG character, and the drawing means performs the drawing process using facial expression motion data based on the facial expression motion control information generated by the facial expression control means.

9. The virtual video call device according to claim 8, wherein the emotion estimation means notifies the facial expression control means of its estimation result, and the facial expression control means specifies the facial expression motion data based on the notified result.

10. The virtual video call device according to any one of claims 1 to 9, further comprising voice conversion means for converting the received voice of the communication partner into another voice.

11. The virtual video call device according to claim 10, further comprising voice selection input means for selecting the voice quality used when the voice conversion means converts the received voice of the communication partner into another voice.

12. The drawing means generates an image of the CG character of the communication partner when a communication is received from the communication partner, and the display means displays the image of the CG character after the communication is received and before voice communication is started, to indicate a voice communication waiting state.

13. The virtual video call device according to claim 1, wherein the voice output means outputs music data corresponding to the communication partner when a communication is received from the communication partner, to indicate a voice communication waiting state.

14. The virtual video call device according to claim 1, wherein the drawing means performs a drawing process using background data to generate an image.

15. The virtual video call device according to claim 14, wherein the virtual video call device further includes a background selection unit capable of selecting background data.

16. The virtual video call device according to any one of claims 1 to 15, wherein the drawing means performs a three-dimensional drawing process to generate a three-dimensional image.

17. The virtual video call device comprises storage means for storing lip motion data, and means for downloading lip motion data from an external device and storing it in the storage means. Alternatively, the virtual video call device according to claim 4.

18. The virtual video call device comprises storage means for storing lip motion pattern data, and means for downloading lip motion pattern data from an external device and storing it in the storage means. The virtual video call device according to claim 17.

19. The virtual video call device comprises storage means for storing body motion data, and means for downloading body motion data from an external device and storing it in the storage means. Alternatively, the virtual video call device according to claim 6.

20. The virtual video call device according to claim 7, further comprising storage means for storing body motion pattern data, and means for downloading body motion pattern data from an external device and storing it in the storage means.

21. The virtual video call device comprises storage means for storing facial expression motion data, and means for downloading facial expression motion data from an external device and storing it in the storage means. Alternatively, the virtual video call device according to claim 9.

22. The virtual video call device comprises: storage means for storing facial expression pattern data; and means for downloading the facial expression pattern data from an external device and storing it in the storing means. The described virtual video call device.

23. The virtual video call device comprises: means for storing music data; and means for downloading music data from an external device and storing it in the storage means. Virtual video call device.

24. The virtual video call device comprises: background data storage means; and means for downloading the background data from an external device and storing the background data in the storage means. The described virtual video call device.

25. The virtual video call device comprises storage means for storing clothing texture data of a CG character, and means for downloading clothing texture data of a CG character from an external device and storing it in the storage means. The virtual video call device described in.

26. The virtual video call device according to claim 1, further comprising storage means for storing CG character shape data, and means for downloading CG character shape data from an external device and storing it in the storage means.

27. The virtual video call device according to claim 1, further comprising display mode selection means for selecting a display mode that determines whether or not the CG character is displayed.

28. The virtual video call device according to claim 27, wherein the display mode is one of: a communication partner display mode in which only the CG character of the communication partner is displayed; a simultaneous display mode in which the CG characters of both the communication partner and the user are displayed; and a mode in which no CG character is displayed.

29. The virtual video call device according to claim 5, further comprising means for controlling the designation and start of a body motion of the CG character.

30. The virtual video call device according to claim 8, further comprising means for controlling the designation and start of a facial expression motion of the CG character.

31. The virtual video call device according to claim 1, further comprising viewpoint changing means for displaying the CG character from a viewpoint direction according to the user's intention.

32. A virtual video call system for making a call between at least a call device of a user and a call device of a communication partner, wherein the virtual video call system comprises at least the call device of the user and the call device of the communication partner, and the call device comprises: communication means for performing voice communication; character selection means for selecting CG character shape data of at least one of the user and the communication partner; voice input means for inputting the voice of the user; voice output means for outputting the voice of the communication partner; voice analysis means for performing voice analysis on the voice data of the communication partner received by the communication means, or on both the received voice data of the communication partner and the voice data of the user input by the voice input means; emotion estimation means for estimating, using the voice analysis result of the voice analysis means, the emotional state of the communication partner, or of both the communication partner and the user; motion control means for controlling the motion of the CG character based on the emotion estimation means; drawing means for generating an image by performing a drawing process using motion data generated based on the CG character shape data and control information of the motion control means; and display means for displaying the image generated by the drawing means.

33. The virtual video call system according to claim 32, wherein the emotion estimation means notifies the motion control means of its estimation result, and the motion control means specifies the motion data based on the notified result.

34. A program for performing a virtual video call between a device of a communication partner and the user's own device through communication between at least the communication partner and the user, the program comprising: a communication step of performing voice communication; a character selection step of selecting CG character shape data of at least one of the user and the communication partner; a voice input step of inputting the voice of the user; a voice output step of outputting the voice of the communication partner; a voice analysis step of performing voice analysis on the voice data of the communication partner received in the communication step, or on both the received voice data of the communication partner and the voice data of the user input in the voice input step; an emotion estimation step of estimating, using the voice analysis result of the voice analysis step, the emotional state of the communication partner, or of both the communication partner and the user; a motion control step of controlling the motion of the CG character based on the emotion estimation step; a drawing step of generating an image by performing a drawing process using motion data generated based on the CG character shape data and control information of the motion control step; and a display step of displaying the image generated in the drawing step.

35. The program according to claim 34, wherein the emotion estimation step specifies the motion data based on the estimation result obtained in the emotion estimation step.
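
The step sequence recited in claims 34 and 35 (voice analysis, emotion estimation, motion control, selection of motion data) can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not taken from the claims: the acoustic features (RMS power, zero-crossing rate), the thresholds, the emotion labels, the motion data names, and all function and class names are hypothetical. The drawing and display steps are omitted; the sketch stops at the control information that would be passed to a drawing step.

```python
import math
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class Emotion(Enum):
    # Hypothetical emotion labels; the claims do not enumerate any.
    NEUTRAL = "neutral"
    CALM = "calm"
    EXCITED = "excited"


@dataclass(frozen=True)
class VoiceAnalysisResult:
    rms_power: float           # assumed feature: root-mean-square amplitude
    zero_crossing_rate: float  # assumed feature: sign changes per sample pair


@dataclass(frozen=True)
class ControlInfo:
    lip_open: float    # 0.0 (closed) to 1.0 (fully open)
    body_motion: str   # name of the body motion data to play
    expression: str    # name of the facial expression motion data to play


# Hypothetical lookup tables from an estimated emotion to motion data names.
BODY_MOTION = {Emotion.NEUTRAL: "idle", Emotion.CALM: "nod", Emotion.EXCITED: "wave_arms"}
EXPRESSION = {Emotion.NEUTRAL: "blank", Emotion.CALM: "smile", Emotion.EXCITED: "laugh"}


def analyze_voice(samples: Sequence[float]) -> VoiceAnalysisResult:
    """Voice analysis step: compute simple features from one frame of samples."""
    if not samples:
        return VoiceAnalysisResult(0.0, 0.0)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    zcr = crossings / max(len(samples) - 1, 1)
    return VoiceAnalysisResult(rms, zcr)


def estimate_emotion(result: VoiceAnalysisResult) -> Emotion:
    """Emotion estimation step: threshold on power (thresholds are arbitrary)."""
    if result.rms_power < 0.05:
        return Emotion.NEUTRAL
    if result.rms_power > 0.5:
        return Emotion.EXCITED
    return Emotion.CALM


def control_motion(result: VoiceAnalysisResult, emotion: Emotion) -> ControlInfo:
    """Motion control step: choose motion data based on the estimation result."""
    lip_open = min(1.0, result.rms_power * 2.0)  # louder frame, wider mouth
    return ControlInfo(lip_open, BODY_MOTION[emotion], EXPRESSION[emotion])


def process_frame(samples: Sequence[float]) -> ControlInfo:
    """Run analysis, estimation, and control for one frame of received voice."""
    result = analyze_voice(samples)
    return control_motion(result, estimate_emotion(result))
```

For example, `process_frame([0.8, -0.8] * 50)` yields control information for a loud frame, which a drawing step could combine with the selected CG character shape data to render the next image. A table lookup is used here only because it keeps the dependency between the estimation result and the chosen motion data explicit.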