JPH10271470A

JPH10271470A - Image/voice communication system and video telephone transmission/reception method

Info

Publication number: JPH10271470A
Application number: JP9070062A
Authority: JP
Inventors: Seiichiro Tabata; 誠一郎田端; Hiromasa Kobayashi; 裕昌小林; Hisami Kikuchi; 久美菊池
Original assignee: Olympus Optical Co Ltd
Current assignee: Olympus Corp
Priority date: 1997-03-24
Filing date: 1997-03-24
Publication date: 1998-10-09
Anticipated expiration: 2017-03-24
Also published as: JP3771989B2; US6313864B1

Abstract

PROBLEM TO BE SOLVED: To provide an image/voice communication system and a videophone transmission/reception method that use a simple, inexpensive device and transmit the movements and expressions of a speaker's face in real time without inconveniencing the user. SOLUTION: Before communication starts, the speaker wears a head-mounted display device (HMD) 1 and operates his or her own controller pad 3 on a video image generating box 2 to set an arbitrary character image, which need not be the speaker's own face. During communication, after the character images have been exchanged, the HMD 1 detects the speaker's expressions during the actual conversation and converts each change in expression into a prescribed code. Only these codes are exchanged in real time, and they move the character image displayed on the monitor of the other party's HMD 1.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to an image/voice communication system and a videophone transmission/reception method, and more particularly to an image/voice communication system and a videophone transmission/reception method in which, when a speaker's voice is transmitted to the other party, an image of the speaker's face or the like, or an image substituting for it, is also conveyed to the other party, so that the parties can converse while seeing each other's faces.

[0002]

2. Description of the Related Art

Various so-called videophone systems are known in which, when a speaker's voice is transmitted to the other party, an image of the speaker's face or the like is transmitted at the same time, so that the parties can converse while seeing each other's faces. Many of these videophone systems use existing telephone lines and transmit image data signals, such as face images, together with the speaker's voice data signals to the other party in a pseudo-bidirectional manner at substantially the same time. However, because of the large amount of information involved, transmitting face image data unmodified as a moving image is difficult as long as existing telephone lines are used.

[0003] Under these circumstances, videophone systems that transmit still images piecemeal have conventionally been adopted. They reduce the amount of information transmitted per unit time and so suit transmission lines of small capacity, such as telephone lines.

[0004] With such videophone systems, however, it is difficult to transmit moving images accurately in real time. A natural face image therefore cannot be conveyed to the other party, and the resulting facial expressions look awkward.

[0005] As a technical means for solving this problem, a teleconferencing system using computer graphics (CG) technology has recently been proposed, for example, in Japanese Patent Laid-Open No. 7-38873. The technical means used in that teleconferencing system are briefly described below.

[0006] In this technique, shape information, such as the relief of each conference participant's face, and color information are first measured and captured in advance with a laser scanner or the like, and face image information is captured with a digital camera or the like. The shape information is then converted into 3D polygon data, and a wire-frame model of each participant is created.

[0007] When a conference is held, markers are attached to each participant's face, and sensors for detecting movement are attached to the head, arms, and body. The movement of the face is detected by tracking the face markers with a camera installed near each participant, for example on headgear worn by the participant, and the movements of the head, arms, and body are detected by the sensors attached to them.

[0008] Next, the wire-frame model created in advance as described above is deformed in real time based on the motion data of each body part. The previously captured colors are then applied to the wire-frame model to complete a graphic image of the corresponding participant, and this completed graphic image is displayed on a screen in real time, following the participant's movements. By monitoring this screen display, conference participants can hold discussions while recognizing the facial expressions and the like of the other participants.

[0009] In such a system, the image data that require a large amount of data are captured in advance, and only a small amount of data changes in real time. A speaker's moving image can therefore be transmitted in real time even in a videophone system that uses a transmission path of small capacity, such as an existing telephone line.

[0010]

Problems to Be Solved by the Invention

The teleconferencing system proposed in Japanese Patent Laid-Open No. 7-38873 requires troublesome preparation: capturing the speaker's image data in advance, attaching markers to the speaker's face before a conversation starts, and attaching sensors to the head, arms, and body. Moreover, for use in ordinary homes rather than for business purposes such as conferencing, such complexity is highly unsuitable.

[0011] Specifically, the videophone function of this teleconferencing system requires various data on the face of the user (the speaker) to be measured and captured in advance with a laser scanner or the like, and such large-scale measurement is very difficult to perform in an ordinary home, considering cost and other factors. In addition, markers must be attached to the face before a conversation starts. Given how telephones are used in ordinary homes, it is not practical to attach markers to one's face for every telephone conversation, particularly every time a call comes in.

[0012] Furthermore, the user must remain in front of a screen during the conversation, which imposes a significant restriction compared with an ordinary voice-only telephone. This problem is common to earlier videophone systems as well.

[0013] The present invention has been made in view of the above problems, and an object of the invention is to provide an image/voice communication system and a videophone transmission/reception method that use a simple, inexpensive device and transmit the movements and expressions of a speaker's face in real time without inconveniencing the user.

[0014]

Means for Solving the Problems

To achieve the above object, a first image/voice communication system of the present invention is an image/voice communication system in which image display means and voice output means suited to image and voice communication are provided at least on the receiving side of the communication, the system comprising: character image setting means capable of arbitrarily setting a character image to be used for display by the image display means; deformation command receiving means for receiving, from the communication partner, a command signal for deforming the character image; character deforming means for deforming the character image in accordance with the command signal; and means for supplying the character image deformed by the character deforming means to the display means for display.

[0015] To achieve the above object, a second image/voice communication system of the present invention is the first image/voice communication system, wherein the character image setting means is provided on the transmitting side of the communication, so that the transmitting side arbitrarily sets a character image representing itself and transmits the set character image to the receiving side, where it serves as the character image used for display by the image display means.

[0016] To achieve the above object, a third image/voice communication system of the present invention is the second image/voice communication system, wherein first information, representing the character image and the correspondence between each command signal code and the corresponding degree of deformation of the character image, is transmitted before the conversation starts, and, during the conversation, second information consisting substantially only of the command signal codes is transmitted in real time.

[0017] To achieve the above object, a fourth image/voice communication system of the present invention is the third image/voice communication system, further comprising: character image deformation amount determining means with which each communicator arbitrarily determines, in the first information, the correspondence between the character image, the command signal codes, and the corresponding degrees of deformation of the character image; and transmission condition determining means with which each communicator arbitrarily determines the conditions under which a command signal is transmitted.

[0018] To achieve the above object, a fifth image/voice communication system of the present invention is the first image/voice communication system, wherein the character image setting means allows each communicator to arbitrarily associate communication partners with character images, the system further comprising display character selecting means which, when communication is carried out, recognizes the communication partner from a signal transmitted by that partner and, based on the partner-to-character association set by the character image setting means, applies the character image associated with that partner to the display by the image display means.

[0019] To achieve the above object, a sixth image/voice communication system of the present invention is the first image/voice communication system, wherein the image display means, the voice output means, and voice input means are configured as a head-mounted display device.

[0020] To achieve the above object, a seventh image/voice communication system of the present invention is the first image/voice communication system, comprising: a line-of-sight detector, arranged on the head-mounted display device, for detecting the line of sight of a communicator who uses the head-mounted display device in the image/voice communication system; a head motion sensor for detecting movement of the head; voice detection means for detecting uttered voice; and transmission means for transmitting the command signal in accordance with the detection outputs of the line-of-sight detector, the head motion sensor, and the voice detection means.

[0021] To achieve the above object, an eighth image/voice communication system of the present invention is the seventh image/voice communication system, configured so that both the control for moving the character representing the communicator himself or herself and the control of the display position at which the character representing the other party is projected, so that it appears fixed in real space, are performed based on the output signal of the head motion sensor.

[0022] To achieve the above object, a ninth image/voice communication system of the present invention is the first image/voice communication system, wherein the predetermined character image is a combination of a plurality of types of figures, the character data include codes designating the individual figures and coordinate data representing the positional relationships among the figures, and the system comprises first conversion means for converting the character data deformed by the character deforming means into an image signal to be displayed by the display means, based on a predetermined conversion format.

[0023] To achieve the above object, a first videophone transmission/reception method of the present invention is a method for a videophone apparatus that communicates images and voice, in which the following steps are executed in sequence: before communication starts, a character image creation step in which each communicator arbitrarily creates a character representing himself or herself, a deformation part determination step in which each communicator arbitrarily determines which parts of the character image are to be deformed, a deformation amount determination step in which each communicator arbitrarily determines the amount of deformation of those parts, and a deformation execution condition determination step in which each communicator arbitrarily determines the conditions under which the deformation is executed; immediately after communication starts, a character image transmission step of transmitting the deformation parts and the deformation amounts in association with each other; a deformation command transmission step of transmitting a command prompting the deformation when occurrence of the condition is detected within the period following the character image transmission step; a step of transmitting voice; and, on the receiving side, a step of receiving the deformation command, deforming the designated part of the character image by the designated amount, and displaying the deformed image.

[0024] To achieve the above object, a second videophone transmission/reception method of the present invention is a method for a videophone apparatus that communicates images and voice, in which the following steps are executed in sequence: before communication starts, a character image determination step in which each communicator arbitrarily associates communication partners with character images; immediately after communication starts, a character image preparation step of identifying the communication partner and preparing the character image associated with that partner; a deformation command receiving step of receiving, from the partner, a command for deforming the character within the period following the character image preparation step; and a step of deforming the character image based on the command received in the deformation command receiving step and displaying the deformed image.

[0025] To achieve the above object, a tenth image/voice communication system of the present invention comprises: display means for displaying, to the other party of a caller, a character that does not depend on the caller's actual appearance; and character control means for moving the character displayed by the display means in accordance with the caller's call situation.

[0026]

Embodiments of the Invention

Embodiments of the present invention will be described below with reference to the drawings.

[0027] First, an outline of an image/voice communication system according to an embodiment of the present invention will be described.

[0028] This image/voice communication system belongs to the category of so-called videophone systems, in which a voice conversation is carried out over a network such as a telephone network while the face image of the other party is monitored. Its distinguishing feature is that the face image of the other party is not limited to that party's actual face: the conversation uses an image formed from predetermined character data. That is, the face image of the other party that each speaker monitors during the conversation is a character image formed from predetermined (and optionally user-settable) character data. When the other party's expression changes during the conversation, the character image is changed and conveyed so as to follow, substantially in real time, the movements of the main facial parts associated with that change in expression.

[0029] In this image/voice communication system, the predetermined character image is set in advance by the speaker on the sending side. For image transmission, the sending speaker's character image data are first transmitted once to the other party (the receiving side); thereafter, during the conversation, only data relating to the movements of the main parts of the character image are transmitted in response to changes in the sending speaker. The amount of data required for image transmission is therefore minimal, and even a videophone system using a network of small transmission capacity can convey the speaker's facial expressions to the other party substantially in real time.

[0030] In this image/voice communication system, the character image is set arbitrarily by the user, but the user may instead select and use an image stored in advance. Either way, this allows for playful conversations.

[0031] This image/voice communication system is also characterized in that, as the terminal device for transmitting and receiving, each user wears on the head a head-mounted display (hereinafter abbreviated as HMD) for monitoring images (see FIG. 5).

[0032] The main components of the image/voice communication system according to the first embodiment will now be outlined with reference to FIG. 1.

[0033] FIG. 1 is a block diagram showing the main configuration of an image/voice communication apparatus used in the image/voice communication system according to the first embodiment. In the figure, the upper part shows the main configuration of the transmission unit, and the lower part shows the main configuration of the reception unit.

[0034] In the image/voice communication system of the first embodiment, the two speakers communicate using apparatuses having the same functions, and the conversation proceeds with the two alternately taking the roles of speaker and listener. For convenience, however, one speaker is referred to here as the sending speaker and the other as the receiving speaker, and their respective image/voice communication apparatuses are described as the sending apparatus and the receiving apparatus.

[0035] First, the configuration of the transmission unit in the sending apparatus will be described.

[0036] The transmission unit of the image/voice communication apparatus of the first embodiment mainly comprises: character image generating means 100 for generating the sending speaker's character image; first character data storage means 104 for storing data on the various character images generated by the character image generating means 100; facial expression detecting means 105, which detects movements of the sending speaker's eyes, mouth, and so on and outputs signals used for the reference values described later, and which, during a conversation, detects movements of the sending speaker's eyes, mouth, and head as well as the speaker's voice and outputs them to the next stage; facial expression code converting means 108 for converting the detection results (eye, mouth, and head movements) from the facial expression detecting means 105 into predetermined command signal codes (eye movement codes, mouth movement codes, and head movement codes) and outputting them; first selecting means 107 for selecting, at predetermined timings, the data from the first character data storage means 104, the data from the facial expression detecting means 105, or the data from the facial expression code converting means 108, and passing the selected data to the next stage; and data transmitting means 106 for transmitting the data selected by the first selecting means 107 to the other party's apparatus (the receiving apparatus).

[0037] When terminal 107a of the first selecting means 107 is selected, the output terminal of the first character data storage means 104 is connected to the data transmitting means 106; when terminal 107b is selected, the output terminal of the facial expression detecting means 105 or the output terminal of the facial expression code converting means 108 is connected to the data transmitting means 106. In the image/voice communication apparatus of the first embodiment, the first selecting means 107 selects the connection destination in software.

[0038] The character image generating means 100 includes character data input means 101 for inputting data of a predetermined character image, first conversion means 102 for converting the predetermined character data input through the character data input means 101 into predetermined dot data, and display means 103 for displaying the character data converted into dot data by the first conversion means 102.

[0039] The display means 103 is a display unit provided in the HMD or the like worn by the user. The HMD will be described in detail later.

[0040] The character data input means 101 consists of an operation unit and the like, described later, and the user (the sending speaker) uses it to input the data of a predetermined character image. At this time, the user sets not only the data of the character image in its initial state (basic character image data) but also the data of character images whose expressions are changed (deformed) to follow movements of the user's eyes, mouth, and so on. One expression-change character image is set for each pattern type corresponding to a preset command signal code.

[0041] In this embodiment, the character data input means 101 is assumed to be a device through which predetermined character data are input by operating the operation unit or the like, so that the user can set the character image arbitrarily. The invention is not limited to this, however; the character data may be, for example, arbitrary image data captured with an electronic camera, a scanner, or the like (including a photograph of the user's own face).

[0042] In addition, the user (the sending speaker) uses the character data input means 101 and the facial expression detecting means 105 to set reference values for expression changes, based on the movements of the sending speaker's eyes and mouth detected by the facial expression detecting means 105. A reference value here means a threshold used to decide, according to the degree of change in the speaker's expression, whether to output the corresponding command signal code.

[0043] The facial expression detecting means 105 is provided in the HMD or the like worn by the user. It serves as the detecting means used when the reference values for expression changes are generated and, during a conversation, as the detecting means that detects the sending speaker's expression changes (eye, mouth, and head movements) and voice signal at predetermined, mutually synchronized timings and outputs them.

[0044] When the reference values for expression changes are generated, the sending speaker operates the character data input means 101 so that, of the expression-change elements output by the facial expression detecting means 105, the detected values relating to eye and mouth movements are input to the facial expression code converting means 108, and the reference values are generated from these detected values. One reference value is generated for each type of preset command signal code.

[0045] During a dialogue, the sender-side speaker's expression changes, which vary at the predetermined timing described above, are handled as follows: the eye and mouth movements are successively (substantially in real time) converted into predetermined command signal codes by the facial expression code converting means 108 in the next stage and transmitted to the receiver-side device via the data transmitting means 106.
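As a minimal sketch of how the facial expression code converting means 108 might apply the reference values (thresholds) during a dialogue, the Python below compares each timed eye/mouth measurement with a per-code threshold and emits the matching command signal codes. The feature names, the normalized 0 to 1 scale, the code numbers, and the simple "threshold crossed" rule are all assumptions; the patent does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceValue:
    """One hypothetical reference value (threshold) tied to a command signal code."""
    code: int          # command signal code; this numbering is illustrative only
    feature: str       # detected quantity it applies to, e.g. "eye_opening"
    threshold: float   # decision threshold for that quantity
    fire_above: bool   # True: emit when value >= threshold; False: when value <= threshold


def to_command_codes(sample: dict[str, float], refs: list[ReferenceValue]) -> list[int]:
    """Convert one timed sample of eye/mouth measurements into command signal codes."""
    codes = []
    for ref in refs:
        value = sample[ref.feature]
        crossed = value >= ref.threshold if ref.fire_above else value <= ref.threshold
        if crossed:
            codes.append(ref.code)
    return codes


refs = [
    ReferenceValue(code=1, feature="eye_opening", threshold=0.25, fire_above=False),  # eyes closed
    ReferenceValue(code=2, feature="mouth_opening", threshold=0.5, fire_above=True),  # mouth open
]
print(to_command_codes({"eye_opening": 0.1, "mouth_opening": 0.75}, refs))  # [1, 2]
print(to_command_codes({"eye_opening": 0.9, "mouth_opening": 0.25}, refs))  # []
```

Only the short codes leave the device, which is what keeps the per-frame transmission small compared with sending video.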

[0046] Head movement data are converted by the same facial expression code converting means 108 into another predetermined code that does not correspond to any command signal code, and are transmitted to the receiver-side device via the data transmitting means 106.

[0047] The audio signal bypasses the facial expression code converting means 108 and is transmitted to the receiver-side device via the data transmitting means 106.

[0048] The predetermined command signal codes for the eye and mouth movement data, the codes for the head movement data, and the audio signal are transmitted in synchronization with one another.
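One way to picture this synchronization is to bundle the codes and audio captured at the same detection instant into a single frame that shares one sequence number and timestamp. The sketch below assumes a hypothetical binary layout built with Python's standard `struct` module; the field sizes, the units (tenths of a degree for head angles), and the ordering are illustrations, not the patent's format.

```python
import struct
from dataclasses import dataclass

# Hypothetical layout, network byte order:
# seq (uint32) | timestamp_ms (uint32) | n_codes (uint8) | codes (uint8 each)
# | head yaw, pitch, roll (int16 each, tenths of a degree) | audio_len (uint16) | audio bytes
HEADER = "!IIB"
HEAD = "!hhh"
AUDIO_LEN = "!H"


@dataclass
class Frame:
    seq: int
    timestamp_ms: int
    command_codes: list[int]      # eye/mouth command signal codes
    head: tuple[int, int, int]    # head movement code
    audio: bytes                  # audio captured over the same interval


def pack(frame: Frame) -> bytes:
    """Serialize one synchronized frame."""
    out = struct.pack(HEADER, frame.seq, frame.timestamp_ms, len(frame.command_codes))
    out += bytes(frame.command_codes)
    out += struct.pack(HEAD, *frame.head)
    out += struct.pack(AUDIO_LEN, len(frame.audio)) + frame.audio
    return out


def unpack(data: bytes) -> Frame:
    """Parse a frame produced by pack()."""
    seq, ts, n = struct.unpack_from(HEADER, data, 0)
    offset = struct.calcsize(HEADER)
    codes = list(data[offset:offset + n])
    offset += n
    head = struct.unpack_from(HEAD, data, offset)
    offset += struct.calcsize(HEAD)
    (audio_len,) = struct.unpack_from(AUDIO_LEN, data, offset)
    offset += struct.calcsize(AUDIO_LEN)
    return Frame(seq, ts, codes, head, data[offset:offset + audio_len])


frame = Frame(seq=7, timestamp_ms=1000, command_codes=[1, 2], head=(15, -30, 0), audio=b"\x01\x02\x03")
print(len(pack(frame)), unpack(pack(frame)) == frame)  # 22 True
```

Because all three kinds of data travel in one frame, the receiver cannot apply a head code or play audio without the matching eye/mouth codes, which is one simple way to keep them in step.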

[0049] Throughout the above operations, the data are converted into dot data by the first conversion means 102 and displayed successively on the display means 103, so that the user can carry out the work while monitoring it.

[0050] The receiving section of the video/audio communication device consists mainly of: a data receiving means 111 that receives the predetermined data sent from the data transmitting means 106 of the partner device; a second selecting means 112 that selects the next-stage circuit according to the type of data received by the data receiving means 111; a second character data storage means 113 that temporarily stores predetermined data relating to the partner's character image when so selected by the second selecting means 112; a character data processing means 114 that processes the partner's character image based on the predetermined data stored in the second character data storage means 113 and on the expression-change data detected by the partner's facial expression detecting means 105 and coded by the facial expression code converting means 108; a second conversion means 115 that converts the partner's character image processed by the character data processing means 114 into predetermined data; an image deforming means 117 that, during a dialogue, calculates and outputs the degree of image deformation based on the head movement data detected by the partner's facial expression detecting means 105; a voice reproducing means 118 that, during a dialogue, reproduces the audio signal detected by the partner's facial expression detecting means 105; and a display means 116 that displays the partner's character image.

[0051] As described in detail later, the second selecting means 112 selects terminal 112a in the initial stage of communication; the output of the data receiving means 111 is then connected to the second character data storage means 113. In the dialogue stage of communication, terminal 112b is selected; the output of the data receiving means 111 then bypasses the second character data storage means 113 and is connected to the character data processing means 114 and the voice reproducing means 118.
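The second selecting means 112 behaves like a stage-dependent router. A minimal sketch, assuming a hypothetical `kind` label on each received item and using plain containers as stand-ins for means 113, 114, and 118:

```python
from enum import Enum, auto


class Stage(Enum):
    INITIAL = auto()   # terminal 112a selected
    DIALOGUE = auto()  # terminal 112b selected


class SecondSelector:
    """Hypothetical stand-in for the second selecting means 112."""

    def __init__(self) -> None:
        self.stage = Stage.INITIAL
        self.character_store: dict[str, object] = {}      # role of storage means 113
        self.to_processing: list[tuple[str, object]] = []  # input toward processing means 114
        self.to_playback: list[object] = []                # input to voice reproducing means 118

    def route(self, kind: str, payload: object) -> None:
        if self.stage is Stage.INITIAL:
            # Initial stage: received character data are stored for the dialogue.
            self.character_store[kind] = payload
        elif kind == "audio":
            # Dialogue stage: audio bypasses storage and goes straight to playback.
            self.to_playback.append(payload)
        else:
            # Dialogue stage: command codes and head codes bypass storage toward processing.
            self.to_processing.append((kind, payload))


sel = SecondSelector()
sel.route("basic_image", "partner face")
sel.stage = Stage.DIALOGUE
sel.route("command", 2)
sel.route("audio", b"\x00\x01")
print(sel.character_store, sel.to_processing, sel.to_playback)
```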

[0052] In the dialogue stage of communication, the second conversion means 115 determines the degree of deformation of the character image accompanying the eye and mouth movements of the partner (sender-side) speaker by comparing the command signal code sent from the sender-side device with the preset command signal codes, applies predetermined image processing, and sends the result to the image deforming means 117.

[0053] In the dialogue stage of communication, the image deforming means 117 calculates, by a predetermined computation method, the degree of deformation of the character image accompanying the head movement of the partner (sender-side) speaker, applies predetermined image processing to the data from the second conversion means 115, and thereby generates the partner's character image substantially in real time and displays it on the display means 116.
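The "predetermined computation method" of the image deforming means 117 is not given in this passage. As one possible sketch, assuming the character image is described by 2D control points and the head code carries roll, yaw, and pitch angles in degrees, the function below rotates the points by the roll angle about a center and shifts them in proportion to yaw and pitch. The angle model, sign conventions, and `shift_per_deg` scale are hypothetical.

```python
import math

Point = tuple[float, float]


def deform_for_head(points: list[Point], roll_deg: float, yaw_deg: float, pitch_deg: float,
                    center: Point = (0.0, 0.0), shift_per_deg: float = 1.0) -> list[Point]:
    """Rotate points about `center` by the roll angle, then translate by yaw/pitch.

    Assumed convention: positive yaw shifts right (+x), positive pitch shifts +y.
    """
    r = math.radians(roll_deg)
    c, s = math.cos(r), math.sin(r)
    dx = yaw_deg * shift_per_deg
    dy = pitch_deg * shift_per_deg
    out = []
    for x, y in points:
        x0, y0 = x - center[0], y - center[1]
        xr = x0 * c - y0 * s + center[0]   # standard 2D rotation
        yr = x0 * s + y0 * c + center[1]
        out.append((xr + dx, yr + dy))
    return out


print(deform_for_head([(0.0, 0.0)], roll_deg=0.0, yaw_deg=2.0, pitch_deg=-1.0, shift_per_deg=0.5))
# [(1.0, -0.5)]
```

A real implementation would more likely warp the rendered image or a 3D model, but the same idea applies: the deformation is computed locally from a few transmitted numbers.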

[0054] As noted above, transmission of the codes for head movement data sent from the sender-side device is synchronized with transmission of the command signal codes for eye and mouth movement data, so the display means 116 of the receiver-side device also displays them in synchronization.

[0055] Further, in the dialogue stage of communication, the voice reproducing means 118 reproduces the audio signal of the partner (sender-side) speaker in synchronization with the command signal codes for eye and mouth movement data and the codes for head movement data.

[0056] Next, the operation of the video/audio communication system of the present embodiment configured as above will be briefly described.

[0057] First, as a preparation stage, the sender-side speaker uses the character data input means 101 in the character image generating means 100 of his or her own video/audio communication device (hereinafter, the sender-side device) to generate a character image in its initial state (the basic character image). The sender-side speaker creates this character image while monitoring the display means 103: the character data input through the character data input means 101 are converted into predetermined dot data by the first conversion means 102 and displayed on the display means 103.

[0058] Next, using the character data input means 101, the sender-side speaker creates a predetermined number of character images in which the eyes, mouth, and so on of the initial character image are deformed, that is, in which the expression is changed, each corresponding to one of the command signal codes stored in advance, and generates the degree of each deformation (hereinafter, deformed character image data) in association with it. The number of types equals the number of command signal codes stored in advance.
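The deformed character image data are described as degrees of deformation relative to the basic character image, one entry per stored command signal code. A minimal sketch, assuming the character is represented as named 2D control points and each deformation as per-point offsets (both representations are hypothetical):

```python
Point = tuple[float, float]

# Basic character image: named control points (hypothetical representation).
basic_image: dict[str, Point] = {
    "left_upper_lid": (-1.0, 1.0),
    "right_upper_lid": (1.0, 1.0),
    "lower_lip": (0.0, -1.0),
}

# Deformed character image data: per-point offsets, keyed by command signal code.
deformations: dict[int, dict[str, Point]] = {
    1: {"left_upper_lid": (0.0, -0.75), "right_upper_lid": (0.0, -0.75)},  # eyes closed
    2: {"lower_lip": (0.0, -0.5)},                                        # mouth open
}


def apply_deformation(basic: dict[str, Point], offsets: dict[str, Point]) -> dict[str, Point]:
    """Return a new character image with offsets added; points without an offset are copied."""
    result = {}
    for name, (x, y) in basic.items():
        dx, dy = offsets.get(name, (0.0, 0.0))
        result[name] = (x + dx, y + dy)
    return result


print(apply_deformation(basic_image, deformations[2]))
```

Storing offsets rather than full images keeps the per-code data small, and the basic image is never modified.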

[0059] Next, the facial expression detecting means 105 detects predetermined data on the sender-side speaker's eye and mouth movements, and from these data the facial expression code converting means 108 generates the reference values for expression change. One reference value is generated for each of the character images, created earlier with the character data input means 101, in which the eyes, mouth, and so on have changed relative to the initial character image.
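The passage does not say how a reference value is derived from the detected eye and mouth data. One simple possibility, shown here only as an assumption, is to record the speaker's measurements in a neutral pose and while making the target expression, then place the threshold midway between the two averages:

```python
from statistics import mean


def make_reference_value(neutral: list[float], target: list[float]) -> tuple[float, bool]:
    """Return (threshold, fire_above) separating neutral samples from target-expression samples.

    The midpoint rule is an illustrative assumption, not the patent's method.
    """
    neutral_avg, target_avg = mean(neutral), mean(target)
    if neutral_avg == target_avg:
        raise ValueError("target expression is indistinguishable from neutral")
    threshold = (neutral_avg + target_avg) / 2
    return threshold, target_avg > neutral_avg


# Eye opening (0 = closed, 1 = fully open) while relaxed and while deliberately closing the eyes.
print(make_reference_value([1.0, 0.75], [0.0, 0.25]))  # (0.5, False)
```

Calibrating per speaker in this way lets the same command signal codes work for people whose eyes and mouths move by different amounts.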

[0060] During this generation step, each character image is converted from character data into dot data by the first conversion means 102 and displayed on the display means 103, so that the sender-side speaker can carry out the above work while monitoring the display means 103.

[0061] The various character data generated in this way by the character image generating means 100, namely the sender-side speaker's initial character image data, the character image data corresponding to expression changes, and the reference value data corresponding to those expression changes, are then stored in the first character data storage means 104 of the sender-side device.

[0062] This completes the preparation stage. The same preparation is also carried out on the receiver-side device of the receiver-side speaker with whom the sender-side speaker will converse.

[0063] When preparation is complete and communication with the partner device begins, initial-stage negotiation is carried out according to a predetermined protocol, after which the first character data storage means 104 first outputs the sender-side speaker's various character data described above. That is, in the sender-side device, terminal 107a is selected in the first selecting means 107, and the various character data are sent from the first character data storage means 104 to the receiver-side device via the data transmitting means 106.

[0064] The receiver-side device receives the sender-side speaker's character data with its data receiving means 111, selects terminal 112a in its second selecting means 112, and stores the various character data in its second character data storage means 113.

[0065] In this negotiation stage, at the same time as the various character data are transmitted from the sender-side device to the receiver-side device, the receiver-side device's character data are also transmitted to the sender-side device. The sender-side device therefore likewise stores the receiver-side speaker's character data in its own second character data storage means 113.
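The initial-stage exchange is symmetric: each device sends the contents of its first character data storage means 104 and stores what it receives in its second character data storage means 113. A minimal in-memory sketch, with a hypothetical `Device` class standing in for both devices and no real transport or negotiation protocol:

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical model of one video/audio communication device."""
    name: str
    own_data: dict[str, object]                                    # first storage means 104
    partner_data: dict[str, object] = field(default_factory=dict)  # second storage means 113

    def outgoing(self) -> dict[str, object]:
        # Terminal 107a: send a copy of this speaker's character data.
        return dict(self.own_data)

    def receive(self, data: dict[str, object]) -> None:
        # Terminal 112a: keep the partner's character data for the dialogue stage.
        self.partner_data.update(data)


def initial_stage(a: Device, b: Device) -> None:
    """Swap character data in both directions before the dialogue stage begins."""
    a_out, b_out = a.outgoing(), b.outgoing()
    b.receive(a_out)
    a.receive(b_out)


alice = Device("sender", {"basic": "cat face", "thresholds": {1: 0.5}})
bob = Device("receiver", {"basic": "robot face", "thresholds": {1: 0.4}})
initial_stage(alice, bob)
print(alice.partner_data["basic"], bob.partner_data["basic"])  # robot face cat face
```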

[0066] Once the two devices have transmitted their character data to each other, communication moves to the dialogue stage. First, the first selecting means 107 and the second selecting means 112 switch to terminals 107b and 112b, respectively.

[0067] The facial expression detecting means 105 of the sender-side device detects, at a predetermined timing, the changes in eye and mouth movement among the sender-side speaker's expression changes. The facial expression code converting means 108 then successively converts those changes into predetermined command signal codes, which are sent to the receiver-side device via the data transmitting means 106.

[0068] In synchronization with the transmission of the command signal codes for eye and mouth movement data, the facial expression detecting means 105 also detects data relating to head movement; these are converted into a predetermined code by the facial expression code converting means 108 and sent to the receiver-side device via the data transmitting means 106. Likewise, in synchronization with the transmission of those command signal codes, the audio signal detected by the facial expression detecting means 105 is sent to the receiver-side device via the data transmitting means 106.

[0069] When the receiver-side device receives, through its data receiving means 111, a command signal code for the sender-side speaker's eye and mouth movement data, its character data processing means 114 processes the sender-side speaker's character data stored in the second character data storage means 113 according to that command signal code.

[0070] The sender-side speaker's character data processed by the character data processing means 114 are then converted by the second conversion means 115 according to a format stored in advance.

[0071] The receiver-side device also receives, through its data receiving means 111, the code for head movement data in synchronization with the command signal code for eye and mouth movement data. Based on this head movement code, the image deforming means 117 calculates the degree of image deformation, applies predetermined image processing to the character image data converted by the second conversion means 115, and displays the result on the display means 116. The character image of the partner (sender-side) speaker is thus displayed substantially in real time.

[0072] Further, the receiver-side device receives, through its data receiving means 111, the partner (sender-side) speaker's audio signal in synchronization with the command signal codes, and reproduces it with the voice reproducing means 118.

[0073] The communication described above can be summarized as follows.
A: Pre-communication stage
(1) Each speaker generates his or her own basic character image.
(2) Each speaker generates data on new character images in which expression changes (eye and mouth movements) corresponding to the predetermined command signal codes are applied to the basic character image, that is, data on how far each one is deformed from the basic character image (hereinafter, deformed character image data).
(3) Each speaker detects his or her own eye and mouth movements and sets the reference values (thresholds) for the expression changes corresponding to the predetermined command signal codes.

[0074] B: Initial communication stage
(1) The sender-side device transmits the sender-side speaker's basic character image data to the receiver-side device, which stores them in its storage section.
(2) The sender-side device transmits the deformed character image data, carrying the expression changes corresponding to the predetermined command signal codes, to the receiver-side device, which stores them in its storage section.

[0075] C: Dialogue stage (sender-side device)
(1) The sender-side speaker's eye and mouth movements are detected at a predetermined timing.
(2) Based on the detected eye and mouth movements and the thresholds, predetermined command signal codes are successively transmitted to the receiver-side device.
(3) The sender-side speaker's head movement is detected at a predetermined timing, and the result is successively transmitted to the receiver-side device. Transmission of this head movement code is synchronized with the transmission of the command signal codes in (2).
(4) The sender-side speaker's audio signal is captured at a predetermined timing and successively transmitted to the receiver-side device. Transmission of this audio signal is also synchronized with the transmission of the command signal codes in (2).

[0076] D: Dialogue stage (receiver-side device)
(1) Receive the predetermined command signal codes for the sender-side speaker's eye and mouth movements, sent successively (substantially in real time) from the sender-side device.
(2) Receive the codes for the sender-side speaker's head movement, sent successively (substantially in real time) from the sender-side device (synchronized with (1)).
(3) Receive the sender-side speaker's audio signal, sent successively (substantially in real time) from the sender-side device (synchronized with (1)).
(4) Search the storage section for, and read out, the character image data or deformed character image data for the eye and mouth movements corresponding to the command signal codes received in (1).
(5) Based on the character image data or deformed character image data read out in (4) and the head movement codes received in (2), display the sender-side speaker's character image on the display section substantially in real time.
(6) Reproduce the sender-side speaker's voice in real time based on the audio signal received in (3).
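The dialogue-stage steps C and D can be condensed into a toy loop: the sender turns each timed sample into command codes, attaches the head code and the audio chunk from the same instant, and the receiver looks up the stored image for each code and pairs it with the head code and audio. Everything here (the data shapes, a single threshold table, strings standing in for images) is a hypothetical simplification, not the patent's implementation.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int                         # shared index keeps codes, head data, and audio in step
    command_codes: list[int]         # step C(2)
    head_code: tuple[int, int, int]  # step C(3)
    audio: bytes                     # step C(4)


# Thresholds set before communication (step A(3)): code -> (feature, threshold, fire_above).
THRESHOLDS = {1: ("eye_opening", 0.5, False), 2: ("mouth_opening", 0.5, True)}

# Images received in the initial stage (steps B(1), B(2)); code 0 stands for the basic image.
STORED_IMAGES = {0: "basic", 1: "eyes closed", 2: "mouth open"}


def sender_step(seq: int, sample: dict[str, float],
                head: tuple[int, int, int], audio: bytes) -> Packet:
    """Steps C(1)-C(4): detect, threshold, and bundle one instant's data."""
    codes = [code for code, (feature, thr, above) in THRESHOLDS.items()
             if (sample[feature] >= thr if above else sample[feature] <= thr)]
    return Packet(seq, codes, head, audio)


def receiver_step(packet: Packet) -> tuple[list[str], tuple[int, int, int], bytes]:
    """Steps D(4)-D(6): look up images, then hand images, head pose, and audio on together."""
    images = [STORED_IMAGES[c] for c in packet.command_codes] or [STORED_IMAGES[0]]
    return images, packet.head_code, packet.audio


pkt = sender_step(0, {"eye_opening": 0.25, "mouth_opening": 0.75}, (0, 10, 0), b"pcm")
print(receiver_step(pkt))  # (['eyes closed', 'mouth open'], (0, 10, 0), b'pcm')
```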

[0077] The outline of the video/audio communication system of the first embodiment has been described above. Its specific configuration and operation are described below with reference to FIGS. 2 to 32.

[0078] FIG. 2 is an explanatory diagram showing the main configuration of the video/audio communication system according to the first embodiment of the present invention.

[0079] As shown in the figure, the video/audio communication system of the present embodiment consists mainly of: a Head Mounted Display (hereinafter, HMD) 1, which is worn on the head of a user conversing through the communication system, supplies video and audio to the user based on predetermined data from a video generation box 2 described later, and sends the user's eye movements, head movements, mouth movements, voice, and so on to the video generation box 2; the video generation box 2, which is connected to the HMD 1, supplies it with power and with predetermined video and audio signals, receives from it a line-of-sight signal (corresponding to eye movement), a head motion signal (corresponding to head movement), an audio signal (corresponding to mouth movement), and so on, and applies predetermined processing described later; a controller pad 3, which is connected to the video generation box 2 and sends a control signal 9 for generating character data to a character data recording section 36 built into the video generation box 2; and an external device 4, which is connected to the video generation box 2 through an ordinary telephone line and exchanges signals such as character data, head movement codes, eye movement codes, mouth movement codes, and audio signals (indicated by reference numeral 8 in the figure).

[0080] First, the HMD 1 will be described with reference to FIGS. 2, 3, and 4.

[0081] FIG. 3 is a side view, seen from one side, of a user wearing the HMD 1.

[0082] FIG. 4 is a block diagram showing in detail the connections among the HMD 1, the video generation box 2, and the controller pad 3, and the electrical circuit configuration of each.

[0083] As shown in FIGS. 2 and 3, the HMD 1 consists of a support housing extending from in front of the user's eyes to the top of the head, on which the eyepiece optical systems 13 and 16, a head motion sensor 11, a microphone 19, speakers 20, a call switch 24, and so on are mounted; it is worn, as illustrated, on the head of each user conversing through the communication system. When the HMD is worn, a video section consisting of the eyepiece optical systems and related parts lies in front of the user's eyes, the head motion sensor 11 on top of the head, the left and right speakers 20A and 20B (see FIG. 2) at the left and right ears, and the microphone 19, which extends from the support housing, in front of the mouth; the HMD is held on the head by a support portion 25 behind the ears. The call switch 24, which serves as an off-hook switch at the start of communication, is provided on one side of the support housing.

[0084] A connection cord leading to the video generation box 2 extends from the support portion 25. The HMD performs its predetermined operations by receiving from the video generation box 2 the left and right video signals, the audio signal, the liquid crystal shutter drive signal, power (indicated by reference numeral 7 in FIG. 2), and so on.

[0085] The configuration around the eyepiece optical systems of the HMD 1 will now be described in more detail with reference to FIG. 4 in addition to FIG. 3.

[0086] As described above, the left and right eyepiece optical systems 16 and 13 are placed in front of the user's eyes, with a left LCD 17 and a right LCD 14 above the left eyepiece optical system 16 and the right eyepiece optical system 13, respectively. A backlight 21 is placed above the left LCD 17 and right LCD 14, and a liquid crystal shutter 23 is placed in front of the left and right eyepiece optical systems.

[0087] The left LCD 17 and right LCD 14 are driven by an LCD drive circuit 18 inside the HMD 1, which operates under the control of the video generation box 2. Although not shown, the liquid crystal shutter 23 and the backlight 21 are also connected to the video generation box 2, which drives and controls each of them.

[0088] That is, the left LCD 17, right LCD 14, liquid crystal shutter 23, and backlight 21 operate on the basis of the left and right video signals and the liquid crystal shutter drive signal from the video generation box 2, supplying a predetermined image to the user.

[0089] The left and right speakers 20A and 20B reproduce predetermined sound based on the audio signal from the video generation box 2.

[0090] Near the left eyepiece optical system 16 and right eyepiece optical system 13 are a left line-of-sight detector 15 and a right line-of-sight detector 12, which detect the user's line of sight, and a light source 22 for line-of-sight detection. The left and right line-of-sight information detected by the left line-of-sight detector 15 and the right line-of-sight detector 12 is sent to the eye movement code converter 51 of the video generation box 2.

[0091] The line-of-sight information from the left and right line-of-sight detectors 15 and 12 is used as predetermined initial setting information and, during conversation, as information on the movement of the user's line of sight (eye movement). Details are given later.

[0092] The head motion sensor 11 detects the movement of the user's head three-dimensionally and sends three-dimensional information corresponding to that movement to the head movement code converter 52 of the video generation box 2.
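The head movement code converter 52 turns the sensor's three-dimensional data into a code, but this passage does not define the code format. As an assumption-laden sketch, the function below clamps yaw, pitch, and roll angles to a range and quantizes each to a fixed step, producing a compact integer triple suitable for transmission:

```python
def head_movement_code(yaw: float, pitch: float, roll: float,
                       step_deg: float = 5.0, limit_deg: float = 60.0) -> tuple[int, int, int]:
    """Quantize head angles (degrees) into a hypothetical integer code.

    Each angle is clamped to [-limit_deg, limit_deg] and rounded to the nearest step;
    Python's round() resolves exact halves to the nearest even integer.
    """
    def quantize(angle: float) -> int:
        clamped = max(-limit_deg, min(limit_deg, angle))
        return round(clamped / step_deg)

    return quantize(yaw), quantize(pitch), quantize(roll)


print(head_movement_code(12.0, -3.0, 90.0))  # (2, -1, 12)
```

Coarse quantization trades angular precision for fewer bits per update; the step size would be tuned to what the receiver's image deformation can visibly use.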

[0093] The data from the head motion sensor 11 are likewise used during conversation as information on the user's head movement.

[0094] The microphone 19 serves during conversation as an ordinary voice pickup that captures the user's voice and, under predetermined conditions, also as a device for detecting the movement of the user's mouth. When predetermined initial settings are made, the audio signal from the microphone 19 is sent to the mouth movement code converter 50 of the video generation box 2. During conversation, the audio signal picked up by the microphone 19 is sent to the audio signal transmitter 48 of the video generation box 2 and transmitted to the communication partner, while the user's mouth movement during the conversation is detected and that information is sent to the mouth movement code converter 50 of the video generation box 2. Details are given later.
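The specification defers the details of how the microphone 19 detects mouth movement. Purely as an illustrative assumption, one crude proxy is the short-term energy of the voice signal: the sketch below computes the RMS level of each audio frame and reports the mouth as open when the level exceeds a threshold. The frame length, threshold, and 1/0 encoding are hypothetical.

```python
import math


def rms(frame: list[float]) -> float:
    """Root-mean-square level of one frame of audio samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def mouth_states(samples: list[float], frame_len: int = 160, threshold: float = 0.1) -> list[int]:
    """Return 1 (mouth open) or 0 (mouth closed) for each complete frame; a partial tail is dropped."""
    states = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        states.append(1 if rms(frame) > threshold else 0)
    return states


silence = [0.0] * 160
voiced = [0.5 if i % 2 else -0.5 for i in range(160)]
print(mouth_states(silence + voiced + silence))  # [0, 1, 0]
```

Energy alone cannot tell an open mouth from background noise, which is presumably why the patent limits this use to "predetermined conditions".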

[0095] Next, the configuration of the video generation box 2 will be described in more detail with reference to FIG. 4.

[0096] The video generation box 2 exchanges signals such as character data, head movement codes, eye movement codes, mouth movement codes, and audio signals (indicated by reference numeral 8 in FIG. 2) over a telephone line with the external device 4 (see FIG. 2), which is the communication partner's device, and, as shown in FIG. 4, has various transmitters and receivers for this purpose.

[0097] Between the video/audio communication device used by one speaker (the HMD 1, video generation box 2, and controller pad 3 in FIG. 2) and the external device 4, which is the video/audio communication device used by the other speaker, predetermined character data are sent and received by a character data transmitter 31 and a character data receiver 32. Likewise, eye movement codes are received by an eye movement code receiver 33 and sent by an eye movement code transmitter 45; mouth movement codes are received by a mouth movement code receiver 34 and sent by a mouth movement code transmitter 47; and head movement codes are received by a head movement code receiver 35 and sent by a head movement code transmitter 46.

[0098] The audio signals exchanged during a conversation with the communication partner are transmitted by the audio signal transmitter 48 and received by the audio signal receiver 49.
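The transmitter/receiver pairing described in [0097] and [0098] can be summarized as a lookup table. The following Python sketch is only an illustration: the dictionary keys and the helper `receiver_for` are hypothetical names, and the numbers are the reference numerals of FIG. 4.

```python
# Signal type -> (transmitter, receiver) reference numerals, as listed in [0097]-[0098].
# The dictionary keys are illustrative names, not terms used in the patent.
CHANNELS = {
    "character_data": (31, 32),
    "eye_motion_code": (45, 33),
    "mouth_motion_code": (47, 34),
    "head_motion_code": (46, 35),
    "audio_signal": (48, 49),
}


def receiver_for(signal_type: str) -> int:
    """Return the reference numeral of the receiver that handles a signal type."""
    _transmitter, receiver = CHANNELS[signal_type]
    return receiver


if __name__ == "__main__":
    for name, (tx, rx) in CHANNELS.items():
        print(f"{name}: transmitter {tx} -> receiver {rx}")
```

Note that the numerals do not pair in ascending order (for example, head motion codes go from transmitter 46 to receiver 35), which is why a table is clearer than a positional list.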

[0099] As described above, in the video/audio communication system of the first embodiment, one speaker and the other speaker converse using predetermined character images. The devices in the video generation box 2 that generate, process and store character data are described below in the order of the signal flow.

[0100] In the video/audio communication apparatus of this embodiment, the character data for the sender's character image used in the conversation is generated by the character data generation device 43 in correspondence with command signal codes stored in advance in the format storage unit 44. As shown in FIG. 2, the controller pad 3 is connected to the character data generation device 43 via a connection cord. Operating the character data generation controller 61 and the dial button 62 on the controller pad 3 sends a control signal 9 (see FIG. 2) to the character data generation device 43, allowing the user to create any desired character data. Details will be described later.

[0101] The data of the predetermined character image generated by the character data generation device 43 is transmitted to the HMD 1 via the character image generation device 39 and the like, and a creation screen for the character image is displayed on the monitor screen of the HMD 1.

[0102] Let one speaker be the sender-side speaker and the apparatus that speaker uses be the sender-side apparatus. The character image of the sender-side speaker and the various character data relating to it are then generated by the character data generation device 43 in the video generation box 2 of the sender-side apparatus. Conversely, the character data of the receiver-side speaker is generated by the character data generation device 43 of the receiver-side apparatus.

[0103] Returning to FIG. 4, the character data generation device 43 generates the character image of the sender-side speaker together with several sets of character data describing deformations of that image. First, in a preparation stage before communication, the user (the sender-side speaker) operates the character data generation controller 61 and the dial button 62 on the controller pad 3 connected to the character data generation device 43 to create any desired character image, and then creates character images obtained by applying predetermined deformations to it.

[0104] At this point, the user first generates a basic character image, which is the character image of the user (sender-side speaker) in the initial state, by a method described later, and then sets data for character images whose expression changes (deforms) in response to movements of the user's eyes, mouth and so on. One such expression character image is set for each pattern type corresponding to the command signal codes stored in advance in the format storage unit 44. In practice, each of these expression character images is stored as an amount of deformation relative to the basic character image.

[0105] The command signal codes stored in the format storage unit 44 will be described in detail later.

[0106] In this embodiment, the character image and its deformed versions are created and entered using the controller pad 3, as described above. The invention is not limited to this; for example, any image data captured with an electronic camera, a scanner or the like (including a photograph of the user's own face) may be used.

[0107] Also in the preparation stage before communication, reference values for expression changes, based on the movements of the sender-side speaker's eyes and mouth as detected by the HMD 1, are set in the eye motion code converter 51 and the mouth motion code converter 50, respectively.

[0108] Here, a reference value is a threshold used to decide, according to how much the speaker's expression has changed, whether or not to output the corresponding command signal code.
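A minimal Python sketch of how such a reference value might be used. The calibration rule (a fixed fraction of the user's average deliberate movement measured in the preparation stage) and the function names are assumptions for illustration, not the method stated in the patent.

```python
from statistics import mean
from typing import Sequence


def calibrate_reference(deliberate_changes: Sequence[float], fraction: float = 0.5) -> float:
    """Hypothetical calibration: the threshold is a fraction of the average
    deliberate movement measured while the user rehearses an expression."""
    return fraction * mean([abs(c) for c in deliberate_changes])


def should_emit_code(change: float, reference_value: float) -> bool:
    """Output the command signal code only if the change reaches the threshold."""
    return abs(change) >= reference_value


if __name__ == "__main__":
    ref = calibrate_reference([8.0, 12.0])  # mean 10.0 -> threshold 5.0
    print(ref, should_emit_code(6.0, ref), should_emit_code(3.0, ref))
```

Setting the threshold per user, rather than fixing it, reflects the patent's point that the reference values are set from each speaker's own detected movements.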

[0109] As described above, the HMD 1 is provided with the left and right line-of-sight detectors 15 and 12 and the microphone 19 (see FIG. 3). The left and right line-of-sight detectors 15 and 12 detect the movement of the sender-side speaker's eyes, and the microphone 19 detects the movement of the speaker's mouth. These detection results are sent to the eye motion code converter 51 and the mouth motion code converter 50 in FIG. 4, respectively. Details will be described later.

[0110] The HMD 1 serves both as the detection means used when setting the reference values for expression changes and, during a conversation, as the detection means that detects the sender-side speaker's expression changes (eye, mouth and head movements) and audio signal at predetermined, mutually synchronized timings and sends them out.

[0111] On one side, the output of the character data generation device 43 is connected to the character image generation device 39, as described above, and through it to the display unit of the HMD 1 (optical systems such as the eyepiece optical systems 13 and 16). The user wearing the HMD 1 can therefore monitor the character image while creating it.

[0112] On the other side, the output of the character data generation device 43 is connected to the character data storage device 36. The character data storage device 36 stores the data of the sender-side speaker's basic character image generated by the character data generation device 43, together with the data of the character images obtained by applying predetermined deformations to the basic character image in correspondence with the command signal codes stored in the format storage unit 44 (in practice, the amounts of deformation relative to the basic character image).

[0113] The character data transmitter 31 is connected to the character data storage device 36. In the initial stage of communication, the character data for the sender-side speaker's basic character image and deformed character images stored in the character data storage device 36 is transmitted from the character data transmitter 31 to the character data receiver 32 of the receiver-side apparatus.

[0114] Meanwhile, the output of the character data receiver 32, which receives the character data for the receiver-side speaker's basic character image and deformed character images generated by the receiver-side apparatus, is connected to the character data storage device 37, which stores that character data. In the initial stage of communication, the various character data of the receiver-side speaker are thus temporarily stored there.

[0115] The components used mainly in the preparation stage before communication, or in the initial stage of communication, in the video/audio communication system of this embodiment have been described above. Next, the components used mainly in the conversation stage after communication has started are described.

[0116] In the video/audio communication system of this embodiment, once communication has started and the actual conversation begins, the HMD 1 detects changes in the sender-side speaker's expression at predetermined timings, converts each change into a predetermined code and sends it out. The code converters and related components that produce these codes in response to expression changes are described below.

[0117] The outputs of the right line-of-sight detector 12 and the left line-of-sight detector 15 of the HMD 1 are connected to the eye motion code converter 51, whose output is in turn connected to the eye motion code transmitter 45. The output of the head motion sensor 11 of the HMD 1 is connected to the head motion code converter 52, whose output is in turn connected to the head motion code transmitter 46. Further, the output of the microphone 19 is connected to the mouth motion code converter 50 and to the audio signal transmitter 48, and the output of the mouth motion code converter 50 is in turn connected to the mouth motion code transmitter 47.

[0118] During a conversation, the eye motion code converter 51 evaluates the line-of-sight data detected by the left and right line-of-sight detectors 15 and 12 of the HMD 1 against the reference values and, when a predetermined condition is satisfied, converts the data into an eye motion code of the sender-side speaker, which the eye motion code transmitter 45 sends to the receiver-side speaker.

[0119] The reference values for head movement are stored in the head motion code converter 52 in advance, at the time of factory shipment.

[0120] The head motion code converter 52 receives the head movement data detected by the head motion sensor 11 of the HMD 1 and converts it into a head motion code of the sender-side speaker, which the head motion code transmitter 46 sends to the receiver-side speaker.

[0121] Further, on the basis of the audio signal collected by the microphone 19 of the HMD 1, the mouth motion code converter 50 generates a mouth motion code of the sender-side speaker when a predetermined condition is satisfied, and the mouth motion code transmitter 47 sends it to the other party. The audio signal from the microphone 19 itself is transmitted by the audio signal transmitter 48 to the receiver-side speaker as an audio signal. On the receiver side, the audio signal is received by the audio signal receiver 49 and reproduced through the speaker 20 of the HMD 1.
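The patent does not state the "predetermined condition" used by the mouth motion code converter 50 at this point. One plausible choice, assumed here purely for illustration, is a loudness threshold on short audio frames; the RMS criterion and the function names are hypothetical, and only the code name "M" comes from step S18.

```python
import math
from typing import Optional, Sequence

MOUTH_CODE = "M"  # code stored with the mouth-moved drawing in step S18


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of audio samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def mouth_code_for_frame(frame: Sequence[float], reference_level: float) -> Optional[str]:
    """Hypothetical condition: emit the mouth-motion code when the frame is loud
    enough to suggest the speaker is talking; otherwise emit nothing."""
    return MOUTH_CODE if rms(frame) >= reference_level else None


if __name__ == "__main__":
    silence = [0.0] * 160
    speech = [0.2, -0.2] * 80
    print(mouth_code_for_frame(silence, 0.05), mouth_code_for_frame(speech, 0.05))
```

Only the short code travels on the mouth-motion channel; the audio itself goes separately through the audio signal transmitter 48.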

[0122] Next, the receivers and related components for the various character data transmitted from the other party's apparatus in the conversation stage are described.

[0123] In the conversation stage, the sender-side speaker's data sent from the eye motion code transmitter 45, the head motion code transmitter 46, the mouth motion code transmitter 47 and the audio signal transmitter 48 are received by the eye motion code receiver 33, the head motion code receiver 35, the mouth motion code receiver 34 and the audio signal receiver 49, respectively.

[0124] The eye motion code receiver 33 and the mouth motion code receiver 34 receive the eye motion codes and mouth motion codes sent from the receiver-side apparatus. The outputs of the eye motion code receiver 33, the mouth motion code receiver 34 and the character data storage device 37 are all connected to the character data processing device 38.

[0125] On the basis of the received eye motion codes and mouth motion codes of the receiver-side speaker, the character data processing device 38 processes the "eye movement" and "mouth movement" in the character data stored in the character data storage device 37 and outputs the result to the character image generation device 39.
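Because the deformed images are stored as amounts of deformation relative to the basic image ([0104], [0112]), the work of the character data processing device 38 can be pictured as adding stored offsets to the partner's basic character data. The sketch below assumes flat numeric parameters and additive combination of several codes; both, and the parameter names, are illustrative assumptions.

```python
from typing import Dict, Iterable

Params = Dict[str, float]  # illustrative parameter names, e.g. "pupil_x", "mouth_len"


def process_character(base: Params, deltas: Dict[str, Params], codes: Iterable[str]) -> Params:
    """Return a copy of the partner's basic character data with the stored
    deformation amount for each received code added on top. Unknown codes are
    ignored; combining several codes by addition is an assumption."""
    result = dict(base)
    for code in codes:
        for key, amount in deltas.get(code, {}).items():
            result[key] = result.get(key, 0.0) + amount
    return result


if __name__ == "__main__":
    base = {"pupil_x": -15.0, "mouth_len": 12.0}
    deltas = {"EL": {"pupil_x": -4.0}, "M": {"mouth_len": 6.0}}
    print(process_character(base, deltas, ["EL", "M"]))
```

Since the basic data and the deltas arrive once, in the initial stage of communication, each later update only needs the short codes.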

[0126] The character image generation device 39 generates the final character image of the other party from the other party's character data processed by the character data processing device 38 and outputs it to the image deformation unit 41. Details will be described later.

[0127] The head motion code receiver 35 receives the head motion codes sent from the receiver-side apparatus; its output is connected to the image deformation unit 41 via the image deformation amount calculation unit 40. The image deformation amount calculation unit 40 calculates how much the image should be deformed on the basis of the head motion code from the receiver-side apparatus, and the following image deformation unit 41 deforms the other party's character image generated by the character image generation device 39 according to that result. In other words, the other party's character image finally generated by the character image generation device 39 is deformed according to the receiver-side speaker's head movement received by the head motion code receiver 35. The operation of the image deformation amount calculation unit 40 and the image deformation unit 41 will be described in detail later.

[0128] The output of the image deformation unit 41 is connected to the coordinate conversion unit 42. The other party's character image deformed by the image deformation unit 41 undergoes coordinate conversion in the coordinate conversion unit 42 and is then sent to the HMD 1 of the viewing side. In this coordinate conversion, the conversion vector for the other party's character image shown on the monitor screen watched by the sender-side speaker is determined according to the movement of the sender-side speaker's own head. The operation of the coordinate conversion unit 42 will be described in detail later.
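The conversion vector is detailed later in the patent and is not specified here. As a hedged illustration only, the sketch below assumes the aim is to keep the partner's image visually fixed in space, so the image is shifted opposite to the viewer's head rotation by a hypothetical pixels-per-degree factor; the sign conventions and function names are likewise assumptions.

```python
from typing import Tuple

Point = Tuple[float, float]


def conversion_vector(yaw_deg: float, pitch_deg: float, px_per_degree: float = 10.0) -> Point:
    """Hypothetical rule: shift the partner's image opposite to the viewer's head
    rotation so that it appears fixed in space. Screen x grows to the right and
    y grows downward; positive yaw = head turned right, positive pitch = head tilted up."""
    dx = -yaw_deg * px_per_degree   # head turns right -> image slides left
    dy = pitch_deg * px_per_degree  # head tilts up -> image slides down
    return dx, dy


def place_image(origin: Point, yaw_deg: float, pitch_deg: float,
                px_per_degree: float = 10.0) -> Point:
    """Screen position of the partner's image after applying the conversion vector."""
    dx, dy = conversion_vector(yaw_deg, pitch_deg, px_per_degree)
    return origin[0] + dx, origin[1] + dy
```

Note the division of labor this illustrates: the partner's head movement deforms the image (units 40 and 41), while the viewer's own head movement only moves it on screen (unit 42).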

[0129] The audio signal receiver 49 receives the audio signal from the receiver-side apparatus; the received audio signal of the receiver-side speaker is sent to the speaker 20 of the HMD 1 and reproduced.

[0130] In this embodiment, the microphone 19 is arranged so that it sits in front of the user's mouth when the HMD 1 is worn. The invention is not limited to this; for example, the microphone may be arranged near the optical system of the HMD 1, as shown in FIG. 6 (reference numeral 19A). This allows the HMD 1 to be made simpler.

[0131] This completes the description of the configuration of the video/audio communication apparatus used by each speaker in the video/audio communication system of the first embodiment.

[0132] Next, the operation of this video/audio communication system is described. Before describing the specific operation, FIG. 5 shows how the system is used in practice.

[0133] When two speakers converse while each is wearing an HMD, as described above, the monitor screen of one speaker displays the character image of the other speaker. Let one speaker be user A and the other user B, and let user A's character image be created image II and user B's character image be created image I in the figure. The monitor screen of the HMD 1 worn by each user (indicated by coordinate plane I and coordinate plane II in the figure) then displays the character image of the other party, as illustrated.

[0134] In the video/audio communication system of the first embodiment, as a stage preceding communication, each speaker makes predetermined settings, such as setting his or her own character image, in the video/audio communication apparatus he or she uses. The operation in this pre-communication stage is described below.

[0135] The pre-communication work consists of the following:
(1) Each user generates his or her own basic character image.
(2) Each user generates data for new character images in which expression changes (eye and mouth movements) corresponding to predetermined command signal codes are applied to the basic character image (data describing how far the basic character image is deformed; hereinafter, deformed character image data).
(3) Each user's eye and mouth movements are detected, and reference values (thresholds) for the expression changes corresponding to the predetermined command signal codes are set.

[0136] These operations are described with reference to FIGS. 7 to 15. In the video/audio communication system of the first embodiment, as described above, the character data of the sender-side speaker is created in the character data generation device 43 of the video generation box 2 using dedicated creation software stored in the format storage unit 44.

[0137] First, the user puts on the HMD 1 and selects the character data creation mode by a predetermined operation, performed with the character data generation controller 61, the dial button 62 and the like on the controller pad 3. When the character data creation mode is selected, the creation software starts in the character data generation device 43, and a character data creation software screen such as that shown in FIG. 7 appears on the monitor screen of the HMD 1. The procedure is described below with reference to the flowcharts in FIGS. 8, 9 and 10.

[0138] When the character data creation software starts, the basic drawing of the character image is created first (step S1). The user (sender-side speaker) operates the character data generation controller 61, the dial button 62 and the like on the controller pad 3 and creates his or her own character image while watching a drawing screen such as that shown in FIG. 7 on the monitor screen of the HMD 1.

[0139] Suppose the user (sender-side speaker) chooses a cat's face, such as that shown in FIG. 11, as his or her character image. The character data of the basic drawing then consists of a large circle 1 representing the "face outline" (with its radius, center coordinates and color set), a circle 3 representing an "eye" (likewise with radius, center coordinates and color), a circle 2 representing a "pupil" (likewise with radius, center coordinates and color), a line representing the "mouth" (with its length, center coordinates and color set), and so on.
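The basic drawing described above maps naturally onto a few parameterized primitives. The following Python sketch is illustrative: the class names and all coordinate and color values are invented rather than read from FIG. 11, and only one eye is modeled for brevity.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]
RGB = Tuple[int, int, int]


@dataclass
class Circle:
    center: Point
    radius: float
    color: RGB


@dataclass
class Line:
    center: Point
    length: float
    color: RGB


@dataclass
class CatFace:
    outline: Circle  # circle 1: face outline
    pupil: Circle    # circle 2: pupil
    eye: Circle      # circle 3: eye
    mouth: Line      # line: mouth


def basic_cat_face() -> CatFace:
    """Basic drawing with made-up coordinates and colors (hypothetical values)."""
    return CatFace(
        outline=Circle(center=(0.0, 0.0), radius=50.0, color=(200, 160, 90)),
        pupil=Circle(center=(-15.0, 10.0), radius=3.0, color=(0, 0, 0)),
        eye=Circle(center=(-15.0, 10.0), radius=8.0, color=(255, 255, 255)),
        mouth=Line(center=(0.0, -20.0), length=12.0, color=(120, 40, 40)),
    )
```

A handful of numbers per primitive is enough to describe the whole face, which is what makes it practical to send the character data once and then express changes as small parameter offsets.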

[0140] When the character image is complete, its character data is stored in the character data storage device 36 (step S2). The character image stored as the basic drawing is then processed according to predetermined conditions (step S3). This processing routine is described below with reference to the flowchart in FIG. 9 and FIGS. 11 to 14.

[0141] Taking the cat's face in FIG. 11 as the basic drawing, a drawing in which the line of sight has moved to the left is first created by processing the basic drawing (step S11). Specifically, using the controller pad 3, the user changes the center coordinate data of circle 2, which represents the "pupil", as shown in FIG. 12, to create an expression in which the line of sight has moved to the left relative to the basic drawing (FIG. 11). The amount by which the drawing processed in step S11 differs from the basic drawing (that is, the displacement of the pupil's center coordinates) is then stored together with the code "EL" (step S12).

[0142] Next, a drawing in which the line of sight has moved to the right is created by processing the basic drawing (step S13). As in step S11, the user uses the controller pad 3 to change the center coordinate data of circle 2, which represents the "pupil", creating an expression in which the line of sight has moved to the right relative to the basic drawing (FIG. 11). Then, as in step S12, the amount of processing applied in step S13 is stored with the code "ER" (step S14).

[0143] Next, a drawing with the eyes closed is created by processing the basic drawing (step S15). Specifically, using the controller pad 3, the user changes the data of either circle 2, representing the "pupil", or circle 3, representing the "eye", as shown in FIG. 13, to create an expression with the eyes closed relative to the basic drawing (FIG. 11). The amount by which the drawing processed in step S15 differs from the basic drawing is then stored together with the code "EC" (step S16).

[0144] Next, a drawing in which the mouth has moved, that is, in which the character is presumed to be uttering some sound, is created by processing the basic drawing (step S17). Specifically, using the controller pad 3, the user changes the data of the line representing the "mouth", as shown in FIG. 14, to create an expression in which the mouth has moved relative to the basic drawing (FIG. 11). The amount by which the drawing processed in step S17 differs from the basic drawing is then stored together with the code "M" (step S18), and the process returns to the main routine.
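Steps S11 to S18 can be sketched as producing a table of deformation amounts keyed by the codes "EL", "ER", "EC" and "M". In this hypothetical Python sketch, drawings are flat dictionaries of numeric parameters with invented names, and the specific edits stand in for the user's controller-pad operations.

```python
from typing import Dict

Params = Dict[str, float]


def deformation_amount(basic: Params, edited: Params) -> Params:
    """Per-parameter difference between an edited drawing and the basic drawing,
    keeping only the parameters that changed."""
    return {k: edited[k] - basic[k] for k in basic if edited[k] != basic[k]}


def build_expression_table(basic: Params) -> Dict[str, Params]:
    """Steps S11-S18 in miniature. The edits below are hypothetical stand-ins for
    what the user does with the controller pad 3."""
    edited = {
        "EL": {**basic, "pupil_x": basic["pupil_x"] - 4.0},     # S11: gaze left
        "ER": {**basic, "pupil_x": basic["pupil_x"] + 4.0},     # S13: gaze right
        "EC": {**basic, "pupil_r": 0.0, "eye_r": 1.0},          # S15: eyes closed
        "M": {**basic, "mouth_len": basic["mouth_len"] + 6.0},  # S17: mouth moved
    }
    # S12, S14, S16, S18: store only the deformation amount under each code.
    return {code: deformation_amount(basic, drawing) for code, drawing in edited.items()}
```

For example, with a basic pupil x-coordinate of -15.0, the "EL" entry holds only `{"pupil_x": -4.0}`, not a second full drawing.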

[0145] Returning to FIG. 8, an operation is next performed to establish the correspondence for the "eye movement" and "mouth movement" drawings processed from the basic drawing (step S4).

[0146] This correspondence-setting operation is described below with reference to the flowchart in FIG. 10.

[0147] Having created and processed the basic drawing of the character image, the user (sender-side speaker) wears the HMD 1. Various detections are then performed so that the user's actual "eye movements" and "mouth movements" can be associated with the character images processed for "eye movement" and "mouth movement" in steps S11 to S18 (see FIG. 9).

[0148] First, the user's eye movements are detected, beginning with the movement of the line of sight when the user looks to the left (step S21). The line-of-sight detection mechanism is described here with reference to FIG. 15 and FIGS. 16 to 19.

[0149] FIG. 15 is an explanatory diagram of the line-of-sight detection mechanism and its surroundings. Components identical to those shown in FIGS. 3 and 4 are given the same reference numerals.

[0150] The left and right eyepiece optical systems 13 and 16 are each formed as a prism having a half-mirror surface 26 and are positioned in front of the user's eyeballs 28 when the user wears the HMD 1. The bottom surfaces of the eyepiece optical systems 13 and 16 are concave mirrors, indicated by reference numeral 27. Further in front of the eyepiece optical systems 13 and 16 are an infrared light source 22, which irradiates the eyeball 28 with infrared light, and the line-of-sight detectors 12 and 15, which detect the infrared light reflected by the eyeball 28.

[0151] Each of the line-of-sight detectors 12 and 15 contains a CCD 29 and a detection circuit 30. It detects the surface of the eyeball lit by the infrared light from the infrared light source 22, that is, the dark part of the eye (the pupil position). The concave mirror 27 magnifies the image of the eyeball surface before it reaches the CCD 29. The image of the eyeball 28 formed on the CCD 29 is then passed to the detection circuit 30 at the next stage, which detects the user's gaze direction and blinks.

[0152] The CCD 29 and the detection circuit 30 do not need high detection accuracy. A horizontal resolution of about 5° is sufficient.

[0153] In the video and audio communication system of this embodiment, each detected gaze movement or blink (meaning the eyes are closed) is associated with one of the character images processed and stored in steps S11 to S16. This works as follows.

[0154] First, the dark current of the CCD 29, as detected by the detection circuit 30, serves as the reference, and the corresponding voltage of 0 mV is taken as the reference potential. The system is then set up so that, for example, it outputs a voltage signal of +20 mV relative to the reference potential when the user's pupil is roughly centered (see FIG. 16). Relative to this +20 mV level, it outputs +30 mV when the pupil moves to the left (that is, when the line of sight moves left) and +10 mV when the pupil moves to the right (see FIGS. 17 and 18). When the user blinks and the eyes are closed, it outputs the reference potential of 0 mV (see FIG. 19).
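
The sketch below shows one way to turn the four nominal levels into a gaze state: it picks whichever nominal level is closest to the measured voltage. The level table is taken from the text. The nearest-level rule and the function name `classify_gaze` are assumptions, because the patent does not say how voltages between the nominal levels are handled.

```python
# Nominal outputs of detection circuit 30, in mV above the 0 mV
# dark-current reference (values from the text, FIGS. 16-19).
NOMINAL_MV = {
    "closed": 0.0,
    "right": 10.0,
    "center": 20.0,
    "left": 30.0,
}


def classify_gaze(voltage_mv: float) -> str:
    """Return the gaze state whose nominal level is nearest to voltage_mv.

    The nearest-level rule is an assumption, not taken from the patent.
    """
    return min(NOMINAL_MV, key=lambda state: abs(NOMINAL_MV[state] - voltage_mv))
```

With a nearest-level rule, the decision boundaries fall halfway between adjacent levels (5, 15 and 25 mV). Because the table is listed in ascending order, an exact tie goes to the lower level.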

[0155] Returning to FIG. 10: when the movement of the line of sight is detected as the user looks to the left in step S21, the detection circuit 30 outputs a voltage signal of +30 mV relative to the reference potential, as described above (see FIG. 18; step S22). Although this is not shown, the value of +30 mV is then stored in the eye-movement code converter 51 in the video generation box 2. It becomes the reference value for the code "EL", which corresponds to the image with the line of sight moved to the left as in FIG. 12 (step S23).

[0156] Next, the user looks to the right. When the line-of-sight detectors 12 and 15 detect this movement (step S24), the detection circuit 30 outputs a voltage signal of +10 mV relative to the reference potential, as described above (see FIG. 17; step S25). In the same way, this value of +10 mV is stored in the eye-movement code converter 51 in the video generation box 2 as the reference value for the code "ER", which corresponds to the image with the line of sight moved to the right (step S26).

[0157] Next, the user closes the eyes. When the line-of-sight detectors 12 and 15 detect this (step S27), the detection circuit 30 outputs a voltage signal of +0 mV relative to the reference potential, as described above (see FIG. 19; step S28). In the same way, this value of +0 mV is stored in the eye-movement code converter 51 in the video generation box 2 as the reference value for the code "EC", which corresponds to the image with the eyes closed shown in FIG. 13 (step S29).
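
Steps S23, S26 and S29 together fill a small calibration table. The sketch below is a minimal model of the eye-movement code converter 51 under two assumptions: during dialogue, a reading maps to the code with the nearest stored reference, and no code is produced when the reading is farther than a tolerance from every reference. The second rule is what makes a centered gaze send nothing. The class name and the 4 mV tolerance are hypothetical.

```python
from typing import Dict, Optional


class EyeMotionCodeConverter:
    """Hypothetical model of eye-movement code converter 51."""

    def __init__(self, tolerance_mv: float = 4.0) -> None:
        # Calibrated reference voltage for each code ("EL", "ER", "EC").
        self.references: Dict[str, float] = {}
        # Maximum distance from a reference that still counts as a match
        # (assumed value; the patent gives none).
        self.tolerance_mv = tolerance_mv

    def calibrate(self, code: str, measured_mv: float) -> None:
        """Store a measured voltage as the reference (steps S23, S26, S29)."""
        self.references[code] = measured_mv

    def convert(self, voltage_mv: float) -> Optional[str]:
        """Return the matching code, or None if no reference is close enough."""
        if not self.references:
            return None
        code = min(self.references,
                   key=lambda c: abs(self.references[c] - voltage_mv))
        if abs(self.references[code] - voltage_mv) <= self.tolerance_mv:
            return code
        return None
```

For example, after calibrating "EL" at 30.5 mV, "ER" at 9.8 mV and "EC" at 0.2 mV, a centered reading of 20.0 mV is more than 4 mV away from every reference. `convert(20.0)` therefore returns `None`, and no eye code is sent.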

[0158] Next, the movement of the user's mouth is detected. That is, the system detects whether the user has spoken (step S30). The voice detection mechanism is described here with reference to FIG. 20.

[0159] FIG. 20 is an explanatory diagram showing the voice detection mechanism and its surroundings. Components that are the same as those shown in FIGS. 3 and 4 are given the same reference numerals. In the figure, reference numerals 301 and 302 denote the sending device and the receiving device, respectively, and the two devices have the same components.

[0160] In the sending speaker's device 301, the audio signal from the microphone 19 goes to the mouth-movement code converter 50 during the preparation stage, when the predetermined initial settings are made before communication. During the communication (dialogue) stage, the audio signal picked up by the microphone 19 goes to the audio signal transmitter 48 and is transmitted to the receiving device 302. At the same time, the movement of the user's mouth during conversation is detected, and this information is sent to the mouth-movement code converter 50 in the video generation box 2.

[0161] As shown in the figure, the mouth-movement code converter 50 consists of three parts: a reference volume level storage unit 50A, a mouth-movement code generation unit 50B, and a switch 50C that turns the storage unit 50A on and off. The storage unit 50A works only while the switch 50C is on. The switch 50C is turned on only when the reference volume level is being set, during the preparation stage before communication.

[0162] Returning to FIG. 10: while the reference volume level is being set, the switch 50C is on. When the user (the sending speaker) speaks (step S30), the volume of the voice is stored in the reference volume level storage unit 50A (voice detector) as the reference value for the code "M" (step S31), and the process returns to the main routine.

[0163] Returning to FIG. 8: once the reference values have been set in step S4, the sending speaker checks the basic character image and related settings again (step S5). If necessary, the speaker repeats any or all of steps S1 to S4 until the desired character image and amount of deformation are obtained.

[0164] This completes the description of how the video and audio communication system of the first embodiment operates before communication, that is, how character data such as the sending speaker's character image is generated.

[0165] The operation after communication has started is described next.

[0166] In the video and audio communication system of the first embodiment, communication begins with an initial negotiation according to a predetermined protocol. The following data transfer is then carried out as an initial stage.

[0167] First, the sending device transfers the sending speaker's basic character image data to the receiving device, which stores it in its storage unit. The sending device also transfers deformed character image data to the receiving device, which stores it in its storage unit as well. This deformed data carries the expression changes that correspond to the predetermined command signal codes. When this initial-stage transfer is complete, the system moves to the actual dialogue stage, in which the following operations are performed.
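
On the receiving side, the initial stage amounts to filling a table keyed by command signal code. The sketch below is a hypothetical storage unit. Its byte strings stand in for either full deformed images or the processing-amount data that is actually sent in step S45. Falling back to the basic image for an unknown code is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CharacterStore:
    """Hypothetical receiver-side storage filled during the initial stage."""
    basic_image: bytes = b""
    deformed: Dict[str, bytes] = field(default_factory=dict)

    def receive_basic(self, image: bytes) -> None:
        # Basic character image of the sending speaker.
        self.basic_image = image

    def receive_deformed(self, code: str, data: bytes) -> None:
        # Expression variant (or deformation data) for one command code.
        self.deformed[code] = data

    def lookup(self, code: str) -> bytes:
        # Fall back to the basic image for a code with no stored variant.
        return self.deformed.get(code, self.basic_image)
```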

[0168] First, the sending device detects the movements of the sending speaker's eyes and mouth at a predetermined timing. Based on these detection results and the threshold values described above, it transmits the corresponding predetermined command signal codes to the receiving device one after another. The sending device also detects the sending speaker's head movement at a predetermined timing and transmits each result to the receiving device in turn. The head-movement codes are transmitted in synchronization with the command signal codes. Finally, the sending device samples the sending speaker's audio signal at a predetermined timing and transmits it to the receiving device in turn. This audio transmission is likewise synchronized with the transmission of the command signal codes.
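
The patent does not specify a wire format. The sketch below is a hypothetical illustration of why sending codes instead of video keeps the data rate low. It uses Python's `struct` module to pack one tick's eye code, mouth code, head codes and audio into a single frame. Carrying everything in one frame is one simple way to keep these items synchronized. The code tables, the field layout and the sign-suffixed head codes are all assumptions.

```python
import struct
from typing import List, Optional, Tuple

# Hypothetical code tables; index 0 of EYE_CODES means "no eye code".
EYE_CODES: List[Optional[str]] = [None, "EL", "ER", "EC"]
HEAD_CODES = ["ROLL+", "ROLL-", "PITCH+", "PITCH-", "YAW+", "YAW-"]

# Big-endian header: sequence number, eye code, mouth flag,
# head-code bit mask, audio length in bytes (9 bytes in total).
HEADER = struct.Struct(">IBBBH")


def encode_frame(seq: int, eye: Optional[str], mouth: bool,
                 head: List[str], audio: bytes) -> bytes:
    """Pack one tick's codes and audio into a single frame."""
    mask = 0
    for code in head:
        mask |= 1 << HEAD_CODES.index(code)
    header = HEADER.pack(seq, EYE_CODES.index(eye), int(mouth), mask, len(audio))
    return header + audio


def decode_frame(data: bytes) -> Tuple[int, Optional[str], bool, List[str], bytes]:
    """Unpack a frame produced by encode_frame."""
    seq, eye, mouth, mask, length = HEADER.unpack_from(data)
    head = [c for i, c in enumerate(HEAD_CODES) if mask & (1 << i)]
    audio = data[HEADER.size:HEADER.size + length]
    return seq, EYE_CODES[eye], bool(mouth), head, audio
```

The code part of each frame is a fixed 9 bytes. The 16-bit length field limits the audio per frame to 65,535 bytes; `struct.pack` raises `struct.error` above that limit.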

[0169] The receiving device, for its part, receives the predetermined command signal codes for the sending speaker's eye and mouth movements, which the sending device sends one after another (substantially in real time). In the same way, it receives the codes for the sending speaker's head movement and the sending speaker's audio signal. For each received command signal code, it looks up and reads from the storage unit the matching character image data or deformed character image data for the eye and mouth movements. Using this image data and the received head-movement codes, it displays the sending speaker's character image on the display unit substantially in real time. It also plays back the sending speaker's voice in real time from the received audio signal.

[0170] The operation of the video and audio communication system of the first embodiment in this communication stage is described below with reference to FIGS. 21 to 32. In these figures, as shown in FIG. 5, one user is called speaker A and the other speaker B. The devices worn by speakers A and B are called device A and device B, respectively.

[0171] FIGS. 21 to 23 are flowcharts showing the operation of the video and audio communication system of this embodiment after communication has started. FIG. 21 shows the operation of device A in the initial stage of communication. FIG. 22 shows the operation of device A in the dialogue (transmission) stage, and FIG. 23 shows the operation of device B in the dialogue (reception) stage.

[0172] As shown in FIG. 21, when speaker A wants to start communicating with another speaker B, speaker A first puts on the HMD 1. Speaker A then operates the call switch 24 to go off-hook and starts dialing, just as for an ordinary telephone call (step S41). In this embodiment, dialing is done with the dial buttons 62 on the controller pad 3.

[0173] Once the line is connected and a call with speaker B is possible, that is, once device B is ready to receive (step S42), speaker A first selects an identifier such as an ID number, which identifies speaker B (step S43). This operation is performed with the controller pad 3.

[0174] Next, the character transmitter 31 (see FIG. 4) sends to speaker B the character data that device A generated in the process described above. This data consists of the basic character image data and data on how much the character image is processed (deformed). In detail, device A first sends speaker A's basic character image data to device B (step S44). Device A then sends device B the processing-amount data for the character image, that is, the expression changes that correspond to the predetermined command signal codes (step S45).

[0175] Next, the character data receiver 32 receives speaker B's basic character image data sent from device B (step S46), and this data is stored in the character data storage device 37 (step S47). The character data receiver 32 then likewise receives from device B the processing-amount data for speaker B's character image, covering the expression changes that correspond to the predetermined command signal codes (step S48). This data is also stored in the character data storage device 37 (step S49).

[0176] Next, speaker A's head position is reset (step S50). That is, the position of the head motion sensor 11 in the HMD 1 worn by speaker A is reset. The reset may happen automatically once the operations up to step S49 are complete, or the user may trigger it with a switch (not shown) or the like.

[0177] When the initial-stage transfer of the character data is complete, the system moves to the actual dialogue stage. As shown in FIG. 22, device A first detects speaker A's own voice (step S51). The audio signal transmitter 48 detects speaker A's voice as picked up by the microphone 19, and when it detects voice, it transmits the audio signal (step S52).

[0178] Speaker A's voice from the microphone 19 also goes to the mouth-movement code generation unit 50B (see FIG. 20) at the same time. This unit checks whether the voice has reached the reference volume level set beforehand in the reference volume level storage unit 50A (step S53). If the volume is at or above the reference value, unit 50B in the mouth-movement code converter 50 converts the mouth movement into the code "M", which corresponds to the command signal code (step S54). The mouth-movement code transmitter 47 then sends this code to device B (step S55), and the process moves to step S56.

[0179] The process also moves to step S56 if speaker A's voice is not detected in step S51, or if the detected voice is below the reference value in step S53.

[0180] In step S56, the movement of speaker A's eyes is detected. The right line-of-sight detector 12 and the left line-of-sight detector 15 detect speaker A's line of sight. When its movement meets one of the reference values set earlier in steps S23, S26 and S29 (see FIG. 10), the eye-movement code converter 51 (see FIG. 4) converts it into the corresponding code ("EL", "ER" or "EC") (step S57). The eye-movement code transmitter 45 sends this code to device B (step S58), and the process moves to step S59.

[0181] If no movement of speaker A's eyes is detected in step S56, that is, if the movement of the line of sight does not meet any of the reference values, the process moves to step S59.

[0182] In step S59, the movement of speaker A's head is detected. The head motion sensor 11 of the HMD 1 detects speaker A's head movement. When it detects movement, the head-movement code converter 52 converts the movement into a predetermined code (step S60), and the head-movement code transmitter 46 sends this code to device B (step S61).

[0183] How head movement is detected, and what happens once it is detected, is described here with reference to FIGS. 24 and 25.

[0184] FIG. 24 is a front view of a user wearing the HMD 1 in the video and audio communication system of this embodiment, and FIG. 25 is a side view of the same user.

[0185] A user wearing the HMD 1 can move the head in the roll and yaw directions shown in FIG. 24 and in the pitch direction shown in FIG. 25. In the video and audio communication system of this embodiment, the head motion sensor 11 detects these head movements. As described above, the position of the head motion sensor 11 is reset in the initial stage when the dialogue starts (step S50; see FIG. 21). The user's head movement can then be captured by measuring how far the head motion sensor 11 has moved from this reset reference position.
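
Measuring displacement from a reset reference can be sketched as follows. The sketch assumes the head motion sensor 11 reports roll, pitch and yaw as Euler angles in degrees, and the class names are hypothetical. Wrapping each difference into [-180, 180) is an added assumption, so that a reference near 360° does not cause a jump of nearly a full turn.

```python
from dataclasses import dataclass


@dataclass
class Orientation:
    roll: float   # degrees
    pitch: float  # degrees
    yaw: float    # degrees


def _wrap(angle: float) -> float:
    """Wrap an angle difference into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


class HeadTracker:
    """Hypothetical wrapper around head motion sensor 11."""

    def __init__(self) -> None:
        self._reference = Orientation(0.0, 0.0, 0.0)

    def reset(self, current: Orientation) -> None:
        """Step S50: take the current reading as the reference position."""
        self._reference = current

    def displacement(self, current: Orientation) -> Orientation:
        """Angular displacement of the head from the reference position."""
        ref = self._reference
        return Orientation(_wrap(current.roll - ref.roll),
                           _wrap(current.pitch - ref.pitch),
                           _wrap(current.yaw - ref.yaw))
```

For example, after a reset at yaw 350°, a reading of yaw 10° gives a displacement of +20°, not -340°.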

[0186] In this embodiment, the displacement of the head motion sensor 11 is monitored continuously during the dialogue. From the detection result, the head-movement code converter 52 generates a predetermined code. This code is sent to the receiving speaker in synchronization with the eye- and mouth-movement codes (the codes that correspond to the command signal codes).

[0187] In this embodiment, three codes are provided for head movement, one each for movement in the roll, yaw and pitch directions. The head-movement code converter 52 receives the detection result from the head motion sensor 11. If it determines that the user's head has moved in the roll, yaw or pitch direction, or in a combination of them, it converts the movement into the matching one of the three codes, or into a combination of them. It then sends the result to the head-movement code transmitter 46.
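
The sketch below shows one way the head-movement code converter 52 could turn displacements into codes. The patent names three codes (roll, yaw and pitch) and its examples use the + direction. The sign suffix, the 3° dead band and the function name are assumptions. An empty list means no head movement, and several entries mean a combined movement.

```python
from typing import Dict, List

AXES = ("ROLL", "YAW", "PITCH")


def head_motion_codes(displacement: Dict[str, float],
                      dead_band_deg: float = 3.0) -> List[str]:
    """Return one signed code per axis whose displacement reaches the dead band."""
    codes = []
    for axis in AXES:
        angle = displacement.get(axis, 0.0)
        if abs(angle) >= dead_band_deg:
            codes.append(axis + ("+" if angle > 0 else "-"))
    return codes
```

For example, `head_motion_codes({"ROLL": 5.0, "YAW": -1.0, "PITCH": -4.0})` yields a combined movement of roll + and pitch -. The small yaw displacement falls inside the dead band and produces no code.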

[0188] The transmitting operation of speaker A's device has been described above with reference to FIGS. 21 and 22. The receiving operation of device B, which receives the signals sent from device A during the dialogue stage, is described next with reference to FIG. 23.

[0189] FIG. 23 is a flowchart showing the operation of device B in the dialogue (reception) stage.

[0190] When the dialogue starts, device B checks whether it has received an audio signal from device A (step S71). Any audio signal sent by the audio signal transmitter 48 of speaker A's device 301 (see FIG. 20) is received by the audio signal receiver 49 of speaker B's device 302. When an audio signal is received in step S71, speaker A's voice is played back through the speaker 20 (step S72).

[0191] Next, device B checks whether its mouth-movement code receiver 34 (see FIGS. 4 and 20) has received a predetermined mouth-movement code from the mouth-movement code transmitter 47 of device A (step S73). If the code has been received, the character data processing device 38 (see FIG. 4) processes the character image data according to the code, as described above (step S74), and the process moves to step S75.

[0192] The process also moves to step S75 in two other cases. The first is when no audio at all arrives from device A in step S71. The second is when no mouth-movement code is received in step S73; this happens when some audio from speaker A arrives but its volume is below the predetermined reference value, so device A generates no mouth-movement code.

[0193] In step S75, device B checks whether its eye-movement code receiver 33 has received a predetermined eye-movement code from the eye-movement code transmitter 45 of device A. If the code has been received, the character data processing device 38 (see FIG. 4) processes the character image data according to the code, as described above (step S76), and the process moves to step S77. If no eye-movement code has been received from device A in step S75, the process moves directly to step S77.

[0194] In step S77, device B's character image generating device 39 generates speaker A's character image, as described above. The monitor screen of speaker B's HMD 1 now shows speaker A's character image, for example as in FIGS. 11 to 14, with its expression changing to follow the movements of speaker A's eyes and mouth. As described above, the character image follows these movements substantially in real time.

[0195] Next, in step S78, device B checks whether its head-movement code receiver 35 has received a predetermined head-movement code from the head-movement code transmitter 46 of device A. As described above, the head-movement codes are the three codes for movement in the roll, yaw and pitch directions.

[0196] If at least one of these codes is received in step S78, device B's image deformation amount calculation unit 40 calculates from the received code how much to deform the image. The image deformation unit 41 at the next stage then uses this result to deform the character image of speaker A that the character image generating device 39 produced in step S77 (step S79). The process then moves to step S80. If no code is received in step S78, the process moves directly to step S80.

[0197] In other words, the character image of the other party (speaker A), as finally generated by the character image generating device 39, is deformed to follow speaker A's head movement as received by device B's head-movement code receiver 35.

[0198] Specific examples of this image deformation follow. FIGS. 26 to 28 show the deformation that device B in the video and audio communication system of the first embodiment performs when it receives a head-movement code from device A. FIG. 26 shows an example for a code corresponding to speaker A's head movement in the roll direction, FIG. 27 one for the pitch direction, and FIG. 28 one for the yaw direction. In each case, speaker A's character image is the one shown in FIG. 14.

[0199] When device B receives a code for speaker A's head movement in the roll direction, the monitor screen of speaker B's HMD 1 shows the character image with its coordinates rotated, as in FIG. 26. The rotation is about a predetermined point of the image (in the illustrated example, the lowest point of the face), in the direction that matches roll + or roll -. The figure shows the case in which a roll + code has been received.
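
The roll deformation is an ordinary 2-D rotation about a fixed pivot. The minimal sketch below assumes the image outline is given as a list of (x, y) points in a y-up coordinate system, with the pivot at the lowest point of the face. The text does not say which sign of angle corresponds to roll +, so that choice is left to the caller. In screen coordinates with y pointing down, the same positive angle appears as a clockwise turn.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def rotate_about(points: List[Point], pivot: Point, angle_deg: float) -> List[Point]:
    """Rotate points counterclockwise by angle_deg around pivot (y-up axes)."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    px, py = pivot
    rotated = []
    for x, y in points:
        dx, dy = x - px, y - py
        rotated.append((px + c * dx - s * dy, py + s * dx + c * dy))
    return rotated
```

The pivot point itself never moves, so the chin stays fixed while the rest of the face tilts around it.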

[0200] When device B receives a code for speaker A's head movement in the pitch direction, the monitor screen of speaker B's HMD 1 shows the character image after the well-known trapezoidal (keystone) distortion processing in the vertical direction, as in FIG. 27. The figure shows the case in which a pitch + code has been received.

[0201] When device B receives a code for speaker A's head movement in the yaw direction, the monitor screen of speaker B's HMD 1 likewise shows the character image after trapezoidal distortion processing, this time in the horizontal direction, as in FIG. 28. The figure shows the case in which a yaw + code has been received.
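
Both trapezoidal distortions can be approximated by scaling one coordinate linearly with the other. The sketch below is that simple linear taper, not a full perspective projection. The strength parameter `k` and the function name are assumptions. With `axis="vertical"` (pitch), the horizontal width about the image center grows with y. With `axis="horizontal"` (yaw), the vertical height about the center grows with x. A negative `k` reverses the taper.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def keystone(points: List[Point], width: float, height: float,
             k: float, axis: str) -> List[Point]:
    """Apply a linear trapezoidal taper to points inside a width x height box."""
    cx, cy = width / 2.0, height / 2.0
    out = []
    for x, y in points:
        if axis == "vertical":      # pitch: horizontal extent varies with y
            scale = 1.0 + k * (y / height - 0.5)
            out.append((cx + (x - cx) * scale, y))
        elif axis == "horizontal":  # yaw: vertical extent varies with x
            scale = 1.0 + k * (x / width - 0.5)
            out.append((x, cy + (y - cy) * scale))
        else:
            raise ValueError(f"unknown axis: {axis!r}")
    return out
```

For a 10 x 10 box with `k=0.2` and `axis="vertical"`, the edge at y = 0 shrinks to run from x = 0.5 to 9.5, and the edge at y = 10 widens to run from -0.5 to 10.5.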

[0202] When the second party device receives more than one of the above three types of head movement codes within a predetermined time, it combines the image deformations shown in FIGS. 26 to 28 and displays the result on the monitor screen of HMD 1.
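The patent states only that codes received "within a predetermined time" are combined. Here is one possible way to decide which deformations to combine. It assumes each received code carries an arrival time, an axis, and a direction of +1 or -1, and it sums the directions per axis over a sliding window ending at the newest code. The `HeadCode` type, the `combine_codes` function, and the 0.5-second window are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeadCode:
    t: float        # arrival time in seconds
    axis: str       # "roll", "pitch" or "yaw"
    direction: int  # +1 or -1


def combine_codes(codes: list[HeadCode], window_s: float = 0.5) -> dict[str, int]:
    """Sum code directions per axis over codes that arrived within window_s
    of the newest code; axes with a nonzero total are deformed together."""
    state = {"roll": 0, "pitch": 0, "yaw": 0}
    if not codes:
        return state
    newest = max(c.t for c in codes)
    for c in codes:
        if newest - c.t <= window_s:
            state[c.axis] += c.direction
    return state
```

The per-axis totals could then drive the roll rotation and the two trapezoidal distortions in a single redraw.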

[0203] Returning to FIG. 23, the second party device next detects the movement of speaker B's own head with the head motion sensor 11 (step S80). When a movement of speaker B's head is detected, the coordinate conversion unit 42 converts the coordinates of speaker A's character image displayed on speaker B's own monitor screen according to the detection result (step S81).

[0204] This coordinate conversion is described below.

[0205] The coordinate conversion unit 42 determines the conversion vector for speaker A's character image, shown on the monitor screen that speaker B views, according to the movement of speaker B's head. In this embodiment, the position of speaker B's head is reset in the same way that speaker A's head position (the position of the head motion sensor 11) is reset in step S50.

[0206] Take the state at this time as the standard position. When speaker B's head is at this standard position, speaker A's character image sits at approximately the center of the monitor screen, as shown for example in FIG. 29.

[0207] Now suppose speaker B turns his or her head from the standard position in the yaw (+) direction (as shown in FIG. 24, rotation of speaker B toward the left is the (+) direction). Speaker A's character image on the monitor screen is then displayed shifted to the right as viewed, as shown in FIG. 30.

[0208] Similarly, if speaker B moves his or her head from the standard position in the pitch (+) direction (as shown in FIG. 25, downward rotation of speaker B is the (+) direction), speaker A's character image on the monitor screen is displayed shifted upward, as shown in FIG. 31.

[0209] Further, if speaker B moves his or her head from the standard position in the roll (+) direction (as shown in FIG. 24, leftward rotation of speaker B is the (+) direction), speaker A's character image on the monitor screen is displayed rotated clockwise as viewed, as shown in FIG. 32.
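The receiver-side conversion described above shifts or rotates the displayed character opposite to speaker B's head motion, so the character appears to stay fixed in real space (compare claim 8). This sketch works under three assumptions. Head angles are in degrees. Offsets are measured from the pose stored at reset. A constant pixels-per-degree factor (a small-angle approximation) turns yaw and pitch into screen shifts. `HeadPose`, `CoordinateConverter`, and `px_per_deg` are hypothetical names, not part of the patent.

```python
from dataclasses import dataclass


@dataclass
class HeadPose:
    yaw: float    # degrees, + = turn to the left (FIG. 24)
    pitch: float  # degrees, + = tilt downward (FIG. 25)
    roll: float   # degrees, + = tilt to the left (FIG. 24)


class CoordinateConverter:
    """Sketch of coordinate conversion unit 42 for speaker B's screen."""

    def __init__(self, px_per_deg: float = 10.0) -> None:
        self.px_per_deg = px_per_deg  # hypothetical display scale
        self.standard = HeadPose(0.0, 0.0, 0.0)

    def reset(self, pose: HeadPose) -> None:
        """Store the current pose as the standard position."""
        self.standard = pose

    def convert(self, pose: HeadPose) -> tuple[float, float, float]:
        """Return (dx, dy, cw_deg): shift right, shift up (pixels), clockwise turn."""
        dx = (pose.yaw - self.standard.yaw) * self.px_per_deg      # yaw + -> right (FIG. 30)
        dy = (pose.pitch - self.standard.pitch) * self.px_per_deg  # pitch + -> up (FIG. 31)
        cw = pose.roll - self.standard.roll                        # roll + -> clockwise (FIG. 32)
        return dx, dy, cw
```

At the standard position every offset is zero, so the character stays centered as in FIG. 29.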

[0210] As described above, the monitor screen of speaker B's HMD 1 displays speaker A's character image after it has been processed, deformed, and converted based on the data sent from the first party device (step S82).

[0211] The video and audio communication system of the first embodiment provides the following effects.

[0212] (1) It provides a videophone system in which users can enjoy a conversation through a character image unrelated to their own faces. The character image can be created freely or selected from predefined images.

[0213] (2) It provides a videophone system that conveys the movement of the user's face and changes in facial expression to the other party in substantially real time, using a simply configured device and without burdensome effort. The information describing facial movement and expression changes can be simple data, so it can be sent and received at high speed.
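Effect (2) depends on the expression and head-motion information being small. As a rough illustration, take a hypothetical code table (the patent defines no concrete code values or wire format). Python's standard `struct` module can then pack each code into two bytes: one unsigned byte for the code kind and one signed byte for its value.

```python
import struct

# Hypothetical code table; the patent does not define concrete code values.
KINDS = {"eye": 0, "mouth": 1, "roll": 2, "pitch": 3, "yaw": 4}
NAMES = {v: k for k, v in KINDS.items()}


def encode(kind: str, value: int) -> bytes:
    """Pack one code as an unsigned kind byte plus a signed value byte (-128..127)."""
    return struct.pack("<Bb", KINDS[kind], value)


def decode(data: bytes) -> tuple[str, int]:
    """Inverse of encode()."""
    kind_id, value = struct.unpack("<Bb", data)
    return NAMES[kind_id], value
```

For comparison, one uncompressed 320 x 240 RGB frame takes 320 x 240 x 3 = 230,400 bytes, which is over 100,000 times larger than one two-byte code. The frame size is only an example.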

[0214] (3) It provides a videophone system in which the user can converse without any special awareness of the equipment: the user only needs to wear the HMD.

[0215] (4) It provides a videophone system in which the user can converse without being forced into a particular posture: there is no need to face a dedicated display or camera.

[0216] (5) It provides a videophone system that can be used easily at home without troublesome preparation before communication starts: the conversation can begin without attaching expression-detection markers to the face.

[0217] Next, a video and audio communication system according to a second embodiment of the present invention will be described. Its basic configuration and operation are the same as those of the first embodiment, so only the differences are described here and the description of identical parts is omitted.

[0218] In the video and audio communication system of the first embodiment, the image of speaker B that speaker A watches during the conversation stage is the character image that speaker B set on the second party device. In the second embodiment, by contrast, the image of speaker B that speaker A watches is a predetermined character image that was freely set or stored in advance on the first party device.

[0219] That is, the character image of speaker B that speaker A sees on the monitor of his or her own HMD 1 is a character image that speaker A set or stored in advance. Speaker A may have set this image on the first party device beforehand as a character specific to speaker B, or may freely choose a non-specific character image.

[0220] During the conversation, speaker A may also switch at will the character image used for speaker B.

[0221] Further, if the parties can identify each other at the start of communication by a preset ID number or the like, the receiving side can select the other party's character image corresponding to that ID number.
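The second embodiment's receiver-side selection can be pictured as a lookup table keyed by the partner's ID number. The table falls back to a default for unknown IDs, and an entry can be reassigned to switch characters mid-call. In this minimal sketch, the `CharacterBook` class is hypothetical, and character images are stood in for by name strings.

```python
class CharacterBook:
    """Receiver-side table mapping a partner's ID number to a locally chosen
    character image (represented here by a name string)."""

    def __init__(self, default: str) -> None:
        self.default = default
        self._by_id: dict[str, str] = {}

    def assign(self, partner_id: str, character: str) -> None:
        """Set, or during a call switch, the character shown for partner_id."""
        self._by_id[partner_id] = character

    def select(self, partner_id: str) -> str:
        """Return the character for partner_id, or the default for unknown IDs."""
        return self._by_id.get(partner_id, self.default)
```

With this design, the sender's own character choice does not affect what the receiver shows, which is the difference from the first embodiment.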

[0222] In addition to effects (2) to (5) of the first embodiment, the video and audio communication system of the second embodiment provides the following effect.

[0223] The receiving side can freely set or switch the character image of the conversation partner, which makes the conversation more enjoyable.

[0224]

[Effects of the Invention]

As described above, the present invention provides a video and audio communication system and a videophone transmission/reception method that transmit the movement and expression of a speaker's face in real time, using a simple, inexpensive device and without burdening the user.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the main configuration of the video and audio communication device used in the video and audio communication system according to the first embodiment of the present invention.

FIG. 2 is a system diagram showing the main configuration of the video and audio communication system according to the first embodiment of the present invention.

FIG. 3 is a side view of a user wearing the HMD in the video and audio communication system of the first embodiment.

FIG. 4 is a block diagram showing in detail how the HMD, the video generation box, and the controller pad are connected, and the electric circuit configuration of each unit, in the video and audio communication system of the first embodiment.

FIG. 5 is a bird's-eye view showing a user actually using the video and audio communication system of the first embodiment.

FIG. 6 is an external perspective view of the main part, showing a modified example of the microphone provided on the HMD in the video and audio communication system of the first embodiment.

FIG. 7 is an explanatory diagram showing an example of the HMD monitor screen when predetermined character image generation software is used in the video and audio communication system of the first embodiment.

FIG. 8 is a flowchart showing the process of generating the sending speaker's basic character image data and various data related to that character image, in the video and audio communication system of the first embodiment.

FIG. 9 is a flowchart showing the process of applying predetermined processing to the sending speaker's basic character image and generating a predetermined code for that processing, in the video and audio communication system of the first embodiment.

FIG. 10 is a flowchart showing the process of setting the reference value that serves as the transmission criterion for the predetermined code associated with the processing applied to the sending speaker's basic character image, in the video and audio communication system of the first embodiment.

FIG. 11 is a diagram showing an example of the sending speaker's basic character image in the video and audio communication system of the first embodiment.

FIG. 12 is a diagram showing an example of a character image with the line of sight moved to the left, generated by processing the sending speaker's basic character image, in the video and audio communication system of the first embodiment.

FIG. 13 is a diagram showing an example of a character image with the eyes closed, generated by processing the sending speaker's basic character image, in the video and audio communication system of the first embodiment.

FIG. 14 is a diagram showing an example of a character image with the mouth open, generated by processing the sending speaker's basic character image, in the video and audio communication system of the first embodiment.

FIG. 15 is an explanatory diagram showing the line-of-sight detection mechanism and its surroundings in the video and audio communication system of the first embodiment.

FIG. 16 is a diagram explaining the potential generated according to the line of sight detected by the line-of-sight detection mechanism shown in FIG. 15, and the line-of-sight position set in accordance with that potential (reference position), in the video and audio communication system of the first embodiment.

FIG. 17 is a diagram explaining the potential generated according to the line of sight detected by the line-of-sight detection mechanism shown in FIG. 15, and the line-of-sight position set in accordance with that potential (moved to the right), in the video and audio communication system of the first embodiment.

FIG. 18 is a diagram explaining the potential generated according to the line of sight detected by the line-of-sight detection mechanism shown in FIG. 15, and the line-of-sight position set in accordance with that potential (moved to the left), in the video and audio communication system of the first embodiment.

FIG. 19 is a diagram explaining the potential generated according to the line of sight detected by the line-of-sight detection mechanism shown in FIG. 15, and the line-of-sight position set in accordance with that potential (eyes closed), in the video and audio communication system of the first embodiment.

FIG. 20 is an explanatory diagram showing the voice detection mechanism and its surroundings in the video and audio communication system of the first embodiment.

FIG. 21 is a flowchart showing the operation of one speaker, A, in the initial stage of communication after communication starts, in the video and audio communication system of the first embodiment.

FIG. 22 is a flowchart showing the transmission operation of speaker A in the conversation stage after communication starts, in the video and audio communication system of the first embodiment.

FIG. 23 is a flowchart showing the operation of the other speaker, B, in the conversation stage after communication starts, in the video and audio communication system of the first embodiment.

FIG. 24 is a front view of a user wearing the HMD in the video and audio communication system of the first embodiment.

FIG. 25 is a side view of a user wearing the HMD in the video and audio communication system of the first embodiment.

FIG. 26 shows an example of the image deformation performed when the second party device in the video and audio communication system of the first embodiment receives a head movement code from the first party device: specifically, the deformation when a code corresponding to speaker A's head movement in the roll direction is received.

FIG. 27 shows an example of the image deformation performed when the second party device in the video and audio communication system of the first embodiment receives a head movement code from the first party device: specifically, the deformation when a code corresponding to speaker A's head movement in the pitch direction is received.

FIG. 28 shows an example of the image deformation performed when the second party device in the video and audio communication system of the first embodiment receives a head movement code from the first party device: specifically, the deformation when a code corresponding to speaker A's head movement in the yaw direction is received.

FIG. 29 is an explanatory diagram of the coordinate conversion in the video and audio communication system of the first embodiment. It shows an example of speaker A's character image as displayed on speaker B's monitor screen when speaker B's head is at the standard position.

FIG. 30 is an explanatory diagram of the coordinate conversion in the video and audio communication system of the first embodiment. It shows an example of speaker A's character image as displayed on speaker B's monitor screen when speaker B has moved his or her head from the standard position in the yaw direction.

FIG. 31 is an explanatory diagram of the coordinate conversion in the video and audio communication system of the first embodiment. It shows an example of speaker A's character image as displayed on speaker B's monitor screen when speaker B has moved his or her head from the standard position in the pitch direction.

FIG. 32 is an explanatory diagram of the coordinate conversion in the video and audio communication system of the first embodiment. It shows an example of speaker A's character image as displayed on speaker B's monitor screen when speaker B has moved his or her head from the standard position in the roll direction.

[Explanation of symbols]

1 ... HMD
2 ... Video generation box
3 ... Controller pad
4 ... External device
5 ... Telephone line
11 ... Head motion sensor
12, 15 ... Line-of-sight detector
13, 16 ... Eyepiece optical system
14, 17 ... LCD
19 ... Microphone
20 ... Speaker
31 ... Character data transmitter
32 ... Character data receiver
33 ... Eye movement code receiver
34 ... Mouth movement code receiver
35 ... Head movement code receiver
36 ... Character data storage device (transmitting side)
37 ... Character data storage device (receiving side)
38 ... Character data processing device
39 ... Character image generation device
40 ... Image deformation amount calculation unit
41 ... Image deformation unit
42 ... Coordinate conversion unit
43 ... Character data generation device
44 ... Format storage unit
45 ... Eye movement code transmitter
46 ... Head movement code transmitter
47 ... Mouth movement code transmitter
48 ... Audio signal transmitter
49 ... Audio signal receiver
50 ... Mouth movement code converter
51 ... Eye movement code converter
52 ... Head movement code converter
100 ... Character image generating means
101 ... Character data input means
102 ... First conversion means
103 ... Display means
104 ... First character data storage means
105 ... Expression detection means
106 ... Data transmission means
107 ... First selection means
108 ... Expression code conversion means
111 ... Data receiving means
112 ... Second selection means
113 ... Second character data storage means
114 ... Character data processing means
115 ... Second conversion means
116 ... Display means
117 ... Image deformation means
118 ... Audio reproduction means

Claims

1. A video and audio communication system that has, at least on the receiving side of the communication, image display means and audio output means suited to image and audio communication, the system comprising: character image setting means capable of freely setting the character image used for display by the image display means; deformation command receiving means for receiving, from the communication partner, a command signal for deforming the character image; character deforming means for deforming the character image according to the command signal; and means for supplying the character image deformed by the character deforming means to the image display means for display.

2. The video and audio communication system according to claim 1, wherein the character image setting means is provided on the transmitting side of the communication, so that a character image representing the transmitting side is freely set there, transmitted to the receiving side, and used as the character image displayed by the image display means.

3. The video and audio communication system according to claim 2, wherein first information, indicating the correspondence between the character image, the code of the command signal, and the corresponding degree of deformation of the character image, is transmitted before the conversation starts, and second information consisting solely of the code of the command signal is transmitted in real time.

4. The video and audio communication system according to claim 3, further comprising: character image deformation amount determining means with which each communicating party freely determines, in the first information, the correspondence between the character image and the command signal code and the corresponding degree of deformation of the character image; and transmission condition determining means with which each communicating party freely determines the condition under which the command signal is transmitted.

5. The video and audio communication system according to claim 1, wherein the character image setting means allows each communicating party to freely associate communication partners with character images, the system further comprising display character selecting means that, when communication is carried out, recognizes the communication partner from a signal transmitted by that partner and, based on the partner-to-character-image correspondence set by the character image setting means, applies the character image associated with that partner to the display by the image display means.

6. The video and audio communication system according to claim 1, wherein the image display means, the audio output means, and audio input means are configured as a head-mounted display device.

7. The video and audio communication system according to claim 1, further comprising, arranged on the head-mounted display device of a video and audio communication system that uses such a device: a line-of-sight detector that detects the line of sight of the communicating party; a head motion sensor that detects head movement; a voice detection unit that detects the voice produced; and transmission means that transmits the command signal according to the detection outputs of the line-of-sight detector, the head motion sensor, and the voice detection unit.

8. The video and audio communication system according to claim 7, wherein both the control for moving the character corresponding to the user's own side and the control of the display position that makes the character corresponding to the other party appear fixed in real space are performed based on the output signal of the head motion sensor.

9. The video and audio communication system according to claim 1, wherein the predetermined character image is a combination of a plurality of types of figures, and the character data includes a code designating each figure and coordinate data representing the positional relationships among the figures, the system further comprising first conversion means for converting the character data deformed by the deformation means into an image signal to be displayed on the display means, based on a predetermined conversion format.

10. A transmission/reception method in a videophone device that communicates images and voice, in which the following steps are executed in sequence: before communication starts, a character image creation step in which each communicating party freely creates a character corresponding to himself or herself; a deformation portion determination step in which each party freely determines the portions of the character image to be deformed; a deformation amount determination step in which each party freely determines the amount of deformation of each deformation portion; and a deformation execution condition determination step in which each party freely determines the conditions under which the deformation is executed; immediately after communication starts, a character image transmission step of transmitting the deformation portions and the deformation amounts in association with each other; after communication starts, a deformation command transmission step of transmitting a command prompting the deformation when occurrence of the condition is detected within the period following the character image transmission step; a step of transmitting voice; and a step in which the receiving side receives the deformation command, deforms the deformation portion of the character image by the deformation amount, and displays the deformed image.

11. A transmission/reception method in a videophone device that communicates images and voice, in which the following steps are executed in sequence: a step in which, before communication starts, each communicating party freely determines the correspondence between communication partners and character images; a character image preparation step of identifying the communication partner immediately after communication starts and preparing the character image associated with that partner; a deformation command receiving step of receiving a command to deform the character; and a step of deforming the character image based on the deformation command received in the deformation command receiving step and displaying the deformed image.

12. A video and audio communication system comprising: display means for displaying, to the other party of a call, a character that does not depend on the caller's actual appearance; and character control means for moving the character displayed by the display means according to the caller's call situation.