JP4896118B2

JP4896118B2 - Video phone terminal

Info

Publication number: JP4896118B2
Application number: JP2008313086A
Authority: JP
Inventors: 英明松尾; 崇弘牧野; 文彦山田; 哲羽田; 真西村
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 2008-12-09
Filing date: 2008-12-09
Publication date: 2012-03-14
Anticipated expiration: 2023-03-19
Also published as: JP2009112027A

Description

The present invention relates to a videophone terminal that can convey a user's feelings and impressions to the other party in an easily understood way.

A videophone system consists of a plurality of videophone terminals, a network, and related equipment. Each videophone terminal captures video and sends it, together with audio, over the network to the other terminal. This lets users talk face to face even with a party in a remote location. Seeing the other party's face during a conversation lets the user check that party's facial expression visually, not just the tone of voice. This has the advantage of enabling more realistic, higher-level communication.

However, some users do not like video of themselves being sent as-is to the other party's terminal. Others find it uninteresting to talk while watching video of themselves or the other party. A technique has therefore been considered that works as follows:

1. Feature points of facial parts such as the eyebrows, eyes, nose, and mouth are extracted from an image of the user's face.
2. A virtual character resembling the user's face is generated from those feature points.
3. Video of this virtual character is sent to the other terminal as the user's alter ego.

This technique proceeds in several steps. First, the terminal recognizes which region of an image of the user's face (hereinafter "face image") contains the face. It then extracts points that characterize each facial part, such as the eyebrows, eyes, nose, and mouth (hereinafter "feature points"). FIG. 1 is an explanatory diagram showing a face image and its feature points.

Next, a virtual character resembling the user's face is generated from an average-face character, in which the features of each part have been averaged. More specifically, the difference between each extracted feature point and the corresponding feature point of the average-face character is calculated. This difference data is then applied to the average-face character to produce a virtual character resembling the user's face. FIG. 2 is an explanatory diagram showing a virtual character resembling the user's face.

Each feature point in the user's face image is then tracked, and its movement is applied to the virtual character. When the user's expression changes, the movement of each facial part is linked to the movement of the corresponding part of the virtual character. As a result, the virtual character's expression changes along with the user's. Alternatively, the virtual character need not resemble the user's face at all: the feature-point movements can be applied to an entirely different character, whose expression then likewise follows the user's.

Furthermore, if all the parts making up the face move in the same direction in the coordinate system of the face image, the entire face can be regarded as having moved. Thus, when the user nods, tilts their head, or shakes their head, this motion can also be reflected in the virtual character.

Japanese Translation of PCT International Publication No. 2002-511617
Japanese Translation of PCT International Publication No. 2002-511620

As described above, in this conventional technique the movement of each facial part is linked to the movement of the corresponding part of the virtual character, so the virtual character's expression changes to match the user's. In addition, the user's head movements are reflected in the virtual character. When the user nods, tilts their head, or shakes their head, the virtual character makes the same motion.

However, this conventional technique merely mirrors the user's changes of expression and head movements in the virtual character. Apart from the voice, it therefore cannot convey any emotion or impression that the virtual character's expression and movement alone cannot express. Mirroring the user's expression directly is thought to be less effective at conveying emotions or impressions than two alternatives: exaggerating changes in expression, or representing them with marks and the like. These alternatives also make the virtual character's expressions and movements more amusing, which adds entertainment value.

The present invention was made in view of the above demand. Its object is to provide a videophone terminal that can convey a user's feelings and impressions to the other party in an easily understood way.

To achieve the above object, the present invention provides a videophone terminal that communicates with another terminal over a network using video and audio. The video includes a virtual character generated based on a person's face. The terminal comprises:

- a transmission/reception unit that transmits and receives video and audio data;
- an input unit having a plurality of buttons; and
- a control unit that acts when communication with the other terminal is being ended. Before the line is disconnected, the control unit causes the transmission/reception unit to transmit one of two farewell videos:
  - if a first button of the input unit is operated, video in which a first video including the virtual character disappears from the screen;
  - if a second button of the input unit is operated, video in which a second video including the virtual character disappears from the screen.

The first button and the second button are different buttons, and the first video and the second video including the virtual character are different videos.

The present invention also provides a program for a videophone terminal. The terminal is a computer that has a transmission/reception unit for transmitting and receiving video and audio data. It communicates with another terminal over a network using video and audio that include a virtual character generated based on a person's face. The program causes the terminal to execute:

- a step of accepting input from a plurality of buttons; and
- a step, performed when communication with the other terminal is being ended, of causing the transmission/reception unit to transmit one of two farewell videos before the line is disconnected:
  - if a first button is operated, video in which a first video including the virtual character disappears from the screen;
  - if a second button is operated, video in which a second video including the virtual character disappears from the screen.

The first button and the second button are different buttons, and the first video and the second video including the virtual character are different videos.

As described above, the videophone terminal according to the present invention can convey a user's feelings and impressions to the other party in an easily understood way. In particular, the content of the farewell video tells the other party what impression the user had of the conversation.

Embodiments of a videophone terminal according to the present invention will be described below with reference to the drawings.

The videophone terminal of this embodiment is a communication terminal such as a mobile phone, PHS, or PDA. It is equipped with a camera capable of capturing moving or still images (hereinafter collectively "video"). It works as a videophone by exchanging video and audio with another videophone terminal over a network. During a call, the terminals may exchange the video captured by the camera itself, or video of a virtual character generated from the user's face as captured by the camera. This embodiment describes the case in which video of the virtual character is exchanged.

The generation of the virtual character is described below. The videophone terminal of this embodiment first recognizes which region of the user's face image, captured by the camera, is the face. It then extracts points that characterize each facial part, such as the eyebrows, eyes, nose, and mouth (hereinafter "feature points"). FIG. 1 is an explanatory diagram showing a face image and its feature points. The eyebrows, eyes, nose, and mouth are the main parts of the face, and they change subtly with facial expression. Points on these parts whose positions relative to other feature points change with expression are therefore extracted as feature points.

Next, a virtual character close to the user's face is generated from an average-face character, in which the features of each part have been averaged. More specifically, the difference between each extracted feature point and the corresponding feature point of the average-face character is calculated. This difference data is then applied to the average-face character to produce a virtual character close to the user's face. FIG. 2 is an explanatory diagram showing a virtual character resembling the user's face.
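The difference-and-apply step can be sketched as follows. This is a minimal illustration, not the patent's implementation. It assumes feature points are stored as a name → (x, y) dictionary in face-image coordinates, and that the average-face character uses the same point names. The `strength` parameter is a hypothetical addition for blending between the average face and the user's geometry.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]
Landmarks = Dict[str, Point]  # feature-point name -> (x, y) in face-image coordinates


def difference_data(user: Landmarks, average: Landmarks) -> Landmarks:
    """Offset of each extracted feature point from the average-face character."""
    return {name: (user[name][0] - ax, user[name][1] - ay)
            for name, (ax, ay) in average.items()}


def generate_character(average: Landmarks, diff: Landmarks,
                       strength: float = 1.0) -> Landmarks:
    """Apply the difference data to the average-face character.

    `strength` is a hypothetical parameter: 1.0 reproduces the user's
    geometry exactly, smaller values keep the character closer to the
    average face.
    """
    return {name: (ax + strength * diff[name][0], ay + strength * diff[name][1])
            for name, (ax, ay) in average.items()}


average_face = {"left_eye": (30.0, 40.0), "right_eye": (70.0, 40.0), "mouth": (50.0, 80.0)}
user_face = {"left_eye": (32.0, 41.0), "right_eye": (68.0, 41.0), "mouth": (50.0, 84.0)}

diff = difference_data(user_face, average_face)
print(generate_character(average_face, diff, strength=0.5))
# {'left_eye': (31.0, 40.5), 'right_eye': (69.0, 40.5), 'mouth': (50.0, 82.0)}
```

Storing only the difference data keeps the character rig (the average face) fixed. The same offsets could in principle be applied to a differently styled base character.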

Each feature point in the user's face image is then tracked, and its movement is applied to the virtual character. In addition, if all the parts making up the face move in the same direction in the coordinate system of the face image, the entire face can be regarded as having moved. So when the user nods, tilts their head, or shakes their head, this motion is reflected in the virtual character.
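The "all parts move in the same direction" rule can be sketched as separating tracked displacements into a common head motion and per-part residuals. This is an illustrative sketch under stated assumptions:

- The thresholds `min_step` and `min_cos` are assumed values.
- A nod is read as vertical motion and a shake as horizontal motion over successive frames.
- Head tilting is a rotation, so this translation-only check does not detect it.

```python
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]
Landmarks = Dict[str, Point]


def displacements(prev: Landmarks, curr: Landmarks) -> Landmarks:
    """Per-feature-point movement between two tracked frames."""
    return {k: (curr[k][0] - prev[k][0], curr[k][1] - prev[k][1]) for k in prev}


def common_motion(disp: Landmarks, min_step: float = 1.0,
                  min_cos: float = 0.9) -> Optional[Point]:
    """Return the mean displacement if every point moved the same way, else None.

    min_step (pixels) and min_cos (direction agreement) are assumed thresholds.
    """
    mx = sum(dx for dx, _ in disp.values()) / len(disp)
    my = sum(dy for _, dy in disp.values()) / len(disp)
    mean_len = math.hypot(mx, my)
    if mean_len < min_step:
        return None
    for dx, dy in disp.values():
        length = math.hypot(dx, dy)
        if length == 0 or (dx * mx + dy * my) / (length * mean_len) < min_cos:
            return None
    return (mx, my)


def split_motion(prev: Landmarks, curr: Landmarks):
    """Separate whole-head motion from local (expression) motion of each part."""
    disp = displacements(prev, curr)
    head = common_motion(disp)
    hx, hy = head if head else (0.0, 0.0)
    local = {k: (dx - hx, dy - hy) for k, (dx, dy) in disp.items()}
    return head, local


def head_direction(head: Point) -> str:
    """Vertical motion over successive frames suggests a nod, horizontal a shake."""
    return "vertical" if abs(head[1]) > abs(head[0]) else "horizontal"


prev = {"l_eye": (30, 40), "r_eye": (70, 40), "mouth": (50, 80)}
moved = {k: (x, y + 5) for k, (x, y) in prev.items()}  # whole face moves down
head, local = split_motion(prev, moved)
print(head, head_direction(head))  # (0.0, 5.0) vertical
```

The head motion would drive the character's overall pose. The residuals would drive its eyebrows, eyes, and mouth.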

The configuration of the videophone terminal of this embodiment is described below with reference to FIG. 3. As shown in the figure, the terminal comprises:

- a camera 101;
- a video processing unit 103;
- a microphone 105;
- a speaker 107;
- an audio processing unit 109;
- a virtual character generation unit 111;
- a display unit 113;
- a keyboard 115;
- a storage unit 117;
- a central processing unit 119;
- a wireless unit 121; and
- an antenna 123.

The video processing unit 103 analyzes video captured by the camera 101 to recognize the face in it and extract feature points. The audio processing unit 109 performs predetermined processing on the user's own voice input from the microphone 105. It also processes the other party's voice data, received from the other party's videophone terminal, and outputs it from the speaker 107. This processing includes analysis of characteristic elements of speech such as volume, phonemes, and pitch. The analysis is performed on both the user's and the other party's speech.
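Two of the speech features mentioned above, volume and pitch, can be estimated with very simple methods. The sketch below is an illustration only, and the patent does not say which algorithms the audio processing unit 109 uses. It makes the following choices:

- Volume is measured as RMS level in dBFS.
- Pitch is estimated with a plain autocorrelation peak search.
- Samples are assumed to be floats normalized to [-1.0, 1.0].
- The 50–400 Hz search range is an assumed voice range.

```python
import math
from typing import Sequence


def volume_dbfs(samples: Sequence[float]) -> float:
    """Volume as RMS level in dBFS; samples are assumed normalized to [-1.0, 1.0]."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def estimate_pitch(samples: Sequence[float], rate: int,
                   fmin: float = 50.0, fmax: float = 400.0) -> float:
    """Crude pitch estimate (Hz): the lag with the largest autocorrelation.

    fmin/fmax bound the search to a typical speaking-voice range (an assumption).
    Returns 0.0 when no positively correlated lag is found (e.g. silence).
    """
    min_lag = max(1, int(rate / fmax))
    max_lag = min(int(rate / fmin), len(samples) - 1)
    best_lag, best_r = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(samples[n] * samples[n + lag] for n in range(len(samples) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return rate / best_lag if best_lag else 0.0


rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(800)]  # 200 Hz test tone
print(round(volume_dbfs(tone), 2), estimate_pitch(tone, rate))  # -9.03 200.0
```

Production speech analysis would add windowing, voicing detection, and lag interpolation. The sketch only shows the kind of per-frame features the text refers to.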

The virtual character generation unit 111 generates the virtual character from the feature points and other data extracted by the video processing unit 103. It then applies the user's facial expressions and movements, as captured by the camera 101, to the virtual character. The unit may also modify the generated virtual character partially or entirely according to instructions from the central processing unit 119. In addition, it sets the background of the virtual character shown on the display unit 113 to a predetermined image, based on the schedule information and date information stored in the storage unit 117. For example, the background changes by day or period: an image of a cake on the user's birthday, an image of a tiered hina-doll display on March 3, and an image of carp streamers on May 5.
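Date-based background selection can be sketched as a lookup over day ranges, with the user's birthday checked first. The following details are illustrative assumptions, not from the source:

- the table contents and file names;
- birthday-first priority;
- the restriction that ranges do not wrap past December 31.

```python
import datetime
from typing import List, Optional, Tuple

MonthDay = Tuple[int, int]

# (first day, last day, image); single days use the same date twice.
# File names and the table itself are illustrative assumptions.
SEASONAL_BACKGROUNDS: List[Tuple[MonthDay, MonthDay, str]] = [
    ((3, 3), (3, 3), "hina_dolls.png"),
    ((5, 5), (5, 5), "carp_streamers.png"),
]


def pick_background(today: datetime.date,
                    birthday: Optional[datetime.date] = None,
                    default: str = "plain.png") -> str:
    """Choose the virtual character's background from the date and schedule data."""
    md = (today.month, today.day)
    if birthday is not None and md == (birthday.month, birthday.day):
        return "birthday_cake.png"  # the user's birthday takes priority (assumption)
    for start, end, image in SEASONAL_BACKGROUNDS:
        if start <= md <= end:  # ranges are assumed not to wrap past Dec 31
            return image
    return default


print(pick_background(datetime.date(2024, 3, 3)))  # hina_dolls.png
```

Comparing (month, day) tuples lets a multi-day period be added as one table row without changing the lookup.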

The storage unit 117 stores:

- programs governing changes in the virtual character's expression and its movements;
- predetermined video and audio data;
- the user's schedule information;
- date information; and the like.

The keyboard 115 is used to give the central processing unit 119 instructions, such as switching to the hold mode described later, disconnecting the line, and other commands.

The central processing unit 119 performs:

- video and audio processing in response to keyboard 115 operations and keywords;
- predetermined processing when a line is connected or disconnected, and when the hold mode is started or released;
- compression and decompression of video and audio data; and so on.

The wireless unit 121 modulates and demodulates video and audio data and transmits and receives signals via the antenna 123.

Based on the above description, embodiments of the videophone terminal according to the present invention are described in detail in the order [First Embodiment], [Second Embodiment], and [Third Embodiment].

[First Embodiment]
In the first embodiment, the video changes during a videophone conversation using the virtual character described above. Either of two triggers causes the change: the user performs a predetermined operation on the keyboard 115 shown in FIG. 3, or the audio processing unit 109 recognizes a predetermined keyword in the user's own speech. The video is then switched to one in which the virtual character has been modified, or to an entirely different video.

Modifying the virtual character means, for example:

- changing the size of individual parts of the virtual character's face, or of the whole face; or
- adding patterns that express emotion, such as vertical lines drawn over the character's eyes or blushing cheeks.

Making only the virtual character's eyes larger than usual can express surprise. Making the whole face larger than usual and turning it red can express anger.
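Both modifications described above can be sketched with two small geometric and colour helpers:

- Enlarging a part means scaling its feature points about the part's own centroid.
- Reddening means blending the skin colour toward red.

This is an illustrative sketch only. The scale factor and blend amount are assumed values, not figures from the source.

```python
from typing import List, Tuple

Point = Tuple[float, float]
RGB = Tuple[int, int, int]


def scale_part(points: List[Point], factor: float) -> List[Point]:
    """Enlarge (factor > 1) or shrink a facial part about its own centroid."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return [(cx + factor * (x - cx), cy + factor * (y - cy)) for x, y in points]


def redden(color: RGB, amount: float) -> RGB:
    """Blend a skin colour toward pure red; amount in [0, 1]."""
    target = (255, 0, 0)
    return tuple(round(c + amount * (t - c)) for c, t in zip(color, target))


# Surprise: enlarge only the eyes. Anger: apply scale_part to every facial
# point and redden the skin. The values 1.5 and 0.4 are illustrative only.
left_eye = [(28.0, 40.0), (32.0, 40.0), (30.0, 38.0), (30.0, 42.0)]
surprised_eye = scale_part(left_eye, 1.5)
angry_skin = redden((200, 180, 160), 0.4)
print(surprised_eye)  # [(27.0, 40.0), (33.0, 40.0), (30.0, 37.0), (30.0, 43.0)]
print(angry_skin)     # (222, 108, 96)
```

Scaling about the centroid keeps an enlarged eye in place on the face rather than drifting away from the origin.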

An entirely different video could be, for example, an exclamation mark (!) or a question mark (?). An exclamation mark expresses exclamation, and a question mark expresses doubt.

Likewise, an image of a raised thumb can be stored in the storage unit 117 in advance and associated with the keyword "Well done!". When the audio processing unit 109 recognizes this keyword in the user's speech, the central processing unit 119 reads the thumbs-up image from the storage unit 117. It then displays the image either in place of the virtual character video or superimposed on it. A moving image may be used instead of a still image.
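The keyword and button bindings described above can be sketched as lookup tables. Each table maps a trigger to an asset plus a flag choosing "overlay" or "replace". This sketch rests on several assumptions:

- The bindings, asset names, and `Effect` type are hypothetical.
- Speech recognition is assumed to deliver a plain-text transcript.
- The rendered frame is modelled as a bottom-to-top list of layer names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Effect:
    asset: str      # still image or clip held in the storage unit (name hypothetical)
    overlay: bool   # True: draw over the character; False: replace the character video


# Hypothetical bindings; a real terminal would let the user register these.
KEYWORD_EFFECTS: Dict[str, Effect] = {
    "well done": Effect("thumbs_up.png", overlay=True),
}
KEY_EFFECTS: Dict[str, Effect] = {
    "1": Effect("exclamation_mark.png", overlay=True),
    "2": Effect("question_mark.png", overlay=True),
}


def effect_for_speech(transcript: str) -> Optional[Effect]:
    """Look for a registered keyword in recognized text from the user's own voice."""
    text = transcript.lower()
    for keyword, effect in KEYWORD_EFFECTS.items():
        if keyword in text:
            return effect
    return None


def layers(effect: Optional[Effect]) -> List[str]:
    """Layers to render, bottom to top, for the outgoing video frame."""
    if effect is None:
        return ["virtual_character"]
    return ["virtual_character", effect.asset] if effect.overlay else [effect.asset]


print(layers(effect_for_speech("Well done!")))  # ['virtual_character', 'thumbs_up.png']
```

Keeping the replace/overlay choice on each binding lets one dispatcher serve both keyword triggers and keyboard triggers.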

Similarly, a predetermined sound effect can be stored in the storage unit 117 in advance and associated with a predetermined keyboard operation. When that operation is performed on the keyboard 115, the central processing unit 119 reads the sound-effect data from the storage unit 117. It then plays the effect either in place of the voices of the user and the other party, or mixed over them.
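Playing an effect "in place of or over" the voices can be sketched as sample-wise mixing with clipping. This is an illustrative sketch with two assumptions:

- Audio is signed 16-bit PCM at a common sample rate.
- In "replace" mode the voice is muted only while the effect plays.

```python
from typing import List, Sequence

PCM_MIN, PCM_MAX = -32768, 32767  # signed 16-bit PCM (assumed sample format)


def mix_effect(voice: Sequence[int], effect: Sequence[int],
               replace: bool = False) -> List[int]:
    """Play a sound effect over the call audio, or in place of it.

    With replace=True the voice is muted only while the effect is playing.
    Sums are clipped to the 16-bit range instead of being allowed to wrap.
    """
    out = []
    for i in range(max(len(voice), len(effect))):
        v = voice[i] if i < len(voice) else 0
        e = effect[i] if i < len(effect) else 0
        if replace and i < len(effect):
            v = 0
        out.append(max(PCM_MIN, min(PCM_MAX, v + e)))
    return out


print(mix_effect([100, 200, 30000], [50, 50, 5000]))  # [150, 250, 32767]
print(mix_effect([100, 200, 300], [7], replace=True))  # [7, 200, 300]
```

Clipping avoids the loud wrap-around artefacts that plain integer addition would produce when a loud effect meets loud speech.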

As described above, this embodiment shows something other than the normal virtual character in two cases: when the user performs a predetermined operation on the keyboard 115, or when the audio processing unit 109 recognizes a predetermined keyword. The display then shows an unusual expression or movement, or an entirely different video. This brings three benefits:

- It can express emotions or impressions that the virtual character's expression and movement alone cannot.
- It adds fun to the virtual character's expressions and movements, which enhances the entertainment value of video communication using the virtual character.
- Because a sound effect can be played in response to a keyboard operation, emotions or impressions that voice and video alone cannot express can be conveyed through sound.

[Second Embodiment]
In the second embodiment, the terminal switches to a hold mode during a videophone conversation using the virtual character described above. When the hold mode is released, the virtual character returns to the screen accompanied by a melody. The hold mode can be released in either of two ways:

- The user presses the hold button on the keyboard 115 shown in FIG. 3 while on hold. The central processing unit 119 detects this and releases the hold mode.
- The video processing unit 103 extracts feature points from video captured by the camera 101 during the hold. If these correspond to the user who was talking before the hold, the central processing unit 119 releases the hold mode.

At this point, the central processing unit 119 executes a predetermined program and reads predetermined melody data from the storage unit 117. While the melody plays, it displays the virtual character that was shown before the hold.

Some time is needed, however, before the live virtual character can actually be displayed after the hold mode is released. The video processing unit 103 must extract feature points from the video captured by the camera 101, and the virtual character generation unit 111 must apply the user's expressions and movements to the character. During this interval, the screen therefore shows the virtual character performing a predetermined movement, with an unchanging expression, along with the melody. The predetermined movement is, for example, the virtual character opening a door and coming in.
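The transition above can be sketched as a frame generator. It loops a pre-made entrance animation until tracking is ready, then hands over to live frames. This is an illustrative sketch with these assumptions:

- Frames are represented by strings.
- Looping the animation when set-up takes longer is an assumption.
- Melody playback is assumed to be started separately when the hold is released.

```python
from typing import Callable, Iterator, Sequence


def return_from_hold(intro_frames: Sequence[str],
                     tracking_ready: Callable[[], bool],
                     live_frame: Callable[[], str]) -> Iterator[str]:
    """Yield frames to transmit after hold mode is released.

    Until face tracking has caught up, loop a pre-made entrance animation
    (e.g. the character opening a door and walking in) with a fixed expression;
    then switch to live frames driven by the tracked feature points.
    """
    i = 0
    while not tracking_ready():
        yield intro_frames[i % len(intro_frames)]
        i += 1
    while True:
        yield live_frame()
```

Driving the switch from a readiness check, rather than a fixed delay, keeps the animation exactly as long as the tracker needs.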

As described above, in this embodiment releasing the hold mode displays video of the virtual character performing a predetermined movement in step with a predetermined melody. The other party can therefore see that the call has returned from hold.

[Third Embodiment]
In the third embodiment, a videophone conversation using the virtual character described above ends with a predetermined video in which the virtual character disappears from the screen. Which video is shown depends on the button the user selects on the keyboard 115 shown in FIG. 3. The line is disconnected after the video is displayed. Examples of the predetermined video include the virtual character leaving the screen holding flowers, and the virtual character being crushed from the head down under a heavy load.

The impression given to the other party, however, depends on the content of the video. The user chooses the ending accordingly. For example:

- If the user felt the conversation was enjoyable, they press a predetermined button so that the call ends with the virtual character leaving the screen holding flowers.
- If the user had no particular impression, they press another button so that the call ends with the virtual character being crushed under a heavy load.

When a predetermined button is pressed at the end of a conversation, the central processing unit 119 reads the video data for that button from the storage unit 117. It transmits the video before disconnecting the line.
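The end-of-call sequence can be sketched as a lookup from button to farewell clip. The clip is sent first, and the line is disconnected afterwards. This is an illustrative sketch with these assumptions:

- The button labels and clip names are hypothetical.
- `load_clip`, `send`, and `disconnect` stand in for the storage unit 117, the transmission path, and line control.
- A button with no clip simply disconnects.

```python
from typing import Callable, Dict, Optional

# Hypothetical button-to-clip bindings; the source only requires that two
# different buttons select two different farewell videos.
FAREWELL_CLIPS: Dict[str, str] = {
    "1": "leaves_with_flowers",    # the conversation was enjoyable
    "2": "crushed_by_heavy_load",  # no particular impression
}


def end_call(button: str,
             load_clip: Callable[[str], bytes],
             send: Callable[[bytes], None],
             disconnect: Callable[[], None]) -> Optional[str]:
    """Send the farewell video selected by `button`, then drop the line.

    Returns the clip name that was sent, or None if the button has no clip.
    """
    clip = FAREWELL_CLIPS.get(button)
    if clip is not None:
        send(load_clip(clip))  # must go out before the line is disconnected
    disconnect()
    return clip
```

Passing the I/O operations in as callables keeps the ordering rule (send, then disconnect) testable without a real network.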

As described above, in this embodiment a different video is displayed depending on which button is selected when a conversation ends. The content of that video tells the other party what impression the user had of the conversation.

In each embodiment described above, the following units of the videophone terminal may operate by executing a program: the video processing unit 103, the audio processing unit 109, the virtual character generation unit 111, and the central processing unit 119.

FIG. 1: Explanatory diagram showing a face image and its feature points
FIG. 2: Explanatory diagram showing a virtual character resembling the user's face
FIG. 3: Block diagram showing the configuration of a videophone terminal according to an embodiment of the present invention

Explanation of symbols

101 Camera
103 Video processing unit
105 Microphone
107 Speaker
109 Audio processing unit
111 Virtual character generation unit
113 Display unit
115 Keyboard
117 Storage unit
119 Central processing unit
121 Wireless unit
123 Antenna

Claims

A videophone terminal that communicates with another terminal over a network using video and audio including a virtual character generated based on a person's face, the videophone terminal comprising:
a transmission/reception unit that transmits and receives video and audio data;
an input unit having a plurality of buttons; and
a control unit that, when communication with the other terminal is being ended, causes the transmission/reception unit, before the line is disconnected, to transmit video in which a first video including the virtual character disappears from the screen if a first button of the plurality of buttons of the input unit is operated, and to transmit video in which a second video including the virtual character disappears from the screen if a second button of the plurality of buttons of the input unit is operated,
wherein the first button and the second button are different buttons, and
the first video including the virtual character and the second video including the virtual character are different videos.

A program for causing a videophone terminal, which is a computer that includes a transmission/reception unit for transmitting and receiving video and audio data and that communicates with another terminal over a network using video and audio including a virtual character generated based on a person's face, to execute:
a step of accepting input from a plurality of buttons; and
a step of, when communication with the other terminal is being ended, causing the transmission/reception unit, before the line is disconnected, to transmit video in which a first video including the virtual character disappears from the screen if a first button of the plurality of buttons is operated, and to transmit video in which a second video including the virtual character disappears from the screen if a second button of the plurality of buttons is operated,
wherein the first button and the second button are different buttons, and
the first video including the virtual character and the second video including the virtual character are different videos.