JP2004287558A

JP2004287558A - Video phone terminal, virtual character forming device, and virtual character movement control device

Info

Publication number: JP2004287558A
Application number: JP2003075858A
Authority: JP
Inventors: Hideaki Matsuo; 英明松尾; Shin Yamada; 伸山田; Kaoru Morita; かおる森田; Fumiyuki Kato; 文之加藤
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2003-03-19
Filing date: 2003-03-19
Publication date: 2004-10-14

Abstract

PROBLEM TO BE SOLVED: To provide a video phone terminal, a virtual character forming device and a virtual character movement control device capable of utilizing a virtual character of high entertainment value.

SOLUTION: The virtual character formed based on a user's face photographed by a camera 101 is not formed to look exactly like the user's face; instead, a virtual character forming part 111 forms a virtual character that subtly or moderately resembles the user's face by emphasizing the features of previously specified parts. Changes in the user's expression or motion are not reflected as they are in the expression or motion of the virtual character; instead, the virtual character forming part 111 gives the virtual character an individuality of its own, with expressions or motions that the user does not actually make. Further, the neck motion of the user is not reflected as it is in the motion of the virtual character; instead, the virtual character forming part 111 also moves the neck of the virtual character according to other factors.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、エンターテイメント性の高い仮想キャラを活用することのできるテレビ電話端末、並びに、仮想キャラ生成装置および仮想キャラ動作制御装置に関する。
【０００２】
【従来の技術】
複数のテレビ電話端末およびネットワーク等から構成されるテレビ電話システムでは、各テレビ電話端末で撮影された映像が音声と共にネットワークを介して相手端末に送られる。このため、離れた場所にいる相手とでも顔を見ながら会話することができる。相手の顔を見ながら会話することができれば、声のトーンだけでなく相手の表情を視覚的に確認することができるため、よりリアルな高いレベルのコミュニケーションをとることができるといったメリットがある。
【０００３】
しかし、ユーザによっては自分自身を撮影した映像がそのまま相手端末に送られるのを好まない者もいる。さらに、自分または相手が撮影した映像を見ながら会話していても面白みがないと感じるユーザもいる。このため、ユーザの顔を撮影した画像から眉、目、鼻、口等の各部位の特徴点を抽出して、当該特徴点からユーザの顔に似せた仮想のキャラクター（以下、「仮想キャラ」という。）を生成し、この仮想キャラの映像を自分の分身として相手端末に送る技術が考えられている。
【０００４】
当該技術では、まず、ユーザの顔を撮影した画像（以下「顔画像」という。）のどの領域が顔かを認識した後、顔画像から眉や目、鼻、口といった各部位の特徴となる点（以下「特徴点」という。）を抽出する。図１は、顔画像と各特徴点を示す説明図である。次に、各部位の特徴点に基づいて、各部位の特徴が平均化された平均顔のキャラクターからユーザの顔に似せた仮想キャラを生成する。より詳しくは、抽出した特徴点と前記平均顔のキャラクターの特徴点との差分を算出し、当該差分データを前記平均顔のキャラクターに反映させることで、ユーザの顔に似せた仮想キャラを生成する。図４は、ユーザの顔に似せた仮想キャラを示す説明図である。
【０００５】
そして、ユーザの顔画像における各特徴点をトラッキングして、各特徴点の動きを仮想キャラに反映させる。こうすることで、ユーザの表情の変化に伴う各部位の動きが仮想キャラの各部位の動きと連携するため、ユーザの表情の変化に合わせて仮想キャラの表情も同様に変化することとなる。なお、仮想キャラをユーザの顔に似させることなく、全く別のキャラクターにユーザの顔画像における各特徴点の動きを反映させることで、ユーザの表情の変化に合わせて仮想キャラの表情を変化させることもできる。
【０００６】
さらに、顔を形成する部位の全てが顔画像の座標軸上で同じ方向に移動すれば、顔全体が動いたとみなすことができる。このため、ユーザが頷いたり、首をかしげたり、頭を振ったとき、この動作を仮想キャラに反映することができる。
【０００７】
【特許文献１】
特表２００２−５１１６１７号公報
【特許文献２】
特表２００２−５１１６２０号公報
【０００８】
【発明が解決しようとする課題】
このように、上記従来の技術では、仮想キャラをユーザの顔に似せるか、全く別のキャラクターとしている。しかし、エンターテイメント性といった点から仮想キャラを考えると、ユーザの顔に似すぎているよりも、微妙またはほどほど似ている程度に面白みがあると考えられる。一方、仮想キャラが全く別のキャラクターであると会話相手の顔を彷彿とさせないため、少なくとも相手を識別できる程度には似ていることが望ましい。
【０００９】
また、上記従来の技術では、ユーザの表情の変化に伴う各部位の動きが仮想キャラの各部位の動きに連携しているため、仮想キャラの表情はユーザの表情に合わせて変化する。しかし、エンターテイメント性といった点から仮想キャラを考えると、ユーザの表情の変化をそのまま仮想キャラの表情に反映させるよりも、仮想キャラの表情や動きに意外性のある方が面白みの点で勝ると考えられる。
【００１０】
さらに、上記従来の技術では、ユーザの首の動きが仮想キャラに反映されるため、ユーザが頷いたり、首をかしげたり、頭を振ると、仮想キャラも同様の動きをする。しかし、エンターテイメント性といった点から仮想キャラを考えると、ユーザの首の動きをそのまま仮想キャラの動きに反映させるよりも、仮想キャラの動きに意外性のある方が面白みの点で勝ると考えられる。
【００１１】
したがって、娯楽的な要素を含んだコミュニケーションツールとしてテレビ電話が利用される場合には、エンターテイメント性の高い仮想キャラを活用できることが望ましい。
【００１２】
本発明は、上記従来の要望に鑑みてなされたものであって、エンターテイメント性の高い仮想キャラを活用することのできるテレビ電話端末、並びに、仮想キャラ生成装置および仮想キャラ動作制御装置を提供することを目的としている。
【００１３】
【課題を解決するための手段】
上記目的を達成するために、本発明に係る仮想キャラ生成装置は、人の顔に基づいて仮想のキャラクター（以下「仮想キャラ」という。）を生成する仮想キャラ生成装置であって、ユーザの顔を撮影した映像から各部位の特徴点を抽出する特徴点抽出手段と、前記特徴点抽出手段によって抽出された各部位の特徴点と、前記各部位の特徴が平均化された平均顔のキャラクターの特徴点との差分を算出する特徴点差分算出手段と、前記特徴点差分算出手段によって算出された差分を前記平均顔のキャラクターに反映させて、前記ユーザの仮想キャラを生成する仮想キャラ生成手段と、を備え、前記仮想キャラ生成手段は、所定の部位に対しては、当該所定の部位の特徴を強調するよう前記特徴点差分算出手段によって算出された差分を変更した上で前記平均顔のキャラクターに反映させる。
【００１４】
このように、所定の部位の特徴を強調することで、ユーザの顔に微妙またはほどほど似せた仮想キャラを生成することができる。仮想キャラは、ユーザの顔に似すぎているよりも微妙またはほどほど似ている程度に面白みがあると考えられるため、エンターテイメント性の高い仮想キャラを活用することができる。
【００１５】
また、本発明に係る仮想キャラ動作制御装置は、人の顔に基づいて生成された仮想のキャラクター（以下「仮想キャラ」という。）に、ユーザの実際の表情または動きとは関係のない独自の個性を持たせる仮想キャラ動作制御装置であって、独自の個性を実現するプログラムを少なくとも１つ記憶したプログラム記憶装置から所望のプログラムをダウンロードして、当該ダウンロードしたプログラムを実行することで、前記仮想キャラが前記プログラムに対応する個性に準じた所定の動作を行うよう制御する。
【００１６】
このように、プログラム記憶装置から独自の個性を実現するプログラムをダウンロードして実行することで、仮想キャラにユーザの実際の表情や動きとは異なる表情または動きをさせることができる。したがって、仮想キャラの表情または動きの変化を楽しむことができるため、エンターテイメント性の高い仮想キャラを活用することができる。
【００１７】
また、本発明に係る仮想キャラ動作制御装置は、人の顔に基づいて生成された仮想のキャラクター（以下「仮想キャラ」という。）の首の動きを制御する仮想キャラ動作制御装置であって、前記仮想キャラの基となるユーザの実際の首の動きとは別に、キーワードまたは音声若しくは映像の特徴に応じて、またはランダムに、前記仮想キャラが所定の首の動きを行うよう制御する。
【００１８】
このように、ユーザの実際の動きとは異なる首の動きを仮想キャラが行う。ユーザの首の動きをそのまま仮想キャラの動きに反映させるよりも、この方が仮想キャラの動きに意外性があり面白みの点で勝ると考えられる。したがって、エンターテイメント性の高い仮想キャラを活用することができる。
【００１９】
さらに、本発明に係るテレビ電話端末は、請求項１に記載の仮想キャラ生成装置または請求項２若しくは３に記載の仮想キャラ動作制御装置を備え、ネットワークを介して他の端末と前記仮想キャラの映像および音声による通信を行う。したがって、仮想キャラを用いたコミュニケーションのエンターテイメント性を高めることができる。
【００２０】
【発明の実施の形態】
以下、本発明に係るテレビ電話端末の実施の形態について、図面を参照して説明する。
【００２１】
本実施形態のテレビ電話端末は、動画または静止画（以下、まとめて「映像」という。）を撮影可能なカメラを備えた携帯電話やＰＨＳ、ＰＤＡ等の通信端末であり、ネットワークを介して別のテレビ電話端末と映像および音声を送受信することによりテレビ電話として用いることができる。但し、テレビ電話中に端末間で送受信される映像は、カメラで撮影した映像の他、カメラで撮影したユーザの顔に基づいて生成された仮想のキャラクター（以下「仮想キャラ」という。）の映像であっても良い。本実施形態では、当該仮想キャラの映像が送受信される場合について説明する。
【００２２】
以下、仮想キャラの生成について説明する。本実施形態のテレビ電話端末は、カメラによって撮影されたユーザの顔画像からどの領域が顔かを認識する。次に、顔画像から眉や目、鼻、口といった各部位の特徴となる点（以下「特徴点」という。）を抽出する。図１は、顔画像と各特徴点を示す説明図である。顔を構成する主要な部位である眉、目、鼻、口は表情によって微妙に変化するため、これらの部位のように、表情が変化すると他の特徴点との相対位置が変わる部分が特徴点として抽出される。
【００２３】
次に、各部位の特徴点に基づいて、各部位の特徴が平均化された平均顔のキャラクターからユーザの顔に近い仮想キャラを生成する。より詳しくは、抽出した特徴点と前記平均顔のキャラクターの特徴点との差分を算出し、当該差分データを前記平均顔のキャラクターに反映させることで、ユーザの顔に近い仮想キャラを生成する。
【００２４】
そして、ユーザの顔画像における各特徴点をトラッキングして、各特徴点の動きを仮想キャラに反映させる。また、顔を形成する全ての部位が顔画像の座標軸上で同じ方向に移動すれば、顔全体が動いたとみなすことができるため、ユーザが頷いたり、首をかしげたり、頭を振ったとき、この動作を仮想キャラに反映させる。
【００２５】
以下、本実施形態のテレビ電話端末の構成についての説明を、図２を参照して行う。図２は、本実施形態のテレビ電話端末の構成を示すブロック図である。本実施形態のテレビ電話端末は、同図に示すように、カメラ１０１と、映像処理部１０３と、マイク１０５と、スピーカ１０７と、音声処理部１０９と、仮想キャラ生成部１１１と、表示部１１３と、キーボード１１５と、中央処理部１１７と、無線部１１９と、アンテナ１２１とを備えて構成されている。
【００２６】
映像処理部１０３は、カメラ１０１で撮影された映像を解析することで、映像中から顔を認識し特徴点を抽出するものである。また、音声処理部１０９は、マイク１０５から入力された自分の音声に対して所定の処理を行ったり、相手のテレビ電話端末から受け取った相手の音声データを処理してスピーカ１０７から出力するものである。なお、音声処理部１０９が行う処理には、音量や音韻、ピッチ等といった音声の特徴となる要素の解析が含まれ、当該解析は自分および相手の音声に対して行われる。
【００２７】
また、仮想キャラ生成部１１１は、映像処理部１０３によって抽出された特徴点等に基づいて仮想キャラを生成し、カメラ１０１で撮影したユーザの表情や動作を当該仮想キャラに反映させるものである。なお、仮想キャラ生成部１１１は、生成した仮想キャラを中央処理部１１７からの指示に基づいて部分的または全体的に変更することもある。また、表示部１１３は、仮想キャラ生成部１１１で生成された仮想キャラや、相手のテレビ電話端末から送られた仮想キャラ等を表示するものである。
【００２８】
また、キーボード１１５は、後述する仮想キャラの生成に関する指示等を仮想キャラ生成部１１１に行ったり、他の指示等を中央処理部１１７に行うためのものである。また、中央処理部１１７は、仮想キャラ生成部１１１で生成された仮想キャラのＭＰＥＧ圧縮をはじめとして、仮想キャラの動きと音声との同期や、映像データおよび音声データの圧縮伸長処理等を行うものである。また、無線部１１９は、映像および音声のデータの変復調等を行って、アンテナ１２１を介して信号を送受信するものである。
【００２９】
以上の説明を踏まえて、〔第１の実施形態〕、〔第２の実施形態〕、〔第３の実施形態〕の順に本発明に係るテレビ電話端末の実施の形態について詳細に説明する。
【００３０】
〔第１の実施形態〕
第１の実施形態では、仮想キャラをユーザの顔に完全に似せるのではなく、微妙またはほどほどに似せている。上述したように、仮想キャラを生成する際は、抽出した特徴点と平均顔のキャラクターの特徴点との差分を算出し、当該差分データを平均顔のキャラクターに反映させている。本実施形態では、差分データを平均顔のキャラクターにそのまま反映させるのではなく、ユーザによって予め指定された部位に対しては、その特徴を強調するよう差分データを変更した上で反映させる。
【００３１】
なお、特徴を強調する部位の指定は図２に示したキーボード１１５から中央処理部１１７に対して行われ仮想キャラ生成部１１１に指示される。そして、指定された部位の差分データの変更は仮想キャラ生成部１１１で行われる。特許請求の範囲の特徴点抽出手段は映像処理部１０３に該当し、特徴点差分算出手段および仮想キャラ生成手段は仮想キャラ生成部１１１に該当する。
【００３２】
図３は、（ａ）ユーザの顔に似せた仮想キャラおよび（ｂ）ユーザの顔に微妙またはほどほど似せた仮想キャラの一例を示す説明図である。例えば、ユーザによって目の大きさを強調するよう指示されている場合、目が全体的に大きくなるように目の特徴点の差分データを変更する。そして、目の各特徴点の変更された差分データと、眉、鼻および口の各特徴点における差分データとを平均顔のキャラクターに反映することで、図３（ｂ）に示すような、目の大きさが強調された仮想キャラを生成する。
【００３３】
以上説明したように、本実施形態によれば、予め指定された部位の特徴を強調することで、ユーザの顔に微妙またはほどほど似せた仮想キャラを生成することができる。仮想キャラは、ユーザの顔に似すぎているよりも微妙またはほどほど似ている程度に面白みがあると考えられるため、本実施形態によれば、仮想キャラを用いたコミュニケーションのエンターテイメント性を高めることができる。また、特徴の強調は指定された部位に対して行われるため、ユーザによって指定された部位だけが強調された仮想キャラを生成することができる。
【００３４】
なお、本実施形態では、顔の部位の特徴を強調しているが、ユーザが強調したくない部位については目立たないようにデフォルメしても良い。
【００３５】
〔第２の実施形態〕
第２の実施形態では、ユーザの表情や動きの変化をそのまま仮想キャラの表情または動きに反映させるのではなく、ユーザが実際には行っていない表情または動きをするといった独自の個性を仮想キャラに持たせている。本実施形態では、様々な個性を実現するプログラムが複数用意されたキャラクタサーバ（図示せず）が別に設けられている。ユーザはテレビ電話端末を用いて当該キャラクタサーバにアクセスして、所望の個性を実現するプログラムをテレビ電話端末にダウンロードする。そして、仮想キャラが当該個性を持つよう設定を行う。なお、キャラクタサーバは特許請求の範囲のプログラム記憶装置に該当する。
【００３６】
このように、キャラクタサーバから所望の個性に対応するプログラムを予めダウンロードして設定を行っておけば、次回以降のテレビ電話では、仮想キャラ生成部１１１で当該個性を実現するプログラムが実行され、仮想キャラがこの個性に準じた所定の動作を行うようになる。
【００３７】
なお、個性には、ユーザの表情または動きとは全く関係のない動作を行う個性や、顔の各部位の動きに従って所定の動作を行う個性、ユーザ（自分）または相手が発した（しゃべった）特定のキーワードに反応して所定の表情または動作を行う個性等が考えられる。例えば、１０分に一度踊りだすといった個性や、相手の仮想キャラの動作を真似る個性、会話が途切れるとタバコを吸い出す個性、歌舞伎役者のような動作をする個性、所定時間以上目をつぶると眠りだす個性、自分がはっした「なんで？」といった言葉に反応して目がクエスチョンマークになる個性等々、様々な個性が考えられる。
【００３８】
以上説明したように、本実施形態によれば、キャラクタサーバに用意されている複数の個性の中から所望の個性を選択して、仮想キャラにユーザの実際の表情や動きとは異なる表情または動きをさせることができる。したがって、仮想キャラの表情または動きの変化を楽しみながら相手と会話することができるため、仮想キャラを用いたコミュニケーションのエンターテイメント性を高めることができる。
【００３９】
〔第３の実施形態〕
第３の実施形態では、ユーザの首の動きをそのまま仮想キャラの動きに反映させるのではなく、他の要因にも従って仮想キャラの首を動かしている。なお、「首が動く」とは、頷いたり、首をかしげたり、頭を振る等といった、体に対して頭部が立体的に動く動作のことをいう。本実施形態では、自分または相手が発した（しゃべった）特定のキーワードや音声の特徴、若しくは、図２に示したカメラ１０１で撮影された映像の特徴に応じて、またはランダムに、仮想キャラが首を動かす。
【００４０】
上述したように、図２に示した音声処理部１０９は自分または相手の音声を解析しており、映像処理部１０３はカメラ１０１で撮影された映像を解析している。したがって、キーワードまたは音声の特徴に応じて仮想キャラの首を動かす場合は音声処理部１０９の解析結果、また、映像の特徴に応じて仮想キャラの首を動かす場合は映像処理部１０３の解析結果に基づいて、中央処理部１１７が、仮想キャラが所定の首の動きをするよう仮想キャラ生成部１１１に指示する。また、ランダムに仮想キャラの首を動かす場合は、中央処理部１１７が、乱数等を利用して仮想キャラが所定の首の動きをするよう仮想キャラ生成部１１１に指示する。
【００４１】
以上説明したように、本実施形態では、ユーザの首の動きとは別に、キーワードや音声の特徴、映像の特徴に基づいてまたはランダムに仮想キャラの首を動かしている。ユーザの首の動きをそのまま仮想キャラの動きに反映させるよりも、この方が仮想キャラの動きに意外性があり面白みの点で勝ると考えられる。したがって、仮想キャラを用いたコミュニケーションのエンターテイメント性を高めることができる。
【００４２】
なお、上記説明した各実施形態のテレビ電話端末が有する映像処理部１０３、音声処理部１０９、仮想キャラ生成部１１１および中央処理部１１７はプログラムを実行することによって動作するものであっても良い。
【００４３】
【発明の効果】
以上説明したように、本発明に係る仮想キャラ生成装置および仮想キャラ動作制御装置によれば、エンターテイメント性の高い仮想キャラを活用することができる。また、本発明に係るテレビ電話端末によれば、仮想キャラを用いたコミュニケーションのエンターテイメント性を高めることができる。
【図面の簡単な説明】
【図１】顔画像と各特徴点を示す説明図
【図２】本発明に係る一実施形態のテレビ電話端末の構成を示すブロック図
【図３】（ａ）ユーザの顔に似せた仮想キャラおよび（ｂ）ユーザの顔に微妙またはほどほど似せた仮想キャラの一例を示す説明図
【図４】ユーザの顔に似せた仮想キャラを示す説明図
【符号の説明】
１０１カメラ
１０３映像処理部
１０５マイク
１０７スピーカ
１０９音声処理部
１１１仮想キャラ生成部
１１３表示部
１１５キーボード
１１７中央処理部
１１９無線部
１２１アンテナ

[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a videophone terminal capable of utilizing virtual characters having high entertainment properties, a virtual character generation device, and a virtual character operation control device.
[0002]
[Prior art]
In a videophone system comprising a plurality of videophone terminals, a network, and the like, video captured by each videophone terminal is transmitted together with audio to the partner terminal via the network. This makes it possible to converse with a remote party while looking at that party's face. Being able to see the other party's face during a conversation lets the user visually confirm not only the tone of voice but also the other party's facial expression, which has the advantage of enabling more realistic, higher-level communication.
[0003]
However, some users do not like video of themselves being sent to the partner terminal as it is. In addition, some users find it uninteresting to converse while watching video captured by themselves or by the other party. For this reason, a technique has been proposed in which feature points of each part, such as the eyebrows, eyes, nose, and mouth, are extracted from an image of the user's face, a virtual character (hereinafter referred to as a “virtual character”) resembling the user's face is generated from those feature points, and video of the virtual character is transmitted to the partner terminal as the user's alter ego.
[0004]
In this technique, after first recognizing which region of an image of the user's face (hereinafter referred to as a “face image”) is the face, points that characterize each part such as the eyebrows, eyes, nose, and mouth (hereinafter referred to as “feature points”) are extracted from the face image. FIG. 1 is an explanatory diagram showing a face image and its feature points. Next, based on the feature points of each part, a virtual character that resembles the user's face is generated from an average-face character in which the features of each part are averaged. More specifically, the difference between each extracted feature point and the corresponding feature point of the average-face character is calculated, and the difference data is reflected on the average-face character to generate a virtual character that resembles the user's face. FIG. 4 is an explanatory diagram illustrating a virtual character that resembles a user's face.
[0005]
Then, each feature point in the user's face image is tracked, and the movement of each feature point is reflected on the virtual character. As a result, the movement of each part that accompanies a change in the user's facial expression is linked to the movement of the corresponding part of the virtual character, so the facial expression of the virtual character changes along with the user's facial expression. It is also possible to change the expression of the virtual character in step with the user's expression without making the virtual character resemble the user's face, by reflecting the movement of each feature point in the user's face image on a completely different character.
[0006]
Furthermore, if all of the parts forming the face move in the same direction on the coordinate axes of the face image, the entire face can be regarded as having moved. Thus, when the user nods, tilts his or her head, or shakes his or her head, this action can be reflected on the virtual character.
[0007]
[Patent Document 1]
Japanese Translation of PCT International Application Publication No. 2002-511617
[Patent Document 2]
Japanese Translation of PCT International Application Publication No. 2002-511620
[0008]
[Problems to be solved by the invention]
As described above, in the conventional technology, the virtual character is either made to resemble the user's face or made a completely different character. However, from the viewpoint of entertainment, a virtual character that subtly or moderately resembles the user's face is considered more amusing than one that resembles it too closely. On the other hand, a virtual character that is a completely different character does not call the conversation partner's face to mind, so it is desirable that the character resemble the user at least enough for the partner to be identified.
[0009]
Further, in the conventional technology, the movement of each part that accompanies a change in the user's facial expression is linked to the movement of the corresponding part of the virtual character, so the facial expression of the virtual character changes in step with the user's facial expression. However, from the viewpoint of entertainment, a virtual character whose expressions and movements contain an element of surprise is considered more amusing than one whose expression simply mirrors the changes in the user's expression.
[0010]
Further, in the conventional technology, the movement of the user's neck is reflected on the virtual character. Therefore, when the user nods, tilts his or her head, or shakes his or her head, the virtual character performs the same movement. However, from the viewpoint of entertainment, a virtual character whose movements contain an element of surprise is considered more amusing than one whose movement simply mirrors the movement of the user's neck.
[0011]
Therefore, when a videophone is used as a communication tool including an entertaining element, it is desirable to be able to utilize a virtual character having high entertainment properties.
[0012]
The present invention has been made in view of the above needs, and its object is to provide a videophone terminal, a virtual character generation device, and a virtual character operation control device capable of utilizing virtual characters having high entertainment properties.
[0013]
[Means for Solving the Problems]
In order to achieve the above object, a virtual character generation device according to the present invention generates a virtual character (hereinafter referred to as a “virtual character”) based on a human face, and comprises: feature point extraction means for extracting feature points of each part from video of a user's face; feature point difference calculation means for calculating the differences between the feature points of each part extracted by the feature point extraction means and the feature points of an average-face character in which the features of each part are averaged; and virtual character generation means for generating the user's virtual character by reflecting the differences calculated by the feature point difference calculation means on the average-face character. For a predetermined part, the virtual character generation means modifies the differences calculated by the feature point difference calculation means so as to emphasize the features of that part, and then reflects them on the average-face character.
[0014]
In this way, by emphasizing the features of the predetermined part, it is possible to generate a virtual character that subtly or moderately resembles the user's face. Because a virtual character is considered more amusing when it subtly or moderately resembles the user's face than when it resembles the face too closely, a highly entertaining virtual character can be utilized.
[0015]
Further, a virtual character operation control device according to the present invention gives a virtual character (hereinafter referred to as a “virtual character”) generated based on a human face a unique personality that is unrelated to the user's actual expressions or movements. The device downloads a desired program from a program storage device that stores at least one program for realizing a unique personality, and executes the downloaded program, thereby controlling the virtual character to perform predetermined actions in accordance with the personality corresponding to the program.
[0016]
As described above, by downloading and executing the program for realizing the unique personality from the program storage device, it is possible to cause the virtual character to have an expression or movement different from the actual expression or movement of the user. Therefore, a change in the expression or movement of the virtual character can be enjoyed, so that the virtual character having high entertainment properties can be utilized.
[0017]
Further, a virtual character operation control device according to the present invention controls the neck movement of a virtual character (hereinafter referred to as a “virtual character”) generated based on a human face. Separately from the actual neck movement of the user on whom the virtual character is based, the device controls the virtual character to perform a predetermined neck movement according to a keyword, according to a feature of audio or video, or randomly.
[0018]
In this manner, the virtual character performs a neck movement different from the actual movement of the user. This is considered to make the movement of the virtual character more surprising, and therefore more amusing, than reflecting the movement of the user's neck as it is. Therefore, a virtual character having high entertainment properties can be utilized.
[0019]
Furthermore, a videophone terminal according to the present invention includes the virtual character generation device according to claim 1 or the virtual character operation control device according to claim 2 or 3, and communicates with another terminal via a network using video of the virtual character and audio. Therefore, it is possible to enhance the entertainment property of communication using the virtual character.
[0020]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, embodiments of a videophone terminal according to the present invention will be described with reference to the drawings.
[0021]
The videophone terminal according to the present embodiment is a communication terminal such as a mobile phone, a PHS, or a PDA equipped with a camera capable of capturing a moving image or a still image (hereinafter collectively referred to as “video”). It can be used as a videophone by transmitting and receiving video and audio to and from another videophone terminal via a network. However, the video transmitted and received between terminals during a videophone call may be, in addition to video captured by the camera, video of a virtual character (hereinafter referred to as a “virtual character”) generated based on the user's face captured by the camera. In the present embodiment, a case where the video of the virtual character is transmitted and received will be described.
[0022]
Hereinafter, generation of a virtual character will be described. The videophone terminal according to the present embodiment recognizes which region of a user's face image captured by the camera is the face. Next, points that characterize each part such as the eyebrows, eyes, nose, and mouth (hereinafter referred to as “feature points”) are extracted from the face image. FIG. 1 is an explanatory diagram showing a face image and its feature points. Because the eyebrows, eyes, nose, and mouth, which are the main parts constituting a face, change subtly with facial expression, portions such as these, whose positions relative to other feature points change when the expression changes, are extracted as feature points.
[0023]
Next, based on the feature points of each part, a virtual character close to the user's face is generated from the average face character in which the features of each part are averaged. More specifically, a virtual character close to the user's face is generated by calculating a difference between the extracted feature point and the feature point of the average face character, and reflecting the difference data on the average face character.
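The difference step described in this paragraph can be illustrated with a minimal sketch. It assumes 2D feature points stored in dictionaries keyed by hypothetical landmark names (`left_eye_outer`, `mouth_left`) and illustrative coordinates; the source does not specify a data layout, so `feature_differences` and `apply_differences` are illustrative names only.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]
Landmarks = Dict[str, Point]


def feature_differences(user: Landmarks, average: Landmarks) -> Landmarks:
    """Offset of each user feature point from the matching average-face point."""
    return {
        name: (user[name][0] - ax, user[name][1] - ay)
        for name, (ax, ay) in average.items()
    }


def apply_differences(average: Landmarks, diffs: Landmarks) -> Landmarks:
    """Reflect the difference data on the average-face character."""
    return {
        name: (ax + diffs[name][0], ay + diffs[name][1])
        for name, (ax, ay) in average.items()
    }


# Hypothetical landmark names and coordinates, for illustration only.
average_face = {"left_eye_outer": (30.0, 40.0), "mouth_left": (38.0, 80.0)}
user_face = {"left_eye_outer": (28.0, 41.0), "mouth_left": (36.0, 83.0)}

diffs = feature_differences(user_face, average_face)
character = apply_differences(average_face, diffs)
```

When the differences are applied unmodified, the character's feature points coincide with the user's, which corresponds to the "close to the user's face" case.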
[0024]
Then, each feature point in the user's face image is tracked, and the movement of each feature point is reflected on the virtual character. Also, if all parts forming the face move in the same direction on the coordinate axes of the face image, the entire face can be regarded as having moved, so when the user nods, tilts his or her head, or shakes his or her head, this action is reflected on the virtual character.
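One way to read the "all parts move in the same direction" criterion is sketched below. The thresholds `min_shift` and `min_cosine`, the use of one representative point per part, and the function name `whole_face_moved` are assumptions made for illustration; the source does not state how the comparison is carried out.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def whole_face_moved(prev: Dict[str, Point], curr: Dict[str, Point],
                     min_shift: float = 1.0, min_cosine: float = 0.9) -> bool:
    """Return True when every part shifted, and all shifts point the same way."""
    shifts = [(curr[n][0] - prev[n][0], curr[n][1] - prev[n][1]) for n in prev]
    # A part that barely moved means this is an expression change, not head motion.
    if any(math.hypot(dx, dy) < min_shift for dx, dy in shifts):
        return False
    mean_x = sum(dx for dx, _ in shifts) / len(shifts)
    mean_y = sum(dy for _, dy in shifts) / len(shifts)
    mean_len = math.hypot(mean_x, mean_y)
    if mean_len == 0.0:
        return False  # shifts cancel out, so there is no common direction
    # Every shift must be closely aligned with the mean direction.
    return all(
        (dx * mean_x + dy * mean_y) / (math.hypot(dx, dy) * mean_len) >= min_cosine
        for dx, dy in shifts
    )


# One representative point per part (hypothetical coordinates).
before = {"brow": (50.0, 30.0), "eye": (50.0, 40.0),
          "nose": (50.0, 55.0), "mouth": (50.0, 70.0)}
nod = {name: (x, y + 4.0) for name, (x, y) in before.items()}
smile = dict(before, mouth=(50.0, 73.0))
```

In this sketch a nod shifts every part downward and is treated as whole-face motion, while a smile moves only the mouth and is left to the per-feature tracking.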
[0025]
Hereinafter, the configuration of the videophone terminal of the present embodiment will be described with reference to FIG. 2. FIG. 2 is a block diagram illustrating the configuration of the videophone terminal of the present embodiment. As shown in the figure, the videophone terminal of this embodiment includes a camera 101, a video processing unit 103, a microphone 105, a speaker 107, an audio processing unit 109, a virtual character generation unit 111, a display unit 113, a keyboard 115, a central processing unit 117, a wireless unit 119, and an antenna 121.
[0026]
The video processing unit 103 analyzes video captured by the camera 101 to recognize a face in the video and extract feature points. The audio processing unit 109 performs predetermined processing on the user's own voice input from the microphone 105, and processes the other party's audio data received from the other party's videophone terminal and outputs it from the speaker 107. Note that the processing performed by the audio processing unit 109 includes analysis of elements that characterize the voice, such as volume, phoneme, and pitch, and this analysis is performed on the voices of both the user and the other party.
[0027]
Further, the virtual character generation unit 111 generates a virtual character based on the feature points and the like extracted by the video processing unit 103, and reflects the user's facial expressions and actions captured by the camera 101 on the virtual character. The virtual character generation unit 111 may also change the generated virtual character partially or entirely based on an instruction from the central processing unit 117. The display unit 113 displays the virtual character generated by the virtual character generation unit 111, a virtual character transmitted from the other party's videophone terminal, and the like.
[0028]
The keyboard 115 is used to issue instructions relating to virtual character generation, described later, to the virtual character generation unit 111, and to issue other instructions to the central processing unit 117. The central processing unit 117 performs MPEG compression of the virtual character generated by the virtual character generation unit 111, synchronization of the movement of the virtual character with the audio, compression and decompression of video data and audio data, and the like. The wireless unit 119 performs modulation and demodulation of video and audio data and transmits and receives signals via the antenna 121.
[0029]
Based on the above description, the embodiment of the videophone terminal according to the present invention will be described in detail in the order of [First Embodiment], [Second Embodiment], and [Third Embodiment].
[0030]
[First Embodiment]
In the first embodiment, the virtual character is not made to resemble the user's face completely, but is made to resemble it subtly or moderately. As described above, when the virtual character is generated, the difference between each extracted feature point and the corresponding feature point of the average-face character is calculated, and the difference data is reflected on the average-face character. In the present embodiment, the difference data is not reflected on the average-face character as it is; for a part designated in advance by the user, the difference data is first changed so as to emphasize the features of that part and is then reflected.
[0031]
The part whose features are to be emphasized is designated from the keyboard 115 shown in FIG. 2 to the central processing unit 117, which passes the instruction to the virtual character generation unit 111. The difference data of the designated part is then changed by the virtual character generation unit 111. The feature point extracting means in the claims corresponds to the video processing unit 103, and the feature point difference calculating means and the virtual character generating means correspond to the virtual character generating unit 111.
[0032]
FIG. 3 is an explanatory diagram illustrating an example of (a) a virtual character that resembles a user's face and (b) a virtual character that subtly or moderately resembles a user's face. For example, when the user has instructed that the size of the eyes be emphasized, the difference data of the eye feature points is changed so that the eyes become larger as a whole. Then, by reflecting the changed difference data of each eye feature point, together with the difference data of each feature point of the eyebrows, nose, and mouth, on the average-face character, a virtual character in which the size of the eyes is emphasized is generated, as shown in FIG. 3(b).
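The eye-emphasis example can be sketched as follows. The source says only that the eye difference data is changed so that the eyes become larger as a whole; scaling the eye feature points about their centroid before taking the difference is an assumption chosen for illustration, as are the `scale` parameter, the part names, and the coordinates.

```python
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]
Face = Dict[str, List[Point]]  # part name -> feature points of that part


def scale_about_centroid(points: List[Point], scale: float) -> List[Point]:
    """Enlarge (scale > 1) or shrink a part around its own centre."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return [(cx + (x - cx) * scale, cy + (y - cy) * scale) for x, y in points]


def emphasized_differences(user: Face, average: Face,
                           emphasized: Set[str], scale: float = 1.5) -> Face:
    """Difference data, modified only for the parts the user designated."""
    diffs: Face = {}
    for part, avg_points in average.items():
        target = user[part]
        if part in emphasized:
            target = scale_about_centroid(target, scale)
        diffs[part] = [(tx - ax, ty - ay)
                       for (tx, ty), (ax, ay) in zip(target, avg_points)]
    return diffs


def build_character(average: Face, diffs: Face) -> Face:
    """Reflect the (possibly modified) difference data on the average face."""
    return {part: [(ax + dx, ay + dy)
                   for (ax, ay), (dx, dy) in zip(points, diffs[part])]
            for part, points in average.items()}


# Hypothetical two-point eye and one-point mouth, for illustration only.
average_face = {"eye": [(0.0, 0.0), (4.0, 0.0)], "mouth": [(1.0, 5.0)]}
user_face = {"eye": [(0.0, 0.0), (6.0, 0.0)], "mouth": [(1.0, 6.0)]}

diffs = emphasized_differences(user_face, average_face, {"eye"}, scale=2.0)
character = build_character(average_face, diffs)
```

Only the designated part is altered: here the eye becomes twice as wide as the user's, while the mouth is reproduced as measured. A `scale` below 1 would correspond to making a part less conspicuous.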
[0033]
As described above, according to the present embodiment, a virtual character that subtly or moderately resembles the user's face can be generated by emphasizing the features of a part designated in advance. Because a virtual character is considered more amusing when it subtly or moderately resembles the user's face than when it resembles the face too closely, the present embodiment can enhance the entertainment value of communication using the virtual character. In addition, since the emphasis is applied to the designated part, it is possible to generate a virtual character in which only the part designated by the user is emphasized.
[0034]
In the present embodiment, the features of the face parts are emphasized, but parts that the user does not want emphasized may instead be stylized so as to be inconspicuous.
[0035]
[Second embodiment]
In the second embodiment, changes in the user's expression or movement are not reflected as they are in the expression or movement of the virtual character; instead, the virtual character is given a unique personality, such as making expressions or movements that the user is not actually making. In the present embodiment, a character server (not shown) on which a plurality of programs for realizing various personalities are prepared is provided separately. The user accesses the character server using the videophone terminal and downloads a program for realizing the desired personality to the videophone terminal. Then, the virtual character is set to have that personality. The character server corresponds to the program storage device in the claims.
[0036]
As described above, if a program corresponding to a desired personality is downloaded from the character server and set up in advance, then in subsequent videophone calls the program for realizing that personality is executed by the virtual character generation unit 111, and the virtual character performs predetermined actions in accordance with the personality.
[0037]
Possible personalities include one that performs actions completely unrelated to the user's expressions or movements, one that performs a predetermined action in accordance with the movement of each part of the face, and one that responds with a predetermined expression or action to a specific keyword spoken by the user or the other party. For example, various personalities are conceivable: one that starts dancing once every 10 minutes, one that mimics the actions of the other party's virtual character, one that starts smoking a cigarette when the conversation lapses, one that behaves like a kabuki actor, one that falls asleep when the eyes are closed for a predetermined time or longer, and one whose eyes turn into question marks in response to a word such as “why?” spoken by the user.
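A personality of the kind listed above could be modelled as an ordered list of trigger rules. The sketch below is only one possible shape for such a program: `CallState`, `Rule`, `choose_action`, the action labels, and the thresholds (5 seconds of silence, 3 seconds with eyes closed) are all hypothetical and do not come from the source.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CallState:
    """Observations a personality program might receive (hypothetical fields)."""
    last_utterance: str   # most recent recognized speech
    silence_s: float      # seconds since anyone last spoke
    eyes_closed_s: float  # seconds the user's eyes have stayed closed


@dataclass
class Rule:
    condition: Callable[[CallState], bool]
    action: str


def choose_action(rules: List[Rule], state: CallState) -> Optional[str]:
    """Return the action of the first matching rule, or None to mirror the user."""
    for rule in rules:
        if rule.condition(state):
            return rule.action
    return None


# Example personality; thresholds are illustrative assumptions.
personality = [
    Rule(lambda s: "why" in s.last_utterance.lower(), "question_mark_eyes"),
    Rule(lambda s: s.eyes_closed_s >= 3.0, "fall_asleep"),
    Rule(lambda s: s.silence_s >= 5.0, "smoke_cigarette"),
]

action = choose_action(personality, CallState("Why?", 0.0, 0.0))
```

Returning `None` when no rule fires leaves room for the ordinary tracking of the user's expression; a different downloaded personality would simply supply a different rule list.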
[0038]
As described above, according to the present embodiment, a desired personality can be selected from the plurality of personalities prepared on the character server, and the virtual character can be made to show expressions or movements different from the user's actual expressions or movements. Therefore, the user can converse with the other party while enjoying the changes in the virtual character's expression or movement, which enhances the entertainment value of communication using the virtual character.
[0039]
[Third embodiment]
In the third embodiment, the movement of the user's neck is not reflected as it is in the movement of the virtual character; instead, the virtual character's neck is also moved in accordance with other factors. Note that “the neck moves” refers to an action in which the head moves three-dimensionally relative to the body, such as nodding, tilting the head, or shaking the head. In the present embodiment, the virtual character moves its neck in accordance with a specific keyword spoken by the user or the other party, a feature of the voice, or a feature of the video captured by the camera 101 shown in FIG. 2, or randomly.
[0040]
As described above, the audio processing unit 109 illustrated in FIG. 2 analyzes the voice of the user or the other party, and the video processing unit 103 analyzes the video captured by the camera 101. Therefore, the central processing unit 117 instructs the virtual character generation unit 111 to make the virtual character perform a predetermined neck movement, based on the analysis result of the audio processing unit 109 when the neck is moved according to a keyword or a voice feature, and based on the analysis result of the video processing unit 103 when the neck is moved according to a video feature. When the neck of the virtual character is moved randomly, the central processing unit 117 uses a random number or the like to instruct the virtual character generation unit 111 to make the virtual character perform a predetermined neck movement.
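The decision made by the central processing unit 117 can be sketched as a single selection function. The keyword table, the volume threshold, the `scene_changed` flag standing in for a video feature, and the `random_rate` probability are all assumptions for illustration; the source names the four trigger types but does not define concrete values.

```python
import random
from typing import Optional

NECK_MOTIONS = ("nod", "tilt", "shake")

# Hypothetical keyword table; the source does not list concrete keywords.
KEYWORD_MOTIONS = {"yes": "nod", "no": "shake", "hmm": "tilt"}


def select_neck_motion(keyword: Optional[str] = None,
                       volume: Optional[float] = None,
                       scene_changed: bool = False,
                       rng: Optional[random.Random] = None,
                       random_rate: float = 0.05) -> Optional[str]:
    """Pick a neck motion from audio/video analysis results, else at random."""
    if keyword in KEYWORD_MOTIONS:            # keyword from speech analysis
        return KEYWORD_MOTIONS[keyword]
    if volume is not None and volume > 0.8:   # voice feature (assumed threshold)
        return "shake"
    if scene_changed:                         # video feature (assumed flag)
        return "tilt"
    rng = rng or random.Random()
    if rng.random() < random_rate:            # occasional random motion
        return rng.choice(NECK_MOTIONS)
    return None                               # no instruction this frame


motion = select_neck_motion(keyword="yes")
```

The returned label would then be passed to the virtual character generation unit 111 as the instruction; returning `None` means no extra neck movement is requested for that frame.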
[0041]
As described above, in this embodiment, the neck of the virtual character is moved independently of the movement of the user's neck, either on the basis of keywords, voice features, or video features, or at random. Such movement is considered more surprising and interesting than movement that simply mirrors the user's neck. It is therefore possible to enhance the entertainment value of communication using the virtual character.
[0042]
The video processing unit 103, the audio processing unit 109, the virtual character generation unit 111, and the central processing unit 117 included in the videophone terminal according to each of the embodiments described above may operate by executing a program.
[0043]
[Effect of the invention]
As described above, the virtual character generation device and the virtual character operation control device according to the present invention make it possible to utilize highly entertaining virtual characters. Further, the videophone terminal of the present invention makes it possible to enhance the entertainment value of communication using virtual characters.
[Brief description of the drawings]
FIG. 1 is an explanatory diagram showing a face image and each feature point.
FIG. 2 is a block diagram showing a configuration of a videophone terminal according to an embodiment of the present invention.
FIG. 3 includes, in part (b), an explanatory diagram showing an example of a virtual character subtly or moderately similar to the user's face.
FIG. 4 is an explanatory diagram showing a virtual character similar to the user's face.
[Explanation of reference numerals]
101 Camera
103 Video processing unit
105 Microphone
107 Speaker
109 Audio processing unit
111 Virtual character generation unit
113 Display unit
115 Keyboard
117 Central processing unit
119 Radio unit
121 Antenna

Claims

A virtual character generation device that generates a virtual character based on a human face, comprising:
feature point extracting means for extracting feature points of each part from a captured image of a user's face;
feature point difference calculating means for calculating a difference between the feature points of each part extracted by the feature point extracting means and the feature points of an average-face character in which the features of each part are averaged; and
virtual character generating means for generating the virtual character of the user by reflecting the difference calculated by the feature point difference calculating means on the average-face character,
wherein the virtual character generating means, for a predetermined part, changes the difference calculated by the feature point difference calculating means so as to emphasize the features of the predetermined part, and reflects the changed difference on the average-face character.
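The difference calculation and emphasis recited above can be sketched as follows. This is a minimal illustration, assuming that feature points are 2D coordinates grouped by part and that "emphasizing" a part means multiplying its difference from the average face by a gain greater than 1. The names `build_character` and `gains`, the data layout, and the linear scaling are assumptions made for illustration and are not recited in this document.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
FeaturePoints = Dict[str, List[Point]]  # part name -> feature points (assumed layout)


def build_character(user: FeaturePoints,
                    average: FeaturePoints,
                    gains: Dict[str, float]) -> FeaturePoints:
    """Reflect the user's differences from the average face on that face.

    For each part, difference = user point - average point. Parts listed
    in `gains` have their difference scaled (assumed emphasis rule);
    all other parts use the difference unchanged (gain 1.0).
    """
    character: FeaturePoints = {}
    for part, avg_points in average.items():
        gain = gains.get(part, 1.0)
        character[part] = [
            (ax + gain * (ux - ax), ay + gain * (uy - ay))
            for (ux, uy), (ax, ay) in zip(user[part], avg_points)
        ]
    return character


# Usage: emphasize only the eyes (hypothetical coordinates).
average_face = {"eye": [(0.0, 0.0)], "mouth": [(0.0, 10.0)]}
user_face = {"eye": [(2.0, 1.0)], "mouth": [(1.0, 12.0)]}
result = build_character(user_face, average_face, gains={"eye": 2.0})
```

With these hypothetical values, the eye point moves twice as far from the average face as the user's own eye point, while the mouth point coincides with the user's.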

A virtual character operation control device that gives a virtual character generated based on a human face a unique personality unrelated to the actual expression or movement of the user,
wherein a desired program is downloaded from a program storage device storing at least one program for realizing a unique personality, and the downloaded program is executed to cause the virtual character to perform a predetermined operation according to the personality corresponding to the program.

A virtual character operation control device for controlling the movement of the neck of a virtual character generated based on a human face, wherein the virtual character is made to perform a predetermined neck movement in accordance with a keyword or a feature of audio or video, or at random, separately from the movement of the user's neck.

A videophone terminal comprising the virtual character generation device according to claim 1 or the virtual character operation control device according to claim 2 or 3, wherein the terminal communicates with another terminal via a network using video of the virtual character and audio.