JP2004178036A

JP2004178036A - Device for presenting virtual space accompanied by remote person's picture

Info

Publication number: JP2004178036A
Application number: JP2002340275A
Authority: JP
Inventors: Chigiri Utsugi; 契宇都木; Toshio Moriya; 俊夫守屋
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2002-11-25
Filing date: 2002-11-25
Publication date: 2004-06-24

Abstract

PROBLEM TO BE SOLVED: To dynamically select, from pictures of a transmission user photographed from two or more angles, the picture that matches the positional relationship between the transmission user and a reception user in a virtual space, to adjust the picture size and the frame rate, and then to transfer the picture over a network, so that picture synthesis is performed while suppressing the load imposed on the network.

SOLUTION: A photographing system 101 that photographs with a plurality of cameras and a display system 102 that displays a picture synthesized with the virtual space as a background picture are connected via a network 103. An avatar character can be viewed whose angle and resolution vary in conformity with its position in the virtual space.

COPYRIGHT: (C)2004,JPO

Description

【０００１】
【発明の属する技術分野】
動画をリアルタイムに転送することにより、コミュニケーションや協調作業を行う撮影・表示装置に関する。
【０００２】
【従来の技術】
利用者の画像を撮影してリアルタイムに転送し、その映像を視聴することにより、遠隔地においてもその場に居合わせているようなコミュニケーションを図る装置が実用に供されている。その最も典型的なものは、テレビ会議システムと呼ばれており、遠隔地における送信者の姿をカメラで撮影し、受信側の表示システムに、音声と共にリアルタイムで送信するものである。画像を伴うことにより、電話をはじめとする音声のみで行われる通信システムよりも、より自然で情報量の多いコミュニケーションを図ることが出来る。
【０００３】
また、この画像の人物が写っている部分を背景が写っている部分から分離して、現実の背景映像の代わりに、計算機上で構築した仮想空間の映像と人物映像とを合成することによって、遠隔映像の転送システムに、より一層の付加価値を与えるための研究が各地で行われている。これらの研究はビデオアバタと呼ばれ、多くの実施例が存在する。
【０００４】
しかし、利用者を撮影するカメラが一台である場合には、一方の方向の映像しか得ることが出来ないため、視点位置や利用者の立ち位置が変化した場合に対応するような映像の変化を得ることが出来ない。このため、利用者（送信側）の映像が常に受信システム上で正面を向いた映像になるなどの場合が現状においては多々ある。たとえば特許文献１では、このような問題を考慮したビデオアバタの改良例に関するものである。顔部分の映像を複数方向から撮影し、これらを３次元情報に変換したもの、あるいは全方向からの映像を全て送信し、体を表現する３次元形状モデルと合成することによって仮想空間内の任意の位置からの映像を得ることが出来るようにする。また、映像の転送は情報量が多く、通信ネットワークを圧迫するため、顔部分の映像に絞ることで映像転送量を削減するものである。しかし、顔などの比較的単純で一定である構造については三次元情報の再現も容易であるが、手を含めた上半身などの、複雑な形状の三次元情報を撮影される画像の情報のみから再現することは難しい。また利用者の映像を三次元化されたモデルに映像を貼り付けることで得られる簡易三次元形状のモデルは、計算精度の問題から形状が単純なものとなり、現実の形状と比較すると間違った三次元かがなされた形状部分が出来てしまうことも多いため、不自然な印象を与えることがある。
【０００５】
【特許文献１】
特許出願平１０−２５７７０２号公報
【０００６】
【発明が解決しようとする課題】
映像受信者と映像送信者が仮想空間内でどのような位置関係に存在するかによって、仮想空間の表示装置が必要とする画像の情報が異なる。例えば、仮想空間内で、受信者と送信者が離れた距離にいる場合には、送信者の全身の映像が小さく表示される。逆に受信者と送信者が非常に近い距離にいる場合には、送信者の上半身、特に顔部分のみが大きく表示される。また、受信者が送信者の正面に対してどのような角度に存在するかに応じて、見える映像が異なるならば自然な印象を与えることが出来る。例えば、受信者が送信者の横側にいる場合、受信者からみた送信者の映像は側面の映像のみとなるべきである。しかし、双方の位置関係が明確に把握できない状態でこれらの映像を再現しようとする場合、これらの映像情報はある固定された角度からのものとなり、また画像の品質も常に一定のものが送られることになる。
【０００７】
また、映像を転送するための回線の転送容量には限りがあるため、多くの要望を同時に満たした映像を転送することは困難なものとなる。例えば、滑らかな動きと大きな映像イメージを共に満たす動画を送ることは難しい。滑らかな動きを達成するためには、映像イメージは小さいものとなり、映像イメージを大きくするとフレームレートの少ない映像となってしまう。また、各方向から見た映像の全てを転送すると、回線の転送容量を大幅に圧迫することになる。
【０００８】
【課題を発明するための手段】
本発明では、受信および表示システムは、仮想空間内での送信者と受信者の位置関係を元に必要な映像の条件を判定する。この情報は撮影システムに転送され、撮影システムは複数のカメラによって撮影されたうちから選択された角度の映像を間引き処理やトリミング処理などの手法によって、時間的、空間的に適度なサイズの画像に改変し、表示システムに転送する。このことによって、必要とされる映像を少ない転送容量で送信することができる。
【０００９】
【発明の実施の形態】
図１は、当発明の一実施例を図示したものである。本発明は、大きくわけて撮影システム１０１、表示システム１０２と、その間を接続する通信ネットワーク設備１０３によって構築される。
【００１０】
図２は撮影システム１０１の詳細構成例である。複数台のカメラ機器２０１〜２０５が、送信利用者２００の周囲を取り巻くように設置されている。また、利用者２００の周囲にはクロマキー処理用に同一色の背景布２０７が設置されている。また、送信利用者２００が操作する入力機器２０９と送信利用者２００の下半身部分も、映像情報として取得を行われることがないように同一の色の布２０９によって包まれているものとする。これらのカメラの映像情報は、情報処理装置２１０に転送される。各カメラ２０１〜２０５にはそれぞれ固有のＩＤ番号が振られており、情報処理装置２１０ではこのＩＤ番号をもとに取り込む映像を選択することが出来る。情報処理装置は、これらのカメラの映像を受信する機能と、ネットワーク１０３の情報を送受信する機能、またそれらの情報を計算処理するための機能を有する。
【００１１】
図３は情報処理装置２１０の構成例を表したブロック図を挙げる。ＣＰＵ（中央演算処理装置）３０１は、主記憶装置３０２に記録されているプログラムに従い各種の処理を実行する。主記憶装置３０２と外部記憶装置３０４には制御処理を実行するにあたって必要となるプログラムやデータが記憶される。外部記憶装置３０４にはハードディスクドライブやその他の既存の大容量メディアが使用されるものとする。入出力インタフェース部３０３では、カメラ装置２０１〜２０５とネットワークへの入出力データをやり取りするために必要なデータの転送機構５０７に接続されている。出力機器３０５は、現在の処理状況を送信利用者に伝えるための出力機器である。また、入力機器２０９は、送信利用者がいくつかの命令を伝えることの出来る入力機器である。本実施例において入力機器２０９は足によって操作することの出来る入力機器であるものとする。
【００１２】
図４は表示システム部分の構成例である。映像表示装置４０１が、受信利用者４００の前に置かれる。この装置には、情報処理装置４０２が出力する映像が表示される。受信利用者４００は入力機器４０３を介して、情報処理装置４０２に命令を与えることが出来る。この情報処理装置４０２は、３次元形状データや２次元映像データ、２次元動画像データを用いて各種映像を作成する機能と、ネットワークの情報を送受信する機能、またそれらの情報のための計算処理を行うための機能を有する。
【００１３】
図５は情報処理装置４０２の構成例を表したブロック図を挙げる。ＣＰＵ（中央演算処理装置）５０１は、主記憶装置５０２に記録されているプログラムに従い各種の処理を実行する。主記憶装置５０２と外部記憶装置５０４には制御処理を実行するにあたって必要となるプログラムやデータが記憶される。外部記憶装置５０４にはハードディスクドライブやその他の既存の大容量メディアが使用されるものとする。入出力インタフェース部５０３では、ネットワーク１０３への入出力データをやり取りするために必要なデータの転送機器５０７や、映像表示装置４０１に映像を出力するための処理能力を備えているものとする。
【００１４】
図６は情報処理装置４０２並びに情報処理装置２１０の内部データによって構成される仮想空間情報の模式図を挙げる。
【００１５】
背景となる仮想空間６０１は、表示システムの保持する背景オブジェクト６０２、６０３の３次元形状データなどよって構成されている。この仮想空間内に、送信利用者２００、受信利用者４００の行動を表すキャラクタ６１０、６２０が配置されるものとする。
【００１６】
受信利用者のキャラクタが存在する仮想空間内の位置６２１と、そのキャラクタが向いている方向６２２が定義され、受信利用者の操作に従って制御される。
【００１７】
また、この仮想空間の中に送信利用者を模したキャラクタ６０３も構成され、背景空間６０１の中に配置される。この送信利用者のキャラクタが存在する仮想空間内の位置６１１と、そのキャラクタが向いている方向６１２も、送信利用者の操作に従って制御される。
【００１８】
この送信利用者のキャラクタ６１０を構成する方法を表したものが図７である。
【００１９】
キャラクタの足部分７０３は３次元形状データによって構成される。この形状データはアニメーション情報を保有し、スクリプトデータを受け取って形状を変形することが出来る。図８はアクションをあらわすアニメーションデータの構成例である。８０１に記載された数値は動作パターンをあらわすＩＤデータである。スクリプトデータでは、このＩＤデータが転送されるものとする。スクリプトデータを実行するためには、保有している一連のアニメーションデータの中から、スクリプトのＩＤ番号と、データ８０１のアニメーションが一致するものを検索する。アニメーションデータのロードは、ポリゴンモデルの各関節データの呼び出しとして行われる。アニメーションデータ８００内の８０２には基本フレームの数が記載される。また８０３にはフレームどうしの時間間隔データが記載される。８０４〜には、各関節の回転情報データが記載される。アニメーション開始時刻から表示時刻までの時間を算出し、関節の回転情報データをよみだしてポリゴンを変形させることでアニメーション動作を行う。
【００２０】
キャラクタの上半身部分７０２は、ポリゴン上に動画のあるフレーム情報を表示することで示される。この動画は送信者２００の映像を撮影した動画像の一部分７１２からクロマキー色部分を取り除いたものによって得られる。この送信者の下半身部分はカバー２０８によって囲まれているため、上記画像には取り込まれない。動画が表示されるポリゴンは送信キャラクタの仮想空間上の位置から、受信者の仮想空間上の位置に常に表面を向けるように回転する。このように向きを変化させることで映像の正面を利用者に提示するためのポリゴンを、以下ではビルボードと呼称する。
【００２１】
キャラクタの顔部分７０１は、上半身部分と同様にビルボードによって構成される。このビルボードに表示される動画は送信者２００の映像を撮影した動画像の一部分７１１からクロマキー部分を取り除いたものによって得られる。このビルボードに描かれる際の画像解像度は、キャラクタの顔部分７０１とキャラクタの上半身部分７０２とで異なり、通常、顔部分の映像７０１の方が画素の密度が高い。
【００２２】
以下、６１０のような利用者の行動を反映する仮想空間内のキャラクタをアバタと呼称する。
表示システム２１０の内部データとして構築された仮想空間６００において、位置６１１（Ｓ＿ｐ）に送信者のアバタが存在している。また、この送信者のアバタ６１０が向いている方向６１２をＳ＿ｄとする。また仮想空間６００内の別の位置６２１（Ｏ＿ｐ）に受信者４００の仮想的な視点位置が存在している。仮想空間内で受信者の向いている方向６２２がＯ＿ｄであるものとする。これらの値はベクトル値で表現されるものとする。
【００２３】
汎用ネットワーク１０３は、表示システムが互いに転送するデータを伝達する機能を有するものとする。また、各システムが、ある特定の信号をネットワークに入力することによって、現在のネットワークで転送に利用できる転送容量の情報を得ることが出来る機能を有するものとする。
【００２４】
本実施例において表示システムから撮影システムに転送されるデータ構造体を図９に挙げる。また、撮影システムから表示システムに転送されるデータ構造体を図１０に挙げる。これらの構造体の内部データの意味に関しては、以下に述べる処理作業の解説に伴って説明を加える。
【００２５】
以下では、上記の各装置が動作する処理の流れを順序だてて解説する。図１１と図１２は一連の動作順序をブロック図として表現したものである。
【００２６】
図１１は、撮影システムから転送するデータの設定情報を定める処理を記述したものである。また、図１２は、撮影システムから、表示システムにデータを転送する処理を記述したものである。どちらの処理も、情報処理装置２１０，４０２がＣＰＵ３０１，５０１内に保有しているタイマーの機能を利用して、規定された時間毎に呼び出され、実行される処理である。
【００２７】
図１１の一連の処理を、以下では、情報取得スレッドと呼ぶ。この処理はあらかじめプログラム内で定められた設定値に従い、秒１回から数回程度の頻度で行われる。この情報取得スレッドは、情報処理４０２に置かれたプログラムと情報処理装置２１０にその作業内容が記載されており、情報処理４０２のタイマーの機能によって作業が開始される。
【００２８】
図１２の一連の処理を、以下では、画像転送スレッドと呼ぶ。この処理は秒数回から数十回の割合で呼び出される処理であり、呼び出される頻度は情報取得スレッドによって設定される。この画像転送スレッドは、情報処理２１０に置かれたプログラムと情報処理装置４０２にその作業内容が記載されており、情報処理２１０のタイマーの機能によって作業が開始される。
【００２９】
まず、図１１に従って情報取得スレッドの一連の動作を記述する。
表示利用者４００のアバタ６２０位置に対する撮影利用者２００のアバタ６１０の相対位置を位置ベクトル６１１（Ｓ＿ｐ）から位置ベクトル６２１（Ｏ＿ｐ）を引くことにより求める（工程１１０１）。
この向きと受信者のアバタ６２０の向いている方向６２２との角度差を下式（数１）に基づいて計算する（工程１１０２）。
【００３０】
【数１】
ｃｏｓθ＝（Ｓ＿ｐ−Ｏ＿ｐ）・Ｏ＿ｄ／（｜Ｓ＿ｐ−Ｏ＿ｐ｜｜Ｏ＿ｄ｜）
アバタ６１０が表示環境の範囲内にいるかを判定するために、この値を映像表示機器４０２の画角Ｗ＿ｈと比較する（工程１１０３）。
ｃｏｓθ＜＝ｃｏｓ（Ｗ＿ｈ）ならば、工程１１０４に進む。
もしｃｏｓθ＜ｃｏｓＷ＿ｈならば、表示範囲外であるため、データ転送の必要が無い。この場合、データ構造体の距離情報に０を書きこみ（工程１１０６）、工程１１０７に進む。
工程１１０４では、送信者のアバタが受信者に向けている向きφを計算する。上方向ベクトルの周りに（Ｏ＿ｐ−Ｓ＿ｐ）を−Ｓ＿ｄだけ回転させたベクトルをＯ＿ｐ１とする。このベクトルは、送信者のアバタから見たときの受信者の相対位置を表現するものとなる。Ｏ＿ｐ１を曲座標表現に変換したものをｒφとする。φは原点から見たＯ＿ｐ１の向きを表現する単位長のベクトルであり、ｒは原点からＯ＿ｐ１までの距離（すなわちＯ＿ｐからＳ＿ｐまでの距離）をあらわす値である。
【００３１】
データ構造体１０００の送信情報１００１にｒを書き込み、また１００２にφを書き込み、工程に進む。（工程１１０５）
表示システムは撮影システム１０１に向けてデータ構造体１０００を送信する。（工程１１０７）
撮影システムは表示システムからのデータ構造体１０００を受け取る。（工程１１０８）データ構造体の角度情報１００２を読み取り、該当する角度に最も近い位置にあるカメラを選択する（工程１１０９）。このカメラＩＤと角度の関係は後述の図１４に記載されている。
工程１１１０では、データ構造体の距離情報１００１から必要な解像度を算出する。必要な解像度は撮影システム側に定義されたテーブルを参照することで得られる。このテーブルの構成例を表現したものが図１３のテーブル１３００である。
【００３２】
このテーブルでは、顔画像の転送に必要な各種のデータが、距離値に関連付けられて定義されている。
【００３３】
テーブルに記載された距離情報（列１３０１）を順に読み出していき、データ構造体１０００のデータ１００１から読みこまれた距離の値に最も近いものが見つかった場合、その距離値に関連付けられた情報を顔画像用の解像度情報として読み取る。このデータは列１３０２，１３０３に記述された二つの整数値Ｄ１，Ｄ２から構成されており、画像の解像度を変化させるためのデータとして後述の工程で用いられる。また、１フレームあたりに必要な顔画像のデータサイズを列１３０４から読み取る。これらの各値を主記憶上に記録し、工程１１１１に進む。
【００３４】
また、同様のテーブルが上半身の解像度情報を取得するために別途用意されており、これらの各データも同様に読み取る。
【００３５】
工程１１１１では、ここまでの工程で得られた解像度情報から、送信のフレームレートを算出する。ネットワーク１０３に命令を送信し、現在の回線で利用可能な一秒あたりのデータ量を受信する。一秒あたりの転送可能データ量を、顔画像用映像の転送データ転送容量と、上半身画像用の転送データ転送容量とに振り分ける。この比率値には、システム固定値として与えられた合計して１となる正の小数値が用いられる。また送信者２００が入力機器３０５で数値を入力することによって、この比率を変動させることが出来る。
【００３６】
顔画像のデータ転送容量を、工程１１１０フレームに必要なデータサイズで割ったものを顔画像のフレームレートとする。ただし、送信機構処理可能なフレームレートを超えている場合、その上限値を顔画像のフレームレートとする。また、距離情報１００１に０が記載されていた場合には、顔画像のフレームレートには０が設定される。
【００３７】
同様に、上半身画像のデータ転送容量を１フレームの上半身画像に必要なデータサイズで割ったものを上半身画像のフレームレートとする。ただし、送信機構処理可能なフレームレートを超えている場合、その上限値を上半身画像のフレームレートとし、距離情報１００１に０が記載されていた場合には、上半身画像のフレームレートには０が設定される。
顔画像のフレームレート、上半身画像のフレームレートから画像転送スレッドの呼び出し間隔をミリ秒単位で求める。この値ごとに情報処理装置２１０のタイマー機能が画像転送スレッドを呼び出すように設定する（工程１１１２）。
【００３８】
以上の工程により、送信システムからデータを転送するための設定情報が定められる。
以上のデータ受信処理のあと、次のデータ１０００が表示システム１０２からやってくるまでの間、撮影システム１０１はこれらの設定された値を用いて画像転送スレッドの処理を繰り返す。
【００３９】
続いて、画像転送スレッドの動作を記述する。
【００４０】
画像転送スレッドは、工程１１１２で設定された情報処理装置２１０のタイマー機能によって起動される。以下、顔画像転送のスレッドが起動した場合を対象にして一連の動作を記述する。
【００４１】
工程１２０１では、入力機器２０９を用いて送信者がインプットする入力情報を情報処理装置２１０に取り込む。この情報はアバタキャラクタの移動や足部分を動かすスクリプト命令として用いられる。工程１２０２では、工程１２０１で取り込んだ情報を転送用のデータ構造体９００のスクリプト記述領域９０２に書き込む。またキャラクタの移動情報を９０３に書き込む。
工程１２０３〜１２０４では、送信利用者の画像（顔画像）を作成する。
【００４２】
情報処理装置２１０にはプログラムの一部として、図１４に示すように、画像を取り込む領域を定義したテーブル１４００が定義されている。このテーブルの情報は、カメラのＩＤ番号Ｃ＿ＩＤ（列１４０１）、回転角度Ｒ（列１４０６）と４つの数値ｗ０，ｗ１，ｈ０，ｈ１（列１４０２〜１４０５）によって構成されている。
工程１１０９で選択されたカメラのＩＤを、列１４０１から検索する。この値に結び付けられた領域の情報をテーブル１４００の列１４０２〜１４０５から読み出す（工程１２０３）。
工程１２０４では、カメラ番号Ｃ＿ＩＤに相当するカメラの画像を読み取る。この画像は電子データとして情報処理装置２１０へと転送される。
【００４３】
工程１２０４で撮影された画像のうち（ｗ０，ｈ０）−（ｗ１，ｈ１）の範囲の映像が、転送映像として送信される領域である。この範囲は図７の撮影領域範囲７１１または７１２に相当する。図１５の１５０１は、この（ｗ０，ｈ０）−（ｗ１，ｈ１）でしめされる範囲の画像を表現したものである。この画像１５０１は画素１５１０によって構成されている。画素１５１０は、赤値１５１１、緑値１５１２、青値１５１３を示す３組の数と、透明度を示す数１５１４によって構成されている。この領域の映像の解像度を、距離情報１００１から算出された解像度に落として、映像を転送用データ構造体に書き込む。このために、ｈ０ライン番目の水平ラインについて、ｗ０からｗ１までの画像を図１５に示されるように画素を、工程１１１０で設定されたＤ１，Ｄ２の情報を用いて間引きしながら、データ構造体の画像領域に書き込む（工程１２０５）。ただし、図１５の画像１５０１のうち、黒で塗りつぶされた領域であり転送されない。転送される領域はそれ以外の領域であり、間引き後の映像は、画像１５０２で示される映像になる。また、この転送のさい、赤値１５１１、緑値１５１２、青値１５１３の組み合わせによって示される色情報が、クロマキー色の色情報と一致していた場合には、透明度の情報１５１４には透明をあらわす値を書き込む。それ以外の時には不透明を表す値を書き込む。
【００４４】
工程１２０６では、データ構造体９００のデータ領域９０１に画像のＩＤを書き込む。このＩＤは身体部分、顔、上半身のいずれかを現す固定値（ＩＤ＿ＦＡＣＥ，ＩＤ＿ＵＰＰＥＲ）のうちどちらかである。また、データ領域９０４に画像１５０２の横幅、データ領域９０５に画像１５０２の縦幅の値を書き込む。
【００４５】
以上の工程１２０１〜１２０６で作成された転送用データ構造体９００の情報を送信する（工程１２０７）。
【００４６】
表示システムは、転送用データ構造体９００を受け取る（工程１２０８）。
【００４７】
工程１２０９では、キャラクタ移動データ９０３に基づいて、アバタの仮想空間内の位置データを移動させる。また、移動に伴うアニメーション処理をロードする。
【００４８】
工程１２１０では、アバタの下半身を構成する３次元形状モデル７０３に、行動情報領域のスクリプト９０２に関連付けられたアニメーション処理を行わせる。
【００４９】
既になんらかのアニメーション処理がロードされている場合や、移動のための歩行アニメーションがロードされている場合、これらのアニメーションのブレンド処理を行う。ただしアニメーションの設定とブレンド処理に関しては既存の手法を用いるものとする（（工程１２１１）。
【００５０】
工程１２１２では画像データを読み込む。ヘッダ領域のデータ９０１を確認し、ＩＤ＿ＵＰＰＥＲであった場合には、上半身画像、ＩＤ＿ＦＡＣＥであった場合には、顔画像が送られてきたものと判断し、画像領域９０６の画像データを、ポリゴン１０１、またはポリゴン１０２のテクスチャ領域上にコピーする。
【００５１】
この処理によりテクスチャ更新がされなかったポリゴンに関しては、前回と同じテクスチャを用いてレンダリングが行われる。
【００５２】
工程１２１３では、利用者４００からの入力情報読み取りを行う。
【００５３】
表示システム４０１は、受信利用者４００が入力機器４０３に入力した情報をインタフェース５０３を通して取得し、その入力情報を反映して、仮想空間内での受信者の視点位置６２１および角度６２２を変更する。
【００５４】
工程１２１４では、画面の表示を行う。背景映像とアバタモデルに対してレンダリング処理を行い、映像を作成し、映像表示装置４０１に表示する。また、このレンダリング処理の際には、画像データに記載された透明度の情報を用いてレンダリングを行うことで、人物の背景情報が正しく表示される。
【００５５】
以上で、画像転送スレッドの一連の動作が終了する。
【００５６】
呼び出された画像転送スレッドが上半身画像用のスレッドであった場合には、テーブル１３００と、テーブル１４００には上半身画像用のものが使用され、また工程１２１２ではポリゴン１０２のテクスチャ領域に画像が転送される。それ以外の点に関しては、上記で解説を行った顔画像の転送手段と同一の工程が行われる。
【００５７】
また前述の実施例において、３次元形状モデルを利用して表示したアバタの足部分１０３の代わりに、あらかじめ撮影ずみの映像の動画像で置き換えた場合の実施例を解説する。
このときのアバタキャラクタの構成は図１６の１６００のようになる。顔部分１６０１、上半身部分１６０２、下半身部分１６０３はそれぞれビルボードによって作成されている。
【００５８】
静止、歩行、走行、着席などの各アクションにおける下半身部分の映像を複数の方向から撮影した動画が、あらかじめ記憶装置５０４に保存されている。この映像において足が写っている部分は各画像で同じ大きさに見える様にスタビライズ処理が行われ、背景の切り抜き処理が行われている。また、各動画の開始部分と終了部分の映像に対してはモーフィングと呼ばれる画像処理が行われている。この動画処理により、各動画を連続して再生した際に、つながった映像として表示されるようになっている。
【００５９】
また、図１７に示されるテーブル１７００が保存され、これらの動画ファイル１６２０と、アクション内容、撮影した角度とを対応させるための情報が定義されている。キャラクタの足部分１６０３はビルボードデータによって構成されている。
前述の実施例において、工程１２０９〜１２１１で行われた３次元モデルのアニメーション処理の変わりに、以下の処理が行われる。
【００６０】
まず、構造体９００に登録されているデータ９０２に記述されている値を読み取る。この値をアクション値として保存する。
【００６１】
また、工程１１０４で定められた角度の値を参照する。
【００６２】
テーブル１７００を検索し、列１７０１に書き込まれたアクションＩＤデータが現在のアクション値９０２と同一なものの内で、参照する角度と列１７０２の角度情報が最も近いデータのファイル名を、列１７０３から読み出す。
【００６３】
ビルボード１６０３にはローカルデータとして保存された動画像１６１０〜１６１３のなかから、選択された映像が表示される。
【００６４】
各情報が変化して新たに別の動画が選択されるまでの期間、この動画映像が繰り返し表示される。
【００６５】
また前述の各実施例において、撮影システムと表示システムを互いに組み合わせることにより、双方向で映像を送りあうシステムを構成することが出来る。
【００６６】
【発明の効果】
本発明によれば、仮想空間上の位置に合わせて角度や解像度を変化させるアバタキャラクタを視聴することが出来る。
【００６７】
また、第一の実施例においてはポリゴンを、第二の実施例においてはローカル情報として蓄えられた動画映像を、アバタキャラクタの表示の一部に用いることにより、画像の転送に伴って発生するネットワーク負荷の量を削減することが出来る。
【００６８】
また、フレームレートや解像度の操作により、このシステムが動作する際にネットワークに与える負荷の量を制御することが出来る。
【図面の簡単な説明】
【図１】本発明の第一の実施例におけるシステム全体の構成
【図２】送信システム構成
【図３】送信システム用情報処理装置構成
【図４】受信システム構成
【図５】送信システム用情報処理装置構成
【図６】仮想空間構成模式図
【図７】アバタ構成模式図
【図８】アニメーションデータの構成例
【図９】画像転送スレッドでの通信用データ構造体模式図
【図１０】情報取得スレッドでの通信用データ構造体模式図
【図１１】情報取得スレッドの動作内容
【図１２】画像転送スレッドの動作内容
【図１３】アバタ間の距離情報と、解像度の間引き情報、データサイズの関係をしめすテーブル
【図１４】各カメラと、そのトリミング領域、角度の情報を示すデータ
【図１５】撮影画像から転送用画像を作成する模式図
【図１６】実施例２におけるアバタ構成模式図
【図１７】各アクションと表示角度から動画ファイル名を検索するテーブルの模式図
【符号の説明】
１０１撮影システム、１０２表示システム、１０３ネットワーク、２００画像送信側利用者、２０１カメラ、２０２カメラ、２０３カメラ、２０４カメラ、２０５カメラ、２０７背景用クロマキー布、２０８下半身部分隠蔽用クロマキー布、２０９画像送信利用者用入力装置、２１０送信システム用情報処理装置、３０１中央演算ユニット（ＣＰＵ）、３０２主記憶装置、３０３入出力インタフェース装置、３０４外部記憶装置、３０５画像送信利用者用出力装置、３０７ネットワークインタフェース装置、４００画像受信利用者、４０１映像表示装置、４０２受信システム用情報処理装置、４０３画像受信利用者用入力装置、５０１中央演算ユニット（ＣＰＵ）、５０２主記憶装置
５０３入出力インタフェース装置、５０４外部記憶装置、５０５画像受信利用者用出力装置、５０７ネットワークインタフェース装置、６０１仮想空間、６０２背景オブジェクト、６０３背景オブジェクト、６１０送信利用者アバタ、６１１送信利用者アバタ位置、６１２送信利用者アバタ向き、６２０受信利用者アバタ、６２１受信利用者アバタ位置、６２２受信利用者アバタ向き、７０１顔表示用ビルボード、７０２上半身表示用ビルボード、７０３下半身表示用ポリゴンデータ
７１１顔画像撮影領域、７１２上半身撮影領域、８００アニメーションデータ、８０１動作パターンＩＤ、８０２基本フレームの個数、８０３基本フレーム間の時間間隔データ、８０４関節の回転情報データ、９００データ構造体、９０１画像種類のＩＤ番号、９０２アバタのアニメーション動作を記述するスクリプト情報の記述領域、９０３アバタの移動情報を記述する領域、９０４転送画像の横長さ情報、９０５転送画像の縦長さ情報、９０６転送される画像データの領域
１０００データ構造体、１００１アバタの距離情報、１００２アバタの角度情報、１１０１アバタ間の相対位置の算定、１１０２相対角度の算定、１１０３表示位置の視界内にアバタが入っているかの判定、１１０４送信者をあらわすアバタの向けている向きと距離の計算、１１０５データ構造体１０００への書き込み、１１０６アバタが視界内に入っていない場合の処理、１１０７データ構造体の送信、１１０８データ構造体の受信、１１０９
必要な角度を写しているカメラの選択
１１１０距離情報から解像度を算出、１１１１フレームレートを算出、１１１２画像転送スレッドをタイマーに設定、１２０１入力機器からの入力情報の取得、１２０２アバタ操作命令のデータ構造体９００への書き込み、１２０３撮影用カメラの選択、１２０４カメラによる画像の撮影、１２０５トリミング処理と間引き処理によって撮影画像から必要部分を抽出する処理、１２０６画像データのデータ構造体９００への書き込み、１２０７データ構造体９００の送信、１２０８データ構造体９００の受信、１２０９アバタキャラクタの位置更新と歩行用アニメーションの読み込み、１２１０アニメーション情報の読み込み、１２１１アニメーションのブレンディング、１２１２データ構造体９００からの画像の読み込み、１２１３入力装置４０３からの入力情報の読み込みと処理、１２１４レンダリング処理と映像の表示、１３０１この列には、距離情報の数値が記載される、１３０２この列には値Ｄ１の数値が記載される、１３０３この列には値Ｄ１の数値が記載される、１３０２この列には間引き後のデータサイズ値が記載される、１４０１この列には、カメラのＩＤが記載される、１４０２この列には、トリミングの左端が記載される、１４０３この列には、トリミングの右端が記載される、１４０４この列には、トリミングの上端が記載される、１４０５この列には、トリミングの下端が記載される、１４０６この列には、カメラが撮影する利用者の角度が記載される、１５０１間引き処理前の画像（トリミングされた撮影画像）
１５０２間引き処理後の画像（転送用画像）、１５１０画像を構成する画素の構成、１５１１画素の赤情報、１５１１画素の緑情報、１５１１画素の青情報、１５１１画素の透明度をあらわす情報
１６０１顔表示用ビルボード、１６０２上半身表示用ビルボード、１６０３下半身表示用ビルボード、１６１１顔画像撮影領域、１６１２上半身撮影領域、１６２０各アクションに対応した下半身の動画映像、１７０１アクションＩＤ、１７０２角度情報、１７０３動画像ファイルのファイル名
[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a photographing and display device that enables communication and cooperative work by transferring moving images in real time.
[0002]
[Prior art]
Devices that capture an image of a user, transfer it in real time, and let a remote party view it, thereby enabling communication as if the parties were in the same place, are already in practical use. The most typical example is the video conferencing system, which captures a sender at a remote location with a camera and transmits the image, together with audio, to a display system on the receiving side in real time. Because it is accompanied by images, it allows more natural and information-rich communication than audio-only systems such as the telephone.
[0003]
In addition, research is being carried out in various places to add further value to remote video transfer systems by separating the part of the image that shows the person from the part that shows the background, and compositing the person's image with an image of a virtual space constructed on a computer in place of the real background. This line of research is called video avatars, and many implementations exist.
[0004]
However, when only one camera captures the user, video from only one direction is available, so the displayed image cannot change in response to a change in the viewpoint position or in the user's standing position. As a result, the image of the user (transmitting side) often always appears facing the front on the receiving system. Patent Document 1, for example, describes an improved video avatar that takes this problem into account. Images of the face are captured from multiple directions and either converted into three-dimensional information or transmitted in full for all directions, and are then combined with a three-dimensional shape model representing the body so that an image from an arbitrary position in the virtual space can be obtained. Because video transfer involves a large amount of information and puts pressure on the communication network, the amount transferred is reduced by restricting the video to the face. However, although three-dimensional information is easy to reproduce for a relatively simple and fixed structure such as a face, it is difficult to reproduce the three-dimensional information of a complex shape, such as the upper body including the hands, from captured images alone. Moreover, a simplified three-dimensional model obtained by pasting the user's image onto a three-dimensional model has a simple shape because of limited calculation accuracy, and compared with the real shape it often contains portions that were reconstructed incorrectly, which can give an unnatural impression.
[0005]
[Patent Document 1]
Patent application No. 10-257702
[0006]
[Problems to be solved by the invention]
The image information that the display device for the virtual space requires depends on the positional relationship between the video receiver and the video sender in the virtual space. For example, when the receiver and the sender are far apart in the virtual space, the sender's whole body is displayed small. Conversely, when they are very close, only the sender's upper body, and in particular the face, is displayed large. A natural impression can also be given if the visible image changes according to the angle at which the receiver is located relative to the front of the sender. For example, when the receiver is beside the sender, the receiver should see only a side view of the sender. However, if these images are reproduced without a clear grasp of the positional relationship between the two, the image information always comes from one fixed angle, and the image quality sent is always the same.
[0007]
Furthermore, because the transfer capacity of the line used to transfer the video is limited, it is difficult to transfer video that satisfies many requirements at once. For example, it is difficult to send a moving image that offers both smooth motion and a large image size. Achieving smooth motion requires a small image, and enlarging the image lowers the frame rate. Transferring all of the images seen from every direction would also put heavy pressure on the transfer capacity of the line.
[0008]
[Means for solving the problems]
In the present invention, the receiving and display system determines the conditions of the video it needs from the positional relationship between the sender and the receiver in the virtual space. This information is transferred to the photographing system, which takes the video at the selected angle from among those captured by the plurality of cameras, converts it into images of an appropriate size in time and space by techniques such as thinning and trimming, and transfers it to the display system. As a result, the required video can be transmitted with a small transfer capacity.
[0009]
BEST MODE FOR CARRYING OUT THE INVENTION
FIG. 1 illustrates an embodiment of the present invention. The invention broadly consists of a photographing system 101, a display system 102, and communication network equipment 103 connecting them.
[0010]
FIG. 2 shows a detailed configuration example of the photographing system 101. A plurality of camera devices 201 to 205 are installed so as to surround the transmitting user 200. A background cloth 207 of a single color is installed around the user 200 for chroma key processing. The input device 209 operated by the transmitting user 200 and the lower body of the transmitting user 200 are wrapped in a cloth 208 of the same color so that they are not captured as video information. The video information from these cameras is transferred to the information processing device 210. Each of the cameras 201 to 205 is assigned a unique ID number, and the information processing device 210 can select the video to be captured on the basis of this ID number. The information processing device has a function of receiving the video from these cameras, a function of transmitting and receiving information over the network 103, and a function of performing calculations on that information.
[0011]
FIG. 3 is a block diagram showing a configuration example of the information processing device 210. A CPU (central processing unit) 301 executes various processes according to a program recorded in a main storage device 302. The main storage device 302 and an external storage device 304 store the programs and data needed to execute the control processing. A hard disk drive or other existing large-capacity medium is used for the external storage device 304. The input/output interface unit 303 is connected to the camera devices 201 to 205 and to a data transfer mechanism 307 needed to exchange input/output data with the network. The output device 305 informs the transmitting user of the current processing status. The input device 209 allows the transmitting user to issue a number of commands. In this embodiment, the input device 209 is operated by foot.
[0012]
FIG. 4 shows a configuration example of the display system. A video display device 401 is placed in front of the receiving user 400 and displays the video output by the information processing device 402. The receiving user 400 can give commands to the information processing device 402 via the input device 403. The information processing device 402 has a function of creating various images using three-dimensional shape data, two-dimensional image data, and two-dimensional moving image data, a function of transmitting and receiving information over the network, and a function of performing calculations on that information.
[0013]
FIG. 5 is a block diagram illustrating a configuration example of the information processing device 402. A CPU (Central Processing Unit) 501 executes various processes according to programs recorded in a main storage device 502. The main storage device 502 and the external storage device 504 store programs and data necessary for executing the control processing. It is assumed that a hard disk drive or another existing large-capacity medium is used for the external storage device 504. It is assumed that the input / output interface unit 503 has a data transfer device 507 necessary for exchanging input / output data with the network 103 and a processing capability for outputting a video to the video display device 401.
[0014]
FIG. 6 is a schematic diagram of the virtual space information constituted by the internal data of the information processing device 402 and the information processing device 210.
[0015]
A virtual space 601 as a background is configured by three-dimensional shape data of background objects 602 and 603 held by the display system. Characters 610 and 620 representing the actions of the transmitting user 200 and the receiving user 400 are arranged in this virtual space.
[0016]
A position 621 in the virtual space where the character of the receiving user exists and a direction 622 in which the character faces are defined, and are controlled according to the operation of the receiving user.
[0017]
A character 610 representing the transmitting user is also constructed in this virtual space and placed in the background space 601. The position 611 of the transmitting user's character in the virtual space and the direction 612 in which it faces are likewise controlled according to the operations of the transmitting user.
[0018]
FIG. 7 shows a method of forming the character 610 of the transmission user.
[0019]
The leg portion 703 of the character is composed of three-dimensional shape data. This shape data holds animation information and can deform its shape on receiving script data. FIG. 8 shows a configuration example of animation data representing an action. The value in field 801 is ID data representing a motion pattern, and it is this ID data that is transferred as script data. To execute script data, the held set of animation data is searched for the entry whose ID 801 matches the ID number in the script. Loading animation data amounts to calling up the data for each joint of the polygon model. Field 802 of the animation data 800 holds the number of basic frames, field 803 holds the time interval between frames, and the fields from 804 onward hold the rotation information of each joint. The animation is performed by calculating the time elapsed from the animation start time to the display time, reading out the corresponding joint rotation data, and deforming the polygons accordingly.
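As a rough illustration of the playback just described, the following Python sketch models the animation data 800 and reads out joint rotations for a given display time. The field names, the representation of each joint rotation as a single angle, the linear interpolation between basic frames, and the looping behaviour are all assumptions made for this example; the description only states that the elapsed time is calculated and the joint rotation data is read out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AnimationData:
    """Hypothetical in-memory form of animation data 800."""
    pattern_id: int                     # 801: motion pattern ID
    frame_count: int                    # 802: number of basic frames
    frame_interval: float               # 803: seconds between basic frames
    joint_rotations: List[List[float]]  # 804 onward: one angle per joint, per frame


def find_animation(library: List[AnimationData], script_id: int) -> AnimationData:
    """Search the held animations for the one whose ID matches the script ID."""
    for anim in library:
        if anim.pattern_id == script_id:
            return anim
    raise KeyError(f"no animation with ID {script_id}")


def joint_angles_at(anim: AnimationData, start_time: float,
                    display_time: float) -> List[float]:
    """Joint angles at display_time (assumed: looping, linear interpolation)."""
    elapsed = max(0.0, display_time - start_time)
    position = elapsed / anim.frame_interval        # position in units of frames
    index = int(position) % anim.frame_count        # current basic frame
    following = (index + 1) % anim.frame_count      # next basic frame
    t = position - int(position)                    # fraction between the two
    current = anim.joint_rotations[index]
    target = anim.joint_rotations[following]
    return [a + (b - a) * t for a, b in zip(current, target)]
```

The returned angles would then be applied to the joints of the polygon model; how the polygons are deformed is left to the rendering library and is not shown.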
[0020]
The upper body portion 702 of the character is shown by displaying a frame of a moving image on a polygon. This moving image is obtained by removing the chroma key color from a portion 712 of the moving image captured of the sender 200. Because the sender's lower body is enclosed by the cover 208, it is not captured in this image. The polygon on which the moving image is displayed rotates so that its front face always points from the position of the sender's character in the virtual space toward the position of the receiver in the virtual space. A polygon that presents the front of the image to the user by changing its orientation in this way is hereinafter referred to as a billboard.
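The billboard orientation can be sketched as follows. This is a minimal illustration that assumes a y-up coordinate system, positions given as (x, y, z) tuples, a billboard whose front initially faces +z, and rotation about the vertical axis only; none of these conventions are stated in the description, and the function names are hypothetical.

```python
import math


def billboard_yaw(sender_pos, receiver_pos):
    """Yaw (radians, about the vertical axis) that turns the billboard's
    front from +z toward the receiver's position."""
    dx = receiver_pos[0] - sender_pos[0]
    dz = receiver_pos[2] - sender_pos[2]
    return math.atan2(dx, dz)


def facing_direction(yaw):
    """Unit vector that the billboard's front points along after the rotation."""
    return (math.sin(yaw), 0.0, math.cos(yaw))
```

For a receiver directly along +z from the sender the yaw is zero, and for a receiver along +x it is a quarter turn.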
[0021]
The face 701 of the character is composed of a billboard, like the upper body. The moving image displayed on the billboard is obtained by removing a chroma key portion from a portion 711 of a moving image obtained by capturing the image of the sender 200. The image resolution when drawn on the billboard differs between the face 701 of the character and the upper body 702 of the character, and the image 701 of the face generally has a higher pixel density.
[0022]
Hereinafter, a character in the virtual space that reflects the action of the user such as 610 is referred to as an avatar.
In the virtual space 600 constructed as the internal data of the display system 210, the avatar of the sender exists at the position 611 (S_p). The direction 612 in which the avatar 610 of the sender is facing is defined as S_d. In addition, a virtual viewpoint position of the receiver 400 exists at another position 621 (O_p) in the virtual space 600. It is assumed that the direction 622 to which the receiver faces in the virtual space is O_d. These values are represented by vector values.
[0023]
The general-purpose network 103 has a function of carrying the data that the display systems transfer to each other. It also has a function by which each system can obtain information on the transfer capacity currently available on the network by sending a specific signal to the network.
[0024]
FIG. 10 shows the data structure transferred from the display system to the photographing system in this embodiment, and FIG. 9 shows the data structure transferred from the photographing system to the display system. The meaning of the internal data of these structures is explained together with the processing steps described below.
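For orientation, the two transfer structures can be pictured as the following Python records: data structure 1000, sent from the display system to the photographing system, and data structure 900, sent in the opposite direction. The field meanings follow the reference sign list (1001, 1002 and 901 to 906); the Python types, the numeric values of `ID_FACE` and `ID_UPPER`, the four-bytes-per-pixel layout, and the `is_consistent` helper are assumptions added for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

ID_FACE = 1    # hypothetical numeric value for the face image ID
ID_UPPER = 2   # hypothetical numeric value for the upper-body image ID
BYTES_PER_PIXEL = 4  # assumed layout: red, green, blue, transparency


@dataclass
class ViewRequest:
    """Data structure 1000 (display system to photographing system)."""
    distance: float                    # 1001: avatar distance; 0 means out of view
    angle: Tuple[float, float, float]  # 1002: direction of the receiver


@dataclass
class ImagePacket:
    """Data structure 900 (photographing system to display system)."""
    image_id: int                        # 901: image type ID
    script_id: int                       # 902: animation script information
    movement: Tuple[float, float, float] # 903: avatar movement information
    width: int                           # 904: width of the transferred image
    height: int                          # 905: height of the transferred image
    pixels: bytes                        # 906: transferred image data

    def is_consistent(self) -> bool:
        """Check that the ID is known and the pixel data matches width x height."""
        expected = self.width * self.height * BYTES_PER_PIXEL
        return self.image_id in (ID_FACE, ID_UPPER) and len(self.pixels) == expected
```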
[0025]
The flow of processing performed by the devices described above is explained below in order. FIGS. 11 and 12 show the sequence of operations as block diagrams.
[0026]
FIG. 11 describes the processing that determines the settings for the data to be transferred from the photographing system. FIG. 12 describes the processing that transfers data from the photographing system to the display system. Both processes are invoked and executed at specified intervals using the timer functions that the information processing devices 210 and 402 hold in their CPUs 301 and 501.
[0027]
The series of processes in FIG. 11 is hereinafter referred to as the information acquisition thread. It runs from once to several times per second, according to a value set in advance in the program. The work of the information acquisition thread is described in programs placed in the information processing device 402 and the information processing device 210, and it is started by the timer function of the information processing device 402.
[0028]
The series of processes in FIG. 12 is hereinafter referred to as the image transfer thread. It is invoked from several times to several tens of times per second, at a frequency set by the information acquisition thread. The work of the image transfer thread is described in programs placed in the information processing device 210 and the information processing device 402, and it is started by the timer function of the information processing device 210.
[0029]
First, the series of operations of the information acquisition thread is described with reference to FIG. 11.
The relative position of the avatar 610 of the transmitting user 200 with respect to the position of the avatar 620 of the receiving user 400 is obtained by subtracting the position vector 621 (O_p) from the position vector 611 (S_p) (step 1101).
The angle difference between this direction and the direction 622 to which the avatar 620 of the receiver faces is calculated based on the following formula (Equation 1) (step 1102).
[0030]
(Equation 1)
cos θ = (S_p−O_p) · O_d / (| S_p−O_p || O_d |)
To determine whether the avatar 610 is within the displayed range, this value is compared with the angle of view W_h of the video display device 401 (step 1103).
If cos θ >= cos(W_h), the avatar is within the displayed range and the process proceeds to step 1104.
If cos θ < cos(W_h), the avatar is outside the displayed range and no data needs to be transferred. In this case, 0 is written to the distance information of the data structure (step 1106), and the process proceeds to step 1107.
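Steps 1101 to 1103 can be sketched in Python as below. The sketch evaluates Equation 1 and treats the avatar as visible when θ is no larger than W_h, that is, when cos θ is at least cos(W_h). It assumes that W_h is measured in radians from the viewing direction (a half angle) and that vectors are plain tuples; the function names are hypothetical.

```python
import math


def cos_angle_to_sender(s_p, o_p, o_d):
    """Equation 1: cosine of the angle between the receiver's viewing
    direction O_d and the direction from the receiver O_p to the sender S_p."""
    rel = [s - o for s, o in zip(s_p, o_p)]          # step 1101: S_p - O_p
    dot = sum(a * b for a, b in zip(rel, o_d))
    norm = math.sqrt(sum(a * a for a in rel)) * math.sqrt(sum(b * b for b in o_d))
    if norm == 0.0:
        raise ValueError("positions coincide or O_d has zero length")
    return dot / norm                                 # step 1102


def sender_in_view(s_p, o_p, o_d, w_h):
    """Step 1103: True when the sender's avatar lies within the angle of view."""
    return cos_angle_to_sender(s_p, o_p, o_d) >= math.cos(w_h)
```

Comparing cosines rather than angles avoids an inverse cosine; because cosine decreases on the interval from 0 to π, a smaller angle corresponds to a larger cosine.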
In step 1104, the direction φ that the sender's avatar presents to the receiver is calculated. Let O_p1 be the vector obtained by rotating (O_p−S_p) about the upward vector by −S_d. This vector expresses the position of the receiver relative to, and as seen from, the sender's avatar. O_p1 is converted into polar coordinates, written rφ: φ is a unit-length vector expressing the direction of O_p1 as seen from the origin, and r is the distance from the origin to O_p1 (that is, the distance between O_p and S_p).
[0031]
The value r is written to the distance information 1001 of the data structure 1000, and φ is written to the angle information 1002 (step 1105). Processing then continues.
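The calculation of r and φ in steps 1104 and 1105 might look like the following Python sketch. It assumes a y-up coordinate system, interprets S_d as a horizontal direction vector whose heading is measured from the +z axis, and rotates the relative position by the negative of that heading so that the sender's facing direction maps onto +z. These conventions and the function name are assumptions; the description fixes only that the result is a distance r and a unit vector φ.

```python
import math


def receiver_relative_to_sender(s_p, s_d, o_p):
    """Return (r, phi): distance and unit direction of the receiver O_p
    as seen from the sender's avatar at S_p facing along S_d."""
    heading = math.atan2(s_d[0], s_d[2])   # sender heading about the vertical axis
    a = -heading                           # rotate by the negative heading
    x = o_p[0] - s_p[0]
    y = o_p[1] - s_p[1]
    z = o_p[2] - s_p[2]
    # Rotation about the vertical (y) axis by angle a.
    xr = x * math.cos(a) + z * math.sin(a)
    zr = -x * math.sin(a) + z * math.cos(a)
    r = math.sqrt(xr * xr + y * y + zr * zr)
    if r == 0.0:
        raise ValueError("sender and receiver positions coincide")
    return r, (xr / r, y / r, zr / r)
```

With these conventions, a receiver standing directly in front of the sender yields φ close to (0, 0, 1) whatever direction the sender faces in the virtual space.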
The display system transmits the data structure 1000 to the photographing system 101 (step 1107).
The photographing system receives the data structure 1000 from the display system (step 1108). The angle information 1002 of the data structure is read, and the camera positioned closest to the corresponding angle is selected (step 1109). The relationship between the camera ID and the angle is given in FIG. 14, described later.
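Step 1109 amounts to a nearest-angle search, which can be sketched as follows. The camera IDs (the reference numerals 201 to 205 are reused here purely as sample IDs), the camera angles, the use of degrees, and the conversion of φ to a horizontal angle with `atan2` are all hypothetical choices for this example; the real values would come from the table in FIG. 14.

```python
import math

# Hypothetical camera table: camera ID -> angle (degrees) at which it views the user.
CAMERA_ANGLES = {201: -90.0, 202: -45.0, 203: 0.0, 204: 45.0, 205: 90.0}


def angular_difference(a, b):
    """Smallest absolute difference between two angles in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def select_camera(phi, camera_angles=CAMERA_ANGLES):
    """Return the ID of the camera whose angle is closest to the direction phi."""
    requested = math.degrees(math.atan2(phi[0], phi[2]))
    return min(camera_angles,
               key=lambda cid: angular_difference(camera_angles[cid], requested))
```

Using a wrap-around difference keeps the search correct if cameras are later placed all the way around the user, for example at 170 and −170 degrees.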
In step 1110, a required resolution is calculated from the distance information 1001 of the data structure. The required resolution can be obtained by referring to a table defined on the imaging system side. A table 1300 in FIG. 13 expresses a configuration example of this table.
[0032]
In this table, various data necessary for transferring a face image are defined in association with distance values.
[0033]
The distance values (column 1301) in the table are read in order, and when the distance closest to the value read from the distance information 1001 of the data structure 1000 is found, the information associated with that distance is read as the resolution information for the face image. This data consists of two integer values D1 and D2, described in columns 1302 and 1303, and is used in a later step to change the resolution of the image. In addition, the data size of the face image required for one frame is read from column 1304. These values are recorded in the main memory, and the process proceeds to step 1111.
[0034]
Further, a similar table is separately prepared for obtaining resolution information of the upper body, and these data are read in the same manner.
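The table lookup of step 1110 can be sketched as a nearest-distance search. The row layout follows columns 1301 to 1304, but every numeric value below is a made-up placeholder, since the contents of FIG. 13 are not reproduced in the text.

```python
def lookup_resolution(table, distance):
    """Step 1110: return the row whose distance (column 1301) is closest to `distance`.

    Each row is (distance, D1, D2, bytes_per_frame), mirroring columns 1301-1304.
    """
    return min(table, key=lambda row: abs(row[0] - distance))


# Hypothetical face-image table; the values are illustrative only
FACE_TABLE_1300 = [
    (1.0, 1, 1, 49152),
    (2.0, 1, 2, 12288),
    (4.0, 1, 4, 3072),
    (8.0, 1, 8, 768),
]
```

The same function would serve the separate upper body table by passing a different list.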
[0035]
In step 1111, the transmission frame rate is calculated from the resolution information obtained in the above steps. The imaging system queries the network 103 and receives the amount of data per second that can currently be transferred on the line. This transferable data amount per second is divided into a transfer capacity for the face image and a transfer capacity for the upper body image. The split uses a pair of positive decimal ratios that sum to 1 and are given as fixed system values. The ratio can be changed by the sender 200 entering a numerical value with the input device 305.
[0036]
The face image frame rate is defined as the data transfer capacity for the face image divided by the data size required for one frame obtained in step 1110. However, when this value exceeds the frame rate that the transmission mechanism can process, the face image frame rate is set to that upper limit. When 0 is described in the distance information 1001, the face image frame rate is set to 0.
[0037]
Similarly, the frame rate of the upper body image is obtained by dividing the data transfer capacity for the upper body image by the data size required for one frame of the upper body image. However, if this value exceeds the frame rate that the transmission mechanism can process, the upper body image frame rate is set to that upper limit, and if 0 is described in the distance information 1001, the upper body image frame rate is set to 0.
From the frame rate of the face image and the frame rate of the upper body image, the calling interval of each image transfer thread is obtained in milliseconds. The timer function of the information processing device 210 is set to call the image transfer thread at each of these intervals (step 1112).
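The arithmetic of steps 1111 and 1112 can be worked through in a short sketch. The function and parameter names are hypothetical, the bandwidth is assumed to be in bytes per second, and returning `None` for a zero frame rate (meaning the timer is not armed) is an assumption about how the 0 case would be handled.

```python
def frame_rates(bandwidth, face_ratio, face_bytes, upper_bytes, max_fps, distance):
    """Step 1111: return (face_fps, upper_fps).

    bandwidth: transferable bytes per second reported for the line
    face_ratio: share of bandwidth for the face image; the upper body gets 1 - face_ratio
    face_bytes, upper_bytes: per-frame data sizes read in step 1110
    max_fps: upper limit of the transmission mechanism
    distance: value of field 1001; 0 means the avatar is outside the display range
    """
    if distance == 0:
        return 0.0, 0.0
    face_fps = min(bandwidth * face_ratio / face_bytes, max_fps)
    upper_fps = min(bandwidth * (1.0 - face_ratio) / upper_bytes, max_fps)
    return face_fps, upper_fps


def interval_ms(fps):
    """Step 1112: timer interval in milliseconds, or None when nothing is sent."""
    return None if fps <= 0 else 1000.0 / fps
```

With 120000 bytes per second split evenly, a 3000-byte face frame yields 20 frames per second, so the face image transfer thread would be called every 50 ms.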
[0038]
Through the above steps, setting information for transferring data from the transmission system is determined.
After the above data reception processing, until the next data 1000 comes from the display system 102, the imaging system 101 repeats the processing of the image transfer thread using these set values.
[0039]
Next, the operation of the image transfer thread will be described.
[0040]
The image transfer thread is activated by the timer function of the information processing device 210 set in step 1112. Hereinafter, a series of operations will be described for a case where a thread for transferring a face image is activated.
[0041]
In step 1201, input information input by the sender using the input device 209 is taken into the information processing device 210. This information is used as a script command for moving the avatar character or moving the foot. In step 1202, the information fetched in step 1201 is written to the script description area 902 of the data structure 900 for transfer. The character movement information is written in 903.
In steps 1203 to 1204, an image (face image) of the transmission user is created.
[0042]
As shown in FIG. 14, a table 1400 defining an area for capturing an image is defined in the information processing device 210 as a part of the program. The information in this table is composed of camera ID number C_ID (column 1401), rotation angle R (column 1406), and four numerical values w0, w1, h0, h1 (columns 1402 to 1405).
The ID of the camera selected in step 1109 is searched from column 1401. The information of the area linked to this value is read from the columns 1402 to 1405 of the table 1400 (step 1203).
In step 1204, an image of the camera corresponding to the camera number C_ID is read. This image is transferred to the information processing device 210 as electronic data.
[0043]
Of the image captured in step 1204, the range (w0, h0)-(w1, h1) is the area to be transmitted as the transfer image. This range corresponds to the photographing region range 711 or 712 in FIG. Reference numeral 1501 in FIG. 15 represents the image in the range expressed by (w0, h0)-(w1, h1). This image 1501 is composed of pixels 1510. Each pixel 1510 consists of three numbers indicating a red value 1511, a green value 1512, and a blue value 1513, and a number 1514 indicating transparency. The resolution of the video in this area is reduced to the resolution calculated from the distance information 1001, and the video is written to the transfer data structure. For this purpose, starting from the h0-th horizontal line, the pixels from w0 to w1 are decimated using the values D1 and D2 set in step 1110, as shown in FIG. (step 1205). In the image 1501 of FIG. 15, the pixels filled with black are not transferred. The remaining pixels are transferred, and the image after thinning is shown as image 1502. In this transfer, if the color indicated by the combination of the red value 1511, the green value 1512, and the blue value 1513 matches the chroma key color, a value indicating transparency is written to the transparency information 1514. Otherwise, a value indicating opacity is written.
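The trimming, thinning, and chroma key marking of step 1205 can be sketched as below. The text does not spell out how D1 and D2 drive the decimation, so this sketch assumes "keep D1 out of every D2 pixels" in both directions. It also assumes a half-open range for (w0, h0)-(w1, h1), an image stored as rows of (r, g, b) tuples, and 0 and 255 as the transparent and opaque values.

```python
def make_transfer_image(frame, w0, w1, h0, h1, d1, d2, chroma_key):
    """Step 1205: crop, decimate, and attach transparency to each kept pixel.

    frame: list of rows, each a list of (r, g, b) tuples
    d1, d2: keep d1 of every d2 pixels per axis (assumed interpretation of D1, D2)
    chroma_key: (r, g, b) color of the background cloth
    """
    out = []
    for y in range(h0, h1):
        if (y - h0) % d2 >= d1:
            continue                      # this line is thinned out
        row = []
        for x in range(w0, w1):
            if (x - w0) % d2 >= d1:
                continue                  # this pixel is thinned out
            r, g, b = frame[y][x]
            # Field 1514: transparent for the chroma key color, opaque otherwise
            alpha = 0 if (r, g, b) == chroma_key else 255
            row.append((r, g, b, alpha))
        out.append(row)
    return out
```

A real capture would need a tolerance around the chroma key color rather than an exact match, since camera noise rarely produces identical values.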
[0044]
In step 1206, the image ID is written to the data area 901 of the data structure 900. This ID is one of the fixed values (ID_FACE, ID_UPPER) indicating which body part the image shows, the face or the upper body. Further, the horizontal width of the image 1502 is written in the data area 904, and the vertical height of the image 1502 is written in the data area 905.
[0045]
The information of the transfer data structure 900 created in the above steps 1201 to 1206 is transmitted (step 1207).
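One possible in-memory layout for the transfer data structure 900 is sketched below. The field numbers follow areas 901 to 906, but the class name, the numeric values of ID_FACE and ID_UPPER, and the byte-level wire format are all hypothetical; the source does not define a serialization.

```python
import struct
from dataclasses import dataclass

ID_FACE, ID_UPPER = 1, 2   # hypothetical numeric values for the fixed IDs


@dataclass
class TransferData900:
    image_id: int      # area 901: ID_FACE or ID_UPPER
    script: str        # area 902: script command for the avatar animation
    movement: tuple    # area 903: avatar movement as (dx, dy, dz)
    width: int         # area 904: horizontal size of the transfer image
    height: int        # area 905: vertical size of the transfer image
    pixels: bytes      # area 906: RGBA image data

    def pack(self) -> bytes:
        """Serialize to bytes using an illustrative little-endian layout."""
        script = self.script.encode("utf-8")
        header = struct.pack(
            "<BH3fHH",
            self.image_id, len(script), *self.movement, self.width, self.height,
        )
        return header + script + self.pixels
```

The fixed-size header is 19 bytes in this layout, followed by the variable-length script and pixel data.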
[0046]
The display system receives the transfer data structure 900 (step 1208).
[0047]
In step 1209, the position data of the avatar in the virtual space is moved based on the character movement data 903. In addition, the animation processing accompanying the movement is loaded.
[0048]
In step 1210, the three-dimensional shape model 703 constituting the lower body of the avatar performs an animation process associated with the script 902 in the action information area.
[0049]
If some animation processing has already been loaded, or if a walking animation for movement has been loaded, blending processing of these animations is performed. Existing methods are used for setting the animation and for the blending process (step 1211).
[0050]
In step 1212, image data is read. The data 901 in the header area is checked. If it is ID_UPPER, it is determined that an upper body image has been sent. If it is ID_FACE, it is determined that a face image has been sent. The image is then written to the texture area of the corresponding billboard polygon (701 for the face, 702 for the upper body).
[0051]
For the polygons whose texture has not been updated by this processing, rendering is performed using the same texture as the previous time.
[0052]
In step 1213, input information from the user 400 is read.
[0053]
The display system 401 acquires information input by the receiving user 400 to the input device 403 through the interface 503, and changes the viewpoint position 621 and angle 622 of the receiver in the virtual space by reflecting the input information.
[0054]
In step 1214, a screen is displayed. A rendering process is performed on the background image and the avatar model to create an image, and the image is displayed on the image display device 401. During this rendering, the transparency information described in the image data is used so that the background behind the person is displayed correctly.
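How the transparency information affects rendering can be shown with a per-pixel blend. This sketch uses the standard "source over" formula with an 8-bit alpha value. The source only describes transparent and opaque values, so support for intermediate alpha is an assumption, as is the function name.

```python
def composite(person_rgba, background_rgb):
    """Blend one avatar pixel over one background pixel using its transparency value.

    person_rgba: (r, g, b, a) with a in 0..255 (0 = transparent, 255 = opaque)
    background_rgb: (r, g, b) of the rendered virtual space behind the billboard
    """
    r, g, b, a = person_rgba
    t = a / 255.0
    return tuple(
        round(t * fg + (1.0 - t) * bg)
        for fg, bg in zip((r, g, b), background_rgb)
    )
```

A chroma key pixel (alpha 0) therefore shows the virtual background unchanged, while an opaque pixel shows the captured person.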
[0055]
Thus, a series of operations of the image transfer thread ends.
[0056]
If the called image transfer thread is a thread for the upper body image, the versions of the table 1300 and the table 1400 prepared for the upper body image are used, and the image is transferred to the texture area of the polygon 702 in step 1212. In other respects, the same steps as those of the face image transfer described above are performed.
[0057]
Next, a second embodiment will be described in which the lower body 703 of the avatar, which was displayed using a three-dimensional shape model in the above-described embodiment, is replaced with a moving image shot in advance.
The configuration of the avatar character at this time is as shown in 1600 in FIG. The face part 1601, the upper body part 1602, and the lower body part 1603 are respectively created by billboards.
[0058]
Moving images obtained by capturing images of the lower body in each action such as stationary, walking, running, and sitting are stored in the storage device 504 in advance. In this video, a portion where a foot is shown is subjected to a stabilizing process so that each image looks the same size, and a background clipping process is performed. Further, image processing called morphing is performed on the video at the start and end portions of each moving image. By this moving image processing, when each moving image is reproduced continuously, it is displayed as a connected image.
[0059]
Also, a table 1700 shown in FIG. 17 is stored, and information for associating the moving image file 1620 with the action content and the shooting angle is defined. The character's foot 1603 is composed of billboard data.
In the above-described embodiment, the following processing is performed instead of the animation processing of the three-dimensional model performed in steps 1209 to 1211.
[0060]
First, a value described in the data 902 registered in the structure 900 is read. Save this value as the action value.
[0061]
Further, the value of the angle determined in step 1104 is referred to.
[0062]
The table 1700 is searched, and the file name is read from column 1703 for the entry whose action ID in column 1701 matches the current action value 902 and whose angle information in column 1702 is closest to the reference angle.
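The search of table 1700 combines an exact match on the action with a nearest-angle match. In the sketch below, the row layout follows columns 1701 to 1703, while the action names, angles in degrees, and file names are invented for illustration.

```python
def select_movie(table, action_id, angle_deg):
    """Return the file name (column 1703) for the given action and nearest angle.

    table rows are (action_id, angle_in_degrees, file_name), mirroring
    columns 1701-1703. Returns None when no row has the requested action.
    """
    def gap(a, b):
        # Smallest absolute difference between two angles, with wrap-around at 360
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)

    candidates = [row for row in table if row[0] == action_id]
    if not candidates:
        return None
    return min(candidates, key=lambda row: gap(row[1], angle_deg))[2]


# Hypothetical contents of table 1700
TABLE_1700 = [
    ("walk", 0.0, "walk_000.mpg"),
    ("walk", 90.0, "walk_090.mpg"),
    ("sit", 0.0, "sit_000.mpg"),
]
```

For instance, a walking action viewed from 70 degrees selects `walk_090.mpg` in this example table.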
[0063]
On the billboard 1603, a video selected from the moving images 1610 to 1613 stored as local data is displayed.
[0064]
The moving image is repeatedly displayed until each information changes and another moving image is selected.
[0065]
In each of the above-described embodiments, a system that transmits images in both directions can be configured by combining the photographing system and the display system with each other.
[0066]
[Effects of the invention]
According to the present invention, an avatar character whose angle and resolution are changed according to the position in the virtual space can be viewed.
[0067]
In addition, by using polygons in the first embodiment, and a moving image stored as local information in the second embodiment, as a part of the display of the avatar character, the network load generated by image transfer can be reduced.
[0068]
Further, by controlling the frame rate and the resolution, the amount of load applied to the network when the system operates can be controlled.
[Brief description of the drawings]
FIG. 1 is a configuration of an entire system according to a first embodiment of the present invention.
FIG. 2 is a transmission system configuration.
FIG. 3 is a configuration of an information processing apparatus for a transmission system.
FIG. 4 is a configuration of a receiving system.
FIG. 5 is a configuration of an information processing apparatus for a receiving system.
FIG. 6 is a schematic diagram of a virtual space configuration.
FIG. 7 is a schematic diagram of an avatar configuration.
FIG. 8 is a configuration example of animation data.
FIG. 9 is a schematic diagram of a data structure for communication in an image transfer thread.
FIG. 10 is a schematic diagram of a communication data structure in an information acquisition thread.
FIG. 11 shows the operation of an information acquisition thread.
FIG. 12 shows the operation of an image transfer thread.
FIG. 13 is a table showing a relationship between avatar distance information, resolution thinning information, and data size.
FIG. 14 shows data indicating information on each camera and its trimming area and angle.
FIG. 15 is a schematic diagram for creating a transfer image from a captured image.
FIG. 16 is a schematic diagram of an avatar configuration according to a second embodiment.
FIG. 17 is a schematic diagram of a table for searching a moving image file name from each action and a display angle.
[Explanation of symbols]
101 photographing system, 102 display system, 103 network, 200 image transmission side user, 201 camera, 202 camera, 203 camera, 204 camera, 205 camera, 207 background chroma key cloth, 208 lower body part concealment chroma key cloth, 209 image transmission user input device, 210 transmission system information processing device, 301 central processing unit (CPU), 302 main storage device, 303 input/output interface device, 304 external storage device, 305 image transmission user output device, 307 network interface device, 400 image receiving user, 401 image display device, 402 information processing device for receiving system, 403 image receiving user input device, 501 central processing unit (CPU), 502 main storage device,
503 input/output interface device, 504 external storage device, 505 image receiving user output device, 507 network interface device, 601 virtual space, 602 background object, 603 background object, 610 transmission user avatar, 611 transmission user avatar position, 612 transmission user avatar direction, 620 receiving user avatar, 621 receiving user avatar position, 622 receiving user avatar direction, 701 face display billboard, 702 upper body display billboard, 703 lower body display polygon data,
711 face image shooting area, 712 upper body shooting area, 800 animation data, 801 operation pattern ID, 802 number of basic frames, 803 time interval data between basic frames, 804 joint rotation information data, 900 data structure, 901 image type, 902 area for describing script information that describes an avatar animation operation, 903 area for describing movement information for an avatar, 904 horizontal length information of a transferred image, 905 vertical length information of a transferred image, 906 area for image data to be transferred,
1000 data structure, 1001 avatar distance information, 1002 avatar angle information, 1101 calculation of relative position between avatars, 1102 calculation of relative angle, 1103 determination of whether avatar is within view of display position, 1104 calculation of the orientation and distance of the sender's avatar, 1105 writing to data structure 1000, 1106 process when avatar is not in view, 1107 transmission of data structure, 1108 reception of data structure, 1109 selection of the camera that shows the required angle,
1110 calculation of resolution from distance information, 1111 calculation of frame rate, 1112 setting of image transfer thread timer, 1201 acquisition of input information from input device, 1202 writing of avatar operation instruction to data structure 900, 1203 selection of camera for shooting, 1204 shooting of an image by a camera, 1205 extraction of necessary parts from a shot image by trimming and thinning, 1206 writing of image data to data structure 900, 1207 transmission of data structure 900, 1208 reception of data structure 900, 1209 update of avatar character position and reading of walking animation, 1210 reading of animation information, 1211 blending of animation, 1212 reading of images from data structure 900, 1213 reading and processing of input information from input device 403, 1214 rendering process and display of video, 1301 column describing the distance value, 1302 column describing the value D1, 1303 column describing the value D2, 1304 column describing the data size after thinning, 1401 column describing the camera ID, 1402 column describing the left end of the trimming, 1403 column describing the right end of the trimming, 1404 column describing the upper end of the trimming, 1405 column describing the lower end of the trimming, 1406 column describing the angle at which the camera photographs the user, 1501 image before thinning process (trimmed photographed image),
1502 image after thinning process (transfer image), 1510 pixel constituting the image, 1511 pixel red information, 1512 pixel green information, 1513 pixel blue information, 1514 pixel transparency information,
1601 face display billboard, 1602 upper body display billboard, 1603 lower body display billboard, 1611 face image shooting area, 1612 upper body shooting area, 1620 lower body moving image corresponding to each action, 1701 action ID, 1702 angle information, 1703 file name of moving image file

Claims

A video presentation device for an image display apparatus that composites captured video of a user's body into a virtual space constructed on a computer in order to communicate with a remote place, the device having a function of sequentially transferring to the sending side, based on the three-dimensional positional relationship with the sender in the virtual space displayed on the receiving side, information on the required angle, resolution, frame rate, and body part of the video to be transmitted, and of switching the content of the transmitted video on the sending side based on the transferred information.

A video presentation device for an image display apparatus that composites captured video of a user's body into a virtual space constructed on a computer in order to communicate with a remote place, the device having a function of sequentially transferring to the sending side, based on the three-dimensional positional relationship with the sender in the virtual space displayed on the receiving side, information on the required angle, resolution, and frame rate, and of synthesizing the video transmitted based on that information with a video of a body part held locally by the receiving side and displaying the result in the virtual space.

A video presentation device for an image display apparatus that composites captured video of a user's body into a virtual space constructed on a computer in order to communicate with a remote place, the device having a function of sequentially transferring to the sending side, based on the three-dimensional positional relationship with the sender in the virtual space displayed on the receiving side, information on the required angle, resolution, and frame rate, and of synthesizing the video transmitted based on that information with a three-dimensional shape model of a body part held locally by the receiving side and displaying the result in the virtual space.