JP2004246667A

JP2004246667A - Method for generating free visual point moving image data and program for making computer perform the same processing

Info

Publication number: JP2004246667A
Application number: JP2003036577A
Authority: JP
Inventors: Hideo Saito; 英雄斎藤; Naho Inamoto; 奈穂稲本; Sachiko Iwase; 幸子岩瀬
Original assignee: Keio University
Current assignee: Keio University
Priority date: 2003-02-14
Filing date: 2003-02-14
Publication date: 2004-09-02

Abstract

PROBLEM TO BE SOLVED: To provide a free-viewpoint moving image data generation method that generates, on the receiving side, image data at an intermediate viewpoint for both stationary and moving subjects.

SOLUTION: The user of the receiving-side device enters the time information and viewpoint position of the image to be viewed using an input device 21. A CPU 23 transmits this information through a communication device 26 to the transmitting-side device. The CPU 14 of the transmitting-side device transmits, through a communication device 17 to the receiving-side device, the moving image data of the moving region captured by two cameras at the time indicated by the received time information, the structural feature information of that moving image data, and the correspondence information of the moving image data between the two cameras. Using the received moving image data and information, the receiving-side device generates intermediate-viewpoint image data of the moving region frame by frame, composites it with the previously received intermediate-viewpoint image data of the near-view still region and of the distant-view still region, and displays the result on a monitor 22.

COPYRIGHT: (C)2004,JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、３次元空間内の被写体の動画像を多視点から撮像し、伝送するシステムに関し、さらに詳細には、多視点における動画像データを送信側の装置に蓄積し、隣接する視点の間の任意の中間視点における動画像データを受信側の装置で生成する方法に関する。
【０００２】
【従来の技術】
従来、３次元空間内の被写体を多視点から撮像し、画像データを伝送するシステムとしては、特許文献１に記載されたものがある。このシステムは、送信側では、複数の異なる視点から被写体を撮像し、撮像された画像データに基づいて撮像された画像の構造データを作成し、その画像データおよび構造データを送信し、受信側では、新視点を指定し、その新視点における画像データを前記多視点における画像データおよび構造データから生成可能にしたものである。
【０００３】
【特許文献１】
特開２００１−８２３１号公報
【０００４】
【発明が解決しようとする課題】
しかし、前記特許文献１に記載されたシステムは、前記特許文献１の図２、図５等に示されているように、静止状態の被写体に対応するものであり、移動状態の被写体については言及されていない。
【０００５】
そこで、本発明は、静止状態の被写体および移動状態の被写体の双方について全ての隣接する視点の間の任意の中間視点における画像データを受信側で生成することができる自由視点動画像データ生成方法を提供することを目的とする。
【０００６】
また、本発明は、静止状態の被写体および移動状態の被写体の双方について全ての隣接する視点の間の任意の中間視点における画像データを受信側で生成することができ、かつ移動状態の被写体を追跡し、画面の中央等に表示することができる自由視点動画像データ生成方法を提供することを目的とする。
【０００７】
【課題を解決するための手段】
本発明の自由視点動画像データ生成方法は、送信側で３次元空間内の被写体を複数の異なる視点から撮像して動画像データを取得し、受信側で隣接視点の間の任意の中間視点における前記被写体の動画像データを生成する自由視点動画像データ生成方法であって、前記被写体を複数の異なる視点から撮像して取得した動画像データを移動領域と静止領域とに分割するステップと、全ての前記隣接視点間の射影幾何情報を推定するステップと、前記静止領域の画像データについて、前記射影幾何情報を基に全ての前記中間視点における自由視点静止画像データを生成するステップと、前記移動領域の動画像データについて、前記射影幾何情報を基に該動画像データの構造的特徴情報、および全ての前記隣接視点間の動画像データの対応関係情報を生成するステップと、全ての前記中間視点における自由視点静止画像データおよび全ての前記射影幾何情報を前記受信側へ予め送信するステップとを前記送信側で実行し、全ての前記中間視点における自由視点静止画像データおよび全ての前記射影幾何情報を予め受信して保存するステップと、前記隣接視点の間の任意の中間視点を選択し、前記送信側へ通知するステップとを前記受信側で実行し、前記受信側から通知された中間視点に対応する隣接視点における移動領域の動画像データ、該動画像データの構造的特徴情報、および前記隣接視点間の動画像データの対応関係情報を前記受信側へ送信するステップを前記送信側で実行し、前記移動領域の動画像データ、該動画像データの構造的特徴情報、および前記隣接視点間の動画像データの対応関係情報を受信し、前記任意の中間視点における前記移動領域の動画像データを生成するステップと、前記予め保存された任意の中間視点における自由視点静止画像データを読み出し、前記生成された任意の中間視点における移動領域の動画像データと合成して、前記任意の中間視点における動画像データを生成するステップとを前記受信側で実行することを特徴とする。
【０００８】
このように構成したことにより、送信側で、被写体を複数の異なる視点から撮像してそれぞれの視点における動画像データを取得し、隣接視点間の射影幾何情報を推定し、前記複数の異なる視点における動画像データを前記複数の異なる視点における静止領域の静止画像データおよび移動領域の動画像データに分離し、前記射影幾何情報を用いて前記隣接視点間の静止画像データの対応付けを行い、さらに前記隣接視点の間の任意の中間視点における静止画像データをモーフィングにより生成し、前記射影幾何情報を用いて前記隣接視点間の動画像データの対応付けを行い、さらに前記隣接視点における動画像データの構造的特徴情報および対応関係情報を生成し、全ての前記中間視点における自由視点静止画像データおよび全ての前記射影幾何情報を前記受信側へ予め送信し、受信側で、全ての前記中間視点における自由視点静止画像データおよび全ての前記射影幾何情報を予め受信して保存し、隣接視点の間の任意の中間視点を選択し、前記送信側へ通知し、送信側から送られてくる前記中間視点に対応する隣接視点における移動領域の動画像データ、該動画像データの構造的特徴情報、および前記隣接視点間の動画像データの対応関係情報と予め保存しておいた射影幾何情報とを用いて前記任意の中間視点における前記移動領域の動画像データをモーフィングにより生成し、前記予め保存しておいた中間視点における静止画像データと合成することで、静止状態の被写体および移動状態の被写体の双方の中間視点における動画像データを生成することができる。
【０００９】
また、本発明の自由視点動画像データ生成方法は、前記複数の異なる視点毎に前記移動領域内の任意の被写体を追跡対象として選択するステップと、前記選択された被写体の位置情報を保存するステップとを前記送信側で実行することを特徴とする。
【００１０】
このように構成したことにより、送信側で、被写体を複数の異なる視点から撮像してそれぞれの視点における動画像データを取得し、隣接視点間の射影幾何情報を推定し、前記複数の異なる視点における動画像データを前記複数の異なる視点における静止領域の静止画像データおよび移動領域の動画像データに分離し、前記射影幾何情報を用いて前記隣接視点間の静止画像データの対応付けを行い、さらに前記隣接視点の間の任意の中間視点における静止画像データをモーフィングにより生成し、前記移動領域の任意の被写体を追跡対象として選択し、前記射影幾何情報を用いて前記隣接視点間の追跡対象の被写体の動画像データの対応付けを行い、さらに前記隣接視点における追跡対象の被写体の動画像データの構造的特徴情報および対応関係情報を生成し、全ての前記中間視点における自由視点静止画像データおよび全ての前記射影幾何情報を前記受信側へ予め送信し、受信側で、全ての前記中間視点における自由視点静止画像データおよび全ての前記射影幾何情報を予め受信して保存し、隣接視点の間の任意の中間視点を選択して前記送信側へ通知し、送信側から送られてくる前記中間視点に対応する隣接視点における追跡対象の被写体の動画像データ、該動画像データの構造的特徴情報、および前記隣接視点間の動画像データの対応関係情報と予め保存しておいた射影幾何情報とを用いて前記任意の中間視点における前記追跡対象の被写体の動画像データをモーフィングにより生成し、前記予め保存しておいた中間視点における静止画像データと合成することで、静止状態の被写体および追跡対象の被写体の双方の中間視点における動画像データを生成し、かつ移動状態の被写体を追跡し、画面の中央等に表示することができる。
【００１１】
【発明の実施の形態】
以下、本発明の実施の形態について図面を用いて説明する。
【００１２】
（第１の実施の形態）
図１は、本発明の第１の実施の形態の自由視点動画像データ生成システムのブロック図である。ここで、（ａ）は送信側装置、（ｂ）は受信側装置である。
送信側装置は、それぞれがバス１８に接続された、ｎ台（ｎは２以上の整数）のカメラ１１_１，１１_２，…１１_ｎと、入力装置１２と、モニタ１３と、ＣＰＵ１４と、メインメモリ１５と、ディスクメモリ１６と、通信装置１７とを具備する。
【００１３】
カメラ１１_１，１１_２，…１１_ｎは、それぞれが被写体を異なる視点で撮像し、動画像データを生成する。ここで、カメラ１１_１，１１_２，１１_ｎは、撮像を行っている間中、パン、チルト、ズームイン、ズームアウトのいずれも行わず、固定されている。入力装置１２は、マウス、キーボード等であり、ユーザの指令等の入力に使用される。モニタ１３は、液晶ディスプレイ等からなり、入力装置１２から入力されたデータ、カメラ１１_１，１１_２，…１１_ｎで撮像された動画像データ、メインメモリ１５から読み出された画像データ、ディスクメモリ１６から読み出された画像データ等を表示する。ＣＰＵ１４は、マイクロプロセッサを備えており、メインメモリ１５に格納されたプログラムに従って各種処理を実行する。メインメモリ１５は、ＲＯＭおよびＲＡＭからなり、ＣＰＵ１４が各種処理を実行するときに用いるプログラムが格納される。また、カメラ１１_１，１１_２，…１１_ｎで撮像された画像データ、ディスクメモリ１６から読み出された画像データ等が一時的に格納される。ディスクメモリ１６は、ハードディスク装置等からなり、カメラ１１_１，１１_２，…１１_ｎで撮像された動画像データが格納される。また、カメラ１１_１，１１_２，１１_ｎで撮像された動画像の位置的な対応関係を表すデータ（詳細は後述）が格納される。通信装置１７は、受信側装置の通信装置との間でデータの通信を行う。
【００１４】
受信側装置は、それぞれがバス２７に接続された、入力装置２１と、モニタ２２と、ＣＰＵ２３と、メインメモリ２４と、ディスクメモリ２５と、通信装置２６とを具備する。
【００１５】
入力装置２１は、マウス、キーボード等であり、ユーザの指令等の入力に使用される。モニタ２２は、液晶ディスプレイ等からなり、入力装置２１から入力されたデータ、メインメモリ２４から読み出された画像データ、ディスクメモリ２５から読み出された画像データ等が表示される。ＣＰＵ２３は、マイクロプロセッサを備えており、メインメモリ２４に格納されたプログラムに従って各種処理を実行する。メインメモリ２４は、ＲＯＭおよびＲＡＭからなり、ＣＰＵ２３が各種処理を実行するときに用いるプログラムが格納される。また、ディスクメモリ２５から読み出された画像データ等が一時的に格納される。ディスクメモリ２５は、ハードディスク装置からなり、送信側装置から送られてきた動画像データが格納される。また、送信側装置から送られてきた、カメラ１１_１，１１_２，…１１_ｎで撮像された動画像の位置的な対応関係を表すデータ（詳細は後述）が格納される。通信装置２６は、送信側装置の通信装置１７との間でデータの通信を行う。ここで、送信側装置の通信装置１７と受信側装置の通信装置２６との間の通信媒体は、インターネット、ＬＡＮ、地上波放送網（ユーザから放送局へリクエストデータ等の送信が可能なデジタル放送）等である。
【００１６】
以上のように構成された自由視点動画像データ生成システムの動作を説明する。まず、概要を説明する。ここでは、被写体としてサッカーの試合を行っている競技場、選手、およびボールとする。そして、図２に示されているように、４台のカメラ１１_１，１１_２，１１３，１１４を観客席上方に配置する。また、図３（ａ）に示されているような被写体を、グラウンドおよびゴール（図３（ｂ））と、観客席（図３（ｃ））と、選手およびボール（図３（ｄ））とに分ける。画像の特徴としては、グラウンドおよびゴールは近景の静止領域、観客席は遠景の静止領域、選手およびボールは移動領域となる。そして、近景の静止領域であるグラウンドおよびゴールには平面射影行列（Ｈｏｍｏｇｒａｐｈｙｍａｔｒｉｘ）を適用して隣接する視点から得られた画像間の対応点を算出し、隣接する視点の間の任意の中間視点である仮想視点における補間画像を生成する。ここで、中間視点とは２つの視点の間の視点を意味するものであり、２つの視点の中央の視点を意味するものではない。移動領域である選手およびボールにはエピポーラ幾何を適用し、隣接する視点から得られた画像間の対応点を算出し、仮想視点における補間画像を生成する。遠景の静止領域である観客席については、モザイク処理を行って２視点における画像を連結し、生成されたパノラマ画像から仮想視点における画像を切り出す。最後に、３種類の画像を合成することで、仮想視点における被写体全体の動画像を生成する。
【００１７】
次に、近景の静止領域、遠景の静止領域、移動領域の順序で説明する。なお、近景の静止領域と移動領域との分離は、画像データの背景差分および２値化により行う（詳細は後述）。
【００１８】
〔１〕近景の静止領域
近景の静止領域では、平面射影行列を求め、その平面射影行列を用いて、異なる視点で得られた画像間の対応点を算出し、仮想視点における補間画像を生成する。
【００１９】
まず、図４を参照しながら、平面射影行列について説明する。互いの視点が異なる２つのカメラＣ_１，Ｃ_２で、３次元空間内の被写体を撮像する。ここで、透視投影によるカメラＣ_１の投影面Ｉ_１上に固定したカメラ座標系を（ｘ_１，ｙ_１）、カメラＣ_２の投影面Ｉ_２上に固定したカメラ座標系を（ｘ_２，ｙ_２）とする。このとき、３次元空間内の平面Ｊ上の点Ｐについて、式［１］が成立する。この式における行列は平面射影行列と呼ばれる。なお、透視投影については、例えば、”末松良一他著「画像処理工学」、ｐｐ．１８２−１８４、２０００−１０−２６、（株）コロナ社”に詳細に記載されているので、説明を省略する。
【００２０】
【数１】

平面Ｊ上の対応のとれている４点を用いて、平面射影行列の各要素を求めることができる。そして、この平面射影行列を用いると、カメラＣ_１の投影面Ｉ_１上で平面Ｊ上の任意の１点の（ｘ_１，ｙ_１）座標を与えることにより、カメラＣ_２の投影面Ｉ_２上の対応点の（ｘ_２，ｙ_２）座標を算出することができる。したがって、平面Ｊ上の全ての点について、（ｘ_１，ｙ_１）座標と対応する（ｘ_２，ｙ_２）座標の値を算出することができる。本実施の形態では、図３（ｂ）に示したグラウンドを１枚の平面とし、ゴールを複数枚（例えば４枚）の平面として、それぞれの平面毎に平面射影行列を求める。
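The four-point estimation and point transfer described in this paragraph can be sketched as follows. This is a minimal sketch, not the patent's implementation: equation [1] is missing from this text, so the usual homography relation (x_2, y_2, 1)^T ~ H (x_1, y_1, 1)^T is assumed, with the last element of H fixed to 1. The function names `solve_linear`, `estimate_homography`, and `apply_homography` are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def estimate_homography(src, dst):
    """Estimate H from four point pairs (x1, y1) -> (x2, y2) on one plane.

    Assumption: h33 is fixed to 1, which leaves eight unknowns and
    two linear equations per point pair.
    """
    a, b = [], []
    for (x1, y1), (x2, y2) in zip(src, dst):
        a.append([x1, y1, 1, 0, 0, 0, -x2 * x1, -x2 * y1])
        b.append(x2)
        a.append([0, 0, 0, x1, y1, 1, -y2 * x1, -y2 * y1])
        b.append(y2)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_homography(h, x1, y1):
    """Map a point on projection plane I_1 to its corresponding point on I_2."""
    w = h[2][0] * x1 + h[2][1] * y1 + h[2][2]
    x2 = (h[0][0] * x1 + h[0][1] * y1 + h[0][2]) / w
    y2 = (h[1][0] * x1 + h[1][1] * y1 + h[1][2]) / w
    return x2, y2
```

In the embodiment one such matrix would be estimated per plane (one for the ground and several for the goal), each from its own set of four or more correspondences.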
【００２１】
次に、図５を参照しながら、カメラＣ_１の視点とカメラＣ_２の視点の間の任意の仮想視点における補間画像の生成方法について説明する。図５において、カメラＣ_１の視点とカメラＣ_２の視点の間の仮想視点で撮像を行う仮想カメラＣ_１２が撮像した画像データを生成する。本実施の形態では、線形補間によるモーフィングを行う。カメラＣ_１の視点、カメラＣ_２の視点から仮想カメラＣ_１２までの距離の比をα：１−α（ただし、０≦α≦１）、仮想カメラＣ_１２の投影面Ｉ_１２上に固定したカメラ座標系を（ｘ_１２，ｙ_１２）とすると、投影面Ｉ_１２上の点の座標は下記の式［２］で表すことが出来る。
【００２２】
Ｐ_１２＝（１−α）Ｐ_１＋αＰ_２…式［２］
【００２３】
ここで、Ｐ_１２、Ｐ_１、Ｐ_２は、それぞれ、投影面Ｉ_１２上、投影面Ｉ_１上、投影面Ｉ_２上の対応点の座標の位置ベクトルを表す。α＝０の場合はカメラＣ_１と同一視点、α＝１の場合はカメラＣ_２と同一視点、α＝０．５の場合はカメラＣ_１とカメラＣ_２の中央の視点となる。本実施の形態では、カメラＣ_１の視点からのモーフィング（図５のワープ１）とカメラＣ_２の視点からのモーフィング（図５のワープ２）の２通りのモーフィングを行って２つの異なる補間画像を生成し、それらに下記の式［３］を適用して仮想視点における画像データを生成する。
【００２４】
【数２】

【００２５】
この式において、ｖ_１、ｖ_２は、それぞれ投影面Ｉ_１上、投影面Ｉ_２上の画像データの明度値であり、ｖ’は投影面Ｉ_１２上の画像データの明度値である。
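The interpolation of equation [2] and the intensity blend can be sketched as below. Equation [3] is missing from this text, so the blend shown here, v' = (1 - α) v_1 + α v_2, is an assumption modelled on equation [2] and not the confirmed formula. The function names are hypothetical.

```python
def interpolate_point(p1, p2, alpha):
    """Equation [2]: P_12 = (1 - alpha) * P_1 + alpha * P_2."""
    return tuple((1 - alpha) * a + alpha * b for a, b in zip(p1, p2))


def blend_intensity(v1, v2, alpha):
    """Assumed form of equation [3]: weight the two warped images
    by the proximity of the virtual viewpoint to each camera."""
    return (1 - alpha) * v1 + alpha * v2
```

With alpha = 0 both functions return the values of camera C_1, and with alpha = 1 those of camera C_2, which matches the description of the endpoints in the text.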
【００２６】
〔２〕遠景の静止領域
本実施の形態では、観客席のように、カメラからの距離が十分に遠く、それ自体の凹凸が無視できるような領域を遠景領域としているため、１枚の無限遠に存在する平面で近似する。また、遠景領域の場合、隣接する２つの視点に共通する領域が少ないため、近景の静止領域と同様のモーフィング処理を行うと計算効率が悪いので、平面近似した２つの視点における画像をモザイク処理により連結し、生成されたパノラマ画像から仮想視点における画像を切り出す。
【００２７】
図６を参照しながら、モザイク処理について説明する。最初に、第１視点を有するカメラの投影面Ｉ_１上の背景画像の画像データと第２視点を有するカメラの投影面Ｉ_２上の背景画像の画像データとの間の平面射影行列Ｈ_２１を求め、その平面射影行列Ｈ_２１を用いて、２つの視点における画像データの座標系を統一する。次に、２つの画像データを連結し、パノラマ画像データを生成する。図６における重複エリアは、平面射影行列Ｈ_２１が決まると自動的に決まる。パノラマ画像データを生成するときに、重複エリアの明度を平滑化するが、単純に２つの画像データの明度の平均値を用いると、画像データ間の明るさの違いにより不自然なつなぎ目が出来てしまうので、下記の式［４］に示すように、重複エリアの境界からの距離に応じて各画素の明度値に重み付けをして混合する。
【００２８】
【数３】

【００２９】
この式において、ｕ_１、ｕ_２は、それぞれ投影面Ｉ_１上の背景画像の画像データ、投影面Ｉ_２上の背景画像の画像データの明度値であり、ｕ’はパノラマ画像の画像データの明度値である。また、ｘ_Ｌ、ｘ_Ｒは、それぞれ重複部分の左端および右端のｘ座標であり（図６参照）、β＝（ｘ−ｘ_Ｌ）／（ｘ_Ｒ −ｘ_Ｌ）である。
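The distance-weighted blending in the overlap area can be sketched as follows. Equation [4] is missing from this text; the sketch assumes u' = (1 - β) u_1 + β u_2 with β defined as above, and it assumes that the image on I_1 lies on the left side of the overlap. The function name is hypothetical.

```python
def blend_overlap(u1, u2, x, x_left, x_right):
    """Blend two intensity values inside the overlap area.

    beta runs from 0 at the left edge of the overlap to 1 at the right
    edge, so the seam fades gradually from image 1 to image 2
    (assumed form of equation [4]).
    """
    beta = (x - x_left) / (x_right - x_left)
    return (1 - beta) * u1 + beta * u2
```

Compared with a plain average, this weighting makes each image dominate near its own side of the overlap, which is what removes the visible seam described in the text.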
【００３０】
次に、パノラマ画像より、中間視点画像に必要な背景画像データを切り出し、基準となる第１視点から各中間視点への射影変換を行う。変換のための平面射影行列は下記の式［５］で定義される。この平面射影行列Ｈ’を用いて座標を変換することにより、背景の中間視点画像データが得られる。
【００３１】
Ｈ’＝（１−γ）Ｅ＋γＨ_２１ ^−１ …式［５］
【００３２】
ここで、γ（０≦γ≦１）は中間視点の位置を定めるパラメータ、Ｅは３×３の単位行列である。
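Equation [5] can be evaluated directly, as in the sketch below. It assumes that H_21 is given as a 3×3 nested list and is invertible; the helper names `invert_3x3` and `intermediate_homography` are hypothetical.

```python
def invert_3x3(m):
    """Invert a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    adjugate = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[value / det for value in row] for row in adjugate]


def intermediate_homography(h21, gamma):
    """Equation [5]: H' = (1 - gamma) * E + gamma * inverse(H_21)."""
    h_inv = invert_3x3(h21)
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    return [
        [(1 - gamma) * identity[r][c] + gamma * h_inv[r][c] for c in range(3)]
        for r in range(3)
    ]
```

For gamma = 0 the result is the identity, so the background is seen from the first viewpoint unchanged.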
【００３３】
〔３〕移動領域
移動領域である選手およびボールにはエピポーラ幾何を適用し、異なる視点で得られた画像間の対応点を算出し、仮想視点における補間画像を生成する。
まず、図７を参照しながら、エピポーラ幾何について説明する。互いの視点が異なる２つのカメラＣ_１，Ｃ_２で、３次元空間内の被写体を撮像する。ここで、透視投影によるカメラＣ_１の投影面Ｉ_１上に固定したカメラ座標系を（ｘ_１，ｙ_１）、カメラＣ_２の投影面Ｉ_２上に固定したカメラ座標系を（ｘ_２，ｙ_２）とする。このとき、投影面Ｉ_１、投影面Ｉ_２間で対応のとれている点について、式［６］が成立する。
【００３４】
【数４】

【００３５】
この式における行列はファンダメンタル・マトリックス（以下、Ｆ−マトリックス）と呼ばれ、２台のカメラの相対的な位置や姿勢の情報を含んでいる。投影面Ｉ_１、投影面Ｉ_２間で対応のとれている７点以上の点から、Ｆ−マトリックスを算出することができる。そして、このＦ−マトリックスを用いると、カメラＣ_１の投影面Ｉ_１上で任意の１点に対応するカメラＣ_２の投影面Ｉ_２上の対応点の探索範囲を狭めることができる。投影面Ｉ_１上の１点ｑ_１（ｘ_１，ｙ_１）を与えると、投影面Ｉ_２上にエピポーラ線ａｘ＋ｂｙ＋ｃ＝０を投影することができる。ここで、ａ、ｂ、ｃは下記の式［７］より求めることができる。
【００３６】
【数５】

【００３７】
このとき、ｑ_１（ｘ_１，ｙ_１）に対応する投影面Ｉ_２上の点ｑ_２（ｘ_２，ｙ_２）は、必ずエピポーラ線上に存在する。したがって、対応点の探索はエピポーラ線上のみ行えば良いことになり、探索が容易となる。図において、点Ｄ１、点Ｄ２は、それぞれカメラＣ_１、Ｃ_２のレンズの中心（＝視点）であり、エピポーラ線は、点Ｑ、Ｄ１、Ｄ２を通る平面（エピポーラ平面）と、投影面Ｉ_１、Ｉ_２との交線である。
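The epipolar line computation can be sketched as below. Equations [6] and [7] are missing from this text, so the usual convention is assumed: (x_2, y_2, 1) F (x_1, y_1, 1)^T = 0 and (a, b, c)^T = F (x_1, y_1, 1)^T. The function names are hypothetical.

```python
def epipolar_line(f, x1, y1):
    """Return (a, b, c) of the line a*x + b*y + c = 0 on plane I_2
    for the point q_1 = (x1, y1) on plane I_1 (assumed equation [7])."""
    q1 = (x1, y1, 1.0)
    return tuple(sum(f[r][c] * q1[c] for c in range(3)) for r in range(3))


def epipolar_residual(line, x2, y2):
    """Value of a*x2 + b*y2 + c; zero when q_2 lies exactly on the line."""
    a, b, c = line
    return a * x2 + b * y2 + c
```

A correspondence search then only needs to test candidate points whose residual is close to zero, which is the narrowing of the search range described above.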
【００３８】
このようにして、投影面Ｉ_１、投影面Ｉ_２間の画像データの対応点を求めた後、カメラＣ_１の視点とカメラＣ_２の視点の間の任意の仮想視点における補間画像の生成を行う。この補間画像生成の方法は、近景の静止領域と同じである。
以上、自由視点動画像データ生成システムの動作の概要を説明した。次に、システムの動作をさらに詳しく説明する。
【００３９】
本システムの動作時の処理には、送信側装置で実行されるオフライン処理と、受信側装置で実行されるオンライン処理とがある。以下、オフライン処理、オンライン処理の順に説明する。
【００４０】
図８はオフライン処理のフローチャートである。最初に、多視点画像の入力を行う（ステップＡ１）。即ち、カメラ１１_１〜１１_ｎで被写体を撮像して取得した所望の時間分の動画像データをＣＰＵ１４によりデータ圧縮し、ディスクメモリ１６に格納する。
【００４１】
次に、ディスクメモリ１６に格納されたカメラ１１_１〜１１_ｎの動画像データから、隣り合うカメラ間の画像データの射影幾何の推定を行う（ステップＡ２）。具体的には、まずＣＰＵ１４により、ディスクメモリ１６から、カメラ１１_１〜１１_ｎの互いに同じ撮像時刻の１フレームの画像データを読み出し、データ伸長してメインメモリ１５に書き込む。次に、隣り合うカメラの１フレームの画像データをメインメモリ１５から読み出し、モニタ１３に表示しながら、操作者が入力装置１２から対応点の入力を行う。前述したとおり、本実施の形態では、画像を近景の静止領域、遠景の静止領域、および移動領域の３つに分割しており、近景の静止領域では平面射影行列、移動領域ではＦ−マトリックスを用いて対応点を求めるので、ここでは、隣り合うカメラで撮像された１フレーム同士を比較し、近景の静止領域であるグラウンドを構成する１枚の平面とゴールを構成する複数枚の平面について、それぞれ４つ以上の対応点を入力する。また、隣り合うカメラ間のＦ−マトリックスを算出するために、７つ以上の対応点を入力する。ＣＰＵ１４は、入力装置１２から入力された対応点のデータを用い、式［１］、［６］により平面射影行列およびＦ−マトリックスを算出する。この平面射影行列およびＦ−マトリックスは、ディスクメモリ１６に保存される。
【００４２】
次に、ＣＰＵ１４は、ディスクメモリ１６に格納されたカメラ１１_１〜１１_ｎの動画像データを移動領域と静止領域とに分離する（ステップＡ３）。本実施の形態では、各カメラで撮像された動画像データについてカメラ毎に背景差分をとり、２値化することで、全移動領域が抽出されたシルエット画像を生成するとともに、移動領域以外を静止領域とする。ここで、明度データだけでなく、ＲＧＢ成分をも考慮することで、シルエットをより正確に抽出することができる。移動領域の画像データおよび静止領域の画像データはメインメモリ１５およびディスクメモリ１６に記憶される。また、カメラ１１_１〜１１_ｎは固定されているため、静止領域は、近景、遠景ともに全フレーム同じ画像データとなるため、１フレーム分についてのみ記憶すればよい。図９は移動領域を抽出した例を示す。この図において、（ａ）、（ｂ）は異なる視点から撮像された画像データから抽出された移動領域のシルエット画像である。
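Background subtraction followed by binarization can be sketched as follows. The text does not specify how the RGB components are combined or what threshold is used, so the rule below (largest per-channel difference compared with a fixed threshold) and the name `silhouette` are assumptions for illustration only.

```python
def silhouette(frame, background, threshold):
    """Return a binary mask: 1 for moving-region pixels, 0 for still-region pixels.

    frame and background are 2-D lists of (R, G, B) tuples of equal size.
    A pixel is marked as moving when any channel differs from the
    background by more than the threshold (assumed rule).
    """
    return [
        [
            1 if max(abs(p - q) for p, q in zip(pixel, bg_pixel)) > threshold else 0
            for pixel, bg_pixel in zip(frame_row, bg_row)
        ]
        for frame_row, bg_row in zip(frame, background)
    ]
```

Because the cameras are fixed, the same background image can serve for every frame of a given camera.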
【００４３】
ステップＡ３で生成されたシルエット画像には、多くの場合、選手数人のシルエットと、ボールのシルエットといった複数のシルエットが混在する。そこで、ステップＡ４において、ラベリング処理を施してシルエットを切り離して個々の選手とボールに分割した後、２視点間でシルエットの対応付けを行う。なお、複数の選手が重なって見える場合（オクルージョン発生）には、正しく分割されているカメラの画像データを参照し、平面射影幾何を用いてシルエットの分割を行う。選手については、まずグラウンドの平面射影行列を用いて対応付けを行う。これは、選手の足がグラウンドに接しているという条件を用いたもので、選手の領域の最下部の点がグラウンドの平面射影行列によって、対応付けられる。選手がジャンプしている状態であっても、それによって生じる誤差は十分に小さいと考える。一方、ボールに関しては、ラベルの面積の一致によって対応付けを行う。隣り合う全てのカメラ間の全てのシルエットについて、画像の構造的特徴情報であるシルエット画像、シルエット画像のラベル番号、ならびにラベルの特徴量（ラベルの重心、選手の足元座標等）、および２つのカメラ間の対応関係情報であるラベル番号対応テーブルをディスクメモリ１６に保存する。
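The labeling step and the label features named above (area, centroid, foot coordinates) can be sketched as below. The text does not state the connectivity or the algorithm, so 4-connectivity with a flood fill is an assumption, as are the function name and the dictionary keys.

```python
def label_silhouettes(mask):
    """Split a binary silhouette mask into labeled regions.

    Returns (labels, features). labels has the same shape as mask, with
    0 for background and 1, 2, ... for each region. features maps each
    label to its area, centroid, and lowest pixel (used as the foot point).
    """
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]
    features = {}
    next_label = 1
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or labels[y0][x0]:
                continue
            labels[y0][x0] = next_label
            stack, pixels = [(x0, y0)], []
            while stack:
                x, y = stack.pop()
                pixels.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and mask[ny][nx] and not labels[ny][nx]:
                        labels[ny][nx] = next_label
                        stack.append((nx, ny))
            area = len(pixels)
            features[next_label] = {
                "area": area,
                "centroid": (
                    sum(p[0] for p in pixels) / area,
                    sum(p[1] for p in pixels) / area,
                ),
                # Image y grows downward, so the largest y is the lowest point.
                "foot": max(pixels, key=lambda p: p[1]),
            }
            next_label += 1
    return labels, features
```

The foot point of each player label would then be transferred to the adjacent view with the ground homography to find the matching label, while ball labels would be matched by area.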
【００４４】
次に、静止領域に関する中間視点画像データを生成し、ディスクメモリ１６に保存する（ステップＡ５）。このとき、操作者は入力装置１２を用いて、隣り合うカメラ毎にその間の任意の中間視点を入力する。ＣＰＵ１４は、メインメモリ１５から静止領域であるグラウンド、ゴール、および観客席の画像データを読み出し、入力された中間視点から仮想的に撮像されたグラウンドおよびゴールの画像データと、観客席の画像データとを生成する。前述したとおり、グラウンドおよびゴールについては、平面射影行列を用いて、隣り合うカメラで撮像された画像データ間の対応点を求め（図４、式［１］）、隣り合うカメラの間の任意の中間視点における画像データを補間して生成する（図５、式［３］）。また、観客席については、モザイク処理と切り出しにより、中間視点における画像データを生成する（図６、式［４］）。
【００４５】
以上がオフライン処理である。次に、オンライン処理について、図１０のフローチャートを参照しながら説明する。なお、オンライン処理を実行する前提として、送信側装置から受信側装置に対して、カメラ１１_１〜１１_ｎの配置情報、射影幾何情報（平面射影行列およびＦ−マトリックス）、および中間視点の位置情報が伝送され、受信側装置のディスクメモリ２５に記憶されているものとする。また、近景の静止領域であるグラウンドならびにゴールの全中間視点における画像データ、および遠景の静止領域である観客席の全中間視点における画像データが、送信側装置から受信側装置へ伝送され、受信側装置のディスクメモリ２５に記憶されているものとする。さらに、撮像時刻情報（開始時刻、終了時刻）が送信側装置から受信側装置へ伝送され、受信側装置のディスクメモリ２５に記憶されているものとする。
【００４６】
オンライン処理がスタートすると、ＣＰＵ２３は、カメラ１１_１〜１１_ｎの配置情報、中間視点の位置情報、および撮像時刻情報をディスクメモリ２５から読み出し、モニタ２２に表示する。この状態において、受信側装置のユーザは、鑑賞したい時間情報および視点位置を入力装置２１を用いて入力する（ステップＢ１）。ここで、時間情報として、例えば撮像開始時刻から起算した時分秒フレームを用いることができる。視点位置は、例えば２つの視点を両端とするスライドバーにより入力することができる。ＣＰＵ２３は、時間情報および視点位置情報をメインメモリ２４に記憶すると共に、通信装置２６経由で送信側装置へ送信する（ステップＢ２）。
【００４７】
送信側装置では、時間情報および視点位置情報が通信装置１７で受信され、ＣＰＵ１４へ送られる。視点位置は隣り合う２つのカメラの間に存在するので、ＣＰＵ１４は、その時間情報が示す時刻にその２つのカメラで撮像された移動領域の動画像データ、その動画像データ構造的特徴情報であるシルエット画像、シルエット画像のラベル番号、ラベルの特徴量（ラベルの重心、選手の足元座標等）、および２つのカメラ間の対応関係情報であるラベル番号対応テーブルをディスクメモリ１６から読み出し、通信装置１７経由で受信側装置へ送信する（ステップＢ３）。
【００４８】
受信側装置では、移動領域の動画像データ、シルエット画像、シルエット画像のラベル番号、ラベルの特徴量、およびラベル番号対応テーブルが通信装置２６で受信され、ディスクメモリ２５に保存される。ＣＰＵ２３は、ディスクメモリ２５に保存された各フレームの移動領域の動画像データ、シルエット画像、シルエット画像のラベル番号、ラベルの特徴量、ラベル番号対応テーブル、および中間視点の位置情報を用いて、フレーム毎に順次移動領域である選手とボールの中間視点画像データを生成し、ディスクメモリ２５に記憶する（ステップＢ４）。ここで、移動領域の動画像データはシルエット画像の色情報を付与するために用いられる。以下、ステップＢ４について詳しく説明する。送信側から送られてきたシルエット画像とラベル番号とラベル番号対応テーブルとにより、隣接視点間のシルエット画像が対応付けられる。次に対応のとれたシルエット画像に対してエピポーラ線を投影して対応点を算出する。図１１は、選手のシルエット画像の対応付けの手順を説明するための図である。ここで、（ａ）はある１つのカメラで撮像された選手のシルエット画像であり、（ｂ）はその右隣のカメラで撮像された選手のシルエット画像である。ここでは、（ａ）、（ｂ）それぞれに３本ずつのエピポーラ線が投影されている。各エピポーラ線において、まずシルエットの両端との交点（図１１ではａ_１とａ_２、ｂ_１とｂ_２、ａ_３とａ_４、ｂ_３とｂ_４、ａ_５とａ_６、ｂ_５とｂ_６）の対応をとり、続いてシルエットの内部に関して交点の線形補間によって対応付けを行う。シルエットの上端から下端にかけて、エピポーラ線を順に投影してゆくことで、シルエット全体の対応点情報を得ることができる。エピポーラ線の間隔を狭くする程、密度の高い対応点情報を得ることができる。このようにして、シルエット全体の対応点情報を取得した後、式［２］を用いて中間視点へのモーフィングを行って、移動領域の中間視点画像データを生成し、ディスクメモリ２５に保存する。
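The matching along one pair of corresponding epipolar lines, followed by the morph of equation [2], can be sketched as below. It assumes the intersections of the epipolar line with the two edges of the silhouette are already known in both views; the function names and the `samples` parameter are hypothetical.

```python
def lerp(p, q, t):
    """Linear interpolation between two 2-D points."""
    return tuple((1 - t) * a + t * b for a, b in zip(p, q))


def match_along_epipolar(a_start, a_end, b_start, b_end, samples):
    """Pair points inside the silhouette along corresponding epipolar lines.

    a_start/a_end are the silhouette edge intersections in one view and
    b_start/b_end those in the adjacent view. Interior points are matched
    by linear interpolation between the edge intersections.
    samples must be at least 2.
    """
    pairs = []
    for k in range(samples):
        t = k / (samples - 1)
        pairs.append((lerp(a_start, a_end, t), lerp(b_start, b_end, t)))
    return pairs


def morph_pairs(pairs, alpha):
    """Equation [2] applied to every matched pair."""
    return [lerp(pa, pb, alpha) for pa, pb in pairs]
```

Repeating this for epipolar lines swept from the top to the bottom of the silhouette yields correspondences for the whole silhouette; a smaller line spacing gives a denser set.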
【００４９】
最後に、近景の静止領域であるグラウンドならびにゴールの中間視点における画像データ、および遠景の静止領域である観客席の中間視点における画像データをディスクメモリ２５から読み出し、ディスクメモリ２５から順次読み出したフレーム毎の移動領域の中間視点画像データと合成し、モニタ２２にて表示する（ステップＢ５）。
【００５０】
このように、本発明の第１の実施の形態によれば、静止状態の被写体および移動状態の被写体の双方の中間視点における画像データを受信側で生成することができる。また、静止領域については中間視点における画像データを予めまとめて作成しておき、各フレームでは移動領域の中間視点における画像データを作成し、静止領域の画像データと合成するので、静止領域についても各フレームで画像データを生成する場合と比較すると、１フレーム当たりの処理時間を大幅に短縮することができる。
【００５１】
なお、以上の説明では、中間視点の移動は左右方向に限られていたが、中間視点を前後方向に移動させる（ズーム等）ことも可能である。
【００５２】
（第２の実施の形態）
本発明の第２の実施の形態は、第１の実施の形態において、選手を追跡し、画面の中央等に表示できるように構成した点が特徴である。
【００５３】
本発明の第２の実施の形態は、オフライン処理のみが第１の実施の形態と異なり、送信側および受信側の装置構成、およびオンライン処理は第１の実施と同じである。ただし、あるカメラの画像で選手が重なって見える（オクルージョン）場合にも確実に選手を追跡できるようにするため、図１２に示すように、８台のカメラ１１_１〜１１_８を４台ずつ２群に分け、両サイド側の観客席上方に配置することが好適である。
【００５４】
次に、本実施の形態におけるオフライン処理について説明する。本実施の形態におけるオフライン処理の流れは、ステップＡ３、Ａ４以外は図８に示した第１の実施の形態と同じであるから、その異なる部分について説明する。
【００５５】
図１３は、本実施の形態における移動領域に関する領域分割処理を示すフローチャートである。ここで、（ａ）は各カメラ毎に共通のカメラ内処理であり、（ｂ）はあるカメラの画像では選手の位置が分からないときに、他のカメラの画像を用いて位置情報を取得するためのカメラ間処理である。以下、カメラ内処理、カメラ間処理の順に説明する。
【００５６】
カメラ内処理では、まず、前処理を実行する（ステップＥ１）。具体的には、各カメラで撮像された動画像データについてカメラ毎に背景差分をとり、２値化することで、全移動領域が抽出されたシルエット画像を生成するとともに、移動領域以外を静止領域とする。次に、移動領域のラベリングを行う。この時、特徴量として各ラベルの重心および面積を求めておく。
【００５７】
次に、選手候補領域を選択する（ステップＥ２）。具体的には、前フレームにおける追跡する選手の位置をもとに、現フレームにおいて前処理で抽出された選手のシルエットの中から、追跡する選手候補のシルエットを求める。前フレームにおいて選手が画角内にいた場合には、前フレームで選択された選手のシルエットからの移動距離が最小となるシルエットを選択する。前フレームにおいて選手が画角外にいた場合には、選手候補の選択は行わず、カメラ間処理を用いて選手の位置を求める。
【００５８】
次いで、求められた選手候補のシルエットが他の選手と重なっていないか否かの判定（オクルージョン判定）を行う（ステップＥ３）。この判定は、前フレームと現フレームとで、選手候補のシルエットの面積、および追跡する選手の周りにいる選手の人数を比較することで行う。例えば、前フレームに比較して、現フレームにおいて選手候補シルエットの面積が増加し、かつその周りのラベル数が減少した場合は、現フレームでオクルージョンが発生したと判定する。オクルージョンが発生していないと判定されたカメラの画像データについては、カメラ内処理のみで選手の追跡ができているとし、求められた選手の位置情報を保持する。オクルージョンが発生していると判定されたカメラについては、選手が画角外にいた場合と同様、カメラ間処理により選手の位置を求める。
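Steps E2 and E3 can be sketched as follows. The candidate rule (smallest movement from the previous frame) and the occlusion rule (area grows while the number of surrounding labels falls) follow the text; the function names and data layout are assumptions.

```python
import math


def select_candidate(prev_position, centroids):
    """Step E2: return the label whose centroid is nearest to the
    tracked player's position in the previous frame.

    centroids maps label numbers to (x, y) centroids in the current frame.
    """
    return min(
        centroids,
        key=lambda label: math.hypot(
            centroids[label][0] - prev_position[0],
            centroids[label][1] - prev_position[1],
        ),
    )


def occlusion_detected(prev_area, cur_area, prev_label_count, cur_label_count):
    """Step E3: the candidate's area increased and the number of labels
    around it decreased, which suggests that two silhouettes have merged."""
    return cur_area > prev_area and cur_label_count < prev_label_count
```

When `occlusion_detected` returns True, or when the player was outside the field of view in the previous frame, the position would instead be obtained by the inter-camera processing.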
【００５９】
カメラ間処理では、まず選手位置の推定を行う（ステップＦ１）。エピポーラ幾何により、あるカメラで撮像された画像の画素から、他のカメラで撮像された画像の対応画素へエピポーラ線を引くことができる。よって、図１４に示されているように、２つのカメラの投影面Ｇ_１、Ｇ_２において、カメラ内処理で選手の位置（ラベルの重心）がそれぞれＧ_１１、Ｇ_２１として求められていれば、オクルージョンの発生によりカメラ間処理で選手の位置が求められなかったカメラの投影面Ｇ_３において、求められた２つのそれぞれの位置に対応するエピポーラ線を第１のＦ−マトリックスおよび第２のＦ−マトリックスを用いて引き、それらの交点Ｇ_３１を算出し、その交点を投影面Ｇ_３における選手の位置と推定する。選手が画角から外れている場合についても、同様に選出の位置を推定することができる。なお、このとき用いる２つのカメラは、カメラ間距離が最大となるものを選択することが好適である。その理由は、カメラ間距離が小さいと、２つのカメラの画像データから得られるエピポーラ線の交角が小さくなり、交点にずれが生じてしまうおそれがあるからである。
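The intersection of the two epipolar lines in step F1 can be sketched as below. Each line is taken as coefficients (a, b, c) of a*x + b*y + c = 0, assumed to have been obtained from the two F-matrices beforehand; the function name is hypothetical.

```python
def intersect_lines(line1, line2):
    """Return the intersection of two lines given as (a, b, c).

    The intersection in homogeneous coordinates is the cross product
    of the two coefficient vectors.
    """
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    x = b1 * c2 - b2 * c1
    y = c1 * a2 - c2 * a1
    w = a1 * b2 - a2 * b1
    if abs(w) < 1e-12:
        raise ValueError("lines are parallel or nearly parallel")
    return x / w, y / w
```

The denominator `w` shrinks as the two lines approach parallel, which is why the text recommends choosing the two cameras that are farthest apart: a small intersection angle makes the estimated position sensitive to small errors.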
【００６０】
以上の処理により、全てのカメラ１１_１〜１１_８の画像上で追跡する選手の位置を求めることができる。しかし、オクルージョンの発生した後の追跡や、途中のフレームから選手が現れた場合の追跡では、追跡したい選手とは違う選手を追ってしまうことがある。そこで、より安定した追跡を実行するために、複数のカメラの情報を用いて選手の位置を確認し、その位置情報を保存する（ステップＦ２）。
【００６１】
まず、位置確認の対象であるカメラの投影面Ｋ_１において、ステップＦ１で推定された選手の位置Ｌ_１の座標をもとに、カメラ内処理で選手の位置が求められたカメラの投影面Ｋ_２〜Ｋ_５の画像に対してエピポーラ線Ｍ_２〜Ｍ_５を引く。次に、カメラの投影面Ｋ_２〜Ｋ_５の各々において、カメラ内処理で求められた選手の位置Ｌ_２〜Ｌ_５とエピポーラ線Ｍ_２〜Ｍ_５との距離を算出する。そして、それぞれのカメラの画像において、距離が閾値内であれば、ステップＦ１で推定された選手の位置Ｌ_１の座標をそのまま選手の位置として保存する。投影面Ｋ_２〜Ｋ_５のいくつかにおいて閾値外になった場合は、投影面Ｋ_１において位置Ｌ_１に近い選手から順に同様な処理を行い、最も適当と推定される選手の位置座標（例えば全ての画像が閾値内になる位置）を保存する。
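The confirmation in step F2 can be sketched as a point-to-line distance test. The sketch assumes the same F-matrix convention as before ((a, b, c)^T = F (x, y, 1)^T) and a single threshold for all views; the function names and the `views` layout are hypothetical.

```python
import math


def epipolar_line(f, point):
    """Line (a, b, c) in a reference view for a point in the checked view."""
    q = (point[0], point[1], 1.0)
    return tuple(sum(f[r][c] * q[c] for c in range(3)) for r in range(3))


def point_line_distance(line, point):
    """Perpendicular distance from a point to the line a*x + b*y + c = 0."""
    a, b, c = line
    return abs(a * point[0] + b * point[1] + c) / math.hypot(a, b)


def position_confirmed(estimate, views, threshold):
    """Return True if the estimate is consistent with every reference view.

    views is a list of (f, observed) pairs: the F-matrix from the checked
    view to a reference view, and the player position found in that view
    by the intra-camera processing.
    """
    return all(
        point_line_distance(epipolar_line(f, estimate), observed) <= threshold
        for f, observed in views
    )
```

If the test fails, the same check would be repeated for the other players in order of distance from the estimate, keeping the position that best satisfies the threshold.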
【００６２】
以上の処理により、全てのカメラで撮像された画像データで選手の位置を追跡することができる。これ以降のオフライン処理は、第１の実施の形態と同様である。また、オンライン処理についても第１の実施の形態と同様である。
【００６３】
このように、本発明の第２の実施の形態によれば、あるカメラの画像ではオクルージョンの発生、画角外に存在する等の理由で選手の位置情報が得られない場合にも、オクルージョンが発生していないカメラの画像データを参照して、選手の位置情報を推定し、選手を追跡することができる。また、オクルージョンが発生していない複数のカメラの画像データを参照して推定位置を確認することにより、より安定した追跡が可能になる。
【００６４】
【発明の効果】
以上の説明から明らかなように、本発明によれば、送信側で、被写体を複数の異なる視点から撮像してそれぞれの視点における動画像データを取得し、全ての隣接視点間の射影幾何情報を推定し、前記複数の異なる視点における動画像データを前記複数の異なる視点における静止領域の静止画像データおよび移動領域の動画像データに分離し、前記射影幾何情報を用いて全ての前記隣接視点間の静止画像データの対応付けを行い、さらに全ての前記中間視点における静止画像データをモーフィングにより生成し、前記射影幾何情報を用いて全ての前記隣接視点間の動画像データの対応付けを行い、さらに全ての前記隣接視点における動画像データの構造的特徴情報および対応関係情報を生成し、全ての前記中間視点における自由視点静止画像データおよび全ての前記射影幾何情報を前記受信側へ予め送信し、受信側で、全ての前記中間視点における自由視点静止画像データおよび全ての前記射影幾何情報を予め受信して保存し、隣接視点の間の任意の中間視点を選択し、前記送信側へ通知し、送信側から送られてくる中間視点に対応する隣接視点における移動領域の動画像データ、該動画像データの構造的特徴情報、および前記隣接視点間の動画像データの対応関係情報と予め保存しておいた射影幾何情報とを用いて前記任意の中間視点における前記移動領域の動画像データをモーフィングにより生成し、前記予め保存しておいた中間視点における静止画像データと合成することにより、静止状態の被写体および移動状態の被写体の双方の中間視点における動画像データを受信側で生成することができる。
【００６５】
また、本発明によれば、送信側で、被写体を複数の異なる視点から撮像してそれぞれの視点における動画像データを取得し、全ての隣接視点間の射影幾何情報を推定し、前記複数の異なる視点における動画像データを前記複数の異なる視点の静止領域の静止画像データおよび移動領域の動画像データに分離し、前記射影幾何情報を用いて全ての前記隣接視点間の静止画像データの対応付けを行い、さらに全ての前記中間視点における静止画像データをモーフィングにより生成し、前記移動領域の任意の被写体を追跡対象として選択し、前記射影幾何情報を用いて全ての前記隣接視点間の追跡対象の被写体の動画像データの対応付けを行い、さらに全ての前記隣接視点における追跡対象の被写体の動画像データの構造的特徴情報および対応関係情報を生成し、全ての前記中間視点における自由視点静止画像データおよび全ての前記射影幾何情報を前記受信側へ予め送信し、受信側で、全ての前記中間視点における自由視点静止画像データおよび全ての射影幾何情報を予め受信して保存し、隣接視点の間の任意の中間視点を選択し、前記送信側へ通知し、送信側から送られてくる中間視点に対応する隣接視点における追跡対象の被写体の動画像データ、該動画像データの構造的特徴情報、および前記隣接視点間の動画像データの対応関係情報と予め保存しておいた射影幾何情報とを用いて前記任意の中間視点における前記追跡対象の被写体の動画像データをモーフィングにより生成し、前記予め保存しておいた中間視点における静止画像データと合成することにより、静止状態の被写体および移動状態の被写体の双方の中間視点における画像データを受信側で生成することができ、かつ移動状態の被写体を追跡し、画面の中央等に表示することができる。
【図面の簡単な説明】
【図１】本発明の第１の実施の形態の自由視点動画像データ生成システムのブロック図、
【図２】本発明の第１の実施の形態におけるカメラの配置を示す図、
【図３】本発明の第１の実施の形態における画像の領域を示す図、
【図４】平面射影行列について説明するための図、
【図５】補間画像の生成方法について説明するための図、
【図６】モザイク処理について説明するための図、
【図７】エピポーラ幾何について説明するための図、
【図８】本発明の第１の実施の形態におけるオフライン処理のフローチャート、
【図９】移動領域を抽出した例を示す図、
【図１０】本発明の第１の実施の形態におけるオンライン処理のフローチャート、
【図１１】シルエットの対応付けを説明するための図、
【図１２】本発明の第２の実施の形態におけるカメラの配置を示す図、
【図１３】本発明の第２の実施の形態における移動領域に関する領域分割処理を示すフローチャート、
【図１４】カメラ間処理における選手位置の推定について説明するための図、
【図１５】カメラ間処理における選手位置の推定について説明するための図である。
【符号の説明】
１１カメラ
１２、２１入力装置
１３、２２モニタ
１４、２３ＣＰＵ
１５、２４メインメモリ
１６、２５ディスクメモリ
１７、２６通信装置
１８、２７バス
[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a system that captures moving images of a subject in three-dimensional space from multiple viewpoints and transmits them, and more particularly to a method in which moving image data from multiple viewpoints is stored in a transmitting-side device and moving image data at an arbitrary intermediate viewpoint between adjacent viewpoints is generated by a receiving-side device.
[0002]
[Prior art]
Conventionally, Patent Document 1 describes a system that captures a subject in three-dimensional space from multiple viewpoints and transmits the image data. In this system, the transmitting side captures the subject from a plurality of different viewpoints, creates structure data of the captured images based on the captured image data, and transmits the image data and the structure data; the receiving side designates a new viewpoint and can generate image data at that viewpoint from the multi-viewpoint image data and structure data.
[0003]
[Patent Document 1]
JP 2001-8231 A
[0004]
[Problems to be solved by the invention]
However, as shown in FIGS. 2 and 5 of Patent Document 1, the system described there handles stationary subjects only; moving subjects are not mentioned.
[0005]
Accordingly, an object of the present invention is to provide a free-viewpoint moving image data generation method with which the receiving side can generate image data at an arbitrary intermediate viewpoint between any pair of adjacent viewpoints, for both stationary and moving subjects.
[0006]
A further object of the present invention is to provide a free-viewpoint moving image data generation method with which the receiving side can generate such image data for both stationary and moving subjects, and which can also track a moving subject and display it, for example, at the center of the screen.
[0007]
[Means for Solving the Problems]
The free-viewpoint moving image data generation method of the present invention is a method in which a transmitting side acquires moving image data by capturing a subject in three-dimensional space from a plurality of different viewpoints, and a receiving side generates moving image data of the subject at an arbitrary intermediate viewpoint between adjacent viewpoints. The transmitting side executes the steps of: dividing the moving image data acquired from the plurality of different viewpoints into a moving region and a still region; estimating projective geometry information between all adjacent viewpoints; generating, for the image data of the still region, free-viewpoint still image data at all intermediate viewpoints based on the projective geometry information; generating, for the moving image data of the moving region, structural feature information of the moving image data and correspondence information of the moving image data between all adjacent viewpoints, based on the projective geometry information; and transmitting in advance, to the receiving side, the free-viewpoint still image data at all intermediate viewpoints and all of the projective geometry information. The receiving side executes the steps of: receiving and storing in advance the free-viewpoint still image data at all intermediate viewpoints and all of the projective geometry information; and selecting an arbitrary intermediate viewpoint between adjacent viewpoints and notifying the transmitting side of it. The transmitting side then executes the step of transmitting, to the receiving side, the moving image data of the moving region at the adjacent viewpoints corresponding to the notified intermediate viewpoint, the structural feature information of that moving image data, and the correspondence information of the moving image data between those adjacent viewpoints. Finally, the receiving side executes the steps of: receiving the moving image data of the moving region, its structural feature information, and the correspondence information, and generating moving image data of the moving region at the arbitrary intermediate viewpoint; and reading out the previously stored free-viewpoint still image data at that intermediate viewpoint and compositing it with the generated moving image data of the moving region to generate the moving image data at the arbitrary intermediate viewpoint.
[0008]
With this configuration, the transmitting side captures the subject from a plurality of different viewpoints to acquire moving image data at each viewpoint; estimates the projective geometry information between adjacent viewpoints; separates the moving image data into still image data of the still region and moving image data of the moving region at each viewpoint; associates the still image data between adjacent viewpoints using the projective geometry information and generates, by morphing, still image data at arbitrary intermediate viewpoints between the adjacent viewpoints; associates the moving image data between adjacent viewpoints using the projective geometry information and generates the structural feature information and correspondence information of the moving image data at the adjacent viewpoints; and transmits in advance, to the receiving side, the free-viewpoint still image data at all intermediate viewpoints and all of the projective geometry information. The receiving side receives and stores these in advance, selects an arbitrary intermediate viewpoint between adjacent viewpoints, and notifies the transmitting side. Using the moving image data of the moving region at the corresponding adjacent viewpoints, its structural feature information, and the correspondence information sent from the transmitting side, together with the previously stored projective geometry information, the receiving side generates by morphing the moving image data of the moving region at the intermediate viewpoint and composites it with the previously stored still image data at that viewpoint. Moving image data at an intermediate viewpoint can thus be generated for both stationary and moving subjects.
[0009]
The free-viewpoint moving image data generation method of the present invention is further characterized in that the transmitting side executes the steps of selecting, for each of the plurality of different viewpoints, an arbitrary subject in the moving region as a tracking target, and storing position information of the selected subject.
[0010]
With this configuration, the transmitting side captures the subject from a plurality of different viewpoints to acquire moving image data at each viewpoint; estimates the projective geometry information between adjacent viewpoints; separates the moving image data into still image data of the still region and moving image data of the moving region at each viewpoint; associates the still image data between adjacent viewpoints using the projective geometry information and generates, by morphing, still image data at arbitrary intermediate viewpoints; selects an arbitrary subject in the moving region as a tracking target; associates the moving image data of the tracked subject between adjacent viewpoints using the projective geometry information and generates the structural feature information and correspondence information of that moving image data; and transmits in advance, to the receiving side, the free-viewpoint still image data at all intermediate viewpoints and all of the projective geometry information. The receiving side receives and stores these in advance, selects an arbitrary intermediate viewpoint between adjacent viewpoints, and notifies the transmitting side. Using the moving image data of the tracked subject at the corresponding adjacent viewpoints, its structural feature information, and the correspondence information sent from the transmitting side, together with the previously stored projective geometry information, the receiving side generates by morphing the moving image data of the tracked subject at the intermediate viewpoint and composites it with the previously stored still image data at that viewpoint. Moving image data at an intermediate viewpoint can thus be generated for both stationary subjects and the tracked subject, and the moving subject can be tracked and displayed, for example, at the center of the screen.
[0011]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[0012]
(First Embodiment)
FIG. 1 is a block diagram of a free viewpoint moving image data generation system according to the first embodiment of this invention. Here, (a) is a transmitting device, and (b) is a receiving device.
The transmitting-side device includes n cameras 11_1, 11_2, ..., 11_n (n is an integer of 2 or more), an input device 12, a monitor 13, a CPU 14, a main memory 15, a disk memory 16, and a communication device 17, each connected to a bus 18.
[0013]
The cameras 11_1, 11_2, ..., 11_n each capture the subject from a different viewpoint and generate moving image data. The cameras are fixed throughout imaging and perform no panning, tilting, zooming in, or zooming out. The input device 12 is a mouse, a keyboard, or the like, and is used to enter user commands. The monitor 13 is a liquid crystal display or the like and displays data entered from the input device 12, moving image data captured by the cameras 11_1, 11_2, ..., 11_n, image data read from the main memory 15, image data read from the disk memory 16, and so on. The CPU 14 includes a microprocessor and executes various processes according to programs stored in the main memory 15. The main memory 15 consists of a ROM and a RAM and stores the programs that the CPU 14 uses when executing those processes. It also temporarily stores image data captured by the cameras 11_1, 11_2, ..., 11_n, image data read from the disk memory 16, and the like. The disk memory 16 is a hard disk device or the like and stores the moving image data captured by the cameras 11_1, 11_2, ..., 11_n. It also stores data representing the positional correspondence between the moving images captured by the cameras (described in detail later). The communication device 17 exchanges data with the communication device of the receiving-side device.
[0014]
The receiving-side device includes an input device 21, a monitor 22, a CPU 23, a main memory 24, a disk memory 25, and a communication device 26, each of which is connected to a bus 27.
[0015]
The input device 21 is a mouse, a keyboard, or the like, and is used to enter user commands. The monitor 22 is a liquid crystal display or the like and displays data entered from the input device 21, image data read from the main memory 24, image data read from the disk memory 25, and so on. The CPU 23 includes a microprocessor and executes various processes according to programs stored in the main memory 24. The main memory 24 consists of a ROM and a RAM and stores the programs that the CPU 23 uses when executing those processes. It also temporarily stores image data read from the disk memory 25 and the like. The disk memory 25 is a hard disk device and stores the moving image data sent from the transmitting-side device. It also stores the data, sent from the transmitting-side device, that represents the positional correspondence between the moving images captured by the cameras 11_1, 11_2, ..., 11_n (described in detail later). The communication device 26 exchanges data with the communication device 17 of the transmitting-side device. The communication medium between the communication device 17 and the communication device 26 is, for example, the Internet, a LAN, or a terrestrial broadcasting network (digital broadcasting in which request data and the like can be sent from the user to the broadcasting station).
[0016]
The operation of the free viewpoint moving image data generation system configured as described above will now be described, beginning with an outline. Here, the subjects are the stadium, the players, and the ball of a soccer match. As shown in the drawing, the cameras 11₁, 11₂, 11₃, and 11₄ are arranged above the audience seats. A subject such as that shown in FIG. 3A is divided into the ground and the goal (FIG. 3B), the spectator seats (FIG. 3C), and the players and the ball (FIG. 3D). In terms of image characteristics, the ground and the goal are a near-view still area, the spectator seats are a distant-view still area, and the players and the ball are moving areas. For the ground and the goal, which are the near-view still area, corresponding points between images obtained from adjacent viewpoints are calculated by applying a plane projection matrix (homography matrix), an arbitrary intermediate viewpoint between the adjacent viewpoints is taken as a virtual viewpoint, and an interpolated image at that virtual viewpoint is generated. Here, an intermediate viewpoint means any viewpoint between two viewpoints, and does not mean only the central viewpoint between them. For the players and the ball, which are moving areas, epipolar geometry is applied to calculate corresponding points between images obtained from adjacent viewpoints, and an interpolated image at the virtual viewpoint is generated. For the spectator seats, which are the distant-view still area, mosaic processing is performed to connect the images at the two viewpoints, and the image at the virtual viewpoint is cut out from the generated panoramic image. Finally, a moving image of the entire subject at the virtual viewpoint is generated by combining the three types of images.
[0017]
Next, a description will be given in the order of a near-area still area, a far-area still area, and a moving area. The separation between the still region and the moving region of the foreground is performed by background difference and binarization of image data (details will be described later).
[0018]
[1] Static area in the foreground
In the foreground still region, a plane projection matrix is obtained, and corresponding points between images obtained from different viewpoints are calculated using the plane projection matrix to generate an interpolation image at a virtual viewpoint.
[0019]
First, the plane projection matrix will be described with reference to FIG. 4. An object in three-dimensional space is imaged by two cameras C₁ and C₂ with different viewpoints. Let (x₁, y₁) be the camera coordinate system fixed on the projection plane I₁ of camera C₁ under perspective projection, and let (x₂, y₂) be the camera coordinate system fixed on the projection plane I₂ of camera C₂. Then Equation [1] holds for a point P on a plane J in the three-dimensional space. The matrix in this equation is called a plane projection matrix. Perspective projection is described in, for example, Ryoichi Suematsu et al., "Image Processing Engineering", pp. 182-184, 2000-10-26, Corona Co., Ltd., and its description is omitted here.
[0020]
(Equation 1)

Each element of the plane projection matrix can be obtained from four corresponding points on the plane J. Using this plane projection matrix, the coordinates (x₂, y₂) on the projection plane I₂ of camera C₂ can be calculated by giving the coordinates (x₁, y₁) of any one point on the projection plane I₁ of camera C₁. Therefore, for every point on the plane J, the (x₂, y₂) coordinates corresponding to given (x₁, y₁) coordinates can be calculated. In the present embodiment, the ground shown in FIG. 3B is defined as one plane and the goal as a plurality of planes (for example, four planes), and a plane projection matrix is obtained for each plane.
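By way of a non-limiting illustration, the calculation of a plane projection matrix from four corresponding points, and its use to map a point from one projection plane to the other, may be sketched in Python as follows. Because Equation [1] is not reproduced in this text, the sketch assumes the usual homogeneous form in which (x₂, y₂, 1) is proportional to H·(x₁, y₁, 1), with the lower-right element of H fixed to 1. The function names `solve_linear`, `homography_from_points`, and `apply_homography` are hypothetical and are introduced here only for explanation.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography_from_points(src, dst):
    """Estimate a 3x3 plane projection matrix from four point pairs.

    Assumption: the element h33 is fixed to 1, leaving 8 unknowns.
    """
    a, b = [], []
    for (x1, y1), (x2, y2) in zip(src, dst):
        a.append([x1, y1, 1, 0, 0, 0, -x2 * x1, -x2 * y1])
        b.append(x2)
        a.append([0, 0, 0, x1, y1, 1, -y2 * x1, -y2 * y1])
        b.append(y2)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h, point):
    """Map (x1, y1) on plane I1 to (x2, y2) on plane I2."""
    x, y = point
    u, v, w = (row[0] * x + row[1] * y + row[2] for row in h)
    return (u / w, v / w)
```

In such a sketch, one matrix would be estimated for the ground and one for each plane of the goal, and `apply_homography` would be called for every pixel of the corresponding plane.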
[0021]
Next, referring to FIG. 5, a method of generating an interpolated image at an arbitrary virtual viewpoint between the viewpoint of camera C₁ and the viewpoint of camera C₂ will be described. In FIG. 5, image data is generated for the image that would be captured by a virtual camera C₁₂ located at a virtual viewpoint between the viewpoints of cameras C₁ and C₂. In the present embodiment, morphing by linear interpolation is performed. Let the ratio of the distances from the viewpoint of camera C₁ and from the viewpoint of camera C₂ to the virtual camera C₁₂ be α : 1−α (where 0 ≦ α ≦ 1), and let (x₁₂, y₁₂) be the camera coordinate system fixed on the projection plane I₁₂ of the virtual camera C₁₂. Then the coordinates of a point on the projection plane I₁₂ can be expressed by the following Equation [2].
[0022]
P₁₂ = (1−α)P₁ + αP₂ … Equation [2]
[0023]
Here, P₁₂, P₁, and P₂ are the position vectors of the corresponding points on the projection planes I₁₂, I₁, and I₂, respectively. When α = 0, the virtual viewpoint coincides with the viewpoint of camera C₁; when α = 1, it coincides with the viewpoint of camera C₂; and when α = 0.5, it is the central viewpoint between cameras C₁ and C₂. In the present embodiment, morphing from the viewpoint of camera C₁ (warp 1 in FIG. 5) and morphing from the viewpoint of camera C₂ (warp 2 in FIG. 5) are both performed to generate two different interpolated images, and the following Equation [3] is applied to them to generate the image data at the virtual viewpoint.
[0024]
(Equation 2)

[0025]
In this equation, v₁ and v₂ are the brightness values of the image data on the projection planes I₁ and I₂, respectively, and v′ is the brightness value of the image data on the projection plane I₁₂.
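As a non-limiting illustration, the position interpolation of Equation [2] and the brightness blending of the two warped images may be sketched as follows. Equation [3] is not reproduced in this text, so the blending function below assumes a simple linear cross-dissolve weighted by α, which is one plausible form and not necessarily the exact expression of the disclosure. The names `interpolate_point` and `blend_brightness` are hypothetical.

```python
def interpolate_point(p1, p2, alpha):
    """Equation [2]: P12 = (1 - alpha) * P1 + alpha * P2."""
    return tuple((1 - alpha) * a + alpha * b for a, b in zip(p1, p2))


def blend_brightness(v1, v2, alpha):
    """Assumed form of Equation [3]: v' = (1 - alpha) * v1 + alpha * v2.

    v1 and v2 are the brightness values carried by warp 1 and warp 2
    to the same pixel of the virtual projection plane I12.
    """
    return (1 - alpha) * v1 + alpha * v2
```

With α = 0 both functions return the values of camera C₁, and with α = 1 those of camera C₂, which agrees with the behavior described for the end points.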
[0026]
[2] Still area in distant view
In the present embodiment, an area such as the spectator seats, which is sufficiently far from the camera that its own unevenness can be neglected, is treated as a distant-view area. In the case of a distant-view area, there are few regions common to two adjacent viewpoints, so applying the same morphing processing as for the near-view still area would be computationally inefficient. Therefore, the images at the two viewpoints are connected by mosaic processing, and the image at the virtual viewpoint is cut out from the connected panoramic image.
[0027]
The mosaic processing will be described with reference to FIG. 6. First, a plane projection matrix H₂₁ is obtained between the image data of the background image on the projection plane I₁ of the camera at the first viewpoint and the image data of the background image on the projection plane I₂ of the camera at the second viewpoint, and the coordinate systems of the image data at the two viewpoints are unified using the plane projection matrix H₂₁. Next, the two sets of image data are concatenated to generate panoramic image data. The overlapping area in FIG. 6 is determined automatically once H₂₁ is determined. When the panoramic image data is generated, the brightness of the overlapping area is smoothed. If the average of the brightness values of the two sets of image data were simply used, however, an unnatural seam would appear because of the difference in brightness between them. Therefore, as shown in the following Equation [4], the brightness values of each pixel are weighted and mixed according to the distance from the boundary of the overlapping area.
[0028]
(Equation 3)

[0029]
In this equation, u₁ and u₂ are the brightness values of the image data of the background images on the projection planes I₁ and I₂, respectively, and u′ is the brightness value of the image data of the panoramic image. Also, x_L and x_R are the x coordinates of the left end and the right end of the overlapping portion, respectively (see FIG. 6), and β = (x − x_L) / (x_R − x_L).
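As a non-limiting illustration, the distance-weighted mixing in the overlapping area may be sketched for one image row as follows. Equation [4] is not reproduced in this text; the sketch assumes the form u′ = (1−β)u₁ + βu₂ with β as defined above, so that the first image dominates at the left end of the overlap and the second at the right end. The function name `blend_overlap_row` and the use of `None` to mark pixels not covered by an image are assumptions made for this example.

```python
def blend_overlap_row(row1, row2, x_left, x_right):
    """Blend one row of two background images into a panorama row.

    row1, row2: brightness values already expressed in the unified
    (panorama) coordinate system; None marks pixels an image does not cover.
    x_left, x_right: x coordinates of the two ends of the overlapping area.
    """
    out = []
    for x, (u1, u2) in enumerate(zip(row1, row2)):
        if u2 is None:
            out.append(u1)          # only the first image covers x
        elif u1 is None:
            out.append(u2)          # only the second image covers x
        else:
            beta = (x - x_left) / (x_right - x_left)
            out.append((1 - beta) * u1 + beta * u2)  # assumed Equation [4]
    return out
```

A plain average would correspond to β fixed at 0.5 across the overlap; letting β vary with x removes the visible step at both boundaries.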
[0030]
Next, background image data necessary for the intermediate viewpoint image is cut out from the panoramic image, and projection transformation from the first viewpoint as a reference to each intermediate viewpoint is performed. The plane projection matrix for the conversion is defined by the following equation [5]. By converting coordinates using the plane projection matrix H ', intermediate viewpoint image data of the background can be obtained.
[0031]
H′ = (1−γ)E + γH₂₁⁻¹ … Equation [5]
[0032]
Here, γ (0 ≦ γ ≦ 1) is a parameter for determining the position of the intermediate viewpoint, and E is a 3 × 3 unit matrix.
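As a non-limiting illustration, the plane projection matrix of Equation [5] may be computed as follows. The 3 × 3 inverse is written out explicitly so that the sketch needs no external library; the function names `inverse_3x3` and `intermediate_homography` are hypothetical.

```python
def inverse_3x3(m):
    """Invert a 3x3 matrix using the adjugate divided by the determinant."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[v / det for v in row] for row in adj]


def intermediate_homography(h21, gamma):
    """Equation [5]: H' = (1 - gamma) * E + gamma * inverse(H21)."""
    inv = inverse_3x3(h21)
    return [[(1 - gamma) * (1.0 if r == c else 0.0) + gamma * inv[r][c]
             for c in range(3)] for r in range(3)]
```

With γ = 0 the result is the unit matrix E, so the first viewpoint is reproduced unchanged, and with γ = 1 it is the inverse of H₂₁.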
[0033]
[3] Moving area
Epipolar geometry is applied to a player and a ball that are moving areas, corresponding points between images obtained from different viewpoints are calculated, and an interpolation image at a virtual viewpoint is generated.
First, the epipolar geometry will be described with reference to the drawing. An object in three-dimensional space is imaged by two cameras C₁ and C₂ with different viewpoints. Let (x₁, y₁) be the camera coordinate system fixed on the projection plane I₁ of camera C₁ under perspective projection, and let (x₂, y₂) be the camera coordinate system fixed on the projection plane I₂ of camera C₂. Then Equation [6] holds for points that correspond to each other between the projection plane I₁ and the projection plane I₂.
[0034]
(Equation 4)

[0035]
The matrix in this equation is called a fundamental matrix (hereinafter, F-matrix) and contains information on the relative positions and orientations of the two cameras. An F-matrix can be calculated from seven or more points that correspond between the projection plane I₁ and the projection plane I₂. Using this F-matrix, the search range for the point on the projection plane I₂ of camera C₂ that corresponds to an arbitrary point on the projection plane I₁ of camera C₁ can be narrowed. Given a point q₁(x₁, y₁) on the projection plane I₁, the epipolar line ax + by + c = 0 can be projected onto the projection plane I₂. Here, a, b, and c are obtained from the following Equation [7].
[0036]
(Equation 5)

[0037]
At this time, the point q₂(x₂, y₂) on the projection plane I₂ that corresponds to q₁(x₁, y₁) always lies on the epipolar line. Therefore, the search for the corresponding point needs to be performed only along the epipolar line, which makes the search easy. In the figure, the points D1 and D2 are the lens centers (= viewpoints) of cameras C₁ and C₂, respectively, and the epipolar lines are the lines of intersection between the plane passing through the points Q, D1, and D2 (the epipolar plane) and the projection planes I₁ and I₂.
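As a non-limiting illustration, the projection of an epipolar line from a point on the projection plane I₁ may be sketched as follows. Equations [6] and [7] are not reproduced in this text; the sketch assumes the common convention q₂ᵀ·F·q₁ = 0 and (a, b, c)ᵀ = F·(x₁, y₁, 1)ᵀ. The function names and the sample matrix `F_SIDEWAYS` (which corresponds to two cameras displaced purely horizontally) are hypothetical and serve only as an example.

```python
def epipolar_line(f, q1):
    """Return (a, b, c) of the line a*x + b*y + c = 0 on plane I2.

    Assumed form of Equation [7]: (a, b, c) = F * (x1, y1, 1).
    """
    x1, y1 = q1
    return tuple(row[0] * x1 + row[1] * y1 + row[2] for row in f)


def lies_on_line(line, q2, tolerance=1e-9):
    """Check the assumed Equation [6]: q2 lies on the epipolar line of q1."""
    a, b, c = line
    x2, y2 = q2
    return abs(a * x2 + b * y2 + c) <= tolerance


# Example F-matrix (an assumption for illustration): for a purely horizontal
# camera displacement, every epipolar line is the image row y = y1.
F_SIDEWAYS = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]
```

For this sample matrix the corresponding point is searched only along the same image row, which illustrates how the F-matrix reduces a two-dimensional search to a one-dimensional one.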
[0038]
After the corresponding points of the image data between the projection plane I₁ and the projection plane I₂ have been found in this way, an interpolated image is generated at an arbitrary virtual viewpoint between the viewpoint of camera C₁ and the viewpoint of camera C₂. The method of generating the interpolated image is the same as that for the near-view still area.
The outline of the operation of the free viewpoint moving image data generation system has been described above. Next, the operation of the system will be described in more detail.
[0039]
The processing at the time of operation of the present system includes an off-line processing executed by the transmitting apparatus and an on-line processing executed by the receiving apparatus. Hereinafter, the offline processing and the online processing will be described in this order.
[0040]
FIG. 8 is a flowchart of the offline processing. First, multi-viewpoint images are input (step A1). That is, the CPU 14 compresses the moving image data of the desired duration acquired by imaging the subject with the cameras 11₁ to 11ₙ, and stores it in the disk memory 16.
[0041]
Next, the projection geometry of the image data between adjacent cameras is estimated from the moving image data of the cameras 11₁ to 11ₙ stored in the disk memory 16 (step A2). Specifically, the CPU 14 first reads from the disk memory 16 one frame of image data captured at the same imaging time by each of the cameras 11₁ to 11ₙ, decompresses it, and writes it to the main memory 15. Next, the operator inputs corresponding points from the input device 12 while one frame of image data from adjacent cameras is read from the main memory 15 and displayed on the monitor 13. As described above, in the present embodiment an image is divided into three parts: a near-view still area, a distant-view still area, and a moving area. A plane projection matrix is used for the near-view still area, and an F-matrix is used for the moving area. The corresponding points are obtained using a single frame. The frames captured by adjacent cameras are compared with each other, and four or more corresponding points are input for each of the one plane constituting the ground and the plurality of planes constituting the goal, which together form the near-view still area. In addition, seven or more corresponding points are input in order to calculate the F-matrix between adjacent cameras. The CPU 14 uses the corresponding-point data input from the input device 12 to calculate the plane projection matrices and the F-matrices according to Equations [1] and [6]. The plane projection matrices and the F-matrices are stored in the disk memory 16.
[0042]
Next, the CPU 14 separates the moving image data of the cameras 11₁ to 11ₙ stored in the disk memory 16 into a moving area and a still area (step A3). In the present embodiment, for the moving image data captured by each camera, the difference from a background image is taken for each camera and binarized to generate a silhouette image in which the entire moving region is extracted, and everything other than the moving region is treated as the still region. The silhouette can be extracted more accurately by considering not only the brightness data but also the RGB components. The image data of the moving area and the image data of the still area are stored in the main memory 15 and the disk memory 16. Since the cameras 11₁ to 11ₙ are fixed, the still area has the same image data in every frame for both the near view and the distant view, so only one frame needs to be stored. FIG. 9 shows an example of extracting a moving area; in this figure, (a) and (b) are silhouette images of the moving region extracted from image data captured from different viewpoints.
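As a non-limiting illustration, the background difference and binarization of step A3 may be sketched as follows, using the RGB components rather than brightness alone. The threshold value of 30 and the function name `silhouette` are assumptions made for this example; the disclosure does not specify them.

```python
def silhouette(frame, background, threshold=30):
    """Return a binary silhouette image (rows of 0/1).

    frame, background: images given as rows of (r, g, b) tuples taken by
    the same fixed camera. A pixel belongs to the moving area (1) when any
    color component differs from the background by more than the threshold.
    """
    return [
        [1 if max(abs(p - q) for p, q in zip(pixel, back)) > threshold else 0
         for pixel, back in zip(frame_row, back_row)]
        for frame_row, back_row in zip(frame, background)
    ]
```

Pixels marked 0 form the still region, which needs to be stored only once because the cameras are fixed.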
[0043]
In many cases, the silhouette image generated in step A3 contains a plurality of silhouettes, such as the silhouettes of several players and the silhouette of the ball. Therefore, in step A4, the silhouettes are separated into individual players and the ball by labeling processing, and the silhouettes are then associated between the two viewpoints. When a plurality of players appear to overlap (occlusion occurs), the silhouette is divided using planar projection geometry with reference to the image data of a camera in which the silhouettes were divided correctly. The players are first associated with each other using the plane projection matrix of the ground. This relies on the condition that a player's feet are in contact with the ground: the lowest point of each player's region is associated through the plane projection matrix of the ground. Even when a player is jumping, the error caused by the jump is considered to be sufficiently small. The ball, on the other hand, is associated by matching label areas. For all silhouettes between all adjacent cameras, the structural feature information of the image (the silhouette image, the label numbers of the silhouette image, and the feature amounts of each label, such as the center of gravity of the label and the coordinates of the players' feet) and the label number correspondence table, which is the correspondence information between the two cameras, are stored in the disk memory 16.
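As a non-limiting illustration, the labeling of step A4 and the association of players through the ground plane projection matrix may be sketched as follows. The use of 4-connectivity, the nearest-point matching rule, and the names `label_silhouettes` and `match_feet` are assumptions made for this example; the disclosure states only that the lowest point of each player's region is associated through the ground matrix.

```python
from collections import deque


def label_silhouettes(mask):
    """Label 4-connected regions of a binary silhouette image (rows of 0/1).

    Returns one dict per label with its area, center of gravity, and lowest
    pixel (largest y), which stands in for the feet of a player.
    """
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    labels = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue, pixels = deque([(x0, y0)]), []
            while queue:  # breadth-first flood fill of one region
                x, y = queue.popleft()
                pixels.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            area = len(pixels)
            labels.append({
                "label": len(labels) + 1,
                "area": area,
                "centroid": (sum(p[0] for p in pixels) / area,
                             sum(p[1] for p in pixels) / area),
                "feet": max(pixels, key=lambda p: p[1]),
            })
    return labels


def match_feet(feet_a, feet_b, h_ground):
    """Build a label correspondence from view A to view B.

    Each feet point of view A is mapped through the ground plane projection
    matrix and paired with the nearest feet point of view B (assumed rule).
    """
    matches = []
    for x, y in feet_a:
        u, v, w = (r[0] * x + r[1] * y + r[2] for r in h_ground)
        px, py = u / w, v / w
        matches.append(min(
            range(len(feet_b)),
            key=lambda i: (feet_b[i][0] - px) ** 2 + (feet_b[i][1] - py) ** 2))
    return matches
```

The list returned by `match_feet` plays the role of a label number correspondence table between two adjacent cameras; the ball, which does not touch the ground, would instead be matched by label area as described above.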
[0044]
Next, intermediate viewpoint image data for the still area is generated and stored in the disk memory 16 (step A5). At this time, the operator uses the input device 12 to input arbitrary intermediate viewpoints between adjacent cameras. The CPU 14 reads the image data of the ground, the goal, and the audience seats, which are still areas, from the main memory 15, and generates the image data of the ground and the goal, and the image data of the audience seats, as virtually captured from each input intermediate viewpoint. As described above, for the ground and the goal, the corresponding points between the image data captured by adjacent cameras are obtained using the plane projection matrix (FIG. 4, Equation [1]), and the image data at an arbitrary intermediate viewpoint between the adjacent cameras is generated by interpolation (FIG. 5, Equation [3]). For the audience seats, the image data at the intermediate viewpoint is generated by mosaic processing and clipping (FIG. 6, Equation [4]).
[0045]
The above is the offline processing. Next, the online processing will be described with reference to the flowchart in the drawing. As a premise for executing the online processing, it is assumed that the arrangement information of the cameras 11₁ to 11ₙ, the projection geometric information (the plane projection matrices and the F-matrices), and the position information of the intermediate viewpoints have been transmitted from the transmitting-side device to the receiving-side device and stored in the disk memory 25 of the receiving-side device. It is also assumed that the image data at all intermediate viewpoints of the ground and the goal, which are the near-view still area, and the image data at all intermediate viewpoints of the audience seats, which are the distant-view still area, have been transmitted from the transmitting-side device to the receiving-side device and stored in the disk memory 25. It is further assumed that the imaging time information (start time and end time) has been transmitted from the transmitting-side device to the receiving-side device and stored in the disk memory 25.
[0046]
When the online processing starts, the CPU 23 reads the arrangement information of the cameras 11₁ to 11ₙ from the disk memory 25 and displays it on the monitor 22. In this state, the user of the receiving-side device uses the input device 21 to input the time information and the viewpoint position of the image to be viewed (step B1). As the time information, for example, the hours, minutes, seconds, and frame counted from the imaging start time can be used. The viewpoint position can be input with, for example, a slide bar whose two ends correspond to two viewpoints. The CPU 23 stores the time information and the viewpoint position information in the main memory 24, and transmits them to the transmitting-side device via the communication device 26 (step B2).
[0047]
In the transmitting-side device, the time information and the viewpoint position information are received by the communication device 17 and sent to the CPU 14. Since the viewpoint position lies between two adjacent cameras, the CPU 14 reads from the disk memory 16 the moving image data of the moving area captured by those two cameras at the time indicated by the time information, the structural feature information of that moving image data (the silhouette image, the label numbers of the silhouette image, and the label feature amounts such as the center of gravity of each label and the coordinates of the players' feet), and the label number correspondence table, which is the correspondence information between the two cameras, and transmits them to the receiving-side device via the communication device 17 (step B3).
[0048]
In the receiving-side device, the moving image data of the moving area, the silhouette image, the label numbers of the silhouette image, the label feature amounts, and the label number correspondence table are received by the communication device 26 and stored in the disk memory 25. Using the moving image data of the moving area of each frame stored in the disk memory 25, the silhouette image, the label numbers of the silhouette image, the label feature amounts, the label number correspondence table, and the position information of the intermediate viewpoint, the CPU 23 sequentially generates, frame by frame, the intermediate viewpoint image data of the players and the ball, which are the moving areas, and stores it in the disk memory 25 (step B4). Here, the moving image data of the moving area is used to give color information to the silhouette image. Step B4 will now be described in detail. First, the silhouettes in the images of adjacent viewpoints are associated with each other using the silhouette image, the label numbers, and the label number correspondence table sent from the transmitting side. Next, epipolar lines are projected onto the associated silhouette images to calculate corresponding points. FIG. 11 is a diagram for explaining the procedure of associating the silhouette images of a player. Here, (a) is a silhouette image of a player captured by a certain camera, and (b) is a silhouette image of the player captured by the camera to its right. Three epipolar lines are projected on each of (a) and (b). On each epipolar line, the intersections with the two ends of the silhouette are associated first (in FIG. 11, a₁ and a₂ with b₁ and b₂, a₃ and a₄ with b₃ and b₄, and a₅ and a₆ with b₅ and b₆), and the interior of the silhouette is then associated by linear interpolation between the intersections. By projecting epipolar lines successively from the upper end to the lower end of the silhouette, corresponding point information for the entire silhouette can be obtained. The narrower the interval between the epipolar lines, the denser the corresponding point information that can be obtained. After the corresponding point information of the entire silhouette has been acquired in this way, morphing to the intermediate viewpoint is performed using Equation [2], and the intermediate viewpoint image data of the moving area is generated and stored in the disk memory 25.
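As a non-limiting illustration, the association of silhouette points along one pair of epipolar lines, followed by morphing with Equation [2], may be sketched as follows. For brevity the sketch assumes that the two views are arranged so that corresponding epipolar lines are the same image row in both silhouettes, which is a simplification not stated in the disclosure; in general the lines would be obtained from the F-matrix. The function names are hypothetical.

```python
def silhouette_ends(mask_row):
    """Return the x coordinates of both ends of the silhouette on one line."""
    xs = [x for x, value in enumerate(mask_row) if value]
    return (xs[0], xs[-1]) if xs else None


def correspondences_on_line(row_a, row_b, samples):
    """Associate points of two silhouettes along one epipolar line pair.

    The two ends are associated first; interior points are associated by
    linear interpolation between the ends. Requires samples >= 2.
    """
    ends_a, ends_b = silhouette_ends(row_a), silhouette_ends(row_b)
    if ends_a is None or ends_b is None:
        return []
    pairs = []
    for k in range(samples):
        t = k / (samples - 1)
        xa = (1 - t) * ends_a[0] + t * ends_a[1]
        xb = (1 - t) * ends_b[0] + t * ends_b[1]
        pairs.append((xa, xb))
    return pairs


def morph_line(pairs, alpha):
    """Equation [2] applied to the x coordinate of each associated pair."""
    return [(1 - alpha) * xa + alpha * xb for xa, xb in pairs]
```

Repeating this for every line from the top to the bottom of the silhouette gives the corresponding point information of the whole silhouette; a larger value of `samples` or a finer line spacing gives denser correspondences.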
[0049]
Lastly, the image data at the intermediate viewpoint of the ground and the goal, which are the near-view still area, and the image data at the intermediate viewpoint of the spectator seats, which are the distant-view still area, are read from the disk memory 25, combined with the intermediate viewpoint image data of the moving area, and displayed on the monitor 22 (step B5).
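As a non-limiting illustration, the combination of the three kinds of image data in step B5 may be sketched as follows. The layering order (distant view at the back, near view over it, moving area in front), the use of binary masks, and the function name `composite` are assumptions made for this example.

```python
def composite(far, near, near_mask, moving, moving_mask):
    """Combine three intermediate viewpoint images into one frame.

    far, near, moving: images of equal size given as rows of pixel values.
    near_mask, moving_mask: rows of 0/1 marking where each layer has content.
    """
    frame = []
    for y in range(len(far)):
        row = []
        for x in range(len(far[0])):
            if moving_mask[y][x]:
                row.append(moving[y][x])   # players and ball in front
            elif near_mask[y][x]:
                row.append(near[y][x])     # ground and goal
            else:
                row.append(far[y][x])      # spectator seats
        frame.append(row)
    return frame
```

Only the `moving` layer and its mask change from frame to frame; the two still layers are the ones prepared in advance for the selected intermediate viewpoint.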
[0050]
As described above, according to the first embodiment of the present invention, image data at an intermediate viewpoint can be generated for both stationary subjects and moving subjects. In addition, for the still region, the image data at the intermediate viewpoint is created in advance, and in each frame only the image data at the intermediate viewpoint of the moving region is created and combined with the image data of the still region. Compared with generating the image data of the entire image in every frame, the processing time per frame can be significantly reduced.
[0051]
In the above description, the movement of the intermediate viewpoint is limited to the left and right directions. However, the intermediate viewpoint can be moved in the front and rear direction (zoom or the like).
[0052]
(Second embodiment)
The second embodiment of the present invention is characterized in that, in the first embodiment, a player is tracked and can be displayed at the center of the screen or the like.
[0053]
The second embodiment of the present invention differs from the first embodiment only in the offline processing; the device configurations on the transmitting side and the receiving side and the online processing are the same as those in the first embodiment. However, so that the players can be tracked reliably even when they appear to overlap (occlusion) in the image of a certain camera, the cameras 11₁ to 11₈ are preferably divided into two groups of four and arranged above the audience seats on both sides, as shown in the drawing.
[0054]
Next, offline processing according to the present embodiment will be described. The flow of the offline processing according to the present embodiment is the same as that of the first embodiment shown in FIG. 8 except for steps A3 and A4, and therefore different parts will be described.
[0055]
FIG. 13 is a flowchart showing the area dividing process for the moving area in the present embodiment. Here, (a) is the in-camera processing common to each camera, and (b) is the inter-camera processing, which acquires position information using the images of other cameras when the position of a player cannot be determined from the image of one camera. Hereinafter, the in-camera processing and the inter-camera processing will be described in this order.
[0056]
In the in-camera processing, pre-processing is executed first (step E1). Specifically, for the moving image data captured by each camera, the difference from a background image is taken for each camera and binarized to generate a silhouette image in which the entire moving region is extracted, and everything other than the moving region is treated as the still region. Next, the moving area is labeled. At this time, the center of gravity and the area of each label are obtained as feature amounts.
[0057]
Next, a player candidate area is selected (step E2). Specifically, based on the positions of the players to be tracked in the previous frame, the silhouettes of the player candidates to be tracked are obtained from the silhouettes of the players extracted in the preprocessing in the current frame. If the player is within the angle of view in the previous frame, a silhouette that minimizes the moving distance from the silhouette of the player selected in the previous frame is selected. If the player is out of the angle of view in the previous frame, no player candidate is selected, and the position of the player is obtained using inter-camera processing.
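As a non-limiting illustration, the selection of the player candidate in step E2 may be sketched as follows. The representation of each silhouette by the center of gravity of its label, the use of `None` for a player who was out of the angle of view, and the function name `select_candidate` are assumptions made for this example.

```python
import math


def select_candidate(previous_position, centroids):
    """Return the index of the silhouette nearest to the previous position.

    previous_position: (x, y) of the tracked player in the previous frame,
        or None if the player was out of the angle of view.
    centroids: centers of gravity of the labels in the current frame.
    Returns None when no candidate can be chosen, in which case the
    position is left to the inter-camera processing.
    """
    if previous_position is None or not centroids:
        return None
    return min(range(len(centroids)),
               key=lambda i: math.dist(previous_position, centroids[i]))
```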
[0058]
Next, it is determined whether the obtained silhouette of the player candidate overlaps another player (occlusion determination) (step E3). This determination is made by comparing, between the previous frame and the current frame, the area of the silhouette of the player candidate and the number of players around the player to be tracked. For example, when the area of the player candidate's silhouette increases in the current frame and the number of labels around the candidate decreases compared with the previous frame, it is determined that occlusion has occurred in the current frame. For the image data of a camera in which it is determined that occlusion has not occurred, the player is regarded as trackable by the in-camera processing alone, and the obtained position information of the player is retained. For a camera in which occlusion is determined to have occurred, the position of the player is obtained by the inter-camera processing, as in the case where the player is out of the angle of view.
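As a non-limiting illustration, the occlusion determination of step E3 may be sketched as follows. The disclosure states only that the silhouette area increases and the number of surrounding labels decreases; the ratio of 1.3 used to decide that the area has increased and the function name `occlusion_occurred` are assumptions made for this example.

```python
def occlusion_occurred(prev_area, cur_area, prev_neighbors, cur_neighbors,
                       area_ratio=1.3):
    """Judge occlusion from the change between two consecutive frames.

    prev_area, cur_area: silhouette area of the player candidate.
    prev_neighbors, cur_neighbors: number of labels around the candidate.
    area_ratio: assumed factor by which the area must grow to count as
        an increase (not specified in the disclosure).
    """
    area_increased = cur_area > area_ratio * prev_area
    neighbors_decreased = cur_neighbors < prev_neighbors
    return area_increased and neighbors_decreased
```

Requiring both conditions avoids treating a player who merely approaches the camera, and therefore grows in the image, as an occlusion.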
[0059]
In the inter-camera processing, the position of the player is first estimated (step F1). Epipolar geometry allows an epipolar line to be drawn from a pixel of an image captured by one camera onto the image captured by another camera, on which the corresponding pixel lies. Therefore, as shown in FIG. 14, when the position of the player (the center of gravity of the label) has been determined as G₁₁ and G₂₁ on the projection planes G₁ and G₂ of two cameras by the in-camera processing, but could not be determined on the projection plane G₃ of another camera because of occlusion, the epipolar lines corresponding to the two determined positions are drawn on the projection plane G₃ using the first F-matrix and the second F-matrix, their intersection G₃₁ is calculated, and the intersection is estimated to be the position of the player on the projection plane G₃. When the player is out of the angle of view, the position of the player can be estimated in the same way. The two cameras used at this time are preferably selected so that the distance between them is as large as possible. The reason is that if the distance between the cameras is small, the angle at which the epipolar lines obtained from the image data of the two cameras intersect becomes small, and the intersection may be displaced.
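As a non-limiting illustration, the estimation of the player position in step F1 as the intersection of two epipolar lines may be sketched as follows. The sketch assumes that each epipolar line is obtained as (a, b, c) = F·(x, y, 1), and the function names and the parameter names `f13` and `f23` (the F-matrices from the first and second cameras to the third) are hypothetical.

```python
def epipolar_line(f, point):
    """Line (a, b, c), a*x + b*y + c = 0, assumed to be F * (x, y, 1)."""
    x, y = point
    return tuple(row[0] * x + row[1] * y + row[2] for row in f)


def intersect_lines(line1, line2):
    """Intersection of two lines in homogeneous form, or None if parallel."""
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    w = a1 * b2 - a2 * b1
    if abs(w) < 1e-12:
        return None
    return ((b1 * c2 - b2 * c1) / w, (c1 * a2 - c2 * a1) / w)


def estimate_position(f13, f23, g11, g21):
    """Estimate G31 on plane G3 from the known positions G11 and G21."""
    return intersect_lines(epipolar_line(f13, g11), epipolar_line(f23, g21))
```

When the two lines are nearly parallel, the value of `w` becomes small and the computed intersection becomes sensitive to small errors, which is consistent with the preference for two cameras that are far apart.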
[0060]
By the above processing, the position of the player to be tracked can be obtained on the images of all the cameras 11₁ to 11₈. However, when tracking resumes after an occlusion or when a player enters the view partway through the sequence, a player different from the one to be tracked may be tracked by mistake. Therefore, in order to perform more stable tracking, the position of the player is confirmed using the information of a plurality of cameras, and the position information is stored (step F2).
[0061]
First, based on the coordinates of the position L₁ of the player estimated in step F1 on the projection plane K₁ of the camera whose position is to be confirmed, epipolar lines M₂ to M₅ are drawn on the images of the projection planes K₂ to K₅ of the cameras for which the position of the player has been determined by the in-camera processing. Next, on each of the projection planes K₂ to K₅, the distance between the position L₂ to L₅ of the player determined by the in-camera processing and the epipolar line M₂ to M₅ is calculated. If the distance is within a threshold value in every camera image, the position L₁ estimated in step F1 is stored as the position of the player. If the distance exceeds the threshold value on any of the projection planes K₂ to K₅, the same processing is performed on the projection plane K₁ in order from the player closest to the position L₁, and the position coordinates of the player estimated to be most appropriate (for example, the position for which all images fall within the threshold) are stored.
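As a non-limiting illustration, the confirmation of step F2 may be sketched as follows. The sketch assumes that each epipolar line is obtained as (a, b, c) = F·(x, y, 1) and that the distance is the perpendicular distance in pixels; the function name `confirm_position` and its parameters are hypothetical.

```python
import math


def confirm_position(l1, f_matrices, positions, threshold):
    """Check an estimated position L1 against the other cameras.

    l1: estimated position (x, y) on projection plane K1.
    f_matrices: F-matrices from K1 to each of the other planes (K2, K3, ...).
    positions: positions (L2, L3, ...) found there by in-camera processing.
    threshold: largest accepted distance to the epipolar line, in pixels.
    Returns True only if every position lies within the threshold.
    """
    for f, (x, y) in zip(f_matrices, positions):
        a, b, c = (row[0] * l1[0] + row[1] * l1[1] + row[2] for row in f)
        distance = abs(a * x + b * y + c) / math.hypot(a, b)
        if distance > threshold:
            return False
    return True
```

If the check fails, the same function could be applied to the other player positions on K₁ in order of increasing distance from L₁, keeping the first one for which it returns `True`.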
[0062]
Through the above processing, the position of the player can be tracked by the image data captured by all the cameras. Subsequent offline processing is the same as in the first embodiment. Further, the online processing is the same as in the first embodiment.
[0063]
As described above, according to the second embodiment of the present invention, even when the position information of a player cannot be obtained from the image of a certain camera because occlusion has occurred or because the player is outside the angle of view, the position information of the player can be estimated by referring to the image data of cameras in which occlusion has not occurred, and the player can be tracked. Furthermore, by confirming the estimated position with reference to the image data of a plurality of cameras in which occlusion has not occurred, more stable tracking can be performed.
[0064]
【Effect of the Invention】
As is apparent from the above description, according to the present invention, the transmitting side images a subject from a plurality of different viewpoints to obtain moving image data at each viewpoint; estimates projection geometric information between all adjacent viewpoints; separates the moving image data at the plurality of different viewpoints into still image data of the still region and moving image data of the moving region; associates the still image data between all adjacent viewpoints using the projection geometric information and generates still image data at all intermediate viewpoints by morphing; associates the moving image data between all adjacent viewpoints using the projection geometric information and generates structural feature information and correspondence information of the moving image data at all adjacent viewpoints; and transmits the free viewpoint still image data at all intermediate viewpoints and all the projection geometric information to the receiving side in advance. The receiving side receives and stores in advance the free viewpoint still image data at all intermediate viewpoints and all the projection geometric information; selects an arbitrary intermediate viewpoint between adjacent viewpoints and notifies the transmitting side of it; generates, by morphing, the moving image data of the moving area at that intermediate viewpoint using the moving image data of the moving area at the adjacent viewpoints corresponding to the intermediate viewpoint, the structural feature information of the moving image data, and the correspondence information of the moving image data between the adjacent viewpoints, all sent from the transmitting side, together with the previously stored projection geometric information; and combines the result with the previously stored still image data at the intermediate viewpoint. Image data at the intermediate viewpoint can thereby be generated on the receiving side for both stationary subjects and moving subjects.
[0065]
Further, according to the present invention, the transmitting side images a subject from a plurality of different viewpoints to obtain moving image data at each viewpoint, and estimates projective geometric information between all adjacent viewpoints. It separates the moving image data at the plurality of different viewpoints into still image data of the still region and moving image data of the moving region, establishes the correspondence of the still image data between all adjacent viewpoints using the projective geometric information, and generates still image data at all intermediate viewpoints by morphing. It then selects an arbitrary subject in the moving region as a tracking target, uses the projective geometric information to establish the correspondence of the moving image data of the tracked subject between all adjacent viewpoints, and generates structural feature information and correspondence information of the moving image data of the tracked subject at all adjacent viewpoints. The free viewpoint still image data at all intermediate viewpoints and all the projective geometric information are transmitted to the receiving side in advance.
The receiving side receives and stores, in advance, the free viewpoint still image data at all intermediate viewpoints and all the projective geometric information, selects an arbitrary intermediate viewpoint between adjacent viewpoints, and notifies the transmitting side. Using the moving image data of the tracked subject at the adjacent viewpoints corresponding to that intermediate viewpoint, the structural feature information of the moving image data, and the correspondence information of the moving image data between the adjacent viewpoints, all sent from the transmitting side, together with the previously stored projective geometric information, it generates the moving image data of the tracked subject at the arbitrary intermediate viewpoint by morphing and combines it with the previously stored still image data at the intermediate viewpoint. As a result, image data at the intermediate viewpoint can be generated on the receiving side for both the subject in a still state and the subject in a moving state, and the tracked subject in a moving state can be displayed at the center of the screen or the like.
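Displaying the tracked subject at the center of the screen can be sketched as choosing a display window around the subject's stored position. This is only an illustration under assumptions: the specification does not state how the display window is chosen, and the function centered_window, its parameters, and the clamping to the frame border are hypothetical.

```python
from typing import Tuple


def centered_window(subject_xy: Tuple[int, int],
                    frame_size: Tuple[int, int],
                    window_size: Tuple[int, int]) -> Tuple[int, int]:
    """Return the top-left corner of a display window centered on the subject.

    The window is clamped so that it never extends outside the frame
    (an assumption; near the border the subject is then off-center).
    """
    x, y = subject_xy
    frame_w, frame_h = frame_size
    win_w, win_h = window_size
    if win_w > frame_w or win_h > frame_h:
        raise ValueError("window must fit inside the frame")
    left = min(max(x - win_w // 2, 0), frame_w - win_w)
    top = min(max(y - win_h // 2, 0), frame_h - win_h)
    return left, top
```

Applied once per frame to the position information of the tracked subject, this keeps the subject near the middle of the displayed region wherever the frame allows it.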
[Brief description of the drawings]
FIG. 1 is a block diagram of a free viewpoint moving image data generation system according to a first embodiment of the present invention;
FIG. 2 is a diagram showing an arrangement of cameras according to the first embodiment of the present invention;
FIG. 3 is a diagram showing a region of an image according to the first embodiment of the present invention;
FIG. 4 is a diagram for explaining a plane projection matrix;
FIG. 5 is a diagram for explaining a method of generating an interpolation image;
FIG. 6 is a diagram for explaining a mosaic process;
FIG. 7 is a diagram for explaining epipolar geometry;
FIG. 8 is a flowchart of an offline process according to the first embodiment of the present invention;
FIG. 9 is a diagram showing an example of extracting a moving area;
FIG. 10 is a flowchart of online processing according to the first embodiment of the present invention;
FIG. 11 is a diagram for explaining correspondence of silhouettes;
FIG. 12 is a diagram showing an arrangement of cameras according to a second embodiment of the present invention;
FIG. 13 is a flowchart showing an area dividing process related to a moving area according to the second embodiment of the present invention;
FIG. 14 is a diagram for explaining estimation of a player position in camera-to-camera processing;
FIG. 15 is a diagram for describing estimation of a player position in camera-to-camera processing.
[Explanation of symbols]
11 camera
12, 21 input device
13, 22 monitor
14, 23 CPU
15, 24 main memory
16, 25 disk memory
17, 26 communication device
18, 27 bus

Claims

1. A free viewpoint moving image data generation method in which a transmitting side captures a subject in a three-dimensional space from a plurality of different viewpoints to acquire moving image data, and a receiving side generates moving image data of the subject at an arbitrary intermediate viewpoint between adjacent viewpoints, wherein:
the transmitting side executes a step of dividing the moving image data obtained by imaging the subject from the plurality of different viewpoints into a moving region and a still region, a step of estimating projective geometric information between all adjacent viewpoints, a step of generating, for the image data of the still region, free viewpoint still image data at all intermediate viewpoints based on the projective geometric information, a step of generating, for the moving image data of the moving region, structural feature information of the moving image data and correspondence information of the moving image data between all adjacent viewpoints based on the projective geometric information, and a step of transmitting in advance the free viewpoint still image data at all intermediate viewpoints and all the projective geometric information to the receiving side;
the receiving side executes a step of receiving and storing in advance the free viewpoint still image data at all intermediate viewpoints and all the projective geometric information, and a step of selecting an arbitrary intermediate viewpoint between the adjacent viewpoints and notifying the transmitting side;
the transmitting side executes a step of transmitting to the receiving side the moving image data of the moving region at the adjacent viewpoints corresponding to the intermediate viewpoint notified from the receiving side, the structural feature information of the moving image data, and the correspondence information of the moving image data between the adjacent viewpoints; and
the receiving side executes a step of receiving the moving image data of the moving region, the structural feature information of the moving image data, and the correspondence information of the moving image data between the adjacent viewpoints, and generating moving image data of the moving region at the arbitrary intermediate viewpoint, and a step of reading out the previously stored free viewpoint still image data at the arbitrary intermediate viewpoint and combining it with the generated moving image data of the moving region at the arbitrary intermediate viewpoint to generate moving image data at the arbitrary intermediate viewpoint.

2. The free viewpoint moving image data generation method according to claim 1, wherein the transmitting side executes a step of separating the still region into short-distance image data and long-distance image data according to the distance from the viewpoint, generating free viewpoint still image data at the arbitrary intermediate viewpoint for the short-distance image data, and, for the long-distance image data, connecting the image data of the adjacent viewpoints and then cutting out the image data.

3. The free viewpoint moving image data generation method according to claim 1, wherein the transmitting side executes a step of selecting an arbitrary subject in the moving region as a tracking target for each of the plurality of different viewpoints, and a step of storing position information of the selected subject.

4. The free viewpoint moving image data generation method according to claim 3, wherein, when occlusion occurs for the selected subject, the position information is acquired by referring to image data captured at a viewpoint where no occlusion has occurred.

5. A program for causing a transmitting-side computer to execute a free viewpoint moving image data generation method in which a transmitting side captures a subject in a three-dimensional space from a plurality of different viewpoints to acquire moving image data, and a receiving side generates moving image data of the subject at an arbitrary intermediate viewpoint between adjacent viewpoints,
the program causing the computer to execute: a step of dividing the moving image data obtained by imaging the subject from the plurality of different viewpoints into a moving region and a still region; a step of estimating projective geometric information between all adjacent viewpoints; a step of generating, for the image data of the still region, free viewpoint still image data at all intermediate viewpoints based on the projective geometric information; a step of generating, for the moving image data of the moving region, structural feature information of the moving image data and correspondence information of the moving image data between the adjacent viewpoints based on the projective geometric information; a step of transmitting in advance the free viewpoint still image data at all intermediate viewpoints and all the projective geometric information to the receiving side; and a step of transmitting to the receiving side the moving image data of the moving region at the adjacent viewpoints corresponding to the intermediate viewpoint notified from the receiving side, the structural feature information of the moving image data, and the correspondence information of the moving image data between the adjacent viewpoints.

6. A program for causing a receiving-side computer to execute a free viewpoint moving image data generation method in which a transmitting side captures a subject in a three-dimensional space from a plurality of different viewpoints to acquire moving image data, and a receiving side generates moving image data of the subject at an arbitrary intermediate viewpoint between adjacent viewpoints,
the program causing the computer to execute: a step of receiving and storing in advance the free viewpoint still image data at all intermediate viewpoints and the projective geometric information between all adjacent viewpoints, both generated and transmitted by the transmitting side; a step of selecting an arbitrary intermediate viewpoint between the adjacent viewpoints and notifying the transmitting side; a step of receiving the moving image data of the moving region transmitted from the transmitting side, the structural feature information of the moving image data, and the correspondence information of the moving image data between the adjacent viewpoints, and generating moving image data of the moving region at the arbitrary intermediate viewpoint; and a step of reading out the previously stored free viewpoint still image data at the arbitrary intermediate viewpoint and combining it with the moving image data of the moving region at the arbitrary intermediate viewpoint to generate moving image data at the arbitrary intermediate viewpoint.