JP2022103836A - Information processing device, information processing method, and program

Info

Publication number: JP2022103836A
Application number: JP2020218722A
Authority: JP
Inventors: 裕尚伊藤; Hironao Ito
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2020-12-28
Filing date: 2020-12-28
Publication date: 2022-07-08

Abstract

PROBLEM TO BE SOLVED: To generate an appropriate virtual viewpoint image.

SOLUTION: Shape data of an object used for generating a virtual viewpoint image is generated from a plurality of captured images having different viewpoints, and it is determined whether the object represented by the generated shape data corresponds to an unnecessary object. A virtual viewpoint image is then generated in which display of the object determined to be unnecessary is suppressed.

SELECTED DRAWING: Figure 2

Description

The technique of the present disclosure relates to generating a virtual viewpoint image from images captured by a plurality of image pickup devices.

In recent years, attention has been drawn to a technique in which a plurality of image pickup devices are installed at different positions to perform synchronized shooting from multiple viewpoints, and the resulting multi-viewpoint images are used to generate a virtual viewpoint image representing the view from a specified viewpoint (virtual viewpoint). When a virtual viewpoint image is generated, a three-dimensional model of a subject (foreground object) such as a person present in the shooting target area is computed, and an image of the area as seen from the virtual viewpoint is produced from it. Virtual viewpoint images can be generated for many kinds of scenes; one example is a sporting event held in a stadium. Such events are often filmed by broadcasting station staff (camera operators) for broadcast or distribution. At large events, a special shooting system called a wire cam may also be used. A wire cam suspends an image pickup device from a plurality of wires stretched across the shooting target space, and can capture dynamic bird's-eye footage of field sports such as soccer and rugby.

As described above, at a large-scale sporting event or the like, broadcasting station camera operators and wire cams move through the shooting target space. As a result, they may appear in the virtual viewpoint image just as the players do and interfere with viewing. It is desirable that people and objects that are not true foreground objects and that hinder viewing are not displayed in the virtual viewpoint image. Patent Document 1 proposes a surveillance camera system in which, when a first camera detects an obstruction, the image captured by a second camera undergoes viewpoint conversion and is composited onto the image captured by the first camera, so that the subject blocked by the obstruction can be seen.

Japanese Unexamined Patent Publication No. 2011-77981

However, the technique of Patent Document 1 obtains an obstruction-free image by applying viewpoint conversion to the image captured by a camera, among a plurality of fixed-position cameras, in which no obstruction was detected. It is therefore difficult to apply that technique directly to virtual viewpoint image generation and obtain a virtual viewpoint image from which objects that hinder viewing have been appropriately removed.

Accordingly, the technique of the present disclosure aims to generate an appropriate virtual viewpoint image.

An information processing apparatus according to the present disclosure includes: first generation means for generating shape data representing the three-dimensional shape of an object using a plurality of captured images having different viewpoints; determination means for determining, based on at least a condition relating to the shape of the object, whether the object represented by the shape data generated by the first generation means corresponds to an unnecessary object in a virtual viewpoint image; and second generation means for generating a virtual viewpoint image in which display of an object determined by the determination means to be unnecessary is suppressed relative to display of an object not determined to be unnecessary.

According to the technique of the present disclosure, an appropriate virtual viewpoint image can be generated.

A block diagram showing an example of the configuration of an image processing system that generates a virtual viewpoint image.
A diagram explaining the arrangement of the cameras installed in the shooting target space.
A block diagram showing the hardware configuration of an information processing device.
A block diagram showing the software configuration of the server according to Embodiment 1.
(a) and (b) are diagrams showing an example of a UI screen.
A functional block diagram showing the internal configuration of the three-dimensional model analysis unit.
(a) to (l) are diagrams showing examples of basic shape patterns of a three-dimensional model.
A table showing an example of feature information of a three-dimensional model.
A flowchart showing the flow of the virtual viewpoint image generation processing according to Embodiment 1.
(a) and (b) are diagrams explaining the effect of Embodiment 1.
A block diagram showing the software configuration of the server according to Embodiment 2.
A flowchart showing the flow of the virtual viewpoint image generation processing according to Embodiment 1.

Hereinafter, embodiments of the present disclosure will be described with reference to the drawings. The following embodiments do not limit the present invention, and not all combinations of the features described in the embodiments are essential to the solution provided by the present invention. Identical components are denoted by the same reference numerals.

[Embodiment 1]
In this embodiment, shape data representing the three-dimensional shape of an object (hereinafter referred to as a "three-dimensional model") is analyzed, and information indicating that display in the virtual viewpoint image is unnecessary is attached to the three-dimensional model of a specific object. A three-dimensional model to which this information is attached is then made transparent, which yields a virtual viewpoint image from which objects that hinder viewing have been removed. A virtual viewpoint image is an image generated when an end user and/or a designated operator freely manipulates the position and orientation of a virtual camera; it is also called a free viewpoint image or an arbitrary viewpoint image. The virtual viewpoint image may be a moving image or a still image. In the following, an image processing system that implements this embodiment is described for the case where the virtual viewpoint image is a moving image.

(Basic system configuration)
FIG. 1 is a block diagram showing an example of the configuration of an image processing system that generates a virtual viewpoint image according to this embodiment. The image processing system 100 includes a plurality of image pickup modules 110, a switching hub 115, a server 116, a database (DB) 117, a control device 118, and a display device 119. Each image pickup module 110 contains a camera 111, which is an image pickup device, and a camera adapter 112, connected to each other by internal wiring. Each image pickup module 110 transmits data over a network cable. The switching hub (hereinafter "HUB") 115 is a device that routes traffic between the network devices. Each image pickup module 110 is connected to the HUB 115 by a network cable 113. Likewise, the server 116, the DB 117, and the control device 118 are each connected to the HUB 115 by a network cable 113. The control device 118 and the display device 119 are connected by a video cable 114. The cameras 111 shoot in precise synchronization with one another based on a synchronization signal. As shown in FIG. 2, in this embodiment the image pickup modules 110 (only ten are illustrated for convenience) are installed so as to surround the field of a stadium, which is the shooting target space. In addition, a wire cam 120 for shooting broadcast footage, which is not used for generating the virtual viewpoint image, is suspended from wires stretched above the field.

The server 116 is an information processing device that processes the captured images obtained by the image pickup modules 110, generates and analyzes three-dimensional models of objects, and colors the generated three-dimensional models (also called "texture pasting" or "texture mapping"). The server 116 also has a time server function that generates the time synchronization signal used to synchronize the system. In this embodiment, the objects for which three-dimensional models are generated are moving objects such as players and the ball. Here, the wire cam 120 moving above the field and camera operators who shoot broadcast footage while moving around the field also formally qualify as moving objects, so three-dimensional models are generated for them as well. This is the problem addressed by the disclosed technique. The database (hereinafter "DB") 117 stores data such as the captured images obtained by each image pickup module 110 and the generated three-dimensional models, and provides the stored data to the server 116 and the control device 118. The control device 118 is an information processing device that controls each image pickup module 110 and the server 116. The control device 118 is also used to set the virtual camera (virtual viewpoint). The display device 119 displays, among other things, a settings user interface screen (UI screen) with which the user specifies a virtual viewpoint on the control device 118, and a UI screen for viewing the generated virtual viewpoint image. The display device 119 may be, for example, a television, a computer monitor, or the liquid crystal display of a tablet or smartphone; the type of device does not matter.

(Operation of the image processing system)
Next, the general operation of the image processing system 100 is described. A captured image obtained by an image pickup module 110 undergoes predetermined image processing such as foreground/background separation and is then transmitted to the next image pickup module 110. The next image pickup module 110 in turn combines the captured image it obtained itself with the captured images received from the previous image pickup module 110 and transmits them to the module after it. By repeating this operation, 100 sets of captured images (including foreground images) are transmitted to the server 116 via the HUB 115.
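The relay described above can be pictured with a minimal sketch. It assumes each module simply appends its own image set to whatever it received from the previous module; the names `ImageSet` and `relay_through_modules` are hypothetical and the image data is replaced by a placeholder string.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ImageSet:
    """One module's output for a frame (hypothetical stand-in for real image data)."""
    module_id: int
    foreground: str


def relay_through_modules(module_ids: Iterable[int]) -> List[ImageSet]:
    """Pass an accumulating bundle down the chain of modules.

    Each module adds its own image set to the bundle received from the
    previous module; the bundle leaving the last module is what would be
    handed to the hub and on to the server.
    """
    bundle: List[ImageSet] = []
    for module_id in module_ids:
        own = ImageSet(module_id, f"foreground-{module_id}")
        bundle = bundle + [own]  # received sets plus this module's own set
    return bundle


if __name__ == "__main__":
    sets = relay_through_modules(range(100))
    print(len(sets))  # number of image sets arriving at the server
```

With 100 modules in the chain, the bundle that reaches the server holds 100 image sets, in the order the modules are traversed.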

The server 116 generates a virtual viewpoint image by generating three-dimensional models of objects and performing rendering, based on the captured image data from different viewpoints acquired from all of the image pickup modules 110. The server 116 also transmits a time and a synchronization signal to each image pickup module 110. Each image pickup module 110 that receives them captures images using the received time and synchronization signal and frame-synchronizes the captured images. That is, all image pickup modules 110 shoot frame by frame, synchronized to the same time. The format of the captured image data is not particularly limited. For example, neither the per-pixel bit depth (8 bits, 10 bits, and so on) nor the per-pixel color format (YUV444, YUV422, RGB, and so on) is restricted. The image file format is not restricted either; for example, a common format such as PNG (Portable Network Graphics) or JPEG (Joint Photographic Experts Group) may be used.

To generate a virtual viewpoint image, the silhouette of each foreground object is first extracted from the plurality of captured images having different viewpoints. Next, the images representing the extracted silhouettes (foreground images) are used to generate a three-dimensional model of each object, for example by the visual hull method (volume intersection method). Finally, the generated three-dimensional model is colored using the color information (texture information) contained in the images captured by the physical cameras 111. This processing yields a virtual viewpoint image representing the view from an arbitrary virtual viewpoint. Instead of the model-based method described above, a morphing method or a billboarding method may be used. The morphing method synthesizes a virtual viewpoint image between cameras by interpolating the images of adjacent cameras according to their positional relationship. The billboarding method extracts a two-dimensional image of the subject from a captured image and reorients it in three-dimensional space by projective computation according to the virtual viewpoint. Techniques such as these, which obtain a virtual viewpoint image by compositing or projectively transforming a plurality of images captured by physical cameras, are called image-based rendering.
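The first step, silhouette extraction, can be illustrated with a small background-subtraction sketch. It is not the method of the disclosure as such: the per-pixel temporal median used to estimate the background, the grayscale list-of-rows image layout, and the names `estimate_background` and `foreground_mask` are all assumptions made for illustration.

```python
from statistics import median
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel values (assumed layout)


def estimate_background(frames: List[Image]) -> List[List[float]]:
    """Estimate a static background as the per-pixel median over time.

    Pixels that rarely change across frames dominate the median, so moving
    objects are suppressed. The median is an assumed estimator.
    """
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]


def foreground_mask(frame: Image, background, threshold: int = 30) -> Image:
    """Mark pixels whose difference from the background exceeds the threshold."""
    return [
        [1 if abs(pixel - bg) > threshold else 0 for pixel, bg in zip(row, bg_row)]
        for row, bg_row in zip(frame, background)
    ]


if __name__ == "__main__":
    # A bright "object" moves left to right across a dark, static background.
    frames = [
        [[200, 10, 10], [10, 10, 10]],
        [[10, 200, 10], [10, 10, 10]],
        [[10, 10, 200], [10, 10, 10]],
    ]
    background = estimate_background(frames)
    print(foreground_mask(frames[1], background))
```

The resulting binary mask plays the role of the foreground image (silhouette) that is fed into three-dimensional model generation.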

(Hardware configuration of the information processing device)
Next, the hardware configuration of an information processing device such as the server 116 or the control device 118 is described with reference to FIG. 3. The camera adapters 112a to 112j in the image pickup modules 110a to 110j basically have the same hardware configuration. The information processing device includes a CPU 201, a ROM 202, a RAM 203, an auxiliary storage device 204, a communication I/F 205, an operation unit 206, and a bus 207.

The CPU 201 controls the entire information processing device using programs and data stored in the ROM 202 and the RAM 203. The device may have one or more pieces of dedicated hardware separate from the CPU 201, and the dedicated hardware may execute at least part of the processing otherwise performed by the CPU 201. Examples of dedicated hardware include an ASIC (application-specific integrated circuit), an FPGA (field-programmable gate array), and a DSP (digital signal processor). The ROM 202 stores programs and the like that do not need to be changed. The RAM 203 temporarily stores programs and data supplied from the auxiliary storage device 204, data supplied from outside via the communication I/F 205, and the like. The auxiliary storage device 204 is composed of, for example, an HDD or SSD, and stores various data and programs, including input data such as image data and audio data, tables referred to in the various processes described later, and application programs. The communication I/F 205 is used for communication with external devices. For example, when the device is connected to an external device by wire, a communication cable is connected to the communication I/F 205; when the device communicates with an external device wirelessly, the communication I/F 205 includes an antenna. The operation unit 206 is composed of, for example, a keyboard, a mouse, a joystick, or a touch panel, and inputs various instructions to the CPU 201 in response to user operations. The bus 207 connects the above units and carries data and signals between them.

In this embodiment, the server 116 performs both three-dimensional model generation and virtual viewpoint image generation, but the system configuration is not limited to this. For example, a server that generates three-dimensional models and a server that generates virtual viewpoint images may be provided separately.

(Server software configuration)
FIG. 4 is a block diagram showing the software configuration of the server 116 according to this embodiment. In the technique of the present disclosure, the three-dimensional models to be hidden are determined based on the features of the three-dimensional models of objects that hinder viewing. A three-dimensional model determined to be hidden is then displayed as transparent, which makes the obstructing object invisible and improves the quality of the virtual viewpoint image. As shown in FIG. 4, the server 116 has a data input unit 401, a three-dimensional model generation unit 402, a three-dimensional model analysis unit 403, a non-display setting unit 404, and a virtual viewpoint image generation unit 405. Each unit is described in turn below.

The data input unit 401 acquires the data of a plurality of captured images with different viewpoints that were captured by the image pickup modules 110 and stored in the DB 117. The acquired captured image data is sent to the three-dimensional model generation unit 402 and the virtual viewpoint image generation unit 405. The data input unit 401 also acquires from the control device 118 information consisting of the various parameters needed to generate a virtual viewpoint image, such as the position and orientation of the virtual viewpoint, the angle of view, and the focal length (hereinafter referred to as "virtual viewpoint information"). The acquired virtual viewpoint information is sent to the virtual viewpoint image generation unit 405.

The three-dimensional model generation unit 402 generates three-dimensional models of the moving objects present in the captured images, based on the captured image data received from the data input unit 401. The generation procedure is briefly as follows. First, each captured image undergoes foreground/background separation to generate a foreground image representing the silhouettes of the objects. Here, background subtraction is used as the separation method. In background subtraction, frames at different times in a multi-frame captured image are first compared with one another, and locations where the difference in pixel values is small are identified as static pixels. A background image is then generated from the identified static pixels. By comparing this background image with the frame of interest in the captured image, pixels in that frame that differ greatly from the background are identified as foreground pixels, and a foreground image is generated. This processing is performed for each of the captured images with different viewpoints. Next, a three-dimensional model is extracted by the visual hull method using the foreground image corresponding to each captured image. In the visual hull method, the target three-dimensional space is divided into small unit cubes (voxels), the pixel position at which each voxel would appear in each captured image is obtained by three-dimensional calculation, and it is judged whether that position corresponds to a foreground pixel. A voxel judged to be a foreground pixel in the captured images of all image pickup modules is identified as a voxel that constitutes an object in the target three-dimensional space. Only the identified voxels are kept and all other voxels are deleted. The group of voxels that finally remains is the three-dimensional model representing the three-dimensional shape of an object present in the target three-dimensional space. The three-dimensional model generated for each object is sent to the three-dimensional model analysis unit 403 together with the captured image data on which it is based.
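The voxel test described above can be sketched in a few lines. This is a toy version under stated assumptions: the cameras are idealized orthographic views along the three axes rather than calibrated perspective cameras, silhouettes are given as sets of foreground pixel coordinates, and the names `carve`, `front_view`, `side_view`, and `top_view` are hypothetical.

```python
from itertools import product
from typing import Callable, List, Set, Tuple

Voxel = Tuple[int, int, int]
Pixel = Tuple[int, int]
View = Tuple[Callable[[Voxel], Pixel], Set[Pixel]]  # (projection, silhouette)


def front_view(voxel: Voxel) -> Pixel:
    """Toy orthographic camera looking along the y axis."""
    return (voxel[0], voxel[2])


def side_view(voxel: Voxel) -> Pixel:
    """Toy orthographic camera looking along the x axis."""
    return (voxel[1], voxel[2])


def top_view(voxel: Voxel) -> Pixel:
    """Toy orthographic camera looking along the z axis."""
    return (voxel[0], voxel[1])


def carve(grid_size: int, views: List[View]) -> Set[Voxel]:
    """Keep only voxels that project onto a foreground pixel in every view."""
    kept: Set[Voxel] = set()
    for voxel in product(range(grid_size), repeat=3):
        if all(project(voxel) in silhouette for project, silhouette in views):
            kept.add(voxel)
    return kept


if __name__ == "__main__":
    # Silhouettes of a 2 x 1 x 3 box occupying x in {1, 2}, y = 1, z in {0, 1, 2}.
    front = {(x, z) for x in (1, 2) for z in (0, 1, 2)}
    side = {(1, z) for z in (0, 1, 2)}
    top = {(1, 1), (2, 1)}
    model = carve(4, [(front_view, front), (side_view, side), (top_view, top)])
    print(sorted(model))
```

In a real system each projection would use the calibrated parameters of a physical camera; the keep-if-foreground-in-all-views rule is the part that corresponds to the description.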

The three-dimensional model analysis unit 403 analyzes the three-dimensional models generated by the three-dimensional model generation unit 402. In this analysis, it first determines, by referring to the exclusion conditions described later, whether each model is the three-dimensional model of a specific object to be excluded from the virtual viewpoint image (hereinafter referred to as an "excluded object"). To each three-dimensional model determined to correspond to an excluded object, it attaches information indicating that display in the virtual viewpoint image is unnecessary (for example, a flag set to "1" when display is unnecessary and "0" otherwise; hereinafter referred to as the "non-display flag"). Details of the three-dimensional model analysis unit 403 are described later.
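As a rough illustration of the flagging step, the sketch below attaches a 0/1 flag to each model according to a caller-supplied exclusion test. The `Model3D` class, the field name `non_display_flag`, and the function `assign_flags` are hypothetical names chosen for this example, not identifiers from the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

Voxel = Tuple[int, int, int]


@dataclass
class Model3D:
    """A generated three-dimensional model plus its display flag (hypothetical layout)."""
    model_id: int
    voxels: List[Voxel]
    non_display_flag: int = 0  # 1 = display unnecessary, 0 = otherwise


def assign_flags(
    models: Iterable[Model3D],
    matches_exclusion: Callable[[Model3D], bool],
) -> List[Model3D]:
    """Set the flag to 1 on every model that satisfies the exclusion test."""
    flagged = list(models)
    for model in flagged:
        model.non_display_flag = 1 if matches_exclusion(model) else 0
    return flagged


if __name__ == "__main__":
    models = [Model3D(1, [(0, 0, 0)]), Model3D(2, [(0, 0, 9)])]
    # Example test: exclude anything whose highest voxel is at z >= 5.
    assign_flags(models, lambda m: max(z for _, _, z in m.voxels) >= 5)
    print([(m.model_id, m.non_display_flag) for m in models])
```

Keeping the flag on the model, rather than deleting the model, leaves the rendering stage free to decide how strongly to suppress it.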

The non-display setting unit 404 makes the settings needed for the display control that keeps excluded objects out of the virtual viewpoint image. Specifically, based on user instructions, it sets whether the operation mode that controls the hiding of excluded objects (the non-display mode) is enabled or disabled, and sets the conditions that define which objects are treated as excluded objects (the exclusion conditions). FIG. 5(a) shows an example of a user interface screen (UI screen) on which the user configures the non-display mode. A user who wants to keep specific objects that hinder viewing out of the virtual viewpoint image selects the radio button 501, which enables the non-display mode, and presses the OK button 503. The screen then transitions to the exclusion condition setting screen shown in FIG. 5(b). On this screen, the user can specify detailed conditions for excluded objects (here, the color, the shape, and the position in three-dimensional space of the object). In FIG. 5(b), the button 511 for specifying the position in three-dimensional space has been selected, the pop-up window 512 is displayed, and the value "3000 (mm)", which defines the height above the ground, has been entered in the input field 513. In this case, objects at a height of 3 m or more above the ground are set as excluded objects, so the wire cam 120 moving above the field is determined to be an excluded object in the analysis processing. As a condition on the position in the height direction, it is also possible to set objects located higher than the objects to be displayed as excluded objects.
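The height condition in this example can be expressed as a simple predicate. The sketch assumes that a model's height above the ground is judged by its lowest point, which the text does not specify; the function name `is_above_height` and the sample coordinates are likewise hypothetical.

```python
from typing import Iterable, Tuple

PointMm = Tuple[float, float, float]  # (x, y, z) in millimetres, z measured from the ground


def is_above_height(points_mm: Iterable[PointMm], min_height_mm: float = 3000.0) -> bool:
    """Return True when the whole model sits at or above the given height.

    Assumption: the model's height is taken as its lowest z coordinate, so an
    object standing on the ground never matches, however tall it is.
    """
    lowest_z = min(z for _, _, z in points_mm)
    return lowest_z >= min_height_mm


if __name__ == "__main__":
    wire_cam = [(50000.0, 30000.0, 8000.0), (50200.0, 30000.0, 8500.0)]
    player = [(20000.0, 15000.0, 0.0), (20000.0, 15000.0, 1800.0)]
    print(is_above_height(wire_cam), is_above_height(player))
```

Under this assumption a suspended wire cam satisfies the 3000 mm condition while a player standing on the field does not.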

Here, the only position condition is that the height (z coordinate) in the three-dimensional space is at or above a certain value. However, a threshold may instead be set for each of the x, y, and z axes, with conditions such as the coordinate value on each axis being at or above the threshold, at or below the threshold, or within a certain distance of the threshold. Alternatively, a virtual viewpoint image corresponding to an arbitrary frame may be displayed in the preview area 515, the user may select an object that hinders viewing from that image with a mouse or the like, and the features of that object (color, shape, position, size, and so on) may be set automatically as exclusion conditions. Furthermore, when the user selects an object, its features may be displayed as a list, and the user may choose from the list which features to use as exclusion conditions. Having specified the various conditions that define an excluded object in this way, the user finally specifies the combination condition 516, which is the criterion for deciding whether an object is subject to non-display control. General logical operations such as an AND condition (logical product) or an OR condition (logical sum) can be used as the combination condition. For example, to use the AND condition, "AND" is entered in the input field 517. When the AND condition is specified, a three-dimensional model for which the analysis results of all of the analysis units 602 to 604 are "condition satisfied" is determined to be a three-dimensional model to be hidden in the virtual viewpoint image. Similarly, when the OR condition is used, a three-dimensional model for which the analysis result of any one of the analysis units 602 to 604 is "condition satisfied" is determined to be a three-dimensional model to be hidden in the virtual viewpoint image. Based on such user instructions given via the control device 118 or the like, the non-display mode is enabled or disabled and the conditions for identifying excluded objects are set.
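The AND/OR combination described above can be illustrated with a minimal sketch. The function name `should_hide` and the list-of-booleans representation of the three analysis results are assumptions made for illustration; the source only states that the results are combined by a logical product or logical sum.

```python
from typing import Iterable


def should_hide(results: Iterable[bool], combination: str) -> bool:
    """Combine the per-analysis results (position, color, shape) into one decision.

    `results` holds True where an analysis reported "condition satisfied".
    `combination` is the text entered as the combination condition.
    """
    results = list(results)
    if combination == "AND":
        return all(results)  # every analysis must be satisfied
    if combination == "OR":
        return any(results)  # one satisfied analysis is enough
    raise ValueError(f"unsupported combination condition: {combination!r}")


# Position and color satisfied, shape not satisfied.
example = [True, True, False]
hide_with_and = should_hide(example, "AND")
hide_with_or = should_hide(example, "OR")
```

With the example results, the AND condition leaves the model displayed while the OR condition hides it.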

The virtual viewpoint image generation unit 405 generates a virtual viewpoint image using the captured image data and virtual viewpoint information input from the data input unit 401 and the three-dimensional models input from the three-dimensional model analysis unit 403. Specifically, it calculates how each three-dimensional model appears in the three-dimensional space at the angle of view specified by the virtual viewpoint information, and colors the model using the captured images of the corresponding real cameras, thereby generating an image representing the view from the virtual viewpoint. If the non-display mode is enabled at that time, display of the three-dimensional models of excluded objects is suppressed. Specifically, instead of the normal coloring, such a model is made semi-transparent or fully transparent using a technique that allows the transparency to be controlled, such as alpha blending. If the non-display mode is disabled, normal coloring is applied to all three-dimensional models regardless of the contents of the non-display flag, and an image representing the view from the virtual viewpoint is generated. The virtual viewpoint image data obtained in this way is sent to the display device 119 via the control device 118 and presented to the user.
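As a rough illustration of the suppression step, the sketch below blends a model color over a background color for a single pixel. The names `blend` and `render_pixel`, the per-pixel formulation, and the `hidden_alpha` parameter are assumptions for illustration only; the source says only that a transparency-controllable technique such as alpha blending is used.

```python
def blend(model_rgb, background_rgb, alpha):
    """Standard alpha blend: alpha=1.0 shows the model, alpha=0.0 shows the background."""
    return tuple(alpha * m + (1.0 - alpha) * b
                 for m, b in zip(model_rgb, background_rgb))


def render_pixel(model_rgb, background_rgb, hide_flag, hide_mode_enabled,
                 hidden_alpha=0.0):
    """Color one pixel covered by a model, suppressing it when it is flagged.

    hidden_alpha is a hypothetical parameter: 0.0 gives full transparency,
    values between 0 and 1 give semi-transparency.
    """
    if hide_mode_enabled and hide_flag:
        alpha = hidden_alpha
    else:
        alpha = 1.0  # normal coloring
    return blend(model_rgb, background_rgb, alpha)


wire_cam = (10, 10, 10)   # dark object in front of the pitch
pitch = (0, 200, 0)
hidden = render_pixel(wire_cam, pitch, hide_flag=True, hide_mode_enabled=True)
shown = render_pixel(wire_cam, pitch, hide_flag=True, hide_mode_enabled=False)
```

When the mode is disabled, the flag is ignored and the model color is used unchanged, which matches the behavior described for the disabled case.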

The five functional units described above may also be distributed across a plurality of information processing devices. For example, the output of the three-dimensional model analysis unit 403 may be input to a separate information processing device that provides the function of the virtual viewpoint image generation unit 405.

(Details of the three-dimensional model analysis unit)
Next, the non-display determination and the addition of feature information performed by the three-dimensional model analysis unit 403 will be described in detail. FIG. 6 is a functional block diagram showing the internal configuration of the three-dimensional model analysis unit 403. As shown in the figure, the three-dimensional model analysis unit 403 includes an exclusion condition holding unit 601, a position analysis unit 602, a color analysis unit 603, a shape analysis unit 604, a non-display determination unit 605, and a feature information addition unit 606.

The exclusion condition holding unit 601 holds the exclusion conditions set by the non-display setting unit 404 and outputs to each of the three analysis units 602 to 604 the conditions relevant to its analysis item. That is, among the exclusion conditions, the conditions on position are output to the position analysis unit 602, the conditions on color to the color analysis unit 603, and the conditions on shape to the shape analysis unit 604. The exclusion condition holding unit 601 also holds the combination condition for the analysis results of the analysis units 602 to 604 and outputs it to the non-display determination unit 605.

The position analysis unit 602 performs position analysis processing, which determines whether the position in the virtual viewpoint video space of a three-dimensional model provided by the three-dimensional model generation unit 402 matches the position condition among the exclusion conditions (the height above the ground in the example above). Specifically, when the height represented by the z coordinate of the lower end of the bounding box of the target three-dimensional model exceeds the height specified in the condition, the object of that three-dimensional model satisfies the position condition. Here, the bounding box means the circumscribed cuboid of the group of voxels representing the three-dimensional shape of the object. The result of the position analysis processing is attached to the target three-dimensional model and sent to the non-display determination unit 605.
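The position check can be sketched as follows, assuming a model is given as a list of voxel center coordinates in meters with z pointing up. The function names and that data layout are illustrative assumptions. The strict "exceeds" comparison follows this paragraph; a variant using "or more" would use `>=` instead.

```python
def bounding_box(voxels):
    """Return the (min corner, max corner) of the cuboid enclosing all voxels."""
    xs, ys, zs = zip(*voxels)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def satisfies_height_condition(voxels, min_height_m):
    """True when the lower end of the bounding box is above the given height."""
    (_, _, z_low), _ = bounding_box(voxels)
    return z_low > min_height_m


# Hypothetical voxel sets: a wire cam hanging over the field and a player on it.
wire_cam_voxels = [(10.0, 5.0, 8.0), (10.2, 5.0, 8.5), (10.1, 5.1, 8.2)]
player_voxels = [(3.0, 4.0, 0.0), (3.0, 4.0, 0.9), (3.1, 4.0, 1.8)]

wire_cam_is_high = satisfies_height_condition(wire_cam_voxels, 3.0)
player_is_high = satisfies_height_condition(player_voxels, 3.0)
```

Testing the lower end of the bounding box, rather than its top, means an object that merely reaches above the threshold (for example a jumping player) does not satisfy the condition.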

The color analysis unit 603 performs color analysis processing, which determines whether the color corresponding to each voxel of a three-dimensional model provided by the three-dimensional model generation unit 402 matches the color condition among the exclusion conditions. Assume that each pixel of a captured image has a color expressed by luminance values with a maximum of "4095" for each of R, G, and B. In this case, the color condition may be, for example, that the R, G, and B luminance values are all "100" or less. Under this condition, an object whose three-dimensional model has a color close to black satisfies the excluded-object condition. More specifically, for each voxel of the target three-dimensional model, the captured image to be referenced in the color analysis processing is determined based on the distance to each real camera, the angle (the line-of-sight direction of the real camera), and so on, and the corresponding color is assigned to the voxel. The assigned color is then compared with the above condition to determine whether each of the R, G, and B luminance values is at or below the threshold. The determination may check whether the color of every voxel individually meets the condition, or whether the average color of all voxels meets the condition. The result of the color analysis processing is attached to the target three-dimensional model and sent to the non-display determination unit 605.
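The two evaluation variants mentioned above (every voxel individually, or the average over all voxels) can be compared in a short sketch. The function name `color_condition`, the `mode` argument, and the representation of colors as `(R, G, B)` tuples are assumptions for illustration.

```python
def color_condition(voxel_colors, threshold=100, mode="all"):
    """Check the "R, G and B all at or below threshold" condition.

    mode="all":  every voxel color must meet the condition.
    mode="mean": the per-channel average over all voxels must meet it.
    """
    if mode == "all":
        return all(channel <= threshold
                   for rgb in voxel_colors for channel in rgb)
    if mode == "mean":
        count = len(voxel_colors)
        means = [sum(rgb[i] for rgb in voxel_colors) / count for i in range(3)]
        return all(mean <= threshold for mean in means)
    raise ValueError(f"unknown mode: {mode!r}")


# Mostly dark model with one voxel whose blue channel is above the threshold.
colors = [(50, 60, 70), (90, 80, 120)]
strict_result = color_condition(colors, mode="all")
average_result = color_condition(colors, mode="mean")
```

The per-voxel variant is stricter: a single bright voxel (a highlight or a mis-sampled pixel) is enough to fail it, whereas the averaged variant tolerates such outliers.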

The shape analysis unit 604 performs shape analysis processing, which determines whether the shape of a three-dimensional model provided by the three-dimensional model generation unit 402 matches the shape condition among the exclusion conditions. FIGS. 7(a) to 7(l) show examples of basic shape patterns for three-dimensional models. Based on such basic shape patterns, the shape condition may be, for example, that the shape of the target three-dimensional model corresponds to none of the shape patterns (j) humanoid, (k) ellipsoid, and (l) sphere. Under this condition, an object whose three-dimensional model has a shape other than a humanoid, ellipsoid, or sphere, such as a cylinder, corresponds to an excluded object. The result of the shape analysis processing is attached to the target three-dimensional model and sent to the non-display determination unit 605.

The non-display determination unit 605 makes a non-display determination for the three-dimensional model of each object present in the virtual viewpoint video space, based on the results of the analysis processing in the position analysis unit 602, the color analysis unit 603, and the shape analysis unit 604, and on the combination condition provided by the exclusion condition holding unit 601. Depending on how the thresholds of the various conditions used in the analysis units 602 to 604 are set, the determination may also be made by a rule other than the combination conditions described above, for example by hiding a target three-dimensional model whose result is "condition not satisfied". Each three-dimensional model for which this determination has been completed is output, together with its determination result, to the feature information addition unit 606.

The feature information addition unit 606 adds feature information to each three-dimensional model received from the non-display determination unit 605. This feature information includes the non-display flag described above, which is determined from the result of the respective non-display determination. FIG. 8 is a table showing an example of the feature information of three-dimensional models. As described above, a three-dimensional model consists of a plurality of voxels, and the bounding box (circumscribed cuboid) of the whole group of voxels representing the three-dimensional shape of one object is defined first. Then, for each three-dimensional model, information is stored for items such as its size, its shape, and the color used for coloring. Here, the size information consists of the volume of the bounding box and the number of voxels actually present. The shape information is, for example, the basic shape pattern in FIG. 7 that most closely approximates the model. The color information consists of a representative color, namely the most frequent color among those corresponding to the voxels, and the average of each of the R, G, and B values.
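One possible way to hold the items listed above is a small record type, sketched below. The `FeatureInfo` class, its field names, and the assumption that voxels are integer grid indices (so each axis spans `max - min + 1` cells) are illustrative and are not taken from FIG. 8.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FeatureInfo:
    bbox_volume: float           # volume of the circumscribed cuboid
    voxel_count: int             # number of voxels actually present
    shape: str                   # closest basic shape pattern
    representative_color: tuple  # most frequent voxel color
    mean_color: tuple            # per-channel average of R, G, B
    hide_flag: int               # 1 = hide, 0 = display


def build_feature_info(voxels, colors, shape, hide, voxel_size=1.0):
    """voxels: integer (x, y, z) grid indices; colors: one (R, G, B) per voxel."""
    xs, ys, zs = zip(*voxels)
    cells = ((max(xs) - min(xs) + 1)
             * (max(ys) - min(ys) + 1)
             * (max(zs) - min(zs) + 1))
    count = len(colors)
    mean = tuple(sum(c[i] for c in colors) / count for i in range(3))
    representative = Counter(colors).most_common(1)[0][0]
    return FeatureInfo(cells * voxel_size ** 3, len(voxels), shape,
                       representative, mean, 1 if hide else 0)


info = build_feature_info(
    voxels=[(0, 0, 0), (1, 0, 0), (1, 1, 0)],
    colors=[(10, 10, 10), (10, 10, 10), (40, 40, 40)],
    shape="cylinder",
    hide=True,
)
```

Storing both the bounding-box volume and the voxel count gives a rough fill ratio, which could help distinguish a compact object from a thin one with the same extent.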

In the example above, the position, color, and shape of the three-dimensional model are used as the analysis elements, but the analysis elements are not limited to these. For example, the size of the three-dimensional model may be used as an additional analysis element. When determining whether a condition is satisfied, the comparison may use "greater than or equal to the threshold", "less than or equal to the threshold", "greater than the threshold", "less than the threshold", "difference from a reference value within a given amount", "difference from a reference value of at least a given amount", and so on.
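The comparison types listed above can be collected into one helper, shown here as a sketch. The function name `check` and the string keys (`">="`, `"within"`, and so on) are hypothetical labels chosen for illustration.

```python
import operator

# Plain threshold comparisons.
_THRESHOLD_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def check(value, kind, reference, tolerance=0.0):
    """Evaluate one condition against a threshold or a reference value.

    For "within" and "at_least_apart", `reference` is the reference value
    and `tolerance` is the allowed or required difference.
    """
    if kind in _THRESHOLD_OPS:
        return _THRESHOLD_OPS[kind](value, reference)
    difference = abs(value - reference)
    if kind == "within":
        return difference <= tolerance
    if kind == "at_least_apart":
        return difference >= tolerance
    raise ValueError(f"unknown comparison: {kind!r}")


at_threshold_inclusive = check(3.0, ">=", 3.0)
at_threshold_strict = check(3.0, ">", 3.0)
near_reference = check(5.2, "within", 5.0, tolerance=0.5)
```

The inclusive and strict forms differ only at the boundary, which matters for values that sit exactly on the threshold.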

(Flow of virtual viewpoint image generation processing)
FIG. 9 is a flowchart showing the flow of virtual viewpoint image generation processing in the server 116 according to the present embodiment. The series of processes shown in this flowchart is realized by the CPU 201 of the server 116 reading a predetermined program from the ROM 202 or the auxiliary storage device 204 and executing it.

The flow of the virtual viewpoint image generation processing according to the present embodiment is described below with reference to the flowchart of FIG. 9. The flow of FIG. 9 starts in response to a start signal for virtual viewpoint image generation from the control device 118. The start signal carries the virtual viewpoint information described above and mode selection information indicating whether the non-display mode is enabled or disabled. It is also assumed that, when execution starts, the captured image data on which the virtual viewpoint image is based has already been input to the data input unit 401. In the following description, the symbol "S" denotes a step.

In S901, the non-display setting unit 404 enables or disables the non-display mode based on the mode selection information input from the control device 118. When the non-display mode is enabled, the exclusion conditions for identifying the objects subject to non-display control (excluded objects) are set as well.

In S902, the three-dimensional model generation unit 402 acquires the image of the frame to be processed based on the time information included in the virtual viewpoint information. Here, the time information consists of a start time and an end time that specify a predetermined period in the captured image data. In the subsequent S903, the three-dimensional model generation unit 402 generates a three-dimensional model of each object present in the image of the frame to be processed. The generated per-object three-dimensional models are sent to the three-dimensional model analysis unit 403.

In S904, the three-dimensional model analysis unit 403 performs the analysis processing described above on the per-object three-dimensional models generated in S903. That is, the features of each three-dimensional model are identified in terms of position, color, and shape, and whether the identified features match those of an excluded object is determined by referring to the exclusion conditions set in S901. Feature information including a non-display flag that reflects the determination result is then added to each three-dimensional model. The three-dimensional models, with feature information added according to the analysis results, are sent to the virtual viewpoint image generation unit 405.

In S906, processing branches according to the non-display mode setting made in S901. That is, if the non-display mode is disabled (off), the process proceeds to S907, and if it is enabled (on), the process proceeds to S908.

In S907, which is executed when the non-display mode is off, the virtual viewpoint image generation unit 405 generates a virtual viewpoint image in which all objects present in the frame to be processed are displayed. Specifically, regardless of the value of the non-display flag, normal coloring is applied to all input three-dimensional models to generate the virtual viewpoint image corresponding to the frame to be processed.

In S908, which is executed when the non-display mode is enabled, the virtual viewpoint image generation unit 405 generates a virtual viewpoint image in which excluded objects in the frame to be processed are hidden. Specifically, normal coloring is applied to three-dimensional models whose non-display flag is "0", and transparency processing is applied to three-dimensional models whose non-display flag is "1", to generate the virtual viewpoint image corresponding to the frame to be processed.

In S909, it is determined whether to end generation of the virtual viewpoint image. If processing has been completed for all frames specified by the time information included in the virtual viewpoint information, this processing ends. If unprocessed frames remain, the process returns to S902, the image of the next frame to be processed is acquired, and processing continues.

The above is the flow of the virtual viewpoint image generation processing according to the present embodiment. FIGS. 10(a) and 10(b) illustrate the effect of the present embodiment. FIG. 10(a) is an example of a virtual viewpoint image generated by a conventional method, and FIG. 10(b) is an example of a virtual viewpoint image generated by the method of the present embodiment. In the conventional virtual viewpoint image of FIG. 10(a), the wire cam 120 is treated as a foreground object and is therefore displayed as is. In contrast, in the virtual viewpoint image of FIG. 10(b) generated by the method of the present embodiment, the three-dimensional model of the wire cam 120 undergoes transparency processing, so the viewer cannot see it in the finished virtual viewpoint image.

In the example above, the virtual viewpoint image is generated by treating every frame within the predetermined period specified by the time information as a frame to be processed, in order, but the processing is not limited to this. For example, the virtual viewpoint image may be generated using a general keyframe method. In the keyframe method, a plurality of reference frames (called "keyframes"), each associated with an arbitrary virtual viewpoint, are set, path information for the virtual viewpoint is obtained by interpolating between the keyframes, and the virtual viewpoint image corresponding to the predetermined period is generated. The method of the present embodiment is equally applicable when such a keyframe method is used.
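As a simple illustration of interpolating between keyframes, the sketch below computes a virtual camera position for an intermediate frame. Linear interpolation of position only is an assumption made here; the source does not specify the interpolation method or which viewpoint parameters are interpolated, and the name `interpolate_path` is hypothetical.

```python
def interpolate_path(keyframes, frame):
    """Linearly interpolate the virtual camera position at `frame`.

    keyframes: list of (frame_index, (x, y, z)) sorted by frame_index,
    with strictly increasing frame indices.
    """
    for (f0, p0), (f1, p1) in zip(keyframes, keyframes[1:]):
        if f0 <= frame <= f1:
            t = (frame - f0) / (f1 - f0)  # 0.0 at f0, 1.0 at f1
            return tuple(a + t * (b - a) for a, b in zip(p0, p1))
    raise ValueError("frame lies outside the keyframe range")


keys = [(0, (0.0, 0.0, 10.0)), (10, (10.0, 0.0, 20.0))]
midpoint = interpolate_path(keys, 5)
```

A production system would typically also interpolate orientation and angle of view, and might use a smoother curve than a straight line between keyframes.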

Also, in the example above, the non-display mode is enabled or disabled before the per-frame loop for simplicity of explanation, but the processing is not limited to this. For example, the non-display mode setting may be changeable while the virtual viewpoint image is being generated.

As described above, according to the present embodiment, based on the features of the three-dimensional models of the objects present in the captured images, the three-dimensional model of an object that satisfies certain conditions undergoes transparency processing when it is colored. As a result, objects that hinder viewing are no longer visible in the virtual viewpoint image, and the user can watch without being distracted by them.

Although a wire cam has been used as the example, objects that hinder viewing are not limited to this. For example, such an object may be a device in which a camera is mounted on a small aircraft or flying body such as a drone. It may also be a fixed camera.

[Embodiment 2]
In Embodiment 1, when an object hinders viewing, its display in the virtual viewpoint image is suppressed by applying transparency processing to its three-dimensional model at coloring time. Embodiment 2 describes a mode in which, when an object corresponds to an excluded object that hinders viewing, its three-dimensional model is deleted in advance and rendering is performed using only the remaining three-dimensional models, thereby suppressing display of the excluded object in the virtual viewpoint image. Descriptions of matters common to Embodiment 1, such as the basic configuration of the image processing system, are omitted or simplified, and the following description focuses on the differences.

(Server software configuration)
FIG. 11 is a block diagram showing the software configuration of the server 116 according to the present embodiment. In the present embodiment, the three-dimensional models determined to be hidden are deleted, so that a virtual viewpoint image is generated in which objects that hinder viewing do not exist in the first place. As shown in FIG. 11, the present embodiment, like Embodiment 1, has five functional units (a data input unit 401', a three-dimensional model generation unit 402', a three-dimensional model analysis unit 403', a non-display setting unit 404', and a virtual viewpoint image generation unit 405'). Each unit is described in turn below, focusing on the parts that differ from Embodiment 1.

The data input unit 401' acquires captured image data from the DB 117 and virtual viewpoint information from the control device 118, and sends them to the three-dimensional model generation unit 402'.
The three-dimensional model generation unit 402' generates three-dimensional models of moving objects based on the captured image data received from the data input unit 401'. The three-dimensional model generation unit 402' further assigns color information to each voxel of the generated three-dimensional models. In this color assignment, the captured image that supplies the color information is determined based on the positional relationship, such as distance and angle, between the voxel of interest and each real camera (with weighting applied as necessary when there are several such images), and the RGB value of the corresponding color in the determined captured image is assigned. A three-dimensional model to which color information has been assigned per voxel in this way (hereinafter a "colored three-dimensional model") is sent to the three-dimensional model analysis unit 403'.
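The per-voxel color assignment with weighting might look like the sketch below. The weighting formula (cosine of the viewing angle divided by distance) is purely an assumption chosen for illustration, as are the function name and the `(rgb, distance, angle)` sample layout; the source says only that distance and angle are taken into account and that weighting is applied as necessary.

```python
import math


def blend_voxel_color(samples):
    """Blend the colors that several real cameras observe for one voxel.

    samples: list of (rgb, distance, angle_rad), where distance (> 0) is the
    camera-to-voxel distance and angle_rad is the angle between the camera's
    line of sight and the direction to the voxel.
    """
    weighted = []
    for rgb, distance, angle in samples:
        # Hypothetical weight: closer, more frontal cameras count more;
        # cameras facing away (negative cosine) contribute nothing.
        weight = max(math.cos(angle), 0.0) / distance
        weighted.append((rgb, weight))
    total = sum(weight for _, weight in weighted)
    if total == 0.0:
        raise ValueError("no camera provides a usable view of this voxel")
    return tuple(sum(rgb[i] * weight for rgb, weight in weighted) / total
                 for i in range(3))


# Two cameras at equal distance looking straight at the voxel.
mixed = blend_voxel_color([((100, 0, 0), 10.0, 0.0),
                           ((0, 100, 0), 10.0, 0.0)])
```

Selecting the single best-scoring camera instead of blending is an equally plausible reading of the text and would avoid mixing colors across views.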

The three-dimensional model analysis unit 403' analyzes each colored three-dimensional model generated by the three-dimensional model generation unit 402' to determine whether it is the three-dimensional model of an excluded object. The analysis processing needs to be performed only when the non-display mode is enabled. The content of the analysis processing is basically as described in Embodiment 1. However, the color analysis processing in the present embodiment differs from Embodiment 1 in that the three-dimensional model already has color information, so the color analysis can simply use that color information. Also, because the three-dimensional models of excluded objects are deleted in the present embodiment, the non-display flag is unnecessary and is not included in the feature information attached to each three-dimensional model. The three-dimensional model analysis unit 403' deletes any colored three-dimensional model that the analysis processing has determined to match the exclusion conditions (that is, the combination condition), without sending it to the virtual viewpoint image generation unit 405'. In other words, only the colored three-dimensional models of non-excluded objects, which remain without being deleted, are sent to the virtual viewpoint image generation unit 405'.
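The deletion step amounts to filtering the model list before rendering, as in this sketch. The function name `filter_models` and the dictionary-based model representation are assumptions for illustration.

```python
def filter_models(models, is_excluded, hide_mode_enabled):
    """Return the models to pass on to rendering.

    When the non-display mode is disabled, no analysis is run and every
    model is forwarded; otherwise models matching the exclusion
    conditions are dropped.
    """
    if not hide_mode_enabled:
        return list(models)
    return [model for model in models if not is_excluded(model)]


# Hypothetical models described by name and bounding-box lower height (m).
scene = [{"name": "player", "z_low": 0.0},
         {"name": "ball", "z_low": 1.2},
         {"name": "wire_cam", "z_low": 8.0}]


def too_high(model):
    return model["z_low"] > 3.0


kept = [m["name"] for m in filter_models(scene, too_high, True)]
kept_when_disabled = [m["name"] for m in filter_models(scene, too_high, False)]
```

Because the renderer never receives the excluded models, it needs no knowledge of flags or exclusion conditions, which is what makes this variant easy to pair with a generic rendering device.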

The virtual viewpoint image generation unit 405' generates a virtual viewpoint image using the virtual viewpoint information input from the data input unit 401' and the colored three-dimensional models input from the three-dimensional model analysis unit 403'. That is, the virtual viewpoint image generation unit 405' colors all input colored three-dimensional models using the color information assigned to each model, and generates an image representing the view from the virtual viewpoint. In the present embodiment, the three-dimensional models of excluded objects that hinder viewing are never input to the virtual viewpoint image generation unit 405', so no control based on a non-display flag is performed.

As in Embodiment 1, the five functional units described above may be distributed across a plurality of information processing devices. For example, the output of the three-dimensional model analysis unit 403' may be input to a separate information processing device that provides the function of the virtual viewpoint image generation unit 405'. In the present embodiment in particular, the three-dimensional models of excluded objects are deleted and display control based on a non-display flag is unnecessary, so the configuration combines easily with a general-purpose information processing device that receives three-dimensional models and generates virtual viewpoint images (a virtual viewpoint image generation device).

(Flow of virtual viewpoint image generation processing)
FIG. 12 is a flowchart showing the flow of virtual viewpoint image generation processing in the server 116 according to the present embodiment. The series of processes shown in this flowchart is realized by the CPU 201 of the server 116 reading a predetermined program from the ROM 202 or the auxiliary storage device 204 and executing it.

The flow of the virtual viewpoint image generation processing according to the present embodiment is described below with reference to the flowchart of FIG. 12. Like the flow of FIG. 9 in Embodiment 1, the flow of FIG. 12 starts in response to a start signal for virtual viewpoint image generation from the control device 118. The start signal carries the virtual viewpoint information described above and mode selection information indicating whether the non-display mode is enabled or disabled. It is also assumed that, when execution starts, the captured image data on which the virtual viewpoint image is based has already been input to the data input unit 401'. The following description focuses on the differences from the flow of FIG. 9 in Embodiment 1.

In S1201, the non-display setting unit 404' enables or disables the non-display mode based on the mode selection information input from the control device 118. As in S901 of Embodiment 1, when the non-display mode is enabled, the exclusion conditions for identifying the objects subject to non-display control (excluded objects) are set as well. In the present embodiment, the information on whether the non-display mode is enabled or disabled is sent to the three-dimensional model analysis unit 403' rather than to the virtual viewpoint image generation unit 405'.

In S1202, the three-dimensional model generation unit 402' acquires the images of the frame to be processed based on the time information included in the virtual viewpoint information. In the subsequent S1203, the three-dimensional model generation unit 402' generates, for each object present in the images of the frame to be processed, a three-dimensional model to which color information has been added. The generated colored three-dimensional models are sent to the three-dimensional model analysis unit 403'.

In S1204, the processing branches according to the enabled/disabled information of the non-display mode input from the non-display setting unit 404'. If the non-display mode is disabled, the process proceeds to S1208. In this case, feature information that does not include a non-display flag is attached to every colored three-dimensional model input from the three-dimensional model generation unit 402', and the models are sent to the virtual viewpoint image generation unit 405' via the three-dimensional model analysis unit 403'.

On the other hand, if the non-display mode is enabled, in S1205 the three-dimensional model analysis unit 403' performs the analysis processing described above on each colored three-dimensional model input from the three-dimensional model generation unit 402'. That is, the features of each colored three-dimensional model are identified in terms of position, color, and shape, and whether the identified features match those of an excluded object is determined by referring to the exclusion condition set in S1201.
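
The S1205 determination can be sketched as follows. This is only an illustration under stated assumptions, not the implementation described in this publication: the names `ColoredModel`, `ExclusionCondition`, and `is_excluded`, the point-set representation, and the three concrete tests (minimum height for position, mean RGB within a tolerance for color, largest bounding-box side for shape) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]
Color = Tuple[int, int, int]


@dataclass
class ColoredModel:
    """Hypothetical colored 3D model: a point set with one RGB color per point."""
    points: List[Point]
    colors: List[Color]


@dataclass
class ExclusionCondition:
    """Hypothetical exclusion condition; every field is an illustrative assumption."""
    min_height: float       # position: lowest point must be at or above this z value
    target_color: Color     # color: expected mean RGB of an excluded object
    color_tolerance: float  # color: allowed per-channel deviation from target_color
    max_extent: float       # shape: largest bounding-box side must not exceed this


def bounding_box(points: List[Point]) -> Tuple[Point, Point]:
    """Return the (min corner, max corner) of the axis-aligned bounding box."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def mean_color(colors: List[Color]) -> Tuple[float, float, float]:
    """Return the per-channel mean of the RGB colors."""
    n = len(colors)
    return tuple(sum(c[i] for c in colors) / n for i in range(3))


def is_excluded(model: ColoredModel, cond: ExclusionCondition) -> bool:
    """True only if the position, shape, and color tests all match the condition."""
    lo, hi = bounding_box(model.points)
    position_ok = lo[2] >= cond.min_height
    shape_ok = max(h - l for l, h in zip(lo, hi)) <= cond.max_extent
    color_ok = all(
        abs(m - t) <= cond.color_tolerance
        for m, t in zip(mean_color(model.colors), cond.target_color)
    )
    return position_ok and shape_ok and color_ok
```

With such a condition, a small dark object floating well above the field would match, while a player standing on the field would fail the position test and be kept.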

In S1206, the processing branches according to the result of the analysis processing. If any object has been determined to be an excluded object, the process proceeds to S1207; otherwise, it proceeds to S1208.

In S1207, the three-dimensional model analysis unit 403' deletes the three-dimensional model of each object determined to be an excluded object. Feature information that does not include a non-display flag is then attached to the colored three-dimensional models determined not to be excluded objects (that is, those that were not deleted), and these models are sent to the virtual viewpoint image generation unit 405'.

In S1208, the virtual viewpoint image generation unit 405' generates a virtual viewpoint image corresponding to the frame to be processed, using all of the colored three-dimensional models input to it. Specifically, each input colored three-dimensional model is colored in the normal manner using the color information attached to it, and a virtual viewpoint image representing the view from the virtual viewpoint is generated.

In S1209, it is determined whether to end the generation of the virtual viewpoint image. If processing has been completed for all frames specified by the time information included in the virtual viewpoint information, this process ends. If an unprocessed frame remains, the process returns to S1202, the captured images of the next frame to be processed are acquired, and processing continues.
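
The control flow of S1202 to S1209 can be summarized in a short sketch. The function name `generate_virtual_viewpoint_images` and the three callables `build_models`, `is_excluded`, and `render` are hypothetical stand-ins for the units 402', 403', and 405'; they are assumptions made for illustration, and the sketch omits the attachment of feature information.

```python
from typing import Any, Callable, Iterable, List


def generate_virtual_viewpoint_images(
    frames: Iterable[Any],
    hidden_mode: bool,
    build_models: Callable[[Any], List[Any]],
    is_excluded: Callable[[Any], bool],
    render: Callable[[List[Any]], Any],
) -> List[Any]:
    """Per-frame loop following the step order of FIG. 12 (illustrative only)."""
    images = []
    for frame in frames:                      # S1202: take the next frame to process
        models = build_models(frame)          # S1203: colored 3D model per object
        if hidden_mode:                       # S1204: branch on the non-display mode
            flags = [is_excluded(m) for m in models]           # S1205: analysis
            if any(flags):                    # S1206: is there any excluded object?
                # S1207: delete the excluded models before rendering
                models = [m for m, f in zip(models, flags) if not f]
        images.append(render(models))         # S1208: render the remaining models
    return images                             # S1209: all frames have been processed
```

Because deletion happens before `render` is called, the rendering callable never sees an excluded model and needs no flag handling of its own.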

The above is the flow of the virtual viewpoint image generation process according to the present embodiment. In the embodiment described above, color information is added to each three-dimensional model before the three-dimensional models of excluded objects are deleted, but adding color information in advance is not essential. For example, instead of adding color information, a deletion flag may be attached, and the virtual viewpoint image may be generated using only the three-dimensional models whose deletion flag is off. In that case, the occlusion determination performed during coloring should refer to the bounding box information of the three-dimensional models whose deletion flag is on. This prevents the color of an excluded object that is supposed to be deleted from being mistakenly used for coloring (texture mapping), which would result in an unnatural virtual viewpoint image.
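
One way such a bounding-box-based occlusion determination might work is sketched below. The source only states that the bounding box of a flagged model is referred to; the specific approach here (skip any camera whose line of sight to the point being colored passes through the bounding box of a model whose deletion flag is on) and the names `segment_hits_box` and `usable_cameras` are assumptions for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Vec3 = Sequence[float]
Box = Tuple[Vec3, Vec3]  # (min corner, max corner) of an axis-aligned bounding box


def segment_hits_box(origin: Vec3, target: Vec3, box_min: Vec3, box_max: Vec3) -> bool:
    """Slab test: does the segment from origin to target pass through the box?"""
    t_enter, t_exit = 0.0, 1.0
    for o, t, lo, hi in zip(origin, target, box_min, box_max):
        d = t - o
        if d == 0:
            # Segment is parallel to this slab; it misses if it lies outside it.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_enter = max(t_enter, t0)
        t_exit = min(t_exit, t1)
        if t_enter > t_exit:
            return False
    return True


def usable_cameras(point: Vec3, cameras: Dict[str, Vec3], deleted_boxes: List[Box]) -> List[str]:
    """Return the cameras whose view of `point` is not blocked by a deleted model's box."""
    return [
        name
        for name, position in cameras.items()
        if not any(segment_hits_box(position, point, lo, hi) for lo, hi in deleted_boxes)
    ]
```

Using the bounding box rather than the full shape is conservative: a camera may be skipped even when the excluded object does not actually cover the point, but the color of the excluded object is never sampled.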

As described above, according to the present embodiment, the virtual viewpoint image generation unit 405' can generate a virtual viewpoint image without being aware of the deleted three-dimensional models. The method of the second embodiment can thus also appropriately remove objects that interfere with viewing from the virtual viewpoint image. Furthermore, because unnecessary objects are deleted in advance in the second embodiment, when the information processing device that receives the three-dimensional models and generates the virtual viewpoint image is a separate device, that device does not need a control function based on the non-display flag. In other words, a general-purpose information processing device for generating virtual viewpoint images suffices, so the system configuration can be made simpler.

[Other embodiments]
The technique of the present disclosure can also be realized by a process in which a program that implements one or more functions of the above-described embodiments is supplied to a system or device via a network or a storage medium, and one or more processors in a computer of that system or device read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

116 Server
401 Data input unit
402 Three-dimensional model generation unit
403 Three-dimensional model analysis unit
404 Non-display setting unit
405 Virtual viewpoint image generation unit

Claims

A first generation means for generating shape data representing a three-dimensional shape of an object using a plurality of captured images from different viewpoints.
A determination means for determining whether or not the object related to the shape data generated by the first generation means corresponds to an unnecessary object in the virtual viewpoint image, based on at least a condition related to the shape of the object.
A second generation means for generating a virtual viewpoint image in which the display of an object determined to be an unnecessary object by the determination means is suppressed more than the display of an object not determined to be an unnecessary object by the determination means.
An information processing device characterized by having.

The determination means adds information indicating that the display in the virtual viewpoint image is unnecessary to the shape data of the object corresponding to the unnecessary object.
The second generation means performs transparency processing on the shape data to which the information indicating that the display in the virtual viewpoint image is unnecessary is added, and generates the virtual viewpoint image.
The information processing apparatus according to claim 1.

The information processing apparatus according to claim 2, wherein the transparency process includes a process for making translucent.

The determination means deletes the shape data of the object corresponding to the unnecessary object.
The information processing apparatus according to claim 1, wherein the second generation means generates the virtual viewpoint image based on the shape data that has not been deleted.

The first generation means generates shape data to which the color information of the object is added.
The second generation means performs a coloring process using the color information given to the shape data generated by the first generation means to generate the virtual viewpoint image.
The information processing apparatus according to claim 3.

The determination means adds information instructing deletion to the shape data of the object corresponding to the unnecessary object.
The second generation means deletes the shape data to which the information instructing the deletion is added among the shape data, and generates the virtual viewpoint image using the shape data remaining without being deleted.
The information processing apparatus according to claim 5.

The information processing apparatus according to claim 6, wherein the second generation means performs occlusion determination with reference to bounding box information included in the shape data to which the information instructing the deletion has been added, and generates the virtual viewpoint image.

The information processing apparatus according to any one of claims 1 to 7, wherein the determination means further makes the determination based on a condition relating to any one of a position, a size, and a color of the unnecessary object in three-dimensional space.

The information processing apparatus according to any one of claims 1 to 8, further comprising a setting means for setting conditions used for the determination based on a user instruction.

The information processing apparatus according to claim 9, wherein the setting means sets, as the condition, the features of an object designated by a user instruction that specifies an unnecessary object from among the objects included in the captured image.

The information processing apparatus according to any one of claims 1 to 10, wherein the unnecessary object is a moving object.

The information processing apparatus according to claim 11, wherein the moving object is a photographing device suspended from a wire stretched across the space being captured in the captured images.

A first generation step of generating shape data representing a three-dimensional shape of an object using a plurality of captured images from different viewpoints.
A determination step of determining whether or not the object related to the shape data generated in the first generation step corresponds to an unnecessary object in the virtual viewpoint image, based on at least a condition related to the shape of the object.
A second generation step of generating a virtual viewpoint image in which the display of an object determined to be an unnecessary object in the determination step is suppressed more than the display of an object not determined to be an unnecessary object in the determination step.
An information processing method characterized by having.

A program for causing a computer to function as the information processing apparatus according to any one of claims 1 to 12.