JP2021018570A - Image processing apparatus, image processing system, image processing method and program

Info

Publication number: JP2021018570A
Application number: JP2019133559A
Authority: JP
Inventors: Keisuke Morisawa (森澤 圭輔)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2019-07-19
Filing date: 2019-07-19
Publication date: 2021-02-15

Abstract

To provide an image processing apparatus capable of determining where occlusion occurs in the shape data of an object while suppressing an increase in processing load. SOLUTION: Shape data generated in a region surrounding the region in which the shape data for rendering is generated is acquired, and is used to determine whether the shape data for rendering is visible from each imaging apparatus. SELECTED DRAWING: Figure 5

Description

The technique of the present disclosure relates to the generation of virtual viewpoint images, and in particular to countermeasures against occlusion when generating shape data of an object from multi-viewpoint images.

In recent years, attention has been paid to a technique in which a plurality of cameras (imaging devices) are installed at different positions to perform synchronous imaging from multiple viewpoints, and the multi-viewpoint images obtained by the imaging are used to generate a virtual viewpoint image representing the view from a virtual camera that does not actually exist. With such a virtual viewpoint image, highlight scenes of soccer or basketball, for example, can be viewed from various angles, giving the user a greater sense of presence than a normal captured image.

In generating a virtual viewpoint image, the images captured by the plurality of cameras are first aggregated in a server or the like, and a three-dimensional model of an object (subject) in the images is generated. The three-dimensional model is then colored based on the designated virtual viewpoint and subjected to a two-dimensional transformation such as projective transformation to obtain a two-dimensional virtual viewpoint image. Within this generation process, view-dependent texture mapping is often used for coloring the three-dimensional model (Non-Patent Document 1). In view-dependent texture mapping, the color of a point of interest on the object is determined by a weighted blend of the colors at the corresponding points in the images captured by the plurality of cameras. This suppresses discontinuities in color caused by color differences between the captured images of the respective viewpoints, errors in the three-dimensional model shape, errors in the camera parameters, and the like, and achieves a more natural texture.

In view-dependent texture mapping, however, the coloring of the three-dimensional model of an object can become inaccurate because of occlusion, in which an object in the foreground as seen from a specific viewpoint hides part of an object behind it. In this regard, research has also been conducted on determining from the three-dimensional model in which cameras' captured images an object is hidden, and performing coloring based on the determination result (Non-Patent Document 2).

Non-Patent Document 1: P. E. Debevec, C. J. Taylor, and J. Malik: "Modeling and Rendering Architecture from Photographs: A Hybrid Geometry- and Image-Based Approach," SIGGRAPH '96, pp. 11-20, 1996.
Non-Patent Document 2: Gregory G. Slabaugh, Ronald W. Schafer, and Mat C. Hans: "Image-Based Photo Hulls for Fast and Photo-Realistic New View Synthesis," Real-Time Imaging, Vol. 9, No. 5, October 2003.

In the technique disclosed in Non-Patent Document 2, the location where occlusion occurs is determined using the three-dimensional model itself, generated from the multi-viewpoint images. Consequently, occlusion caused by an object located outside the three-dimensional space in which the three-dimensional model is generated (the model generation area) within the imaging space cannot be detected. This problem can be addressed by enlarging the model generation area, but doing so correspondingly increases the amount of computation required to generate the three-dimensional model, and the processing load on the server or the like rises sharply. Simply enlarging the model generation area therefore degrades the processing efficiency of the system.

The technique of the present disclosure has been made in view of the above problem, and its object is to determine where occlusion occurs in the shape data of an object while suppressing an increase in processing load.

The image processing apparatus according to the present disclosure comprises: generation means for generating, in a predetermined three-dimensional region of the three-dimensional space targeted by synchronous imaging, shape data representing the three-dimensional shape of an object included in at least one of a plurality of viewpoint images obtained by the synchronous imaging with a plurality of imaging devices; acquisition means for acquiring shape data generated in a three-dimensional region, within the target three-dimensional space, that differs from the predetermined three-dimensional region; determination means for performing, using the shape data acquired by the acquisition means, a visibility determination from the imaging devices on the shape data generated by the generation means; and output means for outputting the shape data generated by the generation means together with the result of the visibility determination, wherein the determination means determines whether the shape data generated by the generation means is blocked by the shape data acquired by the acquisition means.

According to the technique of the present disclosure, the location where occlusion occurs in the shape data of an object can be determined accurately while suppressing an increase in processing load.

A block diagram showing the configuration of an image processing system that generates a virtual viewpoint image according to Embodiment 1.
A diagram showing the arrangement of the cameras constituting the camera array according to Embodiment 1.
(a) to (c) are diagrams explaining a three-dimensional model in the voxel format.
A diagram explaining the concept of the visibility determination process.
A block diagram showing an example of the hardware configuration of an information processing device serving as the three-dimensional model generation device.
A functional block diagram showing the software configuration of the three-dimensional model generation device according to Embodiment 1.
(a) and (b) are diagrams explaining the rendering area and the adjacent determination area according to Embodiment 1.
(a) to (c) are diagrams explaining the basic principle of the visual volume intersection method.
(a) and (b) are diagrams explaining the significance of the visibility determination.
(a) is a diagram showing the world coordinate system before conversion, and (b) is a diagram showing the camera coordinate system after conversion.
A flowchart showing the flow of processing in the three-dimensional model generation device according to Embodiment 1.
A block diagram showing the configuration of an image processing system that generates a virtual viewpoint image according to Embodiment 2.
A diagram showing the arrangement of the cameras constituting the camera array according to Embodiment 2.
A functional block diagram showing the software configuration of the three-dimensional model generation device according to Embodiment 2.
(a) to (c) are diagrams explaining the rendering area and the adjacent determination area according to Embodiment 2.
A flowchart showing the flow of processing in the three-dimensional model generation device according to Embodiment 2.

Hereinafter, embodiments of the present invention will be described with reference to the drawings. The following embodiments do not limit the present invention, and not all combinations of the features described in the embodiments are essential to the solution provided by the present invention. Identical components are denoted by the same reference numerals.

[Embodiment 1]
In the present embodiment, a provisional three-dimensional model for visibility determination is generated in an area adjacent to the three-dimensional region in which the three-dimensional model for rendering is generated, and the visibility determination is performed on the three-dimensional model for rendering. Although the present embodiment is described using moving images as an example, it is equally applicable to still images.

<System configuration>
FIG. 1 is a block diagram showing the configuration of an image processing system that generates a virtual viewpoint image according to the present embodiment. The image processing system 1 includes a camera array 10 composed of a plurality of cameras (imaging devices) 10a-10p, a foreground extraction device group 12 composed of a plurality of foreground extraction devices 12a-12p, a control device 13, a three-dimensional model generation device 14, and a rendering device 15.

The camera array 10 is composed of the plurality of cameras 10a-10p and synchronously images (photographs) an object (subject) from various angles. The data of the images captured by the cameras (collectively, the multi-viewpoint image data) is sent to the foreground extraction device group 12. In the present embodiment, the cameras 10a-10p constituting the camera array 10 are arranged so as to surround the field in a stadium, as shown in FIG. 2. Each camera 10a-10p captures images with synchronized timing, using a point T0 on the field as its gazing point.

Each of the foreground extraction devices 12a-12p constituting the foreground extraction device group 12 is associated with one of the cameras 10a-10p. Each device extracts the foreground object portion from the image captured by its associated camera and generates a mask image and a texture image. Here, the foreground refers to a dynamic object that can be seen from an arbitrary viewpoint in the imaging space; in the present embodiment, a person or a ball on the field is a typical example. Static objects other than the foreground, such as the goals on the field and the spectator seats, form the background. The mask image is a binary silhouette image in which the foreground portion of the captured image is represented in white and the background portion in black. The texture image is a multi-valued image obtained by cutting out from the captured image a rectangular portion enclosing the foreground (the circumscribed rectangle). One method of extracting the foreground from a captured image is the background subtraction method. In this method, a background image captured when no dynamic foreground object is present, for example before the start of a game, is retained; the difference between that background image and a captured image containing dynamic objects is detected; and the portions where the difference is equal to or greater than a certain value are extracted. Other methods, such as the inter-frame difference method, may also be used to extract the foreground. The generated mask image and texture image data are sent to the three-dimensional model generation device 14.
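The background subtraction step described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the extraction devices' actual implementation: the function name `extract_foreground`, the per-pixel threshold of 30, and the use of the maximum channel difference are all hypothetical choices.

```python
import numpy as np

def extract_foreground(frame, background, threshold=30):
    """Background subtraction for one camera frame (illustrative sketch).

    frame, background: uint8 arrays of shape (H, W, 3).
    threshold: hypothetical difference level; the source only says
               "equal to or greater than a certain value".
    Returns (mask, texture, bbox): a binary silhouette (255 = foreground),
    the crop of the circumscribed rectangle, and (top, left, bottom, right).
    """
    # Signed arithmetic avoids uint8 wrap-around when subtracting.
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16)).max(axis=2)
    mask = np.where(diff >= threshold, 255, 0).astype(np.uint8)

    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return mask, None, None  # no foreground in this frame

    top, bottom = int(ys.min()), int(ys.max()) + 1
    left, right = int(xs.min()), int(xs.max()) + 1
    texture = frame[top:bottom, left:right].copy()
    return mask, texture, (top, left, bottom, right)
```

A production system would typically add noise filtering and shadow handling, which the text does not discuss.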

The control device 13 acquires the camera parameters of each camera 10a-10p and receives virtual viewpoint information via a UI (user interface), not shown. The camera parameters consist of external parameters and internal parameters. The external parameters are composed of a rotation matrix and a translation matrix, and indicate the position and orientation of the camera. The internal parameters include the focal length and optical center of the camera, and indicate the camera's angle of view, the size of the image sensor, and the like. The work of obtaining the camera parameters is called calibration. In calibration, a plurality of images of a specific pattern such as a checkerboard are first prepared. The camera parameters are then obtained by calculating, from these images, the correspondence between points in the three-dimensional world coordinate system and the corresponding two-dimensional points. The camera parameters obtained by calibration are sent to the three-dimensional model generation device 14 and the rendering device 15. The virtual viewpoint information includes the position and orientation, gazing point, movement path, and the like of a virtual viewpoint (virtual camera) set in the three-dimensional space targeted by the synchronous imaging. It is specified by the user with, for example, a dedicated joystick, or is set automatically according to the shooting scene. The virtual viewpoint information set based on user input or the like is sent to the rendering device 15.
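To make the role of the external and internal parameters concrete, the following sketch projects a world point into a camera image with the standard pinhole model. It is an assumption-laden illustration: lens distortion is ignored, the names `project_point`, `R`, `t`, and `K` are hypothetical, and the intrinsic matrix is assumed to have the usual form with last row (0, 0, 1).

```python
import numpy as np

def project_point(X_world, R, t, K):
    """Project a 3D world point to pixel coordinates (pinhole model sketch).

    R (3x3) and t (3,) are the external parameters: X_cam = R @ X_world + t.
    K (3x3) holds the internal parameters (focal lengths, optical center).
    Returns ((u, v), depth), or None if the point lies behind the camera.
    """
    X_cam = R @ np.asarray(X_world, dtype=float) + t
    depth = X_cam[2]
    if depth <= 0:
        return None
    uvw = K @ X_cam
    return (uvw[0] / uvw[2], uvw[1] / uvw[2]), depth
```

For example, with an identity rotation, zero translation, a focal length of 1000 pixels and an optical center at (960, 540), the point (0.1, -0.2, 2.0) projects to pixel (1010, 440) at depth 2.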

The three-dimensional model generation device 14 generates, based on the input mask images and camera parameters, a three-dimensional model (shape data representing a three-dimensional shape) of each dynamic object to be colored by the rendering device 15. In the present embodiment, the voxel format is used as an example of the data format of the three-dimensional model. In the voxel format, the three-dimensional shape of an object is expressed using minute cubes called "voxels", as shown in FIG. 3(a). FIG. 3(b) shows a set of voxels representing the target three-dimensional space in which a three-dimensional model is generated. FIG. 3(c) shows a three-dimensional model of a quadrangular pyramid composed of voxels, obtained by carving away the voxels of the non-foreground portion of the target three-dimensional space from the voxel set shown in FIG. 3(b) by the visual volume intersection method. The data format of the three-dimensional model may be another format, such as a point cloud format that uses a point cloud as the shape-expressing element, or a polygon mesh format that uses polygons.

The three-dimensional model generation device 14 also performs, for each three-dimensional model subject to rendering, a process of determining how it is seen from each camera (visibility determination process). In the following description, a three-dimensional model subject to rendering is simply referred to as a "rendering model". FIG. 4 is a diagram explaining the concept of the visibility determination process. In FIG. 4, the broken line 401 indicates the angle of view of camera c1 (an arbitrary camera in the camera array 10), and the one-dot chain lines 402 and 403 represent lines of sight from camera c1. The one-dot chain line 402 reaches voxel v1, which is part of model m1, without being blocked by model m2. This means that voxel v1 is visible from camera c1. In this case, voxel v1 appears in the image captured by camera c1, so the determination result is "visible". The one-dot chain line 403, on the other hand, reaches only as far as model m2. The two-dot chain line 404 indicates where the line of sight represented by the one-dot chain line 403 would have reached had it not been blocked by model m2. The two-dot chain line 404 reaches voxel v2, which is part of model m1, meaning that voxel v2 would have been visible from camera c1 if model m2 did not exist. In this case, voxel v2 does not appear in the image captured by camera c1, so the determination result is "invisible".

This determination is made for every voxel constituting the generated rendering model. The visibility determination process thus yields, for each voxel of the rendering model, information indicating whether the voxel appears in the captured image within the angle of view of each camera 10a-10p without being blocked by another object (that is, whether occlusion occurs). The generated rendering model and the result of the visibility determination process are sent to the rendering device 15 together with the texture image data.
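One common way to realize a per-voxel, per-camera check of this kind is a depth-buffer comparison, sketched below. This is an assumption, not necessarily the method the patent uses: the function `visibility_for_camera`, the tolerance `tol`, and the one-pixel-per-voxel splatting are all hypothetical simplifications. Occluder voxels (for example, from another model) are passed separately from the voxels being tested.

```python
import numpy as np

def visibility_for_camera(target_voxels, occluder_voxels, R, t, K, image_size, tol=0.05):
    """Depth-buffer visibility sketch for one camera.

    target_voxels:   (N, 3) world coordinates of voxel centers to test.
    occluder_voxels: (M, 3) voxel centers that may block them (M may be 0).
    R, t, K: camera rotation, translation, intrinsics (last row of K = 0, 0, 1).
    image_size: (width, height) in pixels.
    Returns a boolean array of length N: True = visible from this camera.
    """
    width, height = image_size
    pts = np.vstack([target_voxels, occluder_voxels]).astype(float)
    cam = pts @ R.T + t                      # world -> camera coordinates
    depth = cam[:, 2]
    in_front = depth > 0
    z = np.where(in_front, depth, 1.0)       # avoid dividing by <= 0
    uvw = cam @ K.T
    u = np.floor(uvw[:, 0] / z).astype(int)
    v = np.floor(uvw[:, 1] / z).astype(int)
    inside = in_front & (u >= 0) & (u < width) & (v >= 0) & (v < height)

    # Nearest depth seen at each pixel, over targets and occluders alike.
    zbuf = np.full((height, width), np.inf)
    np.minimum.at(zbuf, (v[inside], u[inside]), depth[inside])

    n = len(target_voxels)
    visible = np.zeros(n, dtype=bool)
    idx = np.nonzero(inside[:n])[0]
    # A voxel is visible if nothing closer landed on its pixel.
    visible[idx] = depth[idx] <= zbuf[v[idx], u[idx]] + tol
    return visible
```

Because each voxel is splatted to a single pixel, a sparse or distant occluder can leave gaps; a practical implementation would rasterize each voxel's footprint or trace rays through the voxel grid instead.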

The rendering device 15 colors the rendering model received from the three-dimensional model generation device 14 and generates a virtual viewpoint image in accordance with the virtual viewpoint information received from the control device 13. Specifically, a voxel of interest is first selected from the voxels constituting the rendering model. The cameras whose visibility determination result for that voxel is "visible" are then identified, the corresponding coordinates in their texture images are obtained from the camera parameters, and a value obtained by blending the pixel values at those coordinates is assigned to the voxel of interest. In blending the pixel values, only the texture images of cameras determined to be "visible" are used, and among them, a larger weight is given to the pixel value from the texture image of a camera that is closer to the voxel of interest. Finally, the rendering model is perspective-projected based on the viewpoint position specified by the virtual viewpoint information to generate a two-dimensional image of the target three-dimensional space.
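The blending rule above can be illustrated with the following sketch. The text only states that nearer visible cameras receive larger weights; the inverse-distance weighting used here, along with the function name `blend_voxel_color` and the `eps` guard, is an assumption made for illustration.

```python
import numpy as np

def blend_voxel_color(voxel, cam_positions, colors, visible, eps=1e-6):
    """Blend per-camera colors for one voxel (illustrative sketch).

    voxel:         (3,) world position of the voxel of interest.
    cam_positions: (C, 3) camera centers in world coordinates.
    colors:        (C, 3) color sampled from each camera's texture image.
    visible:       (C,) booleans from the visibility determination.
    Returns the blended color (3,), or None if no camera sees the voxel.
    """
    visible = np.asarray(visible, dtype=bool)
    if not visible.any():
        return None
    cams = np.asarray(cam_positions, dtype=float)[visible]
    cols = np.asarray(colors, dtype=float)[visible]
    dist = np.linalg.norm(cams - np.asarray(voxel, dtype=float), axis=1)
    weights = 1.0 / (dist + eps)     # hypothetical choice: inverse distance
    weights /= weights.sum()
    return weights @ cols
```

Cameras flagged invisible contribute nothing, so a color from an occluding object is never blended into the voxel.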

The above is an outline of the configuration of the image processing system according to the present embodiment. The foreground extraction devices 12a-12p and the three-dimensional model generation device 14 may be connected using any network topology, such as a star, ring, or bus topology.

<Details of the three-dimensional model generation device>
Next, the processing in the three-dimensional model generation device 14 according to the present embodiment will be described in detail.

<Hardware configuration>
FIG. 5 is a block diagram showing an example of the hardware configuration of an information processing device serving as the three-dimensional model generation device 14. The foreground extraction devices 12, the control device 13, and the rendering device 15 have the same hardware configuration as the three-dimensional model generation device 14 described below. The three-dimensional model generation device 14 includes a CPU 101, a ROM 102, a RAM 103, an auxiliary storage device 104, a display unit 105, an operation unit 106, a communication I/F 107, and a bus 108.

The CPU 101 realizes each function of the three-dimensional model generation device 14 shown in FIG. 6, described later, by controlling the entire device using the computer programs and data stored in the ROM 102 and the RAM 103. The device may include one or more pieces of dedicated hardware separate from the CPU 101, and at least part of the processing by the CPU 101 may be executed by the dedicated hardware. Examples of dedicated hardware include ASICs (application-specific integrated circuits), FPGAs (field-programmable gate arrays), and DSPs (digital signal processors). The ROM 102 stores programs and the like that do not need to be changed. The RAM 103 temporarily stores programs and data supplied from the auxiliary storage device 104, data supplied from outside via the communication I/F 107, and the like. The auxiliary storage device 104 is composed of, for example, a hard disk drive, and stores various data such as image data and audio data.

The display unit 105 is composed of, for example, a liquid crystal display or LEDs, and displays a GUI (graphical user interface) or the like with which the user operates the three-dimensional model generation device 14. The operation unit 106 is composed of, for example, a keyboard, a mouse, a joystick, or a touch panel, and inputs various instructions to the CPU 101 in response to user operations. The CPU 101 operates as a display control unit that controls the display unit 105 and as an operation control unit that controls the operation unit 106. The communication I/F 107 is used for communication with external devices connected via a network such as a LAN. For example, when the device is connected to an external device by wire, a communication cable is connected to the communication I/F 107. When the three-dimensional model generation device 14 has a function of communicating wirelessly with an external device, the communication I/F 107 includes an antenna. The bus 108 connects the above units and transmits information between them. The display unit 105 and the operation unit 106 may exist as independent external devices.

<Software configuration>
FIG. 6 is a functional block diagram showing the software configuration of the three-dimensional model generation device 14 of the present embodiment. The three-dimensional model generation device 14 includes an input unit 201, a model generation condition setting unit 202, a rendering model generation unit 203, a determination model generation unit 204, a model integration unit 205, a visibility determination unit 206, and an output unit 207. Each of these functional units is realized by the CPU 101 in the three-dimensional model generation device 14 loading a predetermined program stored in the ROM 102 or the auxiliary storage device 104 into the RAM 103 and executing it. The function of each unit is described below.

The input unit 201 receives various data from external devices. Specifically, it receives the camera parameters of each camera 10a-10p from the control device 13, and receives the texture image and mask image data corresponding to each of the cameras 10a-10p from the foreground extraction device group 12. The received camera parameters and mask image data are stored in the RAM 103 or the auxiliary storage device 104 for use in the visibility determination process and the three-dimensional model generation process. The received texture image data is provided to the rendering device 15 via the output unit 207.

The model generation condition setting unit 202 sets the generation conditions for the rendering model and for the determination model, each based on user input via a GUI (not shown) or the like. Here, the determination model is a three-dimensional model representing the shape of an object located outside the area in which the rendering model is generated; it is a provisional three-dimensional model used only for the visibility determination of the rendering model. The generation conditions for the rendering model include information such as the target space in which it is generated (the three-dimensional region in which the foreground of the virtual viewpoint video is drawn) and the size of the unit voxel constituting the rendering model. The generation conditions for the determination model include information such as the target space in which it is generated (a predetermined three-dimensional region adjacent to the three-dimensional region in which the rendering model is generated) and the size of the unit voxel constituting the determination model. Hereinafter, the three-dimensional region in which the rendering model is generated is called the "rendering area", and the three-dimensional region in which the determination model is generated is called the "determination area".

The rendering area is set by the user designating an arbitrary three-dimensional region via a GUI (not shown) or the like, according to the shooting scene. For a soccer match, for example, a three-dimensional region is set as shown in FIG. 7(a), whose long side is the length of one half of the field, whose short side is two-thirds of the field in the short-side direction, and whose height is about twice a player's height. The unit voxel size determines the fineness of the generated three-dimensional model, and a value such as 5 mm is set. A determination area is then set around the rendering area that has been set. For the rendering area shown in FIG. 7(a), for example, a three-dimensional region adjacent to the rendering area and extending it by several meters in each of the long-side, short-side, and height directions, as shown in FIG. 7(b), is set as the determination area. The unit voxel size of the determination area may be the same as that of the rendering area, but it is desirable to set a somewhat larger size (for example, 10 mm as opposed to 5 mm). That is, it is desirable to choose the unit voxels so that the following inequality holds.
"Voxel size of rendering area" ≤ "Voxel size of determination area"

Increasing the unit voxel size of the rendering model yields a correspondingly coarser three-dimensional model, which degrades image quality. The determination model, by contrast, is used only for the visibility determination, and it suffices to capture the approximate shape of any object that might occlude the rendering model; a coarse three-dimensional model therefore causes no problem. In addition, increasing the unit voxel size of the determination model reduces the processing load accordingly. Although the determination area shown in FIG. 7(b) is set so as to surround the entire rendering area, this is not a limitation. As needed, a region adjacent only in a specific direction, for example only on one side in the long-side direction, may be used as the determination area. The generation conditions (setting values) thus set for the rendering model and the determination model are provided to the rendering model generation unit 203 and the determination model generation unit 204. The generation conditions need not be set via the GUI; they may be set via a CUI (Character User Interface), or by reading a file in which the conditions are stored. In the present embodiment, a single model generation condition setting unit 202 sets both the rendering model generation conditions and the determination model generation conditions, but separate setting units may be used.
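The effect of the voxel size on processing load can be illustrated with a small sketch. It is not taken from the embodiment: the `Box` class, the `plan_determination_area` function, the uniform expansion on all six faces, and the numeric values are assumptions made for illustration, and the sketch treats the "desirable" inequality above as a hard check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned 3D region in world coordinates (metres); hypothetical helper."""
    lo: tuple
    hi: tuple

    def volume(self) -> float:
        v = 1.0
        for a, b in zip(self.lo, self.hi):
            v *= b - a
        return v

    def expanded(self, margin: float) -> "Box":
        # Grow the box by `margin` on all six faces (an assumption of this sketch).
        return Box(tuple(a - margin for a in self.lo),
                   tuple(b + margin for b in self.hi))


def plan_determination_area(rendering: Box, rendering_voxel: float,
                            margin: float, determination_voxel: float):
    """Return (outer box, number of voxels in the shell around the rendering area)."""
    if determination_voxel < rendering_voxel:
        raise ValueError("determination voxel should not be finer than rendering voxel")
    outer = rendering.expanded(margin)
    shell_volume = outer.volume() - rendering.volume()
    return outer, round(shell_volume / determination_voxel ** 3)


rendering_area = Box((0.0, 0.0, 0.0), (10.0, 10.0, 4.0))
_, coarse = plan_determination_area(rendering_area, 0.5, 2.0, 1.0)
_, fine = plan_determination_area(rendering_area, 0.5, 2.0, 0.5)
print(coarse, fine)  # doubling the voxel edge cuts the voxel count by a factor of 8
```

Because the voxel count scales with the inverse cube of the edge length, even a modest increase in the determination voxel size reduces the number of voxels to carve and test considerably.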

The rendering model generation unit 203 generates the rendering model from the mask images and camera parameters input to the input unit 201, based on the rendering model generation conditions set by the model generation condition setting unit 202. Likewise, the determination model generation unit 204 generates the determination model from the mask images and camera parameters input to the input unit 201, based on the determination model generation conditions set by the model generation condition setting unit 202. In the present embodiment, both the rendering model and the determination model are generated as three-dimensional models represented by voxel sets, using the visual volume intersection method.

The visual volume intersection method is now described. FIGS. 8(a) to 8(c) illustrate its basic principle. From an image capturing an object, a mask image representing the two-dimensional silhouette of the object on the imaging plane is obtained (FIG. 8(a)). Next, consider a cone that extends into three-dimensional space from the projection center of the camera through each point on the contour of the mask image (FIG. 8(b)). This cone is called the "visual volume" of the object for that camera. The three-dimensional shape of the object is then obtained by finding the region common to the visual volumes of a plurality of cameras, that is, the intersection of the visual volumes (FIG. 8(c)). The visual volume intersection method is only one example of a technique for generating a three-dimensional model, and the technique is not limited to it.
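A voxel-based form of this principle can be sketched as follows: a candidate voxel is kept only if its center projects inside the silhouette mask of every camera. This is a minimal illustration, not the implementation of the embodiment; the function names, the 3x4 projection matrices, and the single-pixel masks are assumptions, and voxels falling outside a camera's image are simply discarded here.

```python
def project(P, point):
    """Project a world point with a 3x4 matrix P; None if behind the camera."""
    x, y, z = point
    h = [r[0] * x + r[1] * y + r[2] * z + r[3] for r in P]
    if h[2] <= 0:
        return None
    return h[0] / h[2], h[1] / h[2]


def inside_silhouette(uv, mask):
    """True if pixel (u, v) lies in the image and on the foreground of the mask."""
    if uv is None:
        return False
    u, v = round(uv[0]), round(uv[1])
    return 0 <= v < len(mask) and 0 <= u < len(mask[0]) and bool(mask[v][u])


def carve(voxel_centers, cameras):
    """Keep voxels seen as foreground by every (P, mask) pair in `cameras`."""
    return [c for c in voxel_centers
            if all(inside_silhouette(project(P, c), mask) for P, mask in cameras)]


def single_pixel_mask(size, u, v):
    mask = [[0] * size for _ in range(size)]
    mask[v][u] = 1
    return mask


# Hypothetical setup: camera A at the origin looks along +z,
# camera B at (-5, 0, 5) looks along +x; both see one foreground pixel.
P_A = [[10, 0, 5, 0], [0, 10, 5, 0], [0, 0, 1, 0]]
P_B = [[5, 0, 10, -25], [5, 10, 0, 25], [1, 0, 0, 5]]
cams = [(P_A, single_pixel_mask(11, 5, 5)), (P_B, single_pixel_mask(11, 5, 5))]
candidates = [(0, 0, 5), (1, 0, 5), (0, 0, 6)]

hull_one_camera = carve(candidates, cams[:1])
hull_two_cameras = carve(candidates, cams)
```

With camera A alone, the voxel at (0, 0, 6) survives because it lies on the same viewing ray as (0, 0, 5); adding camera B removes it, which is the intersection of visual volumes described above.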

The model integration unit 205 integrates the rendering model generated by the rendering model generation unit 203 with the determination model generated by the determination model generation unit 204. As a result of this integration, the rendering model and the determination model are placed in a common three-dimensional space.

The visibility determination unit 206 performs the visibility determination for each voxel of the rendering model, using the rendering model and the determination model placed in the same three-dimensional space. FIGS. 9(a) and 9(b) illustrate the significance of the visibility determination. In FIG. 9(a), a model m1 is generated in the rendering area 901, and a determination model m2 is generated in the adjacent determination area 902. Seen from the camera c1, the model m2 is located where it blocks the model m1. As a result, most of the voxels constituting the model m1 can be correctly determined to be not visible from the camera c1. FIG. 9(b), on the other hand, shows a case in which the determination area 902 is not set. As described above, seen from the camera c1, the model m2 should be at a position that blocks the model m1. However, because the position of the model m2 is outside the rendering area 901, no three-dimensional model is generated for it even if an object exists there. As a result, most of the voxels constituting the model m1 are treated as visible from the camera c1 even though they are not, and in the coloring process they are weighted according to the distance from the virtual viewpoint. The present embodiment prevents this problem by also generating a provisional three-dimensional model for visibility determination in a region adjacent to the three-dimensional region in which the three-dimensional model for rendering is generated.

In the visibility determination, the coordinates of each voxel constituting the rendering model and the determination model are first converted from the world coordinate system to the camera coordinate system based on the camera parameters. FIG. 10(a) shows the world coordinate system before conversion, and FIG. 10(b) shows the camera coordinate system referenced to the camera c1 after conversion. Then, taking each of the cameras 10a to 10p in turn as the camera of interest and each voxel in turn as the voxel of interest, it is checked whether there is another voxel that has the same x and y coordinates and a smaller z coordinate in that camera coordinate system. If such a voxel exists, the voxel of interest is determined to be not visible from the camera of interest. The visibility determination results thus obtained for the rendering model, voxel by voxel, are provided to the output unit 207.
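The per-camera check described above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the camera pose is given as a rotation matrix `R` and translation `t`, and "same x and y coordinates" is approximated by rounding to a grid cell of size `cell`, which is an assumption of this sketch rather than something stated in the embodiment.

```python
def to_camera(R, t, p):
    """Convert a world-coordinate point p to camera coordinates: R * p + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def judge_visibility(rendering_voxels, determination_voxels, R, t, cell=1.0):
    """Return one boolean per rendering voxel: True if visible from the camera."""
    def key_and_depth(p):
        x, y, z = to_camera(R, t, p)
        return (round(x / cell), round(y / cell)), z

    # Nearest depth per (x, y) cell over BOTH models, so that determination
    # voxels outside the rendering area can occlude rendering voxels.
    nearest = {}
    for p in list(rendering_voxels) + list(determination_voxels):
        k, z = key_and_depth(p)
        if k not in nearest or z < nearest[k]:
            nearest[k] = z

    # Only rendering voxels receive a result; a voxel is invisible when
    # another voxel in the same cell has a smaller z.
    result = []
    for p in rendering_voxels:
        k, z = key_and_depth(p)
        result.append(z <= nearest[k])
    return result


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
m1 = [(0, 0, 5), (1, 0, 5)]   # rendering model voxels
m2 = [(0, 0, 2)]              # determination model voxel in front of m1

with_determination = judge_visibility(m1, m2, identity, (0, 0, 0))
without_determination = judge_visibility(m1, [], identity, (0, 0, 0))
```

The two calls mirror FIGS. 9(a) and 9(b): with the determination model present, the first voxel of m1 is reported as not visible; without it, the same voxel is wrongly treated as visible.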

The output unit 207 outputs the three-dimensional model for rendering, the visibility determination results for that model, and the texture image data to the rendering device 15.

<Processing flow in the three-dimensional model generation device>
FIG. 11 is a flowchart showing the flow of processing in the three-dimensional model generation device 14 according to the present embodiment. It is assumed that, before execution of the flowchart of FIG. 11 starts, the camera parameters have been received from the control device 13 and stored in the RAM 103 or the like, and that the generation conditions for the rendering model and the determination model have been set based on user input. The flow of processing in the three-dimensional model generation device 14 is described below along the flowchart of FIG. 11. In the following description, the symbol "S" denotes a step.

In S1101, the input unit 201 monitors for reception of the input data necessary for generating the three-dimensional models (mask image data for each camera). When reception of input data is detected, the process proceeds to S1102. Because the present embodiment assumes that the multi-viewpoint image data is a moving image, the processing from S1102 onward is executed frame by frame.

In S1102, the rendering model generation unit 203 generates the three-dimensional model for rendering from the input data received in S1101, in accordance with the rendering model generation conditions set by the model generation condition setting unit 202. In the subsequent S1103, the determination model generation unit 204 generates the provisional three-dimensional model for visibility determination from the input data received in S1101, in accordance with the determination model generation conditions set by the model generation condition setting unit 202.

In S1104, the model integration unit 205 places the three-dimensional model for rendering generated in S1102 and the three-dimensional model for determination generated in S1103 in a common three-dimensional space. Then, in S1105, the visibility determination unit 206 determines, for each voxel constituting the rendering model placed in the common three-dimensional space, its visibility from each camera, that is, whether it is blocked by another three-dimensional model, including the determination model.

In S1106, the output unit 207 outputs the rendering model generated in S1102 and the visibility determination results obtained in S1105 to the rendering device 15. The texture image data received by the input unit 201 is output together with them.

In S1107, it is determined whether processing has been completed for all frames of the input data received in S1101. If an unprocessed frame remains, the process returns to S1102 and continues with the next frame.

The above is the flow of processing in the three-dimensional model generation device 14 according to the present embodiment. Although in the flowchart of FIG. 11 the output unit 207 outputs frame by frame, it may instead output several frames together, or output everything at once when processing of all frames constituting the input data has finished.
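The per-frame flow and the output variations mentioned above can be summarized in a short sketch. The function `run_pipeline`, its callable parameters, and the frame dictionary layout are hypothetical names chosen for illustration; the stubs stand in for the actual model generation and visibility determination.

```python
def run_pipeline(frames, build_rendering, build_determination, judge, emit,
                 batch_size=1):
    """Process frames one by one and emit results in groups of `batch_size`."""
    pending = []
    for frame in frames:                                      # loop closed by S1107
        rendering = build_rendering(frame["masks"])           # S1102
        determination = build_determination(frame["masks"])   # S1103
        visibility = judge(rendering, determination)          # S1104 and S1105
        pending.append({"model": rendering,
                        "visibility": visibility,
                        "textures": frame["textures"]})
        if len(pending) >= batch_size:                        # S1106
            emit(pending)
            pending = []
    if pending:  # flush frames left over from an incomplete batch
        emit(pending)


batches = []
frames = [{"masks": f"mask{i}", "textures": f"tex{i}"} for i in range(3)]
run_pipeline(frames,
             build_rendering=lambda masks: ("rendering", masks),
             build_determination=lambda masks: ("determination", masks),
             judge=lambda rendering, determination: [True],
             emit=batches.append,
             batch_size=2)
```

Setting `batch_size=1` corresponds to the frame-by-frame output of FIG. 11, while a `batch_size` equal to the number of frames corresponds to outputting everything once all frames are processed.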

<Modification>
In the present embodiment, a predefined peripheral region adjacent to the rendering area is set as the determination area, but this is not a limitation. For example, the region in which an object outside the rendering area could cause occlusion of the rendering model may be calculated and set as the determination area. This calculation takes into account information such as the shape and size of the foreground objects, camera parameters including at least the position and orientation of each camera, and the shape and size of the rendering area. FIG. 9(c) illustrates the idea behind calculating the determination area. In FIG. 9(c), the position of the model m4 is the limit position at which occlusion can still occur with respect to the model m3, which is located at the boundary of the rendering area 901. The position of the model m4 is obtained by calculation using the information described above. Setting the determination area based on such a calculation result enables more accurate visibility determination.
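One way to picture such a calculation is a simplified two-dimensional side view. The sketch below is an assumption-laden illustration and not the formula of the embodiment: the camera is a point at height `camera_height`, the occluding object is a vertical segment of height `object_height` standing on the ground, and the occluded point is the ground point on the rendering-area boundary at horizontal distance `camera_distance` from the camera. The ray from the camera to that ground point has height `camera_height * (1 - x / camera_distance)` at horizontal distance `x`, so the object can block it only within `camera_distance * object_height / camera_height` of the boundary.

```python
def occlusion_margin(camera_height, camera_distance, object_height):
    """Width of the strip outside the boundary in which an object can occlude.

    All lengths share one unit. Hypothetical helper based on a 2D side view.
    """
    if camera_height <= 0 or camera_distance <= 0 or object_height < 0:
        raise ValueError("heights and distance must be positive")
    if object_height >= camera_height:
        # An object as tall as the camera can block from anywhere in between.
        return camera_distance
    return camera_distance * object_height / camera_height


margin = occlusion_margin(camera_height=10.0, camera_distance=50.0,
                          object_height=2.0)
print(margin)  # 10.0
```

Under these assumptions, a camera 10 m high and 50 m from the boundary would call for a determination area extending 10 m toward the camera for objects up to 2 m tall. Points deeper inside the rendering area yield smaller margins, so the boundary case is the limiting one.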

As described above, according to the present embodiment, the locations where occlusion occurs in the three-dimensional model to be rendered can be determined accurately while suppressing an increase in processing load. As a result, the three-dimensional model can be colored appropriately at rendering time, and a high-quality virtual viewpoint image can be obtained.

[Embodiment 2]
Next, as Embodiment 2, a mode is described in which, when a plurality of rendering areas are set next to each other on the field, part of one rendering area is treated as the determination area of the other, so that three-dimensional models are shared among the rendering areas. Description of the parts common to Embodiment 1 is omitted or simplified, and the following focuses on the differences.

<System configuration>
FIG. 12 is a block diagram showing the configuration of an image processing system that generates a virtual viewpoint image according to the present embodiment. The image processing system 1' is the image processing system 1 shown in FIG. 1 of Embodiment 1 with the following elements added: a camera array 10', a foreground extraction device group 12', a control device 13', and a three-dimensional model generation device 14'. The added elements 10', 12', 13', and 14' function in the same way as the elements 10, 12, 13, and 14 in FIG. 1 of Embodiment 1.

The cameras constituting the camera array 10 and the camera array 10' are arranged so as to surround the field in the stadium, as shown in FIG. 13. For convenience of illustration, the camera array 10' is drawn outside the camera array 10 and appears farther from the field, but the actual distance to the field is substantially the same for both arrays. In the present embodiment, the cameras 10a to 10p constituting the camera array 10 use a point T1 on the field as their gazing point, the cameras 10a' to 10p' constituting the camera array 10' use a point T2 on the field as their gazing point, and all cameras capture images in time synchronization.
The three-dimensional model generation device 14, which receives input data from the foreground extraction device group 12, and the three-dimensional model generation device 14', which receives input data from the foreground extraction device group 12', are connected to each other. Each of the two devices provides part of the rendering model it has generated to the other device as a three-dimensional model for visibility determination.

<Software configuration>
FIG. 14 is a functional block diagram showing the software configuration of the three-dimensional model generation devices 14 and 14' of the present embodiment. Each of the devices 14 and 14' has an input unit 201, a model generation condition setting unit 202', a rendering model generation unit 203, a model integration unit 205', a visibility determination unit 206', and an output unit 207. In place of the determination model generation unit 204, each device has a determination model transmission/reception unit 1401 and a determination area setting unit 1402. The function of each unit is described below, but description of the processing blocks that are the same as in Embodiment 1 (the input unit 201, the rendering model generation unit 203, the visibility determination unit 206, and the output unit 207) is omitted.

The model generation condition setting unit 202' sets the generation conditions (rendering area and voxel size) used when the rendering model generation unit 203 generates the three-dimensional model for rendering, based on user input via the GUI. In the present embodiment, part of the three-dimensional model for rendering generated by one device serves as the determination model in the other three-dimensional model generation device, so there is no determination model generation unit 204. The model generation condition setting unit 202' of the present embodiment therefore does not set generation conditions for a determination model.

The determination model transmission/reception unit 1401 transmits part of the rendering model generated by its own device as a determination model, in response to an acquisition request from the other three-dimensional model generation device. It also sends an acquisition request for a determination model to the other three-dimensional model generation device, and receives part of the rendering model generated by that device as the determination model for its own device. FIG. 15(a) shows the rendering area_1, corresponding to the gazing point T1 of the camera array 10 shown in FIG. 12, and the rendering area_2, corresponding to the gazing point T2 of the camera array 10'. FIG. 15(b) shows the transmission region and the reception region for the three-dimensional model generation device 14, which corresponds to the camera array 10. Here, the transmission region is a partial region within the rendering area_1; the part of the locally generated rendering model that lies in this region is what is transmitted as a determination model. The transmission region is specified in the supplementary information of the acquisition request from the other device 14'. The reception region is the region in which the determination model received from the other device 14' was generated (a partial region within the rendering area_2), and corresponds to the determination area_1 of the own device. The reception region is specified in the supplementary information of the acquisition request sent from the own device 14 to the other device 14'. Similarly, FIG. 15(c) shows the transmission region and the reception region for the three-dimensional model generation device 14', which corresponds to the camera array 10'. In this case, the transmission region is a partial region within the rendering area_2 and is the region targeted when the locally generated rendering model is transmitted. The reception region is the region in which the determination model received from the other device 14 was generated (a partial region within the rendering area_1), and corresponds to the determination area_2 of the own device. In this way, in the present embodiment, part of the three-dimensional model generated by one three-dimensional model generation device is used as the determination model in the visibility determination processing of the other three-dimensional model generation device.
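Serving an acquisition request amounts to selecting the voxels of the local rendering model that fall inside the requested region. The sketch below illustrates this under stated assumptions: the request is modeled as a dictionary with a hypothetical `"region"` entry holding the two corners of an axis-aligned box, and voxels are represented by their center coordinates. None of these names come from the embodiment.

```python
def in_box(point, lo, hi):
    """True if `point` lies inside the axis-aligned box [lo, hi] (inclusive)."""
    return all(l <= c <= h for c, l, h in zip(point, lo, hi))


def handle_acquisition_request(rendering_voxels, request):
    """Return the part of the local rendering model that the peer asked for."""
    lo, hi = request["region"]
    return [p for p in rendering_voxels if in_box(p, lo, hi)]


# Hypothetical example: device 14 holds three voxels of its rendering model;
# device 14' requests the strip of rendering area_1 next to rendering area_2.
local_model = [(1.0, 2.0, 0.5), (9.5, 2.0, 0.5), (9.8, 4.0, 1.0)]
request_from_peer = {"region": ((9.0, 0.0, 0.0), (10.0, 10.0, 2.0))}

determination_model_for_peer = handle_acquisition_request(local_model,
                                                           request_from_peer)
```

The peer then treats the returned voxels as its determination model. Since they were generated at the rendering voxel size, no separate coarse model has to be built.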

The determination area setting unit 1402 sets, based on user input, the determination area of its own device (that is, the reception region) that is defined in the supplementary information of the acquisition request described above. As noted earlier, the region to be set is part of the other device's rendering area and is adjacent to the rendering area of the own device.

The model integration unit 205' integrates the rendering model generated by the rendering model generation unit 203 with the determination model that the determination model transmission/reception unit 1401 has received from the other three-dimensional model generation device.

<Processing flow in the three-dimensional model generation device>
FIG. 16 is a flowchart showing the flow of processing in the three-dimensional model generation device 14/14' according to the present embodiment. It is assumed that, before execution of the flowchart of FIG. 16 starts, the camera parameters have been received from the control device 13 and stored in the RAM 103 or the like, and that the generation conditions for the rendering model and the transmission region have been set based on user input. It is further assumed that the acquisition request for a determination model from the other three-dimensional model generation device has also been received before execution of this flow starts. The flow of processing in the three-dimensional model generation device 14/14' is described below along the flowchart of FIG. 16.

In S1601, as in S1101 of the flow of FIG. 11, the input unit 201 monitors for reception of the input data necessary for generating the three-dimensional model (that is, the mask image and texture image data for each camera). When reception of input data is detected, the process proceeds to S1602. Because the present embodiment also assumes that the multi-viewpoint image data is a moving image, the processing from S1602 onward is executed frame by frame.

In S1602, as in S1102 of the flow of FIG. 11, the rendering model generation unit 203 generates the three-dimensional model for rendering from the input data received in S1601. The generated rendering model is provided to the determination model transmission/reception unit 1401 as well as to the model integration unit 205' and the output unit 207.
In S1603, the determination model transmission/reception unit 1401 extracts, from the rendering model generated in S1602, the part that lies within the region specified by the acquisition request from the other three-dimensional model generation device. The part extracted here serves as the determination model in the other three-dimensional model generation device.

In S1604, the determination model transmission/reception unit 1401 transmits the rendering model extracted in S1603 to the other three-dimensional model generation device, which is the source of the acquisition request. In the subsequent S1605, the determination model transmission/reception unit 1401 receives a determination model from the other three-dimensional model generation device, based on the acquisition request it sent in advance.

The subsequent steps S1606 to S1609 correspond to S1104 to S1107, respectively, in the flow of FIG. 11. That is, the rendering model generated in S1602 and the determination model received in S1605 are placed in a common three-dimensional space (S1606), and the visibility determination processing is executed camera by camera for each voxel constituting the rendering model (S1607). The rendering model generated in S1602 and the visibility determination results obtained in S1607 are then output to the rendering device 15 together with the texture image data (S1608), and this is repeated for all frames (S1609).

The above is the flow of processing in the three-dimensional model generation device 14/14' according to the present embodiment. As in Embodiment 1, the output unit 207 may output in units of several frames or of the whole input data instead of frame by frame. The present embodiment has described an example in which two three-dimensional model generation devices share their three-dimensional models with each other, but this is not a limitation. For example, further camera arrays, foreground extraction device groups, and so on may be added to set three or more rendering areas, and the three-dimensional models generated by three or more three-dimensional model generation devices may be shared among them.

The present embodiment also makes it possible to accurately determine the locations where occlusion occurs in the three-dimensional model to be rendered while suppressing an increase in processing load. In particular, because the three-dimensional model for rendering generated by the other three-dimensional model generation device is used as the determination model, the visibility determination can be performed with higher accuracy, and a higher-quality virtual viewpoint video can be obtained.

<Other embodiments>
The present invention can also be realized by processing in which a program that implements one or more functions of the embodiments described above is supplied to a system or device via a network or a storage medium, and one or more processors in a computer of that system or device read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

1 Image processing system
14 Three-dimensional model generation device
203 Rendering model generation unit
204 Determination model generation unit
206 Visibility determination unit
1401 Determination model transmission/reception unit

Claims

An image processing apparatus comprising:
generation means for generating, in a predetermined three-dimensional region within the target three-dimensional space of synchronous imaging by a plurality of imaging devices, shape data representing the three-dimensional shape of an object included in at least one of the multi-viewpoint images obtained by the synchronous imaging;
acquisition means for acquiring shape data generated in a three-dimensional region, within the target three-dimensional space of the synchronous imaging, that is different from the predetermined three-dimensional region;
determination means for determining, using the shape data acquired by the acquisition means, the visibility from the imaging devices of the shape data generated by the generation means; and
output means for outputting the shape data generated by the generation means together with the result of the visibility determination,
wherein the determination means determines whether or not the shape data generated by the generation means is blocked by the shape data acquired by the acquisition means.

The image processing apparatus according to claim 1, wherein the three-dimensional region different from the predetermined three-dimensional region is adjacent to the predetermined three-dimensional region.

The image processing apparatus according to claim 1 or 2, wherein the acquisition means acquires the shape data by generating, in the three-dimensional region different from the predetermined three-dimensional region, shape data representing the three-dimensional shape of an object included in at least one of the multi-viewpoint images obtained by synchronous imaging with the plurality of imaging devices.

The image processing apparatus according to claim 3, wherein the three-dimensional region different from the predetermined three-dimensional region is determined by calculation based on the shape and size of the object, the position and orientation of each of the plurality of imaging devices, and the shape and size of the predetermined three-dimensional region.

The image processing apparatus according to claim 3 or 4, wherein the unit of the component of the shape data acquired by the acquisition means is smaller than the unit of the component of the shape data generated by the generation means.

The image processing apparatus according to claim 5, wherein the determination means determines, in units of the components, whether or not the shape data generated by the generation means is blocked by the shape data acquired by the acquisition means.

The image processing apparatus according to claim 6, wherein the determination means:
converts the coordinate system of the shape data generated by the generation means and of the shape data acquired by the acquisition means from the world coordinate system to the coordinate system of the imaging device; and
determines that the shape data generated by the generation means is blocked by the shape data acquired by the acquisition means when the position of the shape data acquired by the acquisition means is closer to the imaging device of interest.
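
The determination recited above (conversion from the world coordinate system to the imaging device coordinate system, followed by a comparison of which shape data lies closer to the imaging device) might be sketched as follows. This is only an illustrative sketch and not the claimed implementation: the pinhole projection, the parameter names `rotation`, `translation`, `focal`, `cx`, `cy`, the per-pixel grouping of points, and the tolerance `eps` are all assumptions introduced for the example.

```python
from math import floor


def world_to_camera(point, rotation, translation):
    """Apply X_cam = R * X_world + t (assumed world-to-camera convention)."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def project(point_cam, focal, cx, cy):
    """Assumed pinhole projection; returns (pixel, depth) or None if behind the camera."""
    x, y, z = point_cam
    if z <= 0:
        return None
    return (floor(focal * x / z + cx), floor(focal * y / z + cy)), z


def judge_visibility(render_pts, judge_pts, rotation, translation,
                     focal=100.0, cx=50.0, cy=50.0, eps=1e-6):
    """Return one flag per rendering point: True if not blocked by a judgment point.

    Only blocking by the judgment model is checked here; self-occlusion
    within the rendering model is outside the scope of this sketch.
    """
    # Nearest depth of the judgment (acquired) shape data at each pixel.
    nearest = {}
    for p in judge_pts:
        hit = project(world_to_camera(p, rotation, translation), focal, cx, cy)
        if hit is not None:
            pixel, depth = hit
            nearest[pixel] = min(depth, nearest.get(pixel, float("inf")))

    flags = []
    for p in render_pts:
        hit = project(world_to_camera(p, rotation, translation), focal, cx, cy)
        if hit is None:
            flags.append(False)  # behind the camera: treated as not visible
            continue
        pixel, depth = hit
        # Blocked when a judgment point on the same pixel is closer to the camera.
        flags.append(nearest.get(pixel, float("inf")) >= depth - eps)
    return flags


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(judge_visibility([(0, 0, 5), (1, 0, 5)], [(0, 0, 2)], identity, [0, 0, 0]))
# [False, True]
```

In this example the first rendering point shares a pixel with a closer judgment point and is flagged as blocked, while the second projects to a different pixel and remains visible.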

The image processing apparatus according to claim 1 or 2, wherein the image processing apparatus is connected to an external device having means for generating, in the three-dimensional region different from the predetermined three-dimensional region, shape data representing the three-dimensional shape of an object captured in the multi-viewpoint images obtained by synchronous imaging with the plurality of imaging devices, and
the acquisition means acquires the shape data generated in the three-dimensional region different from the predetermined three-dimensional region, within the three-dimensional space targeted by the synchronous imaging, by receiving that shape data from the external device.

The image processing apparatus according to claim 8, wherein the unit of the component of the shape data acquired by the acquisition means is the same as the unit of the component of the shape data generated by the generation means.

The image processing apparatus according to claim 9, wherein the determination means determines, in units of the components, whether or not the shape data generated by the generation means is blocked by the shape data acquired by the acquisition means.

The image processing apparatus according to claim 10, wherein the determination means:
converts the coordinate system of the shape data generated by the generation means and of the shape data acquired by the acquisition means from the world coordinate system to the coordinate system of the imaging device; and
determines that the shape data generated by the generation means is blocked by the shape data acquired by the acquisition means when the position of the shape data acquired by the acquisition means is closer to the imaging device.

An image processing system comprising:
generation means for generating, in a predetermined three-dimensional region within the three-dimensional space targeted by synchronous imaging with a plurality of imaging devices, shape data representing the three-dimensional shape of an object included in at least one of the multi-viewpoint images obtained by the synchronous imaging;
acquisition means for acquiring shape data generated in a three-dimensional region, within the three-dimensional space targeted by the synchronous imaging, that is different from the predetermined three-dimensional region;
determination means for determining, using the shape data acquired by the acquisition means, the visibility from the imaging devices of the shape data generated by the generation means;
output means for outputting the shape data generated by the generation means together with the result of the visibility determination;
acquisition means for acquiring information on a virtual viewpoint set in the three-dimensional space in which the synchronous imaging was performed; and
rendering means for coloring the shape data output by the output means, based on the acquired information on the virtual viewpoint and the result of the visibility determination output by the output means,
wherein the determination means determines whether or not the shape data generated by the generation means is blocked by the shape data acquired by the acquisition means.

An image processing method comprising:
a generation step of generating, in a predetermined three-dimensional region within the three-dimensional space targeted by synchronous imaging with a plurality of imaging devices, shape data representing the three-dimensional shape of an object included in at least one of the multi-viewpoint images obtained by the synchronous imaging;
an acquisition step of acquiring shape data generated in a three-dimensional region, within the three-dimensional space targeted by the synchronous imaging, that is different from the predetermined three-dimensional region;
a determination step of determining, using the shape data acquired in the acquisition step, the visibility from the imaging devices of the shape data generated in the generation step; and
an output step of outputting the shape data generated in the generation step together with the result of the visibility determination,
wherein the determination step determines whether or not the shape data generated in the generation step is blocked by the shape data acquired in the acquisition step.

A program for causing a computer to function as the image processing apparatus according to any one of claims 1 to 11.