JP6403862B1

JP6403862B1 - Three-dimensional model generation apparatus, generation method, and program

Info

Publication number: JP6403862B1
Application number: JP2017239891A
Authority: JP
Inventors: 圭輔森澤; 究小林
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2017-12-14
Filing date: 2017-12-14
Publication date: 2018-10-10
Anticipated expiration: 2037-12-14
Also published as: JP2019106145A

Abstract

[Problem] To prevent defects in a generated three-dimensional model even when a structure or the like that occludes part of a target object is present in the captured scene. [Solution] A three-dimensional model generation apparatus includes: an acquisition means that acquires a first mask image and a second mask image. The first mask image indicates the region of a structure, which is a stationary object, in each of the images captured from a plurality of viewpoints. The second mask image indicates the region of the foreground, which is a moving object, in each of those images. The apparatus also includes a combining means that combines the acquired first and second mask images to generate a third mask image, in which the structure region and the foreground region of each image are integrated. Finally, it includes a generation means that generates a three-dimensional model including both the structure and the foreground by the visual hull (volume intersection) method using the third mask image. [Selected drawing] FIG. 4

Description

The present invention relates to the generation of a three-dimensional model of an object in an image.

Conventionally, a technique called the “visual hull” (volume intersection) method is known for estimating the three-dimensional shape of an object. It uses multi-viewpoint images captured synchronously from different viewpoints by a plurality of cameras (Patent Document 1). FIGS. 1(a) to 1(c) illustrate the basic principle of the visual hull method. From an image of an object, a mask image representing the two-dimensional silhouette of the object on the imaging plane is obtained (FIG. 1(a)). Next, consider the cone that extends into three-dimensional space from the projection center of the camera through each point on the contour of the mask image (FIG. 1(b)). This cone is called the “visual volume” of the object for that camera. The three-dimensional shape (three-dimensional model) of the object is then obtained as the region common to the visual volumes of the multiple cameras, that is, as their intersection (FIG. 1(c)). In practice, shape estimation by the visual hull method works on sampling points in the space where the object may exist. Each sampling point is projected onto the mask images, and the method checks whether the projected points fall inside the mask images in common across the multiple viewpoints. The results of this check give the estimated three-dimensional shape of the object.
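The sampling-point test described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of Patent Document 1. Each view is modeled as a hypothetical projection function that maps a world point to an integer (row, col) pixel, paired with a binary mask in which 1 marks the silhouette. A point survives only if it lands on the silhouette in every view, that is, if it lies in the intersection of all visual volumes.

```python
from typing import Callable, List, Sequence, Tuple

Point3 = Tuple[float, float, float]
Mask = List[List[int]]                           # mask[row][col]: 1 = silhouette, 0 = background
Projector = Callable[[Point3], Tuple[int, int]]  # hypothetical: world point -> (row, col) pixel


def inside_silhouette(mask: Mask, pixel: Tuple[int, int]) -> bool:
    """True if the pixel is inside the image bounds and on the silhouette."""
    row, col = pixel
    if 0 <= row < len(mask) and 0 <= col < len(mask[0]):
        return mask[row][col] == 1
    return False  # assumption: points projecting outside an image are carved away


def visual_hull(points: Sequence[Point3],
                views: Sequence[Tuple[Projector, Mask]]) -> List[Point3]:
    """Keep the sampling points whose projections fall on the silhouette in every view."""
    return [p for p in points
            if all(inside_silhouette(mask, project(p)) for project, mask in views)]


# Toy usage: a 3x3 grid of points on the plane z = 0, seen by two orthographic views.
pts = [(x, y, 0) for x in range(3) for y in range(3)]
front = (lambda p: (p[2], p[0]), [[0, 1, 0]])  # looks along y: silhouette only at x = 1
side = (lambda p: (p[2], p[1]), [[1, 1, 0]])   # looks along x: silhouette at y = 0 and y = 1
print(visual_hull(pts, [front, side]))         # [(1, 0, 0), (1, 1, 0)]
```

Real systems use perspective projections derived from calibrated camera parameters rather than these orthographic toy views, but the intersection test is the same.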

Patent Document 1: JP 2014-10805 A

The visual hull method described above requires the mask images to represent the silhouette of the target object correctly. If a silhouette in a mask image is inaccurate, the generated three-dimensional shape is inaccurate as well. For example, part of a person who is the target object may be occluded by a stationary object, such as a structure standing in front of the person. Part of the person's silhouette in the mask image is then missing, and the generated three-dimensional model has a defect. Conversely, if mask images with partially missing silhouettes are simply not used, the shape accuracy of the resulting three-dimensional model decreases. This matters most when the portion occluded by the structure is relatively small. In that case, such a mask image still yields a three-dimensional model with high shape accuracy despite the missing part of the silhouette, so it should be used whenever possible.

The present invention has been made in view of the above problem. Its object is to prevent defects in the generated three-dimensional model even when a structure or the like that occludes part of the target object is present in the captured scene.

A generation apparatus according to the present invention includes three means. A first acquisition means acquires first region information indicating the region of an object in a plurality of images obtained by capturing from a plurality of capture directions. A second acquisition means acquires second region information indicating the region of a structure that may occlude the object when it is captured from at least one of the plurality of capture directions. A generation means generates three-dimensional shape data corresponding to the object based on both the first region information and the second region information.

According to the present invention, a high-quality three-dimensional model with no defects, or with fewer defects, can be generated even when a structure or the like that occludes part of the target object is present in the captured scene.

FIGS. 1(a) to 1(c) are diagrams illustrating the basic principle of the visual hull method.
FIG. 2(a) is a block diagram showing the configuration of a virtual viewpoint image generation system, and FIG. 2(b) shows an example arrangement of the cameras constituting the camera array.
FIG. 3 is a functional block diagram showing the internal configuration of the three-dimensional model generation device.
FIG. 4 is a flowchart showing the flow of the three-dimensional model formation process according to Embodiment 1.
FIGS. 5(a) to 5(h) show examples of images captured by the respective cameras.
FIGS. 6(a) to 6(h) show examples of structure masks.
FIGS. 7(a) to 7(h) show examples of foreground masks.
FIGS. 8(a) to 8(h) show examples of integrated masks.
FIG. 9 shows an example of an integrated three-dimensional model generated from the integrated masks.
FIG. 10 shows an example of a three-dimensional model generated by a conventional method using only the foreground masks.
FIG. 11 is a flowchart showing the flow of the three-dimensional model formation process according to Embodiment 2.
FIG. 12(a) shows an integrated three-dimensional model generated from the integrated masks. FIG. 12(b) shows a three-dimensional model of the structure generated from the structure masks alone. FIG. 12(c) shows a foreground-only three-dimensional model obtained as the difference between the integrated three-dimensional model of (a) and the structure model of (b).

Hereinafter, the present invention will be described in detail according to embodiments with reference to the accompanying drawings. The configurations shown in the following embodiments are merely examples, and the present invention is not limited to the illustrated configurations.

Embodiment 1

This embodiment describes a mode that generates a three-dimensional model whose foreground has no defects, or fewer defects. The mask images used contain the two-dimensional silhouette of the foreground in the captured scene. They also contain the two-dimensional silhouette of a structure that occludes at least part of that foreground. In this mode, the generated three-dimensional model also includes the structure or the like that occludes part of the foreground. In this specification, “foreground” refers to a dynamic object (moving body) present in the captured images. It moves when the scene is captured from the same angle over time (its absolute position can change), and it can be viewed from a virtual viewpoint. “Structure” refers to a static object present in the captured images that may occlude the foreground. It does not move when the scene is captured from the same angle over time (its absolute position does not change, i.e., it is stationary).

The following description assumes that a virtual viewpoint image is generated with a soccer match as the captured scene. In this scene, part of the foreground (a dynamic object), such as a player or the ball, is occluded by a structure (a static object), such as a soccer goal. A virtual viewpoint image is video generated when an end user and/or a designated operator freely manipulates the position and orientation of a virtual camera. It is also called a free viewpoint image or an arbitrary viewpoint image. The generated virtual viewpoint image and the multi-viewpoint images it is based on may be either moving images or still images. Each embodiment below uses moving images as its example. In that example, a three-dimensional model is generated from moving-image multi-viewpoint images so that a moving-image virtual viewpoint image can be produced.

In this embodiment, soccer is the captured scene and a permanently installed soccer goal is treated as the structure, but the invention is not limited to this. For example, corner flags may additionally be treated as structures. When an indoor studio or the like is the captured scene, furniture and props can be treated as structures. In short, any stationary object that remains still, or nearly still, over time may serve as a structure.

(System configuration)
FIG. 2(a) is a block diagram showing an example configuration of a virtual viewpoint image generation system that includes the three-dimensional model generation device according to this embodiment. The virtual viewpoint image generation system 100 comprises a camera array 110 including a plurality of cameras, a control device 120, a foreground separation device 130, a three-dimensional model generation device 140, and a rendering device 150. The control device 120, foreground separation device 130, three-dimensional model generation device 140, and rendering device 150 are each implemented by a general-purpose computer (information processing apparatus). Such a computer includes a CPU for arithmetic processing and memory for storing processing results, programs, and the like.

FIG. 2(b) is an overhead view of the field 200 seen from directly above. It shows the arrangement of all eight cameras 211 to 218 that make up the camera array 110. The cameras 211 to 218 are installed at a fixed height above the ground so as to surround the field 200. They capture the area in front of one goal from various angles, acquiring multi-viewpoint image data with different viewpoints. A soccer court 201 is drawn on the grass field 200 (in practice, with white lines), and a soccer goal 202 is placed on its left side. The × mark 203 in front of the soccer goal 202 indicates the common line-of-sight direction (gaze point) of the cameras 211 to 218. The broken-line circle 204 indicates the area, centered on the gaze point 203, that each of the cameras 211 to 218 can capture. In this embodiment, positions are expressed in a coordinate system whose origin is one corner of the field 200. The x axis runs along the longitudinal direction, the y axis along the lateral direction, and the z axis along the height direction. The multi-viewpoint image data obtained by the cameras of the camera array 110 is sent to the control device 120 and the foreground separation device 130. In FIG. 2(a), the cameras 211 to 218 are connected to the control device 120 and the foreground separation device 130 in a star topology. A ring or bus topology using daisy-chain connections may be used instead. Although FIG. 2 shows an example with eight cameras, fewer or more than eight cameras may be used.

The control device 120 generates camera parameters and structure masks and supplies them to the three-dimensional model generation device 140. The camera parameters are obtained by calibration and consist of two sets. Extrinsic parameters represent the position and orientation (line-of-sight direction) of each camera. Intrinsic parameters represent the focal length and angle of view (capture area) of each camera's lens. Calibration uses a plurality of images of a specific pattern, such as a checkerboard. From these images, it determines the correspondence between points in the three-dimensional world coordinate system and the corresponding two-dimensional image points. A structure mask is a mask image showing the two-dimensional silhouette of a structure present in each captured image acquired by the cameras 211 to 218. A mask image is a reference image that specifies which part of a captured image is to be extracted. It is a binary image whose pixels take the values 0 and 1. In this embodiment, the soccer goal 202 is treated as the structure. Each camera captures an image from its fixed position and angle, and the silhouette image showing the region (two-dimensional silhouette) of the soccer goal 202 in that image serves as the camera's structure mask. The captured images used to create the structure masks may be taken when no foreground objects such as players are present, for example before or after the match or during half-time. However, images taken in advance or afterward may be unsuitable, for example outdoors, where they are affected by changes in sunlight. In such a case, the structure mask may instead be obtained from a predetermined number of frames of video in which players and the like appear (for example, 10 consecutive seconds of frames), with the players removed. Specifically, the structure mask can be obtained from an image in which each pixel takes the median of that pixel's values across the frames.
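The median-based approach mentioned at the end of the paragraph above can be sketched as follows. This is an illustration under assumptions: the frames are grayscale images from one fixed camera, given as nested lists, and `median_background` is a hypothetical name. The text does not say how the structure mask is then extracted from the resulting image, so that step is deliberately left out.

```python
from statistics import median
from typing import List, Sequence

Image = List[List[float]]  # grayscale image as image[row][col]


def median_background(frames: Sequence[Image]) -> Image:
    """Per-pixel temporal median over equally sized frames from one fixed camera.

    If a moving player covers a pixel in fewer than half of the frames, the
    median at that pixel falls within the range of the unoccluded values, so
    the result approximates the player-free scene (field plus structures)."""
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[median(frame[r][c] for frame in frames) for c in range(cols)]
            for r in range(rows)]


# Three 1x2 frames; a player passes over the left pixel in the third frame only.
frames = [[[10, 50]], [[10, 50]], [[200, 52]]]
print(median_background(frames))  # [[10, 50]]
```

A median is used rather than a mean because a single bright or dark player pixel would shift a mean, while the median ignores it as long as the player is absent from that pixel most of the time.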

The foreground separation device 130 processes each input captured image from the multiple viewpoints. It distinguishes the foreground regions corresponding to the players and the ball on the field 200 from the remaining background region. A background image prepared in advance is used for this purpose; it may be the same as the captured image used to create the structure masks. Specifically, the difference between each captured image and the background image is computed, and the region corresponding to that difference is identified as the foreground region. A foreground mask indicating the foreground region is thereby generated for each captured image. In this embodiment, the foreground mask is a binary image. Pixels belonging to the foreground region representing a player or the ball are “0”, and pixels belonging to the remaining background region are “1”.
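A minimal background-subtraction sketch of the step above follows. It assumes grayscale images and a fixed, hypothetical difference threshold of 25 intensity levels. A practical foreground separator would typically use color and a more robust background model. The sketch marks foreground pixels as 1, which is the opposite of the polarity stated above. This lets the result be OR-combined with a structure mask in which the structure is 1 (step 403). Invert the output if the 0 = foreground convention is required.

```python
from typing import List

Image = List[List[float]]  # grayscale image as image[row][col]
Mask = List[List[int]]


def foreground_mask(frame: Image, background: Image, threshold: float = 25.0) -> Mask:
    """Background subtraction: 1 where |frame - background| > threshold, else 0.

    Polarity in this sketch: 1 = foreground (player/ball), 0 = background."""
    return [[1 if abs(f - b) > threshold else 0 for f, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]


background = [[10, 10], [10, 10]]
frame = [[10, 120], [12, 11]]              # a player now covers the top-right pixel
print(foreground_mask(frame, background))  # [[0, 1], [0, 0]]
```

Pixels hidden behind the goal look identical to the background image, so this test cannot flag them as foreground. This is the gap that the structure mask later fills.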

The three-dimensional model generation device 140 generates a three-dimensional model of an object based on the camera parameters and the multi-viewpoint images. The three-dimensional model generation device 140 is described in detail later. The generated three-dimensional model data is output to the rendering device 150.

The rendering device 150 generates a virtual viewpoint image from four inputs: the three-dimensional model received from the three-dimensional model generation device 140, the camera parameters received from the control device 120, the foreground images received from the foreground separation device 130, and a background image prepared in advance. Specifically, the positional relationship between the foreground images and the three-dimensional model is determined from the camera parameters. The foreground images corresponding to the three-dimensional model are then mapped onto it, producing a virtual viewpoint image of the object of interest as seen from an arbitrary angle. In this way, for example, a virtual viewpoint image can be obtained of a decisive scene in front of the goal in which a player scores.
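As a rough illustration of the mapping step, the sketch below colors each point of a three-dimensional model by projecting it into one camera and reading the pixel it lands on. It is a simplification, not the rendering method of the device 150. It assumes a hypothetical integer projection function derived from the camera parameters and uses a single camera. It also omits visibility testing, blending across cameras, and rasterization into the virtual camera's view.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Point3 = Tuple[float, float, float]
RGB = Tuple[int, int, int]
ColorImage = List[List[RGB]]  # image[row][col] -> (r, g, b)


def color_points(points: Sequence[Point3], image: ColorImage,
                 project: Callable[[Point3], Tuple[int, int]]) -> Dict[Point3, RGB]:
    """Assign each model point the color of the pixel it projects to in one camera.

    Simplifications: nearest-pixel lookup, a single camera, and no visibility
    (occlusion) test; points projecting outside the image stay uncolored."""
    height, width = len(image), len(image[0])
    colors: Dict[Point3, RGB] = {}
    for p in points:
        row, col = project(p)
        if 0 <= row < height and 0 <= col < width:
            colors[p] = image[row][col]
    return colors


def toy_project(p: Point3) -> Tuple[int, int]:
    """Hypothetical orthographic projection onto a one-row image."""
    return 0, int(p[0])


image = [[(255, 0, 0), (0, 255, 0)]]  # a 1x2 RGB image
print(color_points([(0, 0, 0), (5, 0, 0)], image, toy_project))  # {(0, 0, 0): (255, 0, 0)}
```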

The configuration of the virtual viewpoint image generation system shown in FIG. 2 is an example and is not limiting. For example, a single computer may combine the functions of several devices (such as the foreground separation device 130 and the three-dimensional model generation device 140). Alternatively, each camera module may be given the function of the foreground separation device 130, so that each camera supplies the captured image together with its foreground mask data.

(Three-dimensional model generation device)
FIG. 3 is a functional block diagram showing the internal configuration of the three-dimensional model generation device 140 according to this embodiment. The three-dimensional model generation device 140 comprises a data reception unit 310, a structure mask storage unit 320, a mask composition unit 330, a coordinate conversion unit 340, a three-dimensional model formation unit 350, and a data output unit 360. Each unit is described in detail below.

The data reception unit 310 receives two kinds of data from the control device 120: the camera parameters of each camera in the camera array 110, and the structure masks representing the two-dimensional silhouettes of the structure present in the captured scene. It also receives two kinds of data from the foreground separation device 130: the captured images (multi-viewpoint images) obtained by the cameras of the camera array 110, and the foreground masks representing the two-dimensional silhouettes of the foreground in each captured image. The received data is distributed as follows. The structure masks go to the structure mask storage unit 320, and the foreground masks go to the mask composition unit 330. The multi-viewpoint images go to the coordinate conversion unit 340. The camera parameters go to both the coordinate conversion unit 340 and the three-dimensional model formation unit 350.

The structure mask storage unit 320 stores and holds the structure masks in RAM or the like and supplies them to the mask composition unit 330 as needed.

The mask composition unit 330 reads the structure masks from the structure mask storage unit 320 and combines each one with the corresponding foreground mask received from the data reception unit 310. The result is a single mask image that integrates the two (hereinafter, an “integrated mask”). The generated integrated masks are sent to the three-dimensional model formation unit 350.

The coordinate conversion unit 340 converts the multi-viewpoint images received from the data reception unit 310 from the camera coordinate system to the world coordinate system based on the camera parameters. Through this coordinate conversion, each captured image, taken from a different viewpoint, becomes information indicating which region of three-dimensional space it shows.
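The sketch below shows the conversions involved. It assumes a distortion-free pinhole camera. The extrinsic parameters are a rotation matrix R and a translation t, with X_cam = R X_world + t. The intrinsic parameters are the focal lengths fx, fy and the principal point cx, cy, in pixels. The text does not specify the camera model, so these conventions and the function names are assumptions. Both directions are shown: camera-to-world, as described for the coordinate conversion unit 340, and world-to-image, as used for the voxel test in step 404.

```python
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]  # 3x3 rotation matrix, row-major


def world_to_camera(p: Vec3, R: Mat3, t: Vec3) -> Vec3:
    """Extrinsic transform: X_cam = R @ X_world + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def camera_to_world(pc: Vec3, R: Mat3, t: Vec3) -> Vec3:
    """Inverse extrinsic transform: X_world = R^T @ (X_cam - t), valid because R is a rotation."""
    d = [pc[i] - t[i] for i in range(3)]
    return tuple(sum(R[j][i] * d[j] for j in range(3)) for i in range(3))


def camera_to_pixel(pc: Vec3, fx: float, fy: float,
                    cx: float, cy: float) -> Optional[Tuple[float, float]]:
    """Pinhole intrinsics (no lens distortion); None for points at or behind the camera."""
    x, y, z = pc
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
pc = world_to_camera((1.0, 0.0, 0.0), identity, (0.0, 0.0, 5.0))
print(camera_to_pixel(pc, fx=100, fy=100, cx=320, cy=240))  # (340.0, 240.0)
```

In a real calibration pipeline, lens distortion would also be modeled, and the matrices would typically be handled with a numerical library.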

The three-dimensional model formation unit 350 generates a three-dimensional model of the objects in the captured scene, including the structure, by the visual hull method. It uses the multi-viewpoint images converted to the world coordinate system and the integrated mask for each camera. The generated three-dimensional model data is output to the rendering device 150 via the data output unit 360.

(Three-dimensional model formation process)
FIG. 4 is a flowchart showing the flow of the three-dimensional model formation process according to this embodiment. The CPU of the three-dimensional model generation device 140 carries out this series of processes. It loads a predetermined program stored in a storage medium such as a ROM or HDD into RAM and executes it. The process is described below following the flow of FIG. 4.

First, in step 401, the data reception unit 310 receives the structure masks and the camera parameters of each camera from the control device 120. Each structure mask represents the two-dimensional silhouette of the structure (here, the soccer goal 202) as seen from one of the cameras 211 to 218. FIGS. 5(a) to 5(h) show the images captured by the cameras 211 to 218 of the camera array 110, respectively. In this scene, one player (the goalkeeper) stands on the soccer court 201 in front of the soccer goal 202. In the captured images of FIGS. 5(a), 5(b), and 5(h), the soccer goal 202 lies between the camera and the player, so part of the player is hidden by the goal. From each captured image in FIGS. 5(a) to 5(h), a binary structure mask is obtained. In it, the region of the soccer goal 202 is 1 (white) and all other regions are 0 (black). FIGS. 6(a) to 6(h) show the structure masks corresponding to the captured images in FIGS. 5(a) to 5(h).

Next, in step 402, the data reception unit 310 receives the foreground masks from the foreground separation device 130, together with the multi-viewpoint images they were derived from. Each foreground mask indicates the two-dimensional silhouette of the foreground (here, the player and the ball) in the image captured by one of the cameras 211 to 218. FIGS. 7(a) to 7(h) show the foreground masks corresponding to the captured images in FIGS. 5(a) to 5(h). The foreground separation device 130 extracts as foreground only the regions that change over time between images captured from the same angle. Consequently, the part of the player hidden behind the soccer goal 202 is not extracted as foreground in FIGS. 7(a), 7(b), and 7(h). The received foreground mask data is sent to the mask composition unit 330.

Next, in step 403, the mask composition unit 330 reads the structure mask data from the structure mask storage unit 320. It combines the read structure masks with the foreground masks received from the data reception unit 310. This composition is a per-pixel logical OR of the binary (black-and-white) foreground mask and structure mask. FIGS. 8(a) to 8(h) show the integrated masks obtained by combining the structure masks of FIGS. 6(a) to 6(h) with the foreground masks of FIGS. 7(a) to 7(h), respectively. In the resulting integrated masks, the player's silhouette has no missing parts.
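The composition in step 403 amounts to a per-pixel OR. The sketch below assumes that both masks use 1 for the object (structure or foreground) and 0 elsewhere, as the structure mask does in step 401. With that polarity, any pixel hidden behind the goal is still marked 1 through the structure mask. `integrate_masks` is a hypothetical name.

```python
from typing import List

Mask = List[List[int]]  # mask[row][col]: 1 = object, 0 = background


def integrate_masks(structure: Mask, foreground: Mask) -> Mask:
    """Per-pixel logical OR of a structure mask and a foreground mask of the same size."""
    if len(structure) != len(foreground) or any(
            len(s) != len(f) for s, f in zip(structure, foreground)):
        raise ValueError("structure and foreground masks must have the same size")
    return [[s | f for s, f in zip(srow, frow)]
            for srow, frow in zip(structure, foreground)]


# One image row: the goal covers columns 0-1 and hides the player's pixel at column 1,
# so the foreground mask only has the player's visible pixel at column 2.
structure = [[1, 1, 0, 0]]
foreground = [[0, 0, 1, 0]]
print(integrate_masks(structure, foreground))  # [[1, 1, 1, 0]]
```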

In step 404, the three-dimensional model formation unit 350 generates a three-dimensional model by the visual hull method based on the integrated masks obtained in step 403. The result is a model representing the three-dimensional shapes of the foreground and the structure present in the region that the images from different viewpoints capture in common (hereinafter, an “integrated three-dimensional model”). In this embodiment, the integrated three-dimensional model includes the soccer goal 202 in addition to the player and the ball. The integrated three-dimensional model is generated as follows. First, volume data is prepared in which the three-dimensional space above the field 200 is filled with cubes (voxels) of a fixed size. Each voxel in the volume data takes the value 0 or 1, where “1” indicates a shape region and “0” a non-shape region. Next, for each of the cameras 211 to 218, the three-dimensional coordinates of the voxels are converted from the world coordinate system to that camera's coordinate system. This conversion uses the camera's parameters (installation position, line-of-sight direction, and so on). Voxels whose positions in each camera coordinate system fall on the structure or foreground indicated by that camera's integrated mask are retained. The retained voxels form a model representing the three-dimensional shapes of the structure and the foreground. The three-dimensional shape may also be represented by a set of points (point cloud) marking the voxel centers instead of the voxels themselves. FIG. 9 shows the integrated three-dimensional model generated from the integrated masks of FIG. 8. Reference numeral 901 denotes the three-dimensional shape of the player (the foreground), and 902 that of the soccer goal 202 (the structure). As described above, the player's silhouette in the integrated masks has no missing parts, so the resulting integrated three-dimensional model has no defects either. FIG. 10 shows a three-dimensional model generated by a conventional method using only the foreground masks. In the foreground masks of FIGS. 7(a), 7(b), and 7(h), part of the player is not represented as foreground, so that part is missing from the generated three-dimensional model. The method of this embodiment instead uses mask images that combine the foreground mask and the structure mask, which avoids such defects in the three-dimensional model of the foreground.
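The effect described above can be reproduced on a toy example. The sketch below is a hypothetical miniature, not data from the figures. A 3 × 3 × 3 voxel grid is carved with two orthographic views, and a voxel is kept (value 1) only if it projects onto the mask in every view. In one view, a low bar standing in for the structure hides the player's lowest voxel. Carving with the foreground masks alone loses that voxel, whereas carving with the OR-integrated masks keeps it, together with the bar.

```python
from itertools import product
from typing import Callable, List, Sequence, Set, Tuple

Voxel = Tuple[int, int, int]  # (x, y, z) grid index
Mask = List[List[int]]        # mask[row][col]: 1 = silhouette (toy convention: row index = z)
View = Tuple[Callable[[Voxel], Tuple[int, int]], Mask]


def or_masks(a: Mask, b: Mask) -> Mask:
    """Per-pixel OR of two equally sized binary masks (step 403)."""
    return [[p | q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def on_silhouette(pixel: Tuple[int, int], mask: Mask) -> bool:
    r, c = pixel
    return 0 <= r < len(mask) and 0 <= c < len(mask[0]) and mask[r][c] == 1


def carve(grid: Tuple[int, int, int], views: Sequence[View]) -> Set[Voxel]:
    """Return the voxels whose projection lies on the silhouette in every view."""
    return {v for v in product(*(range(n) for n in grid))
            if all(on_silhouette(project(v), mask) for project, mask in views)}


def front(v: Voxel) -> Tuple[int, int]:  # camera A looks along y: pixel (row=z, col=x)
    return v[2], v[0]


def side(v: Voxel) -> Tuple[int, int]:   # camera B looks along x: pixel (row=z, col=y)
    return v[2], v[1]


# Player: x=1, y=1, z=0..2.  Structure (a low bar): y=0, z=0, x=0..2.
fg_a = [[0, 0, 0], [0, 1, 0], [0, 1, 0]]  # camera A: the bar hides the player at z=0
st_a = [[1, 1, 1], [0, 0, 0], [0, 0, 0]]
fg_b = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]  # camera B: player fully visible
st_b = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]

foreground_only = carve((3, 3, 3), [(front, fg_a), (side, fg_b)])
integrated = carve((3, 3, 3), [(front, or_masks(fg_a, st_a)), (side, or_masks(fg_b, st_b))])
print(sorted(foreground_only))  # [(1, 1, 1), (1, 1, 2)]: the player's lowest voxel is lost
print((1, 1, 0) in integrated)  # True: recovered via the structure mask
```

With only two views, the integrated hull also contains two spurious voxels, (0, 1, 0) and (2, 1, 0). This is a generic limitation of the visual hull with few cameras rather than a consequence of the mask integration, and adding viewpoints reduces it.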

The above is the three-dimensional model formation process according to the present embodiment. When generating a virtual viewpoint image for a moving image, the above steps are repeated frame by frame to generate a three-dimensional model for each frame. However, the structure mask needs to be received and stored (step 401) only immediately after the flow starts; this step can be omitted for the second and subsequent frames. Furthermore, when shooting takes place at the same location on different dates or at different times, the structure mask may be received and stored only the first time and held in RAM or the like, and the held mask may be reused from then on.

As described above, according to the present embodiment, even when a structure hides part of a foreground object, a highly accurate three-dimensional model in which missing portions of the foreground are eliminated or reduced can be generated.

Embodiment 2

In the first embodiment, a three-dimensional model of the foreground with no or fewer missing portions was generated in a form that includes the structures present in the shooting scene. Next, as the second embodiment, a mode is described that generates a three-dimensional model of only the foreground, with the structures removed and with no or fewer missing portions. Description of matters common to the first embodiment, such as the system configuration, is omitted or simplified, and the following focuses on the differences.

The configuration of the three-dimensional model generation apparatus 140 of the present embodiment is basically the same as in the first embodiment (see FIG. 3), but differs in the following respects.

First, the structure mask in the structure mask storage unit 320 is read not only by the mask combining unit 330 but also by the three-dimensional model forming unit 350; the broken-line arrow in FIG. 3 indicates this. In addition to generating the integrated three-dimensional model of the foreground and structure from the integrated mask, the three-dimensional model forming unit 350 also generates a three-dimensional model of the structure alone from the structure mask. It then takes the difference between the integrated three-dimensional model generated from the integrated mask and the structure's three-dimensional model generated from the structure mask, thereby extracting a three-dimensional model of only the foreground with no or fewer missing portions.

(Three-dimensional model formation process)
FIG. 11 is a flowchart showing the flow of the three-dimensional model formation process according to the present embodiment. This series of processes is realized by a CPU of the three-dimensional model generation apparatus 140 loading a predetermined program stored in a storage medium such as a ROM or HDD into RAM and executing it. The description below follows the flow of FIG. 11.

Steps 1101 to 1104 correspond respectively to steps 401 to 404 in the flow of FIG. 4 of the first embodiment and are identical to them, so their description is omitted.

In the subsequent step 1105, the three-dimensional model forming unit 350 reads the structure mask from the structure mask storage unit 320 and generates a three-dimensional model of the structure by the visual volume intersection method.

Next, in step 1106, the three-dimensional model forming unit 350 obtains the difference between the integrated three-dimensional model of the foreground and structure generated in step 1104 and the structure's three-dimensional model generated in step 1105, thereby extracting a three-dimensional model of only the foreground. Here, the difference may be taken after expanding the structure's three-dimensional model in three-dimensional space by, for example, about 10%, which ensures that the portion corresponding to the structure is reliably removed from the integrated three-dimensional model. Only part of the structure's three-dimensional model may be expanded, with the portion to expand determined by region. For example, in the case of the soccer goal 202, players are likely to be inside the soccer court 201, so the model is not expanded toward the court 201 and is expanded only on the side opposite the court 201. Furthermore, the ratio of expansion (expansion rate) may be varied according to how far a foreground object such as a player or the ball is from the structure. For example, when the foreground object is far from the structure, a larger expansion rate ensures that the structure's three-dimensional model is reliably removed. Conversely, the closer the foreground object is to the structure, the smaller the expansion rate is made, so that portions of the foreground three-dimensional model are not removed by mistake. The expansion rate may change linearly with the distance from the foreground, or may be determined stepwise using one or more reference distances.
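Step 1106 with an expanded structure model can be sketched as follows. The text does not specify how the expansion is carried out, so this sketch makes several hypothetical choices: models are sets of integer voxel indices; an expansion rate is turned into a dilation radius of `ceil(rate × largest bounding-box extent)` voxels over a cubic neighbourhood (so `rate=0.1` loosely mirrors the "about 10%" example); and "do not expand toward the court" is expressed as a predicate on the offset direction. `linear_rate` and `stepwise_rate` illustrate the two distance schedules mentioned above. All names and the axis convention are illustrative.

```python
import math
from typing import Callable, Sequence, Set, Tuple

Voxel = Tuple[int, int, int]
OffsetFilter = Callable[[int, int, int], bool]


def linear_rate(distance: float, near: float, far: float,
                min_rate: float, max_rate: float) -> float:
    """Rate rising linearly from min_rate at `near` to max_rate at `far` (clamped)."""
    if far <= near:
        raise ValueError("far must be greater than near")
    t = min(max((distance - near) / (far - near), 0.0), 1.0)
    return min_rate + t * (max_rate - min_rate)


def stepwise_rate(distance: float, steps: Sequence[Tuple[float, float]],
                  base_rate: float) -> float:
    """Rate of the largest reference distance not exceeding `distance`.

    `steps` is [(reference_distance, rate), ...] in ascending distance order."""
    rate = base_rate
    for reference, step_rate in steps:
        if distance >= reference:
            rate = step_rate
    return rate


def expand(structure: Set[Voxel], rate: float,
           allow: OffsetFilter = lambda dx, dy, dz: True) -> Set[Voxel]:
    """Dilate the structure model, using only offsets accepted by `allow`."""
    if not structure:
        return set()
    extent = max(max(v[i] for v in structure) - min(v[i] for v in structure) + 1
                 for i in range(3))
    r = math.ceil(rate * extent)
    grown = set(structure)
    for x, y, z in structure:
        for dx in range(-r, r + 1):
            for dy in range(-r, r + 1):
                for dz in range(-r, r + 1):
                    if allow(dx, dy, dz):
                        grown.add((x + dx, y + dy, z + dz))
    return grown


def foreground_only(integrated: Set[Voxel], structure: Set[Voxel],
                    rate: float = 0.0,
                    allow: OffsetFilter = lambda dx, dy, dz: True) -> Set[Voxel]:
    """Difference between the integrated model and the (expanded) structure model."""
    return integrated - expand(structure, rate, allow)
```

In this sketch the caller chooses the rate from the player-to-structure distance, for example `foreground_only(model, goal, linear_rate(d, near, far, lo, hi), allow=...)`, so a distant player gets a generous margin while a player close to the goal gets a small one.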

FIG. 12A shows the integrated three-dimensional model generated from the integrated mask, the same as in FIG. 9 described above. FIG. 12B shows the structure's three-dimensional model generated from the structure mask alone. FIG. 12C shows the foreground-only three-dimensional model obtained as the difference between the integrated three-dimensional model of FIG. 12A and the structure's three-dimensional model of FIG. 12B.

The above is the three-dimensional model formation process according to the present embodiment. When generating a virtual viewpoint image for a moving image, the above steps are repeated frame by frame to generate a three-dimensional model for each frame. However, receiving and storing the structure mask (step 1101) and generating the structure's three-dimensional model (step 1105) need to be performed only immediately after the flow starts and can be omitted for the second and subsequent frames. Furthermore, when shooting takes place at the same location on different dates or at different times, the structure mask may be received and stored and the structure's three-dimensional model generated only the first time, held in RAM or the like, and reused from then on.
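The per-frame repetition with reuse of the structure data can be sketched as a small cache keyed by shooting location. This is an illustrative sketch only: `ForegroundPipeline` and its three callbacks are hypothetical stand-ins for receiving the structure masks (step 1101), building the structure model (step 1105), and building the integrated model (steps 1102 to 1104), and models are assumed to be sets of voxel indices.

```python
from typing import Callable, Dict, Hashable, Sequence, Set, Tuple

Voxel = Tuple[int, int, int]


class ForegroundPipeline:
    """Per-frame foreground extraction that reuses location-level structure data.

    The structure masks (step 1101) and structure model (step 1105) are computed
    the first time a location is seen and then kept in memory, so later frames,
    and later sessions at the same location, skip both steps."""

    def __init__(self,
                 load_structure_masks: Callable[[Hashable], Sequence[object]],
                 build_structure_model: Callable[[Sequence[object]], Set[Voxel]],
                 build_integrated_model: Callable[[Sequence[object], Sequence[object]], Set[Voxel]]):
        self._load_structure_masks = load_structure_masks
        self._build_structure_model = build_structure_model
        self._build_integrated_model = build_integrated_model
        self._cache: Dict[Hashable, Tuple[Sequence[object], Set[Voxel]]] = {}

    def process_frame(self, location: Hashable,
                      foreground_masks: Sequence[object]) -> Set[Voxel]:
        if location not in self._cache:                    # first frame at this location
            masks = self._load_structure_masks(location)    # step 1101
            self._cache[location] = (masks, self._build_structure_model(masks))  # step 1105
        structure_masks, structure_model = self._cache[location]
        # Steps 1102 to 1104 run for every frame.
        integrated = self._build_integrated_model(foreground_masks, structure_masks)
        return integrated - structure_model                  # step 1106
```

Keying the cache by location matches the text's case of shooting at the same place on different dates; clearing the cache would force the structure data to be rebuilt, for example if a structure is moved.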

(Modification)
In the present embodiment, the foreground-only three-dimensional model is generated by subtracting the structure's three-dimensional model from the integrated three-dimensional model of the foreground and structure, but the present invention is not limited to this. For example, for each voxel (or each predetermined region) of the integrated three-dimensional model of the foreground and structure, the number of mask images in which it is included may be counted, and the portions whose count is equal to or less than a threshold may be deleted from the integrated three-dimensional model to obtain a foreground-only three-dimensional model. The threshold is set to an arbitrary value smaller than the total number of cameras, taking into account the installation position and line-of-sight direction of each camera. In the present embodiment, with eight cameras in total arranged as in FIG. 2A, setting the threshold to, for example, "2" deletes only the soccer goal.
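The counting variant can be sketched as follows. The text says only that the number of mask images containing each voxel is counted; this sketch assumes the count is taken over the per-camera foreground masks, since under the integrated masks every voxel of the visual hull would be counted by every camera and nothing could be separated. Masks are modelled as sets of foreground pixels and each camera as a hypothetical callable mapping a voxel to a pixel (or `None` when out of view). The function names are illustrative.

```python
from typing import AbstractSet, Callable, Optional, Sequence, Set, Tuple

Voxel = Tuple[int, int, int]
Pixel = Tuple[int, int]
Projector = Callable[[Voxel], Optional[Pixel]]


def support_count(voxel: Voxel, projectors: Sequence[Projector],
                  foreground_masks: Sequence[AbstractSet[Pixel]]) -> int:
    """Number of cameras whose foreground mask contains the voxel's projection."""
    count = 0
    for project, mask in zip(projectors, foreground_masks):
        pixel = project(voxel)
        if pixel is not None and pixel in mask:
            count += 1
    return count


def remove_low_support(integrated: Set[Voxel], projectors: Sequence[Projector],
                       foreground_masks: Sequence[AbstractSet[Pixel]],
                       threshold: int) -> Set[Voxel]:
    """Delete voxels whose count is <= threshold (threshold < number of cameras)."""
    if not 0 <= threshold < len(projectors):
        raise ValueError("threshold must be smaller than the number of cameras")
    return {v for v in integrated
            if support_count(v, projectors, foreground_masks) > threshold}
```

Under this assumption, a player hidden by the goal in three of eight cameras is still counted by the other five and survives a threshold of 2, whereas goal voxels, which appear in few or no foreground masks, are deleted.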

As described above, according to the present embodiment, even when a structure hides part of a foreground object, a highly accurate three-dimensional model of only the foreground, not including the structure, can be generated.

(Other Embodiments)
The present invention can also be realized by supplying a program that implements one or more functions of the above-described embodiments to a system or apparatus via a network or a storage medium, and having one or more processors in a computer of that system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

140 Three-dimensional model generation apparatus
310 Data receiving unit
330 Mask combining unit
350 Three-dimensional model forming unit

Claims

1. A generation apparatus comprising:
first acquisition means for acquiring first region information indicating regions of an object in a plurality of images obtained by shooting from a plurality of shooting directions;
second acquisition means for acquiring second region information indicating a region of a structure that may block the object when shooting from at least one of the plurality of shooting directions; and
generating means for generating three-dimensional shape data corresponding to the object based on both the first region information indicating the region of the object acquired by the first acquisition means and the second region information indicating the region of the structure acquired by the second acquisition means.

2. The generation apparatus according to claim 1, wherein
the first region information is an image indicating the region of the object, and
the second region information is an image indicating the region of the structure.

3. The generation apparatus according to claim 2, further comprising combining means for combining the image indicating the region of the object and the image indicating the region of the structure,
wherein the generating means generates the three-dimensional shape data corresponding to the object based on the image combined by the combining means.

4. The generation apparatus according to claim 3, wherein the combining means generates an image indicating both the object and the structure based on the image indicating the region of the object and the image indicating the region of the structure.

5. The generation apparatus according to claim 1, wherein the three-dimensional shape data corresponding to the object includes three-dimensional shape data corresponding to the structure.

6. The generation apparatus according to claim 5, wherein the generating means
generates three-dimensional shape data corresponding to the structure based on the second region information acquired by the second acquisition means, and
generates three-dimensional shape data corresponding to the object that does not include the three-dimensional shape data corresponding to the structure, based on the generated three-dimensional shape data corresponding to the structure and the three-dimensional shape data corresponding to the object that includes the three-dimensional shape data corresponding to the structure.

7. The generation apparatus according to claim 6, wherein the generating means generates the three-dimensional shape data corresponding to the object that does not include the three-dimensional shape data corresponding to the structure, based on three-dimensional shape data corresponding to the structure that has been at least partially expanded.

8. The generation apparatus according to claim 7, wherein the generating means determines the portion of the three-dimensional shape data corresponding to the structure to be expanded according to a region in the three-dimensional space in which the structure exists.

9. The generation apparatus according to claim 7 or 8, wherein the generating means determines the ratio by which the three-dimensional shape data corresponding to the structure is expanded according to the distance between the structure and the object in the three-dimensional space in which the structure exists.

10. The generation apparatus according to claim 9, wherein the generating means increases the ratio by which the three-dimensional shape data corresponding to the structure is expanded as the distance between the structure and the object increases.

11. The generation apparatus according to claim 1, wherein the object is a moving body whose position in the images can change when shooting is performed in time series from the same shooting direction.

12. The generation apparatus according to claim 1, wherein the object is at least one of a person and a ball.

13. The generation apparatus according to any one of claims 1 to 12, wherein the structure is an object that remains in a stationary state.

14. The generation apparatus according to claim 1, wherein the structure is at least one of a soccer goal and a corner flag used in a soccer game.

15. The generation apparatus according to any one of claims 1 to 14, wherein the structure is an object installed at a predetermined position.

16. The generation apparatus according to claim 1, wherein at least a part of the structure is installed on a field where a person who is the object plays a game.

17. The generation apparatus according to claim 1, wherein the structure is a designated object.

18. A generation method comprising:
a first acquisition step of acquiring first region information indicating regions of an object in a plurality of images obtained by shooting from a plurality of shooting directions;
a second acquisition step of acquiring second region information indicating a region of a structure that may block the object when shooting from at least one of the plurality of shooting directions; and
a generating step of generating three-dimensional shape data corresponding to the object based on both the first region information indicating the region of the object acquired in the first acquisition step and the second region information indicating the region of the structure acquired in the second acquisition step.

19. The generation method according to claim 18, wherein the three-dimensional shape data corresponding to the object includes three-dimensional shape data corresponding to the structure.

20. The generation method according to claim 18 or 19, wherein
the first region information is an image indicating the region of the object, and
the second region information is an image indicating the region of the structure.

21. The generation method according to claim 20, further comprising a combining step of combining the image indicating the region of the object and the image indicating the region of the structure,
wherein, in the generating step, the three-dimensional shape data corresponding to the object is generated based on the image combined in the combining step.

22. A program for causing a computer to function as the generation apparatus according to any one of claims 1 to 17.