JP2019106145A

JP2019106145A - Generation device, generation method and program of three-dimensional model

Info

Publication number: JP2019106145A
Application number: JP2017239891A
Authority: JP
Inventors: 圭輔森澤; Keisuke Morisawa; 究小林; Kiwamu Kobayashi
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2017-12-14
Filing date: 2017-12-14
Publication date: 2019-06-27
Anticipated expiration: 2037-12-14
Also published as: JP6403862B1

Abstract

To prevent defects from arising in a generated three-dimensional model even when a structure or the like that may occlude part of a target object exists in the imaging scene. SOLUTION: A three-dimensional model generation device comprises: acquisition means that acquires a first mask image indicating the region of a structure, which is a stationary object, in each image captured from a plurality of viewpoints, and a second mask image indicating the region of a foreground, which is a moving object, in each image captured from the plurality of viewpoints; synthesis means that combines the acquired first mask image and second mask image to generate a third mask image integrating the region of the structure and the region of the foreground in the images captured from the plurality of viewpoints; and generation means that generates a three-dimensional model including the structure and the foreground by a visual hull method using the third mask image. SELECTED DRAWING: Figure 4

Description

The present invention relates to the generation of a three-dimensional model of an object in an image.

Conventionally, a technique called the "visual volume intersection method" (visual hull) is known as a way of estimating the three-dimensional shape of an object from multi-viewpoint images captured synchronously from different viewpoints by a plurality of cameras (Patent Document 1). FIGS. 1(a) to 1(c) show the basic principle of the visual hull method. From an image of an object, a mask image representing the two-dimensional silhouette of the object is obtained on the imaging plane (FIG. 1(a)). Next, consider a cone that extends into three-dimensional space from the projection center of the camera through each point on the contour of the mask image (FIG. 1(b)). This cone is called the "view volume" of the object for that camera. The three-dimensional shape (three-dimensional model) of the object is then obtained as the common region of the view volumes of the plurality of cameras, that is, their intersection (FIG. 1(c)). In shape estimation by the visual hull method, sampling points in the space where the object may exist are thus projected onto the mask images, and the three-dimensional shape of the object is estimated by checking whether the projected point falls inside the silhouette in the mask images of all the viewpoints.
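The principle can be written compactly. The notation below is introduced here for illustration and does not appear in the original: $\pi_i$ is assumed to be the projection of camera $i$ onto its imaging plane, $S_i$ the silhouette region of its mask image, $V_i$ the resulting view volume, and $N$ the number of cameras.

```latex
V_i = \{\, X \in \mathbb{R}^3 \mid \pi_i(X) \in S_i \,\}, \qquad
\hat{O} = \bigcap_{i=1}^{N} V_i
```

Under this reading, a sampling point $X$ belongs to the estimated shape $\hat{O}$ only if $\pi_i(X) \in S_i$ holds for every camera, which is why a silhouette that is missing in even one mask image removes the corresponding part of the shape.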

Patent Document 1: Japanese Patent Laid-Open No. 2014-10805

The visual hull method described above requires the mask images to represent the silhouette of the target object correctly; if a silhouette in a mask image is inaccurate, the generated three-dimensional shape is inaccurate as well. For example, if part of a person who is the target object is occluded by a stationary object such as a structure in front of the person, so that part of the person's silhouette in the mask image is missing, the generated three-dimensional model will have a defect. On the other hand, if mask images with partially missing silhouettes are simply not used, the shape accuracy of the resulting three-dimensional model drops. In particular, when the portion occluded by the structure is relatively small, using the mask image yields a three-dimensional model of high shape accuracy even though part of the silhouette is missing, so such mask images should be used wherever possible.

The present invention has been made in view of the above problem, and its object is to prevent defects from arising in the generated three-dimensional model even when a structure or the like that occludes part of a target object exists in the captured scene.

A three-dimensional model generation device according to the present invention comprises: acquisition means for acquiring a first mask image indicating the region of a structure, which is a stationary object, in each image captured from a plurality of viewpoints, and a second mask image indicating the region of a foreground, which is a moving object, in each image captured from the plurality of viewpoints; synthesis means for combining the acquired first mask image and second mask image to generate a third mask image in which the region of the structure and the region of the foreground in the images captured from the plurality of viewpoints are integrated; and generation means for generating a three-dimensional model including the structure and the foreground by the visual hull method using the third mask image.

According to the present invention, a high-quality three-dimensional model in which defects are absent or reduced can be generated even when a structure or the like that occludes part of a target object exists in the captured scene.

FIG. 1: (a) to (c) are diagrams showing the basic principle of the visual hull method.
FIG. 2: (a) is a block diagram showing the configuration of a virtual viewpoint image generation system, and (b) is a diagram showing an example arrangement of the cameras constituting the camera array.
FIG. 3: Functional block diagram showing the internal configuration of the three-dimensional model generation device.
FIG. 4: Flowchart showing the flow of three-dimensional model formation processing according to Embodiment 1.
FIG. 5: (a) to (h) are diagrams showing examples of images captured by the respective cameras.
FIG. 6: (a) to (h) are diagrams showing examples of structure masks.
FIG. 7: (a) to (h) are diagrams showing examples of foreground masks.
FIG. 8: (a) to (h) are diagrams showing examples of integrated masks.
FIG. 9: Diagram showing an example of an integrated three-dimensional model generated from the integrated masks.
FIG. 10: Diagram showing an example of a three-dimensional model generated by the conventional method using only the foreground masks.
FIG. 11: Flowchart showing the flow of three-dimensional model formation processing according to Embodiment 2.
FIG. 12: (a) shows an integrated three-dimensional model generated from the integrated masks, (b) shows a three-dimensional model of the structure generated from the structure masks alone, and (c) shows a foreground-only three-dimensional model obtained as the difference between the integrated three-dimensional model of (a) and the three-dimensional model of the structure of (b).

Hereinafter, the present invention will be described in detail by way of embodiments with reference to the accompanying drawings. The configurations shown in the following embodiments are merely examples, and the present invention is not limited to the illustrated configurations.

Embodiment 1

This embodiment describes a mode in which a three-dimensional model of the foreground with no or reduced defects is generated using mask images that contain, in addition to the two-dimensional silhouette of the foreground in the captured scene, the two-dimensional silhouette of a structure that occludes at least part of the foreground. In this mode, the generated three-dimensional model includes the structure or the like that occludes part of the foreground. In this specification, "foreground" refers to a dynamic object (moving object) present in a captured image that moves (that is, whose absolute position can change) when images are captured in time series from the same angle, and that can be viewed from a virtual viewpoint. "Structure" refers to a static object present in a captured image that does not move (that is, whose absolute position does not change; it is stationary) when images are captured in time series from the same angle, and that may occlude the foreground.

The following description assumes a case in which virtual viewpoint images are generated with a soccer match as the captured scene, and part of the foreground (dynamic objects) such as the players and the ball is occluded by a structure (static object) such as a soccer goal. A virtual viewpoint image is video generated by an end user and/or an appointed operator or the like freely operating the position and orientation of a virtual camera, and is also called a free viewpoint image or an arbitrary viewpoint image. The generated virtual viewpoint image and the multi-viewpoint images on which it is based may be moving images or still images. Each embodiment below is described using the example of generating, from multi-viewpoint moving images, a three-dimensional model for generating a virtual viewpoint moving image.

In this embodiment, the description takes soccer as the captured scene and a fixedly installed soccer goal as the structure, but the invention is not limited to this. For example, a corner flag may additionally be treated as a structure, and when the captured scene is an indoor studio or the like, furniture and props can be treated as structures. In short, any stationary object that remains still or nearly still qualifies.

(System configuration)
FIG. 2(a) is a block diagram showing an example of the configuration of a virtual viewpoint image generation system including the three-dimensional model generation device according to this embodiment. The virtual viewpoint image generation system 100 comprises a camera array 110 including a plurality of cameras, a control device 120, a foreground separation device 130, a three-dimensional model generation device 140, and a rendering device 150. The control device 120, the foreground separation device 130, the three-dimensional model generation device 140, and the rendering device 150 are each implemented by a general-purpose computer (information processing apparatus) having a CPU that performs arithmetic processing and a memory that stores the results of arithmetic processing, programs, and the like.

FIG. 2(b) is an overhead view of the field 200 from directly above, showing the arrangement of the eight cameras 211 to 218 that constitute the camera array 110. The cameras 211 to 218 are installed at a certain height above the ground so as to surround the field 200, and capture the area in front of one of the goals from various angles to acquire multi-viewpoint image data from different viewpoints. A soccer court 201 is drawn on the grass field 200 (in practice with white lines), and a soccer goal 202 is placed at its left side. The cross mark 203 in front of the soccer goal 202 indicates the common gaze point toward which the lines of sight of the cameras 211 to 218 are directed, and the dashed circle 204 indicates the area, centered on the gaze point 203, that each of the cameras 211 to 218 can capture. In this embodiment, positions are expressed in a coordinate system whose origin is one corner of the field 200, with the x axis along the long side, the y axis along the short side, and the z axis along the height direction. The multi-viewpoint image data obtained by the cameras of the camera array 110 is sent to the control device 120 and the foreground separation device 130. In FIG. 2(a), the cameras 211 to 218 are connected to the control device 120 and the foreground separation device 130 in a star topology, but a ring or bus topology using daisy-chain connection may be used instead. Although FIG. 2 shows an example with eight cameras, the number of cameras may be fewer or more than eight.

The control device 120 generates camera parameters and structure masks and supplies them to the three-dimensional model generation device 140. The camera parameters consist of extrinsic parameters representing the position and orientation (line-of-sight direction) of each camera and intrinsic parameters representing the focal length, angle of view (imaging area), and the like of each camera's lens, and are obtained by calibration. Calibration is a process of determining the correspondence between points in the three-dimensional world coordinate system, acquired using a plurality of images of a specific pattern such as a checkerboard, and the corresponding two-dimensional points. A structure mask is a mask image showing the two-dimensional silhouette of a structure present in each captured image acquired by the cameras 211 to 218. A mask image is a reference image that specifies which part of a captured image is to be extracted, and is a binary image represented by 0 and 1. In this embodiment, the soccer goal 202 is treated as the structure, and the structure mask for each camera is a silhouette image showing the region (two-dimensional silhouette) of the soccer goal 202 in the image that the camera captures from its predetermined position at its predetermined angle. The captured images on which the structure masks are based should be taken at a time when no players or other foreground objects are present, such as before or after the match or during half-time. However, images captured beforehand or afterwards may be unsuitable, for example outdoors, where the scene is affected by changes in sunlight. In such a case, the image may instead be obtained from a predetermined number of frames (for example, 10 consecutive seconds of frames) of a moving image in which players and the like appear, by removing the players and the like from them. Specifically, the structure mask can be obtained from an image that adopts, for each pixel, the median of that pixel's values across the frames.
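The per-pixel median described above can be sketched as follows. This is a minimal illustration, not the control device's actual implementation: it assumes grayscale frames stored as nested lists, and the function name `median_background` is hypothetical. How the structure mask is then extracted from the player-free image is not specified in the text and is not shown.

```python
from statistics import median


def median_background(frames):
    """Per-pixel temporal median over equally sized grayscale frames.

    Each frame is a list of rows; each row is a list of pixel intensities.
    """
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]


# Three 1x4 frames: a bright "player" (200) crosses a dark, static scene (10).
frames = [
    [[200, 10, 10, 10]],
    [[10, 200, 10, 10]],
    [[10, 10, 200, 10]],
]
background = median_background(frames)  # the moving player is voted out
```

Because the player covers any given pixel in only a minority of the frames, the median at that pixel is the static scene value.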

The foreground separation device 130 performs, on each of the input captured images from the plurality of viewpoints, processing that distinguishes the foreground region corresponding to the players and the ball on the field 200 from the remaining background region. A background image prepared in advance (which may be the same captured image on which the structure mask is based) is used for this determination. Specifically, the difference between each captured image and the background image is computed, and the region corresponding to that difference is identified as the foreground region. A foreground mask indicating the foreground region is thereby generated for each captured image. In this embodiment, a binary image in which pixels belonging to the foreground region representing the players and the ball are represented by "0" and pixels belonging to the remaining background region by "1" is generated as the foreground mask.
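A background-difference step of this kind might look like the sketch below. It assumes grayscale images as nested lists; the function name and the `threshold` value are hypothetical, since the text does not say how the difference is turned into a binary decision. The output follows the polarity stated in this paragraph (foreground 0, background 1).

```python
def foreground_mask(image, background, threshold=30):
    """Binary mask by background difference: foreground pixels 0, background 1.

    `threshold` is an assumed tuning value, not taken from the source.
    """
    return [
        [0 if abs(p - b) > threshold else 1 for p, b in zip(img_row, bg_row)]
        for img_row, bg_row in zip(image, background)
    ]


background = [[10, 10, 10, 10],
              [10, 10, 10, 10]]
image = [[10, 180, 175, 10],
         [10, 10, 190, 12]]   # 12 vs 10 is noise, below the threshold
mask = foreground_mask(image, background)
```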

The three-dimensional model generation device 140 generates a three-dimensional model of objects on the basis of the camera parameters and the multi-viewpoint images. Details of the three-dimensional model generation device 140 are described later. The data of the generated three-dimensional model is output to the rendering device 150.

The rendering device 150 generates a virtual viewpoint image on the basis of the three-dimensional model received from the three-dimensional model generation device 140, the camera parameters received from the control device 120, the foreground images received from the foreground separation device 130, and a background image prepared in advance. Specifically, the positional relationship between the foreground images and the three-dimensional model is determined from the camera parameters, and the foreground image corresponding to the three-dimensional model is mapped onto it, generating a virtual viewpoint image of the object of interest as seen from an arbitrary angle. In this way, for example, a virtual viewpoint image of a decisive scene in front of the goal in which a player scores can be obtained.

The configuration of the virtual viewpoint image generation system shown in FIG. 2 is an example, and the system is not limited to it. For example, a single computer may combine the functions of a plurality of the devices (for example, the foreground separation device 130 and the three-dimensional model generation device 140). Alternatively, each camera module may be given the function of the foreground separation device 130 so that each camera supplies the data of its captured image together with the corresponding foreground mask.

(Three-dimensional model generation device)
FIG. 3 is a functional block diagram showing the internal configuration of the three-dimensional model generation device 140 according to this embodiment. The three-dimensional model generation device 140 comprises a data reception unit 310, a structure mask storage unit 320, a mask combining unit 330, a coordinate conversion unit 340, a three-dimensional model formation unit 350, and a data output unit 360. Each unit is described in detail below.

The data reception unit 310 receives from the control device 120 the camera parameters of the cameras constituting the camera array 110 and the structure masks representing the two-dimensional silhouette of the structure present in the captured scene. It also receives from the foreground separation device 130 the captured images (multi-viewpoint images) obtained by the cameras of the camera array 110 and the foreground mask data representing the two-dimensional silhouette of the foreground present in each captured image. Of the received data, the structure masks are passed to the structure mask storage unit 320, the foreground masks to the mask combining unit 330, the multi-viewpoint images to the coordinate conversion unit 340, and the camera parameters to the coordinate conversion unit 340 and the three-dimensional model formation unit 350.

The structure mask storage unit 320 stores and holds the structure masks in a RAM or the like, and supplies them to the mask combining unit 330 as needed.

The mask combining unit 330 reads the structure mask from the structure mask storage unit 320 and combines it with the foreground mask received from the data reception unit 310 to generate a mask image in which the two are integrated into one (hereinafter called an "integrated mask"). The generated integrated mask is sent to the three-dimensional model formation unit 350.

The coordinate conversion unit 340 converts the multi-viewpoint images received from the data reception unit 310 from the camera coordinate system to the world coordinate system on the basis of the camera parameters. Through this coordinate conversion, each captured image from a different viewpoint is converted into information indicating which region of the three-dimensional space it shows.
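The relation between the two coordinate systems can be illustrated with a rigid transform. The sketch below assumes the common convention X_cam = R · X_world + t for the extrinsic parameters; the source does not state which convention the device uses, and the function names and the example values of `R` and `t` are hypothetical.

```python
def world_to_camera(point, rotation, translation):
    """X_cam = R * X_world + t (assumed extrinsic convention)."""
    return [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]


def camera_to_world(point, rotation, translation):
    """X_world = R^T * (X_cam - t); valid because R is orthonormal."""
    shifted = [point[i] - translation[i] for i in range(3)]
    return [sum(rotation[j][i] * shifted[j] for j in range(3)) for i in range(3)]


# Hypothetical extrinsics: a 90-degree rotation about the z axis plus an offset.
R = [[0, -1, 0],
     [1, 0, 0],
     [0, 0, 1]]
t = [5, 0, 2]

p_world = [1, 2, 3]
p_cam = world_to_camera(p_world, R, t)
p_back = camera_to_world(p_cam, R, t)  # round trip recovers the world point
```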

Using the multi-viewpoint images converted to the world coordinate system and the integrated mask corresponding to each camera, the three-dimensional model formation unit 350 generates, by the visual hull method, a three-dimensional model of the objects in the captured scene, including the structure. The data of the generated three-dimensional model is output to the rendering device 150 via the data output unit 360.

(Three-dimensional model formation processing)
FIG. 4 is a flowchart showing the flow of three-dimensional model formation processing according to this embodiment. This series of processing is realized by the CPU of the three-dimensional model generation device 140 loading a predetermined program stored in a storage medium such as a ROM or an HDD into the RAM and executing it. The processing is described below along the flow of FIG. 4.

First, in step 401, the data reception unit 310 receives from the control device 120 the structure masks representing the two-dimensional silhouette of the structure (here, the soccer goal 202) as seen from each of the cameras 211 to 218, together with the camera parameters of each camera. FIGS. 5(a) to 5(h) show the images captured by the cameras 211 to 218 constituting the camera array 110. Here, one player (the goalkeeper) is on the soccer court 201, in front of the soccer goal 202. In the captured images of FIGS. 5(a), 5(b), and 5(h), the soccer goal 202 lies between the camera and the player, so part of the player is hidden by the soccer goal 202. From each of the captured images of FIGS. 5(a) to 5(h), a structure mask is obtained in which the region of the soccer goal 202 is represented by 1 (white) and the remaining region by 0 (black). FIGS. 6(a) to 6(h) show the structure masks corresponding to the captured images of FIGS. 5(a) to 5(h).

Next, in step 402, the data reception unit 310 receives from the foreground separation device 130 the foreground masks showing the two-dimensional silhouette of the foreground (here, the players and the ball) in the images captured by the cameras 211 to 218, together with the multi-viewpoint images from which they were derived. FIGS. 7(a) to 7(h) show the foreground masks corresponding to the captured images of FIGS. 5(a) to 5(h). Because the foreground separation device 130 extracts as foreground the regions that change over time between images captured from the same angle, the part of the player hidden behind the soccer goal 202 is not extracted as foreground in FIGS. 7(a), 7(b), and 7(h). The received foreground mask data is sent to the mask combining unit 330.

Next, in step 403, the mask combining unit 330 reads the structure mask data from the structure mask storage unit 320 and combines the read structure mask with the foreground mask received from the data reception unit 310. This combination is an operation that computes the logical OR of the binary (black-and-white) foreground mask and structure mask at each pixel. FIGS. 8(a) to 8(h) show the integrated masks obtained by combining the structure masks of FIGS. 6(a) to 6(h) with the corresponding foreground masks of FIGS. 7(a) to 7(h). In the resulting integrated masks, no part of the player's silhouette is missing.
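The per-pixel OR of step 403 can be sketched in a few lines. The example assumes that both masks use 1 (white) for the silhouette, the convention given for the structure mask in step 401; a foreground mask stored with the opposite polarity would have to be inverted first. The function name `merge_masks` and the tiny masks are illustrative only.

```python
def merge_masks(structure_mask, foreground_mask):
    """Per-pixel logical OR of two binary masks of identical size."""
    return [
        [s | f for s, f in zip(s_row, f_row)]
        for s_row, f_row in zip(structure_mask, foreground_mask)
    ]


# A goalpost (column 2) hides the middle of a player who spans columns 1 to 3.
structure = [[0, 0, 1, 0, 0],
             [0, 0, 1, 0, 0]]
foreground = [[0, 1, 0, 1, 0],   # the occluded pixel in column 2 is missing
              [0, 1, 0, 1, 0]]
integrated = merge_masks(structure, foreground)
```

In the integrated mask, the gap left by the occluding post is covered by the post's own silhouette, so the player's region is contiguous again.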

Then, in step 404, the three-dimensional model formation unit 350 generates a three-dimensional model by the visual hull method from the integrated masks obtained in step 403. This produces a model representing the three-dimensional shape of the foreground and the structure present in the common imaging region of the images captured from the different viewpoints (hereinafter called an "integrated three-dimensional model"). In this embodiment, an integrated three-dimensional model that includes the soccer goal 202 in addition to the players and the ball is generated. Specifically, the integrated three-dimensional model is generated by the following procedure. First, volume data is prepared in which the three-dimensional space above the field 200 is filled with cubes (voxels) of a fixed size. Each voxel of the volume data takes the value 0 or 1, where "1" indicates a shape region and "0" a non-shape region. Next, the three-dimensional coordinates of each voxel are converted from the world coordinate system to the camera coordinate system using the camera parameters (installation position, line-of-sight direction, and so on) of each of the cameras 211 to 218. Then, where the structure and foreground indicated by the integrated mask are present in that camera coordinate system, a model representing the three-dimensional shape of the structure and foreground by voxels is generated. The three-dimensional shape may be represented not by the voxels themselves but by a set of points (point cloud) indicating the voxel centers. FIG. 9 shows the integrated three-dimensional model generated from the integrated masks of FIG. 8; reference numeral 901 corresponds to the three-dimensional shape of the player, who is the foreground, and reference numeral 902 to the three-dimensional shape of the soccer goal 202, which is the structure. As described above, the silhouette of the player in the integrated masks has no missing part, so the resulting integrated three-dimensional model has no defect either. FIG. 10 shows a three-dimensional model generated by the conventional method using only the foreground masks. As described above, part of the player is not represented as foreground in the foreground masks of FIGS. 7(a), 7(b), and 7(h), so that part is missing from the generated three-dimensional model. By using mask images that combine the foreground mask and the structure mask, the method of this embodiment makes it possible to avoid defects in part of the three-dimensional model of the foreground.
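The voxel procedure can be illustrated with a toy carving routine. This is a sketch under strong simplifications, not the device's implementation: three orthographic views along the coordinate axes stand in for the calibrated perspective cameras, masks use 1 for the silhouette, and all names (`inside`, `carve`, `along_x`, and so on) are hypothetical.

```python
def inside(project, mask, voxel):
    """True if the voxel projects onto a silhouette pixel (value 1) of the mask."""
    row, col = project(voxel)
    in_image = 0 <= row < len(mask) and 0 <= col < len(mask[0])
    return in_image and mask[row][col] == 1


def carve(grid_size, cameras):
    """Return the voxels whose projection lies inside every camera's mask."""
    return {
        (x, y, z)
        for x in range(grid_size)
        for y in range(grid_size)
        for z in range(grid_size)
        if all(inside(project, mask, (x, y, z)) for project, mask in cameras)
    }


# Hypothetical toy cameras: orthographic views along the x, y and z axes.
def along_x(v):
    return (v[2], v[1])   # image rows follow z, columns follow y


def along_y(v):
    return (v[2], v[0])   # image rows follow z, columns follow x


def along_z(v):
    return (v[1], v[0])   # image rows follow y, columns follow x


pillar = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]   # silhouette seen from the side
dot = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]      # silhouette seen from above
model = carve(3, [(along_x, pillar), (along_y, pillar), (along_z, dot)])

# Same scene, but one side view has lost the middle of the silhouette.
occluded = [[0, 1, 0], [0, 0, 0], [0, 1, 0]]
damaged = carve(3, [(along_x, occluded), (along_y, pillar), (along_z, dot)])
```

With complete silhouettes the three voxels of the pillar survive; with the gap in a single view, the middle voxel is carved away even though the other two views still see it, which mirrors the defect shown in FIG. 10.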

The above is the content of the three-dimensional model formation processing according to the present embodiment. When a virtual viewpoint image of a moving image is generated, the processing of the above steps is repeated frame by frame to generate a three-dimensional model for each frame. However, the reception and storage of the structure mask (step 401) need only be performed immediately after the start of the flow and can be omitted for the second and subsequent frames. Furthermore, when image capturing is performed at the same location on different dates and times, the structure mask may be received and stored only the first time and held in a RAM or the like, and the held mask may be reused from the next time onward.
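The per-frame repetition can be sketched as follows, assuming binary masks stored as nested lists. The function names (`run_sequence`, `receive_structure_masks`, `build_model`) and the stand-in "model" that merely counts mask pixels are hypothetical; only the ordering of steps 401 to 404 is taken from the description.

```python
def combine(foreground_mask, structure_mask):
    """Pixelwise logical OR of two binary masks."""
    return [[f | s for f, s in zip(f_row, s_row)]
            for f_row, s_row in zip(foreground_mask, structure_mask)]

def run_sequence(frames, receive_structure_masks, build_model):
    """frames: iterable of per-frame lists of foreground masks, one per camera."""
    structure_masks = receive_structure_masks()        # step 401: once only
    models = []
    for foreground_masks in frames:                     # step 402: every frame
        integrated = [combine(f, s)                     # step 403
                      for f, s in zip(foreground_masks, structure_masks)]
        models.append(build_model(integrated))          # step 404
    return models

# Demo with one camera and 2x2 masks; the counter shows step 401 runs once.
calls = {"structure": 0}

def receive_structure_masks():
    calls["structure"] += 1
    return [[[0, 1], [0, 1]]]

def count_pixels(masks):
    """Stand-in for model generation: number of set pixels in all masks."""
    return sum(sum(row) for mask in masks for row in mask)

frames = [[[[1, 0], [0, 0]]], [[[0, 0], [1, 0]]], [[[0, 0], [0, 0]]]]
models = run_sequence(frames, receive_structure_masks, count_pixels)
print(models, calls["structure"])
```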

As described above, according to the present embodiment, even when a structure that hides a foreground object is present, a highly accurate three-dimensional model in which the foreground has no defects, or fewer defects, can be generated.

Embodiment 2

In Embodiment 1, a three-dimensional model of the foreground with no defects, or with fewer defects, was generated in a form that includes the structure present in the captured scene. Next, a mode of generating a three-dimensional model of the foreground only, with the structure removed and with no or fewer defects, is described as Embodiment 2. Descriptions of matters common to Embodiment 1, such as the system configuration, are omitted or simplified, and the following description focuses on the differences.

The configuration of the three-dimensional model generation apparatus 140 of the present embodiment is basically the same as that of Embodiment 1 (see FIG. 3), but differs in the following respects.

First, the structure mask is read from the structure mask storage unit 320 not only by the mask combining unit 330 but also by the three-dimensional model formation unit 350. The broken-line arrow in FIG. 3 represents this. In addition to generating the integrated three-dimensional model of the foreground plus the structure using the integrated mask, the three-dimensional model formation unit 350 also generates a three-dimensional model of the structure alone using the structure mask. It then obtains the difference between the integrated three-dimensional model generated from the integrated mask and the three-dimensional model of the structure generated from the structure mask, thereby extracting a three-dimensional model of the foreground only, with no or fewer defects.

(Three-dimensional model formation processing)
FIG. 11 is a flowchart showing the flow of the three-dimensional model formation processing according to the present embodiment. This series of processing is realized by the CPU of the three-dimensional model generation apparatus 140 loading a predetermined program stored in a storage medium such as a ROM or an HDD into the RAM and executing it. The processing is described below along the flow of FIG. 11.

Steps 1101 to 1104 correspond to steps 401 to 404 in the flow of FIG. 4 of Embodiment 1, respectively, and since they are no different, their description is omitted.

In the subsequent step 1105, the three-dimensional model formation unit 350 reads the structure mask from the structure mask storage unit 320 and generates a three-dimensional model of the structure by the visual volume intersection method.

Next, in step 1106, the three-dimensional model formation unit 350 obtains the difference between the integrated three-dimensional model of the foreground plus the structure generated in step 1104 and the three-dimensional model of the structure generated in step 1105, and extracts a three-dimensional model of the foreground only. Here, the three-dimensional model of the structure may be expanded in three-dimensional space by, for example, about 10% before the difference from the integrated three-dimensional model is obtained. This makes it possible to reliably remove the portion corresponding to the structure from the integrated three-dimensional model. In doing so, only a part of the three-dimensional model of the structure may be expanded. For example, in the case of the soccer goal 202, players are likely to be present inside the soccer court 201, so the portion to be expanded may be determined according to the region, such as not expanding the model on the court 201 side and expanding it only on the side opposite the court 201. Furthermore, the rate of expansion (expansion rate) may be varied according to how far a foreground object such as a player or the ball is from the structure. For example, when the foreground object is far from the structure, the expansion rate is increased so that the three-dimensional model of the structure is reliably removed. Conversely, the closer the foreground object is to the structure, the smaller the expansion rate is made, so that part of the three-dimensional model of the foreground is not removed by mistake. The expansion rate may be varied linearly with the distance from the foreground, or may be determined stepwise by providing one or more reference distances.
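The difference operation of step 1106 and the optional expansion can be sketched with voxel models held as sets of integer indices. This is an illustration under stated assumptions, not the method of the three-dimensional model formation unit 350: the expansion is approximated here by growing the structure by a whole number of voxels (`radius`) rather than by a percentage, the `allowed` predicate standing in for "expand only on the side opposite the court" is hypothetical, and the `near`, `far` and `max_rate` defaults of the linear expansion-rate ramp are arbitrary example values.

```python
from itertools import product

def dilate(voxels, radius=1, allowed=lambda cell: True):
    """Grow a voxel set by `radius` cells in every direction.

    Newly added cells are kept only where allowed(cell) is true.
    """
    offsets = list(product(range(-radius, radius + 1), repeat=3))
    grown = set(voxels)
    for x, y, z in voxels:
        for dx, dy, dz in offsets:
            cell = (x + dx, y + dy, z + dz)
            if allowed(cell):
                grown.add(cell)
    return grown

def extract_foreground(integrated, structure, radius=1,
                       allowed=lambda cell: True):
    """Integrated model minus the (expanded) structure model."""
    return integrated - dilate(structure, radius, allowed)

def expansion_rate(distance, near=1.0, far=10.0, max_rate=0.10):
    """Linear ramp: 0 at `near` or closer, max_rate at `far` or farther."""
    ratio = (distance - near) / (far - near)
    return max_rate * min(1.0, max(0.0, ratio))

structure = {(0, 0, 0)}
# (1, 0, 0) is residue of the structure left by imprecise carving;
# (3, 0, 0) is a player voxel.
integrated = {(0, 0, 0), (1, 0, 0), (3, 0, 0)}

everywhere = extract_foreground(integrated, structure)
# Assume the court lies on the +x side: never expand toward it.
away_from_court = extract_foreground(integrated, structure,
                                     allowed=lambda cell: cell[0] <= 0)
print(everywhere, away_from_court)
```

Expanding in all directions removes the residue next to the structure, while restricting the expansion leaves voxels on the court side untouched, which is the trade-off the paragraph above describes. How an expansion rate would be converted into a voxel radius is left open in this sketch.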

FIG. 12(a) shows the same integrated three-dimensional model as FIG. 9 described above, generated from the integrated masks. FIG. 12(b) shows the three-dimensional model of the structure generated from the structure masks alone. FIG. 12(c) shows the three-dimensional model of the foreground only, obtained as the difference between the integrated three-dimensional model of FIG. 12(a) and the three-dimensional model of the structure of FIG. 12(b).

The above is the content of the three-dimensional model formation processing according to the present embodiment. When a virtual viewpoint image of a moving image is generated, the processing of the above steps is repeated frame by frame to generate a three-dimensional model for each frame. However, the reception and storage of the structure mask (step 1101) and the generation of the three-dimensional model of the structure (step 1105) need only be performed immediately after the start of the flow and can be omitted for the second and subsequent frames. Furthermore, when image capturing is performed at the same location on different dates and times, the reception and storage of the structure mask and the generation of the three-dimensional model of the structure may be performed only the first time, with the results held in a RAM or the like and reused from the next time onward.

(Modification)
In the present embodiment, the three-dimensional model of the foreground only is generated by subtracting the three-dimensional model of the structure from the integrated three-dimensional model of the foreground plus the structure, but the invention is not limited to this. For example, for each voxel (or each predetermined region) constituting the integrated three-dimensional model of the foreground plus the structure, the number of mask images in which it is included may be counted, and the portions whose count is equal to or less than a threshold may be deleted from the integrated three-dimensional model to obtain the three-dimensional model of the foreground only. The threshold is set to an arbitrary value smaller than the total number of cameras, taking into account the installation position, line-of-sight direction, and the like of each camera. In the present embodiment, with eight cameras in total arranged as shown in FIG. 2(a), setting the threshold to, for example, "2" makes it possible to delete only the soccer goal.
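The counting variant can be sketched as below. The text does not state which mask images are counted; this sketch assumes the foreground masks, and represents each camera by the set of voxels whose projection falls inside that camera's foreground mask. The names `remove_low_count` and `hits_per_camera` and the example counts are hypothetical.

```python
def remove_low_count(integrated, hits_per_camera, threshold):
    """Drop voxels that lie inside the masks of `threshold` cameras or fewer."""
    kept = set()
    for voxel in integrated:
        count = sum(voxel in hits for hits in hits_per_camera)
        if count > threshold:
            kept.add(voxel)
    return kept

player, goal = (3, 0, 0), (0, 0, 0)
integrated_model = {player, goal}

# Eight cameras: the player falls inside the foreground mask of five of them
# (and is occluded in the others); the goal falls inside a foreground mask
# only for one camera, where a player happens to stand in front of it.
hits_per_camera = [{player}] * 5 + [set()] * 2 + [{goal}]

result = remove_low_count(integrated_model, hits_per_camera, threshold=2)
print(result)
```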

As described above, according to the present embodiment, even when a structure that hides a foreground object is present, a highly accurate three-dimensional model of the foreground only, which does not include the structure, can be generated.

(Other embodiments)
The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

140 Three-dimensional model generation apparatus
310 Data reception unit
330 Mask combining unit
350 Three-dimensional model formation unit

The generation apparatus according to the present invention comprises: first acquisition means for acquiring first region information indicating a region of an object in a plurality of images obtained by image capturing from a plurality of image capturing directions; second acquisition means for acquiring second region information indicating a region of a structure that may occlude the object during image capturing from at least one of the plurality of image capturing directions; and generation means for generating three-dimensional shape data corresponding to the object based on both the first region information indicating the region of the object acquired by the first acquisition means and the second region information indicating the region of the structure acquired by the second acquisition means.

FIG. 2(b) shows the arrangement of all eight cameras 211 to 218 constituting the camera array 110 in an overhead view of the field 200 seen from directly above. The cameras 211 to 218 are installed at a fixed height above the ground so as to surround the field 200, and capture the area in front of one of the goals from various angles to acquire multi-viewpoint image data with different viewpoints. A soccer court 201 is drawn on the grass field 200 (in practice with white lines), and a soccer goal 202 is placed on its left side. The cross mark 203 in front of the soccer goal 202 indicates the common line-of-sight direction (gaze point) of the cameras 211 to 218, and the broken-line circle 204 indicates the area, centered on the gaze point 203, that each of the cameras 211 to 218 can capture. In the present embodiment, positions are expressed in a coordinate system whose origin is one corner of the field 200, with the longitudinal direction as the x axis, the short-side direction as the y axis, and the height direction as the z axis. The multi-viewpoint image data obtained by the cameras of the camera array 110 is sent to the control device 120 and the foreground separation device 130. In FIG. 2(a), the cameras 211 to 218 are connected to the control device 120 and the foreground separation device 130 in a star topology, but a ring or bus topology using daisy-chain connection may be used instead. Although FIG. 2 shows an example with eight cameras, the number of cameras may be fewer or more than eight.

The data reception unit 310 receives, from the control device 120, the camera parameters of each camera constituting the camera array 110 and the structure masks representing the two-dimensional silhouette of the structure present in the captured scene. It also receives, from the foreground separation device 130, the captured images (multi-viewpoint images) obtained by the cameras of the camera array 110 and the foreground mask data representing the two-dimensional silhouette of the foreground present in each captured image. Of the received data, the structure masks are passed to the structure mask storage unit 320, the foreground masks to the mask combining unit 330, the multi-viewpoint images to the coordinate conversion unit 340, and the camera parameters to the coordinate conversion unit 340 and the three-dimensional model formation unit 350.

Next, in step 402, the data reception unit 310 receives, from the foreground separation device 130, the foreground masks indicating the two-dimensional silhouette of the foreground (here, the players and the ball) in the images captured by the cameras 211 to 218, together with the multi-viewpoint images from which they were derived. FIGS. 7(a) to 7(h) show the foreground masks corresponding to the captured images of FIGS. 5(a) to 5(h), respectively. Because the foreground separation device 130 extracts, as the foreground, regions that change over time between images captured from the same angle, the part of the player hidden behind the soccer goal 202 is not extracted as a foreground region in FIGS. 7(a), 7(b) and 7(h). The received foreground mask data is sent to the mask combining unit 330.
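Why the occluded part of a player drops out of the foreground mask can be seen in a minimal sketch. The internal method of the foreground separation device 130 is not described here; the sketch assumes a simple per-pixel difference against a reference image of the empty scene taken from the same angle, and the function name, the threshold of 30, and the grayscale values are hypothetical.

```python
def foreground_mask(frame, background, threshold=30):
    """1 where the frame differs from the background by more than threshold."""
    return [[1 if abs(p - b) > threshold else 0
             for p, b in zip(f_row, b_row)]
            for f_row, b_row in zip(frame, background)]

# One image row. The value 200 is a goal post standing in front of the player.
background = [[10, 10, 200, 10]]
# The player (value 90) spans pixels 1 to 3 but is hidden at pixel 2.
frame = [[10, 90, 200, 90]]

mask = foreground_mask(frame, background)
print(mask)
```

The pixel covered by the goal post does not change between the two images, so it is absent from the foreground mask even though the player is behind it; the structure mask supplies exactly this missing region when the two masks are combined.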

Claims

An apparatus for generating a three-dimensional model, comprising:
acquisition means for acquiring a first mask image indicating a region of a structure, which is a stationary object, in each of images captured from a plurality of viewpoints, and a second mask image indicating a region of a foreground, which is a moving object, in each of the images captured from the plurality of viewpoints;
synthesis means for combining the acquired first mask image and second mask image to generate a third mask image in which the region of the structure and the region of the foreground in the images captured from the plurality of viewpoints are integrated; and
generation means for generating a three-dimensional model including the structure and the foreground by a visual volume intersection method using the third mask image.

The generation apparatus according to claim 1, wherein the synthesis means performs the combining by obtaining a logical sum (logical OR) of the first mask image and the second mask image.

The generation apparatus according to claim 1 or 2, wherein
the region of the foreground is a region showing a dynamic object whose position can change between images when image capturing is performed in time series from the same angle, and
the region of the structure is a region showing a static object that may occlude at least a part of the foreground and whose position does not change between images when image capturing is performed in time series from the same angle.

The generation apparatus according to any one of claims 1 to 3, wherein the generation means
further generates a three-dimensional model of the structure by the visual volume intersection method using the first mask image, and
generates a three-dimensional model of the foreground from a difference between the generated three-dimensional model of the structure and the integrated three-dimensional model.

The apparatus for generating a three-dimensional model, wherein the generation means obtains the difference from the integrated three-dimensional model using a three-dimensional model of the structure at least a part of which has been expanded in three-dimensional space.

The generation apparatus according to claim 5, wherein the generation means determines the portion of the three-dimensional model of the structure to be expanded in accordance with the region in the three-dimensional space in which the structure exists.

The generation apparatus according to claim 5, wherein the generation means determines an expansion ratio of a three-dimensional model of the structure in accordance with a distance between the structure and the foreground.

8. The generation apparatus according to claim 7, wherein the generation unit increases the expansion ratio of the three-dimensional model of the structure as the structure is farther from the foreground.

A method of generating a three-dimensional model, comprising:
an acquisition step of acquiring a first mask image indicating a region of a structure in each of images captured from a plurality of viewpoints, and a second mask image indicating a region of a foreground in each of the images captured from the plurality of viewpoints;
a synthesis step of combining the acquired first mask image and second mask image to generate a third mask image in which the region of the structure and the region of the foreground in the images captured from the plurality of viewpoints are integrated; and
a generation step of generating a three-dimensional model including the structure and the foreground by a visual volume intersection method using the third mask image.

A program for causing a computer to function as the generation device according to any one of claims 1 to 8.