JP2018163467A

JP2018163467A - Method, device and program for generating and displaying free viewpoint image

Info

Publication number: JP2018163467A
Application number: JP2017059554A
Authority: JP
Inventors: 浩嗣三功; Hiroshi Sanko
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2017-03-24
Filing date: 2017-03-24
Publication date: 2018-10-18
Anticipated expiration: 2037-03-24
Also published as: JP6818606B2

Abstract

PROBLEM TO BE SOLVED: To provide a method, an apparatus, and a program for generating and displaying a free viewpoint image that can reproduce a 3D model of an object in a short time without impairing subjective quality, even in occlusion areas. SOLUTION: When occlusion occurs between two objects Oj1 and Oj2, for pixels outside the occlusion area, the pixel values of the corresponding pixels of the camera image from which the rectangular projection image was extracted are acquired as the texture. Because the texture of the leg of the object Oj1 cannot be acquired from the camera image of viewpoint 2, which is its extraction source camera image, the projection source voxel Bt is specified by ray search for the pixels in the occlusion area (gray). The projection source voxel Bt is then projected onto the other camera images, whether the voxel is observable is determined based on the positional relationship between the three-dimensional shape models of the objects Oj1 and Oj2, and the pixel value of the corresponding pixel is acquired from an observable camera image. SELECTED DRAWING: Fig. 4

Description

The present invention relates to a method, an apparatus, and a program for generating and displaying a free viewpoint image, and more particularly to a method, an apparatus, and a program suited to shooting conditions in which a plurality of cameras cannot be densely arranged in a relatively large space, such as a sporting event.

Free viewpoint video generation methods are broadly classified into model-based and image-based types. A representative model-based method is disclosed in Non-Patent Document 1. In Non-Patent Document 1, the three-dimensional shape of a subject is restored as a 3D model, the surface of the 3D model is divided into fine polygons, and the texture of each polygon is then acquired from a plurality of shooting camera videos and blended at a ratio appropriate to the viewpoint position, thereby synthesizing video from a viewpoint where no shooting camera exists.

In theory, this method can smoothly reproduce the appearance from an arbitrary viewpoint. However, because the final synthesized image is represented as a patchwork of polygons, its quality depends heavily on the accuracy of the 3D model and the camera parameters, so quality is limited when cameras cannot be densely arranged, as in sports scenes.

On the other hand, a representative image-based method is disclosed in Non-Patent Document 2. Non-Patent Document 2 proposes a billboard method in which the area where a subject exists is extracted from each shooting camera video and represented as a single rectangular polygon.

The billboard method does not restore the three-dimensional shape of the subject. Instead, it calculates the coordinates at which each billboard is placed in the target three-dimensional space and maps onto the billboard the texture of the subject area acquired from the shooting camera video. This makes it possible to reproduce, from an arbitrary viewpoint, the positional relationships among subjects and between subjects and the background.

The billboard method cannot reproduce the smooth changes in appearance that the 3D model method can, but because it uses the texture of the shooting camera video as-is without processing, it achieves a higher-definition appearance than the 3D model method.

Patent Document 1: JP-A-2015-191538

Non-Patent Document 1: T. Kanade, P. W. Rander, and P. J. Narayanan, "Virtualized Reality: Constructing Virtual Worlds from Real Scenes," IEEE Multimedia, vol. 4, no. 1, pp. 34-47, 1997.
Non-Patent Document 2: Y. Ohta, I. Kitahara, Y. Kameda, H. Ishikawa, and T. Koyama, "Live 3D Video in Soccer Stadium," International Journal of Computer Vision (IJCV), vol. 75, no. 1, pp. 173-187, 2007.
Non-Patent Document 3: Hiroshi Sankoh and Sei Naito, "Free-viewpoint Video Rendering in Large Outdoor Space such as Soccer Stadium based on Object Extraction and Tracking Technology," The Journal of The Institute of Image Information and Television Engineers (ITE), vol. 68, no. 3, pp. J125-J134, 2014.

The biggest problem common to the model-based and image-based types is occlusion caused by, for example, subjects overlapping in the shooting camera video. In the billboard method in particular, the texture acquired from the subject area in each camera video is used as-is without processing, so when an area cannot be observed because of overlap with another subject, the appearance of that area cannot be reproduced.

Moreover, even for the subject nearer to the camera in an occlusion area, it is difficult to separate the overlap boundary properly, so extracting an appropriate subject area is a difficult task.

To address this technical problem, Non-Patent Document 3 and Patent Document 1 disclose a method that assigns an individual ID to each subject and tracks the subjects so that the IDs are maintained over time, thereby detecting occlusion and interpolating between frames.

These prior arts propose using information from a plurality of cameras to interpolate, between frames, both the texture of each subject in the occlusion area and the installation coordinates of its billboard. However, when occlusion lasts a long time (many frames), interpolation performance is limited and subjective quality suffers.

In addition, when the tracking process contains errors, textures and positions are corrected across different subjects, which significantly impairs the subjective quality of the synthesized video. To improve robustness against tracking errors, Patent Document 1 proposes a method that allows tracking IDs to be checked and corrected visually, but this requires a great deal of time.

An object of the present invention is to solve the above technical problems and to provide a method, an apparatus, and a program for generating and displaying a free viewpoint image that, by individually extracting the area where each object exists in each camera during billboard-method synthesis, can reproduce a 3D model of an object in a short time without impairing subjective quality, even in occlusion areas, while maintaining the high-definition appearance inherent to the billboard method.

To achieve the above object, the present invention is characterized in that the method, apparatus, and program for generating and displaying a free viewpoint image have the following configurations.

(1) An apparatus for generating and displaying a free viewpoint image according to the present invention comprises: means for projecting three-dimensional shape models, generated based on a plurality of camera images of objects taken from different viewpoints, onto each viewpoint and extracting, for each viewpoint, a rectangular projection image including the projection image portion of each object; means for acquiring, for each object, the texture of the rectangular projection image at each viewpoint; means for generating billboards based on each rectangular projection image and its texture; and means for selecting, based on information specifying a free viewpoint, a camera whose billboards are to be displayed, and displaying all billboards generated for that camera.

(2) In a method according to the present invention in which a computer generates and displays a free viewpoint image, three-dimensional shape models generated based on a plurality of camera images of objects taken from different viewpoints are projected onto each viewpoint; a rectangular projection image including the projection image portion of each object is extracted for each viewpoint; for each object, the texture of the rectangular projection image at each viewpoint is acquired from the corresponding camera image; billboards are generated based on each rectangular projection image and its texture; and a camera whose billboards are to be displayed is selected based on information specifying a free viewpoint, and all billboards generated for that camera are displayed.

(3) A program for generating and displaying a free viewpoint image according to the present invention describes, so as to be executable by a computer: a procedure for projecting three-dimensional shape models, generated based on a plurality of camera images of objects taken from different viewpoints, onto each viewpoint and extracting, for each viewpoint, a rectangular projection image including the projection image portion of each object; a procedure for acquiring, for each object, the texture of the rectangular projection image at each viewpoint from the corresponding camera image; a procedure for generating billboards based on each rectangular projection image and its texture; and a procedure for selecting, based on information specifying a free viewpoint, a camera whose billboards are to be displayed, and displaying all billboards generated for that camera.

(1) On the premise that the final display and rendering of the free viewpoint video are performed by the billboard method, the three-dimensional shape model of each object is restored during synthesis, and the projection image of the model onto each camera is used to detect the occlusion area in each camera and to extract the existence area and texture of each object.

That is, when a billboard is created, the texture of a non-occlusion area is acquired from the corresponding pixels of the camera image from which the projection image was extracted, whereas the texture of an occlusion area can be acquired from the corresponding pixels of a camera image other than that extraction source.

Therefore, a 3D model of an object can be reproduced in a short time without impairing subjective quality, and high-quality free viewpoint images can be generated automatically, without manual work, even in scenes where occlusion occurs frequently.

(2) Particularly when a large space is targeted, there is a limit to how well the three-dimensional shape of a subject can be restored. In the present invention, however, focusing on the projection onto each camera of the Visual Hull restored by the visual volume intersection method makes it possible to individually extract the area where each subject exists in each camera.

FIG. 1 is a functional block diagram showing the configuration of an embodiment of a system to which the method, apparatus, and program for generating and displaying a free viewpoint image according to the present invention are applied.
FIG. 2 is a diagram showing a method of estimating the three-dimensional shape of an object.
FIG. 3 is a diagram showing a method of acquiring, for each object and each viewpoint, a rectangular projection image containing its projection image.
FIG. 4 is a diagram showing a method of acquiring the texture to be pasted onto the projection image portion of a rectangular projection image.
FIG. 5 is a diagram showing a method of generating a billboard.
FIG. 6 is a diagram showing a method of displaying a billboard.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. FIG. 1 is a functional block diagram showing the configuration of an embodiment of the method, apparatus, and program for generating and displaying a free viewpoint image according to the present invention.

Such a system may be configured by installing an application (program) that implements each function on a general-purpose computer or server, or as a dedicated or single-function machine in which part of the application is implemented in hardware or ROM.

The multi-viewpoint image input unit 10 acquires camera images frame by frame from a plurality of cameras Ca (Ca1, Ca2, Ca3, ...) that shoot the objects Obj from different viewpoints. In the following description, each camera Ca or its camera image may also be referred to as "viewpoint 1", "viewpoint 2", and so on. The three-dimensional shape model generation unit 20 generates a three-dimensional shape model for each object Obj, frame by frame, based on the camera images acquired from the cameras Ca.

In the three-dimensional shape model generation unit 20, the camera parameter estimation unit 21 estimates the central projection matrix (camera parameters) of each camera Ca. The mask image extraction unit 22 extracts, for each frame image, a mask image indicating the area where each object exists. As shown in FIG. 2, the three-dimensional shape estimation unit 23 relies on the constraint that an object is contained in the visual volume obtained by projecting the mask image extracted at each viewpoint into real space. Using the visual volume intersection method, it estimates the intersection of the visual volumes corresponding to the plurality of mask images as the three-dimensional shape of the object (three-dimensional voxel data) and restores a Visual Hull indicating the area where the object exists in the three-dimensional voxel space.
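The visual volume intersection described above can be sketched as voxel carving: a voxel is kept only if its projection falls inside the mask of every camera. The following is a minimal sketch, not the implementation of the embodiment. The names `project`, `voxel_grid`, and `visual_hull`, the nested-list representation of the 3x4 projection matrices, and the binary masks indexed as `mask[v][u]` are all assumptions introduced for illustration.

```python
from itertools import product


def project(P, X):
    """Project a 3D point X with a 3x4 projection matrix P; return (u, v) or None."""
    x, y, z = X
    h = [row[0] * x + row[1] * y + row[2] * z + row[3] for row in P]
    if h[2] <= 0:  # point is not in front of the camera
        return None
    return h[0] / h[2], h[1] / h[2]


def voxel_grid(nx, ny, nz):
    """Integer voxel centres of an nx * ny * nz grid (hypothetical helper)."""
    return list(product(range(nx), range(ny), range(nz)))


def visual_hull(voxel_centers, cameras):
    """Keep the voxels that project inside the mask of every camera.

    cameras: list of (P, mask) pairs; mask[v][u] is 1 inside the silhouette.
    """
    hull = []
    for X in voxel_centers:
        inside_all = True
        for P, mask in cameras:
            uv = project(P, X)
            if uv is None:
                inside_all = False
                break
            u, v = int(round(uv[0])), int(round(uv[1]))
            in_image = 0 <= v < len(mask) and 0 <= u < len(mask[0])
            if not (in_image and mask[v][u]):
                inside_all = False
                break
        if inside_all:
            hull.append(X)
    return hull
```

With only a few cameras the carved volume tends to overestimate the true shape, which is consistent with the limitation on restoration performance noted earlier for large spaces.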

The three-dimensional shape model restoration unit 24 classifies the Visual Hull into classes based on geometric connectivity in the voxel space and, among the classified Visual Hulls, restores as the three-dimensional shape model of each object only those whose size (number of voxels) and height (y coordinate) satisfy predetermined conditions.
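One way to picture this classification is connected-component labeling over the occupied voxels, followed by a filter on size and height. The sketch below assumes 6-connectivity and uses two hypothetical thresholds, `min_voxels` and `max_base_height`, as stand-ins for the "predetermined conditions", which the text does not specify.

```python
from collections import deque


def split_into_objects(voxels, min_voxels=3, max_base_height=1):
    """Group voxels into 6-connected components and keep plausible objects.

    voxels: iterable of integer (x, y, z) indices, y being the height axis.
    min_voxels and max_base_height are illustrative assumptions.
    """
    remaining = set(voxels)
    neighbours = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    objects = []
    while remaining:
        seed = remaining.pop()
        component, queue = [seed], deque([seed])
        while queue:  # breadth-first flood fill from the seed voxel
            x, y, z = queue.popleft()
            for dx, dy, dz in neighbours:
                n = (x + dx, y + dy, z + dz)
                if n in remaining:
                    remaining.remove(n)
                    component.append(n)
                    queue.append(n)
        big_enough = len(component) >= min_voxels
        near_ground = min(v[1] for v in component) <= max_base_height
        if big_enough and near_ground:
            objects.append(sorted(component))
    return sorted(objects)
```

In this sketch, isolated noise voxels fail the size test and components floating above the ground fail the height test, so only components that could plausibly be a standing subject survive.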

As shown in FIG. 3, the rectangular projection image extraction unit 30 projects the restored three-dimensional shape model of each object onto the camera image of each viewpoint and acquires, for each viewpoint, a rectangular image containing the projection image of each object (hereinafter also referred to as a rectangular projection image).
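As an illustration, the rectangle can be taken as the axis-aligned bounding box of the projected voxels. This is a minimal sketch under that assumption; the function name `projected_bounding_rect` and the (u_min, v_min, u_max, v_max) return convention are hypothetical.

```python
import math


def projected_bounding_rect(voxels, P):
    """Bounding rectangle (u_min, v_min, u_max, v_max) of the voxels projected by P.

    P is a 3x4 projection matrix given as nested lists; returns None when no
    voxel lies in front of the camera.
    """
    us, vs = [], []
    for x, y, z in voxels:
        h = [r[0] * x + r[1] * y + r[2] * z + r[3] for r in P]
        if h[2] <= 0:  # skip voxels behind the camera
            continue
        us.append(h[0] / h[2])
        vs.append(h[1] / h[2])
    if not us:
        return None
    # Round outward so the rectangle fully contains the projection image.
    return (math.floor(min(us)), math.floor(min(vs)),
            math.ceil(max(us)), math.ceil(max(vs)))
```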

In the rectangular projection image extraction unit 30, the occlusion area specifying unit 31 performs a ray search for each pixel of the projection image portion of each rectangular projection image, thereby specifying the occlusion area (gray) of each object at each viewpoint.
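The text does not detail the ray search, so the sketch below swaps in a simpler depth-buffer comparison that answers the same question: a pixel of an object is treated as occluded when another object has a voxel nearer to the camera at that pixel. It assumes that the third homogeneous coordinate can serve as depth (that is, P is suitably normalized); the name `occluded_pixels` is hypothetical.

```python
def occluded_pixels(objects, P):
    """Return {object name: set of (u, v) pixels hidden behind another object}.

    objects: dict mapping an object name to a list of (x, y, z) voxel centres.
    P: 3x4 projection matrix; h[2] is used as depth (an assumption).
    """
    nearest = {}  # name -> {pixel: smallest depth of that object at the pixel}
    for name, voxels in objects.items():
        depth_map = {}
        for x, y, z in voxels:
            h = [r[0] * x + r[1] * y + r[2] * z + r[3] for r in P]
            if h[2] <= 0:
                continue
            pixel = (round(h[0] / h[2]), round(h[1] / h[2]))
            depth_map[pixel] = min(depth_map.get(pixel, float("inf")), h[2])
        nearest[name] = depth_map

    occluded = {}
    for name, depth_map in nearest.items():
        occluded[name] = {
            pixel
            for pixel, depth in depth_map.items()
            if any(other.get(pixel, float("inf")) < depth
                   for other_name, other in nearest.items()
                   if other_name != name)
        }
    return occluded
```

A per-pixel ray march through the voxel space would give the same classification at voxel resolution while also yielding the projection source voxel of each pixel, which the later texture step needs.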

The texture acquisition unit 40 acquires the texture of each object from the camera image of each viewpoint and pastes it onto the projection image portion of each rectangular projection image. FIG. 4 illustrates the texture acquisition method of the texture acquisition unit 40, focusing on a rectangular projection image extracted from the camera image of viewpoint 2.

In the texture acquisition unit 40, the first texture acquisition unit 41 acquires as the texture, for the pixels outside the occlusion area (the non-occlusion area) in the projection image portion of a rectangular projection image, the pixel values of the corresponding pixels of the camera image from which that rectangular projection image was extracted (the extraction source camera image). The second texture acquisition unit 42 acquires as the texture, for the pixels in the occlusion area of the projection image portion, the pixel values of the corresponding pixels of a camera image other than the extraction source camera image.

In the example of FIG. 4, occlusion occurs between the two objects Oj1 and Oj2, and the texture of the leg of the object Oj1 cannot be acquired from the camera image of viewpoint 2, which is its extraction source camera image.

For each pixel in the occlusion area (gray) of the rectangular projection image, the second texture acquisition unit 42 specifies the projection source voxel Bt by ray search. It then projects the projection source voxel Bt onto the other camera images, determines whether the voxel is observable from each of them based on the positional relationship between the three-dimensional shape models of the objects Oj1 and Oj2, and acquires the pixel value of the corresponding pixel from an observable camera image.
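The observability test can be pictured as checking whether any occupied voxel lies between a camera and the projection source voxel Bt. The following is a rough sketch under several assumptions: voxels are unit cells on an integer grid, the camera position is known in the same coordinates, and the segment is sampled at a fixed step. The name `is_observable` and the `step` parameter are hypothetical.

```python
import math


def is_observable(voxel, camera_center, occupied, step=0.25):
    """True if no occupied voxel other than `voxel` lies between camera and voxel.

    voxel: integer (x, y, z) index of the projection source voxel.
    occupied: set of integer voxel indices of all object models.
    """
    length = math.dist(camera_center, voxel)
    samples = max(1, int(length / step))
    for i in range(1, samples):
        t = i / samples
        point = tuple(c + t * (v - c) for c, v in zip(camera_center, voxel))
        cell = tuple(round(a) for a in point)  # voxel containing the sample
        if cell != tuple(voxel) and cell in occupied:
            return False
    return True
```

Fixed-step sampling can miss a voxel that the ray only grazes; a grid traversal that visits every crossed cell would be more exact at the cost of a longer routine.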

When a plurality of cameras can observe the voxel, the camera nearest to the extraction source camera is specified, and the pixel value of the corresponding pixel is acquired preferentially from that camera. In the illustrated example, the texture of the leg of the object Oj1, which cannot be acquired from the camera video of viewpoint 2, can be acquired from either viewpoint 1 or viewpoint N, and it is acquired from the camera image of viewpoint 1, the nearest.
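The preference rule can be sketched as a nearest-neighbour choice among the observable cameras. The text does not define "nearest"; the sketch below assumes Euclidean distance between camera positions, and the names `nearest_observable_camera`, `camera_positions`, and `observable_ids` are hypothetical.

```python
import math


def nearest_observable_camera(source_id, camera_positions, observable_ids):
    """Pick the observable camera closest to the extraction source camera.

    camera_positions: dict mapping a camera id to its (x, y, z) position.
    observable_ids: ids of the cameras from which the voxel can be observed.
    Returns None when no other camera observes the voxel.
    """
    candidates = [c for c in observable_ids if c != source_id]
    if not candidates:
        return None
    source = camera_positions[source_id]
    return min(candidates, key=lambda c: math.dist(source, camera_positions[c]))
```

With cameras laid out as in the example of FIG. 4, where viewpoint 1 is adjacent to viewpoint 2 and viewpoint N is farther away, this choice returns viewpoint 1.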

The billboard generation unit 50 generates billboards of all objects Obj for each viewpoint. The texture image extraction unit 51 extracts, for each object at each viewpoint, the rectangular projection image together with its texture.

As shown in FIG. 5, the billboard size determination unit 52 specifies the projection source voxel of a pixel on the bottom edge of the rectangular projection image and calculates the size of each billboard based on the distance between that projection source voxel and the target camera and on the aspect ratio of the rectangular projection image. The billboard installation unit 53 places each generated billboard, at the calculated size, at the three-dimensional coordinates of its projection source voxel.
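The text does not give the size formula. One plausible reading, under a pinhole camera model, is that the width of the rectangle in pixels scales to world units by distance over focal length, and the height then follows from the aspect ratio. The sketch below assumes exactly that; the focal length in pixels, `focal_px`, is a parameter introduced for illustration and does not appear in the text.

```python
def billboard_size(rect_width_px, rect_height_px, distance, focal_px):
    """World-space (width, height) of a billboard under a pinhole model.

    distance: distance from the camera to the projection source voxel.
    focal_px: focal length in pixels (an assumed, hypothetical parameter).
    """
    width = rect_width_px * distance / focal_px
    height = width * rect_height_px / rect_width_px  # preserve the aspect ratio
    return width, height
```

For instance, a 50 x 100 pixel rectangle seen at distance 20 with a focal length of 1000 pixels would map to a billboard 1 unit wide and 2 units tall.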

The billboard display unit 60 estimates the camera nearest to the virtual viewpoint based on viewpoint operation information and selects and displays all billboards generated for that camera.

In the billboard display unit 60, the virtual viewpoint calculation unit 61 calculates the position and orientation of the virtual viewpoint based on viewpoint operation information, such as a change of the gazing point acquired from each camera, movement to a shooting camera viewpoint, forward and backward movement, left-right rotation, and up-down rotation.

As shown in FIG. 6, the billboard selection unit 62 sets in advance, for each pair of adjacent cameras (A, B), a boundary plane for camera selection. If the calculated virtual viewpoint points toward the region assigned to camera A, all billboards whose extraction source is the image of camera A are displayed. If it points toward the region assigned to camera B, all billboards whose extraction source is camera B are displayed.
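The boundary planes are not specified further. A simple stand-in is to place each boundary at the angular bisector between adjacent cameras as seen from the scene centre, which is equivalent to picking the camera whose azimuth is closest to that of the virtual viewpoint. The sketch below works in the ground plane under that assumption and uses the position of the virtual viewpoint rather than its viewing direction; `select_camera` and `center` are hypothetical names.

```python
import math


def select_camera(virtual_pos, camera_positions, center=(0.0, 0.0)):
    """Index of the camera whose azimuth about `center` is closest to the viewpoint's.

    virtual_pos and camera_positions hold 2D ground-plane coordinates (x, y).
    """
    def azimuth(p):
        return math.atan2(p[1] - center[1], p[0] - center[0])

    target = azimuth(virtual_pos)

    def angular_gap(i):
        d = abs(azimuth(camera_positions[i]) - target) % (2 * math.pi)
        return min(d, 2 * math.pi - d)  # wrap around the circle

    return min(range(len(camera_positions)), key=angular_gap)
```

Because each region is fixed by the camera layout, the selected camera changes only when the virtual viewpoint crosses a boundary, at which point the displayed set of billboards is swapped as a whole.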

DESCRIPTION OF SYMBOLS: 10 ... multi-viewpoint image input unit, 20 ... three-dimensional shape model generation unit, 21 ... camera parameter estimation unit, 22 ... mask image extraction unit, 23 ... three-dimensional shape estimation unit, 24 ... three-dimensional shape model restoration unit, 30 ... rectangular projection image extraction unit, 31 ... occlusion area specifying unit, 40 ... texture acquisition unit, 41 ... first texture acquisition unit, 42 ... second texture acquisition unit, 50 ... billboard generation unit, 51 ... texture image extraction unit, 52 ... billboard size determination unit, 53 ... billboard installation unit, 60 ... billboard display unit, 61 ... virtual viewpoint calculation unit, 62 ... billboard selection unit

Claims

An apparatus for generating and displaying a free viewpoint image from multi-viewpoint images, comprising:
means for individually generating a three-dimensional shape model of each object based on a plurality of camera images taken from different viewpoints;
means for projecting each three-dimensional shape model onto each viewpoint and extracting, for each viewpoint, a rectangular projection image including a projection image portion of each object;
means for acquiring, for each object, the texture of the rectangular projection image at each viewpoint from the corresponding camera image;
means for generating a billboard based on each rectangular projection image and its texture;
means for selecting, based on information specifying a free viewpoint, a camera whose billboards are to be displayed; and
means for displaying all billboards generated for the selected camera.

The free viewpoint image generation and display apparatus according to claim 1, wherein the means for generating the three-dimensional shape model of the object includes:
means for estimating the camera parameters of each camera;
means for extracting a mask image of an object in each camera image; and
means for restoring a Visual Hull of each object in a three-dimensional voxel space by a visual volume intersection method based on the camera parameters and the mask images,
and wherein the Visual Hull is adopted as the three-dimensional shape model.

The free viewpoint image generation and display apparatus according to claim 2, further comprising means for classifying the restored Visual Hull based on geometric connectivity in the voxel space,
wherein a class satisfying a predetermined condition among the classified Visual Hulls is adopted as the three-dimensional shape model.

The free viewpoint image generation and display apparatus according to any one of claims 1 to 3, wherein the means for extracting the rectangular projection image for each viewpoint includes means for specifying an occlusion area for each object at each viewpoint by performing a ray search for each pixel of the projection image portion of each rectangular projection image.

The free viewpoint image generation and display apparatus according to any one of claims 1 to 4, wherein the means for acquiring the texture comprises:
first texture acquisition means for acquiring, for a pixel outside the occlusion area in the projection image portion of the rectangular projection image, the pixel value of the corresponding pixel of the extraction source camera image; and
second texture acquisition means for acquiring, for a pixel in the occlusion area in the projection image portion of the rectangular projection image, the pixel value of the corresponding pixel of a camera image other than the extraction source camera image.

The free viewpoint image generation and display apparatus according to claim 5, wherein the second texture acquisition means includes:
means for specifying, for a pixel in the occlusion area, a projection source voxel on the three-dimensional shape model; and
means for projecting the projection source voxel onto a camera image different from the extraction source camera image and determining whether the voxel is observable,
and wherein the pixel value of the corresponding pixel is acquired from an observable camera image.

The free viewpoint image generation and display apparatus according to claim 6, wherein, when there are a plurality of observable cameras, the camera nearest to the extraction source camera is specified from among them, and the pixel value of the corresponding pixel of the image of that nearest camera is acquired.

The free viewpoint image generation and display apparatus according to claim 1, wherein the means for generating the billboard includes:
means for extracting a rectangular projection image including a texture for each object;
means for calculating the distance between a camera and the projection source voxel of a pixel on the bottom edge of the rectangular projection image;
means for determining the size of each billboard based on the distance and the aspect ratio of the rectangular projection image; and
means for placing the billboard of the determined size at the three-dimensional coordinates of the projection source voxel.

The free viewpoint image generation and display apparatus according to claim 1, wherein the means for displaying the billboards includes:
means for calculating a virtual viewpoint based on input information related to a viewpoint operation; and
means for displaying all billboards generated for the camera nearest to the virtual viewpoint.

A method in which a computer generates and displays a free viewpoint image from multi-viewpoint images, comprising:
individually generating a three-dimensional shape model of each object based on a plurality of camera images taken from different viewpoints;
projecting each three-dimensional shape model onto each viewpoint and extracting, for each viewpoint, a rectangular projection image including a projection image portion of each object;
acquiring, for each object, the texture of the rectangular projection image at each viewpoint from the corresponding camera image;
generating a billboard based on each rectangular projection image and its texture;
selecting, based on information specifying a free viewpoint, a camera whose billboards are to be displayed; and
displaying all billboards generated for the selected camera.

A program for generating and displaying a free viewpoint image from multi-viewpoint images, the program describing, so as to be executable by a computer:
a procedure for individually generating a three-dimensional shape model of each object based on a plurality of camera images taken from different viewpoints;
a procedure for projecting each three-dimensional shape model onto each viewpoint and extracting, for each viewpoint, a rectangular projection image including a projection image portion of each object;
a procedure for acquiring, for each object, the texture of the rectangular projection image at each viewpoint from the corresponding camera image;
a procedure for generating a billboard based on each rectangular projection image and its texture;
a procedure for selecting, based on information specifying a free viewpoint, a camera whose billboards are to be displayed; and
a procedure for displaying all billboards generated for the selected camera.