JP2022029730A

JP2022029730A - Three-dimensional (3d) model generation apparatus, virtual viewpoint video generation apparatus, method, and program

Info

Publication number: JP2022029730A
Application number: JP2020133176A
Authority: JP
Inventors: 良亮渡邊; Ryosuke Watanabe
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2020-08-05
Filing date: 2020-08-05
Publication date: 2022-02-18
Anticipated expiration: 2040-08-05
Also published as: JP7328942B2

Abstract

PROBLEM TO BE SOLVED: To provide a virtual viewpoint video that looks natural even in an environment where zoom-out cameras and zoom-in cameras coexist, when generating a 3D model of a subject and then generating a virtual viewpoint video.
SOLUTION: A silhouette image acquisition unit 101 acquires a silhouette image from the camera video of each zoom-in camera and zoom-out camera. A camera classification unit 103 classifies each camera as a zoom-in camera or a zoom-out camera. A 3D model generation unit 102 generates a 3D model of the subject by the visual volume intersection method from the camera videos of the zoom-in and zoom-out cameras, assuming that a silhouette of the subject exists outside the angle of view of the zoom-in camera. A boundary determination unit 104 determines whether each 3D model of a subject lies on the boundary of the angle of view of the zoom-in camera. A rendering server 20 performs texture mapping onto a 3D model lying on that boundary from the camera video of the zoom-out camera, including for the part inside the angle of view of the zoom-in camera.
SELECTED DRAWING: Figure 5

Description

The present invention relates to a 3D model generation apparatus, a virtual viewpoint video generation apparatus, a method, and a program that, when generating a 3D model of a subject and then a virtual viewpoint video, can provide a natural-looking virtual viewpoint video without losing the 3D model, even in an environment where zoom-out cameras and zoom-in cameras coexist.

Free-viewpoint (virtual viewpoint) video technology acquires video from multiple cameras and enables viewing from an arbitrary viewpoint, including viewpoints where no camera exists. One way to realize free-viewpoint video is the 3D-model-based generation method built on the visual volume intersection method disclosed in Non-Patent Document 1.

As shown in FIG. 11, the visual volume intersection method projects into 3D space the binary silhouette images obtained by extracting only the subject region from each camera video, and generates a 3D model by keeping only the intersection of those projections as a 3DCG model.

The visual volume intersection method is used as a base technology for realizing the full-model free-viewpoint method disclosed in Patent Document 1 (which faithfully represents the shape of the 3D model) and the billboard free-viewpoint method disclosed in Non-Patent Document 2 (which builds the 3D model as a flat plate called a billboard and maps the texture from a nearby camera onto it).

In the free-viewpoint production disclosed in Non-Patent Document 1, the 3D space in which the free-viewpoint video is to be produced is first filled with a voxel grid, that is, divided into a lattice of cubes. Next, the 3D position of each voxel is projected back onto the silhouette image of each camera, and the silhouette value at the corresponding position is looked up. Voxels whose silhouette is judged white (subject present) in many cameras are then modeled.
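The voxel procedure above can be sketched as follows. This is a minimal illustration, not the implementation of Non-Patent Document 1: the function name `carve`, the parameter `min_views`, and the two toy orthographic cameras are all assumptions introduced here.

```python
from itertools import product

def carve(voxel_centers, cameras, min_views):
    """Keep voxels judged 'subject present' (1) in at least min_views silhouettes.

    cameras is a list of (project, silhouette) pairs, where project maps a 3D
    point to integer pixel coordinates (u, v). Both are hypothetical stand-ins.
    """
    kept = []
    for p in voxel_centers:
        votes = 0
        for project, sil in cameras:
            u, v = project(p)
            inside = 0 <= v < len(sil) and 0 <= u < len(sil[0])
            if inside and sil[v][u] == 1:
                votes += 1
        if votes >= min_views:
            kept.append(p)
    return kept

# Toy setup (assumption): two orthographic cameras observing a 4x4x4 grid.
front = lambda p: (p[0], p[1])  # looks along Z, image axes are (x, y)
side = lambda p: (p[2], p[1])   # looks along X, image axes are (z, y)
square = [[1 if 1 <= x <= 2 and 1 <= y <= 2 else 0 for x in range(4)]
          for y in range(4)]
grid = list(product(range(4), repeat=3))

model = carve(grid, [(front, square), (side, square)], min_views=2)
```

Requiring both views keeps only the central 2x2x2 block. Lowering `min_views` to 1 keeps every voxel that either camera sees as subject, which is a larger and noisier set.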

Such free-viewpoint video has been used for applications such as watching sports interactively in real time from an arbitrary viewpoint and, exploiting the ability to create video from any viewpoint, producing immersive replay videos that follow a predetermined camera path.

Patent Document 1: Japanese Unexamined Patent Publication No. 2018-063635

Non-Patent Document 1: Laurentini, A., "The visual hull concept for silhouette based image understanding," IEEE Transactions on Pattern Analysis and Machine Intelligence, 16, 150-162, (1994).
Non-Patent Document 2: H. Sankoh, S. Naito, K. Nonaka, H. Sabirin, J. Chen, "Robust Billboard-based, Free-viewpoint Video Synthesis Algorithm to Overcome Occlusions under Challenging Outdoor Sport Scenes," Proceedings of the 26th ACM international conference on Multimedia, pp. 1724-1732, (2018).
Non-Patent Document 3: J. Chen, R. Watanabe, K. Nonaka, T. Konno, H. Sankoh, S. Naito, "A Fast Free-viewpoint Video Synthesis Algorithm for Sports Scenes," 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2019), WeAT17.2, (2019).
Non-Patent Document 4: Qiang Yao, Hiroshi Sankoh, Nonaka Keisuke, Sei Naito, "Automatic camera self-calibration for immersive navigation of free viewpoint sports video," 2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP), 1-6, (2016).
Non-Patent Document 5: C. Stauffer and W. E. L. Grimson, "Adaptive background mixture models for real-time tracking," 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 246-252 Vol. 2, (1999).

Camera placement is important in producing free-viewpoint video. For example, when one camera is zoomed in closely on the subject, a camera path that starts at the angle of view of a zoom-out camera and ends at that zoom-in camera can gradually approach the subject while keeping the texture sharp.

The closer a camera is to the subject, the sharper the texture it captures, but the smaller the area of the stadium it covers. In particular, when only certain cameras are zoomed in closely on the subject, subjects outside the angle of view of the zoom-in camera are not formed as 3D models and disappear.

FIG. 12 compares the region in which the intersection is formed when all cameras are zoom-out cameras [FIG. 12(a)] and when one zoom-in camera is included [FIG. 12(b)]. In FIG. 12(b) the intersection region is smaller than in FIG. 12(a), which shows that the 3D model of a subject outside the angle of view of the zoom-in camera may fail to form and disappear.

This technical problem can be solved by changing the camera-count threshold used for 3D model generation, for example by modeling any part that is visible from N-1 cameras in an environment with N cameras. However, the lower the required number of cameras, the more noise appears in the form of modeled regions that are not actually the subject, so changing the threshold carelessly can degrade quality.

Moreover, even if a 3D model is generated by controlling the camera-count threshold in this way, a rendering process is needed that maps texture from the zoom-in camera for subjects captured by the zoom-in camera, and from a zoom-out camera for subjects captured only by zoom-out cameras. In addition, for a subject lying on the angle-of-view boundary between the zoom-in and zoom-out cameras, a seam becomes clearly visible at the boundary, as shown in FIG. 13, because of the difference in resolution between the cameras and the difference in which camera is mapped. This makes the viewing experience feel unnatural.

An object of the present invention is to provide a 3D model generation apparatus, a free-viewpoint video generation apparatus, a method, and a program that, when producing a 3D model and then generating a virtual viewpoint video in an environment where zoom-out and zoom-in cameras coexist, do not lose the 3D model outside the angle of view of the zoom-in camera. This is achieved by treating the region outside the angle of view as containing the subject for a zoom-in camera and as not containing the subject for a zoom-out camera.

To achieve the above object, the present invention provides a 3D model generation apparatus that generates a 3D model from camera videos of a subject captured from a plurality of viewpoints, with the following configuration.

(1) The apparatus comprises means for classifying each camera as a zoom-in camera or a zoom-out camera, and means for generating a 3D model of the subject by the visual volume intersection method based on the camera videos of the zoom-in and zoom-out cameras, wherein the means for generating the 3D model regards the subject as present outside the angle of view of the zoom-in camera.

The present invention also provides a virtual viewpoint video generation apparatus that generates a virtual viewpoint video from camera videos of a subject captured from a plurality of viewpoints, with the following configuration.

(2) The apparatus comprises means for classifying each camera as a zoom-in camera or a zoom-out camera, means for generating a 3D model of the subject by the visual volume intersection method based on the camera videos of the zoom-in and zoom-out cameras, and means for synthesizing a virtual viewpoint video by mapping the texture of each camera video onto the 3D model, wherein the means for generating the 3D model regards the subject as present outside the angle of view of the zoom-in camera.

(3) The apparatus further comprises means for determining whether each 3D model of a subject lies on the angle-of-view boundary of the zoom-in camera, and, for a 3D model lying on that boundary, texture mapping is performed from the camera video of the zoom-out camera, including for the part inside the angle of view of the zoom-in camera.

(1) When generating a 3D model of the subject by the visual volume intersection method based on the camera videos of the zoom-in and zoom-out cameras, the 3D model generation apparatus of the present invention regards the subject as present outside the angle of view of the zoom-in camera. This prevents the loss of 3D models of subjects that are inside the angle of view of the zoom-out camera but outside that of the zoom-in camera.

(2) When generating a 3D model of the subject by the visual volume intersection method based on the camera videos of the zoom-in and zoom-out cameras, the virtual viewpoint video generation apparatus of the present invention regards the subject as present outside the angle of view of the zoom-in camera. This prevents the virtual viewpoint video from omitting subjects that are inside the angle of view of the zoom-out camera but outside that of the zoom-in camera.

(3) For a 3D model lying on the angle-of-view boundary of the zoom-in camera, the virtual viewpoint video generation apparatus of the present invention performs texture mapping from the camera video of the zoom-out camera, including for the part inside the angle of view of the zoom-in camera. This prevents the quality degradation that can occur when textures from both a zoom-in camera and a zoom-out camera are mapped onto a single 3D model.

FIG. 1 is a functional block diagram of a 3D model generation apparatus according to a first embodiment of the present invention.
FIG. 2 shows an example in which a reference outside the angle of view is handled differently for a zoom-out camera and a zoom-in camera.
FIG. 3 is a functional block diagram of a 3D model generation apparatus according to a second embodiment of the present invention.
FIG. 4 shows an example of camera parameters.
FIG. 5 is a functional block diagram of a virtual viewpoint video generation apparatus according to a third embodiment of the present invention.
FIG. 6 shows an example of dividing a 3D model into individual subjects using bounding boxes.
FIG. 7 shows a method of texture mapping onto each 3D model from the camera video selected according to the angle of view.
FIG. 8 is a functional block diagram of a virtual viewpoint video generation apparatus according to a fourth embodiment of the present invention.
FIG. 9 is a functional block diagram of a virtual viewpoint video generation apparatus according to a fifth embodiment of the present invention.
FIG. 10 shows an example of rewriting occlusion information based on the result of the boundary determination.
FIG. 11 shows how a 3D model is generated by the visual volume intersection method.
FIG. 12 shows an example in which a subject outside the angle of view of a zoom-in camera is not formed as a 3D model and disappears.
FIG. 13 shows an example in which video quality degrades because textures from a zoom-in camera and a zoom-out camera are mapped onto a single 3D model.

Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a functional block diagram showing the configuration of the main part of a 3D model production server 10 according to the first embodiment of the present invention. The description takes as an example a sports scene captured by N cameras Cam1 to CamN, some of which are zoom-in cameras and the rest zoom-out cameras.

The silhouette image acquisition unit 101 acquires from the silhouette image database 30 the silhouette images, one per camera video of the zoom-in and zoom-out cameras, that are used to generate a 3D model by the visual volume intersection method. To generate a 3D model by this method, it is desirable to acquire silhouette images from three or more cameras. A silhouette image is given as a binary mask image in which the subject region to be modeled is white (= 1) and all other regions are black (= 0). Any existing technique can be used to generate such silhouette images, a typical example being the background subtraction method disclosed in Non-Patent Document 5.

The 3D model generation unit 102 computes a 3D voxel model of the subject by the visual volume intersection method using the N silhouette images, based on the silhouette images acquired by the silhouette image acquisition unit 101 and on separately supplied camera classification information. Here, the camera classification information identifies whether each camera is a zoom-in camera or a zoom-out camera.

The visual volume intersection method obtains, as the visual hull VH(I), the common part of the view cones formed when the N silhouette images are projected into 3D world coordinates, and is expressed by Equation (1). Here, the set I is the set of silhouette images of the cameras, and V_i is the view cone computed from the silhouette image obtained from the i-th camera.
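Equation (1) itself is not reproduced in this text. Based only on the definitions given here (VH(I) as the common part of the view cones V_i over the silhouette set I), it presumably takes the standard visual hull form below; this is a reconstruction, not a copy of the original equation.

```latex
VH(I) = \bigcap_{i \in I} V_i \tag{1}
```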

The visual volume intersection method normally treats the silhouette images from all cameras in the same way. In this embodiment, however, the silhouette images of zoom-in cameras and those of zoom-out cameras are handled differently, based on the camera classification information.

Specifically, as shown in FIG. 2, whether to model each voxel placed in 3D space is decided by looking up the corresponding pixel in each silhouette image. When that lookup falls outside the angle of view (a region where the voxel is not captured by the camera), a zoom-out camera is regarded as showing no subject (black in the silhouette image), whereas a zoom-in camera is regarded as showing a subject (white in the silhouette image). As a result, a voxel outside the angle of view of the zoom-in camera can still be modeled, depending on the lookup results from the zoom-out cameras.
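The asymmetric handling of out-of-view lookups can be sketched as follows. This is a minimal illustration under stated assumptions: the names `lookup` and `carve_mixed`, the one-row silhouettes, and the toy projection functions are hypothetical, and a voxel is kept only when every camera returns 1.

```python
def lookup(silhouette, u, v, is_zoom_in):
    """Return the silhouette value at pixel (u, v).

    Outside the image, a zoom-in (close-up) camera is treated as white (1)
    and a zoom-out (wide) camera as black (0), as described for FIG. 2.
    """
    h, w = len(silhouette), len(silhouette[0])
    if 0 <= u < w and 0 <= v < h:
        return silhouette[v][u]
    return 1 if is_zoom_in else 0

def carve_mixed(voxel_centers, cameras):
    """cameras: list of (project, silhouette, is_zoom_in) triples (hypothetical)."""
    return [p for p in voxel_centers
            if all(lookup(sil, *project(p), is_zoom_in) == 1
                   for project, sil, is_zoom_in in cameras)]

# Toy scene (assumption): voxels on a line x = 0..5, subjects at x = 1 and x = 4.
voxels = [(x, 0, 0) for x in range(6)]
wide_sil = [[0, 1, 0, 0, 1, 0]]          # wide camera sees x = 0..5
zoom_sil = [[0, 1]]                      # zoom-in camera sees only x = 3..4
wide_proj = lambda p: (p[0], 0)
zoom_proj = lambda p: (p[0] - 3, 0)

proposed = carve_mixed(voxels, [(wide_proj, wide_sil, False),
                                (zoom_proj, zoom_sil, True)])
# Conventional handling: every camera treats out-of-view lookups as black.
conventional = carve_mixed(voxels, [(wide_proj, wide_sil, False),
                                    (zoom_proj, zoom_sil, False)])
```

In this toy scene, `proposed` keeps both subjects, while `conventional` loses the subject at x = 1 because it lies outside the zoom-in camera's image.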

In this embodiment, the subject is regarded as present outside the angle of view of the zoom-in camera regardless of whether a subject is actually there. For regions where no subject actually exists, however, the lookup result from the zoom-out cameras is negative, so only voxels where a subject actually exists are modeled. An accurate 3D model of the subject can therefore be generated even in an environment where zoom-in and zoom-out cameras coexist.

The voxel model generated in this way may be used as voxels, or it may be converted into a polygon model by a technique such as the marching cubes method. The description below assumes conversion into a polygon model.

The first embodiment above assumes that the camera classification information is supplied separately, but the present invention is not limited to this. As in the second embodiment shown in FIG. 3, a camera classification unit 103 that outputs camera classification information based on camera parameters may be provided, so that the classification result changes adaptively with the focal length, which varies with zoom operations and the like.

The camera classification unit 103 automatically classifies the N cameras as zoom-in or zoom-out cameras by using the camera parameters given by Equation (2).

The camera parameters are used to convert a point (X, Y, Z) in world coordinates into a 2D point (u, v) on the camera image. r₁₁ to r₃₃ form the rotation matrix representing the camera orientation, and t₁ to t₃ form the translation vector representing the camera position. Together these are called the extrinsic parameters of the camera.

f_x and f_y are the focal lengths in pixel units, which indicate the degree of zoom, and c_x and c_y are the principal point of the image, which is usually at or near the image center. Parameters such as the focal length and principal point are called the intrinsic parameters of the camera (they often also include distortion correction parameters, which are omitted here for simplicity).

s is the scale variable used to normalize the result to [u, v, 1]. These camera parameters can be computed in advance using the technique disclosed in Non-Patent Document 4. FIG. 4 shows an example of the camera parameters actually input.
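Equation (2) itself is not reproduced in this text. From the symbols defined here (intrinsics f_x, f_y, c_x, c_y; extrinsics r₁₁ to r₃₃ and t₁ to t₃; scale s), it presumably corresponds to the standard pinhole projection model without distortion. The form below is a reconstruction under that assumption.

```latex
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
=
\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix}
r_{11} & r_{12} & r_{13} & t_1 \\
r_{21} & r_{22} & r_{23} & t_2 \\
r_{31} & r_{32} & r_{33} & t_3
\end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \tag{2}
```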

Because f_x and f_y are focal lengths in pixel units that indicate the degree of zoom, a camera with large values is likely to be zoomed in closely. The camera classification unit 103 can therefore identify zoom-in cameras automatically by checking f_x and f_y.

The number of cameras classified as zoom-in cameras is not limited to one. All cameras (up to N) whose f_x and f_y exceed a given value may be classified as zoom-in cameras, or the M cameras (M ≤ N) with the largest f_x and f_y may be. Alternatively, L% of all cameras (L being an arbitrary constant from 0 to 100) may be classified as zoom-in cameras, taking those with the largest f_x and f_y first. The classification may also be based on the camera position computed from the extrinsic parameters rather than on f_x and f_y.
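The three focal-length rules above can be sketched as follows. The function names, the dictionary layout, and the choice of max(f_x, f_y) as a single zoom score are assumptions made for illustration; the text does not specify how f_x and f_y are combined.

```python
import math

def zoom_score(cam):
    # Assumption: rank cameras by the larger of the two pixel focal lengths.
    return max(cam["fx"], cam["fy"])

def classify_by_threshold(cams, f_min):
    """Zoom-in cameras are those whose score exceeds a fixed value."""
    return {name for name, cam in cams.items() if zoom_score(cam) > f_min}

def classify_top_m(cams, m):
    """Zoom-in cameras are the m cameras with the largest score."""
    ranked = sorted(cams, key=lambda name: zoom_score(cams[name]), reverse=True)
    return set(ranked[:m])

def classify_top_percent(cams, percent):
    """Zoom-in cameras are the top percent% of all cameras (rounded down)."""
    m = math.floor(len(cams) * percent / 100)
    return classify_top_m(cams, m)

# Hypothetical focal lengths in pixels.
cams = {
    "Cam1": {"fx": 1200.0, "fy": 1200.0},
    "Cam2": {"fx": 1250.0, "fy": 1250.0},
    "Cam3": {"fx": 4800.0, "fy": 4800.0},
    "Cam4": {"fx": 1300.0, "fy": 1300.0},
}
```

Every camera not returned by one of these functions would be treated as a zoom-out camera.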

Alternatively, the cameras may be classified by measuring the size at which a subject 3D model produced in a previous frame, or a generic 3D model prepared in advance such as a goalpost, appears in each camera. For example, a 3D model prepared in advance can be placed in the region captured by all cameras, including the zoom-in cameras, and the cameras classified by the size of the silhouette that appears when this model is projected toward each camera.

If the number of cameras changes substantially during production of the virtual viewpoint video, for example because of camera failures, the number of zoom-out cameras may become extremely small and most of the remaining cameras may be zoom-in cameras. In that case, because this embodiment regards every silhouette outside the angle of view of a zoom-in camera as containing a subject, 3D models that should not be modeled may be generated in the region outside the angle of view of the zoom-in cameras.

To avoid this, when the number of zoom-in cameras or their share of all cameras becomes high, the function of the 3D model generation unit 102 that regards the subject as present outside the angle of view of a zoom-in camera's silhouette image may be disabled, so that, as in the conventional technique, the subject is regarded as absent outside the angle of view.
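This fallback can be sketched as a small policy function. The name `out_of_view_values` and the default ratio limit of 0.5 are assumptions; the text only says the function is disabled when the number or share of zoom-in cameras becomes high, without giving a value.

```python
def out_of_view_values(is_zoom_in_flags, max_zoom_in_ratio=0.5):
    """Return, per camera, the value assumed for lookups outside the image.

    While zoom-in cameras stay within the (hypothetical) ratio limit, they
    get 1 (subject present) and zoom-out cameras get 0. Above the limit the
    special handling is disabled and every camera gets 0.
    """
    ratio = sum(is_zoom_in_flags) / len(is_zoom_in_flags)
    enabled = ratio <= max_zoom_in_ratio
    return [1 if (flag and enabled) else 0 for flag in is_zoom_in_flags]

normal = out_of_view_values([True, False, False, False])
degraded = out_of_view_values([True, True, True, False])
```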

According to this embodiment, when a 3D model of the subject is generated by the visual volume intersection method based on the camera videos of the zoom-in and zoom-out cameras, the subject is regarded as present outside the angle of view of the zoom-in camera. This prevents the loss of 3D models of subjects that are inside the angle of view of the zoom-out camera but outside that of the zoom-in camera.

FIG. 5 is a functional block diagram showing the configuration of the main part of a virtual viewpoint video generation apparatus 1 according to a third embodiment of the present invention. Its main components are the 3D model production server 10 and a rendering server 20.

The 3D model production server 10 includes the silhouette image acquisition unit 101, the 3D model generation unit 102, and the camera classification unit 103, as in the first or second embodiment. It regards the subject as present outside the angle of view of the zoom-in camera, whether or not a subject is actually there, and generates the 3D model by treating the out-of-view part of that camera's silhouette image as white (subject present).

The boundary determination unit 104 determines whether each 3D model generated by the 3D model generation unit 102 lies on the angle-of-view boundary of the zoom-in camera. In this embodiment, as shown in FIG. 6, a 3D bounding box is defined around each independent connected mass of the 3D model, and the determination of whether the 3D model lies on the angle-of-view boundary is made per bounding box.

In the bounding-box test, if all eight vertices are inside the angle of view of the zoom-in camera, or all eight are outside it, the 3D bounding box is judged not to lie on the angle-of-view boundary. Because only eight vertices need to be checked, the bounding-box test is very fast.
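The eight-vertex test can be sketched as follows. For brevity the camera is a hypothetical pinhole camera at the world origin looking along +Z with no rotation, so only the intrinsic parameters are used; a real setup would also apply the extrinsic parameters. All names are assumptions.

```python
from itertools import product

def project(point, cam):
    """Project a 3D point with a simplified pinhole model (camera at origin)."""
    X, Y, Z = point
    if Z <= 0:
        return None  # behind the camera, never in view
    return (cam["fx"] * X / Z + cam["cx"], cam["fy"] * Y / Z + cam["cy"])

def in_view(point, cam):
    uv = project(point, cam)
    return (uv is not None
            and 0 <= uv[0] < cam["width"]
            and 0 <= uv[1] < cam["height"])

def box_corners(lo, hi):
    """Return the 8 vertices of the axis-aligned box spanned by lo and hi."""
    return [tuple(hi[i] if bit else lo[i] for i, bit in enumerate(bits))
            for bits in product((0, 1), repeat=3)]

def box_on_boundary(lo, hi, cam):
    """True when some, but not all, of the 8 vertices are in view."""
    flags = [in_view(corner, cam) for corner in box_corners(lo, hi)]
    return any(flags) and not all(flags)

# Hypothetical zoom-in camera: 640x480 image, 1000-pixel focal length.
zoom_cam = {"fx": 1000.0, "fy": 1000.0, "cx": 320.0, "cy": 240.0,
            "width": 640, "height": 480}

centered = box_on_boundary((-0.1, -0.1, 5.0), (0.1, 0.1, 5.2), zoom_cam)
straddling = box_on_boundary((1.5, -0.1, 5.0), (1.7, 0.1, 5.2), zoom_cam)
far_outside = box_on_boundary((3.0, -0.1, 5.0), (3.2, 0.1, 5.2), zoom_cam)
```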

However, a 3D bounding box does not exactly match the shape of the 3D model. A boundary determination error can therefore occur when the enclosed 3D model is fully inside the angle of view of the zoom-in camera but a vertex of the bounding box falls outside it.

For accuracy, it is preferable to test the voxel model enclosed in the 3D bounding box rather than the box itself. For example, the center points of all voxels in the voxel model are projected toward the zoom-in camera, and if some center points fall inside its angle of view and others do not, the subject is judged to lie in the boundary region. When there are multiple zoom-in cameras, the boundary determination may be computed for every combination, giving (number of subjects) × (number of zoom-in cameras) results.
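The voxel-level test can be sketched as follows. The in-view check is passed in as a predicate so that the sketch stays independent of any particular camera model; the names `model_on_boundary` and `boundary_table`, and the toy predicate, are assumptions.

```python
def model_on_boundary(voxel_centers, in_view):
    """True when some voxel centers are in view and others are not."""
    seen_inside = seen_outside = False
    for p in voxel_centers:
        if in_view(p):
            seen_inside = True
        else:
            seen_outside = True
        if seen_inside and seen_outside:
            return True  # early exit once both kinds have been found
    return False

def boundary_table(models, zoom_in_views):
    """One result per (subject, zoom-in camera) pair."""
    return {(m, c): model_on_boundary(voxels, pred)
            for m, voxels in models.items()
            for c, pred in zoom_in_views.items()}

# Toy data (assumption): the zoom-in camera sees points with 0 <= x < 10.
views = {"CamZ": lambda p: 0 <= p[0] < 10}
models = {
    "player_a": [(2, 0, 0), (3, 0, 0)],    # fully inside
    "player_b": [(9, 0, 0), (10, 0, 0)],   # straddles the boundary
    "player_c": [(12, 0, 0)],              # fully outside
}
table = boundary_table(models, views)
```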

The rendering server 20 renders a composite video viewed from a virtual viewpoint, using the shape information of the subject 3D model produced by the 3D model production server 10 and each camera video (texture). In this embodiment, free-viewpoint rendering with a full model is performed.

The rendering server 20 may run on the same computer as the 3D model production server 10 or on a separate server. In general, a 3D model needs to be computed only once for a given frame, so it can be computed quickly on a high-end PC or the like and stored, then distributed to virtual viewpoint viewing terminals equipped with a rendering function. With this configuration, a single high-end PC can deliver video to many terminals, including low-spec ones.

In the rendering server 20, the virtual viewpoint selection unit 201 detects a viewpoint selection operation by the operator and acquires the position and orientation of the virtual viewpoint p_v. The boundary-dependent mapping unit 202 maps textures from each camera video onto each polygon of the 3D model based on the virtual viewpoint p_v and the boundary determination result. The virtual viewpoint video output unit 203 outputs the rendered composite video as the virtual viewpoint video.

FIG. 7 schematically shows the texture mapping method of the boundary-dependent mapping unit 202. For a 3D model determined to fit within the angle of view of the close-up camera without straddling the angle-of-view boundary, only textures extracted from the close-up camera video are mapped. Likewise, for a 3D model determined to fit within the angle of view of the pull camera without straddling the boundary, only textures extracted from the pull camera video are mapped.

Among the 3D models determined to fit within the angle of view of the pull camera without straddling the boundary, those that also fit within the angle of view of the close-up camera may have textures mapped from the close-up camera only.

In contrast, for a 3D model determined to lie on the angle-of-view boundary of the close-up camera, only textures extracted from the pull camera video are mapped, including for the region that falls within the angle of view of the close-up camera. This prevents the degradation in video quality that can occur when textures from both the close-up camera and the pull camera are mapped onto a 3D model on the boundary.
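The mapping rules of FIG. 7 reduce to a small decision, sketched below under the assumption that the boundary determination has already produced two booleans per 3D model. The function name and the string labels are hypothetical, and the sketch adopts the optional variant in which a model fully inside the close-up view is textured from the close-up camera only.

```python
def texture_source(on_closeup_boundary, inside_closeup_view):
    """Choose which camera group supplies textures for one 3D model.

    on_closeup_boundary: the model straddles the close-up camera's
        angle-of-view boundary.
    inside_closeup_view: the model lies entirely within the close-up
        camera's angle of view.
    """
    if on_closeup_boundary:
        # Boundary models are textured from the pull camera only,
        # even for the part that the close-up camera can see.
        return "pull"
    if inside_closeup_view:
        return "closeup"
    return "pull"
```

Keeping a single texture source per model is what avoids visible seams between close-up and pull textures on the same subject.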

FIG. 8 is a functional block diagram showing the configuration of the main part of the virtual viewpoint video generation device 1 according to the fourth embodiment of the present invention; reference numerals identical to those above denote identical or equivalent parts. This embodiment is characterized in that the 3D model generation unit 102 includes a low-resolution voxel model generation unit 102a and a high-resolution voxel model generation unit 102b, and the boundary determination is performed on the low-resolution voxel model.

The low-resolution voxel model generation unit 102a generates a voxel model on a coarse voxel grid with unit voxel size M. M is larger than the unit voxel size L used by the high-resolution voxel model generation unit 102b and is set to, for example, M = 5 cm. In this embodiment, a voxel grid with unit voxel size M is laid out over the target range of 3D model generation (for a sports video, for example, the field where the sport is played), and whether each voxel of this grid forms part of the 3D model is determined by the visual volume intersection method.
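The per-voxel test of the visual volume intersection method, with the region outside a close-up camera's angle of view regarded as silhouette, can be sketched as follows. Several things here are assumptions for illustration only: the simplified pinhole camera (axis-aligned, looking along +z), the dictionary layout, the `in_silhouette` callback, and the choice to carve a voxel that a pull camera cannot see.

```python
def project(point, cam):
    """Pinhole projection; returns (u, v), or None when outside the image."""
    x, y, z = (p - c for p, c in zip(point, cam["pos"]))
    if z <= 0:
        return None
    u = cam["focal"] * x / z + cam["width"] / 2.0
    v = cam["focal"] * y / z + cam["height"] / 2.0
    if 0.0 <= u < cam["width"] and 0.0 <= v < cam["height"]:
        return u, v
    return None

def voxel_is_occupied(center, cameras):
    """Keep a voxel only if every camera agrees it may belong to a subject."""
    for cam in cameras:
        pixel = project(center, cam)
        if pixel is None:
            if cam["kind"] == "closeup":
                # Outside a close-up view: regarded as silhouette,
                # so this camera does not carve the voxel away.
                continue
            return False  # assumption: an unseen voxel is carved by a pull camera
        if not cam["in_silhouette"](*pixel):
            return False
    return True
```

Without the `closeup` exception, any subject outside a close-up camera's view would be carved away entirely, which is the disappearance the invention is meant to avoid.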

Next, on the resulting coarse voxel model, connected voxels are repeatedly regarded as belonging to the same subject, so that each connected cluster of the coarse voxel model is labeled.
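The labeling step is a connected-component search over occupied voxels. The sketch below uses breadth-first search with 6-connectivity (face neighbors); the connectivity rule is an assumption, since the text does not state one, and `label_clusters` is a hypothetical name.

```python
from collections import deque

# Face neighbors of a voxel (6-connectivity, an assumption).
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def label_clusters(occupied):
    """Assign one integer label per connected cluster of voxels.

    occupied: set of (i, j, k) integer indices of occupied coarse voxels.
    Returns a dict mapping each voxel index to its cluster label.
    """
    labels = {}
    next_label = 0
    for seed in sorted(occupied):
        if seed in labels:
            continue
        labels[seed] = next_label
        queue = deque([seed])
        while queue:
            i, j, k = queue.popleft()
            for di, dj, dk in NEIGHBOURS:
                n = (i + di, j + dj, k + dk)
                if n in occupied and n not in labels:
                    labels[n] = next_label
                    queue.append(n)
        next_label += 1
    return labels
```

Each label then corresponds to one subject candidate for which a 3D bounding box can be defined.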

Next, a 3D bounding box enclosing each cluster thus obtained is defined, a voxel grid with unit voxel size L is generated only inside this 3D bounding box, and fine voxel generation is performed in the same manner as above. This two-stage voxel generation method is disclosed in Non-Patent Document 3. The boundary determination unit 104 performs the boundary determination per 3D bounding box generated by the low-resolution voxel model generation unit 102a.

If the determination is made at the low-resolution voxel model stage in this way, it can proceed in parallel without waiting for the result of high-resolution voxel model generation, which speeds up processing. The present invention does not, however, preclude performing the boundary determination on the high-resolution voxel model. Doing so yields a more precise model shape than the low-resolution voxel model, so the boundary determination can be more accurate.
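The second stage of the two-stage voxel generation can be sketched as follows: compute the bounding box of one coarse cluster, then lay a fine grid only inside it. Lengths are integers in millimeters to keep the grid arithmetic exact; M = 50 mm follows the example in the text, while L = 10 mm and the function names are assumptions for illustration.

```python
def cluster_bbox(cluster, m):
    """Axis-aligned box (min corner, max corner) enclosing a coarse cluster.

    cluster: iterable of (i, j, k) coarse voxel indices.
    m: coarse voxel size in integer millimeters.
    """
    voxels = list(cluster)
    lo = tuple(min(v[a] for v in voxels) * m for a in range(3))
    hi = tuple((max(v[a] for v in voxels) + 1) * m for a in range(3))
    return lo, hi

def fine_grid(bbox, l):
    """Centers of fine voxels of size l (integer mm) filling the box."""
    (x0, y0, z0), (x1, y1, z1) = bbox
    half = l / 2.0
    return [
        (x + half, y + half, z + half)
        for x in range(x0, x1, l)
        for y in range(y0, y1, l)
        for z in range(z0, z1, l)
    ]
```

Restricting the fine grid to the bounding boxes is what keeps the high-resolution pass affordable, since the empty parts of the field are never subdivided.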

FIG. 9 is a functional block diagram showing the configuration of the main part of the virtual viewpoint video generation device 1 according to the fifth embodiment of the present invention; reference numerals identical to those of the fourth embodiment denote identical or equivalent parts, and their description is omitted.

This embodiment is characterized in that the 3D model production server 10 includes an occlusion information generation unit 105, the occlusion information is rewritten based on the determination result of the boundary determination unit 104, and the boundary-dependent mapping unit 202 of the rendering server 20 maps textures based on the rewritten occlusion information.

The occlusion information generation unit 105 generates occlusion information that, for each vertex of the 3D model, classifies the cameras into those from which the vertex is visible and those from which it is not. In an environment with N cameras, as in this embodiment, N items of occlusion information are computed per vertex of the 3D model, recording, for example, "1" for a camera from which the vertex is visible and "0" for one from which it is not.
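One compact way to hold the N flags of a vertex is a bit mask with one bit per camera. This is only a sketch of a possible representation; the bit order (camera 0 in the least significant bit) and the helper names are assumptions.

```python
def pack_visibility(flags):
    """Pack per-camera visibility flags of one vertex into an integer.

    flags[i] is True when camera i sees the vertex.
    Assumption: camera i is stored in bit i (camera 0 = least significant bit).
    """
    mask = 0
    for cam, visible in enumerate(flags):
        if visible:
            mask |= 1 << cam
    return mask

def is_visible(mask, cam):
    """True when the vertex with this mask is visible from camera `cam`."""
    return bool((mask >> cam) & 1)
```

For three cameras where only cameras 0 and 2 see the vertex, `pack_visibility([True, False, True])` gives the mask `0b101`.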

For example, when two players overlap in a soccer scene and player A hides player B in a certain camera video, textures must be mapped so that player A's texture does not appear on player B's 3D model. In such a case, for the vertices in the occluded part of player B's 3D model, the occlusion information for that camera is recorded as "invisible". This occlusion information is computed using, for example, a depth-map-based method such as that of Patent Document 1.

The boundary-dependent mapping unit 202 selects two cameras (c_1, c_2) near the virtual viewpoint according to the boundary determination result and maps their camera videos onto polygon g of the 3D model. Specifically, if the 3D model to be mapped is not on the angle-of-view boundary and lies entirely within the angle of view of the close-up camera, the two cameras nearest the virtual viewpoint are selected from among the close-up cameras. If instead the 3D model is on the angle-of-view boundary, or lies entirely within the angle of view of the pull camera, the two cameras nearest the virtual viewpoint are selected from among the pull cameras.

In this embodiment, as preprocessing, the visibility of a polygon g is determined using the occlusion information of the three vertices that make up the polygon (three vertices applies when the 3D model consists of triangular polygons; in practice the number depends on the vertex count of each polygon).

For example, let g_c1 denote the visibility flag of polygon g for camera c_1. If all three vertices of polygon g are visible, g_c1 is set to visible; if any one of them is invisible, g_c1 is set to invisible. In this embodiment, texture mapping is performed as follows according to the per-camera polygon visibility results.
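The polygon-level flag follows directly from the vertex-level occlusion information. The sketch below assumes each vertex carries an integer bit mask with one bit per camera (bit i set when camera i sees the vertex); that representation and the function name are assumptions.

```python
def polygon_visible(vertex_masks, cam):
    """Visibility flag of one polygon for camera `cam`.

    vertex_masks: one integer bit mask per vertex of the polygon.
    The polygon is visible only if every one of its vertices is visible.
    """
    return all((mask >> cam) & 1 for mask in vertex_masks)
```

Because the function iterates over whatever vertices it is given, it also covers polygons with more than three vertices.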

Case 1: The visibility flags g_c1 and g_c2 of cameras c_1 and c_2 for polygon g are both "visible"
Mapping by alpha blending is performed based on the following equation (3).

Here, texture_c1(g) and texture_c2(g) denote the camera video regions corresponding to polygon g in cameras c_1 and c_2, and texture(g) denotes the texture mapped onto the polygon. The alpha blend ratio a is computed from the ratio of the distances (angles) between the virtual viewpoint p_v and the camera positions p_c1 and p_c2.
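Equation (3) itself is not reproduced in this text. A standard two-camera alpha blend consistent with the definitions above would take the following form; which camera receives a and which receives 1 − a is an assumption here, not something the text confirms.

```latex
\mathrm{texture}(g) = a \cdot \mathrm{texture}_{c_1}(g) + (1 - a) \cdot \mathrm{texture}_{c_2}(g),
\qquad 0 \le a \le 1
```

Under this form, a = 1 uses the texture of c_1 alone and a = 0 uses the texture of c_2 alone.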

Case 2: Only one of the visibility flags g_c1 and g_c2 is visible
Polygon g is rendered using only the texture of the camera from which it is visible. That is, in equation (3), the ratio a applied to the texture_ci of the visible camera is set to 1. Alternatively, a third camera c_3, the next nearest to the virtual viewpoint p_v, is referenced in place of the invisible camera, and mapping is performed by alpha blending based on equation (3) as in Case 1.

Case 3: Both visibility flags g_c1 and g_c2 are invisible
Other cameras near the virtual viewpoint p_v (in general, those with a close angle) are selected repeatedly until at least one visibility flag becomes visible, and the texture at the reference pixel position of each camera video is mapped onto polygon g by alpha blending based on equation (3), as in Case 1.
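The three cases can be combined into one weight-selection routine, sketched below as one possible reading of the text. The assumptions are: cameras are examined in pairs in order of closeness to the virtual viewpoint, the blend ratio is an inverse-distance weight, and Case 2 uses the single-camera variant rather than substituting a third camera. The function name and the `dist` and `visible` dictionaries are hypothetical.

```python
def blend_weights(dist, visible):
    """Return {camera id: weight} for texturing one polygon.

    dist: camera id -> distance (or angle) to the virtual viewpoint.
    visible: camera id -> polygon visibility flag for that camera.
    """
    ranked = sorted(dist, key=dist.get)  # nearest cameras first
    for i in range(0, len(ranked), 2):
        pair = [c for c in ranked[i:i + 2] if visible[c]]
        if len(pair) == 2:  # Case 1: blend both cameras
            c1, c2 = pair
            total = dist[c1] + dist[c2]
            # Assumption: the nearer camera gets the larger weight.
            a = 0.5 if total == 0 else dist[c2] / total
            return {c1: a, c2: 1.0 - a}
        if len(pair) == 1:  # Case 2: only one camera is usable
            return {pair[0]: 1.0}
        # Case 3: neither is visible, so try the next nearest cameras
    return {}  # invisible from every camera: no texture is mapped
```

An empty result corresponds to a polygon that no camera can see, for which no texture is mapped.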

In the above embodiment the number of nearby cameras initially referenced is two, but this may be changed by user setting. In that case, according to the initial number of reference cameras b, equation (1) above is extended to a linear sum over b cameras (with weights summing to 1). No texture is mapped onto polygons that are invisible from all cameras.
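The extension to b reference cameras can be sketched as a normalized weighted sum. The text only requires that the weights sum to 1; the inverse-distance weighting used here, and the function name, are assumptions for illustration.

```python
def linear_blend_weights(dist, visible, b):
    """Weights over the b nearest cameras, normalized to sum to 1.

    dist: camera id -> distance (or angle) to the virtual viewpoint.
    visible: camera id -> polygon visibility flag for that camera.
    Cameras that cannot see the polygon are dropped before normalizing.
    """
    nearest = sorted(dist, key=dist.get)[:b]
    usable = [c for c in nearest if visible[c]]
    if not usable:
        return {}  # no texture is mapped
    # Assumption: inverse-distance weights; the floor avoids division by zero.
    inverse = {c: 1.0 / max(dist[c], 1e-9) for c in usable}
    total = sum(inverse.values())
    return {c: w / total for c, w in inverse.items()}
```

With b = 2 and both cameras visible, this reduces to a two-camera blend in which the nearer camera receives the larger weight.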

In this embodiment, the occlusion information is rewritten according to the determination result of the boundary determination unit 104, so that the boundary-dependent mapping unit 202 can map textures from the appropriate camera videos, reflecting both occlusion and boundary conditions, simply by referring to the occlusion information.

FIG. 10 shows an example of rewriting the occlusion information. Here the visible/invisible (occluded) state for the close-up camera is assigned to the least significant bit, and for each polygon of a 3D model located on the boundary, this bit is always rewritten to "0", indicating the occluded state, regardless of whether each vertex is actually visible or invisible (occluded).
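The rewrite amounts to clearing one bit in each vertex's occlusion mask. The sketch below follows the bit layout described for FIG. 10 (close-up camera in the least significant bit) and assumes a single close-up camera and integer masks; the constant and function names are hypothetical.

```python
CLOSEUP_BIT = 0  # close-up camera occupies the least significant bit

def rewrite_for_boundary(vertex_masks, on_boundary):
    """Return occlusion masks with the close-up bit cleared for boundary models.

    vertex_masks: integer bit masks, one per vertex of the 3D model.
    on_boundary: result of the boundary determination for this model.
    """
    if not on_boundary:
        return list(vertex_masks)
    return [mask & ~(1 << CLOSEUP_BIT) for mask in vertex_masks]
```

After the rewrite, a renderer that consults only the occlusion masks never picks the close-up camera for a boundary model, so no separate boundary flag has to be passed to it.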

1 ... virtual viewpoint video generation device, 10 ... 3D model production server, 20 ... rendering server, 101 ... silhouette image acquisition unit, 102 ... 3D model generation unit, 102a ... low-resolution voxel model generation unit, 102b ... high-resolution voxel model generation unit, 103 ... camera classification unit, 104 ... boundary determination unit, 105 ... occlusion information generation unit, 201 ... virtual viewpoint selection unit, 202 ... boundary-dependent mapping unit, 203 ... virtual viewpoint video output unit

Claims

A 3D model generation device that generates a 3D model based on camera videos of a subject taken from multiple viewpoints, comprising:
a means for classifying each camera as a close-up camera or a pull camera; and
a means for generating a 3D model of the subject by the visual volume intersection method based on the camera videos of the close-up camera and the pull camera,
wherein the means for generating the 3D model regards the subject as existing outside the angle-of-view range of the close-up camera.

The 3D model generation device according to claim 1, further comprising a means for generating a silhouette image from each camera video of the close-up camera and the pull camera,
wherein the means for generating the 3D model constructs a voxel model by regarding the silhouette of the subject as existing outside the angle-of-view range of the close-up camera.

The 3D model generation device according to claim 1 or 2, wherein the means for classifying classifies each camera as a close-up camera or a pull camera based on camera parameters.

A virtual viewpoint video generation device that generates a virtual viewpoint video based on camera videos of a subject taken from multiple viewpoints, comprising:
a means for classifying each camera as a close-up camera or a pull camera;
a means for generating a 3D model of the subject by the visual volume intersection method based on the camera videos of the close-up camera and the pull camera; and
a means for synthesizing a virtual viewpoint video by mapping textures of each camera video onto the 3D model,
wherein the means for generating the 3D model regards the subject as existing outside the angle-of-view range of the close-up camera.

The virtual viewpoint video generation device according to claim 4, further comprising a means for generating a silhouette image from each camera video of the close-up camera and the pull camera,
wherein the means for generating the 3D model constructs a voxel model by regarding the silhouette of the subject as existing outside the angle-of-view range of the close-up camera.

The virtual viewpoint video generation device according to claim 4, wherein the means for classifying classifies each camera as a close-up camera or a pull camera based on camera parameters.

The virtual viewpoint video generation device according to any one of claims 4 to 6, further comprising a means for determining whether each 3D model of the subject is on the angle-of-view boundary of the close-up camera,
wherein a 3D model existing on the angle-of-view boundary is texture-mapped from the camera video of the pull camera, including the part inside the angle of view of the close-up camera.

The virtual viewpoint video generation device according to claim 7, wherein the determination means determines whether a 3D bounding box enclosing each 3D model exists on the angle-of-view boundary of the close-up camera.

The virtual viewpoint video generation device according to any one of claims 4 to 8, wherein the means for generating the 3D model comprises:
a means for constructing a low-resolution voxel model based on the silhouette of the subject; and
a means for constructing a high-resolution voxel model within the region of the low-resolution voxel model based on the silhouette of the subject,
and wherein the determination means determines, for each 3D model, whether its low-resolution voxel model exists on the angle-of-view boundary of the close-up camera.

The virtual viewpoint video generation device according to claim 9, wherein the means for generating the 3D model determines whether a 3D bounding box enclosing each low-resolution voxel model exists on the angle-of-view boundary of the close-up camera.

The virtual viewpoint video generation device according to any one of claims 4 to 10, wherein the 3D model is a polygon model,
the device further comprising a means for generating occlusion information that records whether each polygon of each 3D model is visible or invisible from each camera,
wherein the means for synthesizing the virtual viewpoint video maps a texture onto each polygon from a camera from which the polygon is visible,
and wherein the occlusion information of the close-up camera for the polygons of a 3D model existing on the angle-of-view boundary is rewritten to invisible.

A virtual viewpoint video generation method in which a computer generates a virtual viewpoint video based on camera videos of a subject taken from multiple viewpoints, comprising:
classifying each camera as a close-up camera or a pull camera;
generating a 3D model of the subject by the visual volume intersection method based on the camera videos of the close-up camera and the pull camera; and
synthesizing a virtual viewpoint video by mapping textures of each camera video onto the 3D model,
wherein, when the 3D model is generated, the subject is regarded as existing outside the angle-of-view range of the close-up camera.

The virtual viewpoint video generation method according to claim 12, further comprising determining whether each 3D model of the subject is on the angle-of-view boundary of the close-up camera,
wherein a 3D model existing on the angle-of-view boundary is texture-mapped from the camera video of the pull camera, including the part inside the angle of view of the close-up camera.

A virtual viewpoint video generation program that generates a virtual viewpoint video based on camera videos of a subject taken from multiple viewpoints, the program causing a computer to execute:
a procedure for classifying each camera as a close-up camera or a pull camera;
a procedure for generating a 3D model of the subject by the visual volume intersection method based on the camera videos of the close-up camera and the pull camera; and
a procedure for synthesizing a virtual viewpoint video by mapping textures of each camera video onto the 3D model,
wherein, in the procedure for generating the 3D model, the subject is regarded as existing outside the angle-of-view range of the close-up camera.

The virtual viewpoint video generation program according to claim 14, further including a procedure for determining whether each 3D model of the subject is on the angle-of-view boundary of the close-up camera,
wherein a 3D model existing on the angle-of-view boundary is texture-mapped from the camera video of the pull camera, including the part inside the angle of view of the close-up camera.