JP7319939B2

JP7319939B2 - Free-viewpoint video generation method, device, and program

Info

Publication number: JP7319939B2
Application number: JP2020053507A
Authority: JP
Inventors: 良亮渡邊
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2020-03-25
Filing date: 2020-03-25
Publication date: 2023-08-02
Anticipated expiration: 2040-03-25
Also published as: JP2021152828A

Description

The present invention relates to a method, apparatus, and program for generating a free-viewpoint video from a plurality of camera images with different viewpoints, and in particular to a free-viewpoint video generation method, apparatus, and program that generate a 3D model with no defects in occluded portions and realize appropriate texture mapping onto those occluded portions.

Free-viewpoint video technology enables viewing from an arbitrary viewpoint, including viewpoints where no camera exists, based on videos from multiple cameras with different viewpoints. One way to realize free-viewpoint video is the 3D-model-based generation method built on the visual hull method (volume intersection) disclosed in Non-Patent Document 1.

As shown in FIG. 10, the visual hull method uses binary silhouette images, each obtained by extracting only the subject from the video of a camera cam. The silhouette image of each camera cam is projected into 3D space to obtain a visual cone, and only the intersection of these cones is kept as the 3DCG model.

The visual hull method serves as a base technology for the full-model free viewpoint disclosed in Non-Patent Document 2 (a scheme that faithfully represents the shape of the 3D model) and for the billboard free viewpoint disclosed in Non-Patent Document 3 (a scheme that builds the 3D model as a flat board called a billboard and maps the texture from a nearby camera onto it).

Background-subtraction-based methods, typified by Non-Patent Document 4, are known for extracting the silhouette images whose intersection the visual hull method computes. Background subtraction extracts the subject from the difference between the input image and a background model, that is, a model of the scene with no subject present.
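The idea of background subtraction can be sketched as follows. This is a minimal illustration only: it assumes grayscale images stored as nested lists, a static background image, and a fixed threshold, whereas Non-Patent Document 4 uses an adaptive mixture model. The function name `subject_silhouette` and the threshold value are hypothetical.

```python
def subject_silhouette(frame, background, threshold=30):
    """Return a binary silhouette: 255 where the frame differs from the
    background by more than `threshold`, 0 elsewhere."""
    return [
        [255 if abs(p - b) > threshold else 0 for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


# Toy 2x4 grayscale images (assumed values, for illustration only).
background = [[10, 10, 10, 10],
              [10, 10, 10, 10]]
frame = [[10, 200, 190, 12],
         [11, 10, 180, 10]]

silhouette = subject_silhouette(frame, background)
print(silhouette)
```

A stationary structure such as a goalpost is part of `background`, so pixels where it covers a player produce no foreground, which is the defect discussed in the following paragraphs.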

In sports scenes, for example, structures that do not move may be present on the field (such as soccer goalposts or a volleyball net). When the visual hull method is applied to silhouette images obtained by background-subtraction-based extraction, such structures can degrade the quality of the free-viewpoint video.

For example, when a structure such as a goalpost lies in front of a subject such as an athlete, the structure is stationary and is therefore classified as background by background subtraction, so the part of the subject it covers cannot be extracted as a silhouette.

In the visual hull method, a voxel under test is modeled when the silhouette pixels corresponding to it are judged to be foreground in a sufficient number of cameras. Lowering this camera-count threshold makes it easier for wrong regions to be modeled, so in practice a voxel is often modeled only when it is judged foreground in all cameras. Consequently, if a structure causes a defect in a silhouette, the subject located behind the structure as seen from that camera may be lost from the model, as shown in FIG. 11.

This problem tends to appear with silhouette extraction based on background subtraction, but it is not limited to it. Silhouette extraction methods other than background subtraction, such as the deep-learning-based methods disclosed in Non-Patent Documents 5 and 6, may also fail to extract the portion hidden by a structure.

To solve this problem, Patent Document 1 prepares, for each camera, a silhouette image of a structure that occludes the subject, such as a soccer goalpost (hereinafter also called an "occluder silhouette image"). It adds the occluder silhouette image to the subject silhouette image obtained by background subtraction and applies the visual hull method to the resulting integrated silhouette image, which makes it possible to generate a 3D model free of defects caused by the occluder.

However, the visual hull method with integrated silhouette images also models the goalpost itself. Once the goalpost is modeled, a person touching the goalpost model merges with it, for example when realizing the billboard free viewpoint of Non-Patent Document 3, producing one huge billboard and a large error in the displayed position of the subject.

In the billboard free viewpoint, the scene is rendered by standing a board called a billboard at the position of each subject. To do so, 3D objects are labeled per connected mass of the model generated by the visual hull method, and a billboard is formed for each mass. When a subject touches a large structure, the subject and the structure are treated as a single large mass and merged into one billboard.

The billboard rotates about the center of the board so that it faces the viewpoint selected by the user, so the structure and the person appear to rotate while stuck together, which looks unnatural. In addition, the displayed position of the person jumps at the moment the mass separates, which is another source of visual discomfort.

In addition, with the visual hull method using integrated silhouette images, the goalpost model is generated for every frame, which increases the data size of the 3D model.

To address these problems, Patent Document 1 also discloses a function that deletes the goalpost modeled by the visual hull method after the 3D space has been modeled. According to Patent Document 1, the 3D shape of the subject can be reconstructed without defects even when an occluder hides the subject.

Deleting the 3D model of the structure removes a structure that should exist in the 3D space. When the free-viewpoint video is viewed, however, such a structure can be placed using a static general-purpose 3DCG model, which yields a more accurately shaped 3D model than a structure model derived from the visual hull method.

Patent Document 1: Japanese Patent Application Laid-Open No. 2019-106170

Non-Patent Document 1: A. Laurentini, "The visual hull concept for silhouette based image understanding," IEEE Transactions on Pattern Analysis and Machine Intelligence, 16, 150-162 (1994).
Non-Patent Document 2: J. Kilner, J. Starck, A. Hilton and O. Grau, "Dual-Mode Deformable Models for Free-Viewpoint Video of Sports Events," Sixth International Conference on 3-D Digital Imaging and Modeling (3DIM 2007), Montreal, QC, 2007, pp. 177-184.
Non-Patent Document 3: H. Sankoh, S. Naito, K. Nonaka, H. Sabirin, J. Chen, "Robust Billboard-based, Free-viewpoint Video Synthesis Algorithm to Overcome Occlusions under Challenging Outdoor Sport Scenes," Proceedings of the 26th ACM international conference on Multimedia, pp. 1724-1732 (2018).
Non-Patent Document 4: C. Stauffer and W. E. L. Grimson, "Adaptive background mixture models for real-time tracking," 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 246-252, Vol. 2 (1999).
Non-Patent Document 5: D. Bolya, C. Zhou, F. Xiao, Y. J. Lee, "YOLACT: Real-Time Instance Segmentation," The IEEE International Conference on Computer Vision (ICCV), pp. 9157-9166 (2019).
Non-Patent Document 6: L. A. Lim and H. Y. Keles, "Learning multi-scale features for foreground segmentation," Pattern Analysis and Applications, pp. 1-12 (2019).
Non-Patent Document 7: Qiang Yao, Hiroshi Sankoh, Nonaka Keisuke, Sei Naito, "Automatic camera self-calibration for immersive navigation of free viewpoint sports video," 2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP), 1-6, 2016.
Non-Patent Document 8: J. Chen, R. Watanabe, K. Nonaka, T. Konno, H. Sankoh, S. Naito, "A Fast Free-viewpoint Video Synthesis Algorithm for Sports Scenes," 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2019), WeAT17.2 (2019).

Patent Document 1 discloses only a mechanism for 3D model generation (the process of obtaining the shape of the 3D model) and does not disclose a texture mapping method that takes occluders into account.

Taking a soccer goalpost as an example of an occluder, the texture of the goalpost must not appear on the model of a person standing behind it. However, when texture mapping is performed with the mechanism disclosed in Patent Document 1, the goalpost texture is mapped onto the 3D model of the person.

In addition, Patent Document 1 does not disclose a mechanism for automatically generating occluder silhouette images. When many cameras are used, creating occluder silhouette images by hand is costly in terms of human resources, so an automatic generation solution is essential.

An object of the present invention is to solve the above technical problems and to provide a free-viewpoint video generation method, apparatus, and program that generate a 3D model with no defects in occluded portions and realize appropriate texture mapping onto those portions.

To achieve the above object, the present invention is a free-viewpoint video generation apparatus that generates a free-viewpoint video based on camera images of a subject and an occluder captured synchronously by a plurality of cameras with different viewpoints, and is characterized by the following configuration.

(1) A first feature of the present invention is that it comprises: means for generating a subject silhouette image for each camera; means for generating an occluder silhouette image for each camera; means for integrating, for each camera, the subject and occluder silhouette images into an integrated silhouette image; means for generating an integrated 3D model by the visual hull method using the integrated silhouette images; means for generating occlusion information that registers whether each part of the integrated 3D model is visible or invisible from the viewpoint of each camera; and means for mapping, based on the occlusion information, a texture acquired by a camera from which a part is visible onto each part of the integrated 3D model that is invisible from some of the cameras.

(2) A second feature of the present invention is that it further comprises means for subtracting the occluder 3D model from the integrated 3D model, and the mapping means maps textures, using the occlusion information, onto each part of the integrated 3D model from which the occluder 3D model has been subtracted.

According to the present invention, the following effects are achieved.

(1) With the first feature, the present invention can generate a defect-free 3D model that accounts for the occluder and can also perform texture mapping that accounts for the occlusion the occluder causes, so a high-quality free-viewpoint video can be generated.

(2) With the second feature, the present invention can be expected to reduce the data volume of the 3D model and, when realizing the billboard free viewpoint, can prevent a huge billboard in which the occluder 3D model and the subject 3D model remain merged from rotating.

FIG. 1 is a functional block diagram showing the configuration of the main parts of a free-viewpoint video generation apparatus according to a first embodiment of the invention.
FIG. 2 is a diagram showing a method of generating an occluder silhouette image.
FIG. 3 is a diagram showing an example of camera parameters.
FIG. 4 is a diagram showing a method of generating an integrated silhouette image.
FIG. 5 is a diagram schematically showing a rendering method.
FIG. 6 is a diagram comparing a rendering model generated by the present invention with a rendering model generated by the prior art.
FIG. 7 is a functional block diagram showing the configuration of the main parts of a free-viewpoint video generation apparatus according to a second embodiment of the invention.
FIG. 8 is a diagram showing an application example (part 1) to a multi-terminal distribution system that distributes rendered images with different virtual viewpoints to a plurality of viewing terminals.
FIG. 9 is a diagram showing an application example (part 2) to a multi-terminal distribution system that distributes rendered images with different virtual viewpoints to a plurality of viewing terminals.
FIG. 10 is a diagram for explaining the visual hull method.
FIG. 11 is a diagram showing an example in which an occluder causes a defect in a subject silhouette image.

Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a functional block diagram showing the configuration of the main parts of a free-viewpoint video generation apparatus 1 according to a first embodiment of the present invention. The description takes soccer as the sports scene and covers the case where a free-viewpoint video is generated from videos of a soccer match captured synchronously by a plurality of cameras with different viewpoints.

The free-viewpoint video generation apparatus 1 can be implemented by installing an application (program) that realizes the functions described below on a general-purpose computer or mobile terminal equipped with a CPU, memory, interfaces, and a bus connecting them. Alternatively, it can be configured as a dedicated or single-purpose machine in which part of the application is implemented in hardware or as a program.

The camera video acquisition unit 101 acquires camera videos from a plurality of cameras Cam that capture the playing field. In this embodiment a full-model free viewpoint is produced; all cameras Cam are fixed, and the angle of view of each camera is assumed not to change during the match.

The subject silhouette image generation unit 102 generates, frame by frame for each camera image, a silhouette image of dynamic objects that move between frames (hereinafter called subjects), for example by background subtraction.

The occluder silhouette image generation unit 103 automatically generates, for each camera, a silhouette image of static objects that do not move between frames (hereinafter called occluders), using a predefined general-purpose occluder 3D model and camera parameters. The camera parameters can be estimated from the result of matching feature points of a known structure, such as the occluder, against the feature points of the occluder extracted from the camera image. For example, where the goalposts are placed in the 3D space of the stadium in a soccer match is known. Since the size of the goalposts is also fixed by regulation, the 3D positions of feature points such as the goalpost corners are known. By locating these feature points in the 2D image from each camera and matching them to their known 3D positions, the position and orientation of the camera can be determined (camera calibration).

In this embodiment the cameras are fixed, so the occluder silhouette images need to be generated only once at the start. The generated occluder silhouette images are stored in the occluder silhouette image DB 104.

The general-purpose occluder 3D model can be prepared in a general-purpose 3D model format such as .obj or .fbx. In this embodiment, however, the goalposts are regarded as the occluder, and their shape is known from the competition rules. Therefore, instead of preparing a general-purpose 3D model, an occluder 3D model imitating the goalposts may be generated by combining several cuboid and cylinder 3D models.

The occluder silhouette image generation unit 103 places the occluder 3D model at a predetermined position in a 3D space that models the stadium and, as shown in FIG. 2, projects it back onto each camera image using the camera parameters to generate the occluder silhouette image. The camera parameters here are the camera matrix (intrinsic parameter matrix) and the extrinsic parameter matrix, given for example in the format shown in FIG. 3.
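A minimal sketch of this projection step is given below, assuming a standard pinhole model in which a world point X maps to pixel coordinates through x = K(RX + t). The function names, the toy camera parameters, and the representation of the occluder as a handful of sample points are all assumptions for illustration; an actual implementation would rasterize the full surface of the occluder 3D model.

```python
def project_point(K, R, t, X):
    """Project world point X to pixel coordinates (u, v) with x = K (R X + t).
    Returns None if the point lies behind the camera."""
    Xc = [sum(R[r][c] * X[c] for c in range(3)) + t[r] for r in range(3)]
    if Xc[2] <= 0:
        return None
    x = [sum(K[r][c] * Xc[c] for c in range(3)) for r in range(3)]
    return x[0] / x[2], x[1] / x[2]


def occluder_silhouette(K, R, t, points, width, height):
    """Mark every pixel hit by a projected occluder sample point as 255."""
    mask = [[0] * width for _ in range(height)]
    for X in points:
        uv = project_point(K, R, t, X)
        if uv is None:
            continue
        u, v = int(round(uv[0])), int(round(uv[1]))
        if 0 <= u < width and 0 <= v < height:
            mask[v][u] = 255
    return mask


# Hypothetical camera: focal length 10 px, principal point (4, 3), no rotation.
K = [[10, 0, 4], [0, 10, 3], [0, 0, 1]]
R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
t = [0, 0, 0]
# Sample points along a vertical post 10 units in front of the camera.
post = [(0, y, 10) for y in range(-2, 3)]

mask = occluder_silhouette(K, R, t, post, width=8, height=6)
for row in mask:
    print(row)
```

Because the cameras are fixed, such a mask would be computed once per camera and reused for every frame.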

The camera parameters may be obtained manually or by auto-calibration as disclosed in Non-Patent Document 7. Combined with a method that auto-calibrates from the shape of the court, as in Non-Patent Document 7, the whole process including calibration can be fully automated.

Image processing such as dilating the outline may be applied to the goalpost occluder silhouette image that the occluder silhouette image generation unit 103 outputs for each camera, by applying the invention of an earlier patent application by the present inventors (Japanese Patent Application No. 2019-231270).

For example, a silhouette image obtained by projecting a 3D model can be inaccurate, because a silhouette image can represent only discrete pixel positions. If a 3D model is generated again by the visual hull method from such a silhouette, the resulting post model may be smaller than the actual goalpost. To reduce this error, silhouette image processing such as dilating the outline of the obtained silhouette may be performed.
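One generic way to dilate a silhouette is morphological dilation with a square structuring element, sketched below. This is not the method of the earlier application cited above, whose details are not given here; the function name `dilate` and the `radius` parameter are assumptions.

```python
def dilate(mask, radius=1):
    """Grow every foreground pixel (non-zero) into a (2*radius+1)^2 square."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        out[ny][nx] = 255
    return out


# A single foreground pixel grows into a 3x3 block.
thin = [[0, 0, 0, 0, 0],
        [0, 0, 255, 0, 0],
        [0, 0, 0, 0, 0]]
thick = dilate(thin, radius=1)
for row in thick:
    print(row)
```

A larger `radius` makes the reconstructed post thicker, so the value trades a possible undersized post against an oversized one.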

As illustrated in FIG. 4, the silhouette integration unit 105 integrates the occluder silhouette image and the subject silhouette image frame by frame for each camera to generate an integrated silhouette image. When the silhouette foreground is represented by 255 and the background by 0, for example, this integration is a logical OR: a pixel is foreground if either of the two input masks is 255.
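The logical OR described above can be written in a few lines. The sketch assumes both masks have the same size and use the 255/0 convention; the function name `integrate_silhouettes` is hypothetical.

```python
def integrate_silhouettes(subject, occluder):
    """Pixel-wise logical OR of two binary masks (255 = foreground)."""
    return [
        [255 if s == 255 or o == 255 else 0 for s, o in zip(s_row, o_row)]
        for s_row, o_row in zip(subject, occluder)
    ]


# A player whose middle is hidden by a post: the post mask fills the gap.
subject_mask = [[255, 0, 255, 0]]
occluder_mask = [[0, 255, 0, 0]]
integrated = integrate_silhouettes(subject_mask, occluder_mask)
print(integrated)
```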

The 3D model generation unit 106 generates a 3D voxel model of the subject and the occluder (the integrated 3D model) by the visual hull method using the N integrated silhouette images output by the silhouette integration unit 105. In this embodiment, a voxel grid with unit voxel size M is laid over the target range of 3D model generation (for a sports video, for example, the field where the sport is played), and whether each voxel becomes part of the 3D model is determined by the visual hull method.

The visual hull method obtains, as the visual hull VH(I), the common part of the visual cones formed when N silhouette images are projected into 3D world coordinates, according to equation (1):

VH(I) = \bigcap_{i \in I} V_i \qquad (1)

In equation (1), the set I is the set of silhouette images of the cameras, and V_i is the visual cone computed from the silhouette image obtained from the i-th camera. Normally the part common to all N cameras is modeled, but the number of cameras required may be changed, for example by modeling a part when N-1 cameras agree. Lowering this camera-count threshold allows the 3D model to be recovered even when the subject is missing in a few silhouette images, but side effects such as increased noise may appear. The threshold is set manually.
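The voxel test with a camera-count threshold can be sketched as follows. The code assumes toy orthographic cameras that look along the coordinate axes and integer voxel coordinates; real cameras would use the calibrated projection instead. The names `visual_hull` and `min_votes` are hypothetical.

```python
def visual_hull(voxels, cameras, min_votes=None):
    """Keep voxels whose projection is foreground in at least `min_votes`
    cameras. `cameras` is a list of (project, silhouette) pairs, where
    project(voxel) returns pixel coordinates (u, v)."""
    if min_votes is None:
        min_votes = len(cameras)  # default: strict intersection over all cameras
    kept = []
    for X in voxels:
        votes = 0
        for project, sil in cameras:
            u, v = project(X)
            if 0 <= v < len(sil) and 0 <= u < len(sil[0]) and sil[v][u] == 255:
                votes += 1
        if votes >= min_votes:
            kept.append(X)
    return kept


def along_z(X):  # camera looking along the z axis
    return X[0], X[1]


def along_x(X):  # camera looking along the x axis
    return X[2], X[1]


def along_y(X):  # camera looking along the y axis
    return X[0], X[2]


center_only = [[0, 0, 0], [0, 255, 0], [0, 0, 0]]
empty = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]  # silhouette lost, e.g. behind a post

voxels = [(x, y, z) for x in range(3) for y in range(3) for z in range(3)]
cameras = [(along_z, center_only), (along_x, center_only), (along_y, empty)]

print(visual_hull(voxels, cameras))               # strict: the object is lost
print(visual_hull(voxels, cameras, min_votes=2))  # relaxed: the object is recovered
```

In this toy case lowering the threshold recovers the object cleanly; with real silhouettes the same relaxation can also admit spurious voxels, which is the noise trade-off noted above.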

Because the goalpost silhouette has been integrated, the integrated 3D model output by the 3D model generation unit 106 is free of defects for objects hidden behind the occluder.

This processing may also be applied to a two-stage visual hull method such as that shown in Non-Patent Document 8. In that case, both stages perform modeling with the integrated silhouette images generated by the silhouette integration unit.

A function that converts the voxel model into a polygon model, using a conversion method such as marching cubes, may be added so that the 3D model is output as a polygon model. In this embodiment, after the 3D model generation unit 106 performs the visual hull method, the voxel model is converted into a polygon model by the marching cubes method.

The occluder 3D model generation unit 107 generates an occluder 3D model by the visual hull method using the occluder silhouette images stored for each camera in the occluder silhouette image DB 104. In this embodiment the cameras are fixed, so the occluder 3D model needs to be generated only once.

The generated occluder 3D model is input to the occluder 3D model subtraction unit 109 described later. Alternatively, the occluder 3D model generation unit 107 may be omitted and the general-purpose occluder 3D model that the occluder silhouette image generation unit 103 used to generate the occluder silhouette images may be input directly to the occluder 3D model subtraction unit 109.

However, it is desirable that the model input to the occluder 3D model subtraction unit 109 be identical to the occluder portion contained in the integrated 3D model. In this embodiment, therefore, the occluder 3D model is generated from the same occluder silhouette images that were used when the occluder portion of the integrated 3D model was generated by the visual hull method.

The occlusion information generation unit 108 computes the occlusion information of the 3D model. Occlusion information records, for each part of the generated integrated 3D model, whether that part is visible from each camera or invisible because it is occluded. By referring to it, the free-viewpoint rendering unit 110 described later can texture-map parts that are invisible from some cameras using the video of a camera from which they are visible.

In this embodiment the 3D model generation unit 106 generates a 3D polygon model, so the occlusion relationship of each vertex of the 3D polygon model is recorded as the occlusion information. In an environment with N cameras, for example, N pieces of occlusion information are recorded for each vertex of the 3D polygon model.

In this embodiment the occlusion information is recorded in a form such as "1" if the vertex is visible and "0" if it is invisible, so the occlusion information of each vertex for each camera can be expressed in 1 bit. The occlusion information accounts for all occlusion relationships: not only occlusion caused by the occluder but also occlusion caused by other subjects.

For example, when two players A and B overlap as seen from a certain camera, occlusion occurs. If player A hides player B, the texture must be mapped so that player A's texture does not appear on player B. In this case, the occlusion information of the hidden vertices of player B is likewise recorded as "0" (invisible).
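The per-vertex, per-camera 1-bit record described above can be sketched as a bitmask, one integer per vertex. This is a minimal illustration, not the format used in the embodiment: the function names `pack_visibility` and `is_visible`, the bit ordering (bit i corresponds to camera i), and the sample data are all assumptions.

```python
from typing import List


def pack_visibility(flags: List[bool]) -> int:
    """Pack per-camera visibility flags into one integer (bit i = camera i)."""
    mask = 0
    for cam_index, visible in enumerate(flags):
        if visible:
            mask |= 1 << cam_index
    return mask


def is_visible(mask: int, cam_index: int) -> bool:
    """Return True if the vertex is recorded as visible ("1") from the camera."""
    return (mask >> cam_index) & 1 == 1


# Hypothetical data: 3 vertices observed by N = 4 cameras.
occlusion_info = [
    pack_visibility([True, True, False, True]),   # hidden from camera 2
    pack_visibility([False, True, True, True]),   # hidden from camera 0
    pack_visibility([True, True, True, True]),    # visible everywhere
]
```

With N cameras this stores exactly N bits per vertex, matching the count given in the text, and the cause of the occlusion (shielding object or another subject) does not need to be stored.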

The shielding object 3D model subtraction unit 109 removes the portion corresponding to the shielding object 3D model from the integrated 3D model generated by the 3D model generation unit 106. In this embodiment, the position of the shielding object 3D model generated by the shielding object 3D model generation unit 107 is referenced, and the polygons located at that position are deleted from the integrated 3D model.
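One simple way to picture this subtraction is shown below. The text does not specify how "located at that position" is tested, so this sketch assumes an axis-aligned bounding box of the shielding object 3D model and removes every triangle whose centroid falls inside it. The function name `subtract_shield` and the box representation are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Triangle = Tuple[Point, Point, Point]


def subtract_shield(triangles: Sequence[Triangle],
                    box_min: Point, box_max: Point) -> List[Triangle]:
    """Keep only triangles whose centroid lies outside the shield's bounding box."""
    def inside(p: Point) -> bool:
        return all(lo <= c <= hi for c, lo, hi in zip(p, box_min, box_max))

    kept = []
    for tri in triangles:
        # Centroid = mean of the three vertices, per coordinate axis.
        centroid = tuple(sum(v[i] for v in tri) / 3.0 for i in range(3))
        if not inside(centroid):
            kept.append(tri)
    return kept
```

A bounding box over-removes geometry near a thin object such as a goalpost; a test against the actual shield mesh or voxels would be tighter, at higher cost.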

This subtraction can be expected to reduce the data size of the 3D model. In addition, when billboard-based free-viewpoint rendering is used, it prevents the 3D model of the post from becoming connected to the 3D model of a subject, which would otherwise cause one huge billboard to rotate.

The free-viewpoint rendering unit 110 renders a composite image as seen from an arbitrary virtual viewpoint p_v, using the subject-only 3D model output by the shielding object 3D model subtraction unit 109, the occlusion information generated by the occlusion information generation unit 108, and the camera images (textures).

FIG. 5 schematically shows the rendering method used by the free-viewpoint rendering unit 110. In this embodiment, the shielding object 3D model is subtracted from the integrated 3D model to obtain what is substantially the subject 3D model. For each part of this model (a polygon, in this embodiment), visibility is determined per camera on the basis of the occlusion information, and a part that is invisible in some camera images is texture-mapped using other camera images in which it is visible.

In this embodiment, the two cameras Cam₁ and Cam₂ nearest to the requested virtual viewpoint p_v are selected first, and their camera images Ic₁ and Ic₂ are mapped onto each polygon g of the 3D model M_j. As preprocessing, the visibility of polygon g is determined using the occlusion information of all the vertices that make up the polygon. If polygon g is a triangle, the determination is based on the occlusion information of its three vertices.

For example, let g_c1 denote the visibility flag of polygon g with respect to camera Cam₁. If all three vertices of triangular polygon g are visible, g_c1 is set to visible; if even one of the three vertices is invisible, g_c1 is set to invisible. Once the visibility of each polygon has been determined in this way, texture mapping is performed case by case as follows.
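The all-vertices rule for the polygon flag can be written in a few lines. The sketch below assumes, purely for illustration, that each vertex stores its occlusion information as an integer bitmask in which bit i is 1 when the vertex is visible from camera i; the function name `polygon_visible` is hypothetical.

```python
from typing import Sequence


def polygon_visible(vertex_masks: Sequence[int],
                    polygon: Sequence[int],
                    cam_index: int) -> bool:
    """A polygon is visible from a camera only if every one of its vertices is."""
    return all((vertex_masks[v] >> cam_index) & 1 for v in polygon)


# Hypothetical data: vertex 1 is hidden from camera 1, so triangle (0, 1, 2)
# is visible from camera 0 but invisible from camera 1.
vertex_masks = [0b11, 0b01, 0b11]
g_c0 = polygon_visible(vertex_masks, (0, 1, 2), 0)
g_c1 = polygon_visible(vertex_masks, (0, 1, 2), 1)
```

The rule is conservative: a partially occluded polygon is treated as invisible, which avoids mapping the occluder's pixels onto the visible remainder of that polygon.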

Case 1. Both flags g_c1 and g_c2 are visible:
Mapping is performed by alpha blending according to equation (2).

Here, texture_c1(g) and texture_c2(g) denote the camera image regions corresponding to polygon g in cameras Cam₁ and Cam₂, and texture(g) denotes the texture mapped onto the polygon. The alpha-blend ratio a is calculated from the ratio of the distances (angles) between the virtual viewpoint p_v and the camera viewpoints p_c1 and p_c2.
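Equation (2) itself is not reproduced in this text. A form consistent with the definitions above would be the standard two-camera alpha blend shown below. Two points are assumptions rather than statements from the source: that a weights Cam₁ and (1 − a) weights Cam₂, and that a is derived from the angles θ₁ and θ₂ (hypothetical symbols) between the virtual viewpoint p_v and the camera viewpoints p_c1 and p_c2, so that the nearer camera receives the larger weight.

```latex
\mathrm{texture}(g) = a \,\mathrm{texture}_{c1}(g) + (1 - a)\,\mathrm{texture}_{c2}(g),
\qquad
a = \frac{\theta_2}{\theta_1 + \theta_2}
```

Under this assumption, a = 1 when the virtual viewpoint coincides with Cam₁ (θ₁ = 0) and a = 0 when it coincides with Cam₂ (θ₂ = 0).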

Case 2. Only one of the flags g_c1 and g_c2 is visible:
Polygon g is rendered using only the texture of the camera in which it is visible. That is, in equation (2), the alpha-blend ratio a applied to texture_ci(g) of the visible camera is set to 1. Alternatively, the camera Cam₃ that is next nearest to the virtual viewpoint p_v may be referenced in place of whichever of Cam₁ and Cam₂ is invisible. In that case, the textures are alpha-blended in the same way as in equation (2).

Case 3. Both flags g_c1 and g_c2 are invisible:
Rendering uses the texture of camera Cam₃, the next nearest to the virtual viewpoint p_v. If Cam₃ is also invisible, the next nearest camera Cam₄ is tried, and so on, so that camera textures are referenced in order of increasing distance. Two or more cameras may be referenced in this sequence, with blending performed in accordance with equation (2).

In the example above, two nearby cameras are referenced initially, but this number may be changed by user setting. In that case, equation (2) is extended, according to the initial number of reference cameras b, to a linear sum over b cameras whose weights sum to 1. No texture is mapped onto a polygon that is invisible from every camera.
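Cases 1 to 3 and the extension to b reference cameras can be combined into one selection routine, sketched below. Several details are assumptions made for illustration: the function name `select_textures`, the dictionary inputs, the choice of inverse-distance weights, and renormalizing the weights over only the visible cameras among the b nearest. The text requires only that the weights sum to 1.

```python
from typing import Dict, List, Tuple


def select_textures(distances: Dict[str, float],
                    visible: Dict[str, bool],
                    b: int = 2) -> List[Tuple[str, float]]:
    """Return (camera, weight) pairs used to texture one polygon.

    distances: distance (or angle) from the virtual viewpoint to each camera.
    visible:   polygon visibility flag for each camera.
    b:         number of nearest cameras referenced initially.
    """
    ordered = sorted(distances, key=lambda cam: distances[cam])
    # Cases 1 and 2: use whichever of the b nearest cameras see the polygon.
    chosen = [cam for cam in ordered[:b] if visible[cam]]
    if not chosen:
        # Case 3: walk outward until the first camera that sees the polygon.
        for cam in ordered[b:]:
            if visible[cam]:
                chosen = [cam]
                break
    if not chosen:
        return []  # invisible from every camera: no texture is mapped
    # Assumed weighting: inverse distance, normalized so the weights sum to 1.
    inverse = {cam: 1.0 / max(distances[cam], 1e-9) for cam in chosen}
    total = sum(inverse.values())
    return [(cam, inverse[cam] / total) for cam in chosen]
```

When only one camera remains, its normalized weight is 1, which corresponds to setting the alpha-blend ratio of the visible camera to 1 in Case 2.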

The shielding object is displayed in the free-viewpoint rendering unit 110 by taking a general-purpose 3D model or the like prepared in advance as input and placing it in the scene. This is because a 3D model of an object such as a goalpost generally does not change much over time, and because a model derived from the visual volume intersection method is only a 3D model synthesized from N cameras and is therefore likely to be lower in quality than one prepared in advance.

FIG. 6 compares the rendering model generated by this embodiment [FIG. 6(b)] with the rendered image generated by the conventional technique [FIG. 6(a)].

In the conventional technique, the left leg, whose silhouette is occluded by the goalpost, is missing. In the rendering model generated by this embodiment, by contrast, the texture is mapped accurately onto the left leg, showing that an accurate free-viewpoint video is reproduced without missing parts or visual artifacts.

The first embodiment above was described as providing the shielding object 3D model subtraction unit 109, removing the shielding object 3D model from the integrated 3D model, and rendering substantially only the subject 3D model.

However, the present invention is not limited to this. As in the second embodiment shown in FIG. 7, the shielding object 3D model generation unit 107 and the shielding object 3D model subtraction unit 109 may be omitted, and free-viewpoint rendering may be performed on the integrated 3D model from which the shielding object 3D model has not been subtracted.

Even in this case, visual inconsistency is unlikely to arise as long as the free-viewpoint rendering unit 110 covers the shielding object 3D model with 3DCG input as a general-purpose 3D model or the like.

FIGS. 8 and 9 show examples of application to a multi-terminal distribution system that distributes rendered images with different virtual viewpoints to a plurality of viewing terminals.

In general, the 3D model and the occlusion information need to be computed only once per frame, so they are computed at high speed on a high-end PC or the like and stored. The 3D model and occlusion information are then distributed to the viewing terminals that wish to view the free-viewpoint video, and a rendering unit is placed in each viewing terminal. With this configuration, multi-terminal distribution can be realized with one high-end PC and a plurality of low-spec viewing terminals.

The occlusion relationships of the 3D model could also be recomputed in the free-viewpoint rendering unit 110 from the 3D model input to it. However, if they are stored in advance as occlusion information, the rendering unit can determine the occlusion relationships simply by referring to that information, which can be expected to reduce the processing load of the free-viewpoint rendering unit 110.

In the example of FIG. 8, a plurality of dedicated PCs specialized for rendering are provided, and free-viewpoint videos with different viewpoints are rendered and distributed in response to viewing requests from the viewing terminals.

In the example of FIG. 9, the free-viewpoint rendering unit 110 is implemented in each viewing terminal, so that rendering is performed on each viewing terminal.

The embodiments above were described for the case where each camera is fixed, but the present invention is not limited to this and applies equally when moving cameras are used. The configuration changes from the first embodiment when moving cameras are used are described below.

In the first embodiment, the subject silhouette image generation unit 102 generates the subject silhouette images by background subtraction. With a moving camera, however, the background changes, making it difficult to generate subject silhouette images by background subtraction. Instead, a silhouette extraction method that can process each frame independently, such as the one disclosed in Non-Patent Document 5, can be used.

In the first embodiment, the shielding object silhouette image generation unit 103 was described as generating the shielding object silhouette images only once, at the start, from the shielding object 3D model and the camera parameters. With a moving camera, however, the camera parameters change every frame. The 3D model can therefore be placed in 3D space and back-projected onto each camera image using the latest camera parameters for each frame.
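The per-frame silhouette generation can be pictured as re-running a pinhole projection with that frame's camera parameters. The sketch below is an illustration under stated assumptions, not the embodiment's implementation: it assumes a 3x4 projection matrix P (intrinsics times extrinsics) is available for each frame, and it marks only the pixels hit by sampled model points rather than rasterizing whole faces. The names `project_point` and `silhouette_from_points` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # 3 rows x 4 columns
Point3 = Tuple[float, float, float]


def project_point(P: Matrix, X: Point3) -> Optional[Tuple[float, float]]:
    """Project a 3D point with a 3x4 projection matrix; None if behind the camera."""
    x, y, z = X
    u, v, w = (row[0] * x + row[1] * y + row[2] * z + row[3] for row in P)
    if w <= 0:
        return None
    return (u / w, v / w)


def silhouette_from_points(P: Matrix, points: Sequence[Point3],
                           width: int, height: int) -> List[List[int]]:
    """Binary mask (height x width) with 1 where a model point lands in the image."""
    mask = [[0] * width for _ in range(height)]
    for X in points:
        uv = project_point(P, X)
        if uv is None:
            continue
        col, row = int(round(uv[0])), int(round(uv[1]))
        if 0 <= col < width and 0 <= row < height:
            mask[row][col] = 1
    return mask
```

For a moving camera, `silhouette_from_points` would be called once per frame and per camera with the updated P, in place of the one-time computation used for fixed cameras.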

Because it is difficult to calculate the camera parameters manually for every frame, the camera matrix and the extrinsic parameter matrix may be calculated for each frame while performing auto-calibration, as disclosed in Non-Patent Document 7, and a different shielding object silhouette image may be calculated for each frame.

As noted above, the shielding object silhouette image generation unit 103 was described in the first embodiment as generating the shielding object silhouette images only once, at the start, from the shielding object 3D model and the camera parameters. With a moving camera, however, the shielding object silhouette images generated by the unit 103 change every frame, so the shielding object 3D model generation unit 107 may likewise have a function of generating the shielding object 3D model for each frame.

Reference Signs List
1: free-viewpoint video generation device
101: camera video acquisition unit
102: subject silhouette image generation unit
103: shielding object silhouette image generation unit
104: shielding object silhouette image DB
105: silhouette integration unit
106: 3D model generation unit
107: shielding object 3D model generation unit
108: occlusion information generation unit
109: shielding object 3D model subtraction unit
110: free-viewpoint rendering unit

Claims

1. A free-viewpoint video generation device that generates a free-viewpoint video based on camera images obtained by synchronously capturing a subject and a shielding object with a plurality of cameras with different viewpoints, the device comprising:
means for generating a subject silhouette image for each camera;
means for generating a shielding object silhouette image for each camera;
means for generating an integrated silhouette image by integrating the silhouette images of the subject and the shielding object for each camera;
means for generating an integrated 3D model by a visual volume intersection method using each integrated silhouette image;
means for generating occlusion information that registers whether each part of the integrated 3D model is visible or invisible at the viewpoint of each camera; and
means for mapping, based on the occlusion information, onto each part of the integrated 3D model that is invisible to some of the cameras, a texture obtained by a camera to which that part is visible.

2. The free-viewpoint video generation device according to claim 1, further comprising means for subtracting a shielding object 3D model from the integrated 3D model, wherein the mapping means uses the occlusion information to map a texture onto each part of the integrated 3D model from which the shielding object 3D model has been subtracted.

3. The free-viewpoint video generation device according to claim 2, wherein the shielding object 3D model is a general-purpose 3D model imitating the shielding object.

4. The free-viewpoint video generation device according to claim 2, further comprising means for generating a shielding object 3D model based on the shielding object silhouette images, wherein the means for subtracting the shielding object 3D model subtracts the generated shielding object 3D model from the integrated 3D model.

5. The free-viewpoint video generation device according to any one of claims 1 to 4, wherein the integrated 3D model is a polygon model, and the occlusion information registers, for each vertex of each polygon, whether the vertex is visible or invisible at the viewpoint of each camera.

6. The free-viewpoint video generation device according to any one of claims 1 to 5, wherein the means for generating the shielding object silhouette image places a separately prepared 3D model of the shielding object at the fixed position of the shielding object in three-dimensional space, and generates each shielding object silhouette image by back-projecting the shielding object at that fixed position onto each camera based on camera parameters.

7. The free-viewpoint video generation device according to claim 6, wherein the camera parameters are estimated based on results of matching between feature points extracted from a known structure that models the shielding object and feature points of the shielding object extracted from the camera images.

8. A free-viewpoint video generation method in which a computer generates a free-viewpoint video based on camera images obtained by synchronously capturing a subject and a shielding object with a plurality of cameras with different viewpoints, the method comprising:
generating a subject silhouette image for each camera;
generating a shielding object silhouette image for each camera;
generating an integrated silhouette image by integrating the silhouette images of the subject and the shielding object for each camera;
generating an integrated 3D model by a visual volume intersection method using each integrated silhouette image;
generating occlusion information that registers whether each part of the integrated 3D model is visible or invisible at the viewpoint of each camera; and
mapping, based on the occlusion information, onto each part of the integrated 3D model that is invisible to some of the cameras, a texture obtained by a camera to which that part is visible.

9. The free-viewpoint video generation method according to claim 8, further comprising subtracting a shielding object 3D model from the integrated 3D model, wherein a texture is mapped, using the occlusion information, onto each part of the integrated 3D model from which the shielding object 3D model has been subtracted.

10. A free-viewpoint video generation program for generating a free-viewpoint video based on camera images obtained by synchronously capturing a subject and a shielding object with a plurality of cameras with different viewpoints, the program causing a computer to execute:
a procedure for generating a subject silhouette image for each camera;
a procedure for generating a shielding object silhouette image for each camera;
a procedure for generating an integrated silhouette image by integrating the silhouette images of the subject and the shielding object for each camera;
a procedure for generating an integrated 3D model by a visual volume intersection method using each integrated silhouette image;
a procedure for generating occlusion information that registers whether each part of the integrated 3D model is visible or invisible at the viewpoint of each camera; and
a procedure for mapping, based on the occlusion information, onto each part of the integrated 3D model that is invisible to some of the cameras, a texture obtained by a camera to which that part is visible.

11. The free-viewpoint video generation program according to claim 10, further causing the computer to execute a procedure for subtracting a shielding object 3D model from the integrated 3D model, wherein, in the mapping procedure, a texture is mapped, using the occlusion information, onto each part of the integrated 3D model from which the shielding object 3D model has been subtracted.