JP2021056679A

JP2021056679A - Image processing apparatus, method and program

Info

Publication number: JP2021056679A
Application number: JP2019178048A
Authority: JP
Inventors: 智明今野; Tomoaki Konno
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2019-09-27
Filing date: 2019-09-27
Publication date: 2021-04-08
Anticipated expiration: 2039-09-27
Also published as: JP7197451B2

Abstract

To provide an image processing apparatus that generates a three-dimensional model, suppressing the calculation cost of model generation by considering the user's visual field information while still realizing a natural display when the model is used for drawing. SOLUTION: An image processing apparatus 10 comprises: an extraction unit 12 that extracts, from the image of each viewpoint in a multi-viewpoint image, the region of the photographed object as a mask image; and a generation unit 13 that applies a visual volume crossing method to the mask images to generate a three-dimensional model of the object by determining whether or not each voxel belongs to the model. Before applying the visual volume crossing method, the generation unit 13 arranges depth information acquired at a user viewpoint in a voxel space with reference to a virtual camera viewpoint to determine the spatial position given by the depth information, then determines whether each voxel is closer to or farther from the virtual camera viewpoint than that spatial position, and applies the visual volume crossing method only to the voxels determined to be closer. SELECTED DRAWING: Figure 1

Description

The present invention relates to an image processing apparatus, method, and program for generating a 3D model that suppresses the calculation cost of model generation by considering the user's field-of-view information and that still realizes a natural display when used for drawing.

Research and development is underway on AR (Augmented Reality) technology, which superimposes virtual objects on physical objects in real space and presents them to the user. By using a smartphone or an HMD (Head Mounted Display) such as smart glasses, a user can experience AR through a video see-through or optical see-through method. To make AR more expressive, it is important that the relationship between the displayed virtual objects and the physical objects in the user's surroundings, such as their front-to-back ordering, appears natural.

A system has been proposed that displays a virtual object while taking into account its geometric consistency with real objects around the user (Patent Document 1). Separately, as a way of generating virtual objects, there are methods that generate a 3D (three-dimensional) model from camera video. For example, a 3D model can be generated from video captured by a plurality of cameras arranged so as to surround the subject (Patent Document 2). Patent Document 2 describes a technique in which, when there are a plurality of subjects and a distant object is hidden behind a nearby object, the distant object is overwritten with the nearby object in order to maintain geometric consistency.

Patent Document 1: JP 2018-106262 A
Patent Document 2: JP 2019-101795 A
Patent Document 3: JP 2018-163467 A

Non-Patent Document 1: A. Laurentini, "The visual hull concept for silhouette-based image understanding," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 16, no. 2, Feb. 1994.

In an AR experience, the quality of the experience is expected to improve when virtual objects are displayed according to the user's viewpoint position, which changes in real time, while remaining consistent with the physical objects in the user's field of view. However, when a 3D model is generated from the plurality of camera videos described above, the calculation cost may become large depending on the number of cameras and the density of the region to be modeled in 3D.

One way to address this increase in calculation cost is to consider the user's field-of-view information, which may allow the region to be modeled in 3D to be reduced by leaving out regions that are unnecessary for display. The prior art, however, has not examined this possibility, and this is its problem.

Patent Document 1 describes a display mechanism that matches physical objects and virtual objects in consideration of the user's field-of-view information, but it does not describe reducing the calculation cost of generating the virtual object. That is, to check the geometric consistency between the virtual object and the real object, the virtual object must be obtained as point cloud data even in regions that are unnecessary for display, so the calculation cost cannot be reduced.

Patent Document 2 describes model generation that takes into account occlusion among a plurality of virtual objects, but it does not describe reducing the calculation cost of that generation, nor does it consider the physical environment around the user. That is, as with Patent Document 1, in order to draw the scene with the nearby object overwriting the distant object while maintaining geometric consistency, the distant object must be obtained even in regions that are unnecessary for display, so the calculation cost cannot be reduced.

Non-Patent Document 1 proposes a method of generating a shape model from a plurality of images, but the user's field-of-view information and the like are not considered when the model is generated.

In view of the above problems of the prior art, an object of the present invention is to provide an image processing apparatus, method, and program for generating a 3D model that suppresses the calculation cost of model generation by considering the user's field-of-view information and that still realizes a natural display when used for drawing.

To achieve the above object, the present invention is an image processing apparatus comprising: an extraction unit that extracts, from the image of each viewpoint of a multi-viewpoint image, the region of the photographed object as a mask image; and a generation unit that applies a visual volume crossing method to the mask images and generates a 3D model of the object by determining, for each voxel of a predetermined voxel set, whether or not the voxel belongs to the 3D model. Before applying the method, the generation unit places depth information acquired at the user's viewpoint in the voxel space with reference to a virtual camera viewpoint, thereby fixing the spatial position given by the depth information. It then determines whether each voxel lies nearer to or farther from the virtual camera viewpoint than that spatial position, and determines whether a voxel belongs to the 3D model, as a target of the visual volume crossing method, only for voxels determined to lie on the near side. The invention is also characterized as a method or a program corresponding to the image processing apparatus.

According to the present invention, depth information acquired at the user's viewpoint is used to suppress application of the visual volume crossing method to regions determined to be occlusion regions from the user's viewpoint. This suppresses the calculation cost while generating a 3D model that realizes a natural display when used for drawing.

FIG. 1 is a diagram showing the functional configuration of an image processing system according to one embodiment.
FIG. 2 is a schematic diagram of telepresence as a use case of the image processing system.
FIG. 3 is a table (including schematic illustrations of each piece of information) comparing the image processing system according to one embodiment of the present invention with prior-art server-side rendering.
FIG. 4 is a functional block diagram of the acquisition unit according to one embodiment.
FIG. 5 is a diagram schematically showing the visual volume crossing method as an existing technique.
FIG. 6 is a diagram schematically showing, in comparison with the prior art, that the calculation load is reduced when the generation unit applies the visual volume crossing method with the occlusion region excluded.
FIG. 7 is a flowchart of the visual volume crossing method performed by the generation unit according to one embodiment.
FIG. 8 is a flowchart of the process in step S12, according to one embodiment, of determining whether or not a voxel is in the occlusion region.
FIG. 9 is a diagram schematically showing a problem that can occur in the first embodiment.
FIG. 10 is a diagram schematically showing the solution provided by the second embodiment to the problem of the first embodiment shown schematically in FIG. 9.
FIG. 11 is a flowchart showing an example of the model generation procedure of the generation unit 13 according to the second embodiment.
FIG. 12 is a diagram showing a schematic example of the drawing information at time t = 0 seconds and at time t = 0.1 seconds, corresponding to the explanatory example.
FIG. 13 is a diagram showing an example of the hardware configuration of a general computer device.

FIG. 1 is a diagram showing the functional configuration of an image processing system according to one embodiment. The image processing system 100 comprises an image processing device 10 and a terminal device 20 that can communicate with each other via a network NW. As its functional blocks, the image processing device 10 includes a photographing unit 11, an extraction unit 12, a generation unit 13, and a drawing unit 14. As its functional blocks, the terminal device 20 includes an acquisition unit 21, a display-side photographing unit 22, and a display unit 23.

One use case of the image processing system 100, shown schematically in FIG. 2, is telepresence: free-viewpoint video technology, which can create a view from an arbitrary viewpoint out of a plurality of camera videos, is used to give the experience that a remote person is present at another location. In this case, the image processing device 10 exists as a server device on the shooting environment PE side and creates a 3D model of the subject OB using a plurality of cameras C1, C2, ..., CN (N cameras, N ≧ 2). The 3D model of the subject is drawn (rendered) by a virtual camera, and the resulting virtual object VOB is transmitted to the user.

As FIG. 2 further shows, on the viewing environment WE side the user U wears a terminal device 20 configured as an AR-capable viewing device such as smart glasses, and the physical object POB of the user's viewing environment is displayed with the drawn virtual object VOB superimposed on it. (If the user U looks directly, without a terminal device 20 such as smart glasses, the physical object POB exists as a real thing, whereas the virtual object VOB drawn by the terminal device 20 as an AR display does not.) The viewing environment WE contains a physical object POB such as a table. When a person is to be placed as a virtual object VOB behind the table, it is desirable, as FIG. 2 also shows schematically, that the display account for occlusion so that part of the virtual object VOB is hidden by the physical object POB.

The image processing system 100 according to one embodiment of the present invention can produce a display that accounts for occlusion on the viewing user's side in this way, and it can also reduce the calculation load by skipping 3D model generation for the regions affected by occlusion.

The framework of the image processing system 100 whose system configuration is shown in FIG. 1 already exists in the prior art as server-side rendering for AR, but the image processing system 100 according to one embodiment of the present invention achieves effects relating to occlusion, as described above, that the prior art does not. FIG. 3 is a table (including schematic illustrations of each piece of information) comparing the image processing system 100 according to one embodiment of the present invention with prior-art server-side rendering.

As shown in FIG. 3, the information acquired from the user who views the AR display differs: the prior art acquires only the user's line-of-sight information via smart glasses or the like, whereas one embodiment of the present invention (hereinafter abbreviated as "the present method" in the description of FIG. 3) acquires depth information in addition to the line-of-sight information. As for the generated 3D model, the prior art generates it without considering occluded portions, whereas the present method excludes the occluded portions and can therefore generate the model with a reduced calculation load. Consequently, in the drawing result obtained from this 3D model, the occluded portions are not excluded in the prior art but are excluded in the present method. Likewise, in the image shown to the user as the AR display, occlusion is not reflected in the prior art but is reflected in the present method.

Thus, as the schematic illustrations in FIG. 3 also show, the prior art does not consider the physical objects in the user's viewing environment (such as the table shown as the physical object POB in FIG. 2) when generating the 3D model, so if the subject is a person, the whole body becomes the target of model generation. Because a view image in which the whole body has been rendered is then sent, the whole-body virtual object is displayed regardless of the physical objects unless the viewing device itself performs occlusion processing or the like. In the prior art, the result is a display in which the virtual person is drawn over the physical table.

The following describes in detail the operation of the image processing system 100 according to one embodiment, which, unlike the prior art, reduces the calculation load of 3D model generation by considering occlusion and thereby also enables drawing that accounts for occlusion. The description proceeds through each functional unit of the functional blocks shown in FIG. 1.

<Photographing unit 11>
As shown schematically in the shooting environment PE of FIG. 2, the photographing unit 11 consists, as hardware, of a plurality of N cameras C1, C2, ..., CN (N ≧ 2) arranged in the shooting environment PE (for example, a shooting studio) so as to surround and photograph an object OB, such as a person, for which a 3D model is to be generated. The photographing unit 11 outputs the images obtained by photographing the object OB with the camera at each viewpoint (a multi-viewpoint image of N viewpoints) to the extraction unit 12 and the drawing unit 14.

The camera parameters (internal and external parameters) of each of the N cameras C1, C2, ..., CN (N ≧ 2) that make up the photographing unit 11 as hardware are assumed to be known or to have been estimated by prior calibration, and the image processing device 10 is assumed to be able to refer to and use this camera parameter information. (For example, the processing of the generation unit 13 and the drawing unit 14, described later, can be performed with reference to these camera parameters.)
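As an illustration of how such camera parameters are typically used, the following is a minimal sketch of a pinhole projection from a world point to pixel coordinates. The function name `project_point`, the matrix layout (3x3 intrinsic matrix `K`, rotation `R`, translation `t` with camera coordinates given by R·X + t), and the omission of skew and lens distortion are assumptions made for this example; the source does not specify a camera model.

```python
from typing import Optional, Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]


def project_point(
    point_world: Sequence[float],
    K: Matrix3,
    R: Matrix3,
    t: Sequence[float],
) -> Optional[Tuple[float, float, float]]:
    """Project a 3D world point to pixel coordinates with a pinhole model.

    Returns (x, y, depth), or None when the point is not in front of the camera.
    Assumes camera coordinates Xc = R * Xw + t, zero skew, and no lens distortion.
    """
    # External parameters: world coordinates -> camera coordinates.
    xc = [sum(R[i][j] * point_world[j] for j in range(3)) + t[i] for i in range(3)]
    depth = xc[2]
    if depth <= 0:
        return None
    # Internal parameters: perspective division, then focal length and principal point.
    x = K[0][0] * xc[0] / depth + K[0][2]
    y = K[1][1] * xc[1] / depth + K[1][2]
    return (x, y, depth)


if __name__ == "__main__":
    K = [[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]]
    R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    t = [0.0, 0.0, 0.0]
    print(project_point((0.5, 1.0, 2.0), K, R, t))
```

The same projection, applied per camera, is what a voxel-based visual volume crossing step would rely on to look up mask pixels.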

The photographing unit 11 can photograph the object OB as video in real time to acquire a multi-viewpoint video of N viewpoints, and the image processing device 10 can draw this object OB as a virtual object VOB in real time and have the result displayed in real time on the terminal device 20 side. Unless processing along the time axis is specifically mentioned, the following description of each functional unit of the image processing system 100 concerns one arbitrary time in such real-time processing. That is, the multi-viewpoint image obtained by the photographing unit 11 is the frame at one arbitrary time in the multi-viewpoint video.

<Extraction unit 12>
For each of the N images of the N viewpoints in the multi-viewpoint image obtained by the photographing unit 11, the extraction unit 12 extracts the photographed object OB as a foreground silhouette to create a mask image (a binary mask image in which pixels belonging to the silhouette foreground are given the value "1" and all other pixels, belonging to the background, are given the value "0"), and outputs the N extracted mask images to the generation unit 13.

Any existing technique may be used by the extraction unit 12 to extract the mask images. For example, for each of the N viewpoints of the multi-viewpoint image, a background image photographed with the object OB absent may be prepared in advance, and the extraction unit 12 may extract the mask image by using background subtraction to classify as foreground the regions determined to differ from this background image.
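The background subtraction example above can be sketched as follows. This is a minimal illustration only: the function name `extract_mask`, the use of grayscale images stored as nested lists, and the fixed per-pixel threshold are assumptions of this example, not details given in the source, and a practical system would normally add noise handling.

```python
from typing import List

Image = List[List[int]]  # grayscale image, image[y][x] in 0..255


def extract_mask(image: Image, background: Image, threshold: int = 30) -> Image:
    """Return a binary mask: 1 where the pixel differs from the background, else 0.

    `threshold` is a hypothetical tuning parameter for this sketch.
    """
    return [
        [1 if abs(pixel - bg_pixel) > threshold else 0
         for pixel, bg_pixel in zip(image_row, background_row)]
        for image_row, background_row in zip(image, background)
    ]


if __name__ == "__main__":
    background = [[10, 10, 10] for _ in range(3)]
    image = [row[:] for row in background]
    image[1][1] = 200  # one pixel covered by the object
    for row in extract_mask(image, background):
        print(row)
```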

<Acquisition unit 21>
The acquisition unit 21 is provided on the terminal device 20 (the viewing device for the AR display) located in the viewing environment WE (FIG. 2), where the user U who views the AR display is present. It acquires, in real time, environment information about the user of the terminal device 20 and transmits this environment information via the network NW to the generation unit 13 of the image processing device 10. When the acquisition unit 21 acquires environment information, the acquisition time is attached as a timestamp before the information is transmitted to the generation unit 13.

FIG. 4 is a functional block diagram of the acquisition unit 21 according to one embodiment. The acquisition unit 21 includes a position/orientation acquisition unit 211 and a depth acquisition unit 212. As the user's environment information, the acquisition unit 21 can transmit to the generation unit 13 the position and orientation information for the user's viewpoint (line-of-sight information) acquired by the position/orientation acquisition unit 211, and the depth image acquired by the depth acquisition unit 212, which represents the depth information of the viewing environment as seen from the user (the spatial distribution of depth). In acquiring the position and orientation information and the depth image, the position/orientation acquisition unit 211 and the depth acquisition unit 212 may use the image captured by the display-side photographing unit 22 described later, or may acquire the environment information without using that image.

(Position/orientation acquisition unit 211)
The position/orientation acquisition unit 211 can acquire the position and orientation information of the terminal device 20 by any existing method. This information corresponds to the external parameters among the camera parameters and gives the position and orientation of the terminal device 20 in the three-dimensional world coordinate system defined for the viewing environment WE (FIG. 2) where the user U is present.

For example, the position/orientation acquisition unit 211 may include, as hardware, sensors for acquiring position and orientation (all or some of an acceleration sensor, a gyro sensor, an orientation sensor, and the like) and acquire the position and orientation information in real time from the measurement output of those sensors. Alternatively, the position/orientation acquisition unit 211 may acquire the position and orientation by analyzing the image captured by the display-side photographing unit 22. For example, a predetermined marker usable for detecting camera position and orientation (such as a square marker used in AR technology) may be placed in advance in the viewing environment WE (FIG. 2) where the user U is present. The marker region is then detected in the image obtained by the display-side photographing unit 22 by corner detection, detection of SIFT features, or the like, and the position and orientation are acquired as external parameters.

(Depth acquisition unit 212)
The depth acquisition unit 212 can acquire, by any existing method, a depth image as seen from the terminal device 20 in the viewing environment WE (FIG. 2) where the user U is present (that is, the depth image that would be obtained if a depth camera or the like were located at the position and orientation acquired by the position/orientation acquisition unit 211).

For example, the depth acquisition unit 212 may include a depth camera as hardware (any existing depth camera, such as a time-of-flight (TOF) type or a pattern projection type) and acquire the depth image with that depth camera. Alternatively, the depth acquisition unit 212 may acquire the depth image by analyzing the images captured by the display-side photographing unit 22. For example, it may find point correspondences between the image captured at the current time and an image captured at a past time and then obtain depth by stereo matching, or it may apply a neural network trained in advance by deep learning to output a depth image from a captured image.

The schematic illustration for "depth information" in the table of FIG. 3 described above shows an example of the depth image obtained by the depth acquisition unit 212. This schematic depth information concerns the table used as the example physical object POB in FIG. 2. The depth image is represented as a grayscale image in which the areas corresponding to the table surfaces (two surfaces, one vertical and one horizontal) have small depth values and are drawn closer to white, while the areas other than the table surfaces have large depth values and are drawn closer to black.
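The grayscale convention described here (small depth near white, large depth near black) can be sketched as a simple linear mapping. The function name `depth_to_gray`, the clamping range `d_near` to `d_far`, and the linear scaling are assumptions of this example; the source only states the white-near, black-far convention.

```python
from typing import List


def depth_to_gray(depth: List[List[float]], d_near: float, d_far: float) -> List[List[int]]:
    """Map depth values to 8-bit gray levels: d_near -> 255 (white), d_far -> 0 (black).

    Values outside [d_near, d_far] are clamped. Both bounds are hypothetical
    parameters chosen for visualization only.
    """
    span = d_far - d_near
    if span <= 0:
        raise ValueError("d_far must be greater than d_near")
    gray = []
    for row in depth:
        gray_row = []
        for d in row:
            ratio = min(max((d - d_near) / span, 0.0), 1.0)
            gray_row.append(round(255 * (1.0 - ratio)))
        gray.append(gray_row)
    return gray


if __name__ == "__main__":
    # Near table surface (1.0) next to a far wall (5.0).
    print(depth_to_gray([[1.0, 5.0], [0.5, 9.0]], d_near=1.0, d_far=5.0))
```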

<Generation unit 13>
The generation unit 13 applies the visual volume crossing method to the N mask images of the N viewpoints obtained by the extraction unit 12 while excluding the occlusion region, which it identifies by considering the user's environment information transmitted from the acquisition unit 21 on the terminal device 20 side. It outputs the generated 3D model of the object OB (the virtual object VOB) to the drawing unit 14. The N mask images obtained from the extraction unit 12 carry the common shooting time of the photographing unit 11 (the N cameras are synchronized), and the environment information from the acquisition unit 21 also carries its acquisition time as a timestamp. The generation unit 13 can therefore generate the 3D model by referring to the environment information whose time is the same as that of the N mask images.
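The pairing of mask images with timestamped environment information can be sketched as below. The source speaks of environment information at the same time as the mask images; because two devices rarely produce exactly equal timestamps, this sketch picks the nearest sample within a tolerance, which is an assumption of the example. The names `EnvInfo`, `select_env_info`, and `tolerance` are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class EnvInfo:
    """Environment information sent by the terminal device (hypothetical layout)."""
    timestamp: float  # acquisition time in seconds
    pose: Any         # position and orientation of the user viewpoint
    depth: Any        # depth image seen from the user viewpoint


def select_env_info(
    history: List[EnvInfo], frame_time: float, tolerance: float = 0.02
) -> Optional[EnvInfo]:
    """Return the sample closest in time to the mask images, or None if too far apart."""
    best = min(history, key=lambda e: abs(e.timestamp - frame_time), default=None)
    if best is None or abs(best.timestamp - frame_time) > tolerance:
        return None
    return best


if __name__ == "__main__":
    history = [EnvInfo(0.0, "pose0", "depth0"), EnvInfo(0.033, "pose1", "depth1")]
    print(select_env_info(history, frame_time=0.034))
```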

FIG. 5 is a diagram schematically showing the visual volume crossing method as an existing technique. As is well known, the principle of the method is to obtain the 3D model as the common volume (the visual hull VH) through which all of the viewing cones V1, V2, ..., VN pass, where each cone is obtained by three-dimensional back projection from the position of one of the N cameras C1, C2, ..., CN (each camera is represented in FIG. 5 by its camera center) through the foreground of the corresponding mask image M1, M2, ..., MN. As a schematic example, FIG. 5 shows only the first two of the N cameras, C1 and C2, with their mask images M1 and M2 and viewing cones V1 and V2.

When a 3D model is actually generated by the voxel-based visual volume crossing method built on the principle shown schematically in FIG. 5, two-dimensional projection onto the mask images can be used instead of three-dimensional back projection. A predetermined voxel set (a set of discrete grid points in the three-dimensional model space) is defined and placed in the model space in advance. Each voxel point (X, Y, Z) is projected onto the mask images M1, M2, ..., MN of the N cameras C1, C2, ..., CN to obtain its projected positions (x, y)_1, (x, y)_2, ..., (x, y)_N on the respective mask images. A voxel point (X, Y, Z) that is projected onto the silhouette foreground in all N mask images is determined to belong to the interior (or surface) of the 3D model. Any other voxel point (X, Y, Z), that is, one projected onto the background in at least one mask image, is determined to be an external point that does not belong to the 3D model. The set of voxels that receive a positive determination in this way constitutes the resulting 3D model.
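The projection-based test described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Camera` dataclass, the helper names `project`, `is_foreground`, and `carve`, the pinhole model without skew or distortion, nearest-pixel rounding, and the choice to treat points that project outside an image as background are all assumptions of this example rather than details from the source.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass
class Camera:
    """One calibrated camera with its binary mask image (hypothetical layout)."""
    K: Sequence[Sequence[float]]  # 3x3 internal parameters
    R: Sequence[Sequence[float]]  # 3x3 rotation, world -> camera
    t: Sequence[float]            # translation, world -> camera
    mask: List[List[int]]         # mask[y][x]: 1 = foreground, 0 = background


def project(cam: Camera, p: Point) -> Optional[Tuple[int, int]]:
    """Project a voxel point to the nearest pixel, or None if behind the camera."""
    xc = [sum(cam.R[i][j] * p[j] for j in range(3)) + cam.t[i] for i in range(3)]
    if xc[2] <= 0:
        return None
    x = cam.K[0][0] * xc[0] / xc[2] + cam.K[0][2]
    y = cam.K[1][1] * xc[1] / xc[2] + cam.K[1][2]
    return int(round(x)), int(round(y))


def is_foreground(cam: Camera, p: Point) -> bool:
    """True when the voxel projects onto a foreground pixel of this camera's mask."""
    pixel = project(cam, p)
    if pixel is None:
        return False
    x, y = pixel
    if not (0 <= y < len(cam.mask) and 0 <= x < len(cam.mask[0])):
        return False  # assumption: outside the image counts as background
    return cam.mask[y][x] == 1


def carve(voxels: Sequence[Point], cameras: Sequence[Camera]) -> List[Point]:
    """Keep only voxels that fall on the foreground in every mask image."""
    return [p for p in voxels if all(is_foreground(cam, p) for cam in cameras)]
```

Because `all` stops at the first camera that reports background, a voxel outside the silhouette in an early view costs fewer than N projections, but every voxel in the set is still visited at least once.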

As described above, the existing voxel-based visual volume intersection method defines a voxel set (X,Y,Z) of a predetermined density within a predetermined range (for example, a rectangular parallelepiped) of the three-dimensional model space, and determines, for every voxel point (X,Y,Z) in that lattice, whether it projects onto the foreground or the background of each of the N mask images.

The generation unit 13 does not apply this existing visual volume intersection method as is. Instead, it refers to the environment information obtained by the acquisition unit 21 and performs the foreground/background projection test on the N mask images only for those points (X,Y,Z) of the voxel set, defined in advance in the model space within a predetermined range at a predetermined density, that are determined to be unaffected by occlusion. For voxel points determined to be affected by occlusion, the generation unit 13 skips the projection processing from the outset and excludes them from the points that constitute the generated 3D model, which reduces the computational load compared with the conventional technique.

FIG. 6 schematically shows, in comparison with the conventional technique, how the computational load is reduced when the generation unit 13 applies the visual volume intersection method with the occlusion region excluded. In the conventional technique shown in the first panel PL1, the method is applied to every point in the predefined voxel set VS, and the virtual object VOB' of the object OB is obtained as a 3D model consisting of the voxels found to be foreground. In the technique of the generation unit 13 shown in the second panel PL2, the method is applied to the points in the voxel set VS except those determined to be affected by occlusion, and the virtual object VOB of the object OB is obtained as a 3D model consisting of the voxels found to be foreground.

The conventional virtual object VOB' in FIG. 6 models the whole body of the object OB to be modeled, such as a person. In contrast, the virtual object VOB of the generation unit 13 models only part of the whole body, because the occlusion region ROC caused by a physical object POB, such as the table illustrated schematically in FIG. 2, is excluded in advance (FIG. 6 shows part of the occlusion region ROC within a certain view frustum region as seen from the virtual camera VC). To distinguish it from the occlusion region ROC, FIG. 6 also shows part of the region RD that is subject to the visual volume intersection test within a certain view frustum region as seen from the virtual camera VC.

FIG. 7 is a flowchart of the visual volume intersection method performed by the generation unit 13 according to one embodiment, and shows the details of the occlusion-region exclusion technique described above.

In step S10, the overall settings for applying the visual volume intersection method are made, including the voxel set whose members are to be tested for membership in the resulting 3D model, and the process proceeds to step S11. The voxel set defined in step S10 within a predetermined range of the 3D model space at a predetermined density is denoted {v_i | i=1,2,...,M}. Each voxel v_i (i=1,2,...,M) is tested for membership in the 3D model by the loop of steps S11 to S17 that follows, and the voxel v_i is taken to be the i-th one tested. The test order of the voxels v_i is arbitrary and may be fixed, for example, as raster-scan order in the three-dimensional space.

Having set the voxel set {v_i | i=1,2,...,M}, step S10 further sets the determination result E(v_i)=0 (meaning "does not belong to the 3D model") as the initial value of the binary determination of whether each voxel v_i belongs to the 3D model. As described below, for a voxel v_i that reaches step S16 during the loop of steps S11 to S17, the determination result is rewritten to E(v_i)=1 (meaning "belongs to the 3D model"). For a voxel that does not reach step S16, no rewriting occurs and the initial value E(v_i)=0 ("does not belong to the 3D model") stands as the final result.
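As a rough illustration of the voxel-set setup in step S10, the following Python sketch builds a regular lattice in raster-scan order and initializes every determination result to 0. The function name `make_voxel_set`, its parameters (`origin`, `counts`, `spacing`), and the choice of X as the fastest-varying axis are assumptions for illustration only; the text leaves the range, density, and order open.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def make_voxel_set(origin: Point, counts: Tuple[int, int, int],
                   spacing: float) -> Tuple[List[Point], List[int]]:
    """Return lattice points in raster-scan order and the initial results E.

    X varies fastest, then Y, then Z (an assumed raster order).
    E[i] == 0 stands for "v_i does not belong to the 3D model".
    """
    ox, oy, oz = origin
    nx, ny, nz = counts
    voxels = [(ox + ix * spacing, oy + iy * spacing, oz + iz * spacing)
              for iz in range(nz)
              for iy in range(ny)
              for ix in range(nx)]
    E = [0] * len(voxels)  # initial value set in step S10
    return voxels, E
```

Keeping E as a flat list aligned with the voxel list makes the later rewrite in step S16 a single indexed assignment.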

Step S10 also sets, in the three-dimensional model space in which the voxel set {v_i | i=1,2,...,M} is defined, the virtual viewpoint VC of the user performing AR viewing (the virtual camera VC shown schematically in FIG. 6) as the reference position for determining the occlusion region. The virtual viewpoint is placed in the three-dimensional model space in accordance with the position and orientation acquired and transmitted by the position/orientation acquisition unit 211 of the acquisition unit 21 of the user-side terminal device 20. (The three-dimensional model space may be set to coincide with the world coordinate system of the shooting environment PE captured by the cameras constituting the shooting unit 11.)

That is, the position/orientation acquisition unit 211 acquires the position and orientation in the world coordinate system of the viewing environment WE where the user is present, and applying a preset predetermined transformation (translation and rotation) to it yields the position and orientation of the virtual viewpoint VC in the three-dimensional model space. This predetermined transformation may be set in advance, as alignment information, by an administrator or other person who prepares the AR viewing content provided by the image processing system 100. In doing so, the administrator takes into account both the placement of the object OB to be modeled in the shooting environment PE (including its range of movement, and defined relative to the cameras constituting the shooting unit 11) and the placement of the virtual object VOB in the viewing environment WE (including its range of movement, and defined relative to the terminal device 20 serving as the viewing device).

In addition to the predetermined transformation, a transformation that adjusts the display position and orientation of the virtual object according to input specified by the user of the terminal device 20 may be applied (that is, the composite of the predetermined transformation and the position-adjustment transformation is applied) to obtain the position and orientation of the virtual viewpoint VC in the three-dimensional model space. The adjustment information may be time-stamped, included in the environment information, and transmitted to the generation unit 13.
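The composition of the predetermined transformation with the user's adjustment transformation can be sketched with 4x4 homogeneous matrices. The sketch below is only one possible reading: the text does not fix a matrix convention or the order of composition, so the left-multiplication order, the restriction of `rigid` to rotation about the Z axis, and the names `rigid`, `matmul4`, and `virtual_camera_pose` are assumptions.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def rigid(yaw_deg: float, t: Sequence[float]) -> Matrix:
    """4x4 rigid transform: rotation about Z by yaw_deg, then translation t."""
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    return [[c, -s, 0.0, t[0]],
            [s, c, 0.0, t[1]],
            [0.0, 0.0, 1.0, t[2]],
            [0.0, 0.0, 0.0, 1.0]]


def matmul4(A: Matrix, B: Matrix) -> Matrix:
    """Product of two 4x4 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def virtual_camera_pose(T_align: Matrix, T_adjust: Matrix,
                        pose_viewing: Matrix) -> Matrix:
    """Pose of the virtual camera VC in the 3D model space.

    pose_viewing: terminal pose in the viewing-side world coordinate system.
    T_align:      predetermined transformation set by the administrator.
    T_adjust:     user-specified adjustment (assumed to be applied last).
    """
    return matmul4(T_adjust, matmul4(T_align, pose_viewing))
```

For example, with a 90 degree alignment rotation and an adjustment that shifts by 2 along Z, a terminal located at (1,0,0) maps to approximately (0,1,2) in the model space.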

As already described, the entire flow of FIG. 7 can be executed in real time at each time instant. Because of the predetermined transformation, the position and orientation of the virtual viewpoint VC set in step S10 at each instant move through the 3D model space in the same manner as the position and orientation acquired by the position/orientation acquisition unit 211.

In step S11, an unprocessed voxel v_i, one that has not yet been tested for membership in the 3D model, is selected, and the process proceeds to step S12. In step S12, it is determined whether the voxel v_i lies in the occlusion region, and the process proceeds to step S13.

FIG. 8 is a flowchart of the process of step S12 for determining whether a voxel lies in the occlusion region, according to one embodiment. The main processing in FIG. 8 compares the voxel v_i against the depth values of the depth image transmitted from the depth acquisition unit 212. If the voxel v_i is in front of the position indicated by the depth value (nearer to the virtual camera VC), it is determined not to be in the occlusion region; conversely, if it is behind that position (farther from the virtual camera VC), it is determined to be in the occlusion region.

In step S20, the distance dist(v_i,VC) between the voxel v_i and the virtual camera VC is calculated, and the process proceeds to step S21. (This distance dist(v_i,VC) is written as the distance e_i associated with the voxel v_i. Because the voxel v_i is defined in the 3D model space in advance and the position (and orientation) of the virtual camera VC in the 3D model space is obtained in step S10 of FIG. 7, the distance e_i can be calculated in the 3D model space.)

In step S21, the voxel v_i is transformed from the voxel space coordinate system (that is, the 3D model space coordinate system) to the coordinate system of the virtual camera VC, and the process proceeds to step S22. The transformed voxel is written v'_i. Because the position and orientation of the virtual camera VC in the 3D model space are obtained in step S10 of FIG. 7, this coordinate transformation "v_i → v'_i" is possible.

In step S22, the transformed voxel v'_i=(X,Y,Z) obtained in step S21 is mapped to a pixel position (x,y) of a "foreground depth image" defined in the image coordinate system of the virtual camera VC, the distance e_i=e_i(x,y) obtained in step S20 is assigned as the pixel value at that position, and the process proceeds to step S23. (Assigning the distance e_i as a pixel value yields a "foreground depth image", defined for each voxel v_i, as an image in the image coordinate system of the virtual camera VC. Pixel values of the foreground depth image are undefined at positions other than the mapped pixel position (x,y).)

The conversion in step S22 from the spatial coordinates (X,Y,Z) of the voxel v'_i to the position (x,y) in the two-dimensional image coordinate system of the virtual camera VC can be performed as a two-dimensional projection using the intrinsic parameters preset for the virtual camera VC. (When AR display is performed on the display unit 23 of the terminal device 20, the result rendered by the drawing unit 14 (described later) of the image processing device 10 using these same intrinsic parameters of the virtual camera VC is displayed.)

In step S23, it is checked whether the transformed voxel v'_i=(X,Y,Z) obtained in step S21 lies within the angle of view of the virtual camera VC (within the view frustum obtained by three-dimensional back projection from the position of the virtual camera VC onto the image extent, usually rectangular, of the image plane), and the process proceeds to step S24. The range of the angle of view used in step S23 may be a range predetermined as a camera parameter of the virtual camera VC. (The drawing unit 14, described later, also renders within this range.)

In step S24, if the check in step S23 found the voxel v'_i within the angle of view (affirmative), the process proceeds to step S25; if it was outside (negative), the process proceeds to step S27.

In step S25, at the pixel position (x,y) mapped in step S22, the pixel value d(x,y) of the "background depth image" is subtracted from the pixel value of the foreground depth image obtained in step S22 (the distance e_i) to obtain the difference D=e_i(x,y)-d(x,y), and the process proceeds to step S26. The depth image acquired by the depth acquisition unit 212 of the terminal device 20 is used as the background depth image. (The term "background depth image" denotes the image of the depth of the background present in the user's viewing environment WE, such as the physical object POB, and distinguishes it from the "foreground depth image", which gives the depth of the voxel v'_i.)

The depth acquisition unit 212 is assumed to acquire depth after calibration or similar preparation, so that the depth image gives the depth d(x,y) at each pixel position (x,y) of the image plane of the virtual camera VC.

In step S26, it is determined whether the difference D obtained in step S25 is positive (D>0). If so (D>0), the process proceeds to step S27; if not (D≤0), it proceeds to step S28.

In step S27, the voxel v_i is determined to be in the occlusion region, and the flow of FIG. 8 (the determination process of step S12 in FIG. 7) ends. In step S28, the voxel v_i is determined not to be in the occlusion region, and the flow of FIG. 8 ends.

When step S27 is reached from step S26, the difference satisfies D>0, so the spatial position of the voxel v_i is behind (farther from the virtual camera VC than) the spatial position given by the corresponding depth value d(x,y), and the voxel is therefore determined to be in the occlusion region. When step S28 is reached from step S26, the opposite holds, and the voxel is determined not to be in the occlusion region.

Step S27, which determines that the voxel is in the occlusion region, may be reached not only from step S26 but also after a negative determination in step S24. A negative determination in step S24 means that the voxel v_i is outside the angle of view of the virtual camera VC. In that case the voxel v_i falls outside the range in which AR display is possible (it is out of frame with respect to the AR rendering and display range of the drawing unit 14 and the display unit 23), so it is labeled as being in the occlusion region for convenience, in order to skip 3D model generation, AR rendering, and so on.
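Steps S20 to S28 can be condensed into a single function, shown here as a hedged Python sketch rather than the embodiment's actual code. It assumes a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy`, a world-to-camera rotation matrix `R_wc`, a nested-list depth image `depth[y][x]` that stores the distance from the camera center (so that it is directly comparable with e_i), and nearest-pixel rounding. The check for a point behind the camera is performed before the projection to avoid dividing by a non-positive depth, which reorders S22 and S23 slightly.

```python
import math
from typing import List, Sequence


def is_occluded(voxel: Sequence[float], cam_pos: Sequence[float],
                R_wc: Sequence[Sequence[float]],
                fx: float, fy: float, cx: float, cy: float,
                depth: List[List[float]]) -> bool:
    """Return True if the voxel is treated as lying in the occlusion region."""
    # S20: distance e_i between the voxel and the virtual camera VC.
    e = math.dist(voxel, cam_pos)

    # S21: transform v_i into the camera coordinate system, v'_i = R (v_i - c).
    d = [voxel[k] - cam_pos[k] for k in range(3)]
    Xc, Yc, Zc = (sum(R_wc[r][k] * d[k] for k in range(3)) for r in range(3))

    # S23/S24 (first part): a point behind the camera is out of view.
    if Zc <= 0:
        return True  # S27: labeled as occluded for convenience

    # S22: project onto the image plane with the intrinsic parameters.
    x = int(round(fx * Xc / Zc + cx))
    y = int(round(fy * Yc / Zc + cy))

    # S23/S24 (second part): outside the image extent means out of view.
    if not (0 <= y < len(depth) and 0 <= x < len(depth[0])):
        return True  # S27

    # S25: difference between foreground depth e_i and background depth d(x,y).
    D = e - depth[y][x]

    # S26: D > 0 means the voxel is behind the physical surface (S27),
    # otherwise it is in front of or on it (S28).
    return D > 0
```

If the depth sensor reports depth along the optical axis rather than distance from the camera center, `Zc` would have to be compared with `depth[y][x]` instead of `e`; which quantity applies depends on the calibration mentioned for the depth acquisition unit 212.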

The steps of FIG. 8 have now been described as one embodiment of step S12 in FIG. 7, so the description returns to the steps of FIG. 7.

In step S13, if the determination in step S12 was negative (not in the occlusion region), the process proceeds to step S14. If it was affirmative (in the occlusion region), processing of the voxel v_i is regarded as complete and the process proceeds to step S17.

In step S14, the visual volume intersection method is applied to the voxel v_i. As described with reference to FIG. 5, it is checked whether the voxel v_i projects onto the foreground silhouette in all N mask images obtained by the extraction unit 12 and therefore qualifies as a point in the 3D model, and the process proceeds to step S15.

In step S15, if the result of step S14 was affirmative (the voxel qualifies as a point in the 3D model), the process proceeds to step S16; if negative, processing of the voxel v_i is regarded as complete and the process proceeds to step S17. In step S16, the determination result for the voxel v_i is rewritten from its initial value to E(v_i)=1 ("belongs to the 3D model"), processing of the voxel v_i is regarded as complete, and the process proceeds to step S17.

In step S17, it is determined whether all voxels in the voxel set {v_i | i=1,2,...,M} set in step S10 have been processed. If so, the process proceeds to step S18; if unprocessed voxels remain, it returns to step S11. Through the loop of steps S11 to S17, each voxel of the voxel set {v_i | i=1,2,...,M} receives one of the following three determination results.

(First case) The transition "S16→S17" occurs: the visual volume intersection method (S14) is applied, and the result is E(v_i)=1 (not in the occlusion region, "belongs to the 3D model").
(Second case) The transition "S15→S17" occurs: the visual volume intersection method (S14) is applied, and the result is E(v_i)=0 (not in the occlusion region, "does not belong to the 3D model").
(Third case) The transition "S13→S17" occurs: the visual volume intersection method (S14) is not applied, and the result is E(v_i)=0 (in the occlusion region, hence "does not belong to the 3D model").
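The loop of steps S11 to S17 and its three outcomes can be summarized in a short Python sketch. The two tests are passed in as callables so that the sketch stays independent of any particular camera model; the function name `classify_voxels` and the `case` dictionary that records which of the three cases applied are illustrative assumptions, not part of the described flow.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple


def classify_voxels(
    voxels: Iterable[Hashable],
    is_occluded: Callable[[Hashable], bool],
    in_all_silhouettes: Callable[[Hashable], bool],
) -> Tuple[Dict[Hashable, int], Dict[Hashable, int]]:
    """Run the S11-S17 loop; return (E, case) keyed by voxel."""
    voxels = list(voxels)
    E = {v: 0 for v in voxels}          # S10: initial value "not in the model"
    case: Dict[Hashable, int] = {}
    for v in voxels:                    # S11: pick the next unprocessed voxel
        if is_occluded(v):              # S12/S13: occlusion test
            case[v] = 3                 # third case: S14 is skipped entirely
            continue
        if in_all_silhouettes(v):       # S14/S15: visual volume intersection
            E[v] = 1                    # S16: rewrite the result
            case[v] = 1                 # first case
        else:
            case[v] = 2                 # second case
    return E, case                      # S17 ends when all voxels are done
```

Only the first and second cases pay for the N-view silhouette test, which is where the reduction in computational load comes from.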

In step S18, post-processing is applied to the set of voxels determined as E(v_i)=1 ("belongs to the 3D model"), for example polygonization to obtain a surface shape. The resulting renderable 3D model is output from the generation unit 13 to the drawing unit 14, and the flow of FIG. 7 ends. Any existing technique may be used for post-processing such as polygonization.

Provided that the determination result for each voxel v_i is obtained according to the flow of FIG. 7 with the distinction among the first to third cases above, any existing technique may be combined when applying the visual volume intersection method in step S14.

<Drawing unit 14>
The drawing unit 14 renders the 3D model generated by the generation unit 13 from the viewpoint of the virtual camera VC, using textures from the multi-viewpoint images obtained by the shooting unit 11, and transmits the resulting virtual viewpoint image (a mask image whose pixel values are undefined except where rendering took place) to the display unit 23 of the terminal device 20.

Rendering in the drawing unit 14 may use any existing technique employed in free-viewpoint video synthesis and the like (for example, the technique of Patent Document 3 cited above). Polygons, which are the elements of the 3D model, are projected onto the image plane of the virtual camera VC viewpoint, and for each projected polygon the corresponding texture is selected from the multi-viewpoint images obtained by the shooting unit 11 and applied after accounting for the deformation caused by projection. The texture may be selected from one or more of the N-viewpoint images whose position and orientation are close to those of the virtual camera VC. When two or more images are used, a weighted sum or the like may be used.
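The weighted sum over nearby views can be illustrated with a small Python sketch. The weighting rule is not specified in the text, so the choices here are assumptions: cameras are ranked by the distance between their positions and the position of the virtual camera VC (orientation is ignored), the `k` nearest are kept, and the weights are normalized inverse distances. The function names `blend_weights` and `blend_color` are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def blend_weights(vc_pos: Sequence[float],
                  cam_positions: Sequence[Sequence[float]],
                  k: int = 2) -> Dict[int, float]:
    """Normalized inverse-distance weights for the k cameras nearest to VC."""
    ranked = sorted((math.dist(vc_pos, c), i) for i, c in enumerate(cam_positions))
    inv = [(1.0 / (d + 1e-9), i) for d, i in ranked[:k]]  # avoid division by zero
    total = sum(w for w, _ in inv)
    return {i: w / total for w, i in inv}


def blend_color(weights: Dict[int, float],
                colors: Sequence[Sequence[float]]) -> Tuple[float, float, float]:
    """Weighted sum of the RGB samples taken from the selected cameras."""
    r, g, b = (sum(w * colors[i][c] for i, w in weights.items()) for c in range(3))
    return r, g, b
```

With `k=1` this reduces to picking the texture from the single nearest view.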

<Display unit 23 and display-side shooting unit 22>
The display unit 23 displays the virtual viewpoint image transmitted from the drawing unit 14 to the user, thereby enabling AR viewing. As hardware, the display unit 23 can be implemented as, for example, an optical see-through HMD or a video see-through HMD. With the former (optical see-through HMD), only the virtual viewpoint image (mask image) transmitted from the drawing unit 14 is superimposed on the real background of the viewing environment WE that the user sees directly with the naked eye. With the latter (video see-through HMD), the virtual viewpoint image (mask image) transmitted from the drawing unit 14 is superimposed on the background image of the viewing environment WE captured by the display-side shooting unit 22, which as hardware consists of a camera. If image capture on the terminal device 20 side is unnecessary, the display-side shooting unit 22 may be omitted.
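For the video see-through case, the superimposition amounts to overwriting background pixels wherever the virtual viewpoint image is defined. The Python sketch below assumes that undefined pixels of the virtual viewpoint image are represented by `None` and that both images are nested lists of the same size; the function name `composite` is hypothetical.

```python
from typing import List, Optional, TypeVar

Pixel = TypeVar("Pixel")


def composite(background: List[List[Pixel]],
              virtual_view: List[List[Optional[Pixel]]]) -> List[List[Pixel]]:
    """Overlay the defined pixels of virtual_view onto the background image."""
    return [[v if v is not None else b for b, v in zip(b_row, v_row)]
            for b_row, v_row in zip(background, virtual_view)]
```

For example, `composite([[1, 2], [3, 4]], [[None, 9], [None, None]])` returns `[[1, 9], [3, 4]]`.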

As described above, according to one embodiment of the present invention, model generation can be simplified on the server (image processing device 10) that performs model generation and rendering, so reductions in the cost of model generation on the server and in the cost of rendering with the simplified model can be expected. In addition, the viewing device (terminal device 20) can display the virtual object with its occlusion by physical objects taken into account, achieving a more natural AR display.

The three-dimensional coordinate systems used by the image processing device 10 and the terminal device 20 may have the same relationships as in existing server-side rendering for realizing AR display; they are summarized as follows.

The terminal device 20 acquires position and orientation information in the viewing-side world coordinate system (X,Y,Z)_[viewing-side world], and this information is associated with the depth image, as acquired at the same time, to represent the position and orientation of the depth camera that acquired the depth image. In the image processing device 10, the shooting unit 11 captures the object OB to be modeled, which is placed in the shooting-side world coordinate system (X,Y,Z)_[shooting-side world], in the camera coordinates (X,Y,Z)_[Ck] of each of the N-viewpoint cameras Ck (k=1,2,...,N). The generation unit 13 defines voxels in advance in the voxel space (X,Y,Z)_[voxel] serving as the 3D model space. Known camera parameters allow mutual conversion between the N camera coordinate systems and the shooting-side world coordinate system, "(X,Y,Z)_[Ck] ⇔ (X,Y,Z)_[shooting-side world]". Because the voxel space (X,Y,Z)_[voxel] is set by an administrator or the like within the world coordinate system (X,Y,Z)_[shooting-side world] where shooting takes place, the mutual conversion "(X,Y,Z)_[voxel] ⇔ (X,Y,Z)_[shooting-side world]" is also possible. (The two may be set to be identical, "(X,Y,Z)_[voxel] = (X,Y,Z)_[shooting-side world]".)

As described for step S10, the predetermined transformation allows conversion between the world coordinate system on the viewing terminal device 20 side and the voxel coordinate system on the image processing device 10 side, "(X,Y,Z)_[viewing-side world] ⇔ (X,Y,Z)_[voxel]". In particular, converting the position and orientation of the terminal device 20, acquired in viewing-side world coordinates, into the voxel coordinate system by this transformation gives the position and orientation of the virtual camera VC in the voxel coordinate system. Because the position and orientation of the virtual camera VC are thus given in the voxel coordinate system (X,Y,Z)_[voxel], the conversion between the voxel coordinate system and the coordinate system of the virtual camera VC, "(X,Y,Z)_[voxel] ⇔ (X,Y,Z)_[VC]" (such as "v_i → v'_i" in step S21), is also possible as a change of origin and axis directions.

When the 3D model obtained by the image processing device 10 in the coordinate system (X,Y,Z)_[VC] of the virtual camera VC is rendered and displayed on the terminal device 20, the viewpoint of the virtual camera VC may be treated as coinciding with the viewpoint of the user performing AR viewing on the terminal device 20. (For example, when display on the display unit 23 uses the video see-through method and the rendering is superimposed on the background image captured by the display-side shooting unit 22, setting the intrinsic parameters of the virtual camera VC equal to those of the camera that constitutes the display-side shooting unit 22 as hardware allows the drawing unit 14 to render so that the display is consistent. The same applies to the optical see-through method.)

以上説明した実施形態を第一実施形態とし、以下ではこの変形例である第二実施形態ないし第六実施形態を説明する。 The embodiment described above will be referred to as the first embodiment, and the second to sixth embodiments, which are modified examples of the above, will be described below.

<Second embodiment>
The second embodiment makes it possible to address the following problem, which can arise when a light source is placed in the 3D model space and the virtual object obtained in the first embodiment is drawn together with its shadow.

すなわち、３Ｄモデル空間において光源を設置して描画した場合、光源と仮想オブジェクトとの位置関係より、影が生成される。しかしながら、第一実施形態ではオクルージョンを考慮して一部のボクセルについてはモデル生成を省略していることで、省略された領域に対する影が生成されなくなる。オクルージョン領域自体は、ユーザのビューから見えない部分であるが、その影についてはユーザから見える部分である場合がある。この場合に、ユーザから見ると影が途切れたり消失したりしている状態となり、不自然に見えてしまうというのが第一実施形態で発生しうる課題である。 That is, when a light source is installed and drawn in the 3D model space, a shadow is generated from the positional relationship between the light source and the virtual object. However, in the first embodiment, the model generation is omitted for some voxels in consideration of occlusion, so that the shadow for the omitted region is not generated. The occlusion area itself is a part that is not visible to the user's view, but its shadow may be a part that is visible to the user. In this case, it is a problem that can occur in the first embodiment that the shadow is interrupted or disappears from the user's point of view and looks unnatural.

そこで、第二実施形態では、オクルージョン領域である全てのボクセルに関してモデル化をスキップするのではなく、オクルージョン領域であるボクセルのうち、影に影響する部分のボクセルはモデル化し、影に影響しない部分のみモデル化をスキップする。 Therefore, in the second embodiment, instead of skipping modeling for all voxels in the occlusion area, the voxels in the part of the voxels in the occlusion area that affect the shadow are modeled, and only the part that does not affect the shadow is modeled. Skip modeling.

FIG. 9 is a diagram schematically showing the above problem that can occur in the first embodiment. (In FIG. 9, the 3D model space is shown schematically by a two-dimensional cross section.) In the first embodiment, the region shielded by the physical object POB as seen from the virtual camera VC forms the occlusion region ROC, and the virtual object VOB is generated only outside this occlusion region ROC. In the example of FIG. 9, it is generated split into two parts, the upper part up and the lower part dp of the original object OB. If the virtual object VOB were instead generated with the conventional technique described with reference to FIG. 3, it would not be split into the upper part up and the lower part dp; the whole of the original object OB would be generated, including the intermediate part mp (which lies inside the occlusion region ROC).

When a virtual light source VL is placed for the virtual object VOB generated in this way by the first embodiment and its shadow is drawn on the ground GR defined in the virtual space, the shadow is drawn split into two regions: the upper shadow region us, cast by the upper part up, and the lower shadow region ds, cast by the lower part dp. The virtual camera VC is positioned where this split shadow is visible, so the AR viewing user receives an unnatural impression.

FIG. 10 is a diagram schematically showing the solution that the second embodiment provides to the problem of the first embodiment shown schematically in FIG. 9. Reference signs in FIG. 10 that are the same as in FIG. 9 denote the same content, so duplicate description is omitted. As shown in FIG. 10, for the intermediate part mp, whose omission from virtual object generation was the cause of the unnatural impression, the second embodiment can distinguish a first region m1 and a second region m2.

The first region m1 lies inside the occlusion region ROC but is computed, by applying the visual volume crossing method, as a region constituting the virtual object VOB. It is ignored when drawing the texture of the virtual object VOB but is taken into account when drawing the shadow. The second region m2 is, as a result, excluded from the visual volume crossing method in the same way as in the first embodiment. (In the example of FIG. 10, the second region m2 consists of two regions, one toward the upper part up and one toward the lower part dp.)

In the second embodiment, regions that affect the shadow, such as the first region m1 above, can be distinguished even inside the occlusion region ROC. As shown in FIG. 10, the intermediate shadow region ms1 cast by the first region m1 is also drawn when the shadow is drawn, so the shadow regions us, ms, ds are drawn as one continuous shadow without a break, which removes the unnatural impression that the first embodiment gives when viewed from the virtual camera VC.

第二実施形態は具体的に以下のように、生成部13、描画部14及び表示部23の処理が第一実施形態から変更や追加を伴うものとなる。（その他の機能部の処理内容は第二実施形態と第一実施形態とで同様であるため、重複した説明は行わない。） Specifically, in the second embodiment, the processes of the generation unit 13, the drawing unit 14, and the display unit 23 are changed or added from the first embodiment as follows. (Since the processing contents of the other functional parts are the same in the second embodiment and the first embodiment, no duplicate explanation will be given.)

<Generation unit 13 ... Second embodiment>
FIG. 11 is a flowchart showing an example of the model generation procedure of the generation unit 13 according to the second embodiment. The flowchart of FIG. 11 adds steps S131 and S132, as additional procedures of the second embodiment, to steps S10 to S18, which are the same as those shown in the flowchart of FIG. 7. Unless otherwise noted, steps S10 to S18 are the same as the steps with the same reference signs in FIG. 7, so duplicate description is omitted; the differences, namely steps S13, S131, S132, S16 and S18, are described below.

In step S13 of FIG. 11, as in step S13 of FIG. 7, the flow branches according to whether the voxel v_i falls in the occlusion region. On a negative determination in step S13 of FIG. 11 (the voxel is not in the occlusion region), the flow proceeds to step S14 as in FIG. 7; on an affirmative determination (the voxel is in the occlusion region), the flow proceeds to step S131.

In step S131, it is determined whether the voxel v_i affects the shadow, and the flow proceeds to step S132. The details of the determination in step S131 are described later, together with the details of the additional processing in step S16. In step S132, if the result of step S131 is affirmative (the voxel affects the shadow), the flow proceeds to step S14; if it is negative (no effect), processing of the voxel v_i is regarded as complete and the flow proceeds to step S17.

Because steps S131 and S132 are added as the branch destination for an affirmative determination in step S13, step S18 of the second embodiment in FIG. 11 yields a 3D model that also includes voxels that fall in the occlusion region but affect the shadow, as shown by the first region m1 in the schematic example of FIG. 10. That is, in the first embodiment the determination result for each voxel v_i was one of the first to third cases; in the second embodiment, a voxel in the third case that is determined to possibly affect the shadow also becomes a target of the visual volume crossing method, and a determination of whether it is a point constituting the 3D model is obtained for it.

(Details of the determination processing in step S131)
In step S131, the voxel v_i is mapped onto the image plane of the camera coordinate system of the virtual light source VL, whose predetermined position in the virtual space is defined, to obtain its pixel position (x,y), and the distance e2_i=dist(v_i,VL) between the virtual light source VL and the voxel v_i is calculated. Through the additional processing of step S16 described later, the image plane of the camera coordinate system of the virtual light source VL holds, as a "light source depth image", the minimum distance from the virtual light source VL to the voxels belonging to the 3D model, and this image is continuously updated.

Step S131 therefore also looks up the value L(x,y) of the light source depth image at the pixel position (x,y) and compares the distance e2_i with this light source depth value L(x,y). If "e2_i>L(x,y)", a voxel constituting the 3D model already exists closer to the light source VL than the voxel v_i (and on the line segment connecting the voxel v_i and the light source VL), so the voxel v_i is determined not to affect the shadow. (This is because the shadow cast by the closer, already existing voxel also covers the shadow that the voxel v_i would cast.) Conversely, on a negative determination, that is "e2_i≦L(x,y)", the voxel v_i is determined to affect the shadow.
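The step S131 test can be sketched as below. This is an illustration under stated assumptions, not the patent's implementation: the light source is modeled as a hypothetical pinhole camera looking along +Z with made-up internal parameters `f`, `cx`, `cy`, the voxel is assumed to be already expressed in the light's camera coordinates, and the light source depth image is held in a dictionary whose missing pixels stand for the initial value of infinity.

```python
import math


def project_to_light(v_light, f=100.0, cx=50.0, cy=50.0):
    """Pinhole projection of a point given in light-camera coordinates (Z > 0).

    Returns the integer pixel (x, y) and the distance e2 to the light source.
    """
    X, Y, Z = v_light
    pixel = (int(round(f * X / Z + cx)), int(round(f * Y / Z + cy)))
    e2 = math.sqrt(X * X + Y * Y + Z * Z)
    return pixel, e2


def affects_shadow(v_light, light_depth):
    """Step S131: False only if a nearer model voxel already covers this pixel."""
    pixel, e2 = project_to_light(v_light)
    return e2 <= light_depth.get(pixel, math.inf)   # e2 > L(x, y) means no effect


empty = {}                        # nothing recorded yet: L(x, y) is infinite everywhere
nearer = {(50, 50): 1.0}          # a model voxel at distance 1.0 on the same pixel
farther = {(50, 50): 3.0}         # a model voxel at distance 3.0 on the same pixel
results = [affects_shadow((0.0, 0.0, 2.0), d) for d in (empty, nearer, farther)]
```

Only the middle case, where a voxel nearer to the light already occupies the same pixel, reports that the new voxel has no effect on the shadow.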

(Details of the additional processing in step S16)
As additional processing in step S16, the pixel values of the "light source depth image" referred to in step S131 are updated. Specifically, as in step S131, a voxel v_i determined to be a point constituting the 3D model is mapped onto the image plane of the camera coordinate system of the virtual light source VL to obtain its pixel position (x,y), and the distance e2_i=dist(v_i,VL) between the virtual light source VL and the voxel v_i is calculated. The calculated distance e2_i is then compared with the current pixel value L(x,y) of the light source depth image at the same position (x,y); if the distance e2_i is smaller ("e2_i<L(x,y)"), the pixel value L(x,y) is overwritten with this smaller distance e2_i. Otherwise ("e2_i≧L(x,y)"), the value is not updated. As the initial value of the light source depth image, the light source depth value L(x,y)=∞ (infinity) may be set at every pixel position (x,y) in step S10; whatever the calculated distance e2_i, the relation "e2_i<∞" always holds.
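The update rule of step S16 amounts to keeping a running minimum per pixel. A minimal sketch, assuming the light source depth image is stored as a dictionary keyed by pixel position, with absent keys standing for the initial value of infinity (the storage layout and the function name are assumptions made here):

```python
import math


def update_light_depth(light_depth, pixel, e2):
    """Overwrite L(x, y) with e2 when e2 is strictly smaller; report whether it changed."""
    if e2 < light_depth.get(pixel, math.inf):
        light_depth[pixel] = e2
        return True
    return False


L = {}
changes = [
    update_light_depth(L, (3, 4), 5.0),   # first voxel on this pixel: always smaller than infinity
    update_light_depth(L, (3, 4), 7.0),   # farther voxel: value kept
    update_light_depth(L, (3, 4), 2.5),   # nearer voxel: value overwritten
]
```

After these three calls the stored value at pixel (3, 4) is the smallest of the distances seen so far.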

以上、ステップS16での追加処理として更新する光源デプス画像を用いた、ステップS131の判定処理の意義は次の通りである。 As described above, the significance of the determination process in step S131 using the light source depth image to be updated as the additional process in step S16 is as follows.

That is, in the iterative processing of the flow shown in FIG. 11, each voxel v_i is examined in a predetermined order (raster scan order, for example) to determine, among other things, whether it belongs to the 3D model. When a voxel v_i is determined to be in the occlusion region (that is, when step S131 is reached), the voxel v_i is determined not to affect the shadow if there exists another voxel v_j (j<i) whose processing has already been completed, that is located where it shields the light source VL as seen from the voxel v_i, and that belongs to the 3D model (including voxels inside the occlusion region). If no such voxel v_j exists, the voxel v_i is not shielded by any voxel already determined to constitute the 3D model (earlier than v_i itself in the processing order) and is positioned where it can see the light source VL directly, so the voxel v_i is determined to affect the shadow.

The light source depth image is one example of a means for determining, as described above, whether the voxel v_i is shielded, as seen from the virtual light source VL, by another voxel v_j (one constituting the 3D model). Setting internal parameters for the light source camera makes this determination simple. (The light source depth image is not intended to be used directly for any drawing.)

<Drawing unit 14 ... Second embodiment>
The drawing unit 14 uses the 3D model obtained by the generation unit 13 to perform drawing that reflects the light source. For the part of the whole 3D model that was not determined to be in the occlusion region, the texture is drawn as in the first embodiment with the light source reflected, and the shadow that this part casts owing to the light source is also drawn. For the part of the whole 3D model that was determined to be in the occlusion region, texture drawing is omitted because the part is not visible from the user viewpoint (virtual camera), but the shadow that this part casts in relation to the light source is drawn. (That is, texture drawing uses only the part of the whole 3D model that is outside the occlusion region, while shadow drawing uses the whole 3D model regardless of the occlusion region.) Any existing method used in the field of 3DCG (3D computer graphics) may be used to draw shadows with the light source effect applied.

<Display unit 23 ... Second embodiment>
The display unit 23 displays the texture of the virtual object and its shadow, as the drawing result obtained by the drawing unit 14.

<Third embodiment>
The third embodiment performs the following as additional processing for the generation unit 13 of the first embodiment. The generation unit 13 first obtains, according to the first embodiment, the voxels constituting the 3D model with the occlusion region excluded. As additional processing, it then also applies the visual volume crossing method to the set of voxels determined to be in the occlusion region, after lowering the voxel density by a predetermined ratio, and obtains the voxels constituting the 3D model there.
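One simple way to lower the voxel density in the occlusion region is to keep only every n-th voxel along each axis. The sketch below illustrates that idea only; the text does not specify how the density is reduced, so the regular-grid indexing, the `stride` parameter, and the function name are all assumptions made here.

```python
def subsample_voxels(voxel_indices, stride=2):
    """Keep voxels whose (i, j, k) grid indices are all multiples of stride."""
    return [v for v in voxel_indices if all(c % stride == 0 for c in v)]


# Hypothetical 4 x 4 x 4 block of occluded voxels, thinned to 1/8 of its density.
occluded = [(i, j, k) for i in range(4) for j in range(4) for k in range(4)]
sparse = subsample_voxels(occluded, stride=2)
```

With a stride of 2 the number of voxels passed to the visual volume crossing method drops by a factor of 8, which is where the saving in computation would come from.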

第一実施形態において高密度でオクルージョン領域外から求めた３Ｄモデルを構成するボクセル集合（第１ボクセル集合と呼ぶ）に加えて、第三実施形態においてさらに、低密度でオクルージョン領域内から求めた３Ｄモデルを構成するボクセル集合（第２ボクセル集合と呼ぶ）の用途の一例として、第三実施形態では次が可能である。 In addition to the voxel set (referred to as the first voxel set) constituting the 3D model obtained from outside the occlusion region at high density in the first embodiment, the 3D obtained from within the occlusion region at low density in the third embodiment. As an example of the use of the voxel set (referred to as the second voxel set) constituting the model, the following is possible in the third embodiment.

The premises needed to explain this use are described first. As already explained, in the image processing system 100 the image processing device 10 and the terminal device 20 synchronize their clocks and then operate in real time: the terminal device 20 acquires environmental information and transmits it to the image processing device 10; the image processing device 10 captures multi-viewpoint video, generates a 3D model, draws it from the virtual viewpoint, and transmits the drawing result to the terminal device 20; and the terminal device 20 displays this drawing result to the user, enabling AR viewing.

Acquisition of environmental information (position and orientation information only, or both position and orientation information and a depth image) on the terminal device 20 side can be realized with low load and at a high processing rate by using a dedicated device such as a position and orientation sensor. In contrast, 3D model generation and drawing from multi-viewpoint video on the image processing device 10 side involves more data than the environmental information and also more computation, so the achievable processing rate may be limited even if the image processing device 10 is implemented on a dedicated server or the like with abundant computing resources. To deal with this, the terminal device 20 acquires and transmits environmental information at a high rate (for example, every 0.1 seconds), the generation unit 13 of the image processing device 10 generates the 3D model at a low rate (for example, every 1 second), and the drawing by the drawing unit 14 and the transmission of the drawing information are matched to the high rate. This makes display at the high rate possible on the terminal device 20 side.

Let MD(0), MD(1), MD(2), ... denote the 3D models generated by the generation unit 13 of the image processing device 10 at the low rate (every 1 second in this example) at times t=0,1,2,.... The drawing unit 14 then draws at the high rate (every 0.1 seconds in this example) by interpolating the most recently generated 3D model. For example, MD(0) can be used as is for the 3D model at time t=0, while for times t=0.1,0.2,0.3,... the models MD'(0.1), MD'(0.2), MD'(0.3), ... obtained by interpolating this most recent model MD(0) are used.

The third embodiment can generate a 3D model that allows such interpolation while keeping the amount of computation down. For example, the interpolated model MD'(0.1) at time t=0.1 is obtained by moving the coordinates of the model MD(0) (by rotation and translation) to reflect the change in environmental information (the change in position and orientation of the virtual camera) between time t=0 and time t=0.1, and by assigning the distinction between inside and outside the occlusion region. (That is, the interpolated model MD'(0.1) keeps the same shape as the model MD(0); only the position and orientation in which it is seen change as the virtual camera moves. Writing T_[0→0.1] for the transformation (rigid transformation) representing the change in position and orientation between times t=0 and t=0.1, "MD'(0.1)=T_[0→0.1]・MD(0)".) Interpolation for t=0.2,0.3,... works the same way, including in the description below.
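The relation MD'(0.1)=T_[0→0.1]・MD(0) means that every vertex of the model is moved by the same rotation and translation, so the shape is unchanged. A minimal sketch, assuming the rigid transformation is given as a 3x3 rotation matrix `R` and a translation vector `t` (this parameterization and the function name are assumptions made here):

```python
def apply_rigid(R, t, points):
    """Apply p -> R p + t to every point of a model."""
    return [tuple(sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3))
            for p in points]


# Hypothetical T: rotate 90 degrees about Z, then shift by +1 along X.
R = ((0, -1, 0),
     (1, 0, 0),
     (0, 0, 1))
t = (1, 0, 0)
md0 = [(1, 0, 0), (0, 2, 0)]          # two vertices standing in for MD(0)
md_interp = apply_rigid(R, t, md0)    # stands in for MD'(0.1)


def sq_dist(a, b):
    return sum((a[k] - b[k]) ** 2 for k in range(3))
```

Distances between vertices are the same before and after the transformation, which is what "same shape" means here; no visual volume crossing computation is repeated for the interpolated time.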

The distinction between inside and outside the occlusion region in the interpolated model MD'(0.1), and the drawing by the drawing unit 14 based on it, may be obtained as follows. For explanation, let MD(0)_[visible] denote the visible part of the model MD(0), obtained at high density outside the occlusion region, and let MD(0)_[shielded] denote the shielded part, obtained at low density inside the occlusion region. Likewise, let MD'(0.1)_[visible] and MD'(0.1)_[shielded] denote the visible and shielded parts of the interpolated model MD'(0.1) after the distinction is made. Further, let the occlusion surface OC(0) denote the surface region in space determined by the background depth image that was transmitted from the depth acquisition unit 122 when the model MD(0) at time t=0 seconds was obtained by the processing of the first embodiment (which corresponds to preprocessing in this third embodiment). In other words, the occlusion surface OC(0) is the surface in the model space that passes through the points (X,Y,Z) lying, on the straight line of three-dimensional back projection from the virtual camera position through each position (x,y) of the depth image, at the depth d(x,y) from the virtual camera position.
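The back projection that places a point of the occlusion surface at depth d(x,y) along the ray through pixel (x,y) can be sketched as follows. The pinhole parameters `f`, `cx`, `cy` are hypothetical, and treating d as the Euclidean distance from the camera position (rather than the Z component) is an assumption made here; the result is in virtual camera coordinates.

```python
import math


def back_project(x, y, d, f, cx, cy):
    """Return the camera-frame point at distance d along the ray through pixel (x, y)."""
    ray = ((x - cx) / f, (y - cy) / f, 1.0)        # ray direction for a pinhole camera
    norm = math.sqrt(sum(c * c for c in ray))
    return tuple(d * c / norm for c in ray)        # scale the unit ray to length d


center = back_project(50, 50, 2.0, f=100.0, cx=50.0, cy=50.0)
diagonal = back_project(150, 50, math.sqrt(2.0), f=100.0, cx=50.0, cy=50.0)
```

Applying this to every pixel of the depth image gives the discrete set of points through which the occlusion surface passes.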

This occlusion surface may be obtained as a surface passing through the discrete positions (X,Y,Z) obtained by three-dimensionally back-projecting the discrete positions (x,y) of the depth image. For example, it may be obtained by surface fitting, or as polygons whose vertices are the discrete positions (X,Y,Z). After the surface is obtained as polygons or the like, each surface element (individual polygon or the like) may be tested as follows: if the angle between the direction of the straight line drawn from the virtual camera position to the surface element and the direction of the normal of the surface element is judged by a threshold to be close to a right angle, that surface element may be excluded from the occlusion surface. The virtual camera position used here may be the position at the time the model was obtained, such as t=0, or the position at the time being interpolated, such as t=0.1,0.2,.... (Such a surface element may be excluded because the surface of the corresponding real physical object may not actually exist there.)
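The exclusion test for surface elements seen nearly edge-on can be sketched with a cosine threshold. The threshold value `min_abs_cos`, the use of the element's center as its position, and the function name are assumptions made for illustration; the text only states that a threshold decides whether the angle is close to a right angle.

```python
import math


def is_grazing(cam_pos, face_center, face_normal, min_abs_cos=0.2):
    """True if the viewing line and the face normal are close to perpendicular."""
    view = [face_center[k] - cam_pos[k] for k in range(3)]
    dot = sum(view[k] * face_normal[k] for k in range(3))
    n_view = math.sqrt(sum(c * c for c in view))
    n_norm = math.sqrt(sum(c * c for c in face_normal))
    return abs(dot) / (n_view * n_norm) < min_abs_cos   # |cos| near 0 means near 90 degrees


cam = (0.0, 0.0, 0.0)
facing = is_grazing(cam, (0.0, 0.0, 5.0), (0.0, 0.0, -1.0))   # face turned toward the camera
edge_on = is_grazing(cam, (0.0, 0.0, 5.0), (1.0, 0.0, 0.0))   # face seen edge-on
```

Elements for which the function returns True would be dropped before the occlusion surface is used. Such elements typically bridge a depth discontinuity between neighboring pixels rather than lie on a real object surface.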

Here, the occlusion surface OC(0) is assumed to be stationary in the world coordinates on the shooting side (and therefore also stationary in the virtual space used for drawing), as with the table illustrated as the physical object POB in the schematic example of FIG. 2 and elsewhere. That is, the relationship between the occlusion surface OC(0) at time t=0 seconds and the occlusion surface OC(0.1) at time t=0.1 seconds is assumed to be "OC(0.1)=T_[0→0.1]・OC(0)", in the same way as the relationship between the model MD(0) and the model MD'(0.1). (In other words, the 3D model generated at time t=0 seconds and the occlusion surface obtained then are assumed to remain stationary at the same position in the model space at time t=0.1, with only the virtual viewpoint in the model space moving.)

Under this assumption, the visible part MD'(0.1)_[visible] and the shielded part MD'(0.1)_[shielded] of the interpolated model MD'(0.1) can be distinguished by whether the stationary occlusion surface OC(0.1) shields them as seen from the virtual camera position at time t=0.1 seconds. That is, when each polygon constituting the interpolated model MD'(0.1) is projected toward the virtual camera position at time t=0.1 seconds, it is judged to belong to the shielded part MD'(0.1)_[shielded] if the projection passes through the occlusion surface OC(0.1), and to the visible part MD'(0.1)_[visible] if it does not. The same technique as in step S13 of FIG. 7 may be used for this judgment: using the depth information seen from the virtual camera position at time t=0.1 seconds (the occlusion surface stationary at its position at time t=0 seconds), the model MD'(0.1) is judged to be the shielded part MD'(0.1)_[shielded] where it is shielded by this depth information and the visible part MD'(0.1)_[visible] where it is not.
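The visible/shielded split can be sketched as a per-point depth comparison. This is an illustration only: each polygon is represented by a single point in camera coordinates at t=0.1, the occlusion surface is assumed to have already been converted into a depth map keyed by pixel (missing pixels meaning no occluder), and the pinhole parameters are hypothetical.

```python
import math


def split_visible_shielded(points_cam, depth_image, f=100.0, cx=50.0, cy=50.0):
    """Split camera-frame points (Z > 0) into visible and shielded lists."""
    visible, shielded = [], []
    for X, Y, Z in points_cam:
        pixel = (int(round(f * X / Z + cx)), int(round(f * Y / Z + cy)))
        dist = math.sqrt(X * X + Y * Y + Z * Z)
        if dist > depth_image.get(pixel, math.inf):   # farther than the occluder
            shielded.append((X, Y, Z))
        else:
            visible.append((X, Y, Z))
    return visible, shielded


depth = {(50, 50): 3.0}     # occluder at distance 3.0 in front of the image center
vis, shd = split_visible_shielded([(0, 0, 2), (0, 0, 4), (1, 0, 2)], depth)
```

The point behind the occluder lands in the shielded list; the other two, one in front of the occluder and one on a pixel with no occluder, are visible.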

As another technique, if depth images are also obtained in real time at times t=0.1,0.2,..., the distinction between the visible part MD'(0.1)_[visible] and the shielded part MD'(0.1)_[shielded] may be obtained without using the occlusion surface OC(0.1) or the like as a 3D model as above, by applying the same processing as step S13 of FIG. 7 to each polygon constituting the interpolated model MD'(0.1) with reference to the depth image at time t=0.1.

当該求めた可視部分MD'(0.1)_[可視]に対する描画部14による描画は、第一実施形態と同様でよい。図１２は、以上の説明例に対応する時刻t=0秒の描画情報G(0)と時刻t=0.1秒での描画情報G(0.1)との模式例を示す図である。描画情報G(0)及びG(0.1)はそれぞれ、グレー地で示す可視部分MD(0)_[可視]及びMD'(0.1)_[可視]を描画したものである。その他の白地で示す遮蔽部分やオクル―ジョン表面は描画されないが、以上の説明例の模式例として示されている。 The drawing by the drawing unit 14 with respect to the obtained visible portion MD'(0.1) _[visible] may be the same as in the first embodiment. FIG. 12 is a diagram showing a schematic example of drawing information G (0) at time t = 0 seconds and drawing information G (0.1) at time t = 0.1 seconds corresponding to the above description example. The drawing information G (0) and G (0.1) are the drawing of the visible part MD (0) _[visible] and MD'(0.1) _[visible] shown in gray, respectively. Other shielding parts and occlusion surfaces shown on a white background are not drawn, but are shown as schematic examples of the above explanatory examples.

図１２の例にて、撮影環境PE側で描画されるオブジェクトOB及び視聴環境WE側でオクル―ジョン領域を発生させる物理オブジェクトPOBは図２の模式例と同様にそれぞれ人物及びテーブルである。時刻t=0秒の描画情報G(0)では、視聴環境WE側のユーザがこれらを正面から見た状態として描画されており、時刻t=0.1秒の描画情報G(0.1)では、これらをやや上方側から見込んだ状態として描画されている。（すなわち、仮想カメラは正面からやや上方側へと移動している。） In the example of FIG. 12, the object OB drawn on the shooting environment PE side and the physical object POB that generates the occlusion region on the viewing environment WE side are a person and a table, respectively, as in the schematic example of FIG. In the drawing information G (0) at time t = 0 seconds, the user on the viewing environment WE side is drawn as if they were viewed from the front, and in the drawing information G (0.1) at time t = 0.1 seconds, these are drawn. It is drawn as if it was seen from slightly above. (That is, the virtual camera is moving slightly upward from the front.)

<Fourth Embodiment>
The fourth embodiment is a modification of the third embodiment. In the third embodiment, the visual volume crossing method is applied at low density to the entire occlusion region to obtain the voxel set constituting the 3D model (the second voxel set). In the fourth embodiment, the visual volume crossing method is applied at low density to only a part of the occlusion region to obtain the voxel set constituting the 3D model (referred to as the third voxel set).

That is, the second voxel set of the third embodiment and the third voxel set of the fourth embodiment satisfy the relation "second voxel set ⊃ third voxel set". Because the part of the occlusion region subject to the low-density visual volume crossing method is narrower, faster computation can be expected in the fourth embodiment.
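The saving can be made concrete by counting the voxels that each embodiment would submit to the visual volume crossing method. The sketch below is illustrative only. It assumes that "low density" means subsampling the far-side voxels with a fixed stride (the text does not specify how the lower density is realized), and the predicates `is_near` and `in_low_density_target` are hypothetical stand-ins for the depth comparison and for the choice of target region.

```python
from itertools import product
from typing import Callable, Iterator, Tuple

Voxel = Tuple[int, int, int]


def voxels_to_carve(
    size: int,
    is_near: Callable[[Voxel], bool],
    in_low_density_target: Callable[[Voxel], bool],
    stride: int = 2,
) -> Iterator[Voxel]:
    """Yield the voxels of a size^3 grid that would be tested by silhouette carving."""
    for v in product(range(size), repeat=3):
        if is_near(v):
            # In front of the depth surface: tested at full (high) density.
            yield v
        elif all(c % stride == 0 for c in v) and in_low_density_target(v):
            # Behind the depth surface: tested only on a coarser lattice,
            # and only inside the chosen target region.
            yield v


# Toy setup: the depth surface is the plane z = 2 of a 4x4x4 grid.
near = lambda v: v[2] < 2
third = list(voxels_to_carve(4, near, lambda v: True))        # whole occlusion region
fourth = list(voxels_to_carve(4, near, lambda v: v[0] == 0))  # only part of it
print(len(third), len(fourth))  # 36 34
```

In this toy grid the far-side voxels selected for the fourth embodiment are a subset of those selected for the third, which mirrors the relation between the third and second voxel sets stated above.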

In the fourth embodiment, one way to select, from the entire occlusion region, the partial occlusion region to which the visual volume crossing method is applied at low density is to take the region that is determined by threshold judgment to be close to the region determined not to be an occlusion region (referred to as the "threshold proximity region"). The threshold judgment used to determine the threshold proximity region may be relaxed so that the threshold proximity region becomes wider as the temporal variation of the position and orientation of the virtual viewpoint corresponding to the user viewpoint, obtained by the position/orientation acquisition unit 211, becomes larger.

In effect, the threshold proximity region adds a buffer region to the region that is actually free of occlusion, thereby extending the region treated as having no occlusion (which then also includes regions that are in fact occluded).
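One possible way to compute such a buffer is sketched below. For brevity the sketch works on a 2D grid of occlusion flags rather than on the 3D voxel space, measures closeness as a Chebyshev distance in cells, and widens the radius linearly with the viewpoint motion. The function names and the constants `base`, `gain`, and `max_radius` are illustrative assumptions, not values from the text.

```python
from typing import List


def proximity_radius(motion: float, base: int = 1, gain: float = 2.0,
                     max_radius: int = 8) -> int:
    """Relax the proximity threshold as the viewpoint motion grows (illustrative constants)."""
    return min(max_radius, base + int(gain * motion))


def threshold_proximity_region(occluded: List[List[bool]], radius: int) -> List[List[bool]]:
    """Mark occluded cells lying within `radius` cells of a non-occluded cell."""
    h = len(occluded)
    w = len(occluded[0]) if h else 0
    region = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not occluded[y][x]:
                continue  # only occluded cells can join the buffer
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                if any(not occluded[yy][xx] for xx in range(x0, x1)):
                    region[y][x] = True
                    break
    return region


row = [[False, True, True, True, True]]
print(threshold_proximity_region(row, proximity_radius(0.0)))
# [[False, True, False, False, False]]
print(threshold_proximity_region(row, proximity_radius(1.0)))
# [[False, True, True, True, False]]
```

With no motion only the occluded cell adjacent to the non-occluded cell joins the buffer; with larger motion the buffer reaches further into the occlusion region.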

<Fifth Embodiment>
The fifth embodiment is a modification of the third embodiment. Following the same idea as the relaxation of the threshold judgment described above, it distinguishes two cases: when the temporal variation of the position and orientation of the virtual viewpoint corresponding to the user viewpoint (the movement of the virtual viewpoint), obtained by the position/orientation acquisition unit 211, is determined by threshold judgment to be large, the third embodiment is applied; otherwise, the first embodiment is applied.

That is, when the movement of the virtual viewpoint corresponding to the user viewpoint is determined to be large, the third embodiment is applied, so that the 3D model is obtained at high density outside the occlusion region and also at low density inside the occlusion region. When the movement is not determined to be large, the first embodiment is applied, and the 3D model is obtained only at high density outside the occlusion region.

<Sixth Embodiment>
The sixth embodiment is a modification of the fourth embodiment. Following the same idea as the relaxation of the threshold judgment described above, it distinguishes two cases: when the temporal variation of the position and orientation of the virtual viewpoint corresponding to the user viewpoint (the movement of the virtual viewpoint), obtained by the position/orientation acquisition unit 211, is determined by threshold judgment to be large, the fourth embodiment is applied; otherwise, the first embodiment is applied.

That is, when the movement of the virtual viewpoint corresponding to the user viewpoint is determined to be large, the fourth embodiment is applied, so that the 3D model is obtained at high density outside the occlusion region and also at low density in a part of the occlusion region. When the movement is not determined to be large, the first embodiment is applied, and the 3D model is obtained only at high density outside the occlusion region.
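The case distinction of the fifth and sixth embodiments can be sketched as a small mode selector. The motion measure below (translation distance plus weighted absolute Euler-angle differences, divided by the elapsed time) is only one plausible choice and is an assumption, as are the names `Mode`, `viewpoint_motion`, and `select_mode`; the text only requires some threshold judgment on the temporal variation of the position and orientation.

```python
import math
from enum import Enum
from typing import Sequence, Tuple

# Hypothetical pose: (position xyz, orientation as Euler angles in radians).
Pose = Tuple[Sequence[float], Sequence[float]]


class Mode(Enum):
    NEAR_ONLY = "first embodiment"                    # high density outside occlusion only
    NEAR_PLUS_ALL_OCCLUDED = "third embodiment"       # plus low density in whole occlusion region
    NEAR_PLUS_PARTIAL_OCCLUDED = "fourth embodiment"  # plus low density in part of it


def viewpoint_motion(prev: Pose, curr: Pose, dt: float, rot_weight: float = 1.0) -> float:
    """Scalar motion measure per unit time (angle wrap-around is ignored for brevity)."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    translation = math.dist(prev[0], curr[0])
    rotation = sum(abs(a - b) for a, b in zip(prev[1], curr[1]))
    return (translation + rot_weight * rotation) / dt


def select_mode(motion: float, threshold: float, partial_occlusion: bool) -> Mode:
    """partial_occlusion=False gives the fifth embodiment, True the sixth."""
    if motion > threshold:
        return (Mode.NEAR_PLUS_PARTIAL_OCCLUDED if partial_occlusion
                else Mode.NEAR_PLUS_ALL_OCCLUDED)
    return Mode.NEAR_ONLY
```

For example, a viewpoint that is nearly still selects `Mode.NEAR_ONLY` in both variants, while a fast-moving viewpoint selects the third or fourth embodiment depending on `partial_occlusion`.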

<Hardware Configuration>
FIG. 13 is a diagram showing an example of the hardware configuration of a general computer device 70. Each of the image processing device 10 and the terminal device 20 can be realized as one or more computer devices 70 having such a configuration. When the image processing device 10 or the terminal device 20 is realized by two or more computer devices 70, the information necessary for processing may be exchanged via a network. The computer device 70 includes: a CPU (central processing unit) 71 that executes predetermined instructions; a GPU (graphics processing unit) 72 serving as a dedicated processor that executes some or all of the instructions of the CPU 71 in place of, or in cooperation with, the CPU 71; a RAM 73 serving as a main storage device that provides a work area for the CPU 71; a ROM 74 serving as an auxiliary storage device; a GPU memory 78 that provides memory space for the GPU 72; a communication interface 75; a display 76; an input interface 77 that accepts user input via a mouse, keyboard, touch panel, or the like; a sensor 78; a camera 79; and a bus BS for exchanging data among these components.

Each unit of the image processing device 10 and the terminal device 20 can be realized by the CPU 71 and/or the GPU 72 reading, from the ROM 74, a predetermined program corresponding to the function of that unit and executing it. Both the CPU 71 and the GPU 72 are types of arithmetic unit (processor). When display-related processing is performed, the display 76 also operates in conjunction; when communication-related processing for data transmission and reception is performed, the communication interface 75 also operates in conjunction. The sensor 78 can be used as one or more types of dedicated sensor when the terminal device 20 acquires environmental information with a dedicated sensor. The display unit 23 of the terminal device 20 can be realized by the display 76 (optical see-through or video see-through). The photographing unit 11 and the display-side photographing unit 22 can be realized by the camera 79.

100 ... Image processing system
10 ... Image processing device, 11 ... Photographing unit, 12 ... Extraction unit, 13 ... Generation unit, 14 ... Drawing unit
20 ... Terminal device, 21 ... Acquisition unit, 22 ... Display-side photographing unit, 23 ... Display unit

Claims

An image processing device comprising:
an extraction unit that extracts, as a mask image, the region of a photographed object from the image of each viewpoint of a multi-viewpoint image; and
a generation unit that applies the visual volume crossing method to the mask images and generates a 3D model of the object by determining whether each voxel of a predetermined voxel set belongs to the 3D model,
wherein, before the application, the generation unit arranges depth information acquired at a user viewpoint in the voxel space with reference to a virtual camera viewpoint to determine the spatial position given by the depth information, determines whether each voxel is on the near side or the far side of that spatial position as seen from the virtual camera viewpoint, and applies the visual volume crossing method only to voxels determined to be on the near side to determine whether they belong to the 3D model.

The image processing device according to claim 1, further comprising a drawing unit that projects the generated 3D model onto an image plane with reference to the virtual camera viewpoint and draws it using the texture of the multi-viewpoint image.

The image processing device according to claim 1, wherein a light source at a predetermined position is set in the voxel space, and
the generation unit determines, for each voxel of the predetermined voxel set in a predetermined order, whether the voxel is on the near side or the far side, and determines, for a voxel determined to be on the near side, whether it belongs to the 3D model,
further determines, for a voxel determined to be on the far side, whether it is shielded from the light source by a voxel already determined to belong to the 3D model through the determination in the predetermined order, and,
when the voxel is determined not to be shielded, determines whether it belongs to the 3D model as a target of the visual volume crossing method.

The image processing device according to claim 3, further comprising a drawing unit that projects, of the generated 3D model, only the part corresponding to voxels determined to be on the near side onto an image plane with reference to the virtual camera viewpoint and draws it using the texture of the multi-viewpoint image,
and that draws the shadow cast by the 3D model under the light source on the image plane using the entire generated 3D model.

The image processing device according to claim 1, wherein the generation unit applies the visual volume crossing method at a first voxel density to voxels determined to be on the near side, and further applies the visual volume crossing method at a second voxel density, lower than the first voxel density, to voxels determined to be on the far side.

The image processing device according to claim 5, wherein the generation unit applies the visual volume crossing method at the second voxel density, lower than the first voxel density, only to those voxels determined to be on the far side that are also determined to be close to a voxel determined to be on the near side.

The image processing device according to claim 5 or 6, wherein the generation unit generates the 3D model at a first time by applying the visual volume crossing method at the first voxel density and the second voxel density,
the device further comprising a drawing unit that projects the generated 3D model onto an image plane with reference to the virtual camera viewpoint and draws it using the texture of the multi-viewpoint image,
wherein, at the first time, the drawing unit draws only the portion of the 3D model generated at the first voxel density, and,
at a second time after the first time, the drawing unit arranges the depth information in the voxel space with reference to the virtual camera viewpoint at the second time to determine the spatial position given by the depth information, determines whether each voxel is on the near side or the far side of that spatial position as seen from the virtual camera viewpoint, and draws only the part of the 3D model corresponding to voxels determined to be on the near side.

The image processing device according to claim 1, wherein the temporal change in the position and orientation of the user viewpoint is acquired,
when the temporal change is determined by threshold judgment to be large,
the generation unit applies the visual volume crossing method at a first voxel density to voxels determined to be on the near side, and further applies the visual volume crossing method at a second voxel density, lower than the first voxel density, to all or some of the voxels determined to be on the far side, and,
when the temporal change is not determined by threshold judgment to be large,
the generation unit applies the visual volume crossing method only to voxels determined to be on the near side to determine whether they belong to the 3D model.

An image processing method comprising:
an extraction stage of extracting, as a mask image, the region of a photographed object from the image of each viewpoint of a multi-viewpoint image; and
a generation stage of applying the visual volume crossing method to the mask images and generating a 3D model of the object by determining whether each voxel of a predetermined voxel set belongs to the 3D model,
wherein, in the generation stage, before the application, depth information acquired at a user viewpoint is arranged in the voxel space with reference to a virtual camera viewpoint to determine the spatial position given by the depth information, it is determined whether each voxel is on the near side or the far side of that spatial position as seen from the virtual camera viewpoint, and the visual volume crossing method is applied only to voxels determined to be on the near side to determine whether they belong to the 3D model.

An image processing program causing a computer to function as an image processing device comprising:
an extraction unit that extracts, as a mask image, the region of a photographed object from the image of each viewpoint of a multi-viewpoint image; and
a generation unit that applies the visual volume crossing method to the mask images and generates a 3D model of the object by determining whether each voxel of a predetermined voxel set belongs to the 3D model,
wherein, before the application, the generation unit arranges depth information acquired at a user viewpoint in the voxel space with reference to a virtual camera viewpoint to determine the spatial position given by the depth information, determines whether each voxel is on the near side or the far side of that spatial position as seen from the virtual camera viewpoint, and applies the visual volume crossing method only to voxels determined to be on the near side to determine whether they belong to the 3D model.