JP2021068272A

JP2021068272A - Image processing system, image processing method and program

Info

Publication number: JP2021068272A
Application number: JP2019194285A
Authority: JP
Inventors: 次郎中島; Jiro Nakajima; 泰輔稲村; Taisuke Inamura; 将亮伊藤; Shosuke Ito
Original assignee: Toppan Printing Co Ltd
Current assignee: Toppan Inc
Priority date: 2019-10-25
Filing date: 2019-10-25
Publication date: 2021-04-30
Anticipated expiration: 2039-10-25

Abstract

To provide an image processing system that estimates information about a light source in a space of interest from a captured image of the real space of interest and a three-dimensional shape model of that space, without having to prepare an object of known shape and a reference image of that object captured under a known light source, and that correctly reproduces the incidence of light at any point of the three-dimensional shape model.
SOLUTION: An image processing system comprises: a shape information estimation unit that estimates space shape information including at least a three-dimensional shape of the space of interest from one or more captured images of the space of interest; and a light source information estimation unit that estimates light source information including at least the position and intensity of a light source in the space of interest from the captured images and the space shape information.
SELECTED DRAWING: Figure 1

Description

The present invention relates to an image processing system, an image processing method, and a program.

When remodeling a building or adding fixtures, it may be desirable to check the finished appearance in advance, before actually remodeling or placing the fixtures.
For example, when adding furniture, walls, or equipment, or when considering where to place added furniture, there is a demand to examine visually, using simulated images, what impression the placed items will give, that is, whether they harmonize with the room environment.
If the light sources of the room are not accurately reflected in the virtual space, the impression given by the simulated appearance may differ significantly from the impression given by the actual changed layout.

For this reason, the three-dimensional shape of the target space (the space to be remodeled) is modeled as a virtual space from captured images of that space, fixtures are placed in the virtual space, and their appearance in the room is simulated using the virtual space.
To bring the appearance of articles in the virtual space closer to their appearance in the target space, the light source information of the light sources in the room serving as the target space is estimated accurately, and virtual light sources are generated in the virtual space based on this information (see, for example, Patent Document 1).

Patent Document 1: Japanese Unexamined Patent Publication No. 2018-036884

However, in Patent Document 1, when the target space is imaged, an object of known shape, for which a reference image captured under a known light source has been prepared in advance, is placed in the target space, and the light source information is estimated by comparing the image of the object in the captured image with the reference image.
It is therefore necessary to prepare an object of known shape and to capture a reference image of it under a known light source, which takes time and effort before the target space can be imaged.

Moreover, this method can estimate only the light emitted by the light source; the three-dimensional shape of the space into which the light is emitted remains unknown.
It is therefore unknown whether light from the source is blocked or reflected, and the appearance of the target space cannot be simulated in the virtual space with high accuracy.

The present invention has been made in view of these circumstances, and provides an image processing system, an image processing method, and a program that estimate at least the position and intensity of a light source in a real target space from a captured image of that space and a three-dimensional shape model of that space, without the need to prepare an object of known shape and a reference image of it captured under a known light source.

The image processing system of the present invention comprises: a shape information estimation unit that estimates spatial shape information including at least the three-dimensional shape of a target space from one or more captured images of the target space; and a light source information estimation unit that estimates light source information including at least the position and intensity of a light source in the target space from the captured images and the spatial shape information.

In the image processing system of the present invention, the light source information estimation unit estimates reflectance information of the three-dimensional shape in the target space from the captured image and the spatial shape information, and uses the reflectance information to estimate the light source information.

In the image processing system of the present invention, the light source information estimation unit estimates the light source information, using the estimated reflectance information, from a predetermined equation expressing the relationship between the pixel values of the captured image and each of the reflectance information, the light source information, and the spatial shape information.
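The patent does not state the "predetermined equation". As an illustration only, the sketch below assumes a Lambertian (diffuse) surface lit by point lights with inverse-square falloff, so that each pixel value is the reflectance (albedo) times a sum of per-light shading terms. If the light positions are already known (for example, from detected light-source regions) and the reflectance, 3D points, and normals come from the spatial shape information, the intensities follow from a linear least-squares fit. All function names are hypothetical.

```python
import numpy as np


def shading_matrix(points, normals, light_positions):
    """A[i, j] = max(0, n_i . l_ij) / d_ij**2 for surface point i and point light j."""
    # Vectors from every surface point to every light, shape (N, M, 3).
    diff = light_positions[None, :, :] - points[:, None, :]
    dist2 = np.sum(diff ** 2, axis=2)               # squared distances, (N, M)
    dirs = diff / np.sqrt(dist2)[..., None]         # unit light directions
    cos = np.einsum("nk,nmk->nm", normals, dirs)    # n . l per point/light pair
    return np.clip(cos, 0.0, None) / dist2          # back-facing lights contribute 0


def render(albedo, points, normals, light_positions, intensities):
    """Assumed Lambertian pixel model: I_i = rho_i * sum_j A[i, j] * E_j."""
    return albedo * (shading_matrix(points, normals, light_positions) @ intensities)


def estimate_intensities(pixels, albedo, points, normals, light_positions):
    """Solve the linear model above for the light intensities E by least squares."""
    A = albedo[:, None] * shading_matrix(points, normals, light_positions)
    intensities, *_ = np.linalg.lstsq(A, pixels, rcond=None)
    return intensities
```

Rendering synthetic floor pixels with `render` and passing them back to `estimate_intensities` recovers the intensities used to render them, provided the shading matrix has full column rank (enough surface points that each light illuminates differently). Real captured pixels add noise, specular highlights, and interreflection, which this model ignores.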

The image processing system of the present invention further comprises an image synthesis unit that, from the captured image, the spatial shape information, and the light source information, generates an observation image of a virtual space in which a shape model of a predetermined object is placed at a predetermined position in the captured image.

The image processing system of the present invention further comprises an image synthesis unit that, from the captured image, the spatial shape information, and the light source information, places the shape model of the predetermined object at a predetermined position in the captured image and generates an observation image of the virtual space that reflects the effects of placing the shape model.
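One effect of placing a shape model is that it can block light that previously reached nearby surfaces. The patent does not describe how such effects are computed; the minimal sketch below assumes the placed object is approximated by an axis-aligned bounding box and tests, with the standard slab method, whether the segment from a surface point to a point light passes through that box. A renderer would drop that light's contribution at the surface point when the test returns `True`. The function names and the bounding-box approximation are hypothetical.

```python
import numpy as np


def segment_hits_box(p, q, box_min, box_max, eps=1e-12):
    """Slab test: does the segment from p to q pass through the axis-aligned box?"""
    d = q - p
    t0, t1 = 0.0, 1.0  # parametric range of the segment still inside all slabs
    for k in range(3):
        if abs(d[k]) < eps:
            # Segment parallel to this slab: it must already lie between the planes.
            if p[k] < box_min[k] or p[k] > box_max[k]:
                return False
        else:
            ta = (box_min[k] - p[k]) / d[k]
            tb = (box_max[k] - p[k]) / d[k]
            t0 = max(t0, min(ta, tb))
            t1 = min(t1, max(ta, tb))
            if t0 > t1:
                return False
    return True


def in_shadow(point, light, box_min, box_max):
    """True if the (hypothetical) box-shaped placed object blocks the light at point."""
    as_arr = lambda v: np.asarray(v, dtype=float)
    return segment_hits_box(as_arr(point), as_arr(light), as_arr(box_min), as_arr(box_max))
```

For a light at (0, 0, 2.5) above a table top spanning (-0.5, -0.5, 0.7) to (0.5, 0.5, 0.8), the floor point directly below the table is shadowed, while the floor point (2, 0, 0) is not.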

In the image processing system of the present invention, the light source information estimation unit generates IBL (image-based lighting) information from the reflectance information of the three-dimensional shape, the spatial shape information, and the light source information.

In the image processing system of the present invention, the shape information estimation unit separately estimates the three-dimensional shapes of the structure portion and the object portion in the target space, and combines them to generate a three-dimensional shape model of the entire target space.

In the image processing system of the present invention, the shape information estimation unit estimates the three-dimensional shape of the target space and shape information of the light source, the latter including object recognition information in the target space.

The image processing system of the present invention further comprises an imaging condition acquisition unit that acquires the imaging conditions under which the captured image was captured.

The image processing method of the present invention includes: a shape information estimation step in which a shape information estimation unit estimates spatial shape information including at least the three-dimensional shape of a target space from one or more captured images of the target space; and a light source information estimation step in which a light source information estimation unit estimates light source information including at least the position and intensity of a light source in the target space from the captured images and the spatial shape information.

The program of the present invention causes a computer to function as: shape information estimation means for estimating spatial shape information including at least the three-dimensional shape of a target space from one or more captured images of the target space; and light source information estimation means for estimating light source information including at least the position and intensity of a light source in the target space from the captured images and the spatial shape information.

As described above, the present invention makes it possible to provide an image processing system, an image processing method, and a program that, without the need to prepare an object of known shape and a reference image of it captured under a known light source, estimate information about a light source in a real target space from a captured image of that space and a three-dimensional shape model of it, and accurately reproduce the incidence of light at any point of the three-dimensional shape model.

FIG. 1 is a block diagram showing a configuration example of an image processing system according to an embodiment of the present invention.
FIG. 2 is a diagram explaining the process of extracting the vanishing points and the boundaries of the ceiling, walls, and floor of a room from a captured image and obtaining a depth map.
FIG. 3 is a diagram showing the region division of objects in the target space by a machine learning model that performs semantic segmentation.
FIG. 4 is a diagram explaining the composition of a depth map generated by a machine learning model and a depth map generated under the Manhattan world hypothesis.
FIG. 5 is a diagram showing the three-dimensional shape of the room structure portion (ceiling, walls, and floor) obtained from the depth map generated under the Manhattan world hypothesis.
FIG. 6 is a diagram showing the three-dimensional shape obtained from the depth map of the object portion, extracted using the regions divided by semantic segmentation from the depth map obtained by the machine learning model in three-dimensional shape estimation method (1).
FIG. 7 is a diagram in which the light source information estimation unit 14 shows, in a grayscale image, the light source regions extracted by light source information estimation method (1).
FIG. 8 is a reflectance component image showing the reflectance information in the spatial shape information obtained by the light source information estimation unit 14.
FIG. 9 shows an example in which the light source information estimated by the light source information estimation unit 14 is placed in the three-dimensional shape of the spatial shape information.
FIG. 10 is a diagram showing an image of the three-dimensional virtual space generated by the image synthesis unit, viewed from the imaging direction of the captured image.
FIG. 11 is a diagram explaining control for additionally placing a CG object in the virtual space.
FIG. 12 is a diagram explaining control for additionally placing a CG object in the virtual space.
FIG. 13 is a flowchart showing an example of the operation of synthesizing a CG object with a virtual space generated from a selected captured image of the target space in the image processing system according to this embodiment.

A configuration example of the image processing system is described below with reference to the drawings.
FIG. 1 is a block diagram showing a configuration example of an image processing system according to an embodiment of the present invention. In FIG. 1, the image processing system 10 includes a data input/output unit 11, an imaging condition acquisition unit 12, a shape information estimation unit 13, a light source information estimation unit 14, an image synthesis unit 15, a display control unit 16, a display unit 17, a captured image storage unit 18, a spatial information storage unit 19, and a composite image storage unit 20.
The image processing system 10 is implemented by installing, on a personal computer, tablet terminal, smartphone, or the like, an application that performs image processing using the functional units described below.

The data input/output unit 11 receives, from an external device, captured images of the target space to be simulated, assigns captured image identification information to each, and writes them to the captured image storage unit 18.
The imaging condition acquisition unit 12 acquires the imaging conditions under which each captured image of the target space was captured, associates them with that image's identification information, and writes them to the captured image storage unit 18. The imaging conditions include the camera parameters of the imaging device, the sensor size, the real-scale distance from the imaging device to the three-dimensional shape of the target space, the number of pixels of the captured image, the positional relationship between the imaging positions, and, for video, the frame rate. Imaging conditions that can be obtained from the Exif (Exchangeable image file format) information of the captured image need not be entered by the user. For a single image, the real-scale distance is calculated by placing a marker of known size in the target space and using the marker's appearance in the captured image.
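The marker-based distance can be illustrated with the pinhole camera model. This is a sketch under stated assumptions, not the patent's stated procedure: the focal length and sensor width are taken from Exif or user input, the marker faces the camera roughly head-on, and lens distortion is ignored. The focal length is converted to pixels, and the distance is the focal length in pixels times the marker's real size divided by its size in the image. Function names are hypothetical.

```python
def focal_length_px(focal_mm, sensor_width_mm, image_width_px):
    """Convert a focal length in millimetres (e.g. from Exif) to pixels."""
    return focal_mm * image_width_px / sensor_width_mm


def distance_from_marker(focal_mm, sensor_width_mm, image_width_px,
                         marker_size_m, marker_size_px):
    """Pinhole estimate of the camera-to-marker distance: Z = f_px * S / s_px.

    Assumes a roughly fronto-parallel marker and negligible lens distortion.
    """
    f_px = focal_length_px(focal_mm, sensor_width_mm, image_width_px)
    return f_px * marker_size_m / marker_size_px
```

With a 4.0 mm lens on a 6.4 mm-wide sensor and a 4000-pixel-wide image, the focal length is 2500 pixels; a 0.2 m marker spanning 250 pixels then lies about 2.0 m from the camera.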

The shape information estimation unit 13 estimates the three-dimensional shape of the imaged target space from the captured images stored in the captured image storage unit 18, and generates spatial shape information including a three-dimensional shape model (virtual space).
The shape information estimation unit 13 then writes the generated spatial shape information to the spatial information storage unit 19. The spatial shape information includes at least the data of the three-dimensional shape model of the target space and its normal vectors.
In this embodiment, the shape information estimation unit 13 generates the three-dimensional shape model using the three three-dimensional shape estimation methods described below.

Three-dimensional shape estimation method (1):
The shape information estimation unit 13 uses a machine learning model to estimate a depth map directly from a single captured image (RGB image) of the target space (for example, using the method described in Ibraheem Alhashim and Peter Wonka, "High Quality Monocular Depth Estimation via Transfer Learning," CoRR, vol. abs/1901.03861, 2019).
Based on the obtained depth map, the shape information estimation unit 13 calculates a three-dimensional point and a normal for each pixel of the captured image, and can mesh the resulting three-dimensional point cloud.
The shape information estimation unit 13 then writes the information of the virtual space, that is, the meshed three-dimensional shape model, to the captured image storage unit 18 as spatial shape information associated with the captured image.
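Computing a 3D point and a normal for every pixel of a depth map can be sketched as follows. This assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy` (obtainable from the imaging conditions) and a depth map holding distance along the optical axis; it is one common way to do this, not necessarily the patent's. Normals are estimated from finite differences between neighbouring points, with the cross-product order chosen so that surfaces facing the camera get normals pointing back toward it.

```python
import numpy as np


def backproject(depth, fx, fy, cx, cy):
    """Pinhole back-projection of an (H, W) depth map to an (H, W, 3) point map."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column / row indices
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)


def estimate_normals(points):
    """Per-pixel unit normals from finite differences of neighbouring 3D points."""
    du = np.gradient(points, axis=1)  # change along image columns
    dv = np.gradient(points, axis=0)  # change along image rows
    n = np.cross(dv, du)              # order chosen so camera-facing normals get -z
    return n / (np.linalg.norm(n, axis=-1, keepdims=True) + 1e-12)
```

For a wall at constant depth, every normal comes out as (0, 0, -1), that is, pointing back along the viewing axis toward the camera.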

When there are multiple captured images of the same target space taken from different imaging positions, the shape information estimation unit 13 exploits the fact that they show the same space and estimates the positional relationship between the imaging positions with an SfM (structure from motion) algorithm.
The shape information estimation unit 13 then integrates the depth maps obtained from the individual captured images to generate a more accurate depth map of the target space.
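Once SfM has estimated each camera's pose, the points back-projected from each view's depth map can be expressed in a common frame before integration. The sketch below shows only that coordinate transform followed by concatenation. It assumes camera-to-world poses (R, t) and depth maps at a consistent scale (SfM poses are determined only up to scale), and it does not perform the more careful fusion (outlier rejection, averaging of overlapping points) that a real system would likely need. Function names are hypothetical.

```python
import numpy as np


def to_world(points_cam, R, t):
    """Map (N, 3) camera-frame points to the world frame: X_w = R @ X_c + t."""
    return points_cam @ R.T + t


def merge_views(clouds, poses):
    """Concatenate per-view point clouds after moving each into the shared world frame.

    clouds: list of (N_i, 3) arrays back-projected from each view's depth map
    poses:  list of camera-to-world (R, t) pairs, e.g. from SfM, at the depth maps' scale
    """
    return np.vstack([to_world(c, R, t) for c, (R, t) in zip(clouds, poses)])
```

A world point observed from two differently posed cameras maps back to the same world coordinates from both views, which is what allows the per-view depth maps to be combined.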

Three-dimensional shape estimation method (2):
Alternatively, when there are multiple captured images of the same target space taken from different imaging positions, the shape information estimation unit 13 may estimate the three-dimensional shape of the target space from these images with an MVS (multi-view stereo) algorithm, and generate a virtual space, that is, a three-dimensional shape model, as the spatial shape information.
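A full MVS pipeline (dense matching, depth-map fusion) is beyond a short example. The sketch below shows only one elementary step shared by multi-view reconstruction methods, linear (DLT) triangulation of a single point seen in two views, assuming the 3x4 projection matrices are known (for example, from SfM) and the pixel correspondence is given. It is not the specific MVS algorithm the patent has in mind, which is not named; the function name is hypothetical.

```python
import numpy as np


def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2: 3x4 camera projection matrices; x1, x2: matching pixel coordinates (u, v).
    """
    # Each view contributes two linear equations u * P[2] - P[0] = 0, v * P[2] - P[1] = 0.
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The homogeneous point is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]
```

With noise-free correspondences the recovered point matches the original exactly (up to floating-point error); with real matches, DLT gives an algebraic least-squares estimate that is usually refined further.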

Three-dimensional shape estimation method (3):
The shape information estimation unit 13 may also use an algorithm that obtains shape models separately for the room structure portion of the target space and for the object portion, that is, the other items placed in the room (room fixtures and the like).
As shown in FIG. 2, the shape information estimation unit 13 extracts the ceiling/wall and wall/floor boundaries from the captured image.

FIG. 2 is a diagram explaining the process of extracting the boundaries between the ceiling, walls, and floor of a room from a captured image and obtaining a depth map. FIG. 2A shows a captured image 100, in which the ceiling 200C, wall 200W, floor 200F, pillar 200P, table 200T, light source 200L, and so on of a room 200 in the target space appear.
FIG. 2B shows the boundary 200CW between the ceiling 200C and the wall 200W and the boundary 200WF between the wall 200W and the floor 200F. In FIG. 2B, no information on the pillar 200P, the table 200T, or the light source 200L is extracted.
FIG. 2C shows the depth map obtained from the captured image of FIG. 2A and the boundaries of FIG. 2B.

Using the Manhattan world hypothesis, the shape information estimation unit 13 then detects the ceiling, walls, and floor, corresponding to the three axes (x-, y-, and z-axes) of the three-dimensional space, from the captured image of FIG. 2A and the boundaries of FIG. 2B, and generates a depth map for each. One example of detecting the ceiling, walls, and floor directly from an RGB image with a machine learning model (for example, a CNN (convolutional neural network)) is Chuhang Zou, Alex Colburn, Qi Shan, and Derek Hoiem, "LayoutNet: Reconstructing the 3D Room Layout from a Single RGB Image," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 2051-2059.
The shape information estimation unit 13 also extracts the region of each object in the captured image with a machine learning model (for example, a CNN) that performs semantic segmentation.
FIG. 3 is a diagram showing the region division of objects in the target space by a machine learning model that performs semantic segmentation. In FIG. 3, the regions of the individual objects of the target space in the captured image are separated as segments.
Using this semantic segmentation model, the shape information estimation unit 13 separates the captured image 100 into image regions for the structure portion (ceiling 200C, wall 200W, floor 200F) and for the object portion (pillar 200P, light sources 200L_1 and 200L_2, table 200T, and so on).

In this way, the shape information estimation unit 13 recognizes the individual regions of the captured image of the room by dividing it into regions such as the room structure portion (ceiling, walls, floor) and the object portion (fixtures such as pillars, desks (tables), chairs, shelves, and other furniture).
The shape information estimation unit 13 also estimates a depth map directly from a single captured image (RGB image) of the target space, using the machine learning model described for three-dimensional shape estimation method (1).
From the image regions of the semantic segmentation result in FIG. 3, the shape information estimation unit 13 then extracts the regions of the pillar 200P and the table 200T as the object portion, that is, the regions remaining after the room structure portion is excluded.

FIG. 4 is a diagram explaining the composition of the depth map generated by the machine learning model and the depth map generated under the Manhattan world hypothesis.
FIG. 4A shows the depth maps of the pillar 200P and the table 200T, extracted from the depth map generated by the machine learning model of three-dimensional shape estimation method (1) using the regions of the pillar 200P and the table 200T (the object portion) in the semantic segmentation result.
FIG. 4B shows an example of a depth map of the target space obtained by scaling and integrating the depth map of the room structure portion generated under the Manhattan world hypothesis, shown in FIG. 2C, and the depth map of the object portion shown in FIG. 4A.

As shown in FIG. 4B, the shape information estimation unit 13 combines the depth map generated by the machine learning model with the depth map generated under the Manhattan world hypothesis.
The Manhattan world hypothesis assumes that the surfaces of most man-made structures are constrained to align with the three orthogonal axes of a Cartesian coordinate system, so it yields highly accurate depth maps for the room structure portion, such as the ceiling, walls, and floor.
However, it produces no depth map for the object portion of the target space, such as tables and chairs, because those objects do not satisfy this constraint.
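The Manhattan assumption can be illustrated on surface normals: if room surfaces align with three orthogonal axes, a normal close to the vertical axis belongs to the floor or ceiling and a near-horizontal normal to a wall, while normals fitting neither (typical of furniture) are left unlabelled. This sketch only illustrates the assumption; it is not the patent's detection procedure. It assumes inward-facing normals (floor normals point up), a known up vector, and a hypothetical angular tolerance `tol_deg`.

```python
import numpy as np


def manhattan_labels(normals, up=(0.0, 0.0, 1.0), tol_deg=20.0):
    """Label inward-facing normals as floor, ceiling, wall, or other (hypothetical helper)."""
    n = np.asarray(normals, dtype=float)
    n = n / np.linalg.norm(n, axis=1, keepdims=True)
    up = np.asarray(up, dtype=float)
    up = up / np.linalg.norm(up)
    cos_up = n @ up                                      # cosine of angle to the vertical
    near_vertical = np.cos(np.radians(tol_deg))
    near_horizontal = np.sin(np.radians(tol_deg))
    labels = np.full(len(n), "other", dtype=object)      # e.g. furniture surfaces
    labels[cos_up >= near_vertical] = "floor"            # normal points up
    labels[cos_up <= -near_vertical] = "ceiling"         # normal points down
    labels[np.abs(cos_up) <= near_horizontal] = "wall"   # normal lies near the horizontal plane
    return labels
```

A normal tilted 45 degrees from vertical matches none of the axis families and stays "other", mirroring why the hypothesis yields no depth for tables and chairs.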

On the other hand, the depth map generated by the machine learning model of three-dimensional shape estimation method (1) also covers the object portion of the target space.
However, that depth map by itself does not indicate which of its regions belong to the object portion.
The shape information estimation unit 13 therefore extracts, from the depth map obtained by machine learning, the regions of the object portion (200P, 200T) identified by the semantic segmentation shown in FIG. 3, and thereby obtains the depth map of the object regions shown in FIG. 4A. In this way, separate depth maps are obtained for the room structure portion and the object portion.
At this time, the shape information estimation unit 13 scales the object depth map extracted from the machine-learned depth map based on the values of the room structure portion of FIG. 3.
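The extraction and scaling step can be sketched as below. The patent states that the object depth is scaled based on the room structure but does not give the rule; this sketch assumes a single global scale, taken as the median ratio between the Manhattan-based structure depth and the machine-learned depth over pixels labelled as structure. The label IDs passed as `object_ids` and `structure_ids` are hypothetical segmentation classes.

```python
import numpy as np


def composite_depth(ml_depth, structure_depth, labels, object_ids, structure_ids):
    """Cut object depth out of an ML depth map, rescale it, and merge with structure depth.

    ml_depth:        depth from a monocular depth network (arbitrary scale)
    structure_depth: Manhattan-based room-structure depth, defined at every pixel
    labels:          per-pixel semantic segmentation class IDs
    """
    obj_mask = np.isin(labels, object_ids)
    struct_mask = np.isin(labels, structure_ids)
    # Assumed scaling rule: one global factor, the median depth ratio on visible structure.
    scale = float(np.median(structure_depth[struct_mask] / ml_depth[struct_mask]))
    scaled = scale * ml_depth
    object_depth = np.where(obj_mask, scaled, np.nan)       # object portion only
    combined = np.where(obj_mask, scaled, structure_depth)  # objects in front of structure
    return object_depth, combined, scale
```

Because the structure depth is defined behind objects as well, the function returns both an object-only depth map (NaN elsewhere) and a combined map in which object pixels override the structure; the median makes the scale less sensitive to a few badly estimated pixels than a mean would be.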

With three-dimensional shape estimation method (3), a depth map of the room structure portion, including parts occluded by the object portion, can be obtained with higher accuracy than the depth map obtained with three-dimensional shape estimation method (1).
The shape information estimation unit 13 then uses the depth maps of the room structure portion and the object portion to obtain, for each, a three-dimensional point cloud and the normal at each point, meshes the point clouds, and writes the result to the spatial information storage unit 19 as spatial shape information.
This makes it possible to store the three-dimensional shape of the room structure portion even where it is occluded by the object portion.
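Meshing a per-pixel point cloud can be done by connecting neighbouring pixels on the image grid. The sketch below is one simple, assumed approach, not necessarily the patent's: each 2x2 block of valid pixels (for example, finite positive depth) becomes two triangles, with vertex indices referring to `points.reshape(-1, 3)`. The function name is hypothetical.

```python
import numpy as np


def grid_mesh(valid):
    """Triangulate an H x W pixel grid into faces indexing points.reshape(-1, 3).

    Each 2x2 block whose four pixels are all valid yields two triangles.
    """
    h, w = valid.shape
    idx = np.arange(h * w).reshape(h, w)  # vertex index of each pixel
    faces = []
    for r in range(h - 1):
        for c in range(w - 1):
            if valid[r:r + 2, c:c + 2].all():
                top_left, top_right = idx[r, c], idx[r, c + 1]
                bottom_left, bottom_right = idx[r + 1, c], idx[r + 1, c + 1]
                faces.append((top_left, bottom_left, top_right))
                faces.append((top_right, bottom_left, bottom_right))
    return np.array(faces, dtype=np.int64).reshape(-1, 3)
```

Meshing the room structure portion and the object portion separately, as the text describes, also avoids stretching triangles across the depth jump between, for example, a table edge and the floor behind it.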

When several captured images show the same target space from different imaging positions, the shape information estimation unit 13 exploits this shared content. It estimates the relative positions of the imaging positions with a structure-from-motion (SfM) algorithm.
The shape information estimation unit 13 then integrates the depth maps obtained from each captured image, separately for the room structure portion and for the object portion. This produces more accurate depth maps of both portions in the target space.
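Once SfM has provided a pose for each imaging position, the per-image point clouds can be brought into a common frame before integration. The sketch below assumes camera-to-world poses (rotation `R`, translation `t`) at the same scale as the depth maps, and simply concatenates the transformed points. The patent does not state how the depth maps are fused. A practical system would typically also merge or filter overlapping points.

```python
import numpy as np


def to_world(points_cam, R, t):
    """Map (N, 3) camera-frame points to world coordinates: X_w = R @ X_c + t."""
    return points_cam @ R.T + t


def merge_views(views):
    """views: iterable of (points_cam, R, t); returns all points stacked in the world frame."""
    return np.vstack([to_world(p, R, t) for p, R, t in views])
```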

FIG. 5 shows the three-dimensional shape of the room structure portion (ceiling, walls, and floor) obtained from the depth map generated under the Manhattan World hypothesis.
FIG. 5 shows the three-dimensional shape of the room structure portion consisting of the ceiling 200C, the wall 200W, and the floor 200F of the room in the target space.

FIG. 6 shows a three-dimensional shape obtained in three-dimensional shape estimation method (1). The shape is derived from the object-portion depth map, which is cut out of the machine-learning depth map using the regions divided by semantic segmentation.
FIG. 6 shows the three-dimensional shapes of the pillar 200P and the table 200T, obtained from the object-portion depth map, in the virtual space 250 of the target space.

As described above, the shape information estimation unit 13 estimates the three-dimensional shape of the target space by, for example, three-dimensional shape estimation methods (1) to (3).
However, methods (1) to (3) are only examples of techniques for three-dimensional reconstruction from captured images. The shape information estimation unit 13 is not limited to a specific reconstruction technique and may use any commonly used method.

The depth maps obtained by methods (1) to (3) express only relative distances, so the actual distance from the imaging position is unknown. The actual distances in the depth map are therefore estimated by methods such as the following, and each object in the reconstructed target space is obtained at its actual size.
For example, the imaging conditions stored with a captured image in the captured image storage unit 18 may include the actual distance between a specific pixel of the imaging device and the three-dimensional shape of the target space. In that case, this actual distance can be used to obtain the distance of every pixel in the depth map.
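This sketch assumes that the relative depth differs from the true depth only by a global scale factor, in which case a single known distance fixes the scale. That assumption may not hold: some monocular depth models also have an offset or predict inverse depth. The function name is hypothetical, and `ref_pixel` is given as (row, column).

```python
import numpy as np


def metric_depth_from_reference(rel_depth, ref_pixel, ref_distance_m):
    """Rescale a relative depth map so that ref_pixel has the known metric distance."""
    row, col = ref_pixel
    rel = float(rel_depth[row, col])
    if rel <= 0.0:
        raise ValueError("reference pixel has no valid relative depth")
    return rel_depth * (ref_distance_m / rel)
```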

Alternatively, an object of known size may be placed in the scene and imaged, and each object in the target space may be scaled based on the image of that known-size object. In this case, the pixel region of the known object is first extracted by image analysis such as pattern recognition, semantic segmentation, or object recognition. The depth values of the pixels in the extracted region are then used to scale the depth map of the other regions to actual size.
In another configuration, the correspondence between the amount of image blur and the focal length of the imaging device is measured in advance by experiment. The distance is then estimated from the amount of blur in the captured image.
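For the known-size-object approach, one possible sketch uses the pinhole relation Z ≈ f·H/h. Here H is the object's real height, h its height in pixels, and f the focal length in pixels. The sketch compares that distance with the median relative depth inside the object's mask. It assumes the object is roughly fronto-parallel and that `focal_px` is known. It is an illustration, not the procedure specified in the source.

```python
import numpy as np


def scale_from_known_object(rel_depth, object_mask, real_height_m, focal_px):
    """Scale factor converting relative depth to meters from one object of known height."""
    rows = np.nonzero(object_mask.any(axis=1))[0]
    if rows.size == 0:
        raise ValueError("object mask is empty")
    pixel_height = rows.max() - rows.min() + 1
    metric_z = focal_px * real_height_m / pixel_height  # pinhole distance estimate
    rel_z = float(np.median(rel_depth[object_mask]))    # robust relative depth of the object
    return metric_z / rel_z
```

Multiplying the whole relative depth map by the returned factor brings it to metric scale.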

The light source information estimation unit 14 estimates light source information from the captured image of the target space and the spatial shape information of the target space (the three-dimensional shape of the virtual space). It writes the estimated light source information to the spatial information storage unit 19. This light source information includes at least the position of the light source in the target space, the intensity of the light it emits, and the color of the light (spectral information).
In the present embodiment, the light source information estimation unit 14 generates light source information by the three light source information estimation methods described below.

Light source information estimation method (1):
The light source information estimation unit 14 converts the captured image (an RGB image) into a grayscale image and extracts the region of pixels with the highest gray level.
The light source information estimation unit 14 then extracts as light source regions both this region and any region of pixels whose gray level is at least a predetermined ratio (for example, 90%) of the gray level of the pixels in that region.
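Light source information estimation method (1) amounts to a threshold relative to the brightest gray level. The sketch below accepts an RGB array in any numeric range. For the grayscale conversion it uses the ITU-R BT.601 luma weights, which is an assumption because the source does not specify the conversion. The 0.9 ratio matches the example in the text. Splitting the resulting mask into separate lights would also need connected-component labeling, which is omitted.

```python
import numpy as np


def extract_light_regions(rgb, ratio=0.9):
    """Return (gray, brightest_mask, light_mask) for an (H, W, 3) RGB array."""
    rgb = np.asarray(rgb, dtype=np.float64)
    # BT.601 luma weights: an assumed choice of grayscale conversion
    gray = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    peak = gray.max()
    brightest = gray == peak            # counterpart of region 301
    light_mask = gray >= ratio * peak   # regions 301 and 302 together
    return gray, brightest, light_mask
```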

FIG. 7 shows, on a grayscale image, the light source regions that the light source information estimation unit 14 extracts by method (1). In FIG. 7, the grayscale image 300 is a grayscale version of the captured image 100 (FIG. 2(a)), which is an RGB image.
Among the pixels of the grayscale image 300, the pixels of region 301 have the highest gray level. The pixels of region 302 have gray levels of at least 90% of those in region 301. The light source information estimation unit 14 therefore extracts both region 301 and region 302 as light source regions.

Light source information estimation method (2):
The light source information estimation unit 14 extracts light source regions from the captured image by light source information estimation method (1) described above.
The light source information estimation unit 14 then uses the shape information of the light source regions obtained by the shape information estimation unit 13. Based on it, the unit places a point light source at the three-dimensional location of each light source region in the virtual space represented by the spatial shape information.
The light source information estimation unit 14 also estimates the reflectance information of the target space (reflectance for each wavelength) from the captured image and the three-dimensional shape in the spatial shape information.

To estimate the reflectance information of the target space, the light source information estimation unit 14 uses, for example, an Intrinsic Image Problem algorithm based on intrinsic image decomposition. One example is the method described in Qifeng Chen and Vladlen Koltun, "A Simple Model for Intrinsic Image Decomposition with Depth Cues," The IEEE International Conference on Computer Vision (ICCV), 2013, pp. 241-248.
The light source information estimation unit 14 separates reflectance from shading with this Intrinsic Image Problem algorithm.
The algorithm assumes that "an image (captured image) I can be expressed as the product of a reflectance component R and a shading component S." On that basis, the light source information estimation unit 14 decomposes the captured image I into the reflectance component image R (reflectance information) and the shading component image S.

FIG. 8 is a reflectance component image obtained by the light source information estimation unit 14. It shows the reflectance information of the three-dimensional shape in the spatial shape information.
In FIG. 8, the reflectance component image gives the reflectance component (reflectance information) of each pixel, at its two-dimensional coordinates in the captured image, as an RGB (Red Green Blue) pixel value.
The shading component image, by contrast, stores as pixel values the shading component at each two-dimensional coordinate point of the captured image. This shading component is the data that depends on the light sources present when the captured image was taken.

Next, the light source information estimation unit 14 estimates the intensity of the light source. It uses the gray level of each pixel of the captured image and the reflectance information of the three-dimensional shape in the spatial shape information.
The light source information estimation unit 14 sets a region A, at a freely chosen location in the captured image, for determining the light source intensity, and extracts the reflectance information of region A.
Region A is preferably illuminated only by the light source whose intensity and color are to be determined, or at least receives most of its incident light from that source.
When there are multiple light sources, this choice keeps the calculation of each source's contribution simple. It also reduces the computation needed to determine the intensity and color of the target light source.

Next, the light source information estimation unit 14 extracts the region A set in the captured image and the region B at the corresponding position in the reflectance component image.
The light source information estimation unit 14 then determines the intensity and color of the light source from the RGB gray levels of region A and the reflectance information of region B.
When determining the intensity, the light source information estimation unit 14 takes into account that light intensity is inversely proportional to the square of the distance. It multiplies the intensity of the light estimated to be incident on region B by the square of the distance between the light source and region B, obtained from the three-dimensional shape in the spatial shape information. The product is the intensity of the light source.
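The computation of method (2) can be sketched as follows. Dividing the RGB values of region A by the reflectance of region B gives the light arriving at B. Multiplying by the squared distance then undoes the inverse-square falloff. Like the text, the sketch ignores the angle of incidence. As a further assumption, it reports intensity as the sum of the three channels and color as the normalized RGB ratio. The function name is hypothetical.

```python
import numpy as np


def estimate_light_from_region(region_a_rgb, region_b_reflectance, distance):
    """Return (intensity, rgb_color) of a light from an illuminated region and its reflectance."""
    pixels = np.asarray(region_a_rgb, dtype=np.float64).reshape(-1, 3)
    refl = np.asarray(region_b_reflectance, dtype=np.float64).reshape(-1, 3)
    valid = np.all(refl > 1e-6, axis=1)  # skip pixels with (near-)zero reflectance
    if not valid.any():
        raise ValueError("region B has no usable reflectance values")
    incident = (pixels[valid] / refl[valid]).mean(axis=0)  # light arriving at region B
    emitted = incident * distance ** 2                       # undo the 1/r^2 falloff
    intensity = float(emitted.sum())
    color = emitted / intensity
    return intensity, color
```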

Method (2) obtains the light source intensity indirectly from a region that the light illuminates. Accurate light source information can therefore be obtained even when the light source itself saturates the RGB values of the captured image.
FIG. 9 shows an example in which the light source information estimated by the light source information estimation unit 14 is placed in the three-dimensional shape of the spatial shape information. In FIG. 9, the light sources 200L_1 and 200L_2 are placed in the virtual space 250 of the target space at the positions estimated in its three-dimensional shape.

Light source information estimation method (3):
The light source information estimation unit 14 extracts light source regions from the captured image by light source information estimation method (1). As in method (2), it then uses the shape information of the light source regions obtained by the shape information estimation unit 13. Based on it, the unit places a point light source at the three-dimensional location of each light source region in the virtual space of the spatial shape information.
At this stage, the intensity and color of each point light source are set to arbitrary provisional values (provisional light source information). For example, the RGB values of the region of the captured image estimated to be the point light source can serve as the provisional values.

Next, the light source information estimation unit 14 extracts, for each pixel of the captured image, the corresponding coordinate point of the three-dimensional shape in the spatial shape information. Assuming that light is perfectly diffusely reflected at these points, it computes the shading information produced when light from each point light source falls on them.
The light source information estimation unit 14 then obtains the reflectance information at each pixel from the pixel value of the captured image and the shading information of the corresponding pixel of the shading component image. Specifically, it divides the pixel value of the captured image by that shading information and uses the quotient as the reflectance information.
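The diffuse-shading step of method (3) and the division that yields reflectance can be sketched as below. The sketch assumes Lambertian surfaces and inverse-square falloff. It also assumes that each point light's provisional RGB strength already combines color and intensity. Occlusion between the light and the surface point is ignored. The names are hypothetical.

```python
import numpy as np


def lambertian_shading(points, normals, light_positions, light_rgb):
    """Diffuse shading (N, 3) at surface points lit by point lights.

    points, normals: (N, 3) arrays, normals of unit length;
    light_positions, light_rgb: (L, 3) arrays, light_rgb being the provisional strength per channel.
    """
    shading = np.zeros((points.shape[0], 3))
    for pos, rgb in zip(light_positions, light_rgb):
        to_light = pos - points
        r2 = np.sum(to_light ** 2, axis=1)
        dirs = to_light / np.sqrt(r2)[:, None]
        cos = np.clip(np.sum(dirs * normals, axis=1), 0.0, None)  # surfaces facing away get 0
        shading += (cos / r2)[:, None] * rgb
    return shading


def reflectance_from_shading(image, shading, eps=1e-8):
    """Reflectance = captured pixel value / shading, guarding against division by zero."""
    return image / np.maximum(shading, eps)
```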

The light source information estimation unit 14 then scales the reflectance information under the assumption that the average reflectance of natural objects is 20%.
That is, the light source information estimation unit 14 obtains the reflectance A(x, y) of each pixel of the captured image with equation (1). In this equation, I(x, y) is the pixel value of the captured image and S(x, y) is the corresponding pixel value of the shading component image:
$$A(x, y) = \alpha \, \frac{I(x, y)}{S(x, y)} \qquad (1)$$
The scaling coefficient α is chosen so that the average of A(x, y) over all pixels of the captured image is 20%. The light source information estimation unit 14 takes the values A(x, y) obtained with this α as the reflectance of the respective pixels of the captured image.
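The 20% normalization reduces to a single scale factor α = 0.2 / mean(I/S), which gives A = α·I/S. The minimal sketch below averages over every pixel and channel of its input. That is an assumption about what "all pixels" covers.

```python
import numpy as np


def scale_reflectance(image, shading, target_mean=0.2, eps=1e-8):
    """Return (A, alpha) with A = alpha * image / shading and mean(A) == target_mean."""
    raw = image / np.maximum(shading, eps)
    alpha = target_mean / float(raw.mean())
    return alpha * raw, alpha
```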

Next, the light source information estimation unit 14 uses the reflectance A(x, y) of each pixel, obtained with equation (1), to determine the color and intensity of each light source with equation (2). It varies the intensity and color of the point light sources from their provisional values until the pixel value I(x, y) given by equation (2) matches the pixel value at the corresponding position in the captured image. The values reached in this way are the final intensity and color of each point light source.
In equation (2), A(x, y) denotes the reflectance of pixel (x, y), k(l) the color of light source l, and k_α(l) the intensity of light source l. r(x, y, l) denotes the distance from pixel (x, y) to light source l, n(x, y, l) the vector from pixel (x, y) to light source l, and n_I(x, y) the normal vector at pixel (x, y).
The light source information estimation unit 14 computes the color k(l) and intensity k_α(l) of each light source numerically from equation (2) so that the result reproduces the pixel value I(x, y) of each pixel of the captured image.
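Equation (2) is not reproduced in this text. It may take the usual diffuse form I(x, y) = A(x, y) Σ_l k(l) k_α(l) max(0, n̂(x, y, l) · n_I(x, y)) / r(x, y, l)^2, where n̂ is the unit vector toward light l. This form is an assumption consistent with the listed symbols, not the patent's verbatim equation. Under it, the pixel values are linear in the product k(l)·k_α(l) for each color channel. The sketch below therefore replaces the iterative adjustment described in the text with a per-channel linear least-squares solve using `numpy.linalg.lstsq`. It then splits each fitted RGB triple into an intensity (channel sum) and a normalized color, which is one possible convention among several.

```python
import numpy as np


def fit_light_rgb(image, albedo, points, normals, light_positions):
    """Least-squares estimate of c[l] = k(l) * k_alpha(l) for each light and channel.

    image, albedo, points, normals: (N, 3) arrays for N pixels;
    light_positions: (L, 3). Returns an (L, 3) array.
    """
    n_lights = len(light_positions)
    geom = np.zeros((points.shape[0], n_lights))  # max(0, cos) / r^2 per pixel and light
    for j, pos in enumerate(light_positions):
        to_light = pos - points
        r2 = np.sum(to_light ** 2, axis=1)
        dirs = to_light / np.sqrt(r2)[:, None]
        geom[:, j] = np.clip(np.sum(dirs * normals, axis=1), 0.0, None) / r2
    c = np.zeros((n_lights, 3))
    for ch in range(3):
        design = albedo[:, ch:ch + 1] * geom  # (N, L) model matrix for this channel
        c[:, ch] = np.linalg.lstsq(design, image[:, ch], rcond=None)[0]
    return c


def split_color_intensity(c):
    """Split fitted RGB triples into normalized color k(l) and intensity k_alpha(l)."""
    intensity = c.sum(axis=1)
    return c / intensity[:, None], intensity
```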

Algorithms other than light source information estimation methods (1) to (3) may also be used to obtain light source information.
Methods (1) to (3) were described with point light sources, but other types, such as line light sources or area light sources, may also be used.
For example, the light source information estimation unit 14 uses the semantic segmentation described above to obtain an object recognition result for each light source, namely the shape of the light source in the captured image. It then assigns the light source type from that result. A light bulb shape becomes a point light source, a downlight shape a directional light (parallel light source), a fluorescent lamp shape a line light source, and a window shape an area light source.
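The mapping from recognized fixture shape to light type is essentially a lookup table. The class labels below are hypothetical, since a real segmentation model defines its own label set. Unknown labels fall back to a point light, matching methods (1) to (3).

```python
from enum import Enum


class LightType(Enum):
    POINT = "point"
    DIRECTIONAL = "directional"  # parallel light source
    LINE = "line"
    AREA = "area"


# Hypothetical segmentation labels mapped to the light types named in the text.
SHAPE_TO_LIGHT = {
    "light_bulb": LightType.POINT,
    "downlight": LightType.DIRECTIONAL,
    "fluorescent_lamp": LightType.LINE,
    "window": LightType.AREA,
}


def light_type_for(label, default=LightType.POINT):
    """Return the light type for a recognized fixture label, defaulting to a point light."""
    return SHAPE_TO_LIGHT.get(label, default)
```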

As light source information, IBL (image based lighting) information can also be generated from the captured image, the spatial shape information of the target space, and the light source information. This IBL information can be generated for any position in the virtual space at which a computer-generated (CG) object is to be composited.
As a result, the image synthesizing unit 15 can easily obtain, from the IBL information, the light source information at the position where a CG object is placed, based on the captured image and the spatial shape information of the target space. The light source information obtained by methods (2) and (3) can be stored in an IBL data format in which pixel values do not saturate.
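One way to picture IBL information at an arbitrary position is an equirectangular environment map stored as floating-point values, so that bright lights are not clipped. The sketch only adds the estimated point lights into the map as seen from a chosen position. It omits the captured-image texture and geometry that a full IBL map would include. The axis convention (y up, the map center looking along −z) is an assumption.

```python
import math

import numpy as np


def point_lights_to_equirect(position, light_positions, light_rgb, width=64, height=32):
    """Float32 equirectangular map of point lights seen from `position` (no clipping)."""
    env = np.zeros((height, width, 3), dtype=np.float32)
    origin = np.asarray(position, dtype=np.float64)
    for pos, rgb in zip(light_positions, light_rgb):
        d = np.asarray(pos, dtype=np.float64) - origin
        r2 = float(d @ d)
        x, y, z = d / math.sqrt(r2)
        lon = math.atan2(x, -z)                  # 0 at the map center (looking along -z)
        lat = math.asin(max(-1.0, min(1.0, y)))  # +pi/2 straight up (y is up)
        u = int((lon / (2.0 * math.pi) + 0.5) * width) % width
        v = min(int((0.5 - lat / math.pi) * height), height - 1)
        env[v, u] += np.asarray(rgb, dtype=np.float32) / r2  # inverse-square falloff
    return env
```

Because the map is stored as floats, a value such as 100.0 survives unchanged, whereas an 8-bit image would clip it.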

The image synthesizing unit 15 generates a three-dimensional virtual space based on the captured image, the spatial shape information of the target space, and the light source information.
Specifically, the image synthesizing unit 15 generates the virtual space of the target space from three inputs. It uses the three-dimensional shape in the spatial shape information as the shape model, the captured image as texture data, and the light source information as rendering parameters.

FIG. 10 shows the three-dimensional virtual space generated by the image synthesizing unit 15, viewed from the imaging direction of the captured image. In FIG. 10, the display control unit 16 displays the virtual space on the display screen of the display unit 17 as observed from a predetermined viewpoint direction (for example, the imaging direction).
In FIG. 10, the three-dimensional virtual space generated from the captured image is displayed as an image observed from a predetermined viewpoint. To generate this virtual space, the captured image is set (pasted) as texture data onto a plane or a celestial sphere according to the angle of view of the imaging device. Alternatively, the captured image may be set (pasted) as texture data onto the shape model. The observed image then looks natural even when the viewpoint is moved within a predetermined range around the imaging position.

In FIG. 10, the shape model of the pillar 200P has a bright region 200PL lit by the light sources 200L_1 and 200L_2 and a shadow region 200PS that receives no light. Both regions are formed from the captured image, for example by setting (pasting) it onto the shape model as texture data.
Similarly, the floor 200F beneath the table 200T, at the position where the shape model of the table 200T is placed, has a shadow region 200TS formed in the same way. This shadow arises where the table 200T blocks light from the light sources 200L_1 and 200L_2.
On the floor 200F, the gradation region 501 is likewise formed from the captured image as described above. It shows brightness varying with distance from the light sources 200L_1 and 200L_2.

When the image synthesizing unit 15 is instructed to add a CG object to the generated three-dimensional virtual space, it places the CG object in that virtual space.
In the present embodiment, the CG objects are, for example, generated in advance and stored in the composite image storage unit 20.

FIG. 11 illustrates the control for placing an additional CG object in the virtual space. As in FIG. 10, the display control unit 16 displays the virtual space on the display screen of the display unit 17 as observed from a predetermined viewpoint direction (for example, the imaging direction).
In FIG. 11, CG objects such as 503_1 and 503_2 appear in the CG object presentation area 503. The image synthesizing unit 15 displays them on the display screen 502 of the display unit 17 via the display control unit 16.

For example, when the user drags the CG object 503_1 from the CG object presentation area 503 and drops it at the placement position 504, the image synthesizing unit 15 places the CG object 503_1 at the placement position 504 in the virtual space.

FIG. 12 illustrates the control for placing an additional CG object in the virtual space. As in FIG. 10, the display control unit 16 displays the virtual space on the display screen of the display unit 17 as observed from a predetermined viewpoint direction (for example, the imaging direction).
In FIG. 12, the virtual space is reconstructed by applying the light source information of the light sources 200L_1 and 200L_2 to the shape information of the CG object 503_1 placed in the virtual space.

The image synthesizing unit 15 renders the CG object 503_1 so that it shows the brightness produced by light from the light sources 200L_1 and 200L_2. The rendering uses the shape information of the placed CG object 503_1 and the light source information of the light sources 200L_1 and 200L_2 (position, color, and intensity of light).
As shown in FIG. 12, the image synthesizing unit 15 uses a technique such as ray tracing to adjust the brightness of the CG object 503_1 according to its shape information. It increases the brightness of the upper part that receives light from the light sources 200L_1 and 200L_2. It also determines the region occluded by the three-dimensional shape of the CG object 503_1 as a shadow and lowers the brightness of that shadow region 503_1S.
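The shadow determination can be sketched with shadow rays: a surface point receives light from a source only if nothing blocks the segment toward that source. Here occluders are approximated by axis-aligned boxes tested with the slab method. This is a deliberately simplified stand-in for a full ray tracer, and the function names are hypothetical.

```python
import numpy as np


def ray_hits_aabb(origin, direction, box_min, box_max, t_max):
    """Slab test: does origin + t * direction hit the box for some 0 < t < t_max?"""
    t0, t1 = 1e-6, t_max
    for axis in range(3):
        o, d = origin[axis], direction[axis]
        if abs(d) < 1e-12:  # ray parallel to this pair of slabs
            if o < box_min[axis] or o > box_max[axis]:
                return False
            continue
        ta = (box_min[axis] - o) / d
        tb = (box_max[axis] - o) / d
        if ta > tb:
            ta, tb = tb, ta
        t0, t1 = max(t0, ta), min(t1, tb)
        if t0 > t1:
            return False
    return True


def shade_point(point, normal, albedo, lights, occluders):
    """Diffuse color at a point; lights: [(position, rgb)], occluders: [(box_min, box_max)]."""
    color = np.zeros(3)
    for pos, rgb in lights:
        to_light = np.asarray(pos, dtype=float) - point
        dist = float(np.linalg.norm(to_light))
        direction = to_light / dist
        if any(ray_hits_aabb(point, direction, bmin, bmax, dist) for bmin, bmax in occluders):
            continue  # the point lies in this light's shadow
        cos = max(float(direction @ normal), 0.0)
        color += albedo * np.asarray(rgb, dtype=float) * cos / dist ** 2
    return color
```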

The image synthesizing unit 15 may also include in the rendering the brightness contributed by reflected light. For this it uses the reflectance information and three-dimensional shapes obtained by the light source information estimation unit 14 for the ceiling 200C, wall 200W, floor 200F, pillar 200P, table 200T, and so on in the virtual space.
That is, the image synthesizing unit 15 may be configured to account, in rendering the virtual space, for light from the light sources 200L_1 and 200L_2 that reaches a placed CG object after reflecting off other surfaces. This effect on the CG object's brightness is computed from the three-dimensional shape of the virtual space and the light source information of the light sources 200L_1 and 200L_2 (position, color, and intensity of light).

FIG. 13 is a flowchart of an example of processing in the image processing system of the present embodiment. The processing composites a CG object with the virtual space generated from a selected captured image of the target space. The following description assumes the data input/output unit 11 has already stored in the captured image storage unit 18 one or more captured images of each target space, supplied in advance from an external device. Each image is stored with captured image identification information attached, together with the imaging conditions under which it was captured.

Step S1:
The data input/output unit 11 causes the display control unit 16 to display the captured images stored in the captured image storage unit 18 on the display unit 17.
The user selects the captured image of the target space to be observed from among those displayed on the display unit 17, for example by clicking it with the mouse.
The data input/output unit 11 then outputs the captured image identification information of the selected image to the imaging condition acquisition unit 12, the shape information estimation unit 13, the light source information estimation unit 14, and the image synthesizing unit 15.

Step S2:
Using the captured image identification information of the captured image selected by the user, the imaging condition acquisition unit 12 reads the imaging conditions of that image from the captured image storage unit 18.
The imaging condition acquisition unit 12 then outputs the read imaging conditions to each of the shape information estimation unit 13 and the light source information estimation unit 14.
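This section does not specify the imaging conditions. Assuming, hypothetically, that they include focal length, sensor width and image resolution, the estimation units could convert them into a pinhole intrinsic matrix as sketched below. Square pixels and a principal point at the image centre are further assumptions.

```python
import numpy as np

def intrinsics_from_conditions(focal_length_mm, sensor_width_mm, width_px, height_px):
    """Pinhole intrinsic matrix K built from physical imaging conditions."""
    fx = focal_length_mm * width_px / sensor_width_mm  # focal length in pixels
    fy = fx                                            # square pixels assumed
    cx, cy = width_px / 2.0, height_px / 2.0           # centred principal point assumed
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])
```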

Step S3:
The shape information estimation unit 13 uses the captured image and the imaging conditions to estimate the three-dimensional shape of the target space in the selected captured image. It does so by a method such as the three-dimensional shape estimation methods (1) to (3) described above.
The shape information estimation unit 13 then writes the generated three-dimensional shape data of the target space to the spatial information storage unit 19 as spatial shape information, tagged with the captured image identification information of the corresponding captured image.
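The three-dimensional shape estimation methods (1) to (3) are described earlier in the specification and are not reproduced here. As a generic stand-in only, suppose some estimator has produced a per-pixel depth map for the selected image. Back-projecting that map through a pinhole intrinsic matrix `K` then yields a point cloud of the target space. This is a common technique swapped in for illustration, not one of the patent's methods.

```python
import numpy as np

def backproject_depth(depth, K):
    """Convert an HxW depth map (distance along the optical axis) into an
    (H*W, 3) array of camera-space points using pinhole intrinsics K."""
    h, w = depth.shape
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column / row indices
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```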

Step S4:
The light source information estimation unit 14 reads the spatial shape information corresponding to the captured image identification information from the spatial information storage unit 19.
The light source information estimation unit 14 then uses the captured image and the spatial shape information to estimate the light source information in the virtual space corresponding to the selected captured image. It does so by any of the light source information estimation methods (1) to (3) described above. The light source information estimation unit 14 writes the estimated light source information to the spatial information storage unit 19, associated with the spatial shape information.

Step S5:
The image synthesizing unit 15 reads the spatial shape information and the light source information corresponding to the captured image identification information from the spatial information storage unit 19. It also reads the captured image corresponding to the captured image identification information from the captured image storage unit 18.
The image synthesizing unit 15 applies the light source information and the captured image to the three-dimensional shape in the spatial shape information, and so generates a virtual space corresponding to the target space shown in the captured image.
The display control unit 16 then generates an observation image as seen from a viewpoint position in the virtual space and displays it on the display screen of the display unit 17, as shown in FIG. 10.
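Generating an observation image from a viewpoint requires, at minimum, projecting the virtual space into that viewpoint. The sketch below shows only the pinhole projection step. It uses assumed conventions: a world-to-camera rotation `R`, a translation `t` and intrinsics `K`. Rasterisation, visibility and shading are omitted.

```python
import numpy as np

def project_points(points_world, K, R, t):
    """Project (N, 3) world points into pixel coordinates of a pinhole camera.

    X_c = R @ X_w + t maps world to camera coordinates. Returns (N, 2) pixel
    coordinates and a boolean mask of points lying in front of the camera.
    """
    X_c = points_world @ R.T + t
    in_front = X_c[:, 2] > 1e-9
    uvw = X_c @ K.T
    with np.errstate(divide="ignore", invalid="ignore"):
        uv = uvw[:, :2] / uvw[:, 2:3]  # perspective division
    return uv, in_front
```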

Step S6:
For example, the user instructs the image processing system 10 to place (add or change) a CG object in the virtual space corresponding to the observation image displayed on the display screen of the display unit 17.
In response, the image synthesizing unit 15 reads the image data of the CG objects stored in advance in the composite image storage unit 20.

As shown in FIG. 11, the image synthesizing unit 15 then displays the read CG object images in the CG object presentation area 503 on the display screen of the display unit 17.
The user selects the CG object 503_1 in the CG object presentation area 503, drags it, and drops it at the placement position 504 on the floor 200F in the virtual space, thereby adding the CG object 503_1 to the virtual space.
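Dropping the CG object 503_1 at the placement position 504 on the floor 200F requires mapping a 2D cursor position to a 3D point. One common way to do this, assumed here rather than taken from the source, is to cast the viewing ray through the pixel and intersect it with the estimated floor plane. The floor is assumed to be the plane n·X = offset in world coordinates, defaulting to z = 0. The name `pixel_to_floor` is hypothetical.

```python
import numpy as np

def pixel_to_floor(u, v, K, R, t, floor_normal=(0.0, 0.0, 1.0), floor_offset=0.0):
    """Intersect the viewing ray through pixel (u, v) with the plane
    n . X = floor_offset. X_c = R @ X_w + t is the camera convention.
    Returns the 3D drop point, or None if the ray misses the floor."""
    n = np.asarray(floor_normal, dtype=float)
    cam_center = -R.T @ t                               # camera centre in world coordinates
    d_cam = np.linalg.solve(K, np.array([u, v, 1.0]))   # ray direction in camera frame
    d_world = R.T @ d_cam
    denom = float(n @ d_world)
    if abs(denom) < 1e-12:
        return None                                     # ray parallel to the floor
    s = (floor_offset - float(n @ cam_center)) / denom
    if s <= 0:
        return None                                     # floor is behind the camera
    return cam_center + s * d_world
```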

Step S7:
The image synthesizing unit 15 performs synthesis processing that newly places the CG object 503_1 at the placement position 504 in the virtual space.
The image synthesizing unit 15 then reconstructs the virtual space by re-rendering it based on the three-dimensional shape of the virtual space, the three-dimensional shape of the CG object 503_1, and the light source information of the light sources 200L_1 and 200L_2.
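During re-rendering, a surface point of the virtual space lies in the shadow of the CG object 503_1 if the segment from that point to a light source passes through the object. The sketch below approximates the object by an axis-aligned bounding box and applies the standard slab test. A real renderer would test against the object's mesh. The function names are hypothetical.

```python
import numpy as np

def segment_hits_box(p, q, box_min, box_max, eps=1e-9):
    """Slab test: does the segment p -> q (endpoints excluded) pass through
    the axis-aligned box [box_min, box_max]?"""
    p = np.asarray(p, dtype=float)
    d = np.asarray(q, dtype=float) - p
    t0, t1 = eps, 1.0 - eps
    for axis in range(3):
        if abs(d[axis]) < 1e-15:
            # Segment parallel to this slab: it must already lie inside it.
            if p[axis] < box_min[axis] or p[axis] > box_max[axis]:
                return False
            continue
        ta = (box_min[axis] - p[axis]) / d[axis]
        tb = (box_max[axis] - p[axis]) / d[axis]
        t0, t1 = max(t0, min(ta, tb)), min(t1, max(ta, tb))
        if t0 > t1:
            return False
    return True

def is_shadowed(point, light_pos, boxes):
    """True if any box (a (min, max) pair) blocks the light at this point."""
    return any(segment_hits_box(point, light_pos, lo, hi) for lo, hi in boxes)
```

The same segment test, run from the viewpoint instead of the light, can flag regions of the room hidden behind the newly placed object.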

The image synthesizing unit 15 writes the three-dimensional shape data of the reconstructed virtual space to the composite image storage unit 20, associated with the captured image identification information of the corresponding captured image.
The display control unit 16 generates an observation image as seen from the viewpoint position in the reconstructed virtual space and displays it on the display screen of the display unit 17, as shown in FIG. 12.

Step S8:
The data input/output unit 11 displays, on the display screen of the display unit 17, an input display area (not shown) in which the user indicates whether to continue placing CG objects (including additions and changes).
If the user enters an instruction to continue placing CG objects in the input display area, the data input/output unit 11 returns the process to step S6.
If, on the other hand, the user enters an instruction to finish placing CG objects, the data input/output unit 11 ends the process.

As described above, according to the present embodiment, the light source information estimation methods (1) to (3) make it possible to obtain light source information from the captured image and the three-dimensional shape of the virtual space. Unlike the conventional approach, there is therefore no need to prepare an object of known shape and photograph it under a known light source to create a reference image, and the light source information of the light sources in the target space can be obtained easily.

Further, according to the present embodiment, the image synthesizing unit 15 renders the virtual space in which a CG object has been newly placed using the three-dimensional shape of the target space, the light source information of the target space, and the three-dimensional shape of the CG object. This rendering of the reconstructed virtual space, in which the CG object is combined with the original virtual space, reflects three effects: the effect of light emitted from the light sources on the CG object, the regions hidden behind the CG object's three-dimensional shape, and the shadow regions where the CG object blocks light from the light sources. As a result, a state in which a real object corresponding to the CG object is placed in the target space can be generated virtually under a lighting environment similar to that of the actual target space, and the appearance of the target space with the object in place can easily be observed as an observation image.

Further, according to the present embodiment, the reflectance information of the three-dimensional shapes of the structural part and the object part of the virtual space, together with the reflectance information of each CG object, is used to include in the rendering the light from the light sources that is reflected off these three-dimensional shapes (secondary reflection). This brings the appearance of the virtual space closer to that of the actual target space.
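One simple way to model this secondary reflection, assumed here and not taken from the source, is a single diffuse bounce. Surfaces are split into small patches. Each patch is lit directly by a point light, re-emits as an ideal diffuse reflector with radiance ρE/π, and contributes irradiance L·cosθ_x·cosθ_p·A/r² at the receiving point. Visibility between patches is ignored, and the patch tuple layout is hypothetical.

```python
import numpy as np

def one_bounce_irradiance(x, n_x, patches, light_pos, light_intensity):
    """Indirect irradiance at point x (unit normal n_x) from one diffuse bounce.

    patches: iterable of (position, unit normal, area, albedo) tuples.
    No occlusion test is made between x and the patches.
    """
    x = np.asarray(x, float)
    n_x = np.asarray(n_x, float)
    light_pos = np.asarray(light_pos, float)
    total = 0.0
    for pos, n_p, area, rho in patches:
        pos = np.asarray(pos, float)
        n_p = np.asarray(n_p, float)
        to_l = light_pos - pos
        d2 = to_l @ to_l
        e_direct = light_intensity * max(0.0, n_p @ to_l / np.sqrt(d2)) / d2
        radiance = rho * e_direct / np.pi        # ideal diffuse re-emission
        r = pos - x                              # from x towards the patch
        r2 = r @ r
        cos_x = max(0.0, n_x @ r / np.sqrt(r2))
        cos_p = max(0.0, -(n_p @ r) / np.sqrt(r2))
        total += radiance * cos_x * cos_p * area / r2
    return total
```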

Further, according to the present embodiment, the three-dimensional information of both the virtual space and the CG object is scaled to actual size. The virtual space containing the CG object can therefore be viewed as an observation image that conveys an accurate sense of the size the object would have if placed in the target space.
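Scaling to actual size presupposes at least one known real-world dimension. For illustration, assume that one length in the reconstruction, for example the floor-to-ceiling height, is known in metres. A uniform scale factor can then be applied as below. The CG object is assumed to be modelled in metres already, and the function name is hypothetical.

```python
import numpy as np

def scale_to_real_size(vertices, measured_length, real_length_m):
    """Uniformly rescale a reconstruction whose absolute scale is unknown.

    measured_length: a length in model units (e.g. floor-to-ceiling distance).
    real_length_m:   the same length in metres, known from another source.
    Returns the scaled vertices and the scale factor.
    """
    if measured_length <= 0:
        raise ValueError("measured_length must be positive")
    s = real_length_m / measured_length
    return np.asarray(vertices, dtype=float) * s, s
```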

Further, according to the present embodiment, the virtual space is generated by estimating the three-dimensional shape of the target space from a single captured image or a plurality of captured images. The viewpoint from which the target space is observed can therefore be changed freely within a predetermined range, and the region of the virtual space where the CG object is placed can be observed as observation images from different viewpoints.
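Moving the viewpoint within the virtual space amounts to rebuilding the camera pose for each new eye position. One common construction for this is the "look-at" pose, sketched below as an assumption rather than the embodiment's method. It uses camera axes x right, y down, z forward, a world up vector of +z, and the convention X_c = R·X_w + t.

```python
import numpy as np

def look_at(eye, target, up=(0.0, 0.0, 1.0)):
    """World-to-camera rotation R and translation t for a camera at `eye`
    looking at `target` (camera axes: x right, y down, z forward)."""
    eye = np.asarray(eye, float)
    target = np.asarray(target, float)
    z = target - eye
    z /= np.linalg.norm(z)                 # forward axis
    x = np.cross(z, np.asarray(up, float))
    if np.linalg.norm(x) < 1e-12:
        raise ValueError("view direction is parallel to the up vector")
    x /= np.linalg.norm(x)                 # right axis
    y = np.cross(z, x)                     # down axis (right-handed: x cross y = z)
    R = np.stack([x, y, z])
    t = -R @ eye
    return R, t
```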

In the embodiment described above, the image processing generates a virtual space from captured images that were captured in advance and written to the captured image storage unit 18, and synthesizes a CG object into that virtual space.
However, the data input/output unit 11 may instead be configured to write captured images (single or multiple) of the target space, captured sequentially and supplied from an imaging device, to the captured image storage unit 18 in real time. In this case, a virtual space is generated from the sequentially supplied captured image or images and a CG object is synthesized into it. The observation image of the synthesized virtual space can then be displayed on the display unit 17 in real time for the user to view.

Although embodiments of the present invention have been described above with reference to the drawings, the specific configuration is not limited to these embodiments and also includes design changes and the like that do not depart from the gist of the present invention.

A program that implements the functions of the image processing system 10 of FIG. 1 may be recorded on a computer-readable recording medium. The process of synthesizing a CG object with the virtual space generated from the captured image may then be performed by loading the program recorded on this medium into a computer system and executing it. The term "computer system" here includes an OS and hardware such as peripheral devices.

The "computer system" also includes a WWW system provided with a website providing environment (or display environment). The term "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs and CD-ROMs, and to storage devices such as hard disks built into computer systems. The term also covers media that hold a program for a certain period of time. An example is the volatile memory (RAM) inside a computer system that acts as a server or client when the program is transmitted over a network such as the Internet or a communication line such as a telephone line.

The program may also be transmitted from a computer system that stores it in a storage device or the like to another computer system, either via a transmission medium or by transmission waves in a transmission medium. Here, the "transmission medium" that carries the program is a medium capable of transmitting information, such as a network (communication network) like the Internet or a communication line (communication channel) like a telephone line. The program may also implement only some of the functions described above. Furthermore, it may be a so-called difference file (difference program), which implements the functions described above in combination with a program already recorded in the computer system.

10 ... Image processing system
11 ... Data input/output unit
12 ... Imaging condition acquisition unit
13 ... Shape information estimation unit
14 ... Light source information estimation unit
15 ... Image synthesizing unit
16 ... Display control unit
17 ... Display unit
18 ... Captured image storage unit
19 ... Spatial information storage unit
20 ... Composite image storage unit

Claims

An image processing system comprising:
a shape information estimation unit that estimates spatial shape information including at least a three-dimensional shape of a target space from a single or a plurality of captured images in which the target space is captured; and
a light source information estimation unit that estimates light source information including at least the position and intensity of a light source in the target space from the captured image and the spatial shape information.

The image processing system according to claim 1, wherein the light source information estimation unit estimates reflectance information of the three-dimensional shape in the target space from the captured image and the spatial shape information, and estimates the light source information using the reflectance information.

The image processing system according to claim 2, wherein the light source information estimation unit estimates the light source information using the reflectance information estimated from a predetermined formula expressing the relationship between the pixel values of the captured image and each of the reflectance information, the light source information, and the spatial shape information.

The image processing system according to any one of claims 1 to 3, further comprising an image compositing unit that generates, from the captured image, the spatial shape information, and the light source information, a composite image of a virtual space in which a shape model of a predetermined object is arranged at a predetermined position of the captured image.

The image processing system according to any one of claims 1 to 4, further comprising an image compositing unit that, from the captured image, the spatial shape information, and the light source information, arranges a shape model of a predetermined object at a predetermined position of the captured image and generates an observation image of the virtual space in which the influence of arranging the shape model is reflected.

The image processing system according to any one of claims 1 to 5, wherein the light source information estimation unit generates IBL (image-based lighting) information from the reflectance information of the three-dimensional shape, the spatial shape information, and the light source information.

The image processing system according to any one of claims 1 to 6, wherein the shape information estimation unit separately estimates the three-dimensional shapes of a structural part and an object part in the target space, and combines the three-dimensional shapes of the structural part and the object part to generate a three-dimensional shape model of the entire target space.

The image processing system according to any one of claims 1 to 7, wherein the shape information estimation unit estimates the three-dimensional shape of the target space and shape information of the light source, including object recognition information, in the target space.

The image processing system according to any one of claims 1 to 8, further comprising an imaging condition acquisition unit that acquires the imaging conditions under which the captured image was captured.

An image processing method comprising:
a shape information estimation step in which a shape information estimation unit estimates spatial shape information including at least a three-dimensional shape of a target space from a single or a plurality of captured images in which the target space is captured; and
a light source information estimation step in which a light source information estimation unit estimates light source information including at least the position and intensity of a light source in the target space from the captured image and the spatial shape information.

A program causing a computer to function as:
shape information estimation means for estimating spatial shape information including at least a three-dimensional shape of a target space from a single or a plurality of captured images in which the target space is captured; and
light source information estimation means for estimating light source information including at least the position and intensity of a light source in the target space from the captured image and the spatial shape information.