CN114972617B - Scene illumination and reflection modeling method based on differentiable rendering - Google Patents

Scene illumination and reflection modeling method based on differentiable rendering

Info

Publication number
CN114972617B
CN114972617B (application CN202210712261.0A)
Authority
CN
China
Prior art keywords
reflection
parameters
scene
ray
light
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210712261.0A
Other languages
Chinese (zh)
Other versions
CN114972617A (en)
Inventor
施柏鑫
于博涵
杨思祺
崔轩宁
董思言
陈宝权
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University filed Critical Peking University
Priority to CN202210712261.0A priority Critical patent/CN114972617B/en
Publication of CN114972617A publication Critical patent/CN114972617A/en
Application granted granted Critical
Publication of CN114972617B publication Critical patent/CN114972617B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/50Lighting effects
    • G06T15/506Illumination models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/06Ray-tracing
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02BCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO BUILDINGS, e.g. HOUSING, HOUSE APPLIANCES OR RELATED END-USER APPLICATIONS
    • Y02B20/00Energy efficient lighting technologies, e.g. halogen lamps or gas discharge lamps
    • Y02B20/40Control techniques providing energy savings, e.g. smart controller or presence detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Image Generation (AREA)

Abstract

The invention discloses a scene illumination and reflection modeling method based on differentiable rendering. An indoor scene inverse rendering model with multiple emitters and multiple reflections is designed, geometric constraints are added, and images from new viewpoints can be generated. The model describes the common light sources in indoor scenes well, can truly separate the illumination attributes, realizes the environment relighting task well, is easy to recover with a gradient-descent method, and has a wide range of applications. Meanwhile, the invention traces light paths through multiple bounces, analyzes three ambiguity problems that arise when tracing multiple bounces with differentiable ray tracing, and designs three corresponding disambiguation methods to resolve them, so the recovered results are more realistic.

Description

Scene illumination and reflection modeling method based on differentiable rendering
Technical Field
The invention relates to the technical field of computer vision, and in particular to a method for modeling scene illumination and reflection based on differentiable rendering.
Background
With the development of computer technology, computing power has steadily increased, machine learning and deep learning have advanced rapidly, and computer vision technologies are gradually being applied in many scenarios, such as face detection, image beautification and night photography in mobile phone cameras, pedestrian detection and road recognition in autonomous driving, face recognition for mobile payment and station identity checks, and simultaneous localization and mapping for robots. With the arrival of the big-data and intelligent era, more and more application scenarios need the support of computer vision, massive video and image data urgently need to be processed, and the application of computer vision is gradually expanding from two dimensions to three dimensions. Recovering the three-dimensional parameters of the real world from two-dimensional information is therefore of great significance for high-level applications such as novel view synthesis, virtual object insertion and material editing, and has attracted wide attention.
Inverse Rendering (IR) recovers the illumination and material information of the real world and renders it with the corresponding physical model, enabling many functions such as novel view synthesis, virtual object insertion and material editing. As a frontier topic in computational photography, the development of inverse rendering is extremely important for other computer vision technologies. Traditional methods approach the inverse rendering problem with intrinsic image decomposition, which decomposes a photograph into reflectance and shading maps. Recent work on inverse rendering of indoor scenes has proposed new ways of representing illumination, for example representing the incident illumination at each point with a point-light-source model, Spatially Varying Spherical Harmonics (SVSH), or Spatially Varying Spherical Gaussians (SVSG). Such methods describe the direction of incident light more accurately than traditional shading maps. These methods also use various Bidirectional Reflectance Distribution Function (BRDF) models, which describe the specular reflection characteristics of a surface better than a reflectance map, and they greatly improve the inverse rendering of complex scenes. Some other recent works use a point-light-source model to describe the light source, but they consider only direct reflection of the light source and ignore multiple-reflection effects. Meanwhile, these methods lack sufficient geometric constraints, cannot generate images from new viewpoints, cannot truly separate the light sources, cannot realize environment relighting, and therefore have certain limitations.
Among the rendering methods commonly used in computer graphics, the Monte Carlo path tracing algorithm accounts well for multiple reflections and accurately simulates light transport; it is a physically based rendering method. However, it is not easy to perform inverse rendering with the Monte Carlo path tracing algorithm, that is, to recover the light source and surface reflectance parameters from images such that the recovered parameters are as consistent as possible with the real scene and rendering them reproduces the real photographs. To realize inverse rendering based on Monte Carlo path tracing, an appropriate light source model is needed. Since Monte Carlo path tracing is a physically based rendering method, the light source model must describe physically luminous objects rather than intermediate incident-illumination representations such as SVSH, SVSG or shading maps. At the same time, the light source model needs to be directly visible and easy to solve with gradient-descent optimization. Existing light source models, such as point lights, directional lights, spotlights and global environment maps, do not meet these requirements. In addition, existing methods usually consider only a single reflection of the light path, so complex light phenomena cannot be recovered; if multiple reflections are considered, variables describing emission and reflection must be introduced at every point of the geometry, whether or not the point is visible from the current viewpoint, so the number of unknowns to be optimized increases greatly. Too many unknowns lead to ambiguity when solving the inverse rendering problem, and the recovered results are of low quality and not stable enough.
Disclosure of Invention
In view of these technical problems, the invention provides a method for modeling scene illumination and reflection based on differentiable rendering.
In order to achieve the above purpose, the invention provides the following technical scheme:
a method for modeling scene illumination and reflection based on conductive rendering comprises the following steps:
s1, training data acquisition: testing by using a synthetic data set and a real data set, selecting a PBRS data set and an AI2-THOR data set by the synthetic data, converting a specific scene in the PBRS data set and the AI2-THOR data set into an input format of a renderable device based on a Monte Carlo ray tracing algorithm, and performing physical rendering again to obtain a new scene as synthetic data; the real data comprises scanning geometry collected by a depth camera and photos collected by a panoramic camera;
s2, data tuning alignment: combining the photos collected in the real data into HDR photos by a Debevec method, placing a plurality of labels Tag36h11 of an AprilTag visual reference system in a scene, respectively identifying the labels in scanning geometry and shooting photos, wherein each photo at least comprises 4 labels, aligning the photos collected by a panoramic camera with the geometry collected by a depth camera, and assisting the registration of the pose of the camera during three-dimensional scanning;
s3, material illumination parameter generation: inputting the geometric information into an indoor scene inverse rendering model with multiple luminophors for reflection, and generating six parameter maps, wherein the six parameter maps are respectively as follows: diffuse reflectance map, specular reflectance map, self-luminous map, ambient light map, window map, and roughness map; firstly, carrying out UV expansion on a geometric figure obtained by a depth camera by using an indoor scene reverse rendering model with multi-reflection of luminous bodies, calculating XYZ coordinates in a three-dimensional world corresponding to each pixel on a UV mapping, and generating six parameter mappings by using the XYZ coordinates as input of an MLP network; when the network is used to generate the map, a numerical loss item is added
Figure GDA0004083661230000032
S4, picture rendering: for each pixel in the photo, passing the camera pose into the differentiable renderer based on the Monte Carlo ray tracing algorithm to generate a rendered photo; the differentiable renderer based on the Monte Carlo ray tracing algorithm converts pixels into the radiance observed along each ray direction using the camera's intrinsic and extrinsic parameter matrices and its response curve, and the Cook-Torrance reflection model is approximately solved with the Monte Carlo ray tracing algorithm, the rendering equation being:
L(ω_r, x) = E(ω_r, x) + ∫_Ω F(ω_i, ω_r, x) L(ω_i, y) cos θ dω_i
where ω_r is the direction of the reflected ray, ω_i is the direction of the incident ray, x is the intersection of the incident ray with the scene surface, y is the point where the light source of the incident ray is located, L(ω_i, y) is the radiance emitted by the light source at point y in direction ω_i, E is the light source model, F is the bidirectional reflectance distribution function model, θ is the angle between the incident ray and the surface normal, and Ω is the upper hemisphere over which the integral is taken;
s5, supervised learning: the radiance of a given ray is rendered with a conductive renderer based on the monte carlo ray tracing algorithm and compared to the true radiance observed in the photograph using the following loss function:
Figure GDA0004083661230000041
wherein, ω is j Is the direction of the j ray, x j As a starting point for light, L (omega) j ,x j ) Is from x j Starting from the final calculated radiation value of the jth ray,
Figure GDA0004083661230000044
for each radiation ray's true radiation value, k is the order of the paradigm and M is the total number of observed radiation rays.
Further, in step S1, when using the synthetic data set, a plurality of virtual cameras are placed into each scene and the images used as ground truth are rendered with the differentiable renderer based on the Monte Carlo ray tracing algorithm.
Further, in step S1, a depth camera and three-dimensional scanning software are used to acquire the geometric model of the indoor scene, a 360-degree panoramic camera is used to capture the photos, a multi-exposure method is used to acquire HDR images during photographing, and 7 exposure times forming a geometric sequence with common ratio 2 are selected according to the brightness of different scenes.
Further, step S2 selects 20 to 40 camera pose points for training the indoor scene inverse rendering model with emitter multi-reflection.
Further, in step S3, when the indoor scene inverse rendering model with emitter multi-reflection performs UV unfolding on the geometry, a positional encoding γ(x) is added that converts the coordinates of points in the scene into encoding vectors, which replace the raw point coordinates as the input of the neural network; x represents the coordinates of the intersection of the incident ray with the scene surface.
Further, the MLP network in step S3 adopts a four-layer fully-connected MLP network structure, wherein the number of neurons in the input layer is 15; the MLP network comprises two hidden layers, each hidden layer is provided with 1024 neurons, the output end of each neuron is added with a ReLU activation function, and batch normalization is used between the layers; the output of the network comprises Q neurons, and the output numerical value is changed into probability distribution through a SoftMax function.
Further, in step S3, the six parameter maps are calculated according to the following formula:
k̂_m(x) = Σ_{i=1}^{Q} P_i(x) · m_i
where k̂_m(x) is the 11-dimensional parameter vector before adding texture details, comprising the diffuse reflection parameters, specular reflection parameters, self-luminescence parameters and window parameters, P_i(x) is the probability that the current point belongs to the i-th class of material, m_i is the parameter vector of the i-th material among the predetermined material types, Q is the number of different material types, and x represents the coordinates of the intersection of the incident ray with the scene surface.
Further, a detail texture map is added to the parameter map, according to the following formula:
k_m(x) = k̂_m(x) + k_t(x)
where k_m is the 11-dimensional parameter vector after adding texture details, comprising the diffuse reflection parameters, specular reflection parameters, self-luminescence parameters and window parameters, k̂_m is the 11-dimensional parameter vector before adding texture details, and k_t is the texture detail, likewise an 11-dimensional parameter vector; an l1 regularization constraint is added to k_t.
Further, in step S3, the numerical loss term L_value is given by:
L_value = Φ(k_d) + Φ(k_s) + Φ(k_t) + Φ(k_e) + Φ(k_g) + Φ(k_w)
where Φ(·) is a constraint on the values of each learned parameter map, intended to limit them to a reasonable range before rendering, ε = 0.001 is used to prevent extreme values, k_d is the diffuse reflection map, k_s the specular reflection map, k_t the texture detail (likewise an 11-dimensional parameter vector comprising diffuse reflection, specular reflection, self-luminescence and window parameters), k_e the self-luminescence map, k_g the ambient light map, k_w the window map, and x represents the coordinates of the intersection of the incident ray with the scene surface.
Further, in step S4, based on the Cook-Torrance model, the differentiable renderer MILO-Renderer performs rendering with the six parameter maps generated by the indoor scene inverse rendering model with emitter multi-reflection, according to the following formulas:
E(ω, x) = k_e(x) + k_w(x) · k_g(ω)
F(ω_i, ω_r, x) = f_r(ω_i, ω_r, x, k_d, k_s, k_a)
f_r(ω_i, ω_r, x, k_d, k_s, k_a) = k_d(x)/π + k_s(x) · D · F̂ · G / (4 (N·ω_i)(N·ω_r))
H = (ω_i + ω_r) / ‖ω_i + ω_r‖
L(ω_r, x) ≈ E(ω_r, x) + (1/S) Σ_{s=1}^{S} Σ_{b=1}^{B} ( Π_{t=1}^{b} F(ω_{t,i}, ω_{t,r}, x_t) cos θ_t ) E(ω_{b,i}, x_b)
where E(ω, x) represents the self-luminous radiance at the point where the ray starting from x in direction ω intersects an object in the scene, ω is the direction of the radiated ray and x its starting point, k_e is the self-luminescence map, k_g the ambient light map, and k_w the window map; F(ω_i, ω_r, x) represents the BRDF reflection model converting, at x, incident light from direction ω_i into outgoing light in direction ω_r, ω_r is the direction of the outgoing ray, ω_i the direction of the incident ray, f_r the BRDF reflection function, k_d the diffuse reflection map, k_s the specular reflection map, k_a the roughness map, and N the surface normal direction; D is the normal distribution function describing the normal distribution of the microfacet model, F̂ is the Fresnel reflection function describing the reflectivity and refractive index of light passing through the object surface, G is the geometric shadowing function describing the degree of self-occlusion of the object surface, and H is the direction bisecting the incident and outgoing rays; L(ω, x) represents the radiance of the ray starting from point x in direction ω, S is the number of samples, B the maximum number of reflections of a light path, s the s-th sample, b the b-th reflection of the light path, ω_{b,r} the outgoing direction of the b-th reflection, ω_{b,i} the incident direction of the b-th reflection, x_b the intersection of the b-th reflection of the light path with an object, and θ the angle between the incident ray and the surface normal.
Compared with the prior art, the invention has the beneficial effects that:
1. the invention provides a scene illumination and reflection modeling method based on differentiable rendering, which designs an indoor scene inverse rendering model with multiple emitters and multiple reflections (MILO-Net), adds geometric constraints, and can generate images from new viewpoints; the model describes common light sources in indoor scenes well, such as ceiling lamps, desk lamps and windows, can truly separate the illumination attributes, realizes the environment relighting task well, is easy to recover with a gradient-descent method, and has a wide range of applications;
2. in the scene illumination and reflection modeling method based on differentiable rendering provided by the invention, light paths are traced through multiple bounces; three ambiguity problems that arise when tracing multiple bounces with differentiable ray tracing are analyzed, and three disambiguation methods are designed to resolve them, making the recovered results more realistic; these disambiguation methods are finally integrated into the material parameter generation network MILO-Net and are mainly embodied in the design of the MILO-Net network structure;
3. the invention provides a scene illumination and reflection modeling method based on differentiable rendering, which designs a differentiable renderer (MILO-Renderer) based on the Monte Carlo ray tracing algorithm; the renderer integrates the light source model designed by the inventors and is used to supervise the training and learning of MILO-Net.
Drawings
In order to more clearly illustrate the embodiments of the present application or technical solutions in the prior art, the drawings required in the embodiments will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in the present invention, and other drawings can be obtained by those skilled in the art according to these drawings.
Fig. 1 is a flowchart of a method for modeling scene illumination and reflection based on differentiable rendering according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of MILO-Net according to an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a system according to an embodiment of the present invention.
Detailed Description
For a better understanding of the present solution, the method of the present invention is described in detail below with reference to the accompanying drawings.
The method for modeling scene illumination and reflection based on differentiable rendering provided by the embodiment of the invention follows the flow shown in fig. 1 and mainly comprises five steps: training data acquisition, data tuning and alignment, material and illumination parameter generation, picture rendering, and supervised learning, specifically as follows:
s1, training data acquisition: the test is performed using the synthetic dataset and the real dataset. Selecting a PBRS data set and an AI2-THOR data set for the synthetic data, converting a specific scene in the PBRS data set and the AI2-THOR data set into an input format of an MILO-renderer, and performing physical rendering again to obtain a new scene serving as the synthetic data; when using a composite dataset, multiple virtual cameras are placed into each scene and images used as true values are rendered using the MILO-render. The real data comprises scanning geometry collected by a depth camera and photos collected by a panoramic camera; for real data, a depth camera and three-dimensional scanning software are used for acquiring a geometric model of an indoor scene, a 360-degree panoramic camera is used for photographing a photo part, a multi-exposure method is used for acquiring HDR images during photographing, and 7 groups of exposure time with the base number of 2 and increasing geometric number series are selected according to different brightness of different scenes.
S2, data tuning and alignment: the photos collected in the real data are combined into HDR photos by the Debevec method. Meanwhile, in order to align the photos collected by the panoramic camera with the geometry collected by the depth camera, a plurality of Tag36h11 tags of the AprilTag visual reference system are placed in the scene. AprilTag is a visual fiducial system that can be used for a variety of tasks including augmented reality, robotics and camera calibration; the targets can be created with an ordinary printer, and the AprilTag detection software computes the precise 3D position, orientation and identity of each tag relative to the camera. In the experiments of the invention, the tags are used to assist the reconstruction of panoramas: they are identified in the scanned geometry and in the captured photos respectively, each photo containing at least 4 tags to ensure that the camera position is uniquely determined, so as to align the photos captured by the panoramic camera with the geometry captured by the depth camera and to assist registration of the camera pose during three-dimensional scanning. A detection sketch is given below.
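As an illustration of this tag detection step, the following minimal sketch detects Tag36h11 markers in one captured image so that the detections can be matched against the tag positions found in the scanned geometry; the use of the third-party pupil_apriltags and OpenCV packages and the file name are assumptions, not part of the patented method.

```python
# Sketch: detect Tag36h11 AprilTag markers in a captured photo so that the
# detections can later be matched against the tags found in the scanned geometry.
# Assumes the third-party `pupil_apriltags` and `opencv-python` packages.
import cv2
from pupil_apriltags import Detector

detector = Detector(families="tag36h11")

def detect_tags(image_path):
    """Return a list of (tag_id, center, corners) for every Tag36h11 marker found."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    detections = detector.detect(gray)
    return [(d.tag_id, d.center, d.corners) for d in detections]

tags = detect_tags("panorama_view_000.png")          # hypothetical file name
assert len(tags) >= 4, "each photo must contain at least 4 tags for a unique camera pose"
```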
S3, material and illumination parameter generation: six parameter maps related to material and illumination are recovered from the input two-dimensional pictures and the geometric information. To resolve the multiple-reflection ambiguity, the invention requires that the total number of unknowns to be solved does not exceed the total number of constraints; to resolve the self-luminescence/diffuse-reflection ambiguity, the invention introduces a limited-material constraint, under which a small number of groups of material parameters can easily be determined from the many groups of radiance values observed in the images; to resolve the specular reflection uncertainty, the invention applies a reflectivity similarity constraint together with the limited-material constraint. These constraints are uniformly integrated into the MILO-Net network. Specifically, the geometric information is input into MILO-Net to generate six parameter maps, namely: a diffuse reflection map, a specular reflection map, a self-luminescence map, an ambient light map, a window map and a roughness map. MILO-Net first performs UV unfolding on the geometry obtained by the depth camera, computes the XYZ coordinates in the three-dimensional world corresponding to each pixel of the UV map, and uses the XYZ coordinates as the input of an MLP network to generate the six parameter maps; when the network is used to generate the maps, a numerical loss term L_value is added.
Here, the parameter-domain variables of a parametric surface are conventionally denoted by the letters u and v, as in the parametric surface F(u, v). A triangular mesh is parameterized if it can be mapped to a parametric plane; this mapping is the UV unfolding. Each vertex then has a uv parameter value, also called its texture coordinate.
S4, picture rendering: for each pixel in the photo, the camera pose is passed into the MILO-Renderer to generate a rendered photo; the MILO-Renderer converts pixels into the radiance observed along each ray direction using the camera's intrinsic and extrinsic parameter matrices and its response curve, and the Cook-Torrance reflection model is approximately solved with the Monte Carlo ray tracing algorithm, the rendering equation being:
L(ω_r, x) = E(ω_r, x) + ∫_Ω F(ω_i, ω_r, x) L(ω_i, y) cos θ dω_i
where ω_r is the direction of the reflected ray, ω_i is the direction of the incident ray, x is the intersection of the incident ray with the scene surface, y is the point where the light source of the incident ray is located, L(ω_i, y) is the radiance emitted by the light source at point y in direction ω_i, E is the light source model, F is the bidirectional reflectance distribution function model, θ is the angle between the incident ray and the surface normal, and Ω is the upper hemisphere over which the integral is taken.
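To make the Monte Carlo approximation of this rendering equation concrete, the sketch below estimates the integral over the upper hemisphere Ω by uniform direction sampling; the callbacks E, F and trace_incident_radiance stand in for the scene-specific light source model, BRDF and ray tracer, and are placeholders rather than the MILO-Renderer itself.

```python
# Sketch: single-bounce Monte Carlo estimate of
#   L(w_r, x) = E(w_r, x) + integral_Omega F(w_i, w_r, x) L(w_i, y) cos(theta) dw_i
# with uniform hemisphere sampling (pdf = 1 / (2*pi)).
import numpy as np

def sample_hemisphere(normal, rng):
    """Uniformly sample a direction on the hemisphere around `normal`."""
    while True:
        v = rng.normal(size=3)
        v /= np.linalg.norm(v)
        if np.dot(v, normal) > 0.0:
            return v

def estimate_radiance(x, w_r, normal, E, F, trace_incident_radiance, samples=64, rng=None):
    """Monte Carlo estimate of L(w_r, x) at one surface point x with normal `normal`."""
    rng = rng or np.random.default_rng(0)
    total = np.zeros(3)
    for _ in range(samples):
        w_i = sample_hemisphere(normal, rng)
        cos_theta = np.dot(w_i, normal)
        L_i = trace_incident_radiance(x, w_i)          # radiance L(w_i, y) arriving along w_i
        # multiply by 2*pi, i.e. divide by the pdf 1/(2*pi) of uniform hemisphere sampling
        total += F(w_i, w_r, x) * L_i * cos_theta * (2.0 * np.pi)
    return E(w_r, x) + total / samples
```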
S5, supervised learning: the radiance of a given ray is obtained by rendering with the MILO-Renderer and compared with the true radiance observed in the photograph using the following loss function:
L_render = Σ_{j=1}^{M} ‖ L(ω_j, x_j) − L̂(ω_j, x_j) ‖_k
where ω_j is the direction of the j-th ray, x_j is the starting point of the ray, L(ω_j, x_j) is the finally computed radiance of the j-th ray starting from x_j, L̂(ω_j, x_j) is the true radiance observed for that ray, k is the order of the norm, and M is the total number of observed rays.
The invention uses the HDR data obtained in steps S1 and S2 to train the neural network MILO-Net and supervises the training with the pictures generated by the renderer MILO-Renderer. The specific training process is as follows:
1) Obtaining a training data set
(1) For synthetic data, the public PBRS data set and AI2-THOR data set are downloaded over the network.
(2) Two scenes meeting the requirements are selected from the PBRS data set and two scenes from the AI2-THOR data set as synthetic data; they are converted into the input format of the MILO-Renderer and physically rendered again to obtain new scenes as the synthetic data. Using this data set, multiple virtual cameras are manually placed into each scene and the images used as ground truth are rendered with the MILO-Renderer.
(3) For real data, an Intel RealSense D455 depth camera and Dot3D Pro three-dimensional scanning software are used to acquire the model of the indoor scene, and a RICOH THETA Z1 360-degree panoramic camera is used for photographing. During photographing, a multi-exposure method is used to acquire an HDR image: 7 exposure times forming a geometric sequence with common ratio 2 are selected according to the brightness of different scenes, and the HDR image is combined with the Debevec method (a merge sketch follows this list).
(4) Several Tag36h11 AprilTags are placed in the scene and identified in the scanned geometry and in the photographs respectively, to align the pictures taken by the panoramic camera with the geometry captured by the depth camera. To ensure that the camera position is uniquely determined, each photo must contain at least 4 tags; the tags are also used to assist registration of the camera pose during three-dimensional scanning. For each synthetic and real scene, 20 to 40 camera pose points are selected for training the network, depending on the complexity of the scene layout.
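The following minimal sketch illustrates the multi-exposure merge of step (3) using OpenCV's Debevec calibration and merge routines; the file names and exposure values are hypothetical, and only the doubling exposure ladder follows the text.

```python
# Sketch: merge 7 bracketed exposures (geometric sequence with common ratio 2)
# into one HDR image using OpenCV's Debevec response-curve calibration and merge.
import cv2
import numpy as np

# Hypothetical capture: 7 exposures doubling each time.
exposure_times = np.array([1/128, 1/64, 1/32, 1/16, 1/8, 1/4, 1/2], dtype=np.float32)
images = [cv2.imread(f"exposure_{i}.jpg") for i in range(7)]

calibrate = cv2.createCalibrateDebevec()
response = calibrate.process(images, exposure_times)      # recover the camera response curve

merge = cv2.createMergeDebevec()
hdr = merge.process(images, exposure_times, response)     # linear radiance map (float32)

cv2.imwrite("scene_view.hdr", hdr)                          # Radiance .hdr output
```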
2) Implementation of MILO-Net
(1) First, the geometry obtained by the depth camera is UV-unfolded, the XYZ coordinates in the three-dimensional world corresponding to each pixel of the UV map are computed, and these XYZ coordinates are used as the input of the MLP. In addition, a positional encoding γ(x) is added to enhance the ability of the MLP to fit information at different frequencies; the encoding converts point coordinates in the scene into encoding vectors, which replace the raw point coordinates as the input of the neural network.
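The text does not spell out the form of γ(x); the sketch below assumes the common sine/cosine frequency encoding, with the number of frequency bands chosen so that the encoding is 15-dimensional, matching the 15 input neurons mentioned in the next step.

```python
# Sketch: a sine/cosine positional encoding gamma(x) that maps a 3D point to a
# higher-dimensional vector so the MLP can fit high-frequency variation.
# Two frequency bands give 3 + 3*2*2 = 15 inputs; the band count is an assumption.
import torch

def positional_encoding(x: torch.Tensor, num_bands: int = 2) -> torch.Tensor:
    """x: (..., 3) point coordinates -> (..., 3 + 3*2*num_bands) encoding vector."""
    features = [x]
    for k in range(num_bands):
        features.append(torch.sin((2.0 ** k) * torch.pi * x))
        features.append(torch.cos((2.0 ** k) * torch.pi * x))
    return torch.cat(features, dim=-1)

gamma = positional_encoding(torch.rand(4096, 3))    # -> shape (4096, 15)
```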
(2) A four-layer fully-connected MLP network structure is designed, wherein the number of neurons of an input layer is 15. The network comprises two hidden layers, each hidden layer comprises 1024 neurons, a ReLU activation function is added to the output end of each neuron, and Batch Normalization (Batch Normalization) is used between the layers. The output of the network comprises Q neurons, and the output numerical value is changed into probability distribution through a SoftMax function. Through the neural network, a mapping from three-dimensional point coordinates to a probability distribution that the point belongs to a certain material is established.
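A minimal PyTorch sketch of this network is given below (15 encoded inputs, two hidden layers of 1024 units with ReLU and batch normalization, Q softmax outputs); the value of Q and the exact ordering of ReLU and batch normalization are assumptions.

```python
# Sketch: the material-classification MLP described above.
# 15 encoded inputs -> 1024 -> 1024 -> Q material probabilities.
import torch
import torch.nn as nn

Q = 8  # placeholder number of material classes

mlp = nn.Sequential(
    nn.Linear(15, 1024), nn.ReLU(), nn.BatchNorm1d(1024),
    nn.Linear(1024, 1024), nn.ReLU(), nn.BatchNorm1d(1024),
    nn.Linear(1024, Q),
    nn.Softmax(dim=-1),   # per-point probability of belonging to each material class
)

# Each row is the 15-dimensional positional encoding of one scene point.
probs = mlp(torch.rand(4096, 15))   # (4096, Q), rows sum to 1
```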
(3) The six parameter maps are calculated according to the following formula:
k̂_m(x) = Σ_{i=1}^{Q} P_i(x) · m_i
where k̂_m(x) is the 11-dimensional parameter vector before adding texture details, comprising the diffuse reflection parameters, specular reflection parameters, self-luminescence parameters and window parameters, P_i(x) is the probability that the current point belongs to the i-th class of material, m_i is the parameter vector of the i-th material among the predetermined material types, and Q is the number of different material types.
The parameter maps generated in this way are guaranteed to be composed of only Q different materials, which realizes the specular reflection similarity constraint. However, real-world textures tend to contain rich high-frequency information that is difficult for an MLP to fit. In order to make the generated maps contain these high-frequency texture details, the invention further adds a detail texture map to the parameter map, according to the following formula:
k_m(x) = k̂_m(x) + k_t(x)
where k_m is the 11-dimensional parameter vector after adding texture details, comprising the diffuse reflection parameters, specular reflection parameters, self-luminescence parameters and window parameters, k̂_m is the 11-dimensional parameter vector before adding texture details, and k_t is the texture detail, likewise an 11-dimensional parameter vector; an l1 regularization constraint is added to k_t to prevent low-frequency information in the scene from being absorbed into k_t during learning.
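The two formulas above can be combined in a few lines; the sketch below treats the per-material vectors m_i and the detail texture k_t as learnable tensors indexed through the UV map, which is an assumed implementation rather than the exact one used in the invention.

```python
# Sketch: per-point material parameters as a probability-weighted mixture of Q
# learnable material vectors, plus a learnable high-frequency detail texture k_t
# kept sparse with an l1 penalty.
import torch

Q, PARAM_DIM, NUM_TEXELS = 8, 11, 1024 * 1024   # placeholder sizes

material_vectors = torch.nn.Parameter(torch.rand(Q, PARAM_DIM))            # m_i
detail_texture   = torch.nn.Parameter(torch.zeros(NUM_TEXELS, PARAM_DIM))  # k_t

def parameter_maps(probs: torch.Tensor, texel_ids: torch.Tensor) -> torch.Tensor:
    """probs: (N, Q) material probabilities from the MLP; texel_ids: (N,) UV-map indices."""
    base = probs @ material_vectors             # k_m_hat(x) = sum_i P_i(x) * m_i
    return base + detail_texture[texel_ids]     # k_m(x) = k_m_hat(x) + k_t(x)

probs = torch.softmax(torch.rand(4096, Q), dim=-1)
maps = parameter_maps(probs, torch.randint(0, NUM_TEXELS, (4096,)))
l1_penalty = detail_texture.abs().mean()        # keeps low-frequency content out of k_t
```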
(4) When generating maps with the network, it is necessary to prevent the network from producing illegal values: when a map contains illegal or extreme values, rendering errors, vanishing gradients, or exploding gradients may occur. For this purpose, the invention adds a numerical loss term L_value that keeps the parameter maps generated by the network within a reasonable value range:
L_value = Φ(k_d) + Φ(k_s) + Φ(k_t) + Φ(k_e) + Φ(k_g) + Φ(k_w)
where Φ(·) is a constraint on the values of each learned parameter map, intended to limit them to a reasonable range before rendering, ε is a very small value, set to 0.001 in the invention, to prevent extreme values, k_d is the diffuse reflection map, k_s the specular reflection map, k_t the texture detail (likewise an 11-dimensional parameter vector comprising diffuse reflection, specular reflection, self-luminescence and window parameters), k_e the self-luminescence map, k_g the ambient light map, and k_w the window map.
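The exact form of Φ is not reproduced in this text; the sketch below assumes a simple hinge-style range penalty with ε = 0.001, which matches the stated purpose of keeping every generated map within a reasonable range before rendering.

```python
# Sketch: an assumed hinge-style range penalty Phi applied to each generated
# parameter map so that no illegal or extreme values reach the renderer.
import torch

EPS = 1e-3  # the epsilon = 0.001 mentioned in the text

def range_penalty(k: torch.Tensor, low: float = 0.0, high: float = 1.0) -> torch.Tensor:
    """Penalize values of a parameter map that fall outside [low + eps, high - eps]."""
    below = torch.clamp(low + EPS - k, min=0.0)
    above = torch.clamp(k - (high - EPS), min=0.0)
    return (below + above).mean()

def value_loss(maps: dict[str, torch.Tensor]) -> torch.Tensor:
    # maps holds k_d, k_s, k_t, k_e, k_g, k_w; the assumed valid range is [0, 1] per map.
    return sum(range_penalty(k) for k in maps.values())
```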
3) Implementation of Renderer MILO-Renderer
(1) Based on the Cook-Torrance model, rendering is performed with the six parameter maps generated by MILO-Net, according to the following formulas:
E(ω, x) = k_e(x) + k_w(x) · k_g(ω)
F(ω_i, ω_r, x) = f_r(ω_i, ω_r, x, k_d, k_s, k_a)
f_r(ω_i, ω_r, x, k_d, k_s, k_a) = k_d(x)/π + k_s(x) · D · F̂ · G / (4 (N·ω_i)(N·ω_r))
H = (ω_i + ω_r) / ‖ω_i + ω_r‖
L(ω_r, x) ≈ E(ω_r, x) + (1/S) Σ_{s=1}^{S} Σ_{b=1}^{B} ( Π_{t=1}^{b} F(ω_{t,i}, ω_{t,r}, x_t) cos θ_t ) E(ω_{b,i}, x_b)
where E(ω, x) represents the self-luminous radiance at the point where the ray starting from x in direction ω intersects an object in the scene, ω is the direction of the radiated ray and x its starting point, k_e is the self-luminescence map, k_g the ambient light map, and k_w the window map; F(ω_i, ω_r, x) represents the BRDF reflection model converting, at x, incident light from direction ω_i into outgoing light in direction ω_r, ω_r is the direction of the outgoing ray, ω_i the direction of the incident ray, f_r the BRDF reflection function, k_d the diffuse reflection map, k_s the specular reflection map, k_a the roughness map, and N the surface normal direction; D is the normal distribution function mentioned in the Cook-Torrance reflection model, describing the normal distribution of the microfacets; F̂ is the Fresnel reflection function mentioned in the Cook-Torrance reflection model, describing the reflectivity and refractive index of light passing through the object surface; G is the geometric shadowing function mentioned in the Cook-Torrance reflection model, describing the degree of self-occlusion of the object surface; H is the direction bisecting the incident and outgoing rays; L(ω, x) represents the radiance of the ray starting from point x in direction ω, S is the number of samples, B the maximum number of reflections of a light path, s the s-th sample, b the b-th reflection of the light path, ω_{b,r} the outgoing direction of the b-th reflection, ω_{b,i} the incident direction of the b-th reflection, x_b the intersection of the b-th reflection of the light path with an object, and θ the angle between the incident ray and the surface normal.
The independent variables are the six parameter maps, and the dependent variable is the rendered radiance; the gradients with respect to these parameter maps are computed by backpropagation.
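The sketch below evaluates a Cook-Torrance style BRDF of the form k_d/π + k_s·D·F̂·G/(4(N·ω_i)(N·ω_r)) at a single surface point; the specific GGX, Schlick and Smith variants chosen for D, F̂ and G are assumptions, since the text only names them as the normal-distribution, Fresnel and geometric-shadowing terms.

```python
# Sketch: Cook-Torrance style BRDF f_r = k_d/pi + k_s * D*F*G / (4 (N.w_i)(N.w_r)),
# with assumed GGX (D), Schlick (F) and Smith (G) variants.
import numpy as np

def cook_torrance_brdf(w_i, w_r, n, k_d, k_s, roughness):
    h = w_i + w_r
    h /= np.linalg.norm(h)                          # half vector H
    n_wi, n_wr, n_h = np.dot(n, w_i), np.dot(n, w_r), np.dot(n, h)
    a2 = roughness ** 4                             # GGX alpha^2 with alpha = roughness^2

    D = a2 / (np.pi * (n_h ** 2 * (a2 - 1.0) + 1.0) ** 2)          # normal distribution
    F = k_s + (1.0 - k_s) * (1.0 - np.dot(h, w_i)) ** 5            # Schlick Fresnel
    k = (roughness + 1.0) ** 2 / 8.0
    G = (n_wi / (n_wi * (1 - k) + k)) * (n_wr / (n_wr * (1 - k) + k))  # Smith shadowing

    specular = D * F * G / (4.0 * n_wi * n_wr + 1e-6)
    return k_d / np.pi + specular

n = np.array([0.0, 0.0, 1.0])
w_i = np.array([0.3, 0.0, 0.954]); w_i /= np.linalg.norm(w_i)
w_r = np.array([-0.3, 0.0, 0.954]); w_r /= np.linalg.norm(w_r)
f = cook_torrance_brdf(w_i, w_r, n,
                       k_d=np.array([0.6, 0.5, 0.4]),
                       k_s=np.array([0.04, 0.04, 0.04]),
                       roughness=0.3)
```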
(2) With the MILO-Renderer, the invention renders the radiance of a given ray and compares it with the true radiance value observed in the photograph using the following loss function:
L_render = Σ_{j=1}^{M} ‖ L(ω_j, x_j) − L̂(ω_j, x_j) ‖_k
where ω_j is the direction of the j-th ray, x_j is the starting point of the ray, L(ω_j, x_j) is the finally computed radiance of the j-th ray starting from x_j, L̂(ω_j, x_j) is the true radiance observed for that ray, k is the order of the norm, and M is the total number of observed rays. The choice of k affects the correctness of the result and the magnitude of the noise: with a larger k the light source recovery can converge to the correct value, but because higher-order norms are sensitive to noise and to Monte Carlo sampling noise, the result often looks rough and blurry.
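A minimal sketch of this supervision loss is given below, with the norm order k exposed as a parameter to reflect the trade-off just described; the batch shapes and random tensors are placeholders.

```python
# Sketch: order-k norm between rendered radiance and the radiance observed in the
# HDR photographs, averaged over the M supervised rays.
import torch

def render_loss(rendered: torch.Tensor, observed: torch.Tensor, k: float = 2.0) -> torch.Tensor:
    """rendered, observed: (M, 3) per-ray RGB radiance values."""
    return (rendered - observed).abs().pow(k).sum(dim=-1).pow(1.0 / k).mean()

rendered = torch.rand(1024, 3, requires_grad=True)   # stand-in for MILO-Renderer output
observed = torch.rand(1024, 3)                       # stand-in for HDR photo radiance
loss = render_loss(rendered, observed, k=2.0)
loss.backward()   # gradients flow back through the differentiable renderer into MILO-Net
```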
The invention provides an application embodiment of the method, and the system is as shown in fig. 3, and the implementation process is as follows:
a) An Intel RealSense D455 depth camera and Dot3D Pro three-dimensional scanning software are used to acquire the model of the indoor scene, and a RICOH THETA Z1 360-degree panoramic camera is used for photographing.
b) During photographing, a multi-exposure method is used to acquire an HDR image: 7 exposure times forming a geometric sequence with common ratio 2 are selected according to the brightness of different scenes, and the HDR pictures are combined with the Debevec method.
c) Several Tag36h11 AprilTags are placed in the scene and identified in the scanned geometry and in the pictures respectively, to align the pictures captured by the panoramic camera with the geometry captured by the depth camera and to assist registration of the camera pose during three-dimensional scanning. 40 camera pose points are selected for training the network.
d) The geometric information is passed as input into MILO-Net to generate six parameter maps; the geometry is then passed, together with the camera pose, into the MILO-Renderer to generate rendered photos, and the captured photos are used as supervision to optimize the parameters of MILO-Net. Finally, high-quality parameter maps representing the material and illumination information of the real world are generated.
e) For virtual object insertion, a luminous sphere, a specular sphere and an ordinary white sphere are inserted into the scene to check the effect of the virtual object insertion task.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: it is to be understood that modifications may be made to the technical solutions described in the foregoing embodiments, or equivalents may be substituted for some of the technical features thereof, but such modifications or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for modeling scene illumination and reflection based on differentiable rendering, characterized by comprising the following steps:
s1, training data acquisition: testing by using a synthetic data set and a real data set, selecting a PBRS data set and an AI2-THOR data set by the synthetic data, converting a specific scene in the PBRS data set and the AI2-THOR data set into an input format of a guidable renderer based on a Monte Carlo ray tracing algorithm, and performing physical rendering again to obtain a new scene as synthetic data; the real data comprises scanning geometry collected by a depth camera and photos collected by a panoramic camera;
s2, data tuning alignment: combining the photos collected in the real data into HDR photos by a Debevec method, placing a plurality of labels Tag36h11 of an AprilTag visual reference system in a scene, respectively identifying the labels in scanning geometry and shooting photos, wherein each photo at least comprises 4 labels, aligning the photos collected by a panoramic camera with the geometry collected by a depth camera, and assisting the registration of the pose of the camera during three-dimensional scanning;
s3, generating material illumination parameters: inputting the geometric information into an indoor scene inverse rendering model with multiple luminophors for reflection, and generating six parameter maps, wherein the six parameter maps are respectively as follows: diffuse reflectance map, specular reflectance map, self-luminous map, ambient light map, window map, and roughness map; firstly, carrying out UV expansion on a geometric figure obtained by a depth camera by using an indoor scene inverse rendering model with illuminant multi-reflection, calculating XYZ coordinates of each pixel on a UV map in a three-dimensional world, and using the XYZ coordinates as input of an MLP network to generate six parameter maps; when the network is used to generate the map, a numerical loss item is added
Figure FDA0004083661200000011
S4, picture rendering: for each pixel in the photo, passing the camera pose into the differentiable renderer based on the Monte Carlo ray tracing algorithm to generate a rendered photo; the differentiable renderer based on the Monte Carlo ray tracing algorithm converts pixels into the radiance observed along each ray direction using the camera's intrinsic and extrinsic parameter matrices and its response curve, and the Cook-Torrance reflection model is approximately solved with the Monte Carlo ray tracing algorithm, the rendering equation being:
L(ω_r, x) = E(ω_r, x) + ∫_Ω F(ω_i, ω_r, x) L(ω_i, y) cos θ dω_i
where ω_r is the direction of the reflected ray, ω_i is the direction of the incident ray, x is the intersection of the incident ray with the scene surface, y is the point where the light source of the incident ray is located, L(ω_i, y) is the radiance emitted by the light source at point y in direction ω_i, E is the light source model, F is the bidirectional reflectance distribution function model, θ is the angle between the incident ray and the surface normal, and Ω is the upper hemisphere over which the integral is taken;
s5, supervised learning: the radiance of a given ray is rendered with a conductible renderer based on the monte carlo ray-tracing algorithm and compared to the true radiance observed in the photograph using the following loss function:
Figure FDA0004083661200000021
wherein, ω is j Is the direction of the j ray, x j For the starting point of light, L (omega) j ,x j ) Is from x j Starting from the final calculated radiation value of the jth ray,
Figure FDA0004083661200000022
for the true radiation value of each radiation ray, k is the order of the paradigm, and M is the total number of observed radiation rays.
2. The method of claim 1, wherein in step S1, when using the synthetic data set, multiple virtual cameras are placed into each scene and the images used as ground truth are rendered using the differentiable renderer based on the Monte Carlo ray tracing algorithm.
3. The method of claim 1, wherein in step S1, a depth camera and three-dimensional scanning software are used to acquire the geometric model of the indoor scene, a 360-degree panoramic camera is used to capture the photos, a multi-exposure method is used to acquire HDR images during photographing, and 7 exposure times forming a geometric sequence with common ratio 2 are selected according to the brightness of different scenes.
4. The method of claim 1, wherein step S2 selects 20 to 40 camera pose points for training the indoor scene inverse rendering model with emitter multi-reflection.
5. The method of claim 1, wherein in step S3, when the indoor scene inverse rendering model with emitter multi-reflection performs UV unfolding on the geometry, a positional encoding γ(x) is added that converts the coordinates of points in the scene into encoding vectors, which replace the point coordinates as the input of the neural network; x represents the coordinates of the intersection of the incident ray with the scene surface.
6. The method of claim 1, wherein the MLP network in step S3 adopts a four-layer fully-connected MLP network structure, wherein the number of neurons in the input layer is 15; the MLP network comprises two hidden layers, each hidden layer is provided with 1024 neurons, the output end of each neuron is added with a ReLU activation function, and batch normalization is used between the layers; the output of the network comprises Q neurons, the output numerical value is changed into probability distribution through a SoftMax function, and Q represents the number of different material types.
7. The method of claim 1, wherein in step S3, six parameter maps are calculated according to the following formula:
k̂_m(x) = Σ_{i=1}^{Q} P_i(x) · m_i
where k̂_m(x) is the 11-dimensional parameter vector before adding texture details, comprising the diffuse reflection parameters, specular reflection parameters, self-luminescence parameters and window parameters, P_i(x) is the probability that the current point belongs to the i-th class of material, m_i is the parameter vector of the i-th material among the predetermined material types, Q is the number of different material types, and x represents the coordinates of the intersection of the incident ray with the scene surface.
8. The method of claim 7, wherein a detail texture map is added to the parameter map, and the formula is as follows:
k_m(x) = k̂_m(x) + k_t(x)
where k_m is the 11-dimensional parameter vector after adding texture details, comprising the diffuse reflection parameters, specular reflection parameters, self-luminescence parameters and window parameters, k̂_m is the 11-dimensional parameter vector before adding texture details, and k_t is the texture detail, likewise an 11-dimensional parameter vector; an l1 regularization constraint is added to k_t.
9. The method of claim 1, wherein in step S3, the numerical loss term L_value is given by:
L_value = Φ(k_d) + Φ(k_s) + Φ(k_t) + Φ(k_e) + Φ(k_g) + Φ(k_w)
where Φ(·) is a constraint on the values of each learned parameter map, intended to limit them to a reasonable range before rendering, ε = 0.001, k_d is the diffuse reflection map, k_s the specular reflection map, k_t the texture detail (likewise an 11-dimensional parameter vector comprising diffuse reflection, specular reflection, self-luminescence and window parameters), k_e the self-luminescence map, k_g the ambient light map, k_w the window map, and x represents the coordinates of the intersection of the incident ray with the scene surface.
10. The method of claim 1, wherein in step S4, based on the Cook-Torrance model, the differentiable renderer based on the Monte Carlo ray tracing algorithm performs rendering with the six parameter maps generated by the indoor scene inverse rendering model with emitter multi-reflection, according to the following formulas:
E(ω, x) = k_e(x) + k_w(x) · k_g(ω)
F(ω_i, ω_r, x) = f_r(ω_i, ω_r, x, k_d, k_s, k_a)
f_r(ω_i, ω_r, x, k_d, k_s, k_a) = k_d(x)/π + k_s(x) · D · F̂ · G / (4 (N·ω_i)(N·ω_r))
H = (ω_i + ω_r) / ‖ω_i + ω_r‖
L(ω_r, x) ≈ E(ω_r, x) + (1/S) Σ_{s=1}^{S} Σ_{b=1}^{B} ( Π_{t=1}^{b} F(ω_{t,i}, ω_{t,r}, x_t) cos θ_t ) E(ω_{b,i}, x_b)
where E(ω, x) represents the self-luminous radiance at the point where the ray starting from x in direction ω intersects an object in the scene, ω is the direction of the radiated ray and x its starting point, k_e is the self-luminescence map, k_g the ambient light map, and k_w the window map; F(ω_i, ω_r, x) represents the BRDF reflection model converting, at x, incident light from direction ω_i into outgoing light in direction ω_r, ω_r is the direction of the outgoing ray, ω_i the direction of the incident ray, f_r the BRDF reflection function, k_d the diffuse reflection map, k_s the specular reflection map, k_a the roughness map, and N the surface normal direction; D is the normal distribution function describing the normal distribution of the microfacet model, F̂ is the Fresnel reflection function describing the reflectivity and refractive index of light passing through the object surface, G is the geometric shadowing function describing the degree of self-occlusion of the object surface, and H is the direction bisecting the incident and outgoing rays; L(ω, x) represents the radiance of the ray starting from point x in direction ω, S is the number of samples, B the maximum number of reflections of a light path, s the s-th sample, b the b-th reflection of the light path, ω_{b,r} the outgoing direction of the b-th reflection, ω_{b,i} the incident direction of the b-th reflection, x_b the intersection of the b-th reflection of the light path with an object, and θ the angle between the incident ray and the surface normal.
CN202210712261.0A 2022-06-22 2022-06-22 Scene illumination and reflection modeling method based on differentiable rendering Active CN114972617B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210712261.0A CN114972617B (en) 2022-06-22 2022-06-22 Scene illumination and reflection modeling method based on differentiable rendering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210712261.0A CN114972617B (en) 2022-06-22 2022-06-22 Scene illumination and reflection modeling method based on differentiable rendering

Publications (2)

Publication Number Publication Date
CN114972617A CN114972617A (en) 2022-08-30
CN114972617B true CN114972617B (en) 2023-04-07

Family

ID=82964683

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210712261.0A Active CN114972617B (en) 2022-06-22 2022-06-22 Scene illumination and reflection modeling method based on differentiable rendering

Country Status (1)

Country Link
CN (1) CN114972617B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116416364B (en) * 2022-10-25 2023-11-03 北京大学 Data acquisition and estimation method and device for urban scene space variable environment illumination
CN115731336B (en) * 2023-01-06 2023-05-16 粤港澳大湾区数字经济研究院(福田) Image rendering method, image rendering model generation method and related devices
CN116385612B (en) * 2023-03-16 2024-02-20 如你所视(北京)科技有限公司 Global illumination representation method and device under indoor scene and storage medium
CN116109520B (en) * 2023-04-06 2023-07-04 南京信息工程大学 Depth image optimization method based on ray tracing algorithm
CN117422815A (en) * 2023-12-19 2024-01-19 北京渲光科技有限公司 Reverse rendering method and system based on nerve radiation field
CN117437345B (en) * 2023-12-22 2024-03-19 山东捷瑞数字科技股份有限公司 Method and system for realizing rendering texture specular reflection effect based on three-dimensional engine
CN117953137A (en) * 2024-03-27 2024-04-30 哈尔滨工业大学(威海) Human body re-illumination method based on dynamic surface reflection field

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113223132A (en) * 2021-04-21 2021-08-06 浙江大学 Indoor scene virtual roaming method based on reflection decomposition

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108986195B (en) * 2018-06-26 2023-02-28 东南大学 Single-lens mixed reality implementation method combining environment mapping and global illumination rendering
CN113572962B (en) * 2021-07-28 2022-03-18 北京大学 Outdoor natural scene illumination estimation method and device

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113223132A (en) * 2021-04-21 2021-08-06 浙江大学 Indoor scene virtual roaming method based on reflection decomposition

Also Published As

Publication number Publication date
CN114972617A (en) 2022-08-30

Similar Documents

Publication Publication Date Title
CN114972617B (en) Scene illumination and reflection modeling method based on differentiable rendering
US20180012411A1 (en) Augmented Reality Methods and Devices
Mandl et al. Learning lightprobes for mixed reality illumination
WO2022165809A1 (en) Method and apparatus for training deep learning model
CN101422035B (en) Light source estimation device, light source estimation system, light source estimation method, device having increased image resolution, and method for increasing image resolution
CN111783525A (en) Aerial photographic image target sample generation method based on style migration
CN113572962B (en) Outdoor natural scene illumination estimation method and device
US11797863B2 (en) Systems and methods for synthesizing data for training statistical models on different imaging modalities including polarized images
CN109815847B (en) Visual SLAM method based on semantic constraint
Geng et al. Using deep learning in infrared images to enable human gesture recognition for autonomous vehicles
US20220335682A1 (en) Generating physically-based material maps
CN112365604A (en) AR equipment depth of field information application method based on semantic segmentation and SLAM
KR20220117324A (en) Learning from various portraits
Zhu et al. Spatially-varying outdoor lighting estimation from intrinsics
Karakottas et al. 360 surface regression with a hyper-sphere loss
Condorelli et al. A comparison between 3D reconstruction using nerf neural networks and mvs algorithms on cultural heritage images
Yu et al. Hierarchical disentangled representation learning for outdoor illumination estimation and editing
CN115115776A (en) Single image three-dimensional human body reconstruction method and device based on shadow
KR102291162B1 (en) Apparatus and method for generating virtual data for artificial intelligence learning
Mittal Neural Radiance Fields: Past, Present, and Future
CN116958396A (en) Image relighting method and device and readable storage medium
CN108447085B (en) Human face visual appearance recovery method based on consumption-level RGB-D camera
CN114491694A (en) Spatial target data set construction method based on illusion engine
Sneha et al. A Neural Radiance Field-Based Architecture for Intelligent Multilayered View Synthesis
Bi et al. SIR-Net: Self-Supervised Transfer for Inverse Rendering via Deep Feature Fusion and Transformation From a Single Image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant