CN114972617B - Scene illumination and reflection modeling method based on differentiable rendering - Google Patents

Scene illumination and reflection modeling method based on differentiable rendering

Info

Publication number
CN114972617B
CN114972617B (application CN202210712261.0A)
Authority
CN
China
Prior art keywords
reflection
parameters
scene
ray
light
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210712261.0A
Other languages
Chinese (zh)
Other versions
CN114972617A (en)
Inventor
施柏鑫
于博涵
杨思祺
崔轩宁
董思言
陈宝权
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University filed Critical Peking University
Priority to CN202210712261.0A priority Critical patent/CN114972617B/en
Publication of CN114972617A publication Critical patent/CN114972617A/en
Application granted granted Critical
Publication of CN114972617B publication Critical patent/CN114972617B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/50Lighting effects
    • G06T15/506Illumination models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/06Ray-tracing
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02BCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO BUILDINGS, e.g. HOUSING, HOUSE APPLIANCES OR RELATED END-USER APPLICATIONS
    • Y02B20/00Energy efficient lighting technologies, e.g. halogen lamps or gas discharge lamps
    • Y02B20/40Control techniques providing energy savings, e.g. smart controller or presence detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Image Generation (AREA)

Abstract

The invention discloses a scene illumination and reflection modeling method based on differentiable rendering. An indoor scene inverse rendering model with multiple emitters and multiple reflections is designed, geometric constraints are added, and images from new viewpoints can be generated. The model describes the common light sources in indoor scenes well, can truly separate the illumination attributes, realizes the environment relighting task well, is easy to recover with a gradient-descent method, and has a wide range of applications. Meanwhile, the invention traces light paths through multiple bounces, analyzes three ambiguity problems that arise when tracing multiple bounces with differentiable ray tracing, and designs three corresponding disambiguation methods to resolve them, so the recovered results are more realistic.

Description

Scene illumination and reflection modeling method based on differentiable rendering
Technical Field
The invention relates to the technical field of computer vision, and in particular to a method for modeling scene illumination and reflection based on differentiable rendering.
Background
With the development of computer technology, computing power has steadily increased, machine learning and deep learning have advanced rapidly, and computer vision technologies are gradually being applied in many scenarios, such as face detection, image beautification and night photography in mobile phone cameras, pedestrian detection and road recognition in autonomous driving, face recognition for mobile payment and station identity checks, and simultaneous localization and mapping for robots. With the arrival of the big-data and intelligent era, more and more application scenarios need the support of computer vision, massive video and image data urgently need to be processed, and the application of computer vision is gradually expanding from two dimensions to three dimensions. Recovering the three-dimensional parameters of the real world from two-dimensional information is therefore of great significance for high-level applications such as novel view synthesis, virtual object insertion and material editing, and has attracted wide attention.
Inverse Rendering (IR) recovers the illumination and material information of the real world and renders it with the corresponding physical model, enabling many functions such as novel view synthesis, virtual object insertion and material editing. As a frontier topic in computational photography, the development of inverse rendering is extremely important for other computer vision technologies. Traditional methods approach the inverse rendering problem with intrinsic image decomposition, which decomposes a photograph into reflectance and shading maps. Recent work on inverse rendering of indoor scenes has proposed new ways of representing illumination, for example representing the incident illumination at each point with a point-light-source model, Spatially Varying Spherical Harmonics (SVSH), or Spatially Varying Spherical Gaussians (SVSG). Such methods describe the direction of incident light more accurately than traditional shading maps. These methods also use various Bidirectional Reflectance Distribution Function (BRDF) models, which describe the specular reflection characteristics of a surface better than a reflectance map, and they greatly improve the inverse rendering of complex scenes. Some other recent works use a point-light-source model to describe the light source, but they consider only direct reflection of the light source and ignore multiple-reflection effects. Meanwhile, these methods lack sufficient geometric constraints, cannot generate images from new viewpoints, cannot truly separate the light sources, cannot realize environment relighting, and therefore have certain limitations.
Among the rendering methods commonly used in computer graphics, the Monte Carlo path tracing algorithm accounts well for multiple reflections and accurately simulates light transport; it is a physically based rendering method. However, it is not easy to perform inverse rendering with the Monte Carlo path tracing algorithm, that is, to recover the light source and surface reflectance parameters from images such that the recovered parameters are as consistent as possible with the real scene and rendering them reproduces the real photographs. To realize inverse rendering based on Monte Carlo path tracing, an appropriate light source model is needed. Since Monte Carlo path tracing is a physically based rendering method, the light source model must describe physically luminous objects rather than intermediate incident-illumination representations such as SVSH, SVSG or shading maps. At the same time, the light source model needs to be directly visible and easy to solve with gradient-descent optimization. Existing light source models, such as point lights, directional lights, spotlights and global environment maps, do not meet these requirements. In addition, existing methods usually consider only a single reflection of the light path, so complex light phenomena cannot be recovered; if multiple reflections are considered, variables describing emission and reflection must be introduced at every point of the geometry, whether or not the point is visible from the current viewpoint, so the number of unknowns to be optimized increases greatly. Too many unknowns lead to ambiguity when solving the inverse rendering problem, and the recovered results are of low quality and not stable enough.
Disclosure of Invention
In view of these technical problems, the invention provides a method for modeling scene illumination and reflection based on differentiable rendering.
In order to achieve the above purpose, the invention provides the following technical scheme:
a method for modeling scene illumination and reflection based on conductive rendering comprises the following steps:
s1, training data acquisition: testing by using a synthetic data set and a real data set, selecting a PBRS data set and an AI2-THOR data set by the synthetic data, converting a specific scene in the PBRS data set and the AI2-THOR data set into an input format of a renderable device based on a Monte Carlo ray tracing algorithm, and performing physical rendering again to obtain a new scene as synthetic data; the real data comprises scanning geometry collected by a depth camera and photos collected by a panoramic camera;
s2, data tuning alignment: combining the photos collected in the real data into HDR photos by a Debevec method, placing a plurality of labels Tag36h11 of an AprilTag visual reference system in a scene, respectively identifying the labels in scanning geometry and shooting photos, wherein each photo at least comprises 4 labels, aligning the photos collected by a panoramic camera with the geometry collected by a depth camera, and assisting the registration of the pose of the camera during three-dimensional scanning;
s3, material illumination parameter generation: inputting the geometric information into an indoor scene inverse rendering model with multiple luminophors for reflection, and generating six parameter maps, wherein the six parameter maps are respectively as follows: diffuse reflectance map, specular reflectance map, self-luminous map, ambient light map, window map, and roughness map; firstly, carrying out UV expansion on a geometric figure obtained by a depth camera by using an indoor scene reverse rendering model with multi-reflection of luminous bodies, calculating XYZ coordinates in a three-dimensional world corresponding to each pixel on a UV mapping, and generating six parameter mappings by using the XYZ coordinates as input of an MLP network; when the network is used to generate the map, a numerical loss item is added
Figure GDA0004083661230000032
S4, picture rendering: for each pixel in the photo, passing the camera pose into the differentiable renderer based on the Monte Carlo ray tracing algorithm to generate a rendered photo; the differentiable renderer based on the Monte Carlo ray tracing algorithm converts pixels into the radiance observed along each ray direction using the camera's intrinsic and extrinsic parameter matrices and its response curve, and the Cook-Torrance reflection model is approximately solved with the Monte Carlo ray tracing algorithm, the rendering equation being:
L(ω_r, x) = E(ω_r, x) + ∫_Ω F(ω_i, ω_r, x) L(ω_i, y) cos θ dω_i
where ω_r is the direction of the reflected ray, ω_i is the direction of the incident ray, x is the intersection of the incident ray with the scene surface, y is the point where the light source of the incident ray is located, L(ω_i, y) is the radiance emitted by the light source at point y in direction ω_i, E is the light source model, F is the bidirectional reflectance distribution function model, θ is the angle between the incident ray and the surface normal, and Ω is the upper hemisphere over which the integral is taken;
s5, supervised learning: the radiance of a given ray is rendered with a conductive renderer based on the monte carlo ray tracing algorithm and compared to the true radiance observed in the photograph using the following loss function:
Figure GDA0004083661230000041
wherein, ω is j Is the direction of the j ray, x j As a starting point for light, L (omega) j ,x j ) Is from x j Starting from the final calculated radiation value of the jth ray,
Figure GDA0004083661230000044
for each radiation ray's true radiation value, k is the order of the paradigm and M is the total number of observed radiation rays.
Further, in step S1, when using the synthetic data set, a plurality of virtual cameras are placed into each scene and the images used as ground truth are rendered with the differentiable renderer based on the Monte Carlo ray tracing algorithm.
Further, in step S1, a depth camera and three-dimensional scanning software are used to acquire the geometric model of the indoor scene, a 360-degree panoramic camera is used to capture the photos, a multi-exposure method is used to acquire HDR images during photographing, and 7 exposure times forming a geometric sequence with common ratio 2 are selected according to the brightness of different scenes.
Further, step S2 selects 20 to 40 camera pose points for training the indoor scene inverse rendering model with emitter multi-reflection.
Further, in step S3, when the indoor scene inverse rendering model with emitter multi-reflection performs UV unfolding on the geometry, a positional encoding γ(x) is added that converts the coordinates of points in the scene into encoding vectors, which replace the raw point coordinates as the input of the neural network; x represents the coordinates of the intersection of the incident ray with the scene surface.
Further, the MLP network in step S3 adopts a four-layer fully-connected MLP network structure, wherein the number of neurons in the input layer is 15; the MLP network comprises two hidden layers, each hidden layer is provided with 1024 neurons, the output end of each neuron is added with a ReLU activation function, and batch normalization is used between the layers; the output of the network comprises Q neurons, and the output numerical value is changed into probability distribution through a SoftMax function.
Further, in step S3, the six parameter maps are calculated according to the following formula:
k̂_m(x) = Σ_{i=1}^{Q} P_i(x) · m_i
where k̂_m(x) is the 11-dimensional parameter vector before adding texture details, comprising the diffuse reflection parameters, specular reflection parameters, self-luminescence parameters and window parameters, P_i(x) is the probability that the current point belongs to the i-th class of material, m_i is the parameter vector of the i-th material among the predetermined material types, Q is the number of different material types, and x represents the coordinates of the intersection of the incident ray with the scene surface.
Further, a detail texture map is added to the parameter map, according to the following formula:
k_m(x) = k̂_m(x) + k_t(x)
where k_m is the 11-dimensional parameter vector after adding texture details, comprising the diffuse reflection parameters, specular reflection parameters, self-luminescence parameters and window parameters, k̂_m is the 11-dimensional parameter vector before adding texture details, and k_t is the texture detail, likewise an 11-dimensional parameter vector; an l1 regularization constraint is added to k_t.
Further, in step S3, the numerical loss term L_value is given by:
L_value = Φ(k_d) + Φ(k_s) + Φ(k_t) + Φ(k_e) + Φ(k_g) + Φ(k_w)
where Φ(·) is a constraint on the values of each learned parameter map, intended to limit them to a reasonable range before rendering, ε = 0.001 is used to prevent extreme values, k_d is the diffuse reflection map, k_s the specular reflection map, k_t the texture detail (likewise an 11-dimensional parameter vector comprising diffuse reflection, specular reflection, self-luminescence and window parameters), k_e the self-luminescence map, k_g the ambient light map, k_w the window map, and x represents the coordinates of the intersection of the incident ray with the scene surface.
Further, in step S4, based on the Cook-Torrance model, the differentiable renderer MILO-Renderer performs rendering with the six parameter maps generated by the indoor scene inverse rendering model with emitter multi-reflection, according to the following formulas:
E(ω, x) = k_e(x) + k_w(x) · k_g(ω)
F(ω_i, ω_r, x) = f_r(ω_i, ω_r, x, k_d, k_s, k_a)
f_r(ω_i, ω_r, x, k_d, k_s, k_a) = k_d(x)/π + k_s(x) · D · F̂ · G / (4 (N·ω_i)(N·ω_r))
H = (ω_i + ω_r) / ‖ω_i + ω_r‖
L(ω_r, x) ≈ E(ω_r, x) + (1/S) Σ_{s=1}^{S} Σ_{b=1}^{B} ( Π_{t=1}^{b} F(ω_{t,i}, ω_{t,r}, x_t) cos θ_t ) E(ω_{b,i}, x_b)
where E(ω, x) represents the self-luminous radiance at the point where the ray starting from x in direction ω intersects an object in the scene, ω is the direction of the radiated ray and x its starting point, k_e is the self-luminescence map, k_g the ambient light map, and k_w the window map; F(ω_i, ω_r, x) represents the BRDF reflection model converting, at x, incident light from direction ω_i into outgoing light in direction ω_r, ω_r is the direction of the outgoing ray, ω_i the direction of the incident ray, f_r the BRDF reflection function, k_d the diffuse reflection map, k_s the specular reflection map, k_a the roughness map, and N the surface normal direction; D is the normal distribution function describing the normal distribution of the microfacet model, F̂ is the Fresnel reflection function describing the reflectivity and refractive index of light passing through the object surface, G is the geometric shadowing function describing the degree of self-occlusion of the object surface, and H is the direction bisecting the incident and outgoing rays; L(ω, x) represents the radiance of the ray starting from point x in direction ω, S is the number of samples, B the maximum number of reflections of a light path, s the s-th sample, b the b-th reflection of the light path, ω_{b,r} the outgoing direction of the b-th reflection, ω_{b,i} the incident direction of the b-th reflection, x_b the intersection of the b-th reflection of the light path with an object, and θ the angle between the incident ray and the surface normal.
Compared with the prior art, the invention has the beneficial effects that:
1. the invention provides a scene illumination and reflection modeling method based on differentiable rendering, which designs an indoor scene inverse rendering model with multiple emitters and multiple reflections (MILO-Net), adds geometric constraints, and can generate images from new viewpoints; the model describes common light sources in indoor scenes well, such as ceiling lamps, desk lamps and windows, can truly separate the illumination attributes, realizes the environment relighting task well, is easy to recover with a gradient-descent method, and has a wide range of applications;
2. in the scene illumination and reflection modeling method based on differentiable rendering provided by the invention, light paths are traced through multiple bounces; three ambiguity problems that arise when tracing multiple bounces with differentiable ray tracing are analyzed, and three disambiguation methods are designed to resolve them, making the recovered results more realistic; these disambiguation methods are finally integrated into the material parameter generation network MILO-Net and are mainly embodied in the design of the MILO-Net network structure;
3. the invention provides a scene illumination and reflection modeling method based on differentiable rendering, which designs a differentiable renderer (MILO-Renderer) based on the Monte Carlo ray tracing algorithm; the renderer integrates the light source model designed by the inventors and is used to supervise the training and learning of MILO-Net.
Drawings
In order to more clearly illustrate the embodiments of the present application or technical solutions in the prior art, the drawings required in the embodiments will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in the present invention, and other drawings can be obtained by those skilled in the art according to these drawings.
Fig. 1 is a flowchart of a method for modeling scene illumination and reflection based on differentiable rendering according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of MILO-Net according to an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a system according to an embodiment of the present invention.
Detailed Description
For a better understanding of the present solution, the method of the present invention is described in detail below with reference to the accompanying drawings.
The method for modeling scene illumination and reflection based on differentiable rendering provided by the embodiment of the invention follows the flow shown in fig. 1 and mainly comprises five steps: training data acquisition, data tuning and alignment, material and illumination parameter generation, picture rendering, and supervised learning, specifically as follows:
s1, training data acquisition: the test is performed using the synthetic dataset and the real dataset. Selecting a PBRS data set and an AI2-THOR data set for the synthetic data, converting a specific scene in the PBRS data set and the AI2-THOR data set into an input format of an MILO-renderer, and performing physical rendering again to obtain a new scene serving as the synthetic data; when using a composite dataset, multiple virtual cameras are placed into each scene and images used as true values are rendered using the MILO-render. The real data comprises scanning geometry collected by a depth camera and photos collected by a panoramic camera; for real data, a depth camera and three-dimensional scanning software are used for acquiring a geometric model of an indoor scene, a 360-degree panoramic camera is used for photographing a photo part, a multi-exposure method is used for acquiring HDR images during photographing, and 7 groups of exposure time with the base number of 2 and increasing geometric number series are selected according to different brightness of different scenes.
S2, data tuning and alignment: the photos collected in the real data are combined into HDR photos by the Debevec method. Meanwhile, in order to align the photos collected by the panoramic camera with the geometry collected by the depth camera, a plurality of Tag36h11 tags of the AprilTag visual reference system are placed in the scene. AprilTag is a visual fiducial system that can be used for a variety of tasks including augmented reality, robotics and camera calibration; the targets can be created with an ordinary printer, and the AprilTag detection software computes the precise 3D position, orientation and identity of each tag relative to the camera. In the experiments of the invention, the tags are used to assist the reconstruction of panoramas: they are identified in the scanned geometry and in the captured photos respectively, each photo containing at least 4 tags to ensure that the camera position is uniquely determined, so as to align the photos captured by the panoramic camera with the geometry captured by the depth camera and to assist registration of the camera pose during three-dimensional scanning. A detection sketch is given below.
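As an illustration of this tag detection step, the following minimal sketch detects Tag36h11 markers in one captured image so that the detections can be matched against the tag positions found in the scanned geometry; the use of the third-party pupil_apriltags and OpenCV packages and the file name are assumptions, not part of the patented method.

```python
# Sketch: detect Tag36h11 AprilTag markers in a captured photo so that the
# detections can later be matched against the tags found in the scanned geometry.
# Assumes the third-party `pupil_apriltags` and `opencv-python` packages.
import cv2
from pupil_apriltags import Detector

detector = Detector(families="tag36h11")

def detect_tags(image_path):
    """Return a list of (tag_id, center, corners) for every Tag36h11 marker found."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    detections = detector.detect(gray)
    return [(d.tag_id, d.center, d.corners) for d in detections]

tags = detect_tags("panorama_view_000.png")          # hypothetical file name
assert len(tags) >= 4, "each photo must contain at least 4 tags for a unique camera pose"
```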
S3, material and illumination parameter generation: six parameter maps related to material and illumination are recovered from the input two-dimensional pictures and the geometric information. To resolve the multiple-reflection ambiguity, the invention requires that the total number of unknowns to be solved does not exceed the total number of constraints; to resolve the self-luminescence/diffuse-reflection ambiguity, the invention introduces a limited-material constraint, under which a small number of groups of material parameters can easily be determined from the many groups of radiance values observed in the images; to resolve the specular reflection uncertainty, the invention applies a reflectivity similarity constraint together with the limited-material constraint. These constraints are uniformly integrated into the MILO-Net network. Specifically, the geometric information is input into MILO-Net to generate six parameter maps, namely: a diffuse reflection map, a specular reflection map, a self-luminescence map, an ambient light map, a window map and a roughness map. MILO-Net first performs UV unfolding on the geometry obtained by the depth camera, computes the XYZ coordinates in the three-dimensional world corresponding to each pixel of the UV map, and uses the XYZ coordinates as the input of an MLP network to generate the six parameter maps; when the network is used to generate the maps, a numerical loss term L_value is added.
Here, the parameter-domain variables of a parametric surface are conventionally denoted by the letters u and v, as in the parametric surface F(u, v). A triangular mesh is parameterized if it can be mapped to a parametric plane; this mapping is the UV unfolding. Each vertex then has a uv parameter value, also called its texture coordinate.
S4, picture rendering: for each pixel in the photo, the camera pose is passed into the MILO-Renderer to generate a rendered photo; the MILO-Renderer converts pixels into the radiance observed along each ray direction using the camera's intrinsic and extrinsic parameter matrices and its response curve, and the Cook-Torrance reflection model is approximately solved with the Monte Carlo ray tracing algorithm, the rendering equation being:
L(ω_r, x) = E(ω_r, x) + ∫_Ω F(ω_i, ω_r, x) L(ω_i, y) cos θ dω_i
where ω_r is the direction of the reflected ray, ω_i is the direction of the incident ray, x is the intersection of the incident ray with the scene surface, y is the point where the light source of the incident ray is located, L(ω_i, y) is the radiance emitted by the light source at point y in direction ω_i, E is the light source model, F is the bidirectional reflectance distribution function model, θ is the angle between the incident ray and the surface normal, and Ω is the upper hemisphere over which the integral is taken.
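To make the Monte Carlo approximation of this rendering equation concrete, the sketch below estimates the integral over the upper hemisphere Ω by uniform direction sampling; the callbacks E, F and trace_incident_radiance stand in for the scene-specific light source model, BRDF and ray tracer, and are placeholders rather than the MILO-Renderer itself.

```python
# Sketch: single-bounce Monte Carlo estimate of
#   L(w_r, x) = E(w_r, x) + integral_Omega F(w_i, w_r, x) L(w_i, y) cos(theta) dw_i
# with uniform hemisphere sampling (pdf = 1 / (2*pi)).
import numpy as np

def sample_hemisphere(normal, rng):
    """Uniformly sample a direction on the hemisphere around `normal`."""
    while True:
        v = rng.normal(size=3)
        v /= np.linalg.norm(v)
        if np.dot(v, normal) > 0.0:
            return v

def estimate_radiance(x, w_r, normal, E, F, trace_incident_radiance, samples=64, rng=None):
    """Monte Carlo estimate of L(w_r, x) at one surface point x with normal `normal`."""
    rng = rng or np.random.default_rng(0)
    total = np.zeros(3)
    for _ in range(samples):
        w_i = sample_hemisphere(normal, rng)
        cos_theta = np.dot(w_i, normal)
        L_i = trace_incident_radiance(x, w_i)          # radiance L(w_i, y) arriving along w_i
        # multiply by 2*pi, i.e. divide by the pdf 1/(2*pi) of uniform hemisphere sampling
        total += F(w_i, w_r, x) * L_i * cos_theta * (2.0 * np.pi)
    return E(w_r, x) + total / samples
```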
S5, supervised learning: the radiance of a given ray is obtained by rendering with the MILO-Renderer and compared with the true radiance observed in the photograph using the following loss function:
L_render = Σ_{j=1}^{M} ‖ L(ω_j, x_j) − L̂(ω_j, x_j) ‖_k
where ω_j is the direction of the j-th ray, x_j is the starting point of the ray, L(ω_j, x_j) is the finally computed radiance of the j-th ray starting from x_j, L̂(ω_j, x_j) is the true radiance observed for that ray, k is the order of the norm, and M is the total number of observed rays.
The invention uses the HDR data obtained in steps S1 and S2 to train the neural network MILO-Net and supervises the training with the pictures generated by the renderer MILO-Renderer. The specific training process is as follows:
1) Obtaining a training data set
(1) For synthetic data, the public PBRS data set and AI2-THOR data set are downloaded over the network.
(2) Two scenes meeting the requirements are selected from the PBRS data set and two scenes from the AI2-THOR data set as synthetic data; they are converted into the input format of the MILO-Renderer and physically rendered again to obtain new scenes as the synthetic data. Using this data set, multiple virtual cameras are manually placed into each scene and the images used as ground truth are rendered with the MILO-Renderer.
(3) For real data, an Intel RealSense D455 depth camera and Dot3D Pro three-dimensional scanning software are used to acquire the model of the indoor scene, and a RICOH THETA Z1 360-degree panoramic camera is used for photographing. During photographing, a multi-exposure method is used to acquire an HDR image: 7 exposure times forming a geometric sequence with common ratio 2 are selected according to the brightness of different scenes, and the HDR image is combined with the Debevec method (a merge sketch follows this list).
(4) Several Tag36h11 AprilTags are placed in the scene and identified in the scanned geometry and in the photographs respectively, to align the pictures taken by the panoramic camera with the geometry captured by the depth camera. To ensure that the camera position is uniquely determined, each photo must contain at least 4 tags; the tags are also used to assist registration of the camera pose during three-dimensional scanning. For each synthetic and real scene, 20 to 40 camera pose points are selected for training the network, depending on the complexity of the scene layout.
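The following minimal sketch illustrates the multi-exposure merge of step (3) using OpenCV's Debevec calibration and merge routines; the file names and exposure values are hypothetical, and only the doubling exposure ladder follows the text.

```python
# Sketch: merge 7 bracketed exposures (geometric sequence with common ratio 2)
# into one HDR image using OpenCV's Debevec response-curve calibration and merge.
import cv2
import numpy as np

# Hypothetical capture: 7 exposures doubling each time.
exposure_times = np.array([1/128, 1/64, 1/32, 1/16, 1/8, 1/4, 1/2], dtype=np.float32)
images = [cv2.imread(f"exposure_{i}.jpg") for i in range(7)]

calibrate = cv2.createCalibrateDebevec()
response = calibrate.process(images, exposure_times)      # recover the camera response curve

merge = cv2.createMergeDebevec()
hdr = merge.process(images, exposure_times, response)     # linear radiance map (float32)

cv2.imwrite("scene_view.hdr", hdr)                          # Radiance .hdr output
```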
2) Implementation of MILO-Net
(1) First, the geometry obtained by the depth camera is UV-unfolded, the XYZ coordinates in the three-dimensional world corresponding to each pixel of the UV map are computed, and these XYZ coordinates are used as the input of the MLP. In addition, a positional encoding γ(x) is added to enhance the ability of the MLP to fit information at different frequencies; the encoding converts point coordinates in the scene into encoding vectors, which replace the raw point coordinates as the input of the neural network.
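The text does not spell out the form of γ(x); the sketch below assumes the common sine/cosine frequency encoding, with the number of frequency bands chosen so that the encoding is 15-dimensional, matching the 15 input neurons mentioned in the next step.

```python
# Sketch: a sine/cosine positional encoding gamma(x) that maps a 3D point to a
# higher-dimensional vector so the MLP can fit high-frequency variation.
# Two frequency bands give 3 + 3*2*2 = 15 inputs; the band count is an assumption.
import torch

def positional_encoding(x: torch.Tensor, num_bands: int = 2) -> torch.Tensor:
    """x: (..., 3) point coordinates -> (..., 3 + 3*2*num_bands) encoding vector."""
    features = [x]
    for k in range(num_bands):
        features.append(torch.sin((2.0 ** k) * torch.pi * x))
        features.append(torch.cos((2.0 ** k) * torch.pi * x))
    return torch.cat(features, dim=-1)

gamma = positional_encoding(torch.rand(4096, 3))    # -> shape (4096, 15)
```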
(2) A four-layer fully-connected MLP network structure is designed, wherein the number of neurons of an input layer is 15. The network comprises two hidden layers, each hidden layer comprises 1024 neurons, a ReLU activation function is added to the output end of each neuron, and Batch Normalization (Batch Normalization) is used between the layers. The output of the network comprises Q neurons, and the output numerical value is changed into probability distribution through a SoftMax function. Through the neural network, a mapping from three-dimensional point coordinates to a probability distribution that the point belongs to a certain material is established.
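A minimal PyTorch sketch of this network is given below (15 encoded inputs, two hidden layers of 1024 units with ReLU and batch normalization, Q softmax outputs); the value of Q and the exact ordering of ReLU and batch normalization are assumptions.

```python
# Sketch: the material-classification MLP described above.
# 15 encoded inputs -> 1024 -> 1024 -> Q material probabilities.
import torch
import torch.nn as nn

Q = 8  # placeholder number of material classes

mlp = nn.Sequential(
    nn.Linear(15, 1024), nn.ReLU(), nn.BatchNorm1d(1024),
    nn.Linear(1024, 1024), nn.ReLU(), nn.BatchNorm1d(1024),
    nn.Linear(1024, Q),
    nn.Softmax(dim=-1),   # per-point probability of belonging to each material class
)

# Each row is the 15-dimensional positional encoding of one scene point.
probs = mlp(torch.rand(4096, 15))   # (4096, Q), rows sum to 1
```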
(3) The six parameter maps are calculated according to the following formula:
k̂_m(x) = Σ_{i=1}^{Q} P_i(x) · m_i
where k̂_m(x) is the 11-dimensional parameter vector before adding texture details, comprising the diffuse reflection parameters, specular reflection parameters, self-luminescence parameters and window parameters, P_i(x) is the probability that the current point belongs to the i-th class of material, m_i is the parameter vector of the i-th material among the predetermined material types, and Q is the number of different material types.
The parameter maps generated in this way are guaranteed to be composed of only Q different materials, which realizes the specular reflection similarity constraint. However, real-world textures tend to contain rich high-frequency information that is difficult for an MLP to fit. In order to make the generated maps contain these high-frequency texture details, the invention further adds a detail texture map to the parameter map, according to the following formula:
k_m(x) = k̂_m(x) + k_t(x)
where k_m is the 11-dimensional parameter vector after adding texture details, comprising the diffuse reflection parameters, specular reflection parameters, self-luminescence parameters and window parameters, k̂_m is the 11-dimensional parameter vector before adding texture details, and k_t is the texture detail, likewise an 11-dimensional parameter vector; an l1 regularization constraint is added to k_t to prevent low-frequency information in the scene from being absorbed into k_t during learning.
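The two formulas above can be combined in a few lines; the sketch below treats the per-material vectors m_i and the detail texture k_t as learnable tensors indexed through the UV map, which is an assumed implementation rather than the exact one used in the invention.

```python
# Sketch: per-point material parameters as a probability-weighted mixture of Q
# learnable material vectors, plus a learnable high-frequency detail texture k_t
# kept sparse with an l1 penalty.
import torch

Q, PARAM_DIM, NUM_TEXELS = 8, 11, 1024 * 1024   # placeholder sizes

material_vectors = torch.nn.Parameter(torch.rand(Q, PARAM_DIM))            # m_i
detail_texture   = torch.nn.Parameter(torch.zeros(NUM_TEXELS, PARAM_DIM))  # k_t

def parameter_maps(probs: torch.Tensor, texel_ids: torch.Tensor) -> torch.Tensor:
    """probs: (N, Q) material probabilities from the MLP; texel_ids: (N,) UV-map indices."""
    base = probs @ material_vectors             # k_m_hat(x) = sum_i P_i(x) * m_i
    return base + detail_texture[texel_ids]     # k_m(x) = k_m_hat(x) + k_t(x)

probs = torch.softmax(torch.rand(4096, Q), dim=-1)
maps = parameter_maps(probs, torch.randint(0, NUM_TEXELS, (4096,)))
l1_penalty = detail_texture.abs().mean()        # keeps low-frequency content out of k_t
```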
(4) When generating maps with the network, it is necessary to prevent the network from producing illegal values: when a map contains illegal or extreme values, rendering errors, vanishing gradients, or exploding gradients may occur. For this purpose, the invention adds a numerical loss term L_value that keeps the parameter maps generated by the network within a reasonable value range:
L_value = Φ(k_d) + Φ(k_s) + Φ(k_t) + Φ(k_e) + Φ(k_g) + Φ(k_w)
where Φ(·) is a constraint on the values of each learned parameter map, intended to limit them to a reasonable range before rendering, ε is a very small value, set to 0.001 in the invention, to prevent extreme values, k_d is the diffuse reflection map, k_s the specular reflection map, k_t the texture detail (likewise an 11-dimensional parameter vector comprising diffuse reflection, specular reflection, self-luminescence and window parameters), k_e the self-luminescence map, k_g the ambient light map, and k_w the window map.
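The exact form of Φ is not reproduced in this text; the sketch below assumes a simple hinge-style range penalty with ε = 0.001, which matches the stated purpose of keeping every generated map within a reasonable range before rendering.

```python
# Sketch: an assumed hinge-style range penalty Phi applied to each generated
# parameter map so that no illegal or extreme values reach the renderer.
import torch

EPS = 1e-3  # the epsilon = 0.001 mentioned in the text

def range_penalty(k: torch.Tensor, low: float = 0.0, high: float = 1.0) -> torch.Tensor:
    """Penalize values of a parameter map that fall outside [low + eps, high - eps]."""
    below = torch.clamp(low + EPS - k, min=0.0)
    above = torch.clamp(k - (high - EPS), min=0.0)
    return (below + above).mean()

def value_loss(maps: dict[str, torch.Tensor]) -> torch.Tensor:
    # maps holds k_d, k_s, k_t, k_e, k_g, k_w; the assumed valid range is [0, 1] per map.
    return sum(range_penalty(k) for k in maps.values())
```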
3) Implementation of Renderer MILO-Renderer
(1) Based on the Cook-Torrance model, rendering is performed with the six parameter maps generated by MILO-Net, according to the following formulas:
E(ω, x) = k_e(x) + k_w(x) · k_g(ω)
F(ω_i, ω_r, x) = f_r(ω_i, ω_r, x, k_d, k_s, k_a)
f_r(ω_i, ω_r, x, k_d, k_s, k_a) = k_d(x)/π + k_s(x) · D · F̂ · G / (4 (N·ω_i)(N·ω_r))
H = (ω_i + ω_r) / ‖ω_i + ω_r‖
L(ω_r, x) ≈ E(ω_r, x) + (1/S) Σ_{s=1}^{S} Σ_{b=1}^{B} ( Π_{t=1}^{b} F(ω_{t,i}, ω_{t,r}, x_t) cos θ_t ) E(ω_{b,i}, x_b)
where E(ω, x) represents the self-luminous radiance at the point where the ray starting from x in direction ω intersects an object in the scene, ω is the direction of the radiated ray and x its starting point, k_e is the self-luminescence map, k_g the ambient light map, and k_w the window map; F(ω_i, ω_r, x) represents the BRDF reflection model converting, at x, incident light from direction ω_i into outgoing light in direction ω_r, ω_r is the direction of the outgoing ray, ω_i the direction of the incident ray, f_r the BRDF reflection function, k_d the diffuse reflection map, k_s the specular reflection map, k_a the roughness map, and N the surface normal direction; D is the normal distribution function mentioned in the Cook-Torrance reflection model, describing the normal distribution of the microfacets; F̂ is the Fresnel reflection function mentioned in the Cook-Torrance reflection model, describing the reflectivity and refractive index of light passing through the object surface; G is the geometric shadowing function mentioned in the Cook-Torrance reflection model, describing the degree of self-occlusion of the object surface; H is the direction bisecting the incident and outgoing rays; L(ω, x) represents the radiance of the ray starting from point x in direction ω, S is the number of samples, B the maximum number of reflections of a light path, s the s-th sample, b the b-th reflection of the light path, ω_{b,r} the outgoing direction of the b-th reflection, ω_{b,i} the incident direction of the b-th reflection, x_b the intersection of the b-th reflection of the light path with an object, and θ the angle between the incident ray and the surface normal.
The independent variables are the six parameter maps, and the dependent variable is the rendered radiance; the gradients with respect to these parameter maps are computed by backpropagation.
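The sketch below evaluates a Cook-Torrance style BRDF of the form k_d/π + k_s·D·F̂·G/(4(N·ω_i)(N·ω_r)) at a single surface point; the specific GGX, Schlick and Smith variants chosen for D, F̂ and G are assumptions, since the text only names them as the normal-distribution, Fresnel and geometric-shadowing terms.

```python
# Sketch: Cook-Torrance style BRDF f_r = k_d/pi + k_s * D*F*G / (4 (N.w_i)(N.w_r)),
# with assumed GGX (D), Schlick (F) and Smith (G) variants.
import numpy as np

def cook_torrance_brdf(w_i, w_r, n, k_d, k_s, roughness):
    h = w_i + w_r
    h /= np.linalg.norm(h)                          # half vector H
    n_wi, n_wr, n_h = np.dot(n, w_i), np.dot(n, w_r), np.dot(n, h)
    a2 = roughness ** 4                             # GGX alpha^2 with alpha = roughness^2

    D = a2 / (np.pi * (n_h ** 2 * (a2 - 1.0) + 1.0) ** 2)          # normal distribution
    F = k_s + (1.0 - k_s) * (1.0 - np.dot(h, w_i)) ** 5            # Schlick Fresnel
    k = (roughness + 1.0) ** 2 / 8.0
    G = (n_wi / (n_wi * (1 - k) + k)) * (n_wr / (n_wr * (1 - k) + k))  # Smith shadowing

    specular = D * F * G / (4.0 * n_wi * n_wr + 1e-6)
    return k_d / np.pi + specular

n = np.array([0.0, 0.0, 1.0])
w_i = np.array([0.3, 0.0, 0.954]); w_i /= np.linalg.norm(w_i)
w_r = np.array([-0.3, 0.0, 0.954]); w_r /= np.linalg.norm(w_r)
f = cook_torrance_brdf(w_i, w_r, n,
                       k_d=np.array([0.6, 0.5, 0.4]),
                       k_s=np.array([0.04, 0.04, 0.04]),
                       roughness=0.3)
```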
(2) With the MILO-Renderer, the invention renders the radiance of a given ray and compares it with the true radiance value observed in the photograph using the following loss function:
L_render = Σ_{j=1}^{M} ‖ L(ω_j, x_j) − L̂(ω_j, x_j) ‖_k
where ω_j is the direction of the j-th ray, x_j is the starting point of the ray, L(ω_j, x_j) is the finally computed radiance of the j-th ray starting from x_j, L̂(ω_j, x_j) is the true radiance observed for that ray, k is the order of the norm, and M is the total number of observed rays. The choice of k affects the correctness of the result and the magnitude of the noise: with a larger k the light source recovery can converge to the correct value, but because higher-order norms are sensitive to noise and to Monte Carlo sampling noise, the result often looks rough and blurry.
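A minimal sketch of this supervision loss is given below, with the norm order k exposed as a parameter to reflect the trade-off just described; the batch shapes and random tensors are placeholders.

```python
# Sketch: order-k norm between rendered radiance and the radiance observed in the
# HDR photographs, averaged over the M supervised rays.
import torch

def render_loss(rendered: torch.Tensor, observed: torch.Tensor, k: float = 2.0) -> torch.Tensor:
    """rendered, observed: (M, 3) per-ray RGB radiance values."""
    return (rendered - observed).abs().pow(k).sum(dim=-1).pow(1.0 / k).mean()

rendered = torch.rand(1024, 3, requires_grad=True)   # stand-in for MILO-Renderer output
observed = torch.rand(1024, 3)                       # stand-in for HDR photo radiance
loss = render_loss(rendered, observed, k=2.0)
loss.backward()   # gradients flow back through the differentiable renderer into MILO-Net
```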
The invention provides an application embodiment of the method, and the system is as shown in fig. 3, and the implementation process is as follows:
a) An Intel RealSense D455 depth camera and Dot3D Pro three-dimensional scanning software are used to acquire the model of the indoor scene, and a RICOH THETA Z1 360-degree panoramic camera is used for photographing.
b) During photographing, a multi-exposure method is used to acquire an HDR image: 7 exposure times forming a geometric sequence with common ratio 2 are selected according to the brightness of different scenes, and the HDR pictures are combined with the Debevec method.
c) Several Tag36h11 AprilTags are placed in the scene and identified in the scanned geometry and in the pictures respectively, to align the pictures captured by the panoramic camera with the geometry captured by the depth camera and to assist registration of the camera pose during three-dimensional scanning. 40 camera pose points are selected for training the network.
d) The geometric information is passed as input into MILO-Net to generate six parameter maps; the geometry is then passed, together with the camera pose, into the MILO-Renderer to generate rendered photos, and the captured photos are used as supervision to optimize the parameters of MILO-Net. Finally, high-quality parameter maps representing the material and illumination information of the real world are generated.
e) For virtual object insertion, a luminous sphere, a specular sphere and an ordinary white sphere are inserted into the scene to check the effect of the virtual object insertion task.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: it is to be understood that modifications may be made to the technical solutions described in the foregoing embodiments, or equivalents may be substituted for some of the technical features thereof, but such modifications or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for modeling scene illumination and reflection based on differentiable rendering, characterized by comprising the following steps:
s1, training data acquisition: testing by using a synthetic data set and a real data set, selecting a PBRS data set and an AI2-THOR data set by the synthetic data, converting a specific scene in the PBRS data set and the AI2-THOR data set into an input format of a guidable renderer based on a Monte Carlo ray tracing algorithm, and performing physical rendering again to obtain a new scene as synthetic data; the real data comprises scanning geometry collected by a depth camera and photos collected by a panoramic camera;
s2, data tuning alignment: combining the photos collected in the real data into HDR photos by a Debevec method, placing a plurality of labels Tag36h11 of an AprilTag visual reference system in a scene, respectively identifying the labels in scanning geometry and shooting photos, wherein each photo at least comprises 4 labels, aligning the photos collected by a panoramic camera with the geometry collected by a depth camera, and assisting the registration of the pose of the camera during three-dimensional scanning;
s3, generating material illumination parameters: inputting the geometric information into an indoor scene inverse rendering model with multiple luminophors for reflection, and generating six parameter maps, wherein the six parameter maps are respectively as follows: diffuse reflectance map, specular reflectance map, self-luminous map, ambient light map, window map, and roughness map; firstly, carrying out UV expansion on a geometric figure obtained by a depth camera by using an indoor scene inverse rendering model with illuminant multi-reflection, calculating XYZ coordinates of each pixel on a UV map in a three-dimensional world, and using the XYZ coordinates as input of an MLP network to generate six parameter maps; when the network is used to generate the map, a numerical loss item is added
Figure FDA0004083661200000011
S4, picture rendering: for each pixel in the photo, passing the camera pose into the differentiable renderer based on the Monte Carlo ray tracing algorithm to generate a rendered photo; the differentiable renderer based on the Monte Carlo ray tracing algorithm converts pixels into the radiance observed along each ray direction using the camera's intrinsic and extrinsic parameter matrices and its response curve, and the Cook-Torrance reflection model is approximately solved with the Monte Carlo ray tracing algorithm, the rendering equation being:
L(ω_r, x) = E(ω_r, x) + ∫_Ω F(ω_i, ω_r, x) L(ω_i, y) cos θ dω_i
where ω_r is the direction of the reflected ray, ω_i is the direction of the incident ray, x is the intersection of the incident ray with the scene surface, y is the point where the light source of the incident ray is located, L(ω_i, y) is the radiance emitted by the light source at point y in direction ω_i, E is the light source model, F is the bidirectional reflectance distribution function model, θ is the angle between the incident ray and the surface normal, and Ω is the upper hemisphere over which the integral is taken;
s5, supervised learning: the radiance of a given ray is rendered with a conductible renderer based on the monte carlo ray-tracing algorithm and compared to the true radiance observed in the photograph using the following loss function:
Figure FDA0004083661200000021
wherein, ω is j Is the direction of the j ray, x j For the starting point of light, L (omega) j ,x j ) Is from x j Starting from the final calculated radiation value of the jth ray,
Figure FDA0004083661200000022
for the true radiation value of each radiation ray, k is the order of the paradigm, and M is the total number of observed radiation rays.
2. The method of claim 1, wherein in step S1, when using the synthetic data set, multiple virtual cameras are placed into each scene and the images used as ground truth are rendered using the differentiable renderer based on the Monte Carlo ray tracing algorithm.
3. The method of claim 1, wherein in step S1, a depth camera and three-dimensional scanning software are used to acquire the geometric model of the indoor scene, a 360-degree panoramic camera is used to capture the photos, a multi-exposure method is used to acquire HDR images during photographing, and 7 exposure times forming a geometric sequence with common ratio 2 are selected according to the brightness of different scenes.
4. The method of claim 1, wherein step S2 selects 20 to 40 camera pose points for training the indoor scene inverse rendering model with emitter multi-reflection.
5. The method of claim 1, wherein in step S3, when the indoor scene inverse rendering model with emitter multi-reflection performs UV unfolding on the geometry, a positional encoding γ(x) is added that converts the coordinates of points in the scene into encoding vectors, which replace the point coordinates as the input of the neural network; x represents the coordinates of the intersection of the incident ray with the scene surface.
6. The method of claim 1, wherein the MLP network in step S3 adopts a four-layer fully-connected MLP network structure, wherein the number of neurons in the input layer is 15; the MLP network comprises two hidden layers, each hidden layer is provided with 1024 neurons, the output end of each neuron is added with a ReLU activation function, and batch normalization is used between the layers; the output of the network comprises Q neurons, the output numerical value is changed into probability distribution through a SoftMax function, and Q represents the number of different material types.
7. The method of claim 1, wherein in step S3, six parameter maps are calculated according to the following formula:
k̂_m(x) = Σ_{i=1}^{Q} P_i(x) · m_i
where k̂_m(x) is the 11-dimensional parameter vector before adding texture details, comprising the diffuse reflection parameters, specular reflection parameters, self-luminescence parameters and window parameters, P_i(x) is the probability that the current point belongs to the i-th class of material, m_i is the parameter vector of the i-th material among the predetermined material types, Q is the number of different material types, and x represents the coordinates of the intersection of the incident ray with the scene surface.
8. The method of claim 7, wherein a detail texture map is added to the parameter map, and the formula is as follows:
k_m(x) = k̂_m(x) + k_t(x)
where k_m is the 11-dimensional parameter vector after adding texture details, comprising the diffuse reflection parameters, specular reflection parameters, self-luminescence parameters and window parameters, k̂_m is the 11-dimensional parameter vector before adding texture details, and k_t is the texture detail, likewise an 11-dimensional parameter vector; an l1 regularization constraint is added to k_t.
9. The method of claim 1, wherein in step S3, the numerical loss term L_value is given by:
L_value = Φ(k_d) + Φ(k_s) + Φ(k_t) + Φ(k_e) + Φ(k_g) + Φ(k_w)
where Φ(·) is a constraint on the values of each learned parameter map, intended to limit them to a reasonable range before rendering, ε = 0.001, k_d is the diffuse reflection map, k_s the specular reflection map, k_t the texture detail (likewise an 11-dimensional parameter vector comprising diffuse reflection, specular reflection, self-luminescence and window parameters), k_e the self-luminescence map, k_g the ambient light map, k_w the window map, and x represents the coordinates of the intersection of the incident ray with the scene surface.
10. The method of claim 1, wherein in step S4, based on the Cook-Torrance model, the differentiable renderer based on the Monte Carlo ray tracing algorithm performs rendering with the six parameter maps generated by the indoor scene inverse rendering model with emitter multi-reflection, according to the following formulas:
E(ω, x) = k_e(x) + k_w(x) · k_g(ω)
F(ω_i, ω_r, x) = f_r(ω_i, ω_r, x, k_d, k_s, k_a)
f_r(ω_i, ω_r, x, k_d, k_s, k_a) = k_d(x)/π + k_s(x) · D · F̂ · G / (4 (N·ω_i)(N·ω_r))
H = (ω_i + ω_r) / ‖ω_i + ω_r‖
L(ω_r, x) ≈ E(ω_r, x) + (1/S) Σ_{s=1}^{S} Σ_{b=1}^{B} ( Π_{t=1}^{b} F(ω_{t,i}, ω_{t,r}, x_t) cos θ_t ) E(ω_{b,i}, x_b)
where E(ω, x) represents the self-luminous radiance at the point where the ray starting from x in direction ω intersects an object in the scene, ω is the direction of the radiated ray and x its starting point, k_e is the self-luminescence map, k_g the ambient light map, and k_w the window map; F(ω_i, ω_r, x) represents the BRDF reflection model converting, at x, incident light from direction ω_i into outgoing light in direction ω_r, ω_r is the direction of the outgoing ray, ω_i the direction of the incident ray, f_r the BRDF reflection function, k_d the diffuse reflection map, k_s the specular reflection map, k_a the roughness map, and N the surface normal direction; D is the normal distribution function describing the normal distribution of the microfacet model, F̂ is the Fresnel reflection function describing the reflectivity and refractive index of light passing through the object surface, G is the geometric shadowing function describing the degree of self-occlusion of the object surface, and H is the direction bisecting the incident and outgoing rays; L(ω, x) represents the radiance of the ray starting from point x in direction ω, S is the number of samples, B the maximum number of reflections of a light path, s the s-th sample, b the b-th reflection of the light path, ω_{b,r} the outgoing direction of the b-th reflection, ω_{b,i} the incident direction of the b-th reflection, x_b the intersection of the b-th reflection of the light path with an object, and θ the angle between the incident ray and the surface normal.
CN202210712261.0A 2022-06-22 2022-06-22 Scene illumination and reflection modeling method based on differentiable rendering Active CN114972617B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210712261.0A CN114972617B (en) 2022-06-22 2022-06-22 Scene illumination and reflection modeling method based on differentiable rendering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210712261.0A CN114972617B (en) 2022-06-22 2022-06-22 Scene illumination and reflection modeling method based on differentiable rendering

Publications (2)

Publication Number Publication Date
CN114972617A CN114972617A (en) 2022-08-30
CN114972617B true CN114972617B (en) 2023-04-07

Family

ID=82964683

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210712261.0A Active CN114972617B (en) 2022-06-22 2022-06-22 Scene illumination and reflection modeling method based on differentiable rendering

Country Status (1)

Country Link
CN (1) CN114972617B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116416364B (en) * 2022-10-25 2023-11-03 北京大学 Data acquisition and estimation method and device for urban scene space variable environment illumination
CN115731336B (en) * 2023-01-06 2023-05-16 粤港澳大湾区数字经济研究院(福田) Image rendering method, image rendering model generation method and related devices
CN116385612B (en) * 2023-03-16 2024-02-20 如你所视(北京)科技有限公司 Global illumination representation method and device under indoor scene and storage medium
CN116109520B (en) * 2023-04-06 2023-07-04 南京信息工程大学 Depth image optimization method based on ray tracing algorithm
CN117422815A (en) * 2023-12-19 2024-01-19 北京渲光科技有限公司 Reverse rendering method and system based on nerve radiation field
CN117437345B (en) * 2023-12-22 2024-03-19 山东捷瑞数字科技股份有限公司 Method and system for realizing rendering texture specular reflection effect based on three-dimensional engine
CN117953137A (en) * 2024-03-27 2024-04-30 哈尔滨工业大学(威海) Human body re-illumination method based on dynamic surface reflection field

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113223132A (en) * 2021-04-21 2021-08-06 浙江大学 Indoor scene virtual roaming method based on reflection decomposition

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108986195B (en) * 2018-06-26 2023-02-28 东南大学 Single-lens mixed reality implementation method combining environment mapping and global illumination rendering
CN113572962B (en) * 2021-07-28 2022-03-18 北京大学 Outdoor natural scene illumination estimation method and device

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113223132A (en) * 2021-04-21 2021-08-06 浙江大学 Indoor scene virtual roaming method based on reflection decomposition

Also Published As

Publication number Publication date
CN114972617A (en) 2022-08-30

Similar Documents

Publication Publication Date Title
CN114972617B (en) Scene illumination and reflection modeling method based on differentiable rendering
US20180012411A1 (en) Augmented Reality Methods and Devices
Mandl et al. Learning lightprobes for mixed reality illumination
WO2022165809A1 (en) Method and apparatus for training deep learning model
CN101422035B (en) Light source estimation device, light source estimation system, light source estimation method, device having increased image resolution, and method for increasing image resolution
CN111783525A (en) Aerial photographic image target sample generation method based on style migration
CN113572962B (en) Outdoor natural scene illumination estimation method and device
US11797863B2 (en) Systems and methods for synthesizing data for training statistical models on different imaging modalities including polarized images
CN109815847B (en) Visual SLAM method based on semantic constraint
Geng et al. Using deep learning in infrared images to enable human gesture recognition for autonomous vehicles
US20220335682A1 (en) Generating physically-based material maps
CN112365604A (en) AR equipment depth of field information application method based on semantic segmentation and SLAM
KR20220117324A (en) Learning from various portraits
Zhu et al. Spatially-varying outdoor lighting estimation from intrinsics
Karakottas et al. 360 surface regression with a hyper-sphere loss
Condorelli et al. A comparison between 3D reconstruction using nerf neural networks and mvs algorithms on cultural heritage images
Yu et al. Hierarchical disentangled representation learning for outdoor illumination estimation and editing
CN115115776A (en) Single image three-dimensional human body reconstruction method and device based on shadow
KR102291162B1 (en) Apparatus and method for generating virtual data for artificial intelligence learning
Mittal Neural Radiance Fields: Past, Present, and Future
CN116958396A (en) Image relighting method and device and readable storage medium
CN108447085B (en) Human face visual appearance recovery method based on consumption-level RGB-D camera
CN114491694A (en) Spatial target data set construction method based on illusion engine
Sneha et al. A Neural Radiance Field-Based Architecture for Intelligent Multilayered View Synthesis
Bi et al. SIR-Net: Self-Supervised Transfer for Inverse Rendering via Deep Feature Fusion and Transformation From a Single Image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant