CN113223132A - Indoor scene virtual roaming method based on reflection decomposition - Google Patents

Indoor scene virtual roaming method based on reflection decomposition

Info

Publication number
CN113223132A
CN113223132A (application CN202110429676.2A)
Authority
CN
China
Prior art keywords: picture, reflection, pixel, depth, plane
Prior art date
Legal status
Granted
Application number
CN202110429676.2A
Other languages
Chinese (zh)
Other versions
CN113223132B (en)
Inventor
许威威
许佳敏
吴秀超
朱紫涵
鲍虎军
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202110429676.2A priority Critical patent/CN113223132B/en
Publication of CN113223132A publication Critical patent/CN113223132A/en
Application granted granted Critical
Publication of CN113223132B publication Critical patent/CN113223132B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06T15/005 General purpose rendering architectures (3D [Three Dimensional] image rendering)
    • G06N3/045 Combinations of networks (neural network architectures)
    • G06T17/20 Finite element generation, e.g. wire-frame surface description, tesselation (3D modelling)
    • G06T3/4046 Scaling of whole images or parts thereof using neural networks
    • G06T3/4053 Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T2200/08 Indexing scheme involving all processing steps from image acquisition to 3D model generation


Abstract

The invention discloses an indoor scene virtual roaming method based on reflection decomposition. First, a rough global triangular mesh model obtained by three-dimensional reconstruction is projected to form an initial depth map for each picture; the depth edges are aligned to the color edges, and the aligned depth map is converted into a simplified triangular mesh. Planes are then detected in the global triangular mesh model; if a plane is a reflection plane, a double-layer expression is constructed over the reflection area of every picture in which the plane is visible, so that the reflection effect on the object surface can be rendered correctly. Finally, given a virtual view, the virtual-view picture is drawn from the neighborhood pictures and their triangular meshes, and the reflection area is drawn from the foreground/background pictures and the foreground/background triangular meshes. The method supports virtual roaming with a large degree of freedom in large indoor scenes containing reflection effects while keeping storage requirements small. It offers good rendering quality, a large roaming freedom, the ability to draw effects such as partial reflections and highlights, and robust results.

Description

Indoor scene virtual roaming method based on reflection decomposition
Technical Field
The invention relates to the technical field of picture-based rendering and virtual viewpoint synthesis, in particular to a method for performing indoor scene virtual roaming by combining a picture-based rendering technology with reflection decomposition.
Background
The purpose of indoor scene virtual roaming is to build a system that, given the intrinsic and extrinsic parameters of a virtual camera, outputs a rendered picture from the virtual viewpoint. Existing mature virtual roaming applications are mainly based on a series of panoramic pictures, performing purely rotational roaming centered on each panorama; most systems handle movement between panoramas by simple interpolation, which produces large visual errors. For virtual roaming with large degrees of freedom, many methods can perform object-level observation or viewpoint-movement observation of part of a scene, including explicitly acquiring the light field around the target object with a light-field camera, see Gortler, Steven J., et al., "The Lumigraph," Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, 1996, or using photographs from ordinary cameras to express and interpolate the scene with a neural network, see Mildenhall, Ben, et al., "NeRF: Representing scenes as neural radiance fields for view synthesis," Proceedings of the European Conference on Computer Vision, 2020. For larger indoor scenes, the latest methods can render relatively free viewpoints, but the rendering quality is not good enough, see Riegler, Gernot, and Vladlen Koltun, "Free View Synthesis," Proceedings of the European Conference on Computer Vision, 2020. In particular, for the various types of reflection present in large indoor scenes (floors, tables, mirrors, etc.), there is still no system that handles indoor roaming with such complex materials well.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides an indoor scene virtual roaming method based on reflection decomposition, which enables virtual roaming with a large degree of freedom in large indoor scenes with reflection effects while keeping the storage requirement small.
In order to achieve the purpose, the invention adopts the following technical scheme: an indoor scene virtual roaming method based on reflection decomposition comprises the following steps:
s1: shooting a picture of a scene which is sufficiently covered in a target indoor scene, and performing three-dimensional reconstruction on the indoor scene based on the shot picture to obtain a rough global triangular mesh model of the indoor scene and the inside and outside parameters of the camera;
s2: for each picture, projecting the global triangular mesh model into a corresponding depth map, aligning the depth edge to a color edge, converting the aligned depth map into a triangular mesh, and carrying out mesh simplification on the triangular mesh;
s3: detecting a plane in the global triangular mesh model, detecting whether the plane is a reflection plane or not by utilizing the color consistency between adjacent images, and if so, constructing a double-layer expression on a reflection area for each picture which can see the reflection plane for correctly rendering the reflection effect of the surface of an object;
the double-layer expression comprises a foreground-background double-layer triangular mesh and two decomposed images of a foreground and a background, the foreground triangular mesh is used for expressing the surface geometry of an object, the background triangular mesh is used for expressing the mirror image of the scene geometry on a reflecting plane, the foreground image is used for expressing the surface texture of the object after the reflection component is removed, and the background image is used for expressing the reflection component of the scene on the surface of the object;
s4: and giving a virtual visual angle, drawing the virtual visual angle picture by using the neighborhood picture and the triangular mesh, and drawing the reflection area by using the foreground background picture and the foreground background triangular mesh.
Further, in S2, aligning the depth edge of the depth map to the color edge of the original picture, and acquiring the aligned depth map, specifically:
firstly, a normal map corresponding to the depth map is calculated; then, for each pixel i in the depth map, its depth value d_i is converted into a three-dimensional point v_i in the camera's local coordinate system using the camera intrinsics, and the planar distance between adjacent pixels i, j is computed as dt_ij = max(|(v_i − v_j)·n_i|, |(v_i − v_j)·n_j|), where n_i, n_j are the normal vectors at points i and j; if dt_ij is greater than λ·max(1, min(d_i, d_j)), the pixel is recorded as a depth edge pixel, where λ is an edge detection threshold;
for each picture, after all depth edge pixels are obtained, calculating the local two-dimensional gradient of the depth edge by utilizing Sobel convolution, and then traversing one pixel by one pixel along the direction of the edge two-dimensional gradient and the opposite direction thereof by taking each depth edge pixel as a starting point until one of two sides traverses to a color edge pixel; after traversing to the color edge pixel, deleting the depth values of all pixels of the intermediate path from the starting point pixel to the color edge pixel; and defining the pixel of the deleted depth value as a non-aligned pixel, defining the pixel of the non-deleted depth value as an aligned pixel, and carrying out interpolation filling by using the peripheral non-deleted depth value for each deleted depth value.
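For illustration, the edge test above can be sketched as follows. This is a minimal NumPy sketch and not the patented implementation: the normals are assumed to be computed elsewhere from the depth map, only right/down neighbours are checked, and the default λ = 0.01 is the value quoted later in the embodiment.

```python
import numpy as np

def unproject(depth, K):
    """Back-project a depth map (H, W) to camera-space points (H, W, 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    return np.stack([x, y, depth], axis=-1)

def depth_edge_mask(depth, normals, K, lam=0.01):
    """Mark pixels whose plane distance to a 4-neighbour exceeds lam*max(1, min depth)."""
    pts = unproject(depth, K)
    edge = np.zeros(depth.shape, dtype=bool)
    for dy, dx in [(0, 1), (1, 0)]:                 # right and down neighbours
        h, w = depth.shape
        p_i, p_j = pts[:h - dy, :w - dx], pts[dy:, dx:]
        n_i, n_j = normals[:h - dy, :w - dx], normals[dy:, dx:]
        diff = p_i - p_j
        dt = np.maximum(np.abs((diff * n_i).sum(-1)), np.abs((diff * n_j).sum(-1)))
        d_i, d_j = depth[:h - dy, :w - dx], depth[dy:, dx:]
        hit = dt > lam * np.maximum(1.0, np.minimum(d_i, d_j))
        edge[:h - dy, :w - dx] |= hit
        edge[dy:, dx:] |= hit
    return edge
```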
Further, for each deleted depth value, interpolation filling is performed using the surrounding undeleted depth values, specifically: for each unaligned pixel p_i to be interpolated, its geodesic distance d_g(p_i, p_j) to all other aligned pixels is computed, the m nearest aligned pixels are found by geodesic distance, and the interpolated depth value is computed as

d_i = Σ_{j∈N(i)} w_g(i, j)·d_{i→j} / Σ_{j∈N(i)} w_g(i, j)

where N(i) denotes the set of nearest aligned neighbor pixels of p_i, w_g(i, j) = exp(−d_g(p_i, p_j)), and d_{i→j} denotes the depth obtained by projecting pixel p_i onto the local plane of p_j, whose plane equation is computed from v_j and n_j.
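A small sketch of the geodesic-weighted filling may help. It assumes a pinhole camera with intrinsics K; the geodesic nearest-neighbour search itself is not shown, and its result is assumed to be supplied as a list of neighbour tuples.

```python
import numpy as np

def plane_depth(pixel_uv, K, v_j, n_j):
    """Depth at which the ray through pixel_uv meets the local plane (v_j, n_j)."""
    u, v = pixel_uv
    ray = np.array([(u - K[0, 2]) / K[0, 0], (v - K[1, 2]) / K[1, 1], 1.0])
    denom = float(n_j @ ray)
    if abs(denom) < 1e-8:
        return None
    return float(n_j @ v_j) / denom          # z of the intersection, since ray_z == 1

def interpolate_depth(pixel_uv, K, neighbours):
    """Geodesic-weighted interpolation over the m nearest aligned neighbours.

    `neighbours` is a list of (geodesic_distance, v_j, n_j) tuples, assumed to be
    produced elsewhere by a geodesic nearest-neighbour search.
    """
    num, den = 0.0, 0.0
    for d_g, v_j, n_j in neighbours:
        d_proj = plane_depth(pixel_uv, K, v_j, n_j)
        if d_proj is None:
            continue
        w = np.exp(-d_g)                     # w_g(i, j) = exp(-d_g(p_i, p_j))
        num += w * d_proj
        den += w
    return num / den if den > 0 else None
```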
Further, in S3, detecting a plane and a reflection plane in the global triangular mesh model specifically includes:
detecting planes in the global triangular mesh model, keeping the planes whose area is larger than an area threshold, projecting each such plane onto the pictures in which it is visible, and denoting the set of pictures in which the plane is visible as V_P;

for each picture I_k in V_P, its K-nearest-neighbor picture set N_k is computed; the K nearest neighbors are determined by the overlap rate of the vertices of the global triangular mesh model after mirroring about the plane;

using N_k, a matching cost volume is constructed, and whether the plane has a sufficient reflection component in picture I_k is judged as follows: for each pixel, after mirroring the global triangular mesh model according to the plane equation, the cost corresponding to the mirrored depth value is looked up in the matching cost volume and tested for being a local minimum point; if the number of pixels at which the cost is a local minimum is greater than a pixel-count threshold, the plane is considered to have a reflection component on that picture; and if the number of visible pictures in which a plane has a reflection component is greater than a picture-count threshold, the plane is considered a reflection plane.
Further, in S3, for each reflection plane, its two-dimensional reflection area β_k on each visible picture is calculated, specifically: the reflection plane is projected onto the visible picture to obtain a projection depth map; a dilation operation is applied to the projection depth map; the dilated projection depth map is compared with the aligned depth map to obtain an accurate two-dimensional reflection area; each pixel with a depth value in the projection depth map is screened using the three-dimensional point distance and the normal angle, and the screened pixel area is taken as the reflection area β_k of the reflection plane on the picture.
Further, in S3, for each picture that can see the reflection plane, a double-layer expression is constructed on the reflection area, specifically:
the projection depth map is taken as the initial foreground depth map; the camera intrinsic and extrinsic parameters of the picture are mirrored about the plane equation to form a virtual camera, and the initial background depth map is rendered in that virtual camera using the global triangular mesh model; the initial foreground and background depth maps are then converted into two simplified triangular meshes M_k^f and M_k^b;

the two decomposed layers, the foreground picture F_k and the background picture B_k, are computed with an iterative optimization algorithm that also further optimizes M_k^f and M_k^b; before optimization, all related original pictures are inverse-gamma-corrected in advance for the subsequent decomposition;

the goal of the optimization is to minimize an energy function of the form

E = Σ_u [ E_d(u) + λ_s·E_s(u) + λ_p·E_p(u) ]

where the optimization variables include a rigid-body transformation (R_k, t_k) of the reflection-layer (background) triangular mesh, initialized to the identity matrix and 0 respectively, and the meshes M_k^f and M_k^b, for which only the three-dimensional vertex positions are optimized without changing the topology; E_d, E_s, E_p are respectively a data term, a smoothing term and a prior term, λ_s and λ_p are the weights of the corresponding terms, and u ranges over the pixels of the reflection area β_k; the data term measures how well the sum of the foreground picture F_k and the warped background picture B_k reproduces the inverse-gamma-corrected input pictures across the neighbor views; the smoothing term uses a Laplacian matrix H; the warping function ω^{-1} returns the two-dimensional coordinates obtained by projecting a point u of image I_k′ into image I_k according to the depth values and the camera intrinsic and extrinsic parameters, the depth map being obtained by projecting M_k^b; V denotes a vertex of M_k^b;

to minimize the energy function, an alternating optimization scheme is used: in each round, the meshes and the rigid transformation are first fixed and the pictures F_k and B_k are optimized, an initial value being computed first and the optimization performed with a nonlinear conjugate gradient method; then F_k and B_k are fixed and M_k^f, M_k^b and (R_k, t_k) are optimized, again with the conjugate gradient method; one round consists of one such alternation, and two rounds of optimization are carried out in total; after the first round, the consistency constraint of the foreground pictures across multiple views is used to denoise F_k: from the F_k and B_k obtained in the first round, denoised images F̃_k and B̃_k are computed using the multi-view consistency of the foreground, and F̃_k and B̃_k are used in place of F_k and B_k to continue the second round of optimization; furthermore, a prior term with weight λ_g is added to the total energy equation in the second round to constrain the second round of optimization;

after the two rounds of optimization, M_k^b is transformed by (R_k, t_k) to obtain the final two-layer simplified triangular meshes M_k^f and M_k^b, which, together with the decomposed pictures F_k and B_k, are used for correctly rendering the reflection effect of the object surface.
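Since the per-term formulas are given in the original only as figures, the following sketch shows only the alternating two-round schedule described above. The three callables (`solve_pictures`, `solve_geometry`, `denoise_by_consistency`) are hypothetical stand-ins supplied by the caller for the nonlinear conjugate-gradient solves and the multi-view consistency filter, which are not reproduced here.

```python
def decompose_reflection_layer(pictures, fg_mesh, bg_mesh,
                               solve_pictures, solve_geometry,
                               denoise_by_consistency, rounds=2):
    """Alternating two-round schedule for the reflection decomposition.

    Expected callable signatures (assumed for this sketch):
      solve_pictures(pictures, fg_mesh, bg_mesh, prior)  -> (F, B)
      solve_geometry(pictures, F, B, fg_mesh, bg_mesh)   -> (fg_mesh, bg_mesh)
      denoise_by_consistency(F, B, fg_mesh, bg_mesh)     -> denoised (F, B) prior
    """
    F, B, prior = None, None, None
    for r in range(rounds):
        # Step 1: fix the geometry, solve the decomposed foreground/background pictures.
        F, B = solve_pictures(pictures, fg_mesh, bg_mesh, prior)
        # Step 2: fix the pictures, refine mesh vertices and the rigid transform (R, t).
        fg_mesh, bg_mesh = solve_geometry(pictures, F, B, fg_mesh, bg_mesh)
        if r == 0:
            # After round 1: denoise with multi-view consistency; the result replaces
            # F/B for round 2 and enters the extra prior term weighted by lambda_g.
            prior = denoise_by_consistency(F, B, fg_mesh, bg_mesh)
    return F, B, fg_mesh, bg_mesh
```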
Further, in S4, a neighborhood picture set is computed according to the intrinsic and extrinsic parameters of the virtual camera: the local coordinate system of the current virtual camera is divided into 8 octants by the coordinate-axis planes, and within each octant a series of neighborhood pictures is further selected; using the angle between the picture optical-center direction and the virtual-camera optical-center direction, and the distance ‖t_k − t_n‖ between the picture optical center t_k and the virtual-camera optical center t_n, each octant is subdivided into several regions; then, in each region, the one picture with the smallest similarity d_k is added to the neighborhood picture set, where d_k combines the optical-center angle with the optical-center distance weighted by a distance weight λ;
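The similarity used for picking one picture per region is given in the original as a figure; the sketch below assumes the plausible form d_k = ∠(o_k, o_n) + λ·‖t_k − t_n‖ built from the quantities named in the text, with the angle taken in degrees and λ = 0.1 as in the embodiment. Both the exact form and the angle unit are assumptions, not quotations.

```python
import numpy as np

def view_similarity(pic_dir, pic_center, cam_dir, cam_center, lam=0.1):
    """Assumed similarity d_k between a stored picture and the virtual camera:
    optical-center direction angle (degrees) plus lam times the center distance."""
    cosang = np.clip(np.dot(pic_dir, cam_dir) /
                     (np.linalg.norm(pic_dir) * np.linalg.norm(cam_dir)), -1.0, 1.0)
    angle = np.degrees(np.arccos(cosang))
    return angle + lam * np.linalg.norm(np.asarray(pic_center) - np.asarray(cam_center))
```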
after the neighborhood picture set is obtained, drawing each picture in the neighborhood picture set to a virtual viewpoint according to the corresponding simplified triangular mesh, specifically:
a) a robust depth map is computed: for each pixel in the patch-rendering pass, the rendering cost c(t_k, t_n, x) of every candidate point is calculated:
c(t_k,t_n,x)=∠(t_k−x,t_n−x)*π/180+max(0,1−‖t_n−x‖/‖t_k−x‖)
where t_k and t_n are the three-dimensional optical-center coordinates of the picture and of the virtual camera, and x is the three-dimensional point corresponding to the pixel; each pixel receives a series of rendered triangular patches, and the points here are the intersections of the pixel ray with those patches; if the rendering cost of a point is larger than the minimum rendering cost over all points in the pixel plus a range threshold λ, that point does not participate in the depth-map computation; the depths of all points that do participate are compared and the minimum is taken as the depth value of the pixel;
b) the depth map of the virtual camera is computed, and each picture is added to its triangular mesh as a texture map for drawing; for each pixel of the virtual camera picture, the colors of the points near the depth value are mixed with weights w_k to obtain the final rendered color.
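The rendering cost itself is stated explicitly above; a small sketch of that cost and of the per-pixel robust depth selection might look as follows (the range threshold λ = 0.17 is the value quoted later in the embodiment).

```python
import numpy as np

def render_cost(t_k, t_n, x):
    """c(t_k, t_n, x): angle (radians) between the rays picture->point and virtual
    camera->point, plus a penalty when the virtual camera is farther from the point."""
    a = np.asarray(t_k, dtype=float) - np.asarray(x, dtype=float)
    b = np.asarray(t_n, dtype=float) - np.asarray(x, dtype=float)
    cosang = np.clip(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)), -1.0, 1.0)
    return np.arccos(cosang) + max(0.0, 1.0 - np.linalg.norm(b) / np.linalg.norm(a))

def robust_depth(candidates, t_k, t_n, lam=0.17):
    """Pick the pixel depth among candidate surface points (depth, x) whose cost is
    within lam of the per-pixel minimum cost."""
    costs = [render_cost(t_k, t_n, x) for _, x in candidates]
    c_min = min(costs)
    return min(d for (d, _), c in zip(candidates, costs) if c <= c_min + lam)
```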
Further, in S4, the reflection areas β_k of the neighborhood pictures are also drawn to the current virtual viewpoint to obtain the reflection area β_n of the current virtual viewpoint; for pixels inside the reflection area, the two foreground/background pictures and the simplified two-layer triangular meshes are used for drawing, and the depth-map computation and color mixing are carried out separately for the two layers; because F_k and B_k were obtained by decomposition after inverse gamma correction, the two blended layer pictures are added in the rendering stage and a single gamma correction is then applied to obtain the correct picture with the reflection effect.
Further, in S4, in order to reduce the storage size, all pictures are downsampled to 1/n of the original resolution for storage, n ≥ 1, and the virtual window is set to the original size during rendering.
Further, a super-resolution neural network is trained to compensate for the loss of sharpness caused by storing downsampled pictures and to reduce possible drawing errors, specifically:
after the depth picture and the color picture have been rendered for each new virtual view, a deep neural network is used to reduce rendering errors and improve sharpness; the network takes the color picture and depth picture of the current frame plus the color picture and depth picture of the previous frame as input; first, a three-layer convolutional network extracts features from the current-frame and previous-frame depth/color pictures separately; the previous-frame features are then warp-mapped to the current frame, the initial correspondence being computed from the depth maps; an alignment module further fits a local two-dimensional offset to better align the features of the two frames; the aligned features of the two frames are concatenated and fed into a super-resolution module implemented as a U-Net convolutional neural network, which outputs a high-definition picture of the current frame.
The invention has the beneficial effects that:
1. a complete pipeline is constructed that can process a large amount of captured data and achieves virtual viewpoint roaming with a large degree of freedom for large-scale indoor scenes;
2. reflective surfaces in the indoor scene and the reflection areas in the pictures are detected, and a double-layer expression is constructed over the reflection areas, so that reflection effects can be rendered better during indoor scene roaming and the rendering realism is greatly improved;
3. by appending a dedicated super-resolution neural network, the method reduces rendering errors and lowers the picture resolution required to support roaming in a single scene, thereby reducing storage and memory consumption.
Drawings
Fig. 1 is a flowchart of an indoor scene virtual roaming method based on reflection decomposition according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a global triangular mesh model according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a two-layer expression construction result of a reflection region according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a rendering result of a virtual viewpoint with reflection according to an embodiment of the present invention;
FIG. 5 is a comparison graph of the results of whether to use a super-resolution neural network according to an embodiment of the present invention;
fig. 6 is a diagram of a super-resolution neural network structure according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the following drawings and specific embodiments, it being understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, an embodiment of the present invention provides a method for indoor scene virtual roaming based on reflection decomposition, where the method includes the following steps:
(1) Pictures that sufficiently cover the scene are captured in the target indoor scene, and the indoor scene is three-dimensionally reconstructed from the captured pictures to obtain a rough global triangular mesh model of the indoor scene and the camera intrinsic and extrinsic parameters, as shown in Fig. 2.
Specifically, three-dimensional reconstruction software such as COLMAP or RealityCapture can be used to obtain the camera intrinsic and extrinsic parameters and the global triangular mesh model.
(2) And for each picture, projecting the global triangular mesh model into a corresponding depth map, aligning the depth edge to the color edge, converting the aligned depth map into a triangular mesh, and carrying out mesh simplification on the triangular mesh.
Specifically, since the global triangular mesh model includes some errors, the depth edge of the projected depth map is aligned to the color edge of the original picture, and the aligned depth map is obtained, which specifically includes the following steps:
firstly, a normal map corresponding to the depth map is calculated; then, for each pixel i in the depth map, its depth value d_i is converted into a three-dimensional point v_i in the camera's local coordinate system using the camera intrinsics, and the planar distance between adjacent pixels i, j is computed as dt_ij = max(|(v_i − v_j)·n_i|, |(v_i − v_j)·n_j|), where n_i, n_j are the normal vectors at points i and j; if dt_ij is greater than λ·max(1, min(d_i, d_j)), the pixel is regarded as a depth edge pixel, where λ is an edge detection threshold (λ = 0.01 in this embodiment).
For each picture, after all depth edge pixels have been obtained, the local two-dimensional gradient of the depth edge is calculated using a Sobel convolution; then, starting from each depth edge pixel, the image is traversed pixel by pixel along the direction of the edge's two-dimensional gradient and along its opposite direction until one of the two sides reaches a color edge pixel, the color edge pixels being obtained with the Canny edge extraction algorithm; once a color edge pixel is reached, the depth values of all pixels on the intermediate path from the starting pixel to the color edge pixel are deleted; pixels whose depth value has been deleted are defined as unaligned pixels and pixels whose depth value has not been deleted are defined as aligned pixels, and each deleted depth value is filled by interpolation from the surrounding undeleted depth values; specifically, for each unaligned pixel p_i to be interpolated, its geodesic distance d_g(p_i, p_j) to all other aligned pixels is computed, see Revaud, Jerome, et al., "EpicFlow: Edge-preserving interpolation of correspondences for optical flow," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, the m nearest aligned pixels (m = 4 in this embodiment) are found by geodesic distance, and the interpolated depth value is computed as

d_i = Σ_{j∈N(i)} w_g(i, j)·d_{i→j} / Σ_{j∈N(i)} w_g(i, j)

where N(i) denotes the set of nearest aligned neighbor pixels of p_i, w_g(i, j) = exp(−d_g(p_i, p_j)), and d_{i→j} denotes the depth obtained by projecting pixel p_i onto the local plane of p_j, whose plane equation is computed from v_j and n_j.
Specifically, after the depth map has been aligned, the aligned depth map is converted into a triangular mesh: the depth values are converted into three-dimensional coordinates, all horizontal and vertical edges plus one diagonal edge per pixel quad are connected, and an edge is disconnected whenever it crosses a depth edge from the previous step, which yields the triangular mesh.
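A simplified sketch of this depth-to-mesh conversion follows; for brevity it drops whole triangles that touch a depth-edge pixel rather than disconnecting individual edges, and the subsequent mesh simplification is not shown.

```python
import numpy as np

def depth_to_mesh(depth, K, edge_mask):
    """Triangulate an aligned depth map: two triangles per 2x2 pixel block, dropped
    when any used corner is invalid or lies on a depth edge. Returns (verts, faces)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    verts = np.stack([(u - K[0, 2]) * depth / K[0, 0],
                      (v - K[1, 2]) * depth / K[1, 1],
                      depth], axis=-1).reshape(-1, 3)
    idx = np.arange(h * w).reshape(h, w)
    valid = np.isfinite(depth) & ~edge_mask
    faces = []
    for y in range(h - 1):
        for x in range(w - 1):
            a, b, c, d = idx[y, x], idx[y, x + 1], idx[y + 1, x], idx[y + 1, x + 1]
            if valid[y, x] and valid[y, x + 1] and valid[y + 1, x]:
                faces.append((a, b, c))          # upper-left triangle
            if valid[y, x + 1] and valid[y + 1, x] and valid[y + 1, x + 1]:
                faces.append((b, d, c))          # lower-right triangle
    return verts, np.array(faces, dtype=np.int64)
```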
Specifically, a mesh simplification algorithm is invoked to simplify the generated triangular mesh, see Garland, Michael, and Paul S. Heckbert, "Surface simplification using quadric error metrics," Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques, 1997.
(3) Detecting a plane in the global triangular mesh model, detecting whether the plane is a reflection plane or not by utilizing the color consistency between adjacent images, and if so, constructing a double-layer expression on a reflection area for each picture which can see the reflection plane for correctly rendering the reflection effect of the surface of the object. Fig. 3 is a schematic diagram of a reflection region double-layer expression construction result provided in the embodiment of the present invention.
The double-layer expression comprises a foreground-background double-layer triangular mesh and two decomposed images of a foreground and a background, the foreground triangular mesh is used for expressing the surface geometry of an object, the background triangular mesh is used for expressing the mirror image of the scene geometry on a reflecting plane, the foreground image is used for expressing the surface texture of the object after reflection components are removed, and the background image is used for expressing the reflection components of the scene on the surface of the object.
Specifically, first, planes in the global triangular mesh model are detected, and the planes whose area is larger than an area threshold (0.09 m² in this embodiment) are kept; each such plane is projected onto the pictures in which it is visible, and the set of pictures in which the plane is visible is denoted V_P. For each picture I_k in V_P, its K-nearest-neighbor picture set N_k is computed (K = 6 in this embodiment); the K nearest neighbors are obtained by ranking the overlap rate of the vertices of the global triangular mesh model after mirroring about the plane, and N_k includes picture I_k itself, whose overlap rate is necessarily the highest. Then, using N_k, a matching cost volume is constructed, see Sinha, Sudipta N., et al., "Image-based rendering for scenes with reflections," ACM Transactions on Graphics (TOG) 31.4 (2012): 1-10, and whether the plane has a sufficient reflection component in picture I_k is judged as follows: for each pixel, after mirroring the global triangular mesh model according to the plane equation, the cost corresponding to the mirrored depth value is looked up in the matching cost volume and tested for being a local minimum point; if the number of pixels at which the cost is a local minimum is greater than a pixel-count threshold (50 in this embodiment), the plane is considered to have a reflection component on that picture; and if the number of visible pictures in which a plane has a reflection component is greater than a picture-count threshold (5 in this embodiment), the plane is considered a reflection plane.
Specifically, for each reflection plane, its two-dimensional reflection area β_k on each visible picture is calculated: the reflection plane (with its three-dimensional boundary) is projected onto the visible picture to obtain a projection depth map, a dilation operation is applied to the projection depth map (a 9x9 window may be used), and the dilated projection depth map is compared with the depth map aligned in the previous step to obtain an accurate two-dimensional reflection area; each pixel with a depth value in the projection depth map is screened using the three-dimensional point distance and the normal angle (pixels with a point distance below 0.03 meters and a normal angle below 60 degrees are kept), and the screened pixel area is taken as the reflection area β_k of the reflection plane on the picture. Meanwhile, the plane equation is used to obtain the initial two-layer depth maps: the projection depth map is taken as the initial foreground depth map; the camera intrinsic and extrinsic parameters of the picture are mirrored about the plane equation to form a virtual camera, and the initial background depth map is rendered in that virtual camera using the global triangular mesh model, noting that the near clipping plane of this rendering must be set to the reflection plane; the initial foreground and background depth maps are then converted, following the method of step (2), into two simplified triangular meshes M_k^f and M_k^b. Next, the two decomposed layers, the foreground picture F_k and the background picture B_k, are computed with an iterative optimization algorithm that also further optimizes M_k^f and M_k^b; before optimization, all related original pictures are inverse-gamma-corrected in advance for the subsequent decomposition.
The goal of the optimization is to minimize an energy function of the form

E = Σ_u [ E_d(u) + λ_s·E_s(u) + λ_p·E_p(u) ]

where the optimization variables include a rigid-body transformation (R_k, t_k) of the reflection-layer (background) triangular mesh, initialized to the identity matrix and 0 respectively, and the meshes M_k^f and M_k^b, for which only the three-dimensional vertex positions are optimized without changing the topology; E_d, E_s, E_p are respectively a data term, a smoothing term and a prior term, λ_s and λ_p are the weights of the corresponding terms (0.04 and 0.01 respectively), and u ranges over the pixels of the reflection area β_k. The data term measures how well the sum of the foreground picture F_k and the warped background picture B_k reproduces the inverse-gamma-corrected input pictures across the neighbor views; the smoothing term uses a Laplacian matrix H; the warping function ω^{-1} returns the two-dimensional coordinates obtained by projecting a point u of image I_k′ into image I_k according to the depth values and the camera intrinsic and extrinsic parameters, the depth map being obtained by projecting M_k^b; V denotes a vertex of M_k^b.
To minimize the above energy function, an alternating optimization scheme is used: in each round, the meshes and the rigid transformation (R_k, t_k) are first fixed and the pictures F_k and B_k are optimized; an initial value is first computed for F_k and B_k, and the optimization uses a nonlinear conjugate gradient method with 30 iterations; then F_k and B_k are fixed and M_k^f, M_k^b and (R_k, t_k) are optimized, again with the conjugate gradient method and 30 iterations. One round consists of one such alternation, and two rounds of optimization are carried out in total. After the first round, the consistency constraint of the foreground pictures (surface colors) across multiple views is used to denoise F_k: from the F_k and B_k obtained in the first round, denoised images F̃_k and B̃_k are computed using the multi-view consistency of the foreground, and F̃_k and B̃_k are used in place of F_k and B_k to continue the second round of optimization; furthermore, a prior term is added to the total energy equation in the second round, its weight λ_g being equal to 0.05, to constrain the second round of optimization.

After the two rounds of optimization, M_k^b is transformed by (R_k, t_k) to obtain the final two-layer simplified triangular meshes M_k^f and M_k^b, which, together with the decomposed pictures F_k and B_k, are used for correctly rendering the reflection effect of the object surface.
(4) And giving a virtual visual angle, drawing the virtual visual angle picture by using the neighborhood picture and the triangular mesh, and drawing the reflection area by using the foreground background picture and the foreground background triangular mesh. Fig. 4 is a schematic diagram of a rendering result of a virtual viewpoint with reflection according to an embodiment of the present invention.
Specifically, the goal of the online rendering process is, given the intrinsic and extrinsic parameters of a virtual camera, to output the virtual picture corresponding to that camera. Specifically: a neighborhood picture set is computed according to the intrinsic and extrinsic parameters of the virtual camera; the local coordinate system of the current virtual camera is divided into 8 octants by the coordinate-axis planes, and within each octant a series of neighborhood pictures is further selected; using the angle θ_k between the picture optical-center direction and the virtual-camera optical-center direction, and the distance ‖t_k − t_n‖ between the picture optical center t_k and the virtual-camera optical center t_n, each octant is subdivided into several regions; preferably, 9 regions are used, given by the combinations of θ_k in the intervals [0°, 10°), [10°, 20°), [20°, ∞) with ‖t_k − t_n‖ in the intervals [0, 0.6), [0.6, 1.2), [1.2, 1.8); then, in each region, the one picture with the smallest similarity d_k is added to the neighborhood picture set, where d_k combines the optical-center angle with the optical-center distance weighted by the distance weight λ, which is equal to 0.1.
After the neighborhood picture set is obtained, drawing each picture in the neighborhood picture set to a virtual viewpoint according to the corresponding simplified triangular mesh, specifically:
a) a robust depth map is computed: for each pixel in the patch-rendering pass, the rendering cost c(t_k, t_n, x) of every candidate point is calculated:
c(t_k,t_n,x)=∠(t_k−x,t_n−x)*π/180+max(0,1−‖t_n−x‖/‖t_k−x‖)
where t_k and t_n are the three-dimensional optical-center coordinates of the picture and of the virtual camera, and x is the three-dimensional point corresponding to the pixel; each pixel receives a series of rendered triangular patches, and the "points" here are the intersections of the pixel ray with those patches; if the rendering cost of a point is too large, i.e. larger than the minimum rendering cost over all points in the pixel plus a range threshold λ (λ = 0.17 in this embodiment), that point does not participate in the depth-map computation; the depths of all points that do participate are compared and the minimum is taken as the depth value of the pixel.
b) the depth map of the virtual camera is computed, and each picture is added to its triangular mesh as a texture map for drawing; for each pixel of the virtual camera picture, the colors of the points near the depth value (within 3 cm) are mixed with weights w_k (w_k = exp(−d_k/0.033)) to obtain the final rendered color.
Specifically, the reflection areas β_k of the neighborhood pictures are also drawn to the current virtual viewpoint to obtain the reflection area β_n of the current virtual viewpoint; for pixels inside the reflection area, the two foreground/background pictures and the simplified two-layer triangular meshes are used for drawing, and the depth-map computation and color mixing described above are carried out separately for the two layers; because the two layer pictures F_k and B_k were obtained by decomposition after inverse gamma correction, the two blended layer pictures are added in the rendering stage and a single gamma correction is then applied to obtain the correct picture with the reflection effect.
Specifically, in the rendering step, in order to reduce storage, all pictures are downsampled to 1/n of the original resolution for storage (n ≥ 1; n = 4 in this embodiment), and the virtual window is set to the original size during rendering, so that the rendered virtual viewpoint picture keeps its resolution but becomes blurry; the sharpness is then restored with the super-resolution neural network of the next step.
(5) A super-resolution neural network is trained to compensate for the loss of sharpness caused by storing downsampled pictures and to reduce possible drawing errors; fig. 5 compares results with and without the super-resolution neural network, and fig. 6 shows the structure of the super-resolution neural network provided in the embodiment of the present invention.
Specifically, after the depth picture and the color picture have been rendered for each new virtual view, a deep neural network is used to reduce rendering errors and improve sharpness. The network takes the color picture and depth picture of the current frame plus the color picture and depth picture of the previous frame as input; using both frames adds more effective information and improves temporal stability. First, a three-layer convolutional network extracts features from the current-frame and previous-frame depth/color pictures separately; the previous-frame features are then warp-mapped to the current frame, the initial correspondence being computed from the depth maps; because the depth maps are not perfectly accurate, an alignment module (a convolutional neural network of three convolutional layers) further fits a local two-dimensional offset to better align the features of the two frames; the aligned features of the two frames are concatenated and fed into a super-resolution module (implemented as a U-Net convolutional neural network), which outputs a high-definition picture of the current frame.
In one embodiment, a computer device is provided, which includes a memory and a processor, the memory stores computer readable instructions, and when executed by the processor, the processor executes the steps in the indoor scene virtual roaming method based on reflection decomposition in the embodiments.
In one embodiment, a storage medium storing computer readable instructions is provided, and the computer readable instructions, when executed by one or more processors, cause the one or more processors to perform the steps of the reflection decomposition-based indoor scene virtual roaming method in the embodiments. The storage medium may be a nonvolatile storage medium.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable storage medium, and the storage medium may include: read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
The above description is only for the purpose of illustrating the preferred embodiments of the one or more embodiments of the present disclosure, and is not intended to limit the scope of the one or more embodiments of the present disclosure, and any modifications, equivalent substitutions, improvements, etc. made within the spirit and principle of the one or more embodiments of the present disclosure should be included in the scope of the one or more embodiments of the present disclosure.

Claims (10)

1. An indoor scene virtual roaming method based on reflection decomposition is characterized by comprising the following steps:
s1: shooting a picture of a scene which is sufficiently covered in a target indoor scene, and performing three-dimensional reconstruction on the indoor scene based on the shot picture to obtain a rough global triangular mesh model of the indoor scene and the inside and outside parameters of the camera;
s2: for each picture, projecting the global triangular mesh model into a corresponding depth map, aligning the depth edge to a color edge, converting the aligned depth map into a triangular mesh, and carrying out mesh simplification on the triangular mesh;
s3: detecting a plane in the global triangular mesh model, detecting whether the plane is a reflection plane or not by utilizing the color consistency between adjacent images, and if so, constructing a double-layer expression on a reflection area for each picture which can see the reflection plane for correctly rendering the reflection effect of the surface of an object;
the double-layer expression comprises a foreground-background double-layer triangular mesh and two decomposed images of a foreground and a background, the foreground triangular mesh is used for expressing the surface geometry of an object, the background triangular mesh is used for expressing the mirror image of the scene geometry on a reflecting plane, the foreground image is used for expressing the surface texture of the object after the reflection component is removed, and the background image is used for expressing the reflection component of the scene on the surface of the object;
s4: and giving a virtual visual angle, drawing the virtual visual angle picture by using the neighborhood picture and the triangular mesh, and drawing the reflection area by using the foreground background picture and the foreground background triangular mesh.
2. The method according to claim 1, wherein in S2, aligning the depth edge of the depth map to the color edge of the original picture to obtain an aligned depth map, specifically:
firstly, a normal map corresponding to the depth map is calculated; then, for each pixel i in the depth map, its depth value d_i is converted into a three-dimensional point v_i in the camera's local coordinate system using the camera intrinsics, and the planar distance between adjacent pixels i, j is computed as dt_ij = max(|(v_i − v_j)·n_i|, |(v_i − v_j)·n_j|), where n_i, n_j are the normal vectors at points i and j; if dt_ij is greater than λ·max(1, min(d_i, d_j)), the pixel is recorded as a depth edge pixel, where λ is an edge detection threshold;
for each picture, after all depth edge pixels are obtained, calculating the local two-dimensional gradient of the depth edge by utilizing Sobel convolution, and then traversing one pixel by one pixel along the direction of the edge two-dimensional gradient and the opposite direction thereof by taking each depth edge pixel as a starting point until one of two sides traverses to a color edge pixel; after traversing to the color edge pixel, deleting the depth values of all pixels of the intermediate path from the starting point pixel to the color edge pixel; and defining the pixel of the deleted depth value as a non-aligned pixel, defining the pixel of the non-deleted depth value as an aligned pixel, and carrying out interpolation filling by using the peripheral non-deleted depth value for each deleted depth value.
3. The method as claimed in claim 2, wherein for each deleted depth value, interpolation filling is performed using the surrounding undeleted depth values, specifically: for each unaligned pixel p_i to be interpolated, its geodesic distance d_g(p_i, p_j) to all other aligned pixels is computed, the m nearest aligned pixels are found by geodesic distance, and the interpolated depth value is computed as

d_i = Σ_{j∈N(i)} w_g(i, j)·d_{i→j} / Σ_{j∈N(i)} w_g(i, j)

where N(i) denotes the set of nearest aligned neighbor pixels of p_i, w_g(i, j) = exp(−d_g(p_i, p_j)), and d_{i→j} denotes the depth obtained by projecting pixel p_i onto the local plane of p_j, whose plane equation is computed from v_j and n_j.
4. The method of claim 1, wherein in S3, the detecting the plane and the reflection plane in the global triangular mesh model specifically includes:
detecting planes in the global triangular mesh model, keeping the planes whose area is larger than an area threshold, projecting each such plane onto the pictures in which it is visible, and denoting the set of pictures in which the plane is visible as V_P;

for each picture I_k in V_P, its K-nearest-neighbor picture set N_k is computed; the K nearest neighbors are determined by the overlap rate of the vertices of the global triangular mesh model after mirroring about the plane;

using N_k, a matching cost volume is constructed, and whether the plane has a sufficient reflection component in picture I_k is judged as follows: for each pixel, after mirroring the global triangular mesh model according to the plane equation, the cost corresponding to the mirrored depth value is looked up in the matching cost volume and tested for being a local minimum point; if the number of pixels at which the cost is a local minimum is greater than a pixel-count threshold, the plane is considered to have a reflection component on that picture; and if the number of visible pictures in which a plane has a reflection component is greater than a picture-count threshold, the plane is considered a reflection plane.
5. The method of claim 1, wherein in the step S3, for each reflection plane, its two-dimensional reflection area β_k on each visible picture is calculated, specifically: the reflection plane is projected onto the visible picture to obtain a projection depth map; a dilation operation is applied to the projection depth map; the dilated projection depth map is compared with the aligned depth map to obtain an accurate two-dimensional reflection area; each pixel with a depth value in the projection depth map is screened using the three-dimensional point distance and the normal angle, and the screened pixel area is taken as the reflection area β_k of the reflection plane on the picture.
6. The method according to claim 5, wherein in step S3, a double-layer representation is constructed on the reflection area for each picture that can see the reflection plane, specifically:
the projection depth map is taken as the initial foreground depth map; the camera intrinsic and extrinsic parameters of the picture are mirrored about the plane equation to form a virtual camera, and the initial background depth map is rendered in that virtual camera using the global triangular mesh model; the initial foreground and background depth maps are then converted into two simplified triangular meshes M_k^f and M_k^b;

the two decomposed layers, the foreground picture F_k and the background picture B_k, are computed with an iterative optimization algorithm that also further optimizes M_k^f and M_k^b; before optimization, all related original pictures are inverse-gamma-corrected in advance for the subsequent decomposition;

the goal of the optimization is to minimize an energy function of the form

E = Σ_u [ E_d(u) + λ_s·E_s(u) + λ_p·E_p(u) ]

where the optimization variables include a rigid-body transformation (R_k, t_k) of the reflection-layer triangular mesh, initialized to the identity matrix and 0 respectively, and the meshes M_k^f and M_k^b, for which only the three-dimensional vertex positions are optimized without changing the topology; E_d, E_s, E_p are respectively a data term, a smoothing term and a prior term, λ_s and λ_p are the weights of the corresponding terms, and u ranges over the pixels of the reflection area β_k; the data term measures how well the sum of the foreground picture F_k and the warped background picture B_k reproduces the inverse-gamma-corrected input pictures across the neighbor views; the smoothing term uses a Laplacian matrix H; the warping function ω^{-1} returns the two-dimensional coordinates obtained by projecting a point u of image I_k′ into image I_k according to the depth values and the camera intrinsic and extrinsic parameters, the depth map being obtained by projecting M_k^b; V denotes a vertex of M_k^b;

to minimize the energy function, an alternating optimization scheme is used: in each round, the meshes and the rigid transformation are first fixed and the pictures F_k and B_k are optimized, an initial value being computed first and the optimization performed with a nonlinear conjugate gradient method; then F_k and B_k are fixed and M_k^f, M_k^b and (R_k, t_k) are optimized, again with the conjugate gradient method; one round consists of one such alternation, and two rounds of optimization are carried out in total; after the first round, the consistency constraint of the foreground pictures across multiple views is used to denoise F_k: from the F_k and B_k obtained in the first round, denoised images F̃_k and B̃_k are computed using the multi-view consistency of the foreground, and F̃_k and B̃_k are used in place of F_k and B_k to continue the second round of optimization; furthermore, a prior term with weight λ_g is added to the total energy equation in the second round to constrain the second round of optimization;

after the two rounds of optimization, M_k^b is transformed by (R_k, t_k) to obtain the final two-layer simplified triangular meshes M_k^f and M_k^b, which, together with the decomposed pictures F_k and B_k, are used for correctly rendering the reflection effect of the object surface.
7. The method of claim 1, wherein in step S4 a neighborhood picture set is computed from the intrinsic and extrinsic parameters of the virtual camera: the local coordinate system of the current virtual camera is divided into 8 quadrants by the coordinate-axis planes, and a series of neighborhood pictures is selected within each quadrant. Using the angle between the optical-center direction of a picture and the optical-center direction of the virtual camera, together with the distance ||tk − tn|| between the picture optical center tk and the virtual camera optical center tn, each quadrant is further divided into several regions; then, in each region, the single picture with the smallest similarity dk (defined by the formula shown as an image in the claim) is added to the neighborhood picture set, where λ is the distance-proportion weight.
After the neighborhood picture set is obtained, each picture in the set is rendered to the virtual viewpoint according to its corresponding simplified triangular mesh, specifically:
a) a robust depth map is computed: for each pixel of the fragment shader, the rendering cost c(tk, tn, x) is calculated (sketched below),
c(tk, tn, x) = ∠(tk − x, tn − x)·π/180 + max(0, 1 − ||tn − x||/||tk − x||)
where tk and tn are the three-dimensional optical-center coordinates of the picture and of the virtual camera, and x is the three-dimensional point corresponding to the pixel. Each pixel receives a series of rendered triangular patches, and the points are the intersections of the ray determined by the pixel with these patches; if the rendering cost of a point exceeds the minimum rendering cost over all points of that pixel plus a range threshold λ, the point does not participate in the depth-map computation. The depths of all participating points are then compared and the minimum is taken as the depth value of the pixel;
b) the depth map of the virtual camera having been computed, each picture is attached to its triangular mesh as a texture map for rendering; for each pixel of the virtual camera image, the colors of the points near the depth value are blended with preset weights wk to obtain the final rendered color.
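For illustration only: the per-pixel rendering cost of step a) is spelled out in the claim and is implemented literally below, while the similarity dk used for neighborhood selection is given only as a formula image, so the form used here (view angle plus a λ-weighted optical-center distance) is an assumption consistent with the surrounding text; all function names are illustrative.

import numpy as np

def rendering_cost(t_k, t_n, x):
    # c(t_k, t_n, x) = angle(t_k - x, t_n - x) * pi/180
    #                + max(0, 1 - ||t_n - x|| / ||t_k - x||)
    v_k, v_n = t_k - x, t_n - x
    cos_a = np.dot(v_k, v_n) / (np.linalg.norm(v_k) * np.linalg.norm(v_n))
    angle_deg = np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))
    penalty = max(0.0, 1.0 - np.linalg.norm(v_n) / np.linalg.norm(v_k))
    return angle_deg * np.pi / 180.0 + penalty

def robust_pixel_depth(depths, costs, lam_range):
    # Discard points whose cost exceeds the per-pixel minimum cost plus the
    # range threshold, then take the smallest depth among the surviving points.
    depths, costs = np.asarray(depths), np.asarray(costs)
    keep = costs <= costs.min() + lam_range
    return depths[keep].min()

def neighborhood_similarity(dir_k, dir_n, t_k, t_n, lam):
    # Assumed form of d_k: angle between the picture and virtual-camera
    # optical-center directions plus lam times the optical-center distance.
    cos_a = np.dot(dir_k, dir_n) / (np.linalg.norm(dir_k) * np.linalg.norm(dir_n))
    return np.arccos(np.clip(cos_a, -1.0, 1.0)) + lam * np.linalg.norm(t_k - t_n)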
8. The method of claim 1, wherein in step S4 the reflection region βk of each neighborhood picture, obtained from the reflection decomposition, is also rendered to the current virtual viewpoint to obtain the reflection region βn of the current virtual viewpoint. For pixels inside the reflection region, rendering uses both the foreground and background layer pictures together with the two-layer simplified triangular meshes, and depth-map computation and color blending are performed separately for each of the two layers. Because the two layer pictures are obtained by decomposition performed after inverse gamma correction, in the rendering stage the two blended layer pictures are added together and a single gamma correction is applied to obtain a correct picture with the reflection effect.
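For illustration only: because the two layers are decomposed after an inverse gamma correction, the rendering stage adds the blended foreground and background in linear space and applies gamma once; the exponent 2.2 below is an assumption, as the claim only mentions gamma correction.

import numpy as np

def compose_reflection(foreground_linear, background_linear, gamma=2.2):
    # Both rendered layer images are assumed to be in linear space (the
    # decomposition was performed after inverse gamma correction).  Add them,
    # then apply a single gamma correction to obtain the picture with reflections.
    linear = np.clip(foreground_linear + background_linear, 0.0, 1.0)
    return linear ** (1.0 / gamma)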
9. The method of claim 1, wherein in step S4, to reduce storage, all pictures are down-sampled to 1/n of their original resolution for storage, n ≥ 1, and the virtual window is set to the original size during rendering.
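For illustration only, a minimal sketch of this storage policy, assuming OpenCV for resampling (the claim does not specify the filter); names are illustrative.

import cv2

def store_downsampled(picture, n):
    # Down-sample a picture to 1/n of its resolution for storage (n >= 1); at
    # render time the virtual window keeps the original size and the network of
    # claim 10 compensates for the lost sharpness.
    h, w = picture.shape[:2]
    return cv2.resize(picture, (max(1, w // n), max(1, h // n)), interpolation=cv2.INTER_AREA)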
10. The indoor scene virtual roaming method based on reflection decomposition according to any one of claims 1-9, wherein a super-resolution neural network is trained to compensate for the loss of sharpness caused by down-sampling of the stored pictures while reducing possible rendering errors, specifically:
after the depth picture and the color picture are rendered at each new virtual viewing angle, a deep neural network is used to reduce rendering errors and improve sharpness. The network takes as input the color and depth pictures of the current frame together with the color and depth pictures of the previous frame. First, a three-layer convolutional network extracts features from the current-frame and previous-frame depth and color pictures separately; the previous-frame features are then warped to the current frame, with the initial correspondence computed from the depth map, and an alignment module fits a local two-dimensional offset to further align the features of the two frames; the aligned features of the two frames are concatenated and fed into a super-resolution module implemented with a U-Net convolutional neural network, which outputs the high-definition picture of the current frame.
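For illustration only: a compact PyTorch-style sketch of the network described in this claim, with three-layer per-frame encoders, a depth-based warp of the previous-frame features, a learned local 2-D offset for finer alignment, and a small U-Net-style fusion producing the super-resolved current frame; all layer widths, kernel sizes and the upscaling factor are assumptions, only the overall structure follows the claim.

import torch
import torch.nn as nn
import torch.nn.functional as F

def three_layer_encoder(in_ch, feat=32):
    # Three-layer convolutional feature extractor (channel widths assumed).
    return nn.Sequential(
        nn.Conv2d(in_ch, feat, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True))

def warp(features, flow):
    # Bilinear warp of previous-frame features by a dense 2-D flow (N, 2, H, W), in pixels.
    n, _, h, w = features.shape
    ys, xs = torch.meshgrid(torch.arange(h, device=flow.device),
                            torch.arange(w, device=flow.device), indexing="ij")
    base = torch.stack((xs, ys)).float().unsqueeze(0)            # (1, 2, H, W)
    coords = base + flow
    grid = torch.stack((2 * coords[:, 0] / (w - 1) - 1,          # normalise to [-1, 1]
                        2 * coords[:, 1] / (h - 1) - 1), dim=-1)
    return F.grid_sample(features, grid, align_corners=True)

class RefinementSR(nn.Module):
    # Current/previous-frame encoders, depth-derived warp, residual offset
    # alignment, and a small U-Net stand-in that outputs the refined frame.
    def __init__(self, feat=32, scale=2):
        super().__init__()
        self.enc_cur = three_layer_encoder(4, feat)    # RGB + depth, current frame
        self.enc_prev = three_layer_encoder(4, feat)   # RGB + depth, previous frame
        self.offset = nn.Conv2d(2 * feat, 2, 3, padding=1)
        self.down = nn.Sequential(nn.Conv2d(2 * feat, 2 * feat, 3, 2, 1), nn.ReLU(inplace=True))
        self.up = nn.Sequential(nn.ConvTranspose2d(2 * feat, feat, 4, 2, 1), nn.ReLU(inplace=True))
        self.to_rgb = nn.Sequential(nn.Conv2d(feat, 3 * scale * scale, 3, padding=1),
                                    nn.PixelShuffle(scale))
        self.scale = scale

    def forward(self, cur_rgbd, prev_rgbd, depth_flow):
        f_cur = self.enc_cur(cur_rgbd)
        f_prev = warp(self.enc_prev(prev_rgbd), depth_flow)                  # depth-map correspondence
        f_prev = warp(f_prev, self.offset(torch.cat((f_cur, f_prev), 1)))    # alignment module
        fused = torch.cat((f_cur, f_prev), 1)
        decoded = self.up(self.down(fused))                                   # U-Net stand-in
        base = F.interpolate(cur_rgbd[:, :3], scale_factor=self.scale,
                             mode="bilinear", align_corners=False)
        return base + self.to_rgb(decoded)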
CN202110429676.2A 2021-04-21 2021-04-21 Indoor scene virtual roaming method based on reflection decomposition Active CN113223132B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110429676.2A CN113223132B (en) 2021-04-21 2021-04-21 Indoor scene virtual roaming method based on reflection decomposition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110429676.2A CN113223132B (en) 2021-04-21 2021-04-21 Indoor scene virtual roaming method based on reflection decomposition

Publications (2)

Publication Number Publication Date
CN113223132A true CN113223132A (en) 2021-08-06
CN113223132B CN113223132B (en) 2022-05-17

Family

ID=77088240

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110429676.2A Active CN113223132B (en) 2021-04-21 2021-04-21 Indoor scene virtual roaming method based on reflection decomposition

Country Status (1)

Country Link
CN (1) CN113223132B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102592275A (en) * 2011-12-16 2012-07-18 天津大学 Virtual viewpoint rendering method
US10325402B1 (en) * 2015-07-17 2019-06-18 A9.Com, Inc. View-dependent texture blending in 3-D rendering
US20190051051A1 (en) * 2016-04-14 2019-02-14 The Research Foundation For The State University Of New York System and Method for Generating a Progressive Representation Associated with Surjectively Mapped Virtual and Physical Reality Image Data
CN106952328A (en) * 2016-12-28 2017-07-14 北京大学 The method for drafting and system of a kind of Large-scale Macro virtual scene
CN107845134A (en) * 2017-11-10 2018-03-27 浙江大学 A kind of three-dimensional rebuilding method of the single body based on color depth camera

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WEIWEI XU et al.: "Survey of 3D modeling using depth cameras", Virtual Reality & Intelligent Hardware *
HUA Wei et al.: "Real-time roaming algorithm for virtual scenes with global specular reflection" (包含整体镜面反射的虚拟场景实时漫游算法), Journal of Software (软件学报) *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114972617A (en) * 2022-06-22 2022-08-30 北京大学 Scene illumination and reflection modeling method based on conductive rendering
CN114972617B (en) * 2022-06-22 2023-04-07 北京大学 Scene illumination and reflection modeling method based on conductive rendering
CN116761017A (en) * 2023-08-18 2023-09-15 湖南马栏山视频先进技术研究院有限公司 High availability method and system for video real-time rendering
CN116761017B (en) * 2023-08-18 2023-10-17 湖南马栏山视频先进技术研究院有限公司 High availability method and system for video real-time rendering
CN117272758A (en) * 2023-11-20 2023-12-22 埃洛克航空科技(北京)有限公司 Depth estimation method, device, computer equipment and medium based on triangular grid
CN117272758B (en) * 2023-11-20 2024-03-15 埃洛克航空科技(北京)有限公司 Depth estimation method, device, computer equipment and medium based on triangular grid

Also Published As

Publication number Publication date
CN113223132B (en) 2022-05-17

Similar Documents

Publication Publication Date Title
CN113223132B (en) Indoor scene virtual roaming method based on reflection decomposition
WO2022222077A1 (en) Indoor scene virtual roaming method based on reflection decomposition
US11727587B2 (en) Method and system for scene image modification
Wang et al. Neuris: Neural reconstruction of indoor scenes using normal priors
US6476803B1 (en) Object modeling system and process employing noise elimination and robust surface extraction techniques
KR101195942B1 (en) Camera calibration method and 3D object reconstruction method using the same
Li et al. Detail-preserving and content-aware variational multi-view stereo reconstruction
JP2007257287A (en) Image registration method
Hejazifar et al. Fast and robust seam estimation to seamless image stitching
CN111553841B (en) Real-time video splicing method based on optimal suture line updating
Ma et al. An operational superresolution approach for multi-temporal and multi-angle remotely sensed imagery
JP2000268179A (en) Three-dimensional shape information obtaining method and device, two-dimensional picture obtaining method and device and record medium
CN113781621A (en) Three-dimensional reconstruction processing method, device, equipment and storage medium
Alsadik Guided close range photogrammetry for 3D modelling of cultural heritage sites
Rothermel et al. Photometric multi-view mesh refinement for high-resolution satellite images
CN112862683A (en) Adjacent image splicing method based on elastic registration and grid optimization
Wan et al. Drone image stitching using local mesh-based bundle adjustment and shape-preserving transform
Hu et al. IMGTR: Image-triangle based multi-view 3D reconstruction for urban scenes
CN113706431A (en) Model optimization method and related device, electronic equipment and storage medium
CN112132971A (en) Three-dimensional human body modeling method, device, electronic equipment and storage medium
CN116805356A (en) Building model construction method, building model construction equipment and computer readable storage medium
JP2002520969A (en) Automated 3D scene scanning from motion images
Rüther et al. Laser Scanning in heritage documentation
Bielski et al. Order independent image compositing
Ling et al. Large-scale and efficient texture mapping algorithm via loopy belief propagation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant