CN115457192A - Method for realizing NERF new visual angle synthetic model based on self-integration module - Google Patents
Method for realizing NERF new visual angle synthetic model based on self-integration module
- Publication number
- CN115457192A CN115457192A CN202210911889.3A CN202210911889A CN115457192A CN 115457192 A CN115457192 A CN 115457192A CN 202210911889 A CN202210911889 A CN 202210911889A CN 115457192 A CN115457192 A CN 115457192A
- Authority
- CN
- China
- Prior art keywords
- nerf
- self
- scene
- integration module
- rgb color
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/70—Denoising; Smoothing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/54—Extraction of image or video features relating to texture
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30168—Image quality inspection
Abstract
The invention discloses a method for implementing a NERF novel-view synthesis model based on a self-integration module. The method comprises the following steps. Step 1: select image data, using the training data open-sourced by the NeRF authors on Google Drive; the training data include synthetic Lego data and scene data, stored in two separate folders. Step 2: construct the NERF novel-view synthesis model based on the self-integration module, which comprises acquiring the scene representation of NeRF, volume rendering based on the scene representation, and constructing the self-integration module. By adding the self-integration module to the NeRF neural radiance field, the invention improves the peak signal-to-noise ratio of the output image and thereby the image quality. The proposed self-integration module enhances 3D novel-view reconstruction and restores the specific details of a scene more faithfully.
Description
Technical Field
The invention provides a method for implementing a NERF novel-view synthesis model based on a self-integration module.
Background
Novel-view synthesis methods can be divided into image-based, learning-based, and geometry-based approaches. Image-based methods warp and blend relevant patches of the observed frames to generate a new view according to a quality measure. Learning-based methods predict blending weights and view-dependent effects with neural networks and other heuristics. Deep learning has also enabled methods that predict novel views from a single image, but these typically require large amounts of training data. Unlike image-based and learning-based methods, geometry-based methods first reconstruct a three-dimensional model and then render images from the target pose. For example, Aliev et al. attach multi-resolution features to a point cloud and then perform neural rendering; Thies et al. store a neural texture on a three-dimensional mesh and render new views with a conventional graphics pipeline. Other geometric representations include multi-plane images, voxel grids, depths, and layered depths. While these methods can produce relatively high-quality results, their discrete representations demand extensive data and memory, and the rendering resolution is limited by the accuracy of the reconstructed geometry.
Implicit neural representations have proven useful for representing shapes and scenes; they typically use multi-layer perceptrons (MLPs) to encode signed distance fields, occupancy, or volume density. Combined with differentiable rendering, these methods can reconstruct the geometry and appearance of objects and scenes. Among them, the neural radiance field (NeRF) achieves remarkable results in synthesizing novel views of a static scene given a set of posed input images. The key idea of NeRF is to model a continuous radiance field with an MLP and obtain images by differentiable volume rendering, so optimization can be carried out by minimizing a photometric loss. Many NeRF extensions have since appeared, for example: reconstruction without known camera poses, and modeling of non-rigid scenes, unbounded scenes, and object categories. NeRF has also been studied for relighting, generation, editing, and 3D reconstruction.
In addition, Mip-NeRF considers the sampling problem of NeRF. Its authors show that rendering NeRF at different resolutions introduces aliasing artifacts, and they resolve this by proposing an integrated positional encoding that reasons about a cone instead of a single point. However, Mip-NeRF only considers rendering at low sampling resolutions, and there is currently no work investigating how to improve the resolution of NeRF.
The work of the present invention is also related to image super-resolution. Classical single-image super-resolution (SISR) methods exploit priors such as image statistics or gradients. CNN-based approaches learn the relationship between high-resolution (HR) and low-resolution (LR) images by minimizing the mean squared error between the super-resolved image and the ground truth. Generative adversarial networks (GANs), which produce high-resolution details through adversarial learning, are also popular for super-resolution. Most of these methods acquire knowledge from large-scale datasets or existing HR/LR training pairs. Moreover, these 2D image-based methods, especially GAN-based ones, do not account for cross-view consistency and are therefore suboptimal for novel-view synthesis.
Existing reference-based methods match correspondences between an HR reference and the LR input via patch matching, feature extraction, or attention. Inspired by this line of work, we also focus on learning HR details from a given reference image. However, most of these methods super-resolve a single LR input from a reference, whereas our method can refine details of all novel views from one reference image.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a method for implementing a NERF novel-view synthesis model based on a self-integration module. The invention develops a new NeRF neural radiance field built on a self-integration module and proposes the self-integration module to reconstruct high-quality novel-view images. The self-integration module is used mainly in the testing stage, where it yields images with better reconstruction quality. Through extensive tests on the dataset open-sourced by the NeRF authors, the NERF model based on the self-integration module demonstrates stronger reconstruction capability than the original NeRF.
The technical scheme adopted by the invention for solving the technical problems is as follows:
step 1: selecting image data
Select the training data open-sourced by the NeRF authors on Google Drive; the training data include synthetic Lego data and scene data, stored in two separate folders;
step 2: constructing a NERF new perspective synthesis model based on the self-integration module.
Further, step 2 is specifically implemented as follows:
2-1, acquiring scene representation of NERF;
2-2, voxel rendering based on scene representation;
and 2-3, constructing a self-integration module.
Further, the step 2-1 is specifically realized as follows:
the input of the MLP is the three-dimensional space coordinate and the view angle corresponding to the source imageThe output is (R, G, B, sigma), and after the scene representation (R, G, B, sigma) is obtained, a picture corresponding to the ray is rendered; wherein (x, y, z) represents three-dimensional space coordinates,represents a view angle; (R, G, B) representsRGB color values, σ, denote the voxel density.
Further, the step 2-2 is specifically realized as follows:
2-2-1. Sample 64 points along a ray, each sample point corresponding to a 5D input (x, y, z, θ, φ), and obtain the scene representation (R, G, B, σ) of each sample point through the MLP;
2-2-2. Render the ray with the classical volume-rendering formula to obtain the color value C(r) of the pixel corresponding to the ray. The specific rendering process is:

C(r) = ∫_{t_n}^{t_f} T(t) σ(r(t)) c(r(t), d) dt

wherein r(t) = o + t·d is the ray equation, and t ranges from the near point t_n to the far point t_f, i.e., the lower and upper limits of the integral; σ(r(t)) is the volume density of the ray at point t, obtained from the MLP prediction; c(r(t), d) is the RGB color value of the ray at point t, obtained from the MLP prediction, where d is the direction vector of the ray o + t·d; T(t) is the transmittance of the ray up to point t, obtained by integrating the volume density σ(r(t)), with the specific formula T(t) = exp(−∫_{t_n}^{t} σ(r(s)) ds). The RGB color value of a pixel point in the picture is thus solved from the MLP prediction;
2-2-3. Select the mean-squared-error loss function, as follows:

L = (RGB_NeRF − RGB_groundtruth)²

wherein RGB_NeRF is the RGB color value of the picture rendered by NeRF for a given viewpoint, and RGB_groundtruth is the RGB color value of the real picture at that viewpoint.
Further, step 2-3 is specifically implemented as follows:
The self-integration module sets up and trains 8 fully independent models, each performing 3D reconstruction on the same dataset, so that the same object or scene is implicitly stored in the different parameters of the different models.
Further, after each model has been trained for 20000 epochs, 3D reconstruction is performed for the same viewpoint of the same scene, yielding the rendered RGB color value corresponding to that viewpoint.
Further, the loss between the rendered RGB color values and the ground-truth RGB color values of the real picture is computed and converted into PSNR.
Further, the rendered RGB color values obtained from the 8 fully independent models are averaged to obtain the final reconstructed RGB color value at the target viewpoint.
Further, each fully independent model repeats the operations of steps 1 and 2, rendering different viewpoints of the same scene in the same dataset to obtain RGB color values.
Furthermore, the self-integration module operates mainly in the model-testing stage.
The beneficial effects of the invention are realized as follows:
the present invention is a self-integrating process on models based on NERF. The self-integration module can better extract the texture features of the picture, thereby improving the reconstruction precision. According to the invention, the self-integration module is added into the NERF nerve radiation field, so that the peak signal-to-noise ratio of an output image can be improved, and the image quality is improved. Specifically, the invention trains the model on the original NERF, tests the model on the test data sets, and compares the results after self-integration with the NERF and NERF variants, and finds that the self-integration proposed by the invention has good reconstruction effect. The peak signal-to-noise ratio of data set Rain100L increased from 25.73 to 26.73. These phenomena reflect the enhancement of the self-integration module proposed by us on the reconstruction of 3D new viewing angles, which can better restore the details of the scene.
Drawings
FIG. 1 is a schematic diagram of the MLP prediction in the NERF novel-view synthesis model based on the self-integration module.
FIG. 2 is a schematic diagram of rendering and loss construction in the NERF novel-view synthesis model based on the self-integration module.
FIG. 3 is a diagram of the effect of the NERF novel-view synthesis model based on the self-integration module.
Detailed Description
The invention will be further explained with reference to the drawings.
As shown in FIGS. 1-3, the method for implementing the NERF novel-view synthesis model based on the self-integration module includes the following specific steps:
step 1: selecting image data
Select the training data open-sourced by the NeRF authors on Google Drive; the training data include synthetic Lego data and scene data, stored in two separate folders. Take the category "hot dog" in the scene data as an example: it contains "train/val/test" folders and three JSON files holding the camera pose for each captured picture. Specifically, during reading, information such as the image and the transform_matrix is loaded from the JSON files by the "load_folder_data" function.
Example 1:
In our experiments, a batch size of 4096 rays was used, with each ray sampled at Nc = 64 coordinates in the coarse volume and Nf = 128 additional coordinates in the fine volume. We use the Adam optimizer with a learning rate that starts at 5×10^-4 and decays exponentially to 5×10^-5 over the course of optimization (the other Adam hyper-parameters keep their default values β1 = 0.9, β2 = 0.999). Optimizing a single scene typically requires about 100k-300k iterations to converge on a single NVIDIA V100 GPU (approximately 1-2 days).
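The exponential decay described above can be sketched as a log-linear interpolation between the stated initial and final learning rates. This is an illustrative sketch, not the patent's exact schedule; the function name `nerf_lr` and the total step count `num_steps` are assumptions:

```python
def nerf_lr(step: int, num_steps: int, lr_init: float = 5e-4, lr_final: float = 5e-5) -> float:
    """Log-linearly interpolate the learning rate from lr_init down to lr_final."""
    t = min(step / num_steps, 1.0)  # training progress in [0, 1]
    return lr_init * (lr_final / lr_init) ** t
```

With num_steps = 300000, the rate starts at 5×10^-4 and reaches 5×10^-5 at the final iteration, decaying by a constant factor per step.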
Step 2: constructing a NERF new visual angle synthetic model based on a self-integration module;
2-1. Acquiring the scene representation of NeRF
NeRF uses a volumetric representation fitted with an MLP, as shown in FIG. 1.
The input of the MLP is the three-dimensional spatial coordinate (x, y, z) and the viewing direction (θ, φ) corresponding to the source image; the output is (R, G, B, σ), representing the RGB color value and the volume density, respectively. After the scene representation (R, G, B, σ) is obtained, the picture corresponding to the ray is rendered. Here (x, y, z) denotes the three-dimensional spatial coordinate, (θ, φ) denotes the viewing direction, (R, G, B) denotes the RGB color value, and σ denotes the volume density.
Because an MLP is limited in extracting high-frequency information, the positional-encoding technique from the NLP field is adopted: the input (x, y, z, θ, φ) is positionally encoded, which makes functions in the high-frequency domain easier to fit and improves performance.
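The positional encoding mentioned above maps each input coordinate through sines and cosines at exponentially increasing frequencies, in the form γ(p) = (sin(2^0·πp), cos(2^0·πp), …, sin(2^(L-1)·πp), cos(2^(L-1)·πp)) commonly used with NeRF. A minimal NumPy sketch, assuming L = 10 frequencies as is typical for spatial coordinates:

```python
import numpy as np

def positional_encoding(x: np.ndarray, num_freqs: int = 10) -> np.ndarray:
    """Encode coordinates x of shape (..., D) into (..., D * 2 * num_freqs) features."""
    freqs = 2.0 ** np.arange(num_freqs)      # 1, 2, 4, ..., 2^(L-1)
    scaled = x[..., None] * freqs * np.pi    # shape (..., D, L)
    enc = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)  # (..., D, 2L)
    return enc.reshape(*x.shape[:-1], -1)    # flatten per-dimension features
```

For a 3D coordinate this yields 3 × 2 × 10 = 60 features, which is the input width the original NeRF MLP uses for positions.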
2-2. Volume rendering based on the scene representation
Volume rendering is based on ray casting and is widely used in novel-view synthesis tasks. NeRF itself is a method that converts the representation of a scene into a 2D image through three-dimensional volume rendering. As shown in FIG. 2.
2-2-1. First, sample 64 points along a ray, each sample point corresponding to a 5D input (x, y, z, θ, φ), and obtain the scene representation (R, G, B, σ) of each sample point through the MLP;
2-2-2. Render the ray with the classical volume-rendering formula to obtain the color value C(r) of the pixel corresponding to the ray. The specific rendering process is:

C(r) = ∫_{t_n}^{t_f} T(t) σ(r(t)) c(r(t), d) dt

wherein r(t) = o + t·d is the ray equation, and t ranges from the near point t_n to the far point t_f, i.e., the lower and upper limits of the integral; σ(r(t)) is the volume density of the ray at point t, obtained from the MLP prediction; c(r(t), d) is the RGB color value of the ray at point t, obtained from the MLP prediction, where d is the direction vector of the ray o + t·d; T(t) is the transmittance of the ray up to point t, obtained by integrating the volume density σ(r(t)), with the specific formula T(t) = exp(−∫_{t_n}^{t} σ(r(s)) ds). The RGB color value of a pixel point in the picture is thus solved from the MLP prediction.
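In practice the rendering integral is approximated by quadrature over the 64 samples: each segment contributes an opacity α_i = 1 − exp(−σ_i·δ_i), weighted by the accumulated transmittance T_i. A minimal NumPy sketch of this standard discretization (the function and variable names are illustrative, not the patent's code):

```python
import numpy as np

def render_ray(sigma: np.ndarray, rgb: np.ndarray, t_vals: np.ndarray) -> np.ndarray:
    """Numerically integrate C(r): sigma (N,), rgb (N, 3), t_vals (N,) sample depths."""
    deltas = np.diff(t_vals, append=1e10)              # distances between adjacent samples
    alpha = 1.0 - np.exp(-sigma * deltas)              # per-segment opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))  # transmittance T_i
    weights = trans * alpha                            # contribution of each sample
    return (weights[:, None] * rgb).sum(axis=0)        # composited pixel color
```

An opaque sample early on the ray drives its α toward 1 and zeroes the transmittance of everything behind it, reproducing the occlusion role that T(t) plays in the continuous integral.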
2-2-3. Select the mean-squared-error loss function, as follows:

L = (RGB_NeRF − RGB_groundtruth)²

wherein RGB_NeRF is the color of the picture rendered by NeRF for a given viewpoint, and RGB_groundtruth is the color of the real picture at that viewpoint. The NeRF model is continually optimized through gradient computation and gradient descent, improving the quality of NeRF-rendered pictures.
And 2-3, constructing a self-integration module.
The self-integration module operates mainly in the model-testing stage. As shown in FIG. 3, the self-integration module sets up and trains 8 fully independent models, each performing 3D reconstruction on the same dataset, so that the same object or scene is implicitly stored in the different parameters of the different models.
After each model has been trained for 20000 epochs, 3D reconstruction is performed for the same viewpoint of the same scene, yielding the rendered RGB color value corresponding to that viewpoint.
The loss between the rendered RGB color values and the ground-truth RGB color values of the real picture is computed and converted into PSNR.
The rendered RGB color values produced by the 8 fully independent models are averaged to obtain the final reconstructed RGB color value at the target viewpoint. This fully exploits the randomness of the models and the strong information-fusion capability of ensemble learning: the reconstruction results of each model for the same task are extracted and fused, yielding a surprisingly good result.
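The averaging and PSNR conversion described here can be sketched as follows. This is an illustrative NumPy sketch with assumed names (`self_ensemble`, `psnr`) and the assumption that images are float arrays in [0, 1]:

```python
import numpy as np

def self_ensemble(renders: list) -> np.ndarray:
    """Average the RGB renders produced by the independently trained models."""
    return np.mean(np.stack(renders, axis=0), axis=0)

def psnr(pred: np.ndarray, gt: np.ndarray, max_val: float = 1.0) -> float:
    """Convert mean squared error against the ground truth into PSNR in dB."""
    mse = np.mean((pred - gt) ** 2)
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

Averaging independent renders cancels part of each model's independent reconstruction noise, which is why the ensembled image can score a higher PSNR than any single model's output.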
Furthermore, each fully independent model repeats the operations of steps 1 and 2, rendering different viewpoints of the same scene in the same dataset to obtain RGB color values.
Step 3: Comparison of algorithm performance
Experiments were performed on synthetically rendered datasets of two objects. The DeepVoxels dataset contains four Lambertian objects with simple geometry; each object is rendered at 512×512 pixels from viewpoints sampled on the upper hemisphere. We render 100 views of each scene as input and 200 for testing, all at 800×800 pixels. The method applies a self-ensembling process to NeRF-based models. The integration module in the framework better extracts the texture features of the picture, thereby improving reconstruction accuracy. Adding the module to the NeRF neural radiance field improves the peak signal-to-noise ratio of the output image and thus the image quality. Specifically, we trained the model on the original NeRF, tested it on these test datasets, and compared our self-ensembled results with NeRF and NeRF variants, as shown in Table 1; the proposed self-ensembling achieves a very good reconstruction effect. The peak signal-to-noise ratio on the dataset increased from 25.7317 to 26.7329, i.e., an improvement of about one point over the original NeRF. Although only one point, this is a large step when optimizing already high-quality pictures: current NeRF variants such as Mip-NeRF improve PSNR by only about 0.2 points. The proposed self-integration module therefore brings a considerable enhancement to 3D novel-view reconstruction, restores the specific details of a scene better, and offers a useful reference for advancing 3D reconstruction technology.
Table 1: Comparison between NERF and NERF based on the self-integration module

Method | PSNR | SSIM
---|---|---
NERF | 25.7317 | 0.987
NERF based on self-integration module | 26.7329 | 0.992
Claims (10)
1. A method for implementing a NERF novel-view synthesis model based on a self-integration module, characterized by comprising the following steps:
step 1: selecting image data
Select the training data open-sourced by the NeRF authors on Google Drive; the training data include synthetic Lego data and scene data, stored in two separate folders;
step 2: constructing a NERF new perspective synthesis model based on the self-integration module.
2. The method of claim 1, wherein step 2 is implemented as follows:
2-1, acquiring scene representation of NERF;
2-2, voxel rendering based on scene representation;
and 2-3, constructing a self-integration module.
3. The method according to claim 2, wherein step 2-1 is implemented as follows:
the input of the MLP is the three-dimensional spatial coordinate (x, y, z) and the viewing direction (θ, φ) corresponding to the source image; the output is (R, G, B, σ), and after the scene representation (R, G, B, σ) is obtained, the picture corresponding to the ray is rendered; wherein (x, y, z) denotes the three-dimensional spatial coordinate, (θ, φ) denotes the viewing direction, (R, G, B) denotes the RGB color value, and σ denotes the volume density.
4. The method for implementing a new angle-of-view synthesis model of NERF based on self-integrated modules as claimed in claim 2, wherein step 2-2 is implemented as follows:
2-2-1. Sample 64 points along a ray, each sample point corresponding to a 5D input (x, y, z, θ, φ), and obtain the scene representation (R, G, B, σ) of each sample point through the MLP;
2-2-2. Render the ray with the classical volume-rendering formula to obtain the color value C(r) of the pixel corresponding to the ray. The specific rendering process is:

C(r) = ∫_{t_n}^{t_f} T(t) σ(r(t)) c(r(t), d) dt

wherein r(t) = o + t·d is the ray equation, and t ranges from the near point t_n to the far point t_f, i.e., the lower and upper limits of the integral; σ(r(t)) is the volume density of the ray at point t, obtained from the MLP prediction; c(r(t), d) is the RGB color value of the ray at point t, obtained from the MLP prediction, where d is the direction vector of the ray o + t·d; T(t) is the transmittance of the ray up to point t, obtained by integrating the volume density σ(r(t)), with the specific formula T(t) = exp(−∫_{t_n}^{t} σ(r(s)) ds). The RGB color value of a pixel point in the picture is thus solved from the MLP prediction;
2-2-3. Select the mean-squared-error loss function, as follows:

L = (RGB_NeRF − RGB_groundtruth)²

wherein RGB_NeRF is the RGB color value of the picture rendered by NeRF for a given viewpoint, and RGB_groundtruth is the RGB color value of the real picture at that viewpoint.
5. The method of claim 2, wherein step 2-3 is specifically implemented as follows:
the self-integration module sets up and trains 8 fully independent models, each performing 3D reconstruction on the same dataset, so that the same object or scene is implicitly stored in the different parameters of the different models.
6. The method of claim 5, wherein after each model has been trained for 20000 epochs, 3D reconstruction is performed for the same viewpoint of the same scene, obtaining the rendered RGB color value corresponding to that viewpoint.
7. The method of claim 6, wherein the loss between the rendered RGB color values and the ground-truth RGB color values of the real picture is computed and converted into PSNR.
8. The method of claim 7 wherein the rendered RGB color values from 8 fully independent models are averaged to obtain the final RGB color value reconstructed at the target view.
9. The method of claim 7 in which each completely independent model repeats the operations of steps 1 and 2, rendering different views of the same scene in the same dataset to obtain RGB color values.
10. The method of claim 7 wherein the self-integrating module is embodied primarily in the model testing phase.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210911889.3A CN115457192A (en) | 2022-07-29 | 2022-07-29 | Method for realizing NERF new visual angle synthetic model based on self-integration module |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210911889.3A CN115457192A (en) | 2022-07-29 | 2022-07-29 | Method for realizing NERF new visual angle synthetic model based on self-integration module |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115457192A true CN115457192A (en) | 2022-12-09 |
Family
ID=84296242
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210911889.3A Pending CN115457192A (en) | 2022-07-29 | 2022-07-29 | Method for realizing NERF new visual angle synthetic model based on self-integration module |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115457192A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115731336A (en) * | 2023-01-06 | 2023-03-03 | 粤港澳大湾区数字经济研究院(福田) | Image rendering method, image rendering model generation method and related device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||