CN116664782B - Neural radiation field three-dimensional reconstruction method based on fusion voxels - Google Patents
Neural radiation field three-dimensional reconstruction method based on fusion voxels
- Publication number
- CN116664782B CN116664782B CN202310947466.1A CN202310947466A CN116664782B CN 116664782 B CN116664782 B CN 116664782B CN 202310947466 A CN202310947466 A CN 202310947466A CN 116664782 B CN116664782 B CN 116664782B
- Authority
- CN
- China
- Prior art keywords
- radiation field
- frame
- volume
- voxels
- representing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06N3/08—Learning methods
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- G06V10/82—Arrangements for image or video recognition or understanding using neural networks
- Y02T10/40—Engine management systems
Abstract
The invention discloses a neural radiation field three-dimensional reconstruction method based on fusion voxels, comprising the following steps: extracting the two-dimensional features of an image with a convolutional neural network and generating a depth map; aggregating the two-dimensional features of adjacent images with the features computed by the coarse-stage MLP to generate a local radiation field represented by voxels; fusing, by a recurrent neural network, each local radiation field into the world coordinate system according to learned weights to produce a global radiation field, continuously updating the weights; feeding the generated global radiation field to a NeRF renderer to obtain the coordinates of each point and the nearby point density values; filtering the global radiation field by the depth map and a volume density threshold, then passing it to a volume renderer for volume rendering, and continuously optimizing the loss until training is completed, yielding a three-dimensional reconstruction model. By fusing the local radiation fields generated from each view, the invention strengthens the acquisition of global information; by screening voxels with the depth map and the voxel volume density, it removes redundant parts and improves training efficiency.
Description
Technical Field
The invention belongs to the field of three-dimensional reconstruction, and particularly relates to a neural radiation field three-dimensional reconstruction method based on fusion voxels.
Background
Three-dimensional reconstruction is the technique of recovering a three-dimensional model from extracted picture features, and is widely applied in fields such as virtual reality, medicine and games. With the recent rise of the metaverse in particular, expectations for the technique have grown: ever stronger characterization ability is demanded, so that reconstructed objects appear more vivid and lifelike, ultimately serving the goal of digital twins.
Early multi-view three-dimensional reconstruction matched sparse features with algorithms such as SIFT and ORB to relate the feature points of consecutive frames and thereby estimate the camera pose, recovered the three-dimensional coordinates of the feature points from the camera intrinsics, and finally produced explicit models such as dense point clouds and voxels. However, because of the discreteness of explicit representations, overlaps and artifacts arise during reconstruction; moreover, explicit representation at high resolution sharply increases memory consumption, which limits its application to high-resolution scenes. In 2020, Ben Mildenhall et al. proposed an implicit representation that synthesizes realistic views by combining neural radiance fields (NeRF) with volume rendering; it exhibits strong characterization ability, outputting high-resolution images with a small memory footprint, and unlike other implicit representations it requires no shape prior. In 2021, Xiaoshuai Zhang et al. proposed NeRFusion, a neural framework for large-scale scene reconstruction that fuses radiation fields: it first focuses on local reconstruction, building local radiation fields for the input key frames, and then fuses them into the world scene in frame order. This remedies the shortcoming that neural networks attend only to local information and enhances the global awareness of the system.
However, since NeRF must sample points densely over the whole scene, it requires far more computation than traditional methods, and training a single scene takes tens of hours; yet in most scenes the truly effective points occupy only about one fifth of the volume, and the invalid points in the background or outside the object greatly increase the computation and the training time of NeRF. Furthermore, NeRF produces errors when rendering objects with smooth surfaces: because no constraint is imposed on the object surface during rendering, the reconstructed surface easily becomes pitted, which also causes reconstruction errors. How to overcome NeRF's excessive training time and the pitted-surface phenomenon remains an open problem.
Disclosure of Invention
The invention aims to: provide a neural radiation field three-dimensional reconstruction method based on fusion voxels. Local voxels are generated from the two-dimensional features of the picture and the additional features obtained by a multi-layer perceptron; a recurrent neural network fuses the local voxels into global voxels, and the amount of computation is reduced by screening voxels whose density values lie within a certain range. In addition, a depth map is generated by a multi-view stereo (MVS) method to constrain the points rendered by NeRF, and points outside the object surface are pruned directly to improve surface smoothness.
The technical scheme is as follows: the invention discloses a neural radiation field three-dimensional reconstruction method based on fusion voxels, which is characterized by training a three-dimensional reconstruction model by executing the following steps of;
step 1, inputting an image into a two-dimensional convolutional neural network, acquiring two-dimensional characteristics of the image, and generating a depth map according to the two-dimensional characteristics of the image by using a multi-view stereo MVS method;
step 2, aggregating two-dimensional features of adjacent images in the depth map and additional features calculated based on a coarse-stage MLP (multi-layer perceptron), and generating a local radiation field represented by voxels;
step 3, based on a recurrent neural network, fusing the local radiation field generated by each frame to a world coordinate system according to the weight to generate a global radiation field, and continuously updating the weight;
step 4, inputting the generated global radiation field into a NeRF renderer to obtain coordinates of each point and a nearby point density value, and storing the coordinates and the nearby point density value into each voxel;
step 5, filtering the points in the global radiation field according to the depth map to remove redundant parts in the global radiation field;
step 6, filtering the voxel blocks according to the volume density threshold value, and reserving effective parts in the voxel blocks to obtain an updated global radiation field;
and 7, inputting the updated global radiation field into a volume renderer for volume rendering, calculating the loss function, and continuously optimizing the loss until training is completed, so as to obtain a three-dimensional reconstruction model; the MLP network is retained, and pictures input into the network generate a three-dimensional model of the object or scene, completing realistic new-view synthesis.
The MLP refers to a multi-layer perceptron, multiple MLPs are needed in the whole process of the NeRF three-dimensional reconstruction, wherein the MLP in the coarse stage is responsible for uniform sampling, and the MLP in the fine stage is responsible for sampling near the surface of an object.
Further, the step 1 specifically includes:
The parameters and images $\{I_i\}_{i=1}^{n}$ of $n$ cameras with known parameters are input into the two-dimensional convolutional neural network as a sequence; picture features are extracted from adjacent pictures and matched by disparity to obtain ordered disparity maps of the same size as the original images, from which depth maps in one-to-one pixel correspondence with the original images are generated. The disparity map is converted into a depth map by:

$$Z = \frac{B \cdot f}{d - (c_l - c_r)}$$

where $Z$ is the depth, $B$ the baseline length, $f$ the focal length, $d$ the disparity, and $c_l$, $c_r$ the column coordinates of the principal points of the left and right views.
Further, the step 2 specifically includes the following steps:
step 2.1, using a deep neural network to regress a local neural volume for the image $I_t$ of the $t$-th frame, extracting two-dimensional image features with the multi-view stereo MVS technique, and building a cost volume represented by voxels from these features;
step 2.2, using a two-dimensional convolutional neural network to map the image $I_t$ of the $t$-th frame into a feature map $F_t$ that stores the scene content of the image; the coarse-stage MLP supplies additional features, and the two-dimensional image features together with the additional features computed by the coarse-stage MLP are projected onto the corresponding local volume to obtain the single-frame feature volume:

$$V_t(v) = F_t\big(\pi_t(v)\big) \oplus h_t(v)$$

where $V_t(v)$ is the voxel feature of the $t$-th frame centred at voxel $v$, $F_t(\pi_t(v))$ is the projection of the two-dimensional feature of the $t$-th frame corresponding to the voxel centre $v$, $h_t(v)$ is the additional feature of the $t$-th frame view computed by the MLP, and $\oplus$ denotes feature concatenation;
step 2.3, aggregating the feature volumes of multiple frames using the mean and variance of the voxel features to regress the local volume $V_t^{l}$ of the $t$-th frame, which represents the local radiation field; the mean fuses the appearance information of multiple views, while the variance assists geometric reasoning:

$$V_t^{l} = G\Big(\operatorname{Mean}_{k\in\mathcal{N}_t}\big(V_k(v)\big),\ \operatorname{Var}_{k\in\mathcal{N}_t}\big(V_k(v)\big)\Big)$$

where $V_t^{l}$ is the local radiation field, $G$ a deep neural network, $\operatorname{Mean}$ the mean, $t$ the frame index, $\mathcal{N}_t$ the set of neighbouring views aggregated at frame $t$, $V_k$ the voxel features of frame $k$, and $\operatorname{Var}$ the variance.
Further, the step 3 specifically includes: at each frame $t$, the generated local radiation field $V_t^{l}$ is cyclically fused with the global radiation field $G_{t-1}$ produced by the previous frame, continuously updating the global radiation field $G_t$; a gated recurrent unit learns and fuses the local volume of each frame at each update:

$$z_t = \sigma\big(W_z\,[\,G_{t-1},\ V_t^{l}\,]\big)$$
$$r_t = \sigma\big(W_r\,[\,G_{t-1},\ V_t^{l}\,]\big)$$
$$\tilde G_t = \tanh\big(W\,[\,r_t \odot G_{t-1},\ V_t^{l}\,]\big)$$
$$G_t = (1-z_t)\odot G_{t-1} + z_t \odot \tilde G_t$$

where $z_t$ is the update gate and $W_z$ the neural network controlling the update gate, $r_t$ is the reset gate and $W_r$ the neural network controlling the reset gate, $\tilde G_t$ is the candidate global radiation field after fusing the current frame, and $W$ is the neural network controlling the sequential update of the whole model, used for sequentially updating the global reconstruction; $\odot$ denotes element-wise multiplication; $z_t$ and $r_t$ respectively control the contributions of the current frame's local radiation field $V_t^{l}$ and the previous frame's global radiation field $G_{t-1}$ during fusion; only the voxels where $V_t^{l}$ and $G_{t-1}$ coincide are updated, the other voxels remain unchanged.
Further, the step 4 specifically includes: the generated global radiation field is placed into the NeRF renderer to obtain the point density $\sigma$ and radiance $c$ of any point; the formula for regressing the volume density and radiance is:

$$(\sigma,\ c) = F_\Theta(x,\ y,\ z,\ \theta,\ \varphi)$$

where $x$, $y$ and $z$ are the horizontal, vertical and longitudinal coordinates of the point, $\theta$ is the azimuth angle and $\varphi$ the polar viewing angle.

After the points with point density $\sigma = 0$ are removed, the volume density in each voxel is determined and stored in the corresponding voxel, and the volume density value is dynamically updated according to:

$$\sigma_t^{g}(v) = (1-w)\,\sigma_{t-1}^{g}(v) + w\,\sigma_t^{l}(v)$$

where $\sigma_t^{g}(v)$ is the volume density of the voxels of the $t$-th frame global radiation field, $w$ is the weight controlling the update, and $\sigma_t^{l}(v)$ is the volume density of the local radiation field voxels generated from the $t$-th picture.
Further, the step 5 specifically includes: removing the part of the neural radiation field outside the object surface according to the depth map, and retaining the part inside the object surface.
Further, the step 6 specifically includes: removing the voxel blocks whose volume density falls below the threshold in the neural radiation field, and retaining the voxel blocks whose volume density lies within the required range.
Further, the step 7 specifically includes: repeating steps 2 to 6, putting the global radiation field filtered by the volume density threshold and the depth map into the volume renderer for volume rendering, obtaining the final rendered colour as the weighted sum of the colours of the points sampled on each ray, and computing and continuously optimizing the loss from the volume rendering result; the colour is computed by volume rendering as:

$$\alpha_i = 1 - \exp(-\sigma_i\,\delta_i)$$
$$T_i = \exp\Big(-\sum_{j=1}^{i-1}\sigma_j\,\delta_j\Big)$$
$$w_i = T_i\,\alpha_i$$
$$\hat C(r) = \sum_{i=1}^{N} T_i\,\alpha_i\,c_i$$

where $\alpha_i$ is the opacity function of ray sampling point $i$, $\sigma_i$ the point density of sampling point $i$, $\delta_i$ the interval between sampling point $i$ and the next, $T_i$ the transmittance function of sampling point $i$, accumulated from the opacities of all sampling points before point $i$, $w_i$ the probability density of sampling point $i$, $\hat C(r)$ the final rendered colour of ray $r$, $N$ the number of sampling points on ray $r$, and $c_i$ the radiance of sampling point $i$; for NeRF, the difference between the rendered colour and the ground truth after volume rendering serves as the loss function, and the loss function formula is as follows:
$$\mathcal{L} = \sum_{r\in\mathcal{R}} \big\lVert \hat C(r) - C(r) \big\rVert_2^{2}$$

where $\mathcal{L}$ is the loss function, $\mathcal{R}$ the set of rays, and $C(r)$ the true colour of ray $r$.
The beneficial effects are that: compared with the prior art, the invention has the following remarkable advantages:
1. The neural radiation field three-dimensional reconstruction method based on fusion voxels generates coarse dense voxels for each view according to the point density, and reduces the amount of computation by screening voxels whose density values lie within a certain range.
2. The implicit characterization method for multi-view fusion is provided, the acquisition of global information is enhanced by fusing local radiation fields generated by each view, redundant parts are reduced by screening voxels with the volume density within a required range, the calculated amount is reduced, and the training efficiency is improved.
3. A depth map generated by the MVS method constrains the points rendered by NeRF, and points outside isosurface 0 (the object surface) are pruned directly, improving surface smoothness.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The technical scheme of the invention is further described below with reference to the accompanying drawings.
The invention aims to provide a neural radiation field three-dimensional reconstruction method based on fusion voxels. The local voxel fields are generated according to the picture characteristics by utilizing the pre-trained network model, the generated local voxel fields are fused according to weights in sequence, so that a complete global radiation field is formed, and the calculated amount is reduced by screening voxels with density values within a certain range. In addition, a depth map is generated by a multi-view stereo (MVS) method to limit NeRF-rendered points, and pruning operation is directly performed on points outside the surface of the object to improve surface smoothness.
The technical scheme is as follows: the invention discloses a neural radiation field three-dimensional reconstruction method based on fusion voxels, which is characterized by training a three-dimensional reconstruction model by executing the following steps of;
step 1: inputting an image into a two-dimensional convolutional neural network, acquiring two-dimensional characteristics of the image, generating a corresponding depth map according to a parallax map of the image by using a multi-view stereo MVS method, wherein the image can use a data set, data in the data set has information such as shooting angle, camera parameters and the like of each picture, and a video shot by the user can also be put into a collmap to generate a corresponding camera pose, parameters and the like. Parameters of n Zhang Yizhi cameraImage +.>Inputting a two-dimensional convolutional neural network as a sequence, extracting picture features from adjacent pictures, performing parallax matching on the picture features to obtain orderly parallax images with the same size as the original image, and generating depth images corresponding to the original image pixels one by one according to the parallax images; disparity map conversionThe formula for the depth map is as follows:
;
wherein For depth->For baseline length,/->For focal length->For parallax (I)> and />Column coordinates of the main points of the left and right views.
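The disparity-to-depth conversion above can be sketched as follows; the function name, the test values and the principal-point arguments are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def disparity_to_depth(disparity, baseline, focal_length,
                       cx_left=0.0, cx_right=0.0, eps=1e-6):
    """Z = B * f / (d - (c_l - c_r)); invalid (non-positive) disparities map to 0."""
    d = disparity - (cx_left - cx_right)
    # Guard against division by zero; mark invalid pixels with depth 0.
    return np.where(d > eps, baseline * focal_length / np.maximum(d, eps), 0.0)

disp = np.array([[8.0, 4.0],
                 [2.0, 0.0]])                      # toy rectified disparity map
depth = disparity_to_depth(disp, baseline=0.1, focal_length=800.0)
print(depth)  # 0.1*800/8=10, /4=20, /2=40, invalid -> 0
```

Nearer objects have larger disparity and hence smaller depth, as the toy map shows.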
Step 2: a local radiation field represented by voxels is generated from the depth maps of adjacent images and the additional features computed by the coarse-stage MLP.
Step 2.1, a deep neural network regresses a local neural volume for the image $I_t$ of the $t$-th frame; two-dimensional image features are extracted with the multi-view stereo MVS technique, and a cost volume represented by voxels is built from these features;
step 2.2, a two-dimensional convolutional neural network maps the image $I_t$ of the $t$-th frame into a feature map $F_t$ that stores the scene content of the image; the coarse-stage MLP supplies additional features, and the two-dimensional image features together with the additional features computed by the coarse-stage MLP are projected onto the corresponding local volume to obtain the single-frame feature volume:

$$V_t(v) = F_t\big(\pi_t(v)\big) \oplus h_t(v)$$

where $V_t(v)$ is the voxel feature of the $t$-th frame centred at voxel $v$, $F_t(\pi_t(v))$ is the projection of the two-dimensional feature of the $t$-th frame corresponding to the voxel centre $v$, $h_t(v)$ is the additional feature of the $t$-th frame view computed by the MLP, and $\oplus$ denotes feature concatenation;
step 2.3, the feature volumes of multiple frames are aggregated using the mean and variance of the voxel features to regress the local volume $V_t^{l}$ of the $t$-th frame, which represents the local radiation field; the mean fuses the appearance information of multiple views, while the variance assists geometric reasoning:

$$V_t^{l} = G\Big(\operatorname{Mean}_{k\in\mathcal{N}_t}\big(V_k(v)\big),\ \operatorname{Var}_{k\in\mathcal{N}_t}\big(V_k(v)\big)\Big)$$

where $V_t^{l}$ is the local radiation field, $G$ a deep neural network, $\operatorname{Mean}$ the mean, $t$ the frame index, $\mathcal{N}_t$ the set of neighbouring views aggregated at frame $t$, $V_k$ the voxel features of frame $k$, and $\operatorname{Var}$ the variance.
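The mean/variance aggregation of step 2.3 can be sketched as below; the deep network $G$ that regresses the local radiation field from these statistics is omitted, and all names and tensor shapes are assumptions:

```python
import numpy as np

def aggregate_local_volume(per_frame_volumes):
    """per_frame_volumes: (K, C, X, Y, Z) voxel features from K neighbouring views.
    Returns (2C, X, Y, Z): element-wise mean (fuses appearance across views)
    stacked with element-wise variance (a cue for geometric reasoning)."""
    mean = per_frame_volumes.mean(axis=0)
    var = per_frame_volumes.var(axis=0)
    return np.concatenate([mean, var], axis=0)

rng = np.random.default_rng(0)
vols = rng.normal(size=(4, 8, 16, 16, 16))  # 4 views, 8 channels, 16^3 voxels
agg = aggregate_local_volume(vols)
print(agg.shape)  # (16, 16, 16, 16)
```

In the patent the concatenated statistics would then be fed to $G$ to produce $V_t^{l}$.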
Step 3: based on a recurrent neural network, the local radiation field generated by each frame is fused into the world coordinate system according to the weights to produce the global radiation field, and the weights are continuously updated. At each frame $t$, the generated local radiation field $V_t^{l}$ is cyclically fused with the global radiation field $G_{t-1}$ produced by the previous frame, continuously updating the global radiation field $G_t$; a gated recurrent unit learns and fuses the local volume of each frame at each update:

$$z_t = \sigma\big(W_z\,[\,G_{t-1},\ V_t^{l}\,]\big)$$
$$r_t = \sigma\big(W_r\,[\,G_{t-1},\ V_t^{l}\,]\big)$$
$$\tilde G_t = \tanh\big(W\,[\,r_t \odot G_{t-1},\ V_t^{l}\,]\big)$$
$$G_t = (1-z_t)\odot G_{t-1} + z_t \odot \tilde G_t$$

where $z_t$ is the update gate and $W_z$ the neural network controlling the update gate, $r_t$ is the reset gate and $W_r$ the neural network controlling the reset gate, $\tilde G_t$ is the candidate global radiation field after fusing the current frame, and $W$ is the neural network controlling the sequential update of the whole model, used for sequentially updating the global reconstruction; $\odot$ denotes element-wise multiplication; $z_t$ and $r_t$ respectively control the contributions of the current frame's local radiation field $V_t^{l}$ and the previous frame's global radiation field $G_{t-1}$ during fusion; only the voxels where $V_t^{l}$ and $G_{t-1}$ coincide are updated, the other voxels remain unchanged.
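A minimal sketch of the gated-recurrent fusion step, assuming per-voxel feature vectors and plain weight matrices in place of the patent's (unspecified) gate networks; in practice only the voxels covered by the current frame would be passed in:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_fuse(g_prev, v_local, w_z, w_r, w):
    """One GRU fusion step per voxel:
    z = sigmoid([G_{t-1}, V_t] @ w_z)      update gate
    r = sigmoid([G_{t-1}, V_t] @ w_r)      reset gate
    G~ = tanh([r * G_{t-1}, V_t] @ w)      candidate global field
    G_t = (1 - z) * G_{t-1} + z * G~"""
    x = np.concatenate([g_prev, v_local], axis=-1)
    z = sigmoid(x @ w_z)
    r = sigmoid(x @ w_r)
    cand = np.tanh(np.concatenate([r * g_prev, v_local], axis=-1) @ w)
    return (1.0 - z) * g_prev + z * cand

rng = np.random.default_rng(1)
c = 8                                     # feature channels per voxel
g = np.zeros((100, c))                    # 100 overlapping voxels, empty global field
v = rng.normal(size=(100, c))             # current frame's local field
w_z, w_r, w = (rng.normal(scale=0.1, size=(2 * c, c)) for _ in range(3))
g = gru_fuse(g, v, w_z, w_r, w)
print(g.shape)  # (100, 8)
```

With an empty global field the output reduces to $z_t \odot \tilde G_t$, so every fused feature is bounded by the tanh range.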
Step 4: the generated global radiation field is input into the NeRF renderer to obtain the coordinates of each point in the global radiation field and the nearby point density values; the points with point density 0 are removed, after which the volume density values of the voxels are obtained and stored in each voxel. At every view angle, NeRF samples points on the incoming rays of the radiation field and uses an MLP to obtain the volume density $\sigma$ and radiance $c$ of any point:

$$(\sigma,\ c) = F_\Theta(x,\ y,\ z,\ \theta,\ \varphi)$$

where $x$, $y$ and $z$ are the horizontal, vertical and longitudinal coordinates of the point, $\theta$ is the azimuth angle and $\varphi$ the polar viewing angle.

After the points with point density $\sigma = 0$ are removed, the volume density in each voxel is determined, stored in the corresponding voxel and dynamically updated, so that later uses of the volume density query the stored value instead of recomputing it with the MLP; the update formula is:

$$\sigma_t^{g}(v) = (1-w)\,\sigma_{t-1}^{g}(v) + w\,\sigma_t^{l}(v)$$

where $\sigma_t^{g}(v)$ is the volume density of the voxels of the $t$-th frame global radiation field, $w$ is the weight controlling the update, and $\sigma_t^{l}(v)$ is the volume density of the local radiation field voxels generated from the $t$-th picture.
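The per-voxel density update is a simple exponential moving average; a sketch with an illustrative weight $w$:

```python
def update_voxel_density(sigma_global_prev, sigma_local, w=0.2):
    """sigma_t^g = (1 - w) * sigma_{t-1}^g + w * sigma_t^l, applied per voxel.
    The stored value is queried at render time instead of re-running the MLP."""
    return (1.0 - w) * sigma_global_prev + w * sigma_local

s = update_voxel_density(10.0, 20.0, w=0.2)
print(s)  # 0.8*10 + 0.2*20 = 12.0
```

A small $w$ favours the accumulated global estimate; $w = 1$ would overwrite it with the current frame.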
Step 5: the part of the neural radiation field outside the object surface is removed according to the depth map, and the part inside the object surface is retained. The depth map information limits the positions of the NeRF sampling points during rendering: points outside the object surface are ignored, and only the information of sampling points meeting the requirement is stored during rendering.
Step 6: voxel blocks whose volume density falls below the threshold are removed from the neural radiation field, and voxel blocks whose volume density lies within the required range are retained; during NeRF rendering, points are sampled only in voxels whose volume density is within the required range.
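The volume-density screening of step 6 amounts to a per-voxel range mask; a sketch with assumed threshold values:

```python
import numpy as np

def density_mask(sigma_grid, low, high):
    """True for voxels whose stored volume density lies in [low, high];
    NeRF then samples points only inside the masked voxels."""
    return (sigma_grid >= low) & (sigma_grid <= high)

sigma = np.array([0.0, 0.05, 1.5, 80.0, 1e4])   # toy per-voxel densities
mask = density_mask(sigma, low=0.1, high=1e3)
print(mask.tolist())  # [False, False, True, True, False]
```

Empty voxels (near-zero density) and degenerate outliers are both excluded, which is what shrinks the sampling workload.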
Step 7: steps 2 to 6 are repeated; the global radiation field filtered by the volume density threshold and the depth map is put into the volume renderer for volume rendering, the final rendered colour is obtained as the weighted sum of the colours of the points sampled on each ray, and the loss is computed from the volume rendering result and continuously optimized. The colour is computed by volume rendering as:

$$\alpha_i = 1 - \exp(-\sigma_i\,\delta_i)$$
$$T_i = \exp\Big(-\sum_{j=1}^{i-1}\sigma_j\,\delta_j\Big)$$
$$w_i = T_i\,\alpha_i$$
$$\hat C(r) = \sum_{i=1}^{N} T_i\,\alpha_i\,c_i$$

where $\alpha_i$ is the opacity function of ray sampling point $i$, $\sigma_i$ the point density of sampling point $i$, $\delta_i$ the interval between sampling point $i$ and the next, $T_i$ the transmittance function of sampling point $i$, accumulated from the opacities of all sampling points before point $i$, $w_i$ the probability density of sampling point $i$, $\hat C(r)$ the final rendered colour of ray $r$, $N$ the number of sampling points on ray $r$, and $c_i$ the radiance of sampling point $i$; for NeRF, the difference between the rendered colour and the ground truth after volume rendering serves as the loss function, and the loss function formula is as follows:
$$\mathcal{L} = \sum_{r\in\mathcal{R}} \big\lVert \hat C(r) - C(r) \big\rVert_2^{2}$$

where $\mathcal{L}$ is the loss function, $\mathcal{R}$ the set of rays, and $C(r)$ the true colour of ray $r$.
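The quadrature volume rendering and photometric loss above can be sketched as follows for a single ray; `render_ray`, `photometric_loss` and the sample values are illustrative names, not from the patent:

```python
import numpy as np

def render_ray(sigmas, deltas, colors):
    """alpha_i = 1 - exp(-sigma_i * delta_i)
    T_i     = exp(-sum_{j<i} sigma_j * delta_j)   (transmittance)
    C_hat   = sum_i T_i * alpha_i * c_i           (rendered colour)"""
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Transmittance before each sample: shift the cumulative optical depth by one.
    accum = np.concatenate([[0.0], np.cumsum(sigmas * deltas)[:-1]])
    weights = np.exp(-accum) * alphas
    return weights @ colors

def photometric_loss(rendered, truth):
    """Squared error between rendered and true ray colours."""
    return float(np.sum((rendered - truth) ** 2))

sigmas = np.array([0.0, 5.0, 50.0])   # point densities along one ray
deltas = np.full(3, 0.1)              # sampling intervals
colors = np.eye(3)                    # RGB of the three sampling points
c_hat = render_ray(sigmas, deltas, colors)
print(np.round(c_hat, 4))
```

The empty first sample contributes nothing, and the dense last sample is attenuated by the transmittance accumulated in front of it.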
The experiments were carried out under Windows 10 with an i7-12700F processor, 32 GB of memory and an RTX 3080 12G graphics card. The experimental performance of the method is as follows:
in contrast to the neural radiation field reconstruction model IBRNet proposed by the paper IBRNet: learning multi-view image-based reconstruction in 2021, the model NeRF proposed by Ben Mildnhall et al in paper NeRF: representing scenes as neural radiance fields forview synthesis in 2020, and the model NSVF proposed by Lingjie Liu et al in paper Neural sparse voxel fields in 2020, 100 scenes were randomly selected as training data in the Scannet dataset. The peak signal-to-noise ratio PSNR, the structural similarity SSIM and the perception loss LPIPS are used as main indexes for evaluation, wherein the higher the PSNR value is, the less noise is represented, the higher the SSIM value is, the higher the structural similarity is represented, and the lower the LPIPS is, the better the perception effect of people is. Since NSVF and NeRF are both optimized for a single scenario, the experiment only shows experimental results of scenario-by-scenario optimization for fairness.
The above cited documents are as follows:
[1] Wang Q, Wang Z, Genova K, et al. IBRNet: Learning multi-view image-based rendering[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 4690-4699.
[2] Mildenhall B, Srinivasan P P, Tancik M, et al. NeRF: Representing scenes as neural radiance fields for view synthesis[J]. Communications of the ACM, 2021, 65(1): 99-106.
[3] Liu L, Gu J, Zaw Lin K, et al. Neural sparse voxel fields[J]. Advances in Neural Information Processing Systems, 2020, 33: 15651-15663.
Table 1 presents the results of the tests performed on the ScanNet dataset. From the experimental results we can see that the method performs strongly on large real-scene datasets.
Table 1 quantitative comparison on ScanNet dataset
Table 2 presents the results of the tests performed on the NeRF synthetic dataset. From the experimental results we can see that the method also performs well on synthetic datasets.
Table 2 quantitative comparisons on NeRF synthetic datasets
Claims (5)
1. A neural radiation field three-dimensional reconstruction method based on fusion voxels, which is characterized in that a three-dimensional reconstruction model is trained by executing the following steps;
step 1, inputting an image into a two-dimensional convolutional neural network, acquiring two-dimensional characteristics of the image, and generating a depth map;
step 2, aggregating two-dimensional features of adjacent images in the depth map and additional features calculated based on the coarse-stage MLP to generate a local radiation field represented by voxels;
step 3, based on a recurrent neural network, fusing the local radiation field generated by each frame to a world coordinate system according to the weight to generate a global radiation field, and continuously updating the weight;
step 4, inputting the generated global radiation field into a NeRF renderer to obtain coordinates of each point and a nearby point density value, and storing the coordinates and the nearby point density value into each voxel;
step 5, filtering the points in the global radiation field according to the depth map to remove redundant parts of the global radiation field;
step 6, filtering the voxel blocks according to the volume density threshold value, and reserving effective parts in the voxel blocks to obtain an updated global radiation field;
step 7, inputting the updated global radiation field into a volume renderer for volume rendering, calculating the loss function, and continuously optimizing the loss until training is completed to obtain the three-dimensional reconstruction model; the MLP (multi-layer perceptron) network of the model is retained, and inputting pictures into this network generates a three-dimensional model of an object or scene, completing new-view synthesis;
the step 2 specifically comprises the following steps:
step 2.1, using a deep neural network to regress a local neural volume for the image $I_t$ of frame $t$, extracting two-dimensional image features with the multi-view stereo (MVS) technology, and establishing a voxel-represented cost volume according to the features;
step 2.2, using a two-dimensional convolutional neural network to map the image $I_t$ of frame $t$ into a feature map $F_t$ that stores the scene content of the image, obtaining additional features with the coarse-stage MLP, and projecting the two-dimensional image features and the additional features computed by the coarse-stage MLP onto the corresponding local volume to obtain a single-frame feature volume, generated as:

$$V_t(v) = F_t\big(\pi_t(v)\big) \oplus f_t(v) ;$$

where $V_t(v)$ is the voxel feature of frame $t$ centered at voxel $v$, $F_t(\pi_t(v))$ is the corresponding two-dimensional feature of frame $t$ projected at voxel center $v$, $f_t(v)$ is the additional feature computed by the MLP for the view of frame $t$, and $\oplus$ denotes feature concatenation;
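The projection-and-concatenation step for a single frame can be sketched as follows (a toy NumPy sketch under assumed pinhole conventions; the function name and nearest-pixel sampling are illustrative choices, not from the patent):

```python
import numpy as np

def single_frame_feature_volume(feat_map, K, pose_w2c, voxel_centers, extra_feats):
    """Project each voxel center into the frame with intrinsics K and a 3x4
    world-to-camera pose, sample the 2-D feature at the nearest pixel, and
    concatenate the MLP's extra feature: V_t(v) = F_t(pi_t(v)) (+) f_t(v)."""
    n = voxel_centers.shape[0]
    homo = np.concatenate([voxel_centers, np.ones((n, 1))], axis=1)
    cam = (pose_w2c @ homo.T).T[:, :3]               # voxel centers in camera space
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]                      # perspective divide -> pixels
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, feat_map.shape[1] - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, feat_map.shape[0] - 1)
    sampled = feat_map[v, u]                         # F_t(pi_t(v)), one row per voxel
    return np.concatenate([sampled, extra_feats], axis=1)  # feature concatenation

# One voxel directly in front of an identity camera samples pixel (0, 0).
feat_map = np.array([[[5.0, 6.0], [0.0, 0.0]],
                     [[0.0, 0.0], [0.0, 0.0]]])      # (H=2, W=2, C=2)
K = np.eye(3)
pose = np.eye(4)[:3]                                 # world-to-camera, 3x4
centers = np.array([[0.0, 0.0, 1.0]])
extra = np.array([[7.0]])
vol = single_frame_feature_volume(feat_map, K, pose, centers, extra)
```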
step 2.3, aggregating the feature volumes of multiple frames by the mean and variance of the voxel features to regress the local volume $V_t^{\mathrm{local}}$ of frame $t$, which represents the local radiation field, where the mean can fuse the appearance information of multiple views and the variance can help with geometric reasoning; the local radiation field is generated as:

$$V_t^{\mathrm{local}} = \psi\big(\mathrm{Mean}(\{V_t^{k}\}_{k \in K_t}),\ \mathrm{Var}(\{V_t^{k}\}_{k \in K_t})\big) ;$$

where $V_t^{\mathrm{local}}$ denotes the local radiation field, $\psi$ denotes a deep neural network, $\mathrm{Mean}$ denotes the mean, $t$ denotes frame $t$, $K_t$ denotes the multiple neighboring views aggregated at frame $t$, $V_t^{k}$ denotes the voxel features of frame $t$ from view $k$, and $\mathrm{Var}$ denotes the variance;
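The mean/variance aggregation can be sketched as follows (illustrative NumPy sketch; the network $\psi$ that consumes the concatenation is omitted):

```python
import numpy as np

def aggregate_views(view_volumes):
    """Aggregate per-view voxel features: the mean fuses appearance across
    views, the variance exposes photo-consistency for geometric reasoning.
    Returns their concatenation, the input a network psi would regress from."""
    stack = np.stack(view_volumes)        # (K views, n voxels, feat)
    mean = stack.mean(axis=0)
    var = stack.var(axis=0)
    return np.concatenate([mean, var], axis=-1)

# Two views that agree perfectly -> zero variance (a photo-consistent voxel).
a = np.array([[1.0, 2.0]])
b = np.array([[1.0, 2.0]])
agg = aggregate_views([a, b])
```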
the step 3 is specifically: at each frame $t$, the generated local radiation field $V_t^{\mathrm{local}}$ is cyclically fused with the global radiation field $G_{t-1}$ generated at the previous frame to continuously update the global radiation field $G_t$; during the update, a gated recurrent unit learns to fuse the local volume of each frame, and its specific formulas are:

$$z_t = \operatorname{sigmoid}\big(\mathrm{NN}_z([G_{t-1}, V_t^{\mathrm{local}}])\big) ;$$
$$r_t = \operatorname{sigmoid}\big(\mathrm{NN}_r([G_{t-1}, V_t^{\mathrm{local}}])\big) ;$$
$$\tilde{G}_t = \tanh\big(\mathrm{NN}_g([r_t \odot G_{t-1}, V_t^{\mathrm{local}}])\big) ;$$
$$G_t = (1 - z_t) \odot G_{t-1} + z_t \odot \tilde{G}_t ;$$

where $z_t$ is the update gate and $\mathrm{NN}_z$ is the neural network controlling the update gate, $r_t$ is the reset gate and $\mathrm{NN}_r$ is the neural network controlling the reset gate, $\tilde{G}_t$ is the candidate global radiation field after fusing the current frame, and $\mathrm{NN}_g$ is the neural network controlling the sequential update of the whole model, used for sequentially updating the global reconstruction; $\odot$ denotes element-wise multiplication; $z_t$ and $r_t$ respectively control the contribution, during fusion, of the local radiation field $V_t^{\mathrm{local}}$ of the current frame and the global radiation field $G_{t-1}$ of the previous frame; during fusion the update is applied only to voxels where the local radiation field $V_t^{\mathrm{local}}$ of the current frame and the global radiation field $G_{t-1}$ of the previous frame overlap, while all other voxels remain unchanged;
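The gated-recurrent-unit fusion of step 3 can be sketched as follows (a toy NumPy sketch; the plain matrices `Wz`, `Wr`, `Wg` stand in for the gate networks and are illustrative assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_fuse(g_prev, local, Wz, Wr, Wg):
    """One GRU step fusing the current frame's local radiation field into
    the running global one: z gates how much is updated, r gates how much
    of the old global field feeds the candidate."""
    x = np.concatenate([g_prev, local])
    z = sigmoid(Wz @ x)                                     # update gate
    r = sigmoid(Wr @ x)                                     # reset gate
    g_cand = np.tanh(Wg @ np.concatenate([r * g_prev, local]))
    return (1.0 - z) * g_prev + z * g_cand                  # fused global field

# With the update gate forced shut (large negative weights on positive
# inputs), the global field is left unchanged by the new frame.
g_prev = np.array([1.0, 1.0])
local = np.array([0.5, 0.5])
Wz = -100.0 * np.ones((2, 4))
Wr = np.zeros((2, 4))
Wg = np.zeros((2, 4))
g_next = gru_fuse(g_prev, local, Wz, Wr, Wg)
```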
the step 4 is specifically: the generated global radiation field is put into a NeRF renderer to obtain, for any point, the point density $\sigma$ and the radiance value $c$; the formula for regressing the volume density and radiance is:

$$(\sigma, c) = \mathrm{MLP}(x, y, z, \theta, \varphi) ;$$

where $x$, $y$ and $z$ respectively denote the horizontal, vertical and longitudinal coordinates of the point, $\theta$ denotes the azimuth angle, and $\varphi$ denotes the polar viewing angle;
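A toy stand-in for such a density-and-radiance MLP is sketched below (illustrative single-hidden-layer network; the real model's architecture is not specified here):

```python
import numpy as np

def radiance_mlp(point, angles, W1, b1, W2, b2):
    """Map (x, y, z) position and (theta, phi) viewing angles to a
    non-negative point density sigma and an RGB radiance in [0, 1]."""
    inp = np.concatenate([point, angles])     # (5,) = x, y, z, theta, phi
    h = np.maximum(0.0, W1 @ inp + b1)        # ReLU hidden layer
    out = W2 @ h + b2                         # 4 raw outputs
    sigma = np.maximum(0.0, out[0])           # density kept non-negative
    rgb = 1.0 / (1.0 + np.exp(-out[1:]))      # color squashed to [0, 1]
    return sigma, rgb

# With zero weights the network outputs zero density and mid-gray color.
W1, b1 = np.zeros((8, 5)), np.zeros(8)
W2, b2 = np.zeros((4, 8)), np.zeros(4)
sigma, rgb = radiance_mlp(np.zeros(3), np.zeros(2), W1, b1, W2, b2)
```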
after removing the points of zero point density $\sigma$, the volume density in each voxel is determined and stored in the corresponding voxel, and the volume density value is dynamically updated according to the following formula:

$$\sigma_t = (1 - w)\,\sigma_{t-1} + w\,\sigma_t^{\,n} ;$$

where $\sigma_t$ denotes the volume density of a voxel of the frame-$t$ global radiation field, $w$ is used to control the update weight, and $\sigma_t^{\,n}$ denotes the volume density of the voxel of the local radiation field of frame $t$ generated from the $n$-th picture.
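The weighted density update is a simple exponential blend; a minimal sketch (the weight value 0.2 is an assumed example, not from the patent):

```python
def update_voxel_density(sigma_global_prev, sigma_local, w=0.2):
    """Blend the previous global density with the density observed in the
    current frame's local field: sigma_t = (1 - w)*sigma_{t-1} + w*sigma_local."""
    return (1.0 - w) * sigma_global_prev + w * sigma_local

# One step: old density 1.0, new observation 0.0, weight 0.2 -> 0.8.
s1 = update_voxel_density(1.0, 0.0)

# Repeated observations of 0.0 drive the stored density toward 0.
s = 1.0
for _ in range(100):
    s = update_voxel_density(s, 0.0)
```

Repeated application converges geometrically toward the newly observed value, which is what lets the global field forget stale densities.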
2. The method for three-dimensional reconstruction of a neural radiation field based on fused voxels according to claim 1, wherein step 1 specifically comprises:
inputting the $n$ images $I$ taken by cameras with known parameters as a sequence into a two-dimensional convolutional neural network, extracting picture features from adjacent pictures, performing disparity matching on the picture features to obtain ordered disparity maps of the same size as the original images, and generating, from the disparity maps, depth maps whose pixels correspond one-to-one to the original images; the formula for converting a disparity map into a depth map is:

$$D = \frac{f\,B}{d - (c_l - c_r)} ;$$

where $D$ is the depth, $B$ is the baseline length, $f$ is the focal length, $d$ is the disparity, and $c_l$ and $c_r$ are the column coordinates of the principal points of the left and right views.
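The disparity-to-depth conversion is a one-liner; a minimal sketch (the function name is illustrative):

```python
def disparity_to_depth(d, baseline, focal, c_left=0.0, c_right=0.0):
    """Depth from a rectified stereo pair: D = f*B / (d - (c_l - c_r)),
    correcting the raw disparity for the principal-point column offset."""
    d_corr = d - (c_left - c_right)
    if d_corr <= 0:
        raise ValueError("non-positive corrected disparity has no finite depth")
    return focal * baseline / d_corr

# 10 px disparity, 0.1 m baseline, 500 px focal length -> 5 m depth.
z = disparity_to_depth(10.0, 0.1, 500.0)
```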
3. The method for three-dimensional reconstruction of a neural radiation field based on fusion voxels according to claim 1, wherein step 5 is specifically: removing the part of the neural radiation field outside the object surface according to the depth map, and retaining the part inside the object surface.
4. The method for three-dimensional reconstruction of a neural radiation field based on fusion voxels according to claim 1, wherein step 6 is specifically: removing voxel blocks in the neural radiation field whose volume density falls below the threshold, and retaining the voxel blocks whose volume density lies within the required range.
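The density-threshold filtering of step 6 amounts to a range mask over voxel densities; a minimal sketch (illustrative function name and bounds):

```python
import numpy as np

def filter_voxels(densities, low, high):
    """Keep only voxel blocks whose volume density lies inside the required
    [low, high] range; everything else is discarded as empty or noisy."""
    mask = (densities >= low) & (densities <= high)
    return np.where(mask)[0], densities[mask]

# Densities 0.0 and 9.0 fall outside [0.3, 5.0] and are removed.
idx, kept = filter_voxels(np.array([0.0, 0.5, 2.0, 9.0]), 0.3, 5.0)
```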
5. The method for three-dimensional reconstruction of a neural radiation field based on fused voxels according to claim 1, wherein step 7 is specifically: repeating steps 2 to 6, putting the global radiation field filtered by the volume density threshold and the depth map into a volume renderer for volume rendering, weighting and summing the colors of the points sampled on each ray to obtain the final rendered color, and calculating the loss from the volume rendering result and continuously optimizing; the formulas by which volume rendering computes the color are:

$$\alpha_i = 1 - \exp(-\sigma_i \delta_i) ;$$
$$T_i = \prod_{j=1}^{i-1} (1 - \alpha_j) ;$$
$$w_i = T_i\,\alpha_i ;$$
$$\hat{C}(r) = \sum_{i=1}^{N} w_i\, c_i ;$$

where $\alpha_i$ denotes the opacity function of ray sampling point $i$, $\sigma_i$ the point density at sampling point $i$, $\delta_i$ the interval of sampling point $i$, $T_i$ the transmittance function of sampling point $i$, accumulated from the opacity $\alpha_j$ of all sampling points $j$ before point $i$, $w_i$ the probability density function of sampling point $i$, $\hat{C}(r)$ the final rendered color of ray $r$, $N$ the number of samples on ray $r$, and $c_i$ the radiance value of sampling point $i$; for NeRF, the difference between the rendered color and the ground truth after volume rendering is used as the loss function:

$$\mathcal{L} = \sum_{r \in R} \left\lVert \hat{C}(r) - C(r) \right\rVert_2^2 ;$$

where $\mathcal{L}$ denotes the loss function, $R$ is the set of rays, and $C(r)$ is the true color of ray $r$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310947466.1A CN116664782B (en) | 2023-07-31 | 2023-07-31 | Neural radiation field three-dimensional reconstruction method based on fusion voxels |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116664782A CN116664782A (en) | 2023-08-29 |
CN116664782B true CN116664782B (en) | 2023-10-13 |