CN116664782B - Neural radiation field three-dimensional reconstruction method based on fusion voxels - Google Patents

Neural radiation field three-dimensional reconstruction method based on fusion voxels

Info

Publication number
CN116664782B
CN116664782B (application number CN202310947466.1A)
Authority
CN
China
Prior art keywords
radiation field
frame
volume
voxels
representing
Prior art date
Legal status
Active
Application number
CN202310947466.1A
Other languages
Chinese (zh)
Other versions
CN116664782A (en)
Inventor
张小瑞
陈超
孙伟
张小娜
Current Assignee
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202310947466.1A priority Critical patent/CN116664782B/en
Publication of CN116664782A publication Critical patent/CN116664782A/en
Application granted granted Critical
Publication of CN116664782B publication Critical patent/CN116664782B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Abstract

The invention discloses a neural radiation field three-dimensional reconstruction method based on fusion voxels, which comprises the following steps: acquiring the two-dimensional features of an image with a convolutional neural network and generating a depth map; aggregating the two-dimensional features of adjacent images with the features calculated by the coarse-stage MLP to generate a local radiation field represented by voxels; fusing the local radiation fields into a world coordinate system according to weights based on a recurrent neural network to generate a global radiation field, and continuously updating the weights; inputting the generated global radiation field into a NeRF renderer to obtain the coordinates of each point and the nearby point density values; filtering the global radiation field according to the depth map and a volume density threshold, then inputting the filtered global radiation field into a volume renderer for volume rendering, and continuously optimizing the loss until training is completed, so as to obtain a three-dimensional reconstruction model. By fusing the local radiation fields generated by each view, the invention enhances the acquisition of global information; by screening voxels according to the depth map and the voxel volume density, it reduces redundant parts and improves training efficiency.

Description

Neural radiation field three-dimensional reconstruction method based on fusion voxels
Technical Field
The invention belongs to the field of three-dimensional reconstruction, and particularly relates to a neural radiation field three-dimensional reconstruction method based on fusion voxels.
Background
Three-dimensional reconstruction recovers a three-dimensional model from extracted picture features and is widely applied in fields such as virtual reality, medical treatment, and games. In particular, with the recent rise of the metaverse, expectations for this technology have grown: people increasingly hope that it can offer a stronger characterization capability, so that reconstructed objects become more vivid and lifelike and the goal of a digital twin is finally achieved.
Early multi-view three-dimensional reconstruction matches sparse features with algorithms such as SIFT and ORB to obtain the relation between feature points in consecutive frames, thereby recovering the camera pose; the three-dimensional coordinates of the feature points are then obtained from the camera intrinsics, and finally explicit models such as dense point clouds and voxels are generated. However, because of the discreteness of explicit representation methods, overlapping and artifacts arise during reconstruction; in addition, explicit representations at high resolution cause a large increase in memory occupation, which limits their application in high-resolution scenes. In 2020 Ben Mildenhall et al. proposed an implicit representation method that synthesizes realistic views by combining neural radiance fields (Neural Radiance Fields, NeRF for short) with volume rendering; it exhibits a strong characterization capability, outputs high-resolution images while occupying little memory, and, compared with other implicit representation methods, requires no shape prior information. In 2021 Xiaoshuai Zhang et al. proposed a neural framework for large-scale scene reconstruction that fuses radiance fields: it first focuses on local reconstruction, building local radiation fields for the input key frames, and then fuses them into the world scene in cross-frame order. This remedies the shortcoming that the neural network only attends to local information and enhances the global awareness of the system. However, because NeRF needs to densely sample points over the whole scene, it requires far more computation than traditional methods, and training a single scene takes tens of hours; yet for most scenes the truly effective points account for only about one fifth of the samples, and the ineffective points in the background or outside the object greatly increase the computation of the system and lengthen NeRF training. Furthermore, NeRF produces errors when rendering objects with smooth surfaces: since NeRF imposes no constraint on the object surface during rendering, the reconstructed surface is prone to pits, which also causes reconstruction errors. At present, how to solve the problems of overly long NeRF training time and the pitted object surface remains a major challenge.
Disclosure of Invention
The invention aims to provide a neural radiation field three-dimensional reconstruction method based on fusion voxels. Local voxels are generated from the two-dimensional features of the pictures and the additional features acquired by a multi-layer perceptron, the local voxels are fused into global voxels with a recurrent neural network, and the amount of computation is reduced by screening voxels whose density values lie within a certain range. In addition, a depth map generated by a multi-view stereo (MVS) method limits the points rendered by NeRF, and pruning is performed directly on points outside the object surface to improve surface smoothness.
The technical scheme is as follows: the invention discloses a neural radiation field three-dimensional reconstruction method based on fusion voxels, characterized in that a three-dimensional reconstruction model is trained by executing the following steps:
step 1, inputting an image into a two-dimensional convolutional neural network, acquiring two-dimensional characteristics of the image, and generating a depth map according to the two-dimensional characteristics of the image by using a multi-view stereo MVS method;
step 2, aggregating two-dimensional features of adjacent images in the depth map and additional features calculated based on a coarse-stage MLP (multi-layer perceptron), and generating a local radiation field represented by voxels;
step 3, based on a recurrent neural network, fusing the local radiation field generated by each frame to a world coordinate system according to the weight to generate a global radiation field, and continuously updating the weight;
step 4, inputting the generated global radiation field into a NeRF renderer to obtain coordinates of each point and a nearby point density value, and storing the coordinates and the nearby point density value into each voxel;
step 5, filtering the points in the global radiation field according to the depth map to remove redundant parts in the global radiation field;
step 6, filtering the voxel blocks according to the volume density threshold value, and reserving effective parts in the voxel blocks to obtain an updated global radiation field;
and 7, inputting the updated global radiation field into a volume renderer for volume rendering, calculating a loss function, and continuously optimizing the loss until training is completed, so as to obtain a three-dimensional reconstruction model; the MLP (multi-layer perceptron) network of the model is retained, pictures are input into the network to generate a three-dimensional model of an object or a scene, and the synthesis of realistic new views is completed.
The MLP refers to a multi-layer perceptron; multiple MLPs are needed in the whole NeRF three-dimensional reconstruction process, where the coarse-stage MLP is responsible for uniform sampling and the fine-stage MLP is responsible for sampling near the object surface.
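Purely as an illustration (this code is not part of the patent), the sketch below shows the kind of coarse/fine split used by NeRF-style pipelines: the coarse stage draws stratified uniform depth samples along a ray, and the fine stage re-samples where the coarse weights are large; the function names and the toy weight profile are assumptions.

import numpy as np

def coarse_samples(near, far, n_coarse, rng):
    """Stratified uniform depth samples in [near, far] for the coarse-stage MLP."""
    edges = np.linspace(near, far, n_coarse + 1)
    return edges[:-1] + rng.random(n_coarse) * (edges[1:] - edges[:-1])

def fine_samples(t_coarse, weights, n_fine, rng):
    """Inverse-transform sampling: draw more depths where the coarse weights are large (near the surface)."""
    pdf = weights / (weights.sum() + 1e-8)
    cdf = np.concatenate([[0.0], np.cumsum(pdf)])
    u = rng.random(n_fine)
    idx = np.clip(np.searchsorted(cdf, u, side="right") - 1, 0, len(t_coarse) - 1)
    return np.sort(t_coarse[idx])

rng = np.random.default_rng(0)
t_c = coarse_samples(2.0, 6.0, 64, rng)
w = np.exp(-0.5 * ((t_c - 4.0) / 0.2) ** 2)      # toy weights: pretend the surface sits near depth 4.0
t_f = fine_samples(t_c, w, 128, rng)             # fine samples concentrate around the surface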
Further, the step 1 specifically includes:
the n images $\{I_i\}_{i=1}^{n}$ with known camera parameters are input as a sequence into a two-dimensional convolutional neural network; picture features are extracted from adjacent pictures and disparity-matched to obtain ordered disparity maps of the same size as the original images, and depth maps in one-to-one pixel correspondence with the original images are generated from the disparity maps. The formula for converting a disparity map into a depth map is
$Z = \dfrac{B \cdot f}{d - (c_l - c_r)}$
where $Z$ is the depth, $B$ is the baseline length, $f$ is the focal length, $d$ is the disparity, and $c_l$ and $c_r$ are the column coordinates of the principal points of the left and right views.
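As a hedged illustration of the conversion above (not the patent's code), the sketch applies Z = B * f / (d - (c_l - c_r)) to a disparity map; the function name and all numeric values are hypothetical.

import numpy as np

def disparity_to_depth(disparity, baseline, focal_px, c_left, c_right, eps=1e-6):
    """Z = B * f / (d - (c_l - c_r)); pixels with non-positive corrected disparity are given depth 0."""
    corrected = disparity - (c_left - c_right)
    return np.where(corrected > eps, baseline * focal_px / np.maximum(corrected, eps), 0.0)

disp = np.random.default_rng(1).uniform(5.0, 60.0, size=(480, 640))      # hypothetical disparity map
depth = disparity_to_depth(disp, baseline=0.12, focal_px=720.0, c_left=320.5, c_right=318.0)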
Further, the step 2 specifically includes the following steps:
step 2.1, a deep neural network is used to regress a local neural volume for the image $I_t$ of the $t$-th frame; two-dimensional image features are extracted with the multi-view stereo MVS technique, and a cost volume represented by voxels is built from these features;
step 2.2, a two-dimensional convolutional neural network maps the image $I_t$ of the $t$-th frame into a feature map $F_t$ that stores the scene content of the image; the coarse-stage MLP supplies additional features, and the two-dimensional image features together with the additional features calculated by the coarse-stage MLP are projected onto the corresponding local volume to obtain a single-frame feature volume:
$V_t(v) = \left[\, F_t(\Pi_t(v)),\; f_t^{\mathrm{MLP}}(v) \,\right]$
where $V_t(v)$ is the voxel feature of frame $t$ centered at voxel $v$, $F_t(\Pi_t(v))$ is the two-dimensional feature projected at the voxel center $v$ in frame $t$, $f_t^{\mathrm{MLP}}(v)$ is the additional feature calculated by the MLP for view $t$, and $[\cdot\,,\cdot]$ denotes feature concatenation;
step 2.3, the feature volumes of multiple frames are aggregated with the mean and variance of the voxel features to regress the local volume $L_t$ of frame $t$, which represents the local radiation field; the mean fuses the appearance information of multiple views, and the variance helps geometric reasoning. The formula for generating the local radiation field is
$L_t = \mathrm{NN}\!\left( \mathrm{Mean}_{k \in N(t)} V_k(v),\; \mathrm{Var}_{k \in N(t)} V_k(v) \right)$
where $L_t$ is the local radiation field, $\mathrm{NN}$ is a deep neural network, $\mathrm{Mean}$ is the mean, $t$ denotes the $t$-th frame, $N(t)$ denotes the multiple neighboring views aggregated at frame $t$, $V_k(v)$ denotes the voxel features of frame $k$, and $\mathrm{Var}$ is the variance.
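The following Python sketch is one possible reading of steps 2.2 and 2.3, not the patented implementation: voxel centers are projected into a frame's two-dimensional feature map, the sampled feature is concatenated with a coarse-MLP feature to form the single-frame feature volume $V_t(v)$, and several neighbouring frames are aggregated by mean and variance. The shapes, function names and the use of grid_sample are assumptions.

import torch
import torch.nn.functional as F

def sample_frame_features(voxels_world, K, w2c, feat_map):
    """Bilinearly sample the 2D features F_t at the projections Pi_t(v) of the voxel centers.
    voxels_world: (V, 3); K: (3, 3); w2c: (4, 4) world-to-camera; feat_map: (C, H, W)."""
    cam = (w2c[:3, :3] @ voxels_world.T + w2c[:3, 3:]).T             # world -> camera coordinates
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:].clamp(min=1e-6)                       # perspective division -> pixel coords
    H, W = feat_map.shape[1:]
    grid = torch.stack([uv[:, 0] / (W - 1) * 2 - 1,                  # normalise to [-1, 1] for grid_sample
                        uv[:, 1] / (H - 1) * 2 - 1], dim=-1)
    sampled = F.grid_sample(feat_map[None], grid[None, None], align_corners=True)   # (1, C, 1, V)
    return sampled[0, :, 0].T                                        # (V, C)

def single_frame_feature_volume(voxels_world, K, w2c, feat_map, mlp_feat):
    """V_t(v) = concat( F_t(Pi_t(v)), f_t^MLP(v) ) for every voxel center v."""
    return torch.cat([sample_frame_features(voxels_world, K, w2c, feat_map), mlp_feat], dim=-1)

def aggregate_neighbouring_views(per_frame_volumes):
    """Mean/variance aggregation over neighbouring views; the result would feed a 3D regression network."""
    stack = torch.stack(per_frame_volumes)                           # (K_views, V, C)
    return torch.cat([stack.mean(0), stack.var(0, unbiased=False)], dim=-1)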
Further, the step 3 specifically includes: at each frame $t$, the generated local radiation field $L_t$ is cyclically fused with the global radiation field $G^{l}_{t-1}$ generated up to the previous frame, so as to continuously update the global radiation field $G^{l}_{t}$; at each update the local volume of the frame is learned and fused with a gated recurrent unit, whose formulas are
$z_t = \mathrm{NN}_z\big([\,G^{l}_{t-1},\, L_t\,]\big)$
$r_t = \mathrm{NN}_r\big([\,G^{l}_{t-1},\, L_t\,]\big)$
$\tilde{G}_t = \mathrm{NN}_g\big([\, r_t \odot G^{l}_{t-1},\, L_t \,]\big)$
$G^{l}_t = (1 - z_t) \odot G^{l}_{t-1} + z_t \odot \tilde{G}_t$
where $z_t$ is the update gate and $\mathrm{NN}_z$ is the neural network controlling the update gate, $r_t$ is the reset gate and $\mathrm{NN}_r$ is the neural network controlling the reset gate, $\tilde{G}_t$ is the global radiation field after fusing the current frame, $\mathrm{NN}_g$ is the neural network controlling the sequential update of the whole model and is used for its sequentially updated global reconstruction, and $\odot$ denotes element-wise multiplication; $z_t$ and $r_t$ respectively control how the local radiation field $L_t$ of the current frame and the global radiation field $G^{l}_{t-1}$ of the previous frame are fused. During fusion, only the voxels where the local radiation field $L_t$ of the current frame and the global radiation field $G^{l}_{t-1}$ of the previous frame coincide are updated; all other voxels remain unchanged.
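A minimal sketch of a GRU-style voxel fusion consistent with the formulas above, assuming per-voxel feature vectors and linear gate networks (the gate architectures are an assumption, the patent text does not fix them); only voxels inside the overlap mask are updated, all others keep their previous global value.

import torch
import torch.nn as nn

class VoxelGRUFusion(nn.Module):
    """Fuses the local radiation field L_t of the current frame into the running global field G_{t-1}."""
    def __init__(self, channels):
        super().__init__()
        self.nn_z = nn.Linear(2 * channels, channels)   # update gate network NN_z
        self.nn_r = nn.Linear(2 * channels, channels)   # reset gate network NN_r
        self.nn_g = nn.Linear(2 * channels, channels)   # candidate global field network NN_g

    def forward(self, g_prev, l_t, overlap_mask):
        """g_prev, l_t: (V, C) per-voxel features; overlap_mask: (V,) bool."""
        x = torch.cat([g_prev, l_t], dim=-1)
        z = torch.sigmoid(self.nn_z(x))                                   # update gate z_t
        r = torch.sigmoid(self.nn_r(x))                                   # reset gate r_t
        g_cand = torch.tanh(self.nn_g(torch.cat([r * g_prev, l_t], dim=-1)))
        g_new = (1 - z) * g_prev + z * g_cand                             # GRU blend of old and new
        return torch.where(overlap_mask[:, None], g_new, g_prev)          # non-overlapping voxels unchanged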
Further, the step 4 specifically includes: the generated global radiation field is placed into a NeRF renderer to obtain the point density $\sigma$ and the radiance value $c$ of any point; the formula for regressing the volume density and radiance is
$(\sigma, c) = \mathrm{MLP}(x, y, z, \theta, \phi)$
where $x$, $y$ and $z$ are the horizontal, vertical and longitudinal coordinates of the point, $\theta$ is the azimuth viewing angle, and $\phi$ is the polar viewing angle.
After the points with point density $\sigma = 0$ are removed, the volume density inside each voxel is determined and stored in the corresponding voxel, and the volume density value is dynamically updated according to
$\sigma^{G}_{t}(v) = (1 - \omega)\,\sigma^{G}_{t-1}(v) + \omega\,\sigma^{L}_{t,n}(v)$
where $\sigma^{G}_{t}(v)$ is the volume density of voxel $v$ in the global radiation field at frame $t$, $\omega$ is the weight controlling the update, and $\sigma^{L}_{t,n}(v)$ is the volume density of the local radiation field voxel generated from the $n$-th picture at frame $t$.
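A hedged sketch of the per-voxel density cache described in step 4: points whose density is 0 are dropped, and each voxel's stored global density is blended with the newly regressed local density using an update weight; the blending form and the weight value are assumptions made only to illustrate the variables above.

import numpy as np

def update_voxel_density(global_sigma, local_sigma, w=0.2):
    """sigma_t(v) = (1 - w) * sigma_{t-1}(v) + w * sigma_local(v), applied only where the local field is valid."""
    valid = local_sigma > 0.0                      # zero-density points were removed upstream
    updated = global_sigma.copy()
    updated[valid] = (1.0 - w) * global_sigma[valid] + w * local_sigma[valid]
    return updated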
Further, the step 5 specifically includes: the part of the neural radiation field outside the object surface is removed according to the depth map, and the part inside the object surface is retained.
Further, the step 6 specifically includes: voxel blocks whose volume density is below the threshold are removed from the neural radiation field according to the volume density threshold, and voxel blocks whose volume density lies within the required range are retained.
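The two filters of steps 5 and 6 might look like the following sketch; the threshold, the surface margin and the array layout are assumptions rather than values from the patent. Voxels lying in front of the depth-map surface (empty space outside the object) are dropped, and so are voxels whose volume density falls below the threshold.

import numpy as np

def keep_voxel_mask(depth_along_ray, surface_depth, sigma, sigma_min=0.01, margin=0.05):
    """Boolean mask of voxels kept for rendering.
    depth_along_ray: voxel-center depth seen from the camera; surface_depth: depth-map value at its projection."""
    inside_surface = depth_along_ray >= surface_depth - margin   # step 5: discard free space in front of the surface
    dense_enough = sigma >= sigma_min                            # step 6: discard nearly empty voxels
    return inside_surface & dense_enough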
Further, the step 7 specifically includes: the steps 2 to 6 are repeated, and the global radiation field filtered by the volume density threshold and the depth map is put into a volume renderer for volume rendering; the final rendered color is obtained as a weighted sum of the colors of the points sampled on each ray, and the loss is calculated from the volume rendering result and continuously optimized. The color computed by volume rendering is
$\hat{C}(r) = \sum_{i=1}^{N} T_i\,\alpha_i\,c_i, \qquad \alpha_i = 1 - \exp(-\sigma_i \delta_i), \qquad T_i = \exp\!\Big(-\sum_{j=1}^{i-1} \sigma_j \delta_j\Big)$
where $\alpha_i$ is the opacity function of ray sampling point $i$, $\sigma_i$ is the point density of sampling point $i$, $\delta_i$ is the interval between sampling point $i$ and the next sampling point, $T_i$ is the transmittance function of sampling point $i$, obtained from the opacity of all sampling points before point $i$, $w_i = T_i\,\alpha_i$ is the probability density of sampling point $i$, $\hat{C}(r)$ is the color finally rendered for ray $r$, $N$ is the number of samples on ray $r$, and $c_i$ is the radiance value of sampling point $i$. For NeRF, the difference between the rendered color and the true value after volume rendering is used as the loss function:
$\mathcal{L} = \sum_{r \in R} \big\lVert \hat{C}(r) - C(r) \big\rVert_2^2$
where $\mathcal{L}$ is the loss function, $R$ is the set of rays, and $C(r)$ is the true color of ray $r$.
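The sketch below mirrors the volume-rendering and loss formulas above in PyTorch; ray and sample shapes are assumptions, and the transmittance $T_i$ is computed as the cumulative product of $(1 - \alpha_j)$, which equals $\exp(-\sum_{j<i} \sigma_j \delta_j)$.

import torch

def render_rays(sigma, color, deltas):
    """sigma, deltas: (R, N); color: (R, N, 3). Returns the rendered colour per ray, (R, 3)."""
    alpha = 1.0 - torch.exp(-sigma * deltas)                             # opacity alpha_i
    trans = torch.cumprod(torch.cat([torch.ones_like(alpha[:, :1]),
                                     1.0 - alpha + 1e-10], dim=-1), dim=-1)[:, :-1]   # transmittance T_i
    weights = trans * alpha                                              # probability density w_i = T_i * alpha_i
    return (weights[..., None] * color).sum(dim=1)

def nerf_loss(rendered, gt):
    """Squared difference between rendered and true ray colours, summed over the ray batch."""
    return ((rendered - gt) ** 2).sum()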
The beneficial effects are that: compared with the prior art, the invention has the following remarkable advantages:
1. According to the neural radiation field three-dimensional reconstruction method based on fusion voxels, coarse dense voxels are generated for each view according to the point density, and the amount of computation is reduced by screening voxels whose density values lie within a certain range.
2. An implicit characterization method for multi-view fusion is provided: the acquisition of global information is enhanced by fusing the local radiation fields generated by each view, and redundant parts are reduced by screening voxels whose volume density lies within the required range, which lowers the amount of computation and improves training efficiency.
3. A depth map generated by the MVS method limits the points rendered by NeRF, and points outside the zero iso-surface (the object surface) are directly pruned to improve the surface smoothing effect.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The technical scheme of the invention is further described below with reference to the accompanying drawings.
The invention aims to provide a neural radiation field three-dimensional reconstruction method based on fusion voxels. The local voxel fields are generated according to the picture characteristics by utilizing the pre-trained network model, the generated local voxel fields are fused according to weights in sequence, so that a complete global radiation field is formed, and the calculated amount is reduced by screening voxels with density values within a certain range. In addition, a depth map is generated by a multi-view stereo (MVS) method to limit NeRF-rendered points, and pruning operation is directly performed on points outside the surface of the object to improve surface smoothness.
The technical scheme is as follows: the invention discloses a neural radiation field three-dimensional reconstruction method based on fusion voxels, characterized in that a three-dimensional reconstruction model is trained by executing the following steps:
Step 1: an image is input into a two-dimensional convolutional neural network to acquire its two-dimensional features, and a corresponding depth map is generated from the disparity map of the image by the multi-view stereo MVS method. The images may come from a dataset in which each picture carries information such as the shooting angle and the camera parameters, or a video shot by the user may be put into COLMAP to generate the corresponding camera poses and parameters. The n images $\{I_i\}_{i=1}^{n}$ with known camera parameters are input as a sequence into the two-dimensional convolutional neural network; picture features are extracted from adjacent pictures and disparity-matched to obtain ordered disparity maps of the same size as the original images, and depth maps in one-to-one pixel correspondence with the original images are generated from the disparity maps. The formula for converting a disparity map into a depth map is
$Z = \dfrac{B \cdot f}{d - (c_l - c_r)}$
where $Z$ is the depth, $B$ is the baseline length, $f$ is the focal length, $d$ is the disparity, and $c_l$ and $c_r$ are the column coordinates of the principal points of the left and right views.
Step 2: a local radiation field represented by voxels is generated from the depth maps of adjacent images and the additional features calculated by the coarse-stage MLP.
Step 2.1: a deep neural network is used to regress a local neural volume for the image $I_t$ of the $t$-th frame; two-dimensional image features are extracted with the multi-view stereo MVS technique, and a cost volume represented by voxels is built from these features.
Step 2.2: a two-dimensional convolutional neural network maps the image $I_t$ of the $t$-th frame into a feature map $F_t$ that stores the scene content of the image; the coarse-stage MLP supplies additional features, and the two-dimensional image features together with the additional features calculated by the coarse-stage MLP are projected onto the corresponding local volume to obtain a single-frame feature volume:
$V_t(v) = \left[\, F_t(\Pi_t(v)),\; f_t^{\mathrm{MLP}}(v) \,\right]$
where $V_t(v)$ is the voxel feature of frame $t$ centered at voxel $v$, $F_t(\Pi_t(v))$ is the two-dimensional feature projected at the voxel center $v$ in frame $t$, $f_t^{\mathrm{MLP}}(v)$ is the additional feature calculated by the MLP for view $t$, and $[\cdot\,,\cdot]$ denotes feature concatenation.
Step 2.3: the feature volumes of multiple frames are aggregated with the mean and variance of the voxel features to regress the local volume $L_t$ of frame $t$, which represents the local radiation field; the mean fuses the appearance information of multiple views, and the variance helps geometric reasoning. The formula for generating the local radiation field is
$L_t = \mathrm{NN}\!\left( \mathrm{Mean}_{k \in N(t)} V_k(v),\; \mathrm{Var}_{k \in N(t)} V_k(v) \right)$
where $L_t$ is the local radiation field, $\mathrm{NN}$ is a deep neural network, $\mathrm{Mean}$ is the mean, $t$ denotes the $t$-th frame, $N(t)$ denotes the multiple neighboring views aggregated at frame $t$, $V_k(v)$ denotes the voxel features of frame $k$, and $\mathrm{Var}$ is the variance.
Step 3: based on a recurrent neural network, the local radiation field generated by each frame is fused to the world coordinate system according to the weights to generate a global radiation field, and the weights are continuously updated. At each frame $t$, the generated local radiation field $L_t$ is cyclically fused with the global radiation field $G^{l}_{t-1}$ generated up to the previous frame, so as to continuously update the global radiation field $G^{l}_{t}$; at each update the local volume of the frame is learned and fused with a gated recurrent unit, whose formulas are
$z_t = \mathrm{NN}_z\big([\,G^{l}_{t-1},\, L_t\,]\big)$
$r_t = \mathrm{NN}_r\big([\,G^{l}_{t-1},\, L_t\,]\big)$
$\tilde{G}_t = \mathrm{NN}_g\big([\, r_t \odot G^{l}_{t-1},\, L_t \,]\big)$
$G^{l}_t = (1 - z_t) \odot G^{l}_{t-1} + z_t \odot \tilde{G}_t$
where $z_t$ is the update gate and $\mathrm{NN}_z$ is the neural network controlling the update gate, $r_t$ is the reset gate and $\mathrm{NN}_r$ is the neural network controlling the reset gate, $\tilde{G}_t$ is the global radiation field after fusing the current frame, $\mathrm{NN}_g$ is the neural network controlling the sequential update of the whole model and is used for its sequentially updated global reconstruction, and $\odot$ denotes element-wise multiplication; $z_t$ and $r_t$ respectively control how the local radiation field $L_t$ of the current frame and the global radiation field $G^{l}_{t-1}$ of the previous frame are fused. During fusion, only the voxels where the local radiation field $L_t$ of the current frame and the global radiation field $G^{l}_{t-1}$ of the previous frame coincide are updated; all other voxels remain unchanged.
Step 4: the generated global radiation field is input into a NeRF renderer to obtain the coordinates of each point in the global radiation field and the nearby point density values; the points whose point density value is 0 are removed, and the volume density values of the voxels are then obtained and stored in each voxel. For every viewing direction NeRF samples points on the incoming rays of the radiation field and uses an MLP to obtain the volume density $\sigma$ and radiance $c$ of any point:
$(\sigma, c) = \mathrm{MLP}(x, y, z, \theta, \phi)$
where $x$, $y$ and $z$ are the horizontal, vertical and longitudinal coordinates of the point, $\theta$ is the azimuth viewing angle, and $\phi$ is the polar viewing angle.
After the points with point density $\sigma = 0$ are removed, the volume density inside each voxel is determined and stored in the corresponding voxel, and the volume density value is dynamically updated, so that later uses of the volume density are obtained by query instead of MLP computation. The formula for updating the volume density value is
$\sigma^{G}_{t}(v) = (1 - \omega)\,\sigma^{G}_{t-1}(v) + \omega\,\sigma^{L}_{t,n}(v)$
where $\sigma^{G}_{t}(v)$ is the volume density of voxel $v$ in the global radiation field at frame $t$, $\omega$ is the weight controlling the update, and $\sigma^{L}_{t,n}(v)$ is the volume density of the local radiation field voxel generated from the $n$-th picture at frame $t$.
Step 5: the part of the neural radiation field outside the object surface is removed according to the depth map, and the part inside the object surface is retained. The depth-map information limits the positions of the NeRF sampling points during rendering: points outside the object surface are ignored, and only the information of sampling points that meet the requirement is stored during rendering.
Step 6: voxel blocks whose volume density is below the threshold are removed from the neural radiation field according to the volume density threshold, and voxel blocks whose volume density lies within the required range are retained; during NeRF rendering, points are sampled only in voxels whose volume density is within the required range.
Step 7: the steps 2 to 6 are repeated, and the global radiation field filtered by the volume density threshold and the depth map is put into a volume renderer for volume rendering; the final rendered color is obtained as a weighted sum of the colors of the points sampled on each ray, and the loss is calculated from the volume rendering result and continuously optimized. The color computed by volume rendering is
$\hat{C}(r) = \sum_{i=1}^{N} T_i\,\alpha_i\,c_i, \qquad \alpha_i = 1 - \exp(-\sigma_i \delta_i), \qquad T_i = \exp\!\Big(-\sum_{j=1}^{i-1} \sigma_j \delta_j\Big)$
where $\alpha_i$ is the opacity function of ray sampling point $i$, $\sigma_i$ is the point density of sampling point $i$, $\delta_i$ is the interval between sampling point $i$ and the next sampling point, $T_i$ is the transmittance function of sampling point $i$, obtained from the opacity of all sampling points before point $i$, $w_i = T_i\,\alpha_i$ is the probability density of sampling point $i$, $\hat{C}(r)$ is the color finally rendered for ray $r$, $N$ is the number of samples on ray $r$, and $c_i$ is the radiance value of sampling point $i$. For NeRF, the difference between the rendered color and the true value after volume rendering is used as the loss function:
$\mathcal{L} = \sum_{r \in R} \big\lVert \hat{C}(r) - C(r) \big\rVert_2^2$
where $\mathcal{L}$ is the loss function, $R$ is the set of rays, and $C(r)$ is the true color of ray $r$.
The experiment is carried out in a Windows 10 environment; the processor is an i7-12700F, the memory is 32 GB, and the graphics card is an RTX 3080 12G. The experimental performance of the method is as follows:
The method is compared with the neural radiation field reconstruction model IBRNet proposed in the 2021 paper "IBRNet: Learning Multi-View Image-Based Rendering", the model NeRF proposed by Ben Mildenhall et al. in the 2020 paper "NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis", and the model NSVF proposed by Lingjie Liu et al. in the 2020 paper "Neural Sparse Voxel Fields"; 100 scenes are randomly selected from the ScanNet dataset as training data. The peak signal-to-noise ratio PSNR, the structural similarity SSIM and the perceptual loss LPIPS are used as the main evaluation indexes: a higher PSNR value means less noise, a higher SSIM value means higher structural similarity, and a lower LPIPS means a better perceptual effect. Since NSVF and NeRF are both optimized for a single scene, for fairness the experiment only reports the results of per-scene optimization.
The above cited documents are as follows:
[1] Wang Q, Wang Z, Genova K, et al. IBRNet: Learning multi-view image-based rendering[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 4690-4699.
[2] Mildenhall B, Srinivasan P P, Tancik M, et al. NeRF: Representing scenes as neural radiance fields for view synthesis[J]. Communications of the ACM, 2021, 65(1): 99-106.
[3] Liu L, Gu J, Zaw Lin K, et al. Neural sparse voxel fields[J]. Advances in Neural Information Processing Systems, 2020, 33: 15651-15663.
table 1 presents the results of the tests performed on the Scannet dataset. From experimental results we can find that the method has a prominent performance in the face of large real scene data sets.
Table 1 quantitative comparison on ScanNet dataset
Table 2 presents the results of the tests performed on the NeRF synthetic dataset. The experimental results show that the method also performs well on synthetic datasets.
Table 2 quantitative comparisons on NeRF synthetic datasets
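For reference only (not taken from the patent), PSNR, the first metric above, follows directly from the mean squared error between a rendered image and its ground truth; the sketch below assumes images scaled to [0, 1].

import numpy as np

def psnr(rendered, ground_truth, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher means less noise."""
    mse = float(np.mean((rendered - ground_truth) ** 2))
    return 10.0 * np.log10(max_val ** 2 / max(mse, 1e-12))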

Claims (5)

1. A neural radiation field three-dimensional reconstruction method based on fusion voxels, characterized in that a three-dimensional reconstruction model is trained by executing the following steps:
step 1, inputting an image into a two-dimensional convolutional neural network, acquiring two-dimensional characteristics of the image, and generating a depth map;
step 2, aggregating two-dimensional features of adjacent images in the depth map and additional features calculated based on the coarse-stage MLP to generate a local radiation field represented by voxels;
step 3, based on a recurrent neural network, fusing the local radiation field generated by each frame to a world coordinate system according to the weight to generate a global radiation field, and continuously updating the weight;
step 4, inputting the generated global radiation field into a NeRF renderer to obtain coordinates of each point and a nearby point density value, and storing the coordinates and the nearby point density value into each voxel;
step 5, filtering the points in the global radiation field according to the depth map to remove redundant parts in the global radiation field;
step 6, filtering the voxel blocks according to the volume density threshold value, and reserving effective parts in the voxel blocks to obtain an updated global radiation field;
step 7, inputting the updated global radiation field into a volume renderer for volume rendering, calculating a loss function, continuously optimizing the loss until training is completed, obtaining a three-dimensional reconstruction model, retaining the MLP (multi-layer perceptron) network of the model, inputting pictures into the network, generating a three-dimensional model of an object or a scene, and completing the synthesis of a new view angle;
the step 2 specifically comprises the following steps:
step 2.1, a deep neural network is used to regress a local neural volume for the image $I_t$ of the $t$-th frame; two-dimensional image features are extracted with the multi-view stereo MVS technique, and a cost volume represented by voxels is built from these features;
step 2.2, a two-dimensional convolutional neural network maps the image $I_t$ of the $t$-th frame into a feature map $F_t$ that stores the scene content of the image; the coarse-stage MLP supplies additional features, and the two-dimensional image features together with the additional features calculated by the coarse-stage MLP are projected onto the corresponding local volume to obtain a single-frame feature volume:
$V_t(v) = \left[\, F_t(\Pi_t(v)),\; f_t^{\mathrm{MLP}}(v) \,\right]$
where $V_t(v)$ is the voxel feature of frame $t$ centered at voxel $v$, $F_t(\Pi_t(v))$ is the two-dimensional feature projected at the voxel center $v$ in frame $t$, $f_t^{\mathrm{MLP}}(v)$ is the additional feature calculated by the MLP for view $t$, and $[\cdot\,,\cdot]$ denotes feature concatenation;
step 2.3, the feature volumes of multiple frames are aggregated with the mean and variance of the voxel features to regress the local volume $L_t$ of frame $t$, which represents the local radiation field; the mean fuses the appearance information of multiple views, and the variance helps geometric reasoning; the formula for generating the local radiation field is
$L_t = \mathrm{NN}\!\left( \mathrm{Mean}_{k \in N(t)} V_k(v),\; \mathrm{Var}_{k \in N(t)} V_k(v) \right)$
where $L_t$ is the local radiation field, $\mathrm{NN}$ is a deep neural network, $\mathrm{Mean}$ is the mean, $t$ denotes the $t$-th frame, $N(t)$ denotes the multiple neighboring views aggregated at frame $t$, $V_k(v)$ denotes the voxel features of frame $k$, and $\mathrm{Var}$ is the variance;
the step 3 is specifically as follows: at each frame $t$, the generated local radiation field $L_t$ is cyclically fused with the global radiation field $G^{l}_{t-1}$ generated up to the previous frame, so as to continuously update the global radiation field $G^{l}_{t}$; at each update the local volume of the frame is learned and fused with a gated recurrent unit, whose formulas are
$z_t = \mathrm{NN}_z\big([\,G^{l}_{t-1},\, L_t\,]\big)$
$r_t = \mathrm{NN}_r\big([\,G^{l}_{t-1},\, L_t\,]\big)$
$\tilde{G}_t = \mathrm{NN}_g\big([\, r_t \odot G^{l}_{t-1},\, L_t \,]\big)$
$G^{l}_t = (1 - z_t) \odot G^{l}_{t-1} + z_t \odot \tilde{G}_t$
where $z_t$ is the update gate and $\mathrm{NN}_z$ is the neural network controlling the update gate, $r_t$ is the reset gate and $\mathrm{NN}_r$ is the neural network controlling the reset gate, $\tilde{G}_t$ is the global radiation field after fusing the current frame, $\mathrm{NN}_g$ is the neural network controlling the sequential update of the whole model and is used for its sequentially updated global reconstruction, and $\odot$ denotes element-wise multiplication; $z_t$ and $r_t$ respectively control how the local radiation field $L_t$ of the current frame and the global radiation field $G^{l}_{t-1}$ of the previous frame are fused; during fusion, only the voxels where the local radiation field $L_t$ of the current frame and the global radiation field $G^{l}_{t-1}$ of the previous frame coincide are updated, and all other voxels remain unchanged;
the step 4 is specifically as follows: the generated global radiation field is placed into a NeRF renderer to obtain the point density $\sigma$ and the radiance value $c$ of any point; the formula for regressing the volume density and radiance is
$(\sigma, c) = \mathrm{MLP}(x, y, z, \theta, \phi)$
where $x$, $y$ and $z$ are the horizontal, vertical and longitudinal coordinates of the point, $\theta$ is the azimuth viewing angle, and $\phi$ is the polar viewing angle;
after the points with point density $\sigma = 0$ are removed, the volume density inside each voxel is determined and stored in the corresponding voxel, and the volume density value is dynamically updated according to
$\sigma^{G}_{t}(v) = (1 - \omega)\,\sigma^{G}_{t-1}(v) + \omega\,\sigma^{L}_{t,n}(v)$
where $\sigma^{G}_{t}(v)$ is the volume density of voxel $v$ in the global radiation field at frame $t$, $\omega$ is the weight controlling the update, and $\sigma^{L}_{t,n}(v)$ is the volume density of the local radiation field voxel generated from the $n$-th picture at frame $t$.
2. The method for three-dimensional reconstruction of a neural radiation field based on fusion voxels according to claim 1, wherein step 1 specifically comprises:
the n images $\{I_i\}_{i=1}^{n}$ with known camera parameters are input as a sequence into a two-dimensional convolutional neural network; picture features are extracted from adjacent pictures and disparity-matched to obtain ordered disparity maps of the same size as the original images, and depth maps in one-to-one pixel correspondence with the original images are generated from the disparity maps; the formula for converting a disparity map into a depth map is
$Z = \dfrac{B \cdot f}{d - (c_l - c_r)}$
where $Z$ is the depth, $B$ is the baseline length, $f$ is the focal length, $d$ is the disparity, and $c_l$ and $c_r$ are the column coordinates of the principal points of the left and right views.
3. The method for three-dimensional reconstruction of a neural radiation field based on fusion voxels according to claim 1, wherein step 5 is specifically: the part of the neural radiation field outside the object surface is removed according to the depth map, and the part inside the object surface is retained.
4. The method for three-dimensional reconstruction of a neural radiation field based on fusion voxels according to claim 1, wherein step 6 is specifically: voxel blocks whose volume density is below the threshold are removed from the neural radiation field according to the volume density threshold, and voxel blocks whose volume density lies within the required range are retained.
5. The method for three-dimensional reconstruction of a neural radiation field based on fusion voxels according to claim 1, wherein step 7 is specifically: the steps 2 to 6 are repeated, and the global radiation field filtered by the volume density threshold and the depth map is put into a volume renderer for volume rendering; the final rendered color is obtained as a weighted sum of the colors of the points sampled on the rays, and the loss is calculated from the volume rendering result and continuously optimized; the color computed by volume rendering is
$\hat{C}(r) = \sum_{i=1}^{N} T_i\,\alpha_i\,c_i, \qquad \alpha_i = 1 - \exp(-\sigma_i \delta_i), \qquad T_i = \exp\!\Big(-\sum_{j=1}^{i-1} \sigma_j \delta_j\Big)$
where $\alpha_i$ is the opacity function of ray sampling point $i$, $\sigma_i$ is the point density of sampling point $i$, $\delta_i$ is the interval between sampling point $i$ and the next sampling point, $T_i$ is the transmittance function of sampling point $i$, obtained from the opacity of all sampling points before point $i$, $w_i = T_i\,\alpha_i$ is the probability density of sampling point $i$, $\hat{C}(r)$ is the color finally rendered for ray $r$, $N$ is the number of samples on ray $r$, and $c_i$ is the radiance value of sampling point $i$; for NeRF, the difference between the rendered color and the true value after volume rendering is used as the loss function:
$\mathcal{L} = \sum_{r \in R} \big\lVert \hat{C}(r) - C(r) \big\rVert_2^2$
where $\mathcal{L}$ is the loss function, $R$ is the set of rays, and $C(r)$ is the true color of ray $r$.
CN202310947466.1A 2023-07-31 2023-07-31 Neural radiation field three-dimensional reconstruction method based on fusion voxels Active CN116664782B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310947466.1A CN116664782B (en) 2023-07-31 2023-07-31 Neural radiation field three-dimensional reconstruction method based on fusion voxels

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310947466.1A CN116664782B (en) 2023-07-31 2023-07-31 Neural radiation field three-dimensional reconstruction method based on fusion voxels

Publications (2)

Publication Number Publication Date
CN116664782A CN116664782A (en) 2023-08-29
CN116664782B true CN116664782B (en) 2023-10-13

Family

ID=87710129

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310947466.1A Active CN116664782B (en) 2023-07-31 2023-07-31 Neural radiation field three-dimensional reconstruction method based on fusion voxels

Country Status (1)

Country Link
CN (1) CN116664782B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117496072B (en) * 2023-12-27 2024-03-08 南京理工大学 Three-dimensional digital person generation and interaction method and system
CN117496075B (en) * 2024-01-02 2024-03-22 中南大学 Single-view three-dimensional reconstruction method, system, equipment and storage medium


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220301252A1 (en) * 2021-03-17 2022-09-22 Adobe Inc. View synthesis of a dynamic scene
US20230154104A1 (en) * 2021-11-12 2023-05-18 Nec Laboratories America, Inc. UNCERTAINTY-AWARE FUSION TOWARDS LARGE-SCALE NeRF
EP4191539A1 (en) * 2021-12-02 2023-06-07 Dimension Stream Labs AB Method for performing volumetric reconstruction

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107103637A (en) * 2017-04-05 2017-08-29 南京信息工程大学 One kind enhancing texture power method
CN112887698A (en) * 2021-02-04 2021-06-01 中国科学技术大学 High-quality face voice driving method based on nerve radiation field
CN113689540A (en) * 2021-07-22 2021-11-23 清华大学 Object reconstruction method and device based on RGB video
CN114119838A (en) * 2022-01-24 2022-03-01 阿里巴巴(中国)有限公司 Voxel model and image generation method, equipment and storage medium
CN115187682A (en) * 2022-05-10 2022-10-14 北京邮电大学 Object structure reconstruction method and related equipment
CN114663603A (en) * 2022-05-24 2022-06-24 成都索贝数码科技股份有限公司 Static object three-dimensional grid model generation method based on nerve radiation field
CN115512073A (en) * 2022-09-19 2022-12-23 南京信息工程大学 Three-dimensional texture grid reconstruction method based on multi-stage training under differentiable rendering
CN115731355A (en) * 2022-11-29 2023-03-03 湖北大学 SuperPoint-NeRF-based three-dimensional building reconstruction method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on a texture force-tactile reproduction method based on SFS technology; Li Jialu; Song Aiguo; Wu Juan; Zhang Xiaorui; Chinese Journal of Scientific Instrument (Issue 04); full text *

Also Published As

Publication number Publication date
CN116664782A (en) 2023-08-29

Similar Documents

Publication Publication Date Title
CN113706714B (en) New view angle synthesizing method based on depth image and nerve radiation field
Meshry et al. Neural rerendering in the wild
Liu et al. Neural rays for occlusion-aware image-based rendering
Yuan et al. Star: Self-supervised tracking and reconstruction of rigid objects in motion with neural rendering
CN109410307B (en) Scene point cloud semantic segmentation method
CN116664782B (en) Neural radiation field three-dimensional reconstruction method based on fusion voxels
CN109255831A (en) The method that single-view face three-dimensional reconstruction and texture based on multi-task learning generate
CN109191369A (en) 2D pictures turn method, storage medium and the device of 3D model
CN106981080A (en) Night unmanned vehicle scene depth method of estimation based on infrared image and radar data
Weng et al. Vid2actor: Free-viewpoint animatable person synthesis from video in the wild
CN110570522A (en) Multi-view three-dimensional reconstruction method
CN111951368B (en) Deep learning method for point cloud, voxel and multi-view fusion
CN110070574A (en) A kind of binocular vision Stereo Matching Algorithm based on improvement PSMNet
CN113077554A (en) Three-dimensional structured model reconstruction method based on any visual angle picture
CN110443883A (en) A kind of individual color image plane three-dimensional method for reconstructing based on dropblock
CN116205962B (en) Monocular depth estimation method and system based on complete context information
CN114926553A (en) Three-dimensional scene consistency stylization method and system based on nerve radiation field
CN115298708A (en) Multi-view neural human body rendering
CN114996814A (en) Furniture design system based on deep learning and three-dimensional reconstruction
CN114677479A (en) Natural landscape multi-view three-dimensional reconstruction method based on deep learning
CN110889868B (en) Monocular image depth estimation method combining gradient and texture features
CN116681838A (en) Monocular video dynamic human body three-dimensional reconstruction method based on gesture optimization
CN117115359B (en) Multi-view power grid three-dimensional space data reconstruction method based on depth map fusion
CN104463962A (en) Three-dimensional scene reconstruction method based on GPS information video
CN117150755A (en) Automatic driving scene simulation method and system based on nerve point rendering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant