CN115880419A - Neural implicit surface generation and interaction method based on voxels - Google Patents

Neural implicit surface generation and interaction method based on voxels

Info

Publication number
CN115880419A
CN115880419A (application CN202211001790.6A)
Authority
CN
China
Prior art keywords
voxel
ray
blocks
geometric
sampling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211001790.6A
Other languages
Chinese (zh)
Inventor
Guofeng Zhang (章国锋)
Hujun Bao (鲍虎军)
Hai Li (李海)
Xingrui Yang (杨兴锐)
Hongjia Zhai (翟宏佳)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202211001790.6A priority Critical patent/CN115880419A/en
Publication of CN115880419A publication Critical patent/CN115880419A/en
Pending legal-status Critical Current

Landscapes

  • Image Generation (AREA)

Abstract

The invention discloses a voxel-based neural implicit surface generation and interaction method, belonging to the fields of computer vision and computer graphics. The invention decomposes a three-dimensional scene into geometric units at the granularity of voxel blocks, stores the geometric and texture information of the scene in the voxel blocks in the form of feature vectors, obtains the features of corresponding three-dimensional points by interpolation, and obtains a Signed Distance Field (SDF) and corresponding colors through a geometry analysis network and a texture analysis network. On this basis, the invention proposes progressive voxel culling and decomposition to further improve the surface and texture accuracy of the model, and surface-aware sampling to increase the number of samples at effective points. Through interactive editing of the generated voxel blocks, the surface and texture effects after editing can be rendered.

Description

Neural implicit surface generation and interaction method based on voxels
Technical Field
The invention relates to the field of computer vision and computer graphics, in particular to a neural implicit surface generation and interaction method based on voxels.
Background
Virtual content generation and interaction are important components of three-dimensional applications. Virtual content usually needs to be built by professional designers, which is complex and time-consuming. Automatic reconstruction of accurate surfaces from multi-view images is therefore crucial for virtual content generation, and it is also an important research topic in computer vision and computer graphics. Before the deep learning era, image-based surface reconstruction relied largely on multi-view stereo (MVS) techniques, which depend heavily on feature detection and matching. Although these methods are mature in both academia and industry, they are based on indirect feature matching and point cloud representations, so information is lost during reconstruction. This lost information poses challenges for the reconstruction of complex scenes. For example, in the case of weak texture, repetitive features, or inconsistent brightness, it is difficult to match features exactly, which produces erroneous three-dimensional points and ultimately defects in the reconstructed surface. Furthermore, discrete triangular meshes with inconsistent texture patches often cannot render a realistic scene, since the textures corresponding to the meshes are generated separately.
In the past two years, work representing scenes with neural networks has emerged and is rapidly becoming a research hotspot. Works such as Occupancy Networks and DeepSDF show that implicit surfaces, such as Signed Distance Fields (SDF) or occupancy fields, can be generated by learning and stored in a multi-layer perceptron (MLP). These networks can learn continuous scene representations from discrete three-dimensional samples. Based on this finding, DVR and IDR extend this representation to the task of image-based surface reconstruction. However, these methods only learn textures for points on the surface, and it is difficult for them to learn an accurate surface without sufficient observations.
With the advent of NeRF-based methods, the novel view synthesis task has improved considerably. NeRF and its extensions learn a neural radiance field of the scene through volume rendering and have achieved notable progress. However, such methods cannot accurately reconstruct the surface. Subsequently, methods such as NeuS, UNISURF and VolSDF propose combining the SDF with the radiance field to achieve surface reconstruction. These methods can be trained end-to-end directly from multi-view images without introducing additional representations, thereby minimizing information loss and achieving higher accuracy than conventional methods.
However, these methods reconstruct the entire space with a single network and cannot perform large-scale reconstruction due to limited network capacity. In addition, the scene is hidden inside the network, so interactive operations such as scene segmentation and editing cannot easily be performed.
Disclosure of Invention
To solve the above problems, the present invention employs a hybrid architecture consisting of an explicit voxel representation and an implicit surface representation. This architecture combines the advantages of both representations, allowing explicit manipulation of the scene while retaining implicit surface and texture representation capabilities. Vox-Surf is a voxel-based neural implicit surface rendering framework that combines voxel-based methods with image-based implicit surface reconstruction, and can be used for efficient surface reconstruction and rendering.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention firstly provides a neural implicit surface generation and interaction method based on voxels, which comprises the following steps:
step 1: dividing a scene in advance into a plurality of non-overlapping voxel blocks aligned with the coordinate axes, and establishing an octree structure corresponding to the voxel blocks; storing the geometric and texture information inside each voxel block in its 8 vertices in the form of fixed-length optimizable feature vectors;
step 2: inputting a plurality of RGB or RGBD images with known camera positions and orientations, generating rays from the camera center through the pixels of each image, calculating the intersections of each ray with the voxel blocks, performing three-dimensional point sampling in the intersection regions, and obtaining the feature vector of each sampled point from its three-dimensional coordinates within the voxel block; obtaining a Signed Distance Field (SDF) and intermediate information through a geometry analysis network, and obtaining a color from the intermediate information through a texture analysis network;
step 3: calculating the spatial density value corresponding to each three-dimensional point from its SDF, accumulating the weighted colors along each ray by volume rendering to obtain the predicted color of the pixel corresponding to the ray, comparing it with the true color to optimize the geometry analysis network, the texture analysis network and the feature vectors on the voxel blocks, and gradually generating the neural implicit surface of the scene through progressive training;
step 4: for the neural implicit surface of the scene obtained in step 3, whose voxel blocks contain geometric and texture feature vectors, rendering and interacting with the voxel blocks individually.
Further, the three-dimensional point sampling in step 2 specifically includes:
sampling the regions on a ray that intersect voxel blocks using a surface-aware sampling strategy, which is divided into three steps:
(1) first, sampling three-dimensional points p with uniform probability in the regions on the ray that intersect voxel blocks, and obtaining the feature vector e of each p through the feature extraction function;
(2) then, using the geometry analysis network F_σ to calculate the SDF of each sampling point; whether a segment contains the surface is determined by checking whether the SDFs of two consecutive three-dimensional points change from positive to negative along the ray direction, and the voxel blocks containing such points are marked as important voxels;
(3) finally, increasing the sampling probability inside important voxels, reducing the sampling probability in the other voxel blocks, resampling the regions on the ray that intersect voxel blocks, and keeping the total number of sampling points fixed.
Further, obtaining the predicted color of the pixel corresponding to a ray through volume rendering in step 3 specifically includes:
using an S-density function φ_s(σ) to convert the SDF of a three-dimensional point p into a density, where φ_s(σ) is a unimodal function of the signed distance σ of point p:
φ_s(σ) = s·e^(−sσ) / (1 + e^(−sσ))²
i.e., the derivative of the Sigmoid function Φ_s(σ) = (1 + e^(−sσ))^(−1); s is a scale parameter controlling the shape of the distribution, so that points close to the surface receive greater weight than distant points;
based on φ_s(σ), defining the opacity density ρ(t) as
ρ(t) = max( −(dΦ_s/dt)(σ(r(t))) / Φ_s(σ(r(t))), 0 )
thereby defining the discrete volume density (opacity) used in volume rendering as:
α_i = max( (Φ_s(σ_i) − Φ_s(σ_{i+1})) / Φ_s(σ_i), 0 )
and defining the discrete accumulated transmittance in volume rendering as:
T_i = ∏_{j=1}^{i−1} (1 − α_j)
thus, volume rendering over the N_p three-dimensional sampling points on a ray yields the accumulated color C(r):
C(r) = Σ_{i=1}^{N_p} T_i·α_i·c_i
where c_i is the color of point i on the ray.
According to a preferred embodiment of the present invention, the progressive training in step 3 specifically comprises:
the progressive training removes and decomposes voxel blocks from coarse to fine: voxel blocks that do not contain the surface are gradually culled, and the remaining voxel blocks are decomposed to obtain a finer surface;
first, a sufficient number of three-dimensional points are uniformly sampled in each voxel block; then the geometry analysis network F_σ is used to calculate the SDF of these points; to decide whether to retain or cull a voxel block V_i, a distance threshold τ is defined:
K_i = 1 if min over the sampled points p in V_i of |F_σ(Γ(p))[0]| < τ, and K_i = 0 otherwise,
where K_i ∈ {0,1} is a flag, with 1 denoting a voxel to be retained;
the method for decomposing the voxel blocks is: each remaining voxel block is decomposed into 8 sub-voxel blocks, and the feature vectors of the corner vertices of the newly generated sub-voxel blocks are computed using the feature extraction function Γ and subsequently optimized individually.
Compared with the prior art, the invention has the advantages that:
1) The three-dimensional representation proposed by the invention, named Vox-Surf, partitions and stores a three-dimensional scene in a number of disjoint voxel blocks. Vox-Surf combines the advantages of voxel representations and neural implicit surfaces, and can be learned end-to-end from multi-view images. Compared with the prior art, the voxel-block-based independent geometric rendering units of the invention are better suited for interactive editing of scenes.
2) The invention uses progressive training and a surface-aware sampling strategy to improve reconstruction quality without increasing memory overhead. Meanwhile, thanks to the Ray-AABB intersection detection strategy and the use of smaller networks, the method renders faster than existing methods.
Drawings
FIG. 1 is an overall flow chart of Vox-Surf reconstruction according to the present invention;
FIG. 2 is a schematic flow diagram of the surface-aware sampling of the present invention;
FIG. 3 is a schematic diagram of the progressive voxel culling/decomposition and surface reconstruction training process proposed by the present invention;
FIG. 4 is a schematic diagram of the interactive editing of the present invention.
Detailed Description
The invention is described in detail below with reference to the accompanying drawings. The technical features of the embodiments of the present invention can be combined correspondingly without mutual conflict.
The invention discloses a voxel-based neural implicit surface generation and interaction method, which comprises the following steps:
step 1, dividing a scene in advance into a plurality of non-overlapping voxel blocks aligned with the coordinate axes, and establishing an octree structure corresponding to the voxel blocks; storing the geometric and texture information inside each voxel block in its 8 vertices in the form of fixed-length optimizable feature vectors;
Specifically, the scene is partitioned by a set of voxel blocks V = {V_1, V_2, …, V_n}, where each voxel block has 8 corner vertices containing encoded geometric and texture information; this information is represented by fixed-length optimizable feature vectors e ∈ R^{L_e}, where L_e is the length of the feature vector. Thus, for any voxel block V_i, the feature of any three-dimensional point p ∈ V_i inside it can be obtained by interpolating its corner features, and adjacent voxel blocks share the feature vectors of their 4 common corner vertices.
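For illustration, the following is a minimal sketch, in PyTorch, of how such a voxel representation might be organized; the names (VoxelGrid, corner_features, block_corner_idx) are hypothetical and not part of the invention:

```python
import torch

class VoxelGrid:
    """Sketch of the step-1 data structure: voxel blocks whose 8 corner
    vertices index into a shared table of optimizable feature vectors."""
    def __init__(self, num_corners: int, feat_len: int = 16):
        # One fixed-length feature vector per unique corner vertex;
        # adjacent blocks share corners by indexing the same rows.
        self.corner_features = torch.nn.Parameter(
            torch.zeros(num_corners, feat_len))
        self.block_corner_idx = {}  # block id -> LongTensor (8,) corner indices
        self.block_bounds = {}      # block id -> (min_xyz, max_xyz) octree leaf
```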
Step 2: inputting a plurality of RGB or RGBD images with known camera positions and orientations, generating rays from the camera center through the pixels of each image, calculating the intersections of each ray with the voxel blocks, performing three-dimensional point sampling in the intersection regions, and obtaining the feature vector of each sampled point from its three-dimensional coordinates within the voxel block; obtaining a Signed Distance Field (SDF) and intermediate information through a geometry analysis network, and obtaining the color from the intermediate information through a texture analysis network.
As shown in fig. 1, the ray passing from the camera center o through a pixel on the image in direction d is defined as r(t) = o + dt, where t is the depth along the ray direction. For each ray, the depths of its intersection points with the voxel blocks are calculated by a Ray-AABB intersection detection algorithm, thereby delimiting the regions on the ray that intersect voxel blocks.
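For reference, a minimal sketch of the standard slab-method Ray-AABB test that this step might use is given below; the function name and epsilon guard are illustrative assumptions:

```python
import torch

def ray_aabb_intersect(o, d, box_min, box_max, eps=1e-9):
    # Slab method for ray r(t) = o + d*t against an axis-aligned box:
    # intersect the three pairs of axis-aligned slabs.
    inv_d = 1.0 / (d + eps)                  # guard against division by zero
    t0 = (box_min - o) * inv_d
    t1 = (box_max - o) * inv_d
    t_near = torch.minimum(t0, t1).max()     # latest entry over the 3 slabs
    t_far = torch.maximum(t0, t1).min()      # earliest exit
    hit = (t_far >= t_near) & (t_far >= 0.0) # boolean tensor
    return t_near, t_far, hit
```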
In a preferred embodiment of the present invention, obtaining the Signed Distance Field (SDF) and intermediate information through the geometry analysis network in step 2, and then obtaining the color from the intermediate information through the texture analysis network, specifically includes: defining a feature extraction function Γ: R³ → R^{L_e} that maps a three-dimensional point p to a feature vector e ∈ R^{L_e} of length L_e. The feature extraction function is realized by trilinear interpolation: the feature vectors contained in the 8 corner vertices of the voxel block are interpolated according to the three-dimensional coordinates of p and its relative position inside the voxel block to obtain the feature vector of p.
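A minimal sketch of this trilinear feature interpolation, assuming the point has already been normalized to the unit cube of its voxel block (names and corner ordering are illustrative):

```python
import torch

def trilinear_features(p_local, corner_feats):
    # p_local: (3,) position normalized to [0,1]^3 within the block.
    # corner_feats: (8, L_e) corner features ordered by corner bits (x,y,z).
    x, y, z = p_local
    w = torch.stack([
        (1-x)*(1-y)*(1-z), (1-x)*(1-y)*z, (1-x)*y*(1-z), (1-x)*y*z,
        x*(1-y)*(1-z),     x*(1-y)*z,     x*y*(1-z),     x*y*z,
    ])                                      # trilinear weight of each corner
    return (w.unsqueeze(-1) * corner_feats).sum(dim=0)  # feature e of p, (L_e,)
```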
The invention adopts multi-layer perceptron networks (MLP) to represent the geometry analysis network F_σ and the texture analysis network F_c. The geometry analysis network F_σ: R^{L_e} → R × R^{L_f} maps the feature vector e of p to its signed distance σ ∈ R and a geometric feature vector f ∈ R^{L_f} of length L_f. The sign of σ indicates whether p is inside or outside the surface S. The surface S of the scene can be extracted as the zero-level set
S = { p | F_σ(Γ(p))[0] = 0 }
where the operation [0] means taking the first value from F_σ, which in the embodiment of the invention is the signed distance σ at position p. The geometric feature vector f of the three-dimensional point p, the ray direction d at p, and the feature vector e of p are concatenated as the input of the texture analysis network F_c to obtain the color c at p. In practice, the invention adopts the positional encoding algorithm PE proposed in the NeRF method: the feature vector e is encoded before entering the geometry analysis network, and the ray direction d is encoded before entering the texture analysis network.
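A minimal sketch, assuming PyTorch, of the positional encoding and the geometry analysis network F_σ; the layer sizes follow the embodiment described later (4 layers, 128 hidden units), while the activation choice and output-head layout are assumptions. The texture analysis network F_c is analogous, taking the concatenation [f, PE(d), e] and outputting RGB:

```python
import torch
import torch.nn as nn

def positional_encoding(x, num_freqs):
    # NeRF-style PE: concatenated sin/cos at frequencies 2^k, k < num_freqs.
    out = []
    for k in range(num_freqs):
        out += [torch.sin((2 ** k) * torch.pi * x),
                torch.cos((2 ** k) * torch.pi * x)]
    return torch.cat(out, dim=-1)

class GeometryNet(nn.Module):
    # F_sigma: encoded feature PE(e) -> (signed distance, geometric feature f)
    def __init__(self, in_dim, hidden=128, feat_len=128, layers=4):
        super().__init__()
        mods, d = [], in_dim
        for _ in range(layers):
            mods += [nn.Linear(d, hidden), nn.Softplus(beta=100)]
            d = hidden
        self.body = nn.Sequential(*mods)
        self.head = nn.Linear(hidden, 1 + feat_len)   # [sigma | f]

    def forward(self, e):
        out = self.head(self.body(e))
        return out[..., 0], out[..., 1:]              # sigma, f
```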
In one embodiment of the invention, a surface-aware sampling strategy is proposed to sample three-dimensional points in the regions on a ray that intersect voxel blocks. The process can be roughly divided into three steps, as shown in fig. 2: (1) first, three-dimensional points p are sampled with uniform probability in the regions on the ray that intersect voxel blocks, and the feature vector e of each p is obtained through the feature extraction function; (2) then, the geometry analysis network F_σ is used to calculate the SDF of each sampling point; whether a segment contains the surface is determined by checking whether the SDFs of two consecutive three-dimensional points change from positive (outside) to negative (inside) along the ray direction, and the voxel blocks containing such points are marked as important voxels; (3) finally, the sampling probability inside important voxels is increased, the sampling probability in the other voxel blocks is reduced, the regions on the ray that intersect voxel blocks are resampled, and the total number of sampling points is kept fixed.
In practice, depending on whether only the first important voxel is used, the resampling is further divided into full surface-aware resampling (fig. 2, third panel) and first-surface-aware resampling (fig. 2, last panel). When the shape is not yet stable, the former is used to optimize all possible surfaces; the latter is then used to optimize the fine structure of the stabilized shape.
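The following is a minimal sketch of steps (2)-(3) on a single ray, under assumed tensor shapes; the sign-flip test mirrors the description above, while the fixed-budget reallocation heuristic (the boost fraction) is an illustrative assumption, not the invention's exact scheme:

```python
import torch

def surface_aware_budget(sdf, segment_id, n_total, boost=0.8):
    # sdf: (N,) SDF at uniform samples sorted by depth along the ray.
    # segment_id: (N,) index of the ray/voxel intersection segment per sample.
    # Mark segments where the SDF flips positive -> negative (surface inside).
    flips = (sdf[:-1] > 0) & (sdf[1:] < 0)
    n_seg = int(segment_id.max()) + 1
    important = torch.zeros(n_seg, dtype=torch.bool)
    important[segment_id[:-1][flips]] = True
    # Reallocate the fixed sample budget: most samples go to important
    # segments, the remainder is spread over the other segments.
    counts = torch.zeros(n_seg, dtype=torch.long)
    n_imp = int(important.sum())
    if n_imp > 0:
        counts[important] = int(boost * n_total) // n_imp
    rest = n_total - int(counts.sum())
    if n_seg > n_imp:
        counts[~important] = rest // (n_seg - n_imp)
    return counts   # per-segment resampling counts, total <= n_total
```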
Step 3: calculating the spatial density value corresponding to each three-dimensional point from its SDF, accumulating the weighted colors along each ray by volume rendering to obtain the predicted color of the pixel corresponding to the ray, comparing it with the true color to optimize the geometry analysis network, the texture analysis network and the feature vectors on the voxel blocks, and gradually generating the neural implicit surface of the scene through progressive training.
the step 3 of obtaining the color of the pixel corresponding to the predicted ray through volume rendering specifically includes:
the invention uses an S-density function phi s (σ) converting the SDF of the three-dimensional point p into a density, φ s (σ) is a unimodal function of the symbol distance σ with respect to point p, where
Figure BDA0003807676910000061
Is Sigmoid function phi s S is a scale parameter that controls the shape of the distribution, with points near the surface having a value that is greater than the weight of points farther away.
Based on phi s (σ) defining the opacity density ρ (t) as
Figure BDA0003807676910000062
Thereby defining the bulk density function in volume rendering as:
Figure BDA0003807676910000071
the discrete cumulative transmittance in the definition volume rendering is as follows:
Figure BDA0003807676910000072
thus, for N on a ray p And performing volume rendering on the three-dimensional sampling points to obtain an accumulated color C (r):
Figure BDA0003807676910000073
wherein c is i Is the color of point i on the ray.
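A minimal sketch of this discrete rendering on one ray, assuming PyTorch, depth-sorted samples, and small epsilon terms for numerical stability (the formulas follow the α_i, T_i and C(r) definitions above, with Φ_s(σ) = sigmoid(s·σ)):

```python
import torch

def render_color(sdf, colors, s, eps=1e-7):
    # sdf: (N,) signed distances at depth-sorted samples; colors: (N, 3).
    cdf = torch.sigmoid(s * sdf)                     # Phi_s(sigma_i)
    alpha = ((cdf[:-1] - cdf[1:]) / (cdf[:-1] + eps)).clamp(min=0.0)
    # Accumulated transmittance T_i = prod_{j<i} (1 - alpha_j)
    trans = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alpha + eps])[:-1], dim=0)
    weights = trans * alpha                          # (N-1,)
    return (weights.unsqueeze(-1) * colors[:-1]).sum(dim=0)   # C(r)
```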
The progressive training in step 3 specifically comprises:
the progressive training removes and decomposes voxel blocks from coarse to fine: voxel blocks that do not contain the surface are gradually culled, and the remaining voxel blocks are decomposed to obtain a finer surface.
The voxel block culling step first uniformly samples a sufficient number of three-dimensional points within each voxel block, then uses the geometry analysis network F_σ to calculate the SDF of these points. To decide whether to retain or cull a voxel block V_i, the invention defines a distance threshold τ:
K_i = 1 if min over the sampled points p in V_i of |F_σ(Γ(p))[0]| < τ, and K_i = 0 otherwise,
where K_i ∈ {0,1} is a flag indicating whether the voxel is retained, with 1 denoting a retained voxel.
The voxel block decomposition step is as follows: each remaining voxel block is decomposed into 8 sub-voxel blocks, and the feature vectors of the corner vertices of the newly generated sub-voxel blocks are computed using the feature extraction function Γ and subsequently optimized individually.
The effect of each round of voxel block culling and decomposition is shown in the left 4 panels of fig. 3, and the resulting surface and texture are shown in the right 2 panels of fig. 3.
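A minimal sketch of one culling-and-decomposition round under the K_i rule above; the block bookkeeping (axis-aligned bounds) and the uniform sampler are illustrative assumptions:

```python
import torch

def cull_and_subdivide(blocks, sdf_fn, tau, n_samples=512):
    # blocks: list of (lo, hi) corner tensors, each (3,); sdf_fn: (N,3) -> (N,).
    kept = []
    for lo, hi in blocks:
        pts = lo + torch.rand(n_samples, 3) * (hi - lo)   # uniform samples
        if sdf_fn(pts).abs().min() < tau:                 # K_i = 1: near surface
            kept.append((lo, hi))
    children = []
    for lo, hi in kept:                                   # split into 8 octants
        mid = 0.5 * (lo + hi)
        for dx in (0, 1):
            for dy in (0, 1):
                for dz in (0, 1):
                    off = torch.tensor([dx, dy, dz], dtype=lo.dtype)
                    children.append((lo + off * (mid - lo),
                                     mid + off * (hi - mid)))
    return children   # finer voxel blocks; corner features come from Gamma
```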
In order to optimize the feature vectors, the geometry analysis network and the texture analysis network, the invention uses the following loss functions. For each ray, the accumulated color C(r) of the ray is first calculated and then compared with the true color Ĉ(r) using an L1 loss:
L_color = (1/|R|) Σ_{r∈R} | C(r) − Ĉ(r) |
In order to constrain the SDF, the invention also adds an Eikonal loss term on the sampled three-dimensional points p, which maintains the stability of the SDF by constraining the gradient (normal vector) of the SDF at the sampled points to unit norm:
L_eik = (1/N) Σ_p ( ||∇_p σ(p)||₂ − 1 )²
The loss function finally used is
L = L_color + λ·L_eik
where λ is a weighting coefficient.
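A minimal sketch of this combined loss, assuming PyTorch; the weighting value and the autograd route to the SDF gradient are assumptions:

```python
import torch

def color_and_eikonal_loss(pred_rgb, gt_rgb, pts, sdf_fn, lam=0.1):
    # pred_rgb, gt_rgb: (R, 3) accumulated vs. true pixel colors.
    # pts: (N, 3) sampled points, created with requires_grad=True.
    l_color = (pred_rgb - gt_rgb).abs().mean()            # L1 color loss
    sdf = sdf_fn(pts)
    grad = torch.autograd.grad(sdf.sum(), pts, create_graph=True)[0]
    l_eik = (grad.norm(dim=-1) - 1.0).pow(2).mean()       # Eikonal term
    return l_color + lam * l_eik
```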
If depth information is input, the invention additionally uses a depth loss based on the occupancy field.
The occupancy field is defined as
occ(t) = Sigmoid( −scale · F_σ(Γ(r(t)))[0] )
Since the gradient of the occupancy field peaks only near the surface S, the invention divides a ray r(t) with depth information into three intervals with different corresponding losses:
(1) for points before the given depth t̂, i.e. t < t̂ − δt, where δt is a small noise-tolerant depth range, the invention always assumes that these points are outside the surface and penalizes their occupancy toward 0:
L_out = mean over these points of BCE( occ(t), 0 )
(2) for points behind the given depth t̂, i.e. t > t̂ + δt, the invention always assumes that these points are inside the surface and penalizes their occupancy toward 1:
L_in = mean over these points of BCE( occ(t), 1 )
It has been found in experiments that this loss remains valid even when a ray intersects multiple surfaces, as long as there are sufficient observations.
(3) for points in between, i.e. t̂ − δt ≤ t ≤ t̂ + δt, the invention considers them to be on the surface, so the SDF is directly constrained to 0:
L_surf = mean over these points of | F_σ(Γ(r(t)))[0] |
Finally, the total depth loss is the combination of the three losses above:
L_depth = L_out + L_in + L_surf
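A minimal sketch of such an occupancy-based depth loss on one ray; the binary-cross-entropy form of the outside/inside terms is an assumption consistent with the three-interval description above:

```python
import torch
import torch.nn.functional as F

def depth_loss(t, sdf, t_hat, delta, scale):
    # t: (N,) sample depths along the ray; sdf: (N,) SDF at those samples.
    # t_hat: measured depth; delta: noise-tolerant range around t_hat.
    occ = torch.sigmoid(-scale * sdf)          # occupancy field occ(t)
    before = t < t_hat - delta                 # assumed outside: occ -> 0
    after = t > t_hat + delta                  # assumed inside:  occ -> 1
    near = ~(before | after)                   # on the surface:  sdf -> 0
    loss = sdf.new_zeros(())
    if before.any():
        loss = loss + F.binary_cross_entropy(occ[before],
                                             torch.zeros_like(occ[before]))
    if after.any():
        loss = loss + F.binary_cross_entropy(occ[after],
                                             torch.ones_like(occ[after]))
    if near.any():
        loss = loss + sdf[near].abs().mean()
    return loss
```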
and 4, step 4: and (4) for the neural implicit surface of the scene obtained in the step (3), which contains the voxel blocks of the geometrical and textural information feature vectors, performing independent rendering and interaction on the voxel blocks. The individual rendering and interaction in the step 4 specifically include: each of the voxel blocks trained in the step 3 can be regarded as an independent geometric unit, interactive editing is directly carried out on the scene by changing the properties of the position, the size, the orientation and the like of the voxel block, the interactive freedom degree is improved, and the real texture effect under the current visual angle is directly generated through volume rendering in the step 3.
Examples
Experiments were carried out on two different types of datasets: the small-scale object dataset DTU and the indoor scene dataset ScanNet. For the DTU dataset, the invention first generates the initial voxel blocks and the corresponding octree within a unit cube, with a voxel size of 0.8. Voxel block culling is applied every 50,000 iterations, and voxel block decomposition is carried out at 20,000, 50,000, 100,000, 200,000 and 300,000 iterations, with a culling threshold of 0.01. Uniform voxel sampling is used before the second decomposition, the full surface-aware voxel resampling strategy is used from the second to the fourth decomposition, and after the fourth decomposition the first-surface-aware voxel resampling is used to continue refining detail. The voxel embedding length is 16, the geometry analysis network is a 4-layer MLP with 128 hidden units per layer, and the texture analysis network is a 4-layer MLP with 128 hidden units per layer. Before being input into the networks, the voxel features are encoded with 6 frequencies and the ray direction with 8 frequencies. Compared with COLMAP, DVR, IDR and NeuS, previously the most accurate method, the invention evaluates the accuracy between the real and reconstructed three-dimensional models using the Chamfer distance metric; the average accuracy of the invention is higher than that of NeuS and IDR.
For the ScanNet dataset, the invention uses data with depth for training: all depth observations are first back-projected into three-dimensional points, and these points are then voxelized with an initial voxel size of 0.4. Since RGB-D sensors are accurate only within a certain distance, the maximum depth range is limited to 5.0 to reduce noisy samples. The voxels are likewise progressively decomposed and culled twice during training, so that the minimum voxel size is 0.1. Compared with the COLMAP and TSDF methods on the Chamfer and F-score metrics, the results of the invention are significantly better than those of the traditional TSDF method.
The method can be applied to editing scenes and objects. As shown in fig. 4, a corresponding realistic scene can be rendered directly after modifying the voxel units of the method through operations such as alignment, copying, local scaling and segmentation.
The foregoing merely illustrates specific embodiments of the invention. Obviously, the invention is not limited to the above embodiments, and many variations are possible. All modifications that a person skilled in the art can directly derive or infer from the disclosure of the present invention should be considered within the scope of the invention.

Claims (8)

1. A voxel-based neural implicit surface generation and interaction method, comprising the following steps:
step 1: dividing a scene in advance into a plurality of non-overlapping voxel blocks aligned with the coordinate axes, and establishing an octree structure corresponding to the voxel blocks; storing the geometric and texture information inside each voxel block in its 8 vertices in the form of fixed-length optimizable feature vectors;
step 2: inputting a plurality of RGB or RGBD images with known camera positions and orientations, generating rays from the camera center through the pixels of each image, calculating the intersections of each ray with the voxel blocks, performing three-dimensional point sampling in the intersection regions, and obtaining the feature vector of each sampled point from its three-dimensional coordinates within the voxel block; obtaining a Signed Distance Field (SDF) and intermediate information through a geometry analysis network, and obtaining a color from the intermediate information through a texture analysis network;
step 3: calculating the spatial density value corresponding to each three-dimensional point from its SDF, accumulating the weighted colors along each ray by volume rendering to obtain the predicted color of the pixel corresponding to the ray, comparing it with the true color to optimize the geometry analysis network, the texture analysis network and the feature vectors on the voxel blocks, and gradually generating the neural implicit surface of the scene through progressive training;
step 4: for the neural implicit surface of the scene obtained in step 3, whose voxel blocks contain geometric and texture feature vectors, rendering and interacting with the voxel blocks individually.
2. The voxel-based neural implicit surface generation and interaction method according to claim 1, wherein step 1 specifically comprises:
the scene is partitioned by a set of voxel blocks V = {V_1, V_2, …, V_n}, where each voxel block has 8 corner vertices containing encoded geometric and texture information; this information is represented by fixed-length optimizable feature vectors e ∈ R^{L_e}, where L_e is the length of the feature vector; thus, for any voxel block V_i, the feature of any three-dimensional point p ∈ V_i inside it can be obtained by interpolating its corner features, and adjacent voxel blocks share the feature vectors of their 4 common corner vertices.
3. The voxel-based neural implicit surface generation and interaction method according to claim 1, wherein calculating the intersections of a ray with the voxel blocks in step 2 specifically comprises:
the ray passing from the camera center o through a pixel on the image in direction d is defined as r(t) = o + dt, where t is the depth along the ray direction; for each ray, the depths of its intersection points with the voxel blocks are calculated by a Ray-AABB intersection detection algorithm, thereby delimiting the regions on the ray that intersect voxel blocks.
4. The voxel-based neural implicit surface generation and interaction method according to claim 1, wherein the three-dimensional point sampling in step 2 specifically comprises:
sampling the regions on a ray that intersect voxel blocks using a surface-aware sampling strategy, which is divided into three steps:
(1) first, sampling three-dimensional points p with uniform probability in the regions on the ray that intersect voxel blocks, and obtaining the feature vector e of each p through the feature extraction function;
(2) then, using the geometry analysis network F_σ to calculate the SDF of each sampling point; whether a segment contains the surface is determined by checking whether the SDFs of two consecutive three-dimensional points change from positive to negative along the ray direction, and the voxel blocks containing such points are marked as important voxels;
(3) finally, increasing the sampling probability inside important voxels, reducing the sampling probability in the other voxel blocks, resampling the regions on the ray that intersect voxel blocks, and keeping the total number of sampling points fixed.
5. The method according to claim 1, wherein obtaining a Signed Distance Field (SDF) and intermediate information through a geometry analysis network in step 2, and obtaining the color from the intermediate information through a texture analysis network, specifically comprises:
defining a feature extraction function Γ: R³ → R^{L_e} that maps a three-dimensional point p to a feature vector e ∈ R^{L_e} of length L_e; the feature extraction function is realized by trilinear interpolation, interpolating the feature vectors contained in the 8 corner vertices of the voxel block according to the three-dimensional coordinates of p and its relative position inside the voxel block to obtain the feature vector of p;
representing the geometry analysis network F_σ and the texture analysis network F_c using multi-layer perceptron networks (MLP); the geometry analysis network F_σ: R^{L_e} → R × R^{L_f} maps the feature vector e of p to its signed distance σ ∈ R and a geometric feature vector f ∈ R^{L_f} of length L_f; the sign of σ indicates whether p is inside or outside the surface S; the surface S of the scene can be extracted as the zero-level set
S = { p | F_σ(Γ(p))[0] = 0 }
where the operation [0] means taking from F_σ the signed distance σ at position p; the geometric feature vector f of the three-dimensional point p, the ray direction d at p, and the feature vector e of p are concatenated as the input of the texture analysis network F_c to obtain the color c at p.
6. The voxel-based neural implicit surface generation and interaction method according to claim 1, wherein obtaining the predicted color of the pixel corresponding to a ray through volume rendering in step 3 specifically comprises:
using an S-density function φ_s(σ) to convert the SDF of a three-dimensional point p into a density, where φ_s(σ) is a unimodal function of the signed distance σ of point p:
φ_s(σ) = s·e^(−sσ) / (1 + e^(−sσ))²
i.e., the derivative of the Sigmoid function Φ_s(σ) = (1 + e^(−sσ))^(−1), where s is a scale parameter controlling the shape of the distribution, so that points close to the surface receive greater weight than distant points;
based on φ_s(σ), defining the opacity density ρ(t) as
ρ(t) = max( −(dΦ_s/dt)(σ(r(t))) / Φ_s(σ(r(t))), 0 )
thereby defining the discrete volume density (opacity) used in volume rendering as:
α_i = max( (Φ_s(σ_i) − Φ_s(σ_{i+1})) / Φ_s(σ_i), 0 )
and defining the discrete accumulated transmittance in volume rendering as:
T_i = ∏_{j=1}^{i−1} (1 − α_j)
thus, volume rendering over the N_p three-dimensional sampling points on a ray yields the accumulated color C(r):
C(r) = Σ_{i=1}^{N_p} T_i·α_i·c_i
where c_i is the color of point i on the ray.
7. The voxel-based neural implicit surface generation and interaction method according to claim 1, wherein the progressive training in step 3 specifically comprises:
the progressive training removes and decomposes voxel blocks from coarse to fine: voxel blocks that do not contain the surface are gradually culled, and the remaining voxel blocks are decomposed to obtain a finer surface;
first, a sufficient number of three-dimensional points are uniformly sampled in each voxel block; then the geometry analysis network F_σ is used to calculate the SDF of these points; to decide whether to retain or cull a voxel block V_i, a distance threshold τ is defined:
K_i = 1 if min over the sampled points p in V_i of |F_σ(Γ(p))[0]| < τ, and K_i = 0 otherwise,
where K_i ∈ {0,1} is a flag, with 1 denoting a voxel to be retained;
the method for decomposing the voxel blocks is: each remaining voxel block is decomposed into 8 sub-voxel blocks, and the feature vectors of the corner vertices of the newly generated sub-voxel blocks are computed using the feature extraction function Γ and subsequently optimized individually.
8. The voxel-based neural implicit surface generation and interaction method according to claim 1, wherein the individual rendering and interaction in step 4 specifically comprise:
each voxel block trained in step 3 can be regarded as an independent geometric unit; interactive editing is performed directly on the scene by changing the position, size and orientation of a voxel block, which increases the degree of interactive freedom, and the realistic texture effect under the current viewing angle is generated directly through the volume rendering of step 3.
CN202211001790.6A 2022-08-20 2022-08-20 Neural implicit surface generation and interaction method based on voxels Pending CN115880419A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211001790.6A CN115880419A (en) 2022-08-20 2022-08-20 Neural implicit surface generation and interaction method based on voxels


Publications (1)

Publication Number Publication Date
CN115880419A true CN115880419A (en) 2023-03-31

Family

ID=85769657

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211001790.6A Pending CN115880419A (en) 2022-08-20 2022-08-20 Neural implicit surface generation and interaction method based on voxels

Country Status (1)

Country Link
CN (1) CN115880419A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117456078A (en) * 2023-12-19 2024-01-26 北京渲光科技有限公司 Neural radiation field rendering method, system and equipment based on various sampling strategies
CN117456078B (en) * 2023-12-19 2024-03-26 北京渲光科技有限公司 Neural radiation field rendering method, system and equipment based on various sampling strategies

Similar Documents

Publication Publication Date Title
Li et al. Neuralangelo: High-fidelity neural surface reconstruction
CN110738697B (en) Monocular depth estimation method based on deep learning
CN112258618A (en) Semantic mapping and positioning method based on fusion of prior laser point cloud and depth map
CN111105382B (en) Video repair method
CN110381268B (en) Method, device, storage medium and electronic equipment for generating video
JP2003077004A (en) Hierarchical image base representation of three- dimensional static or dynamic object, and method and device for using representation in rendering of object
CN113628348B (en) Method and equipment for determining viewpoint path in three-dimensional scene
Li et al. Vox-surf: Voxel-based implicit surface representation
CN113822993B (en) Digital twinning method and system based on 3D model matching
CN114424250A (en) Structural modeling
CN115115797B (en) Large-scene sparse light field semantic driving intelligent reconstruction method, system and device
Holzmann et al. Semantically aware urban 3d reconstruction with plane-based regularization
CN111462030A (en) Multi-image fused stereoscopic set vision new angle construction drawing method
Liu et al. High-quality textured 3D shape reconstruction with cascaded fully convolutional networks
CN114998515A (en) 3D human body self-supervision reconstruction method based on multi-view images
CN112562081A (en) Visual map construction method for visual layered positioning
CN111899295A (en) Monocular scene depth prediction method based on deep learning
CN114996814A (en) Furniture design system based on deep learning and three-dimensional reconstruction
CN114581571A (en) Monocular human body reconstruction method and device based on IMU and forward deformation field
CN115880419A (en) Neural implicit surface generation and interaction method based on voxels
CN113610912B (en) System and method for estimating monocular depth of low-resolution image in three-dimensional scene reconstruction
WO2020169959A1 (en) Image processing to determine object thickness
WO2023004559A1 (en) Editable free-viewpoint video using a layered neural representation
CN117635801A (en) New view synthesis method and system based on real-time rendering generalizable nerve radiation field
RU2710659C1 (en) Simultaneous uncontrolled segmentation of objects and drawing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination