CN116977536A - New view synthesis method for unbounded scenes based on a hybrid neural radiance field - Google Patents

New view synthesis method for unbounded scenes based on a hybrid neural radiance field

Info

Publication number
CN116977536A
CN116977536A · CN202311018456.6A · CN202311018456A
Authority
CN
China
Prior art keywords
feature
sampling
color
pixel
unbounded
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311018456.6A
Other languages
Chinese (zh)
Inventor
崔林艳
张旭
尹继豪
薛斌党
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University
Priority to CN202311018456.6A
Publication of CN116977536A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 - 3D [Three Dimensional] image rendering
    • G06T 15/08 - Volume rendering
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 - 3D [Three Dimensional] image rendering
    • G06T 15/06 - Ray-tracing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computer Graphics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Generation (AREA)

Abstract

The invention relates to a new view synthesis method for unbounded scenes based on a hybrid neural radiance field, which comprises the following steps: parameterizing the unbounded space into a bounded region, encoding the scene information with a hash feature grid and a plane feature grid, and constructing a color and volume density decoder built around an MLP; obtaining the feature vectors of all samples on a ray through spiral sampling on the cone surface and linear interpolation; aggregating a feature vector and a mean depth for each ray with the volume rendering equation, and decoding the corresponding pixel color with a shallow MLP; during optimization, supervising the color field with the true pixel colors and supervising the rendered depth with the sparse point cloud obtained by SfM; and, given an arbitrary camera pose, rendering the image at that viewpoint from the optimized model. The invention enables novel view synthesis in 360-degree unbounded scenes, strengthens the modeling capability of the neural radiance field under sparse views, and improves the rendering quality of images at new viewpoints.

Description

New view synthesis method for unbounded scenes based on a hybrid neural radiance field
Technical Field
The invention relates to the field of novel view synthesis, and in particular to a new view synthesis method for unbounded scenes based on a hybrid neural radiance field.
Background
Novel view synthesis is a computer graphics and computer vision technique that aims to generate images from new viewpoints given a limited set of input images. It allows a scene or object to be observed virtually from angles that do not appear in the original inputs.
Traditional image synthesis techniques rely mainly on copying, cropping, stitching and transforming images to create new viewpoints or scenes. These methods are generally limited by the quality of the input images, the range of viewing angles and the complexity of the scene, so the generated images may not be sufficiently realistic or may show visual distortion. With the rapid development of virtual reality (VR), augmented reality (AR) and computer vision, the demand for higher-quality and more natural novel view synthesis keeps growing.
Novel view synthesis involves complex algorithms and models, such as methods based on generative adversarial networks (GAN), conditional generative models and spatial transformer networks. By learning the transformation rules between input images, these methods can infer and generate additional viewpoints, making the synthesized results more diverse and natural. Despite significant progress, challenges remain in certain applications, for example inaccuracies when processing complex scenes, motion blur or low-quality images. Improving and optimizing novel view synthesis therefore remains of great significance, especially for virtual reality, augmented reality, game development and film production.
For novel view synthesis in unbounded scenes, the shortcomings of the prior art are mainly the following: (1) unbounded scenes span a wide range of scales, so it is difficult for a network to learn continuous scene information; (2) unbounded scenes contain rich information, and classical representations require long training times and suffer from forgetting; (3) existing rendering pipelines require many MLP forward passes, so synthesizing the color of a single ray is very time-consuming.
Disclosure of Invention
The technical problem solved by the invention is: overcoming the deficiencies of the prior art, a new view synthesis method for unbounded scenes is provided which, starting from a limited set of known views, improves the training speed and inference efficiency of the model, keeps the parameter count of the neural radiance field low, and achieves high-quality image rendering from arbitrary viewpoints.
The technical solution of the invention is a new view synthesis method for unbounded scenes based on a hybrid neural radiance field, comprising the following steps:
(1) Parameterize the unbounded space into a bounded region, encode the scene information with a multi-resolution hash feature grid and a plane feature grid, and construct a color and volume density decoder built around an MLP;
(2) For the feature grids built in step (1), obtain the feature vectors of all samples on a ray through spiral sampling on the cone surface and linear interpolation;
(3) For the sample features obtained in step (2), use the volume rendering equation to aggregate a feature vector and a mean depth for each ray, and decode the corresponding pixel color with a shallow MLP;
(4) For the rendering result of step (3), supervise the color field with the true pixel colors during optimization, and supervise the rendered depth with the sparse point cloud obtained by SfM;
(5) For the model optimized in step (4), given an arbitrary camera pose, render the image at that viewpoint.
In step (1), the unbounded space is parameterized into a bounded region, the scene information is encoded with a multi-resolution 3D hash feature grid and 2D plane feature grids, and a color and volume density decoder built around an MLP is constructed, as follows:
The position of any three-dimensional point in space is transformed into a bounded spherical region with a coordinate contraction function:
where x is the three-dimensional coordinate of the sample point. The whole space is parameterized by f(x) into a sphere of radius 2, on which a multi-resolution 3D hash feature grid is constructed to encode the scene information with a small number of parameters:
N_l = N_min · b^l
where N_max is the number of nodes of the highest-resolution grid, N_min the number of nodes of the lowest-resolution grid, N_l the number of nodes of the l-th level grid, and b the scale ratio between adjacent resolutions. A 3D hash feature grid can encode scene information effectively, but for a fixed hash-table length its higher-resolution levels are strongly affected by hash collisions, which can cause color and volume density ambiguity. To mitigate collisions during queries, mutually orthogonal 2D plane feature grids are introduced to assist in representing scene detail. Finally, an MLP is built as the decoder responsible for regressing color and volume density from the high-dimensional feature vectors.
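By way of illustration, the following Python sketch shows one plausible form of the pieces described above: a mip-NeRF 360-style contraction that maps all of space into a ball of radius 2 (the patent reproduces its exact f(x) only in its figures, so this particular formula is an assumption) and the per-level grid resolutions N_l = N_min · b^l. The values n_min, n_max and n_levels are placeholders, not values taken from the patent.
```python
import numpy as np

def contract(x, eps=1e-9):
    """Map an unbounded 3D point into a ball of radius 2.
    Assumed mip-NeRF 360-style contraction; the patent's exact f(x)
    appears only in its figures, so this is an illustrative stand-in."""
    x = np.asarray(x, dtype=np.float64)
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    safe = np.maximum(norm, eps)
    scale = np.where(norm <= 1.0, 1.0, (2.0 - 1.0 / safe) / safe)
    return x * scale

def grid_resolutions(n_min=16, n_max=2048, n_levels=16):
    """Per-level node counts N_l = N_min * b**l, with the scale ratio
    b chosen so the top level reaches N_max (placeholder values)."""
    b = np.exp((np.log(n_max) - np.log(n_min)) / (n_levels - 1))
    return [int(np.floor(n_min * b**l)) for l in range(n_levels)]

print(contract([10.0, 0.0, 0.0]))   # far-away points approach radius 2
print(grid_resolutions()[:4])       # coarse-to-fine resolutions
```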
In step (2), for the feature grids built in step (1), the feature vectors of all sampled truncated cones on a ray are obtained through spiral sampling on the cone surface and linear interpolation, as follows:
For each pixel, the camera pose defines a cone that starts at the optical center and extends to infinity. Several truncated cones are sampled along it according to inverse depth, so that the sampling density is higher near the optical center and gradually decreases farther away. For each sampled truncated cone, a spiral parametric equation is constructed on its lateral surface, and 7 points are sampled uniformly along the spiral to represent the truncated cone region:
where t is the sampling distance, r is the radius of the cone on the normalized image plane, n is the total number of sample points (here n = 7), m is the number of spiral turns, and p is the coordinate of a sample point.
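The spiral parametric equation itself appears only in the patent's figures; the sketch below shows one assumed form consistent with the description, in which the spiral makes m turns around the lateral surface of the truncated cone between ray distances t0 and t1 and the cone radius grows linearly with the distance t.
```python
import numpy as np

def spiral_frustum_samples(origin, direction, t0, t1, r, n=7, m=2.0):
    """Sample n points on a spiral wound around the lateral surface of the
    truncated cone between ray distances t0 and t1 (assumed parametrization)."""
    d = direction / np.linalg.norm(direction)
    # build two unit vectors orthogonal to the ray direction
    helper = np.array([1.0, 0.0, 0.0]) if abs(d[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(d, helper); u /= np.linalg.norm(u)
    v = np.cross(d, u)
    k = np.arange(n) / (n - 1)               # uniform parameter along the spiral
    t = t0 + (t1 - t0) * k                   # distance along the ray
    theta = 2.0 * np.pi * m * k              # spiral angle (m turns in total)
    radial = r * t                           # cone radius grows linearly with t
    p = (origin[None, :] + t[:, None] * d[None, :]
         + radial[:, None] * (np.cos(theta)[:, None] * u[None, :]
                              + np.sin(theta)[:, None] * v[None, :]))
    return t, p                              # shapes (n,), (n, 3)
```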
Each sample point is mapped into the spherical space by the coordinate contraction function. In the 3D hash feature grid, the eight nearest corner points are queried, the feature vectors of these corners are indexed through the hash function, and the 3D feature of the sample point is obtained by trilinear interpolation. For the 2D plane feature grids, the sample point is projected onto three mutually orthogonal planes; in each plane the four corner points nearest to the projection and their feature vectors are queried, the feature vector of the projection is obtained by bilinear interpolation, and the feature vectors of the three projections are finally fused to give the 2D feature of the sample point:
feature_3D = tri(f_1, f_2, ..., f_8)
where tri is the trilinear interpolation function, bil is the bilinear interpolation function, f_i are the indexed 3D features and g_i are the indexed 2D features. The 3D and 2D features are concatenated as the feature representation of the sample point. After the feature vectors of the 7 sample points in each truncated cone are obtained by querying and interpolation, the mean of their spatial coordinates and the distance of each sample point to this mean point are computed; the normalized inverse distances are used as per-point weights, and the seven feature vectors are fused into a single truncated-cone feature vector describing the whole truncated cone region.
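The inverse-distance fusion of the 7 per-point features into one truncated-cone feature can be written compactly as in the sketch below; the hash-grid and plane-grid lookups are assumed to have already produced the per-point 3D and 2D features, and the feature dimensions are placeholders.
```python
import numpy as np

def fuse_frustum_feature(points, feats_3d, feats_2d, eps=1e-8):
    """Fuse per-sample features into one truncated-cone feature vector.
    points:   (n, 3) spiral sample coordinates of one truncated cone
    feats_3d: (n, C3) trilinearly interpolated hash-grid features
    feats_2d: (n, C2) bilinearly interpolated (and fused) plane features"""
    feats = np.concatenate([feats_3d, feats_2d], axis=-1)   # per-sample hybrid feature
    mean_pt = points.mean(axis=0)                           # mean of the n sample coordinates
    dist = np.linalg.norm(points - mean_pt, axis=-1)        # distance to the mean point
    w = 1.0 / (dist + eps)                                  # inverse distances ...
    w = w / w.sum()                                         # ... normalized as weights
    return (w[:, None] * feats).sum(axis=0)                 # (C3 + C2,) cone feature
```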
In step (3), for the truncated-cone feature vectors obtained in step (2), the volume rendering equation is used to aggregate a color feature vector and a mean depth for each ray, and the corresponding pixel color and depth, i.e. the rendering result, are decoded by a shallow MLP, as follows:
First, each truncated-cone feature vector obtained in step (2) is passed through one fully connected layer to obtain the volume density and the color feature vector at the mean point of the truncated cone. The transmittance at every truncated-cone mean point is computed from the distribution of volume density along the ray, which gives the weight of each truncated cone in the rendering equation:
α_i = 1 − exp(−σ_i δ_i)
λ_i = T_i α_i
where σ_i is the volume density, α_i the occupancy probability, δ_i the interval between adjacent truncated cones, T_i the transmittance, and λ_i the rendering weight. All color feature vectors are weighted by these weights to produce a single feature for the pixel, which is finally fed into the MLP to regress the pixel color; the same weights are applied to the sampling distances to obtain the depth corresponding to the pixel:
feature_pixel = Σ_i λ_i f_i
rgb = MLP(feature_pixel)
depth = Σ_i λ_i t_i
where f_i is a truncated-cone feature vector, feature_pixel is the pixel feature, t_i is the sampling distance of the truncated cone, rgb is the rendered pixel color, and depth is the rendered depth, i.e. the rendering result.
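A minimal sketch of this feature-space compositing is given below. The text does not reproduce the formula for the transmittance T_i, so the standard cumulative product T_i = Π_{j<i}(1 − α_j) is assumed, and decoder_mlp stands in for the shallow MLP, whose architecture is not specified here.
```python
import numpy as np

def composite_ray(sigma, color_feat, t, decoder_mlp):
    """Composite one ray from its per-truncated-cone outputs.
    sigma:       (N,) volume densities at the cone mean points
    color_feat:  (N, C) color feature vectors at the cone mean points
    t:           (N,) sampling distances of the truncated cones
    decoder_mlp: callable mapping a (C,) pixel feature to an RGB color."""
    delta = np.diff(t, append=t[-1] + (t[-1] - t[-2]))       # interval lengths delta_i
    alpha = 1.0 - np.exp(-sigma * delta)                      # alpha_i = 1 - exp(-sigma_i * delta_i)
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])  # assumed transmittance T_i
    weight = trans * alpha                                    # lambda_i = T_i * alpha_i
    pixel_feat = (weight[:, None] * color_feat).sum(axis=0)   # one feature per pixel
    rgb = decoder_mlp(pixel_feat)                             # single MLP forward pass per pixel
    depth = (weight * t).sum()                                # rendered depth
    return rgb, depth
```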
In step (4), for the rendering result of step (3), the color field is supervised with the true pixel colors during optimization, and the rendered depth is supervised with the sparse point cloud obtained by SfM, as follows:
A loss function is constructed for the algorithm. The first term is the color loss. Because the algorithm is a coarse-to-fine optimization, pixel colors are rendered in two stages, so the mean squared error is taken between the true pixel color and both rendered colors:
where C(r) is the true pixel color, C_coarse(r) is the pixel color rendered in the coarse stage, and C_fine(r) is the pixel color rendered in the fine stage.
The second term is the depth loss. Training a neural radiance field requires fairly accurate camera poses; the SfM method recovers the poses and at the same time yields a sparse point cloud of the scene. Projecting this point cloud into each training view gives sparse depths under the training poses, which are used to supervise the rendered depth:
where D is the set of rays generated by the point-cloud projection points, D(r) is the projected point-cloud depth, and D_fine(r) is the rendered depth. Inverse-depth supervision is used instead of supervising the depth directly, to reduce the influence of outliers on the optimization.
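The loss formulas themselves are shown only in the patent's figures; the sketch below therefore uses the plain forms suggested by the text: a mean squared color error against both the coarse and fine renderings, and an inverse-depth error on the SfM-supervised rays. The relative weight lambda_depth is a placeholder.
```python
import numpy as np

def color_loss(c_true, c_coarse, c_fine):
    """MSE against both the coarse- and fine-stage renderings
    (the exact weighting between the two stages is not reproduced in the text)."""
    return np.mean((c_true - c_coarse) ** 2) + np.mean((c_true - c_fine) ** 2)

def depth_loss(d_sfm, d_fine, eps=1e-6):
    """Inverse-depth supervision on rays through SfM projection points,
    which reduces the influence of far-away outliers (assumed form)."""
    return np.mean((1.0 / (d_sfm + eps) - 1.0 / (d_fine + eps)) ** 2)

def total_loss(c_true, c_coarse, c_fine, d_sfm, d_fine, lambda_depth=1.0):
    """lambda_depth is a placeholder; the detailed description only states
    that the depth term is given a larger weight, not its value."""
    return color_loss(c_true, c_coarse, c_fine) + lambda_depth * depth_loss(d_sfm, d_fine)
```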
In step (5), for the model optimized in step (4), given an arbitrary camera pose, the image at that viewpoint can be rendered, as follows:
An arbitrary viewpoint is generated in the scene according to the distribution of the training views, and cones are generated for all pixels at that viewpoint; for better rendering quality, 49 points are sampled spirally on the surface of each truncated cone instead of the 7 used during training. Because the geometric distribution of the scene has been learned during training, the sampling range can also be compressed from (0, +∞) to a neighborhood of the depth. Finally, the decoder regresses a color for every pixel position, giving the image at that viewpoint.
Compared with the prior art, the advantages of the invention are:
(1) The invention adopts a hybrid scene representation that fuses a 3D hash feature grid, 2D plane feature grids and an MLP; while keeping the parameter count of the algorithm low, it accelerates the training and inference of the model and improves the novel view synthesis quality of the neural radiance field in unbounded scenes.
(2) The invention improves the rendering equation: after the density distribution along a ray is regressed, the sample features on the ray are weighted and summed first and the color is then decoded by the MLP, which significantly reduces the computation required for rendering.
In short, the method is simple in principle and achieves high-quality novel view synthesis in unbounded scenes.
Drawings
FIG. 1 is a flowchart of the new view synthesis method for unbounded scenes based on a hybrid neural radiance field.
Detailed Description
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention; all other embodiments obtained by those skilled in the art from these embodiments without inventive effort fall within the scope of protection of the invention.
As shown in FIG. 1, the new view synthesis method for unbounded scenes based on a hybrid neural radiance field is implemented in the following steps:
Step 1: transform the position of any three-dimensional point in the unbounded space into a bounded spherical region with the coordinate contraction function. On this spherical region, construct a multi-resolution 3D hash feature grid according to the prescribed total parameter budget, so that the scene information is encoded with a small number of parameters:
N_l = N_min · b^l
where N_max is the number of nodes of the highest-resolution grid, N_min the number of nodes of the lowest-resolution grid, L the number of resolution levels, N_l the number of nodes of the l-th level grid, and b the scale ratio between adjacent resolutions.
A 3D hash feature grid can describe the density and color distribution of the scene effectively, but its high-resolution levels are strongly affected by hash collisions, which degrades rendering quality. Unbounded scenes also span a wide range of scales, and a feature grid at a single scale cannot learn all the information well, so 3D and 2D feature grids at different resolutions are constructed to strengthen the representation capability of the neural radiance field. Finally, an MLP is built as the color and volume density decoder; combined with Fourier encoding of the viewing direction, the MLP can regress the view-dependent color at the same spatial position.
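The Fourier encoding of the viewing direction is the standard sinusoidal encoding; a sketch is given below, with the number of frequency bands left as a placeholder since the patent does not state the value it uses.
```python
import numpy as np

def fourier_encode_direction(d, num_freqs=4):
    """Sinusoidal (Fourier) encoding of a unit viewing direction, to be
    concatenated with the feature fed to the color branch of the decoder.
    num_freqs is a placeholder value."""
    d = np.asarray(d, dtype=np.float64)
    freqs = 2.0 ** np.arange(num_freqs)                   # 1, 2, 4, 8, ...
    angles = d[..., None] * freqs * np.pi                 # (..., 3, num_freqs)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return np.concatenate([d, enc.reshape(*d.shape[:-1], -1)], axis=-1)
```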
Step 2: using the camera pose, generate for each pixel a cone that starts at the optical center, passes through the pixel center and extends to infinity. After the total number of samples is specified, sample several truncated cones according to inverse depth, so that the sampling density is higher near the optical center and gradually decreases farther away. For each sampled truncated cone, construct a spiral parametric equation on its lateral surface and sample 7 points uniformly along the spiral to represent the truncated cone region:
where t is the sampling distance, r the radius of the cone on the normalized image plane, n the total number of sample points, m the number of spiral turns, and p the coordinate of a sample point.
Each sample point p is mapped into the bounded spherical space by the coordinate contraction function. Using the transformed coordinates, the eight nearest corner points are queried in the 3D hash feature grid, their feature vectors are indexed through the hash function, and the 3D feature of the sample point is obtained by trilinear interpolation. The sample point is also projected onto the three mutually orthogonal 2D plane feature grids; in each plane the four corner points nearest to the projection and their feature vectors are queried, the feature vector of the projection is obtained by bilinear interpolation, and the feature vectors of the three projections are concatenated to give the 2D feature of the sample point:
feature_3D = tri(f_1, f_2, ..., f_8)
where tri is the trilinear interpolation function, bil is the bilinear interpolation function, f_i are the indexed 3D features and g_i are the indexed 2D features. The 3D and 2D features are concatenated as the hybrid feature representation of the sample point, and the feature vectors of the 7 sample points in each truncated cone are fused by weighting into a single truncated-cone feature vector describing the whole truncated cone region.
Specifically, after the feature vectors of the 7 sample points in each truncated cone are obtained by querying and interpolation, the mean of their spatial coordinates and the distance of each sample point to this mean point are computed; the normalized inverse distances are used as per-point weights, and the seven feature vectors are fused into one truncated-cone feature vector, which describes the whole truncated cone region and takes part in the subsequent rendering and regression.
Step 3: first pass each truncated-cone feature vector on the ray through one fully connected layer; the first dimension of the output is the volume density at the mean point of the truncated cone and the remaining dimensions form the color feature vector. The transmittance at all truncated-cone mean points is computed from the distribution of volume density along the ray, which gives the weight of each truncated cone in the rendering equation:
α_i = 1 − exp(−σ_i δ_i)
λ_i = T_i α_i
where σ_i is the volume density, α_i the occupancy probability, δ_i the interval between adjacent truncated cones, T_i the transmittance, and λ_i the rendering weight. To speed up the optimization of the neural radiance field, the color feature vectors are not fed individually into the MLP to regress colors; instead, all color feature vectors are weighted by the truncated-cone weights along the ray to produce a single feature for the pixel, which is finally fed into the MLP to regress the pixel color. The same weights are applied to the sampling distances t to obtain the depth corresponding to the pixel:
feature_pixel = Σ_i λ_i f_i
rgb = MLP(feature_pixel)
depth = Σ_i λ_i t_i
where f_i is a truncated-cone feature vector, feature_pixel is the pixel feature, rgb is the rendered pixel color and depth is the rendered depth. Only one MLP forward pass is needed to render a ray, which significantly reduces the computation and shortens the training time of the neural radiance field.
Step 4: build the loss function of the algorithm. The first term is the color loss, the mean squared error between the true pixel color and the rendered colors:
where C(r) is the true pixel color, C_coarse(r) the pixel color rendered in the coarse stage, and C_fine(r) the pixel color rendered in the fine stage.
The second term is the depth loss. The camera poses used for training are estimated by SfM, and the sparse point cloud obtained at the same time is treated as a geometric prior of the scene; projecting it into each training view gives sparse depths under the training poses, which are used to supervise the rendered depth:
where D is the set of rays generated by all point-cloud projection points, D(r) is the projected point-cloud depth, and D_fine(r) is the rendered depth. Inverse-depth supervision is used instead of supervising the depth directly, to reduce the influence of outliers on the optimization. Because the point cloud is sparse, only a small fraction of the pixels in each iteration are point-cloud projections, so few samples contribute to the depth loss; since rapid convergence of the scene geometry helps the colors converge correctly, the depth loss is given a larger weight. In addition, so that the radiance field finds the correct direction early in training and converges stably later, a learning rate that varies with the training iteration is used: it is ramped up over the first five thousand iterations and then decays smoothly.
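A sketch of such a schedule is given below; only the 5,000-iteration ramp-up is stated in the text, so the linear warm-up, the cosine-shaped decay and all the rate values are assumptions.
```python
import math

def learning_rate(step, base_lr=1e-2, warmup_steps=5000, total_steps=50000, final_lr=1e-4):
    """Ramp up over the first 5,000 iterations, then decay smoothly.
    base_lr, final_lr, total_steps and the exact ramp/decay shapes are placeholders."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps          # linear warm-up
    progress = min((step - warmup_steps) / max(total_steps - warmup_steps, 1), 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))     # smooth decay toward final_lr
    return final_lr + (base_lr - final_lr) * cosine
```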
Step 5: generate an arbitrary viewpoint within the distribution of the known views, generate cones for all pixels at that viewpoint, and sample 49 points spirally on the surface of each truncated cone for better rendering quality. Because the geometric distribution of the scene has been learned during training, the density distribution can guide the sampling, compressing the sampling range from (0, +∞) to a neighborhood of the depth. Finally, the decoder regresses a color for every pixel position, giving the image at that viewpoint.
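The compression of the per-ray sampling interval can be sketched as follows; the ±25% margin around the predicted depth is an assumed value, since the text only states that sampling is restricted to the vicinity of the rendered depth.
```python
import numpy as np

def depth_guided_bounds(rendered_depth, rel_margin=0.25, t_min=1e-2):
    """Replace the (0, +inf) sampling interval by a neighborhood of the depth
    predicted by a coarse pass (margin and minimum distance are placeholders)."""
    near = np.maximum(rendered_depth * (1.0 - rel_margin), t_min)
    far = rendered_depth * (1.0 + rel_margin)
    return near, far
```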
In summary, for unbounded scenes the invention uses a hybrid representation to improve the training speed and representation accuracy of the neural radiance field and to render high-quality images at new viewpoints.
Details not described in this specification belong to techniques well known to those skilled in the art. The foregoing describes illustrative embodiments to help those skilled in the art understand the invention; the invention is not limited to the scope of these embodiments, and any changes that are apparent to those skilled in the art and fall within the spirit and scope defined by the appended claims are within the protection of the invention.

Claims (6)

1. A new view synthesis method for unbounded scenes based on a hybrid neural radiance field, characterized by comprising the following steps:
(1) Parameterizing the unbounded space into a bounded spherical region, encoding the scene information with a 3D hash feature grid and 2D plane feature grids, and constructing a color and volume density decoder built around an MLP;
(2) For the feature grids built in step (1), obtaining the feature vectors of all sampled truncated cones on a ray through spiral sampling on the cone surface and linear interpolation;
(3) For the sampled truncated-cone feature vectors obtained in step (2), using the volume rendering equation to aggregate a color feature vector and a mean depth for each ray, and decoding the corresponding pixel color and depth, i.e. the rendering result, with a shallow MLP;
(4) For the rendering result of step (3), supervising the color field with the true pixel colors during optimization, and supervising the rendered depth with the sparse point cloud obtained by SfM;
(5) For the model optimized in step (4), given an arbitrary camera pose, rendering the image at that viewpoint.
2. The new view synthesis method for unbounded scenes based on a hybrid neural radiance field according to claim 1, characterized in that step (1) is implemented as follows:
transforming the position of any three-dimensional point in the unbounded space into a bounded spherical region with a coordinate contraction function:
where x is the three-dimensional coordinate of the sample point; the whole unbounded space is parameterized by f(x) into a bounded sphere of radius 2, on which a 3D hash feature grid is constructed:
N_l = N_min · b^l
where N_max is the number of nodes of the highest-resolution grid, N_min the number of nodes of the lowest-resolution grid, L the number of resolution levels, N_l the number of nodes of the l-th level grid, and b the scale ratio between adjacent resolutions;
establishing 2D plane feature grids on mutually orthogonal planes to assist in representing scene detail; and constructing an MLP as the color and volume density decoder, combined with Fourier encoding of the viewing direction, so that the MLP can regress the view-dependent color at the same position in the unbounded space.
3. The new view synthesis method for unbounded scenes based on a hybrid neural radiance field according to claim 1, characterized in that step (2) is implemented as follows:
generating for each pixel, from the camera pose, a cone that starts at the optical center and extends to infinity, and sampling several truncated cones along it according to inverse depth, so that the sampling density is higher near the optical center and gradually decreases farther away; for each sampled truncated cone, constructing a spiral parametric equation on its lateral surface and sampling n points uniformly along the spiral to represent the truncated cone region:
where t is the distance from the optical center along the ray direction, r is the radius of the cone on the normalized image plane, n is the total number of sample points (here n = 7), m is the number of spiral turns, and p is the coordinate of a sample point;
mapping each sample point into the bounded spherical region by the coordinate contraction function, querying the eight nearest corner points in the 3D hash feature grid, indexing the feature vectors of these corners through the hash function, and obtaining the 3D feature of the sample point by trilinear interpolation; for the 2D plane feature grids, projecting the sample point onto three mutually orthogonal planes, querying in each plane the four corner points nearest to the projection and their feature vectors, obtaining the feature vector of the projection by bilinear interpolation, and finally fusing the feature vectors of the three projections to give the 2D feature of the sample point:
feature_3D = tri(f_1, f_2, ..., f_8)
where tri is the trilinear interpolation function, bil is the bilinear interpolation function, f_i are the indexed 3D features and g_i are the indexed 2D features; concatenating the 3D feature with the 2D feature as the feature vector representation of the sample point; after the feature vectors of the 7 sample points in each truncated cone are obtained by querying and interpolation, computing the mean of their spatial coordinates and the distance of each sample point to this mean point, normalizing the inverse distances as per-point weights, and fusing the seven feature vectors into one truncated-cone feature vector that describes the whole truncated cone region.
4. The new view synthesis method for unbounded scenes based on a hybrid neural radiance field according to claim 1, characterized in that step (3) is implemented as follows:
first passing each truncated-cone feature vector obtained in step (2) through one fully connected layer to obtain the volume density and the color feature vector at the mean point of the truncated cone, computing the transmittance at all truncated-cone mean points from the distribution of volume density along the ray, and thereby obtaining the weight of each truncated cone in the rendering equation:
α_i = 1 − exp(−σ_i δ_i)
λ_i = T_i α_i
where σ_i is the volume density, α_i the occupancy probability, δ_i the interval between adjacent truncated cones, T_i the transmittance, and λ_i the rendering weight; weighting all color feature vectors with these weights to produce a single feature for the pixel, finally feeding this feature into the MLP to regress the pixel color, and applying the same weights to the sampling distances to obtain the depth corresponding to the pixel:
feature_pixel = Σ_i λ_i f_i
rgb = MLP(feature_pixel)
depth = Σ_i λ_i t_i
where f_i is a truncated-cone feature vector, feature_pixel is the pixel feature, rgb is the rendered pixel color and depth is the rendered depth.
5. The new view synthesis method for unbounded scenes based on a hybrid neural radiance field according to claim 1, characterized in that step (4) is implemented as follows:
constructing the loss function of the algorithm, whose first term is the color loss: the algorithm uses a coarse-to-fine optimization in which pixel colors are rendered in two stages, and the mean squared error is taken between the true pixel color and both rendered colors:
where C(r) is the true pixel color, C_coarse(r) the pixel color rendered in the coarse stage, and C_fine(r) the pixel color rendered in the fine stage;
the second term is the depth loss: the SfM method recovers the camera poses and at the same time yields the sparse point cloud of the scene, and projecting this point cloud into each view gives the sparse depths under the poses, which are used to supervise the rendered depth:
where D is the set of rays generated by all point-cloud projection points, D(r) is the projected point-cloud depth, and D_fine(r) is the rendered depth.
6. The new view synthesis method for unbounded scenes based on a hybrid neural radiance field according to claim 1, characterized in that step (5) is implemented as follows:
generating an arbitrary viewpoint in the scene, generating cones for all pixel positions at that viewpoint, sampling 49 points spirally on the surface of each truncated cone, guiding the sampling with the depth rendered during optimization, and finally letting the decoder regress a color for every pixel position to obtain the image at that viewpoint.
CN202311018456.6A 2023-08-14 2023-08-14 New view synthesis method for unbounded scenes based on a hybrid neural radiance field Pending CN116977536A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311018456.6A CN116977536A (en) 2023-08-14 2023-08-14 New view synthesis method for unbounded scenes based on a hybrid neural radiance field

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311018456.6A CN116977536A (en) 2023-08-14 2023-08-14 New view synthesis method for unbounded scenes based on a hybrid neural radiance field

Publications (1)

Publication Number Publication Date
CN116977536A true CN116977536A (en) 2023-10-31

Family

ID=88484964

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311018456.6A Pending CN116977536A (en) New view synthesis method for unbounded scenes based on a hybrid neural radiance field

Country Status (1)

Country Link
CN (1) CN116977536A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117333609A (en) * 2023-12-01 2024-01-02 北京渲光科技有限公司 Image rendering method, network training method, device and medium
CN117333609B (en) * 2023-12-01 2024-02-09 北京渲光科技有限公司 Image rendering method, network training method, device and medium
CN117611492A (en) * 2023-12-06 2024-02-27 电子科技大学 Implicit expression and sharpening method for multispectral satellite remote sensing image
CN117611492B (en) * 2023-12-06 2024-06-04 电子科技大学 Implicit expression and sharpening method for multispectral satellite remote sensing image
CN117934728A (en) * 2024-03-21 2024-04-26 海纳云物联科技有限公司 Three-dimensional reconstruction method, device, equipment and storage medium
CN117953137A (en) * 2024-03-27 2024-04-30 哈尔滨工业大学(威海) Human body re-illumination method based on dynamic surface reflection field
CN118135122A (en) * 2024-05-06 2024-06-04 浙江大学 Unbounded scene reconstruction and new view angle synthesis method and system based on 3DGS

Similar Documents

Publication Publication Date Title
CN116977536A (en) New view synthesis method for unbounded scenes based on a hybrid neural radiance field
Gadelha et al. 3d shape induction from 2d views of multiple objects
Singer et al. Text-to-4d dynamic scene generation
CN109410307B (en) Scene point cloud semantic segmentation method
CN108921926B (en) End-to-end three-dimensional face reconstruction method based on single image
Lazova et al. Control-nerf: Editable feature volumes for scene rendering and manipulation
CN110390638B (en) High-resolution three-dimensional voxel model reconstruction method
CN108876814B (en) Method for generating attitude flow image
Genova et al. Deep structured implicit functions
CN116385667B (en) Reconstruction method of three-dimensional model, training method and device of texture reconstruction model
CN116416376A (en) Three-dimensional hair reconstruction method, system, electronic equipment and storage medium
CN115205463A (en) New visual angle image generation method, device and equipment based on multi-spherical scene expression
CN117173315A (en) Neural radiation field-based unbounded scene real-time rendering method, system and equipment
CN117953180A (en) Text-to-three-dimensional object generation method based on dual-mode latent variable diffusion
Zhu et al. Rhino: Regularizing the hash-based implicit neural representation
CN117252987B (en) Dynamic scene reconstruction method based on explicit and implicit hybrid coding
CN116134491A (en) Multi-view neuro-human prediction using implicit differentiable renderers for facial expression, body posture morphology, and clothing performance capture
Zhao et al. FlexiDreamer: Single Image-to-3D Generation with FlexiCubes
CN117372644A (en) Three-dimensional content generation method based on period implicit representation
CN113808006B (en) Method and device for reconstructing three-dimensional grid model based on two-dimensional image
CN115482368A (en) Method for editing three-dimensional scene by utilizing semantic graph
CN117083638A (en) Accelerating neural radiation field for view synthesis
Zhao et al. Challenges and Opportunities in 3D Content Generation
Li et al. Guiding 3D Digital Content Generation with Pre-Trained Diffusion Models.
Sabae et al. NoPose-NeuS: Jointly Optimizing Camera Poses with Neural Implicit Surfaces for Multi-view Reconstruction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination