CN116168137B - Novel view synthesis method, device and storage medium based on neural radiance field - Google Patents


Info

Publication number
CN116168137B
CN116168137B (application CN202310433953.6A)
Authority
CN
China
Prior art keywords
mesh
network
pose
density
training
Prior art date
Legal status
Active
Application number
CN202310433953.6A
Other languages
Chinese (zh)
Other versions
CN116168137A (en)
Inventor
邓正秋
徐振语
Current Assignee
Hunan Malanshan Video Advanced Technology Research Institute Co ltd
Original Assignee
Hunan Malanshan Video Advanced Technology Research Institute Co ltd
Priority date
Filing date
Publication date
Application filed by Hunan Malanshan Video Advanced Technology Research Institute Co., Ltd.
Priority to CN202310433953.6A
Publication of CN116168137A
Application granted
Publication of CN116168137B
Status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 - 3D [Three Dimensional] image rendering
    • G06T15/005 - General purpose rendering architectures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 - Information retrieval of still image data
    • G06F16/58 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 - Retrieval characterised by using metadata automatically derived from the content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 - Manipulating 3D models or images for computer graphics
    • G06T19/20 - Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems


Abstract

The invention provides a novel view synthesis method, device and storage medium based on a neural radiance field (NeRF). A deformation network is introduced into the NeRF pipeline, enabling effective deformation of an object's implicit field and synthesis of novel views. Initialization from a similar mesh can be exploited, which avoids the common failure mode in which NeRF initializes to an all-white result; at the same time, the rapid learning of the density field greatly shortens overall training time, yielding a faster and more efficient three-dimensional reconstruction technique. Unlike most NeRF works, which train the density network and the color network simultaneously, the invention adopts an alternating training strategy that optimizes the color network and the density network separately at certain iteration counts; this performs better than training both networks at once, reduces the number of parameters trained at a time, and lets the two networks complement and correct each other.

Description

Novel view synthesis method, device and storage medium based on neural radiance field
Technical Field
The invention relates to the technical field of image processing, and in particular to a novel view synthesis method, device and storage medium based on a neural radiance field.
Background
Reconstructing and re-rendering 3D scenes from a set of 2D images has long been a central problem in computer vision and computer graphics, with wide application in AR/VR. With the continuing development of deep learning and neural networks, Reference 1 proposed the neural radiance field (NeRF) technique for view synthesis; this work and its follow-ups have attracted the attention of many researchers. These works represent a three-dimensional scene implicitly with a multi-layer perceptron and use volume rendering to achieve novel view synthesis with high-quality, photorealistic results. However, most of them require a large number of input pictures from different view angles and a long training process to obtain a high-quality NeRF scene; this huge cost limits and hinders the wide application of such methods. A technique is therefore needed that can synthesize novel views from only a small number of input images taken from different view angles.
Some related works address this problem from other directions, for example:
1) Methods with a small number of input views: Reference 2 proposes MVSNeRF, a new neural rendering method that efficiently reconstructs a neural radiance field for view synthesis. This work trains a generic deep neural network through which the radiance field can be reconstructed from three nearby input views by fast network inference. Specifically, it performs geometry-aware scene reasoning with a plane-swept cost volume (widely used in multi-view stereo) and combines it with physically based volume rendering to reconstruct the neural radiance field. Its drawback is that the synthesized novel views are confined to the neighborhood of the three adjacent input views, so it cannot synthesize new images over a full 360-degree range from only about 10 input images.
2) Methods using additional geometric information about the object: Reference 3 proposes DS-NeRF (depth-supervised NeRF), which trains the neural radiance field with a depth-supervised loss. Current NeRF-style work requires input images with known camera poses, typically estimated by structure from motion (SfM); SfM can also produce sparse 3D points that serve as depth supervision during training, enabling novel view synthesis from few views through geometric constraints. However, this only works for objects or scenes with rich texture: for simply textured objects, matching feature points are hard to find in the sparse images, which prevents the generation of the sparse 3D points.
3) Methods targeting real objects: most current methods are mainly suited to learning and rendering objects from synthetic datasets. Reference 4 proposes NeRS, an implicit model based on surface rendering that learns a neural shape representation of a closed surface (topologically a sphere) starting from an initial mesh (a car or a cuboid), guaranteeing a watertight reconstruction. Its drawbacks are that it is limited to real objects shaped roughly like a cuboid or a car, the input images must be captured at correspondingly fixed angles, and the rendered images do not look as realistic as those generated by volume rendering.
However, NeRF-based view synthesis methods still have problems, such as the difficulty of acquiring poses for training datasets and real data, and the tendency of NeRF to produce an all-white result at initialization.
Reference 1: B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, R. Ng. "NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis." European Conference on Computer Vision, 2020.
Reference 2: Chen, A., et al. "MVSNeRF: Fast Generalizable Radiance Field Reconstruction from Multi-View Stereo." 2021.
Reference 3: Deng, K., et al. "Depth-supervised NeRF: Fewer Views and Faster Training for Free." 2021.
Reference 4: Zhang, Jason, et al. "NeRS: Neural Reflectance Surfaces for Sparse-view 3D Reconstruction in the Wild." Advances in Neural Information Processing Systems (2021): 29835-29847.
Disclosure of Invention
Aiming at the above technical problems in the related art, the invention provides a novel view synthesis method based on a neural radiance field, comprising the following steps:
S1, acquire input images to be trained and, for each input image, retrieve from a triangular-patch mesh library the pose of the most similar rendered mesh image, which serves as the initial pose of the input image;
S2, feed the input images to be trained and their initial poses into NeRF for training; in the first round of iterations, the mesh is used as the ground-truth label for supervision so that the density network first learns an initial geometric shape, and the loss function is the cross entropy between the output of the density network and the ground-truth density label;
S3, in the second round of iterations, introduce a deformation network into the density network of NeRF; the deformation network consists of a deformation part and a correction part, which output the deformation and the correction value of a 3D point, respectively. The input point coordinates plus the deformation are fed into the density network for training to obtain a coarse density, and the correction value is then added to the coarse density to obtain the learned accurate density; the loss function is the reprojection error between the rendered image and the input image;
S4, in the third round of iterations, optimize the input poses of the training images while continuing to train the density network: the camera rotation angle R and translation distance T are parameterized and converted into a transformation matrix that multiplies the initial pose matrix to obtain the optimized pose;
S5, in the fourth round of iterations, fix the optimized poses and train the density network and the color network of NeRF.
Specifically, step S1 is as follows: each input image to be trained is looked up in the triangular-patch mesh library using the mesh information, and the mesh and pose most similar to the training object are quickly retrieved by computing and comparing IoU scores.
Specifically, if the retrieval results map the input images to different similar meshes, a voting scheme is adopted: the mesh with the highest total score is selected as the initial mesh, and the pose of the library image with the highest IoU for that mesh is taken as the initial pose of the input image.
Specifically, step S5 is as follows: in the fourth round of iterations, the optimized poses are fixed; the density network of NeRF is first fixed while its color network is trained, and after the error converges the color network is fixed while the density network is trained, alternating in this way until the error converges.
Specifically, in step S2 the ground-truth density of a point is obtained from the mesh as follows: the obj file is converted into a single-layer, hole-free mesh representation; with this mesh, the occupancy of any 3D point can be computed. A ray is cast from the current point X and the number of intersections between the ray and the mesh is counted: if the number of intersections is even, the point lies outside the mesh, otherwise it lies inside. The occupancy of points inside the mesh is set to 1 and that of points outside is set to 0.
In a second aspect, another embodiment of the invention discloses a novel view synthesis device based on a neural radiance field, comprising the following units:
an initial mesh and pose acquisition unit, configured to acquire input images to be trained and, for each input image, retrieve from a triangular-patch mesh library the pose of the most similar rendered mesh image as the initial pose of the input image;
a first-round iterative training unit, configured to feed the input images to be trained and their initial poses into NeRF for training; in the first round of iterations, the mesh is used as the ground-truth label for supervision so that the density network first learns an initial geometric shape, and the loss function is the cross entropy between the output of the density network and the ground-truth density label;
a second-round iterative training unit, configured to introduce a deformation network into the density network of NeRF in the second round of iterations; the deformation network consists of a deformation part and a correction part, which output the deformation and the correction value of a 3D point, respectively. The input point coordinates plus the deformation are fed into the density network for training to obtain a coarse density, and the correction value is then added to obtain the learned accurate density; the loss function is the reprojection error between the rendered image and the input image;
a third-round iterative training unit, configured to optimize the input poses of the training images while continuing to train the density network in the third round of iterations: the camera rotation angle R and translation distance T are parameterized and converted into a transformation matrix that multiplies the initial pose matrix to obtain the optimized pose;
a fourth-round iterative training unit, configured to fix the optimized poses and train the density network and the color network of NeRF in the fourth round of iterations.
Specifically, the initial mesh and pose acquisition unit operates as follows: each input image to be trained is looked up in the triangular-patch mesh library using the mesh information, and the mesh and pose most similar to the training object are quickly retrieved by computing and comparing IoU scores; if the retrieval results map the input images to different similar meshes, a voting scheme is adopted, the mesh with the highest total score is selected as the initial mesh, and the pose of the library image with the highest IoU for that mesh is taken as the initial pose of the input image.
Specifically, the fourth-round iterative training unit operates as follows: in the fourth round of iterations, the optimized poses are fixed; the density network of NeRF is first fixed while its color network is trained, and after the error converges the color network is fixed while the density network is trained, alternating in this way until the error converges.
Specifically, in the first-round iterative training unit the ground-truth density of a point is obtained from the mesh as follows: the obj file is converted into a single-layer, hole-free mesh representation; with this mesh, the occupancy of any 3D point can be computed. A ray is cast from the current point X and the number of intersections between the ray and the mesh is counted: if the number of intersections is even, the point lies outside the mesh, otherwise it lies inside. The occupancy of points inside the mesh is set to 1 and that of points outside is set to 0.
In a third aspect, another embodiment of the invention discloses a non-volatile storage medium storing instructions which, when executed by a processor, implement the novel view synthesis method based on a neural radiance field described above.
According to the novel view synthesis method based on a neural radiance field, an input image is first matched against a database of about 30 meshes containing nearly 100 images rendered from each mesh at different distances and view angles, and the pose of the mesh image most similar to the input image is taken as the initial pose. On the one hand, by introducing a similar mesh as the initialization, only a small number (about 10) of images are required as input when using the technique, without any additional input information. On the other hand, the input images are not required to carry accurate poses: the invention builds a mini database from which a mesh and view angle similar to the input image are retrieved, so no other cumbersome pose-estimation procedure is needed.
A deformation network is introduced into the NeRF pipeline, enabling effective deformation of the object's implicit field and synthesis of novel views; initialization from a similar mesh avoids the common failure mode of initializing to an all-white result, and the rapid learning of the density field greatly shortens overall training time, yielding a faster and more efficient three-dimensional reconstruction technique.
Further, unlike most NeRF works, which train the density network and the color network simultaneously, the invention adopts an alternating training strategy that optimizes the color network and the density network separately at certain iteration counts; this performs better than training both networks at once, reduces the number of parameters trained at a time, and lets the two networks complement and correct each other.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. It is apparent that the drawings described below show only some embodiments of the invention; a person skilled in the art may derive other drawings from them without inventive effort.
FIG. 1 is a flow chart of the novel view synthesis method based on a neural radiance field provided by an embodiment of the invention;
FIG. 2 is an overall framework diagram of the novel view synthesis method based on a neural radiance field provided by an embodiment of the invention;
FIG. 3 is a schematic diagram of the novel view synthesis device based on a neural radiance field according to an embodiment of the invention;
FIG. 4 is a schematic diagram of the novel view synthesis apparatus based on a neural radiance field according to an embodiment of the invention.
Detailed Description
The following describes the embodiments of the invention clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, embodiments of the invention. All other embodiments derived by a person skilled in the art from the embodiments of the invention without inventive effort fall within the scope of protection of the invention.
Example 1
Referring to FIG. 1 and FIG. 2, this embodiment discloses a novel view synthesis method based on a neural radiance field, comprising the following steps:
S1, acquire input images to be trained and, for each input image, retrieve from a triangular-patch mesh library the pose of the most similar rendered mesh image, which serves as the initial pose of the input image;
Considering that the poses of real image data are difficult to acquire, this embodiment first captures images with the relevant equipment and then builds a triangular-patch mesh library for the captured images. The library contains about 30 meshes of common shapes and nearly 100 images rendered from each mesh at different distances and view angles.
Specifically, about 30 meshes of common shapes may be selected from the ShapeNet dataset to form a mini database containing nearly 100 images rendered from each mesh at different distances and view angles.
Specifically, step S1 is as follows: each input image to be trained is looked up in the triangular-patch mesh library using the mesh information, and the mesh and pose most similar to the training object are quickly retrieved by computing and comparing IoU scores. If the retrieval results map the input images to different similar meshes, a voting scheme is adopted: the mesh with the highest total score is selected as the initial mesh, and the pose of the library image with the highest IoU for that mesh is taken as the initial pose of the input image.
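The retrieval-and-voting step above can be sketched in a few lines of Python. This is an illustrative sketch only: the mask representation (silhouettes as sets of pixel coordinates) and the library layout are assumptions, not data structures specified by the patent.

```python
def mask_iou(a, b):
    """IoU of two binary silhouette masks given as sets of pixel coordinates."""
    inter = len(a & b)
    union = len(a | b)
    return inter / union if union else 0.0

def select_initial_mesh(input_masks, library):
    """library: {mesh_id: [(pose_id, mask), ...]} of pre-rendered mesh views.
    Returns (best_mesh_id, {input_index: pose_id}) chosen by IoU voting."""
    votes = {}           # mesh_id -> cumulative best-IoU score
    best_per_input = {}  # input_index -> (mesh_id, pose_id, iou)
    for i, mask in enumerate(input_masks):
        for mesh_id, views in library.items():
            for pose_id, ref in views:
                s = mask_iou(mask, ref)
                if i not in best_per_input or s > best_per_input[i][2]:
                    best_per_input[i] = (mesh_id, pose_id, s)
        m, _, s = best_per_input[i]
        votes[m] = votes.get(m, 0.0) + s
    best_mesh = max(votes, key=votes.get)   # mesh with the highest total score
    poses = {i: p for i, (m, p, _) in best_per_input.items() if m == best_mesh}
    return best_mesh, poses
```

Inputs matched to different meshes thus still receive a single consistent initial mesh, while each keeps the pose of its own highest-IoU library image.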
S2, feed the input images to be trained and their initial poses into NeRF for training; in the first round of iterations, the mesh is used as the ground-truth label for supervision so that the density network first learns an initial geometric shape, and the loss function is the cross entropy between the output of the density network and the ground-truth density label;
In the first 5,000 iterations, 3D point coordinates are fed in and the similar mesh serves as the ground-truth label for supervision, so that the density network learns an initial geometric shape; the loss function is the cross entropy between the output of the density network and the ground-truth density label.
Regarding how the ground-truth density label of a point is obtained from the mesh: since most current CAD model files are in obj format, the obj file is first converted into a single-layer, hole-free mesh representation; this embodiment uses the TSDF method. With the single-layer, hole-free mesh, the occupancy O of any 3D point X is computed as below, where 1 indicates that the point belongs to the object and 0 indicates that it does not:
O(X): R^3 → [0, 1]    (1)
The computation proceeds as follows: a ray is cast from the current point X and the number of intersections between the ray and the mesh is counted; if the number of intersections is even, the point lies outside the mesh, otherwise it lies inside. The occupancy of points inside the mesh is set to 1 and that of points outside is set to 0. At the same time, the output of the density network is normalized to a value between 0 and 1.
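The parity test above can be sketched as follows. The ray/triangle routine is the standard Moeller-Trumbore algorithm; the mesh layout (a list of vertex triples) and the fixed ray direction are assumptions for illustration, not requirements of the patent.

```python
import numpy as np

def ray_hits_triangle(orig, direc, v0, v1, v2, eps=1e-9):
    """Moeller-Trumbore ray/triangle intersection; True if the ray hits."""
    e1, e2 = v1 - v0, v2 - v0
    p = np.cross(direc, e2)
    det = np.dot(e1, p)
    if abs(det) < eps:            # ray parallel to the triangle plane
        return False
    inv = 1.0 / det
    s = orig - v0
    u = np.dot(s, p) * inv
    if u < 0 or u > 1:
        return False
    q = np.cross(s, e1)
    v = np.dot(direc, q) * inv
    if v < 0 or u + v > 1:
        return False
    return np.dot(e2, q) * inv > eps   # intersection in front of the origin

def occupancy(point, triangles, direc=(1.0, 0.0, 0.0)):
    """O(X): 1 if X lies inside the watertight mesh, else 0, by the parity
    of ray/mesh intersections (even -> outside, odd -> inside)."""
    point = np.asarray(point, float)
    direc = np.asarray(direc, float)
    hits = sum(ray_hits_triangle(point, direc, *map(np.asarray, tri))
               for tri in triangles)
    return 1 if hits % 2 == 1 else 0
```

For robustness in practice the ray direction should avoid grazing mesh edges, since an edge hit can be counted twice.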
S3, in the second round of iterations, introduce a deformation network into the density network of NeRF; the deformation network consists of a deformation part and a correction part, which output the deformation and the correction value of a 3D point, respectively. The input point coordinates plus the deformation are fed into the density network for training to obtain a coarse density, and the correction value is then added to the coarse density to obtain the learned accurate density; the loss function is the reprojection error between the rendered image and the input image;
After 5,000 iterations, a deformation network is introduced into the density network. It consists of a deformation part and a correction part, which output the deformation and the correction value of a 3D point, respectively. The input point coordinates plus the deformation are fed into the density network for training to obtain a coarse density, and the correction value is then added to obtain the learned accurate density; the point coordinates are also fed in again at the end of the density network to provide additional positional information. From iteration 5,000 to 7,500 the density network, now including the deformation network (unlike the first 5,000 iterations), continues to be trained; the loss function is the reprojection error between the rendered image and the input image, and the deformation and correction value are regularized to keep the deformation as small as possible.
S4, in the third round of iterations, optimize the input poses of the training images while continuing to train the density network: the camera rotation angle R and translation distance T are parameterized and converted into a transformation matrix that multiplies the initial pose matrix to obtain the optimized pose;
From iteration 7,500 to 10,000, the density network continues to be trained while the input poses of the training images are optimized: the camera rotation angle R and translation distance T are parameterized and converted into a transformation matrix that multiplies the initial pose matrix to obtain the optimized pose.
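One way to realize this R/T parameterization is an axis-angle (Rodrigues) rotation plus a translation, assembled into a 4x4 transform that left-multiplies the initial pose. The patent does not specify the exact parameterization, so this is a sketch under that assumption.

```python
import numpy as np

def rodrigues(r):
    """Axis-angle vector r (shape (3,)) -> 3x3 rotation matrix."""
    theta = np.linalg.norm(r)
    if theta < 1e-12:
        return np.eye(3)            # no rotation
    k = r / theta                   # unit rotation axis
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def refine_pose(initial_pose, r, t):
    """Apply the learned correction (r: rotation parameters, t: translation)
    to a 4x4 initial camera pose by left-multiplying a transform matrix."""
    delta = np.eye(4)
    delta[:3, :3] = rodrigues(np.asarray(r, float))
    delta[:3, 3] = t
    return delta @ initial_pose
```

During training, r and t would be per-image optimizable parameters initialized to zero so the refined pose starts at the retrieved initial pose.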
S5, in the fourth round of iterations, fix the optimized poses and train the density network and the color network of NeRF.
Specifically, in this embodiment the density network is fixed first and the color network is trained; after the error converges, the color network is fixed and the density network is trained, alternating in this way until the error converges.
After 10,000 iterations of training, the poses are fixed and the color network and the density network are trained. The density network outputs a feature vector into the color network so that the color network can exploit the related density information; in addition, the viewing direction is fed into the color network. Because the two networks have a chicken-and-egg relationship and are difficult to train simultaneously, the density network is fixed first while the color network is trained; after the error converges, the color network is fixed and the density network is trained, alternating until the error converges. Since this procedure monotonically reduces the error, convergence of the optimization is ensured. The loss function here remains the same as that used from iteration 5,000 to 7,500.
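The alternating schedule can be written as a small driver loop. The phase callbacks below are placeholders for the actual NeRF training steps (each trains one network with the other frozen and returns the error afterwards); the convergence threshold and round cap are illustrative assumptions.

```python
def alternate_training(color_phase, density_phase, max_rounds=10, tol=1e-4):
    """Alternating optimization: color_phase() trains the color network with
    the density network frozen, density_phase() does the reverse; alternate
    until the error stops improving between full rounds."""
    prev = float("inf")
    for _ in range(max_rounds):
        err = color_phase()      # density network frozen
        err = density_phase()    # color network frozen
        if prev - err < tol:     # no further improvement: converged
            break
        prev = err
    return err
```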
After training, the test procedure is relatively simple: a new view angle is specified by interpolation, and the image at that view angle is rendered by volume rendering.
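The volume rendering used here is the standard NeRF quadrature, C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i with transmittance T_i = exp(-sum_{j<i} sigma_j * delta_j). A sketch that renders one ray from precomputed network samples (sample values are assumed given):

```python
import numpy as np

def render_ray(sigmas, colors, deltas):
    """NeRF volume-rendering quadrature along one ray.
    sigmas: (N,) densities; colors: (N, 3) RGB; deltas: (N,) step sizes."""
    sigmas = np.asarray(sigmas, float)
    deltas = np.asarray(deltas, float)
    colors = np.asarray(colors, float)
    alpha = 1.0 - np.exp(-sigmas * deltas)   # per-sample opacity
    # transmittance: probability the ray reaches each sample unoccluded
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = trans * alpha                  # contribution of each sample
    return (weights[:, None] * colors).sum(axis=0)
```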
According to the novel view synthesis method based on a neural radiance field of this embodiment, an input image is matched against a database of about 30 meshes containing nearly 100 images rendered from each mesh at different distances and view angles, and the pose of the mesh image most similar to the input image is taken as the initial pose. On the one hand, by introducing a similar mesh as the initialization, only a small number (about 10) of images are required as input, without any additional input information. On the other hand, the input images are not required to carry accurate poses: this embodiment builds a mini database from which a mesh and view angle similar to the input image are retrieved, so no other cumbersome pose-estimation procedure is needed.
A deformation network is introduced into the NeRF pipeline, enabling effective deformation of the object's implicit field and synthesis of novel views; initialization from a similar mesh avoids the common failure mode of initializing to an all-white result, and the rapid learning of the density field greatly shortens overall training time, yielding a faster and more efficient three-dimensional reconstruction technique.
Further, unlike most NeRF works, which train the density network and the color network simultaneously, this embodiment adopts an alternating training strategy that optimizes the color network and the density network separately at certain iteration counts; this performs better than training both networks at once, reduces the number of parameters trained at a time, and lets the two networks complement and correct each other.
Example two
Referring to fig. 3, the present embodiment discloses a new view angle synthesizing device based on nerve radiation, which includes the following units:
the device comprises an initial mesh and pose acquisition unit, a pose acquisition unit and a pose processing unit, wherein the initial mesh and pose acquisition unit is used for acquiring an input image to be trained, and acquiring an initial pose of a mesh image corresponding to the image from a triangular surface patch library according to the input image as an initial pose of the input image;
Considering that accurate poses for real image data are difficult to obtain, this embodiment first captures images with the relevant equipment and then builds a triangular-patch mesh library for the captured images. The library contains 30 meshes of common shapes and, for each mesh, nearly 100 images rendered at different distances and different viewing angles.
Specifically, about 30 meshes of common shapes can be selected from the ShapeNet dataset to form a mini database, containing for each mesh nearly 100 images rendered at different distances and different viewing angles.
Specifically, the initial mesh and pose acquisition unit operates as follows: each input image to be trained is looked up in the triangular-patch mesh library, and the mesh and pose most similar to the training object are found quickly by computing and comparing IoU scores. If the retrieval results show that different input images correspond to different similar meshes, a voting scheme is adopted: the mesh with the highest score is selected as the initial mesh, and the pose of that mesh's image with the highest IoU value is taken as the initial pose of the input image.
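The retrieval-and-voting logic can be sketched as follows (illustrative Python; the database layout and the silhouette-mask IoU are assumptions, since the patent does not specify how the IoU is computed):

```python
import numpy as np
from collections import Counter

def iou(mask_a, mask_b):
    """Intersection-over-union of two boolean silhouette masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union > 0 else 0.0

def retrieve_initial_pose(input_masks, database):
    """database: list of (mesh_id, pose, rendered_mask) entries.
    Returns (mesh_id, pose): per-image best IoU, then a vote across images."""
    per_image_best = []
    for mask in input_masks:
        scores = [(iou(mask, entry_mask), mesh_id, pose)
                  for mesh_id, pose, entry_mask in database]
        per_image_best.append(max(scores))        # (score, mesh_id, pose)
    # Vote: the mesh chosen by the most input images wins.
    votes = Counter(mesh_id for _, mesh_id, _ in per_image_best)
    best_mesh, _ = votes.most_common(1)[0]
    # Initial pose: the highest-IoU hit among the winning mesh's matches.
    best = max(p for p in per_image_best if p[1] == best_mesh)
    return best_mesh, best[2]
```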
a first-round iterative training unit, configured to input the input image to be trained and the initial pose into NeRF for training; in the first round of iterations, the mesh is used as the ground-truth label for supervision, so that the density network first learns an initial geometric shape, and the loss function is the cross entropy between the density network's output and the ground-truth density label;
the prior 5000 iterations are input into 3D point coordinates, and the similar mesh is used as a correct label for supervision, so that the density network learns an initial geometric shape, and the loss function obtains cross entropy of the output value of the density network and the correct label value of the density.
Regarding how the ground-truth density labels of points are obtained from the mesh: since most current CAD model files are in obj format, the obj file is first converted into a single-layer, hole-free (watertight) mesh representation; this embodiment uses the TSDF method. After the single-layer hole-free mesh is obtained, the occupancy O of any 3D point X is computed as below, where 1 indicates that the 3D point belongs to the object and 0 that it does not:
O(X) : R^3 → [0, 1]    (1)
The computation proceeds as follows: a ray is cast from the current point X, and the number of intersection points between the ray and the mesh is counted. If the ray intersects the mesh an even number of times, the point is judged to lie outside the mesh; otherwise it lies inside. The occupancy of points inside the mesh is set to 1 and that of points outside to 0. At the same time, the output of the density network is normalized to a value between 0 and 1.
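The parity test can be illustrated as follows (a minimal NumPy sketch using the Möller–Trumbore ray-triangle intersection on a toy watertight mesh; in practice a mesh-library routine on the TSDF-converted mesh would be used):

```python
import numpy as np

def ray_hits_triangle(origin, direction, tri, eps=1e-9):
    """Möller–Trumbore: does the ray origin + t*direction (t > 0) hit tri?"""
    v0, v1, v2 = tri
    e1, e2 = v1 - v0, v2 - v0
    h = np.cross(direction, e2)
    a = np.dot(e1, h)
    if abs(a) < eps:                 # ray parallel to the triangle's plane
        return False
    f = 1.0 / a
    s = origin - v0
    u = f * np.dot(s, h)
    if u < 0.0 or u > 1.0:
        return False
    q = np.cross(s, e1)
    v = f * np.dot(direction, q)
    if v < 0.0 or u + v > 1.0:
        return False
    return f * np.dot(e2, q) > eps   # intersection must be in front of origin

def occupancy(point, triangles, direction=np.array([1.0, 0.3, 0.7])):
    """O(X) in {0, 1}: cast one ray and count mesh crossings;
    an odd count means the point lies inside the watertight mesh."""
    hits = sum(ray_hits_triangle(point, direction, tri) for tri in triangles)
    return 1 if hits % 2 == 1 else 0
```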
a second-round iterative training unit, configured to introduce a deformation network into the NeRF density network in the second round of iterations. The deformation network consists of a deformation part and a correction part, which output the deformation and the correction value of a 3D point respectively; the input point's coordinates plus the deformation are fed into the density network for training to obtain a coarse density, and the correction value is then added to the coarse density to obtain the learned accurate density. The loss function is the reprojection error between the rendered image and the input image;
After 5000 iterations, a deformation network is introduced into the density network. This network consists of a deformation part and a correction part, outputting the deformation and the correction value of 3D points respectively; the input point's coordinates plus the deformation are fed into the density network for training to obtain a coarse density, and the correction value is then added to obtain the learned accurate density. The point coordinates are also re-injected at the end of the density network to provide additional positional information. Then, from iteration 5000 to 7500, the density network (now including the deformation network) continues to be learned; unlike the first 5000 iterations, the loss function is the reprojection error between the rendered image and the input image, and the deformation and correction value are regularized to keep the deformation as small as possible.
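The deformation-plus-correction flow can be sketched as follows (a minimal NumPy stand-in with illustrative layer sizes; the real networks are larger, trained by backpropagation, and the re-injection of point coordinates at the end of the density network is omitted here):

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(weights, x):
    """Tiny ReLU MLP used as a stand-in for the real networks."""
    for W, b in weights[:-1]:
        x = np.maximum(x @ W + b, 0.0)
    W, b = weights[-1]
    return x @ W + b

def init(sizes):
    return [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

# Illustrative sizes: the deformation net outputs a 3-D offset plus a
# 1-D correction; the density net maps a 3-D point to a scalar density.
deform_net = init([3, 16, 4])
density_net = init([3, 16, 1])

def predict_density(x):
    out = mlp(deform_net, x)
    delta, correction = out[:, :3], out[:, 3:]
    coarse = mlp(density_net, x + delta)   # deformed point into density net
    return coarse + correction             # corrected, "accurate" density
```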
a third-round iterative training unit, configured to, in the third round of iterations, optimize the input pose of the training images while continuing to learn the density network: the camera's rotation angle R and translation distance T are parameterized and converted into a transformation matrix, which multiplies the initial-pose matrix to obtain the optimized pose;
From iteration 7500 to 10,000, the density network continues to be learned while the input pose of the training images is optimized: the camera's rotation angle R and translation distance T are parameterized and converted into a transformation matrix that multiplies the initial-pose matrix to obtain the optimized pose.
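The pose-refinement step can be sketched as follows (illustrative NumPy; the rotation is parameterized here as an axis-angle vector, one common choice — the patent does not specify the exact parameterization):

```python
import numpy as np

def rodrigues(rvec):
    """Axis-angle vector -> 3x3 rotation matrix (Rodrigues' formula)."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def refine_pose(initial_pose, rvec, tvec):
    """Build a 4x4 transform from the learnable (R, T) parameters and
    multiply the initial-pose matrix to obtain the optimized pose."""
    delta = np.eye(4)
    delta[:3, :3] = rodrigues(np.asarray(rvec, float))
    delta[:3, 3] = tvec
    return delta @ initial_pose
```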
a fourth-round iterative training unit, configured to fix the optimized pose and train the density network and color network of NeRF in the fourth round of iterations.
Specifically, in this embodiment, the density network is fixed first and the color network is trained; after the error converges, the color network is fixed and the density network is trained, alternating in this way until the error converges.
After 10,000 iterations the pose is fixed, and the color network and density network begin to be learned. The density network outputs a feature layer into the color network so that color learning can exploit the related density information; in addition, the viewing direction must be input to the color network. Because the two networks have a chicken-and-egg dependency and are hard to train simultaneously, the density network is fixed first while the color network is trained; after the error converges, the color network is fixed while the density network is trained, alternating until the error converges. Since this process monotonically reduces the error, the optimization result is guaranteed. The loss function here remains the same as that used from iteration 5000 to 7500.
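The alternating schedule just described can be sketched as a coordinate-descent loop (illustrative Python on a toy objective; in the patent the loss is the reprojection error and the two parameter blocks are the density and color network weights):

```python
import numpy as np

def alternating_fit(loss_grad, theta_density, theta_color,
                    lr=0.1, inner=50, outer=10, tol=1e-8):
    """Freeze one block's parameters while stepping the other, then swap,
    repeating until the loss stops improving."""
    prev = np.inf
    loss = prev
    for _ in range(outer):
        for _ in range(inner):                 # density fixed, train color
            _, g_c, loss = loss_grad(theta_density, theta_color)
            theta_color = theta_color - lr * g_c
        for _ in range(inner):                 # color fixed, train density
            g_d, _, loss = loss_grad(theta_density, theta_color)
            theta_density = theta_density - lr * g_d
        if prev - loss < tol:
            break
        prev = loss
    return theta_density, theta_color, loss

# Toy coupled objective standing in for the reprojection error:
# L = (d - 2)^2 + (c - 3)^2 + 0.1 * (d - c)^2
def toy_loss_grad(d, c):
    loss = (d - 2)**2 + (c - 3)**2 + 0.1 * (d - c)**2
    g_d = 2 * (d - 2) + 0.2 * (d - c)
    g_c = 2 * (c - 3) - 0.2 * (d - c)
    return g_d, g_c, loss
```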
After training, the test procedure is straightforward: a new viewing direction is obtained by interpolation, and the image at that view is then rendered by volume rendering.
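The volume rendering used at test time can be illustrated for a single ray (a minimal NumPy sketch of the standard NeRF quadrature; ray sampling and the actual networks are omitted):

```python
import numpy as np

def volume_render(sigmas, colors, deltas):
    """Composite per-sample densities/colors along one ray with the
    standard quadrature: C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i,
    where T_i is the transmittance accumulated before sample i."""
    alphas = 1.0 - np.exp(-sigmas * deltas)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0), weights
```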
Example three
Referring to fig. 4, fig. 4 is a schematic structural diagram of the neural-radiance-field-based novel view synthesis apparatus according to this embodiment. The neural-radiance-field-based novel view synthesis apparatus 20 of this embodiment includes a processor 21, a memory 22, and a computer program stored in the memory 22 and executable on the processor 21. When executing the computer program, the processor 21 implements the steps of the method embodiments described above; alternatively, the processor 21 may implement the functions of the modules/units in the device embodiments described above.
Illustratively, the computer program may be partitioned into one or more modules/units that are stored in the memory 22 and executed by the processor 21 to complete the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions for describing the execution of the computer program in the new view angle synthesizing device 20 based on neural radiation fields. For example, the computer program may be divided into modules in the second embodiment, and specific functions of each module refer to the working process of the apparatus described in the foregoing embodiment, which is not described herein.
The neural-radiance-field-based novel view synthesis apparatus 20 may include, but is not limited to, the processor 21 and the memory 22. Those skilled in the art will appreciate that the schematic diagram is merely an example of the apparatus 20 and does not limit it; the apparatus may include more or fewer components than illustrated, combine certain components, or use different components. For example, the apparatus 20 may also include input-output devices, network access devices, buses, and the like.
The processor 21 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The processor 21 is the control center of the apparatus 20 and connects the various parts of the entire apparatus using various interfaces and lines.
The memory 22 may be used to store the computer program and/or modules; the processor 21 implements the various functions of the apparatus 20 by running or executing the computer program and/or modules stored in the memory 22 and invoking data stored in the memory 22. The memory 22 may mainly include a program storage area and a data storage area: the program storage area may store an operating system and application programs required for at least one function (such as a sound playing function, an image playing function, etc.), while the data storage area may store data created according to the use of the device (such as audio data, a phonebook, etc.). In addition, the memory 22 may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
If the modules/units integrated in the apparatus 20 are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the present invention may implement all or part of the flow of the methods of the above embodiments by instructing the relevant hardware through a computer program, which may be stored in a computer-readable storage medium; when executed by the processor 21, the computer program implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source-code form, object-code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content of the computer-readable medium may be appropriately adjusted according to the requirements of legislation and patent practice in each jurisdiction; for example, in certain jurisdictions, in accordance with legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
It should be noted that the above-described apparatus embodiments are merely illustrative; units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units — they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the apparatus embodiments provided by the present invention, the connection relations between modules indicate communication connections between them, which may be specifically implemented as one or more communication buses or signal lines. Those of ordinary skill in the art can understand and implement the present invention without creative effort.
The foregoing description of the preferred embodiments is not intended to limit the invention; any modifications, equivalent substitutions, and improvements made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (8)

1. A novel view synthesis method based on a neural radiance field, characterized in that the method comprises the following steps:
s1, acquiring an input image to be trained, and acquiring an initial pose of a mesh image which is most similar to the input image from a triangular surface patch library according to the input image as the initial pose of the input image;
s2, inputting the input image to be trained and the initial pose into NeRF for training, and monitoring by taking the mesh as a correct label in the first round of iteration, so that the density network firstly learns an initial geometric shape, and a loss function is used for solving cross entropy of an output value of the density network and a correct label value of the density;
s3, in the second round of iteration, introducing a deformation network into a density network of the NeRF, wherein the deformation network consists of a deformation part and a correction part, respectively outputting deformation quantity and correction value of a 3D point, then inputting coordinates of the input point and the deformation quantity into the density network for training to obtain coarse density, and then adding the correction value to the coarse density to obtain learned accurate density; the loss function adopts a reprojection error function of the image generated by rendering and the input image;
s4, in the third iteration, the input pose of the training image is optimized while the density network is continuously learned, the rotation angle R of the camera and the translation distance T are parameterized, and then the rotation angle R and the translation distance T are converted into a transformation matrix for multiplying the matrix of the initial pose to obtain the optimized pose;
s5, in a fourth round of iteration, fixing the optimized pose, and training a density network and a color network of the NeRF; the step S5 specifically comprises the following steps: in the fourth iteration, fixing the optimized pose, fixing the density network of the NeRF, training the color network of the NeRF, fixing the color network of the NeRF after the error is converged, and training the density network of the NeRF, so that the training is performed alternately until the error is converged.
2. The method according to claim 1, characterized in that: the step S1 specifically comprises: looking up each input image to be trained in the triangular-patch mesh library using the mesh information, and quickly finding the mesh and pose most similar to the training object by computing and comparing IoU scores.
3. The method according to claim 2, characterized in that: if the retrieval results show that the input images correspond to different similar meshes, a voting scheme is adopted: the mesh with the highest score is selected as the initial mesh, and the pose of that mesh's image with the highest IoU value is taken as the initial pose of the input image.
4. The method according to claim 1, characterized in that:
in the step S2, obtaining the ground-truth density labels of points from the mesh specifically comprises: converting the obj file into a single-layer hole-free mesh representation; after the single-layer hole-free mesh is obtained, computing the occupancy of any 3D point; the computation proceeds as follows: a ray is cast from the current point X, and the number of intersection points between the ray and the mesh is counted; if the ray intersects the mesh an even number of times, the point is judged to lie outside the mesh, otherwise the point lies inside the mesh; the occupancy of points inside the mesh is set to 1 and the occupancy of points outside the mesh is set to 0.
5. A novel view synthesis device based on a neural radiance field, characterized in that it comprises the following units:
the device comprises an initial mesh and pose acquisition unit, a pose acquisition unit and a pose processing unit, wherein the initial mesh and pose acquisition unit is used for acquiring an input image to be trained, and acquiring the initial pose of a mesh image which is most similar to the input image from a triangular surface patch library according to the input image as the initial pose of the input image;
the first round of iterative training unit is used for inputting the input image to be trained and the initial pose into NeRF for training, and in the first round of iteration, the mesh is used as a correct label for supervision, so that the density network firstly learns an initial geometric shape, and a loss function is used for solving cross entropy of the output value of the density network and the correct label value of the density;
the second round of iterative training unit is used for introducing a deformation network into a NeRF density network in the second round of iteration, wherein the deformation network consists of a deformation part and a correction part, the deformation quantity and the correction value of the 3D point are respectively output, then the coordinates of the input point are added with the deformation quantity to be input into the density network for training to obtain coarse density, and the coarse density is added with the correction value to obtain learned accurate density; the loss function adopts a reprojection error function of the image generated by rendering and the input image;
the third round of iterative training unit is used for optimizing the input pose of the training image while continuously learning the density network in the third round of iteration, parameterizing the rotation angle R of the camera and the translation distance T, and then converting the parameterized rotation angle R into a transformation matrix for multiplying the matrix of the initial pose to obtain the optimized pose;
a fourth-round iterative training unit, configured to fix the optimized pose and train the density network and the color network of the NeRF in a fourth-round iteration; the fourth-wheel iterative training unit specifically comprises: in the fourth iteration, fixing the optimized pose, fixing the density network of the NeRF, training the color network of the NeRF, fixing the color network of the NeRF after the error is converged, and training the density network of the NeRF, so that the training is performed alternately until the error is converged.
6. The apparatus according to claim 5, characterized in that: the initial mesh and pose acquisition unit specifically: looks up each input image to be trained in the triangular-patch mesh library using the mesh information, and quickly finds the mesh and pose most similar to the training object by computing and comparing IoU scores; if the retrieval results show that the input images correspond to different similar meshes, a voting scheme is adopted: the mesh with the highest score is selected as the initial mesh, and the pose of that mesh's image with the highest IoU value is taken as the initial pose of the input image.
7. The apparatus according to claim 5, characterized in that: in the first-round iterative training unit, obtaining the ground-truth density labels of points from the mesh comprises: converting the obj file into a single-layer hole-free mesh representation; after the single-layer hole-free mesh is obtained, computing the occupancy of any 3D point; the computation proceeds as follows: a ray is cast from the current point X, and the number of intersection points between the ray and the mesh is counted; if the ray intersects the mesh an even number of times, the point is judged to lie outside the mesh, otherwise the point lies inside the mesh; the occupancy of points inside the mesh is set to 1 and the occupancy of points outside the mesh is set to 0.
8. A non-volatile memory having instructions stored thereon, characterized in that: the instructions, when executed by a processor, implement the neural-radiance-field-based novel view synthesis method according to any one of claims 1-4.
CN202310433953.6A 2023-04-21 2023-04-21 New view angle synthesis method, device and memory based on nerve radiation field Active CN116168137B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310433953.6A CN116168137B (en) 2023-04-21 2023-04-21 New view angle synthesis method, device and memory based on nerve radiation field

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310433953.6A CN116168137B (en) 2023-04-21 2023-04-21 New view angle synthesis method, device and memory based on nerve radiation field

Publications (2)

Publication Number Publication Date
CN116168137A CN116168137A (en) 2023-05-26
CN116168137B true CN116168137B (en) 2023-07-11

Family

ID=86411741

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310433953.6A Active CN116168137B (en) 2023-04-21 2023-04-21 New view angle synthesis method, device and memory based on nerve radiation field

Country Status (1)

Country Link
CN (1) CN116168137B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110021069A (en) * 2019-04-15 2019-07-16 武汉大学 A kind of method for reconstructing three-dimensional model based on grid deformation
CN112991493A (en) * 2021-04-09 2021-06-18 华南理工大学 Gray level image coloring method based on VAE-GAN and mixed density network
WO2022100379A1 (en) * 2020-11-16 2022-05-19 华南理工大学 Object attitude estimation method and system based on image and three-dimensional model, and medium
CN114998548A (en) * 2022-05-31 2022-09-02 北京非十科技有限公司 Image reconstruction method and system
CN115512036A (en) * 2022-09-28 2022-12-23 浙江大学 Novel editable view synthesis method based on intrinsic nerve radiation field
CN115620085A (en) * 2022-10-11 2023-01-17 南京大学 Neural radiation field rapid optimization method based on image pyramid
CN115909015A (en) * 2023-02-15 2023-04-04 苏州浪潮智能科技有限公司 Construction method and device of deformable nerve radiation field network

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20240005590A1 (en) * 2020-11-16 2024-01-04 Google Llc Deformable neural radiance fields
EP4150581A1 (en) * 2020-11-16 2023-03-22 Google LLC Inverting neural radiance fields for pose estimation
CN112613609B (en) * 2020-12-18 2022-05-06 中山大学 Nerve radiation field enhancement method based on joint pose optimization
CN113706714B (en) * 2021-09-03 2024-01-05 中科计算技术创新研究院 New view angle synthesizing method based on depth image and nerve radiation field
CN114972632A (en) * 2022-04-21 2022-08-30 阿里巴巴达摩院(杭州)科技有限公司 Image processing method and device based on nerve radiation field
CN114863035B (en) * 2022-07-05 2022-09-20 南京理工大学 Implicit representation-based three-dimensional human motion capturing and generating method
CN115690324A (en) * 2022-11-15 2023-02-03 广州中思人工智能科技有限公司 Neural radiation field reconstruction optimization method and device based on point cloud
CN115797571B (en) * 2023-02-03 2023-04-14 天津大学 New visual angle synthesis method of 3D stylized scene
CN115951784B (en) * 2023-03-08 2023-05-12 南京理工大学 Method for capturing and generating motion of wearing human body based on double nerve radiation fields

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110021069A (en) * 2019-04-15 2019-07-16 武汉大学 A kind of method for reconstructing three-dimensional model based on grid deformation
WO2022100379A1 (en) * 2020-11-16 2022-05-19 华南理工大学 Object attitude estimation method and system based on image and three-dimensional model, and medium
CN112991493A (en) * 2021-04-09 2021-06-18 华南理工大学 Gray level image coloring method based on VAE-GAN and mixed density network
CN114998548A (en) * 2022-05-31 2022-09-02 北京非十科技有限公司 Image reconstruction method and system
CN115512036A (en) * 2022-09-28 2022-12-23 浙江大学 Novel editable view synthesis method based on intrinsic nerve radiation field
CN115620085A (en) * 2022-10-11 2023-01-17 南京大学 Neural radiation field rapid optimization method based on image pyramid
CN115909015A (en) * 2023-02-15 2023-04-04 苏州浪潮智能科技有限公司 Construction method and device of deformable nerve radiation field network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Vehicle detection system based on adversarial learning and depth estimation; Xu Yuan; Zhai Chunyan; Wang Guoliang; Journal of Liaoning Petrochemical University (03); 83-90 *
Light-field super-resolution reconstruction fusing global and local views; Deng Wu; Zhang Xudong; Xiong Wei; Wang Yizhi; Application Research of Computers (05); 1549-1555 *

Also Published As

Publication number Publication date
CN116168137A (en) 2023-05-26

Similar Documents

Publication Publication Date Title
Gadelha et al. 3d shape induction from 2d views of multiple objects
Flynn et al. Deepstereo: Learning to predict new views from the world's imagery
CN112927359B (en) Three-dimensional point cloud completion method based on deep learning and voxels
CN113962858B (en) Multi-view depth acquisition method
CN114863038B (en) Real-time dynamic free visual angle synthesis method and device based on explicit geometric deformation
CN116958453B (en) Three-dimensional model reconstruction method, device and medium based on nerve radiation field
CN110942512B (en) Indoor scene reconstruction method based on meta-learning
WO2022198684A1 (en) Methods and systems for training quantized neural radiance field
CN116071278A (en) Unmanned aerial vehicle aerial image synthesis method, system, computer equipment and storage medium
Kang et al. Competitive learning of facial fitting and synthesis using uv energy
CN116416376A (en) Three-dimensional hair reconstruction method, system, electronic equipment and storage medium
US20240095999A1 (en) Neural radiance field rig for human 3d shape and appearance modelling
Rabby et al. BeyondPixels: A comprehensive review of the evolution of neural radiance fields
CN114463408A (en) Free viewpoint image generation method, device, equipment and storage medium
CN115984949B (en) Low-quality face image recognition method and equipment with attention mechanism
CN116168137B (en) New view angle synthesis method, device and memory based on nerve radiation field
CN116385667A (en) Reconstruction method of three-dimensional model, training method and device of texture reconstruction model
CN114742950B (en) Ship shape 3D digital reconstruction method and device, storage medium and electronic equipment
CN115239559A (en) Depth map super-resolution method and system for fusion view synthesis
CN115375839A (en) Multi-view hair modeling method and system based on deep learning
Li et al. Point-Based Neural Scene Rendering for Street Views
Salvador et al. Multi-view video representation based on fast Monte Carlo surface reconstruction
CN111932670A (en) Three-dimensional human body self-portrait reconstruction method and system based on single RGBD camera
CN113034671B (en) Traffic sign three-dimensional reconstruction method based on binocular vision
CN115994966B (en) Multi-view image generation method and device, readable storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant