CN116168137B - Novel view synthesis method, device and storage medium based on neural radiance field - Google Patents


Info

Publication number
CN116168137B
CN116168137B (application CN202310433953.6A)
Authority
CN
China
Prior art keywords
mesh
network
pose
density
training
Prior art date
Legal status
Active
Application number
CN202310433953.6A
Other languages
Chinese (zh)
Other versions
CN116168137A (en)
Inventor
邓正秋
徐振语
Current Assignee
Hunan Malanshan Video Advanced Technology Research Institute Co ltd
Original Assignee
Hunan Malanshan Video Advanced Technology Research Institute Co ltd
Priority date
Filing date
Publication date
Application filed by Hunan Malanshan Video Advanced Technology Research Institute Co., Ltd.
Priority to CN202310433953.6A
Publication of CN116168137A
Application granted
Publication of CN116168137B
Status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 - 3D [Three Dimensional] image rendering
    • G06T15/005 - General purpose rendering architectures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 - Information retrieval of still image data
    • G06F16/58 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 - Retrieval characterised by using metadata automatically derived from the content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 - Manipulating 3D models or images for computer graphics
    • G06T19/20 - Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems


Abstract

The invention provides a novel view synthesis method, device and storage medium based on a neural radiance field (NeRF). A deformation network is introduced into the NeRF pipeline, enabling effective deformation of an object's implicit field and synthesis of novel views. Initialization from a similar mesh can be exploited, which avoids the common failure mode in which NeRF initializes to an all-white result; at the same time, the rapid learning of the density field greatly shortens overall training time, yielding a faster and more efficient three-dimensional reconstruction technique. Unlike most NeRF works, which train the density network and the color network simultaneously, the invention adopts an alternating training strategy that optimizes the color network and the density network separately at certain iteration counts; this performs better than training both networks at once, reduces the number of parameters trained at a time, and lets the two networks complement and correct each other.

Description

Novel view synthesis method, device and storage medium based on neural radiance field
Technical Field
The invention relates to the technical field of image processing, and in particular to a novel view synthesis method, device and storage medium based on a neural radiance field.
Background
Reconstructing and re-rendering 3D scenes from a set of 2D images has long been a central problem in computer vision and computer graphics, with wide application in AR/VR. With the continuing development of deep learning and neural networks, Reference 1 proposed the neural radiance field (NeRF) technique for view synthesis; this work and its follow-ups have attracted the attention of many researchers. These works represent a three-dimensional scene implicitly with a multi-layer perceptron and use volume rendering to achieve novel view synthesis with high-quality, photorealistic results. However, most of them require a large number of input pictures from different view angles and a long training process to obtain a high-quality NeRF scene; this huge cost limits and hinders the wide application of such methods. A technique is therefore needed that can synthesize novel views from only a small number of input images taken from different view angles.
Some related works address this problem from other directions, for example:
1) Methods with a small number of input views: Reference 2 proposes MVSNeRF, a new neural rendering method that efficiently reconstructs a neural radiance field for view synthesis. This work trains a generic deep neural network through which the radiance field can be reconstructed from three nearby input views by fast network inference. Specifically, it performs geometry-aware scene reasoning with a plane-swept cost volume (widely used in multi-view stereo) and combines it with physically based volume rendering to reconstruct the neural radiance field. Its drawback is that the synthesized novel views are confined to the neighborhood of the three adjacent input views, so it cannot synthesize new images over a full 360-degree range from only about 10 input images.
2) Methods using additional geometric information about the object: Reference 3 proposes DS-NeRF (depth-supervised NeRF), which trains the neural radiance field with a depth-supervised loss. Current NeRF-style work requires input images with known camera poses, typically estimated by structure from motion (SfM); SfM can also produce sparse 3D points that serve as depth supervision during training, enabling novel view synthesis from few views through geometric constraints. However, this only works for objects or scenes with rich texture: for simply textured objects, matching feature points are hard to find in the sparse images, which prevents the generation of the sparse 3D points.
3) Methods targeting real objects: most current methods are mainly suited to learning and rendering objects from synthetic datasets. Reference 4 proposes NeRS, an implicit model based on surface rendering that learns a neural shape representation of a closed surface (topologically a sphere) starting from an initial mesh (a car or a cuboid), guaranteeing a watertight reconstruction. Its drawbacks are that it is limited to real objects shaped roughly like a cuboid or a car, the input images must be captured at correspondingly fixed angles, and the rendered images do not look as realistic as those generated by volume rendering.
However, NeRF-based view synthesis methods still have problems, such as the difficulty of acquiring poses for training datasets and real data, and the tendency of NeRF to produce an all-white result at initialization.
Reference 1: B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, R. Ng. "NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis." European Conference on Computer Vision, 2020.
Reference 2: Chen, A., et al. "MVSNeRF: Fast Generalizable Radiance Field Reconstruction from Multi-View Stereo." 2021.
Reference 3: Deng, K., et al. "Depth-supervised NeRF: Fewer Views and Faster Training for Free." 2021.
Reference 4: Zhang, Jason, et al. "NeRS: Neural Reflectance Surfaces for Sparse-view 3D Reconstruction in the Wild." Advances in Neural Information Processing Systems (2021): 29835-29847.
Disclosure of Invention
Aiming at the above technical problems in the related art, the invention provides a novel view synthesis method based on a neural radiance field, comprising the following steps:
S1, acquire input images to be trained and, for each input image, retrieve from a triangular-patch mesh library the pose of the most similar rendered mesh image, which serves as the initial pose of the input image;
S2, feed the input images to be trained and their initial poses into NeRF for training; in the first round of iterations, the mesh is used as the ground-truth label for supervision so that the density network first learns an initial geometric shape, and the loss function is the cross entropy between the output of the density network and the ground-truth density label;
S3, in the second round of iterations, introduce a deformation network into the density network of NeRF; the deformation network consists of a deformation part and a correction part, which output the deformation and the correction value of a 3D point, respectively. The input point coordinates plus the deformation are fed into the density network for training to obtain a coarse density, and the correction value is then added to the coarse density to obtain the learned accurate density; the loss function is the reprojection error between the rendered image and the input image;
S4, in the third round of iterations, optimize the input poses of the training images while continuing to train the density network: the camera rotation angle R and translation distance T are parameterized and converted into a transformation matrix that multiplies the initial pose matrix to obtain the optimized pose;
S5, in the fourth round of iterations, fix the optimized poses and train the density network and the color network of NeRF.
Specifically, step S1 is as follows: each input image to be trained is looked up in the triangular-patch mesh library using the mesh information, and the mesh and pose most similar to the training object are quickly retrieved by computing and comparing IoU scores.
Specifically, if the retrieval results map the input images to different similar meshes, a voting scheme is adopted: the mesh with the highest total score is selected as the initial mesh, and the pose of the library image with the highest IoU for that mesh is taken as the initial pose of the input image.
Specifically, step S5 is as follows: in the fourth round of iterations, the optimized poses are fixed; the density network of NeRF is first fixed while its color network is trained, and after the error converges the color network is fixed while the density network is trained, alternating in this way until the error converges.
Specifically, in step S2 the ground-truth density of a point is obtained from the mesh as follows: the obj file is converted into a single-layer, hole-free mesh representation; with this mesh, the occupancy of any 3D point can be computed. A ray is cast from the current point X and the number of intersections between the ray and the mesh is counted: if the number of intersections is even, the point lies outside the mesh, otherwise it lies inside. The occupancy of points inside the mesh is set to 1 and that of points outside is set to 0.
In a second aspect, another embodiment of the invention discloses a novel view synthesis device based on a neural radiance field, comprising the following units:
an initial mesh and pose acquisition unit, configured to acquire input images to be trained and, for each input image, retrieve from a triangular-patch mesh library the pose of the most similar rendered mesh image as the initial pose of the input image;
a first-round iterative training unit, configured to feed the input images to be trained and their initial poses into NeRF for training; in the first round of iterations, the mesh is used as the ground-truth label for supervision so that the density network first learns an initial geometric shape, and the loss function is the cross entropy between the output of the density network and the ground-truth density label;
a second-round iterative training unit, configured to introduce a deformation network into the density network of NeRF in the second round of iterations; the deformation network consists of a deformation part and a correction part, which output the deformation and the correction value of a 3D point, respectively. The input point coordinates plus the deformation are fed into the density network for training to obtain a coarse density, and the correction value is then added to obtain the learned accurate density; the loss function is the reprojection error between the rendered image and the input image;
a third-round iterative training unit, configured to optimize the input poses of the training images while continuing to train the density network in the third round of iterations: the camera rotation angle R and translation distance T are parameterized and converted into a transformation matrix that multiplies the initial pose matrix to obtain the optimized pose;
a fourth-round iterative training unit, configured to fix the optimized poses and train the density network and the color network of NeRF in the fourth round of iterations.
Specifically, the initial mesh and pose acquisition unit operates as follows: each input image to be trained is looked up in the triangular-patch mesh library using the mesh information, and the mesh and pose most similar to the training object are quickly retrieved by computing and comparing IoU scores; if the retrieval results map the input images to different similar meshes, a voting scheme is adopted, the mesh with the highest total score is selected as the initial mesh, and the pose of the library image with the highest IoU for that mesh is taken as the initial pose of the input image.
Specifically, the fourth-round iterative training unit operates as follows: in the fourth round of iterations, the optimized poses are fixed; the density network of NeRF is first fixed while its color network is trained, and after the error converges the color network is fixed while the density network is trained, alternating in this way until the error converges.
Specifically, in the first-round iterative training unit the ground-truth density of a point is obtained from the mesh as follows: the obj file is converted into a single-layer, hole-free mesh representation; with this mesh, the occupancy of any 3D point can be computed. A ray is cast from the current point X and the number of intersections between the ray and the mesh is counted: if the number of intersections is even, the point lies outside the mesh, otherwise it lies inside. The occupancy of points inside the mesh is set to 1 and that of points outside is set to 0.
In a third aspect, another embodiment of the invention discloses a non-volatile storage medium storing instructions which, when executed by a processor, implement the novel view synthesis method based on a neural radiance field described above.
According to the novel view synthesis method based on a neural radiance field, an input image is first matched against a database of about 30 meshes containing nearly 100 images rendered from each mesh at different distances and view angles, and the pose of the mesh image most similar to the input image is taken as the initial pose. On the one hand, by introducing a similar mesh as the initialization, only a small number (about 10) of images are required as input when using the technique, without any additional input information. On the other hand, the input images are not required to carry accurate poses: the invention builds a mini database from which a mesh and view angle similar to the input image are retrieved, so no other cumbersome pose-estimation procedure is needed.
A deformation network is introduced into the NeRF pipeline, enabling effective deformation of the object's implicit field and synthesis of novel views; initialization from a similar mesh avoids the common failure mode of initializing to an all-white result, and the rapid learning of the density field greatly shortens overall training time, yielding a faster and more efficient three-dimensional reconstruction technique.
Further, unlike most NeRF works, which train the density network and the color network simultaneously, the invention adopts an alternating training strategy that optimizes the color network and the density network separately at certain iteration counts; this performs better than training both networks at once, reduces the number of parameters trained at a time, and lets the two networks complement and correct each other.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. It is apparent that the drawings described below show only some embodiments of the invention; a person skilled in the art may derive other drawings from them without inventive effort.
FIG. 1 is a flow chart of the novel view synthesis method based on a neural radiance field provided by an embodiment of the invention;
FIG. 2 is an overall framework diagram of the novel view synthesis method based on a neural radiance field provided by an embodiment of the invention;
FIG. 3 is a schematic diagram of the novel view synthesis device based on a neural radiance field according to an embodiment of the invention;
FIG. 4 is a schematic diagram of the novel view synthesis apparatus based on a neural radiance field according to an embodiment of the invention.
Detailed Description
The following describes the embodiments of the invention clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, embodiments of the invention. All other embodiments derived by a person skilled in the art from the embodiments of the invention without inventive effort fall within the scope of protection of the invention.
Example 1
Referring to FIG. 1 and FIG. 2, this embodiment discloses a novel view synthesis method based on a neural radiance field, comprising the following steps:
S1, acquire input images to be trained and, for each input image, retrieve from a triangular-patch mesh library the pose of the most similar rendered mesh image, which serves as the initial pose of the input image;
Considering that the poses of real image data are difficult to acquire, this embodiment first captures images with the relevant equipment and then builds a triangular-patch mesh library for the captured images. The library contains about 30 meshes of common shapes and nearly 100 images rendered from each mesh at different distances and view angles.
Specifically, about 30 meshes of common shapes may be selected from the ShapeNet dataset to form a mini database containing nearly 100 images rendered from each mesh at different distances and view angles.
Specifically, step S1 is as follows: each input image to be trained is looked up in the triangular-patch mesh library using the mesh information, and the mesh and pose most similar to the training object are quickly retrieved by computing and comparing IoU scores. If the retrieval results map the input images to different similar meshes, a voting scheme is adopted: the mesh with the highest total score is selected as the initial mesh, and the pose of the library image with the highest IoU for that mesh is taken as the initial pose of the input image.
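The retrieval-and-voting step above can be sketched in a few lines of Python. This is an illustrative sketch only: the mask representation (silhouettes as sets of pixel coordinates) and the library layout are assumptions, not data structures specified by the patent.

```python
def mask_iou(a, b):
    """IoU of two binary silhouette masks given as sets of pixel coordinates."""
    inter = len(a & b)
    union = len(a | b)
    return inter / union if union else 0.0

def select_initial_mesh(input_masks, library):
    """library: {mesh_id: [(pose_id, mask), ...]} of pre-rendered mesh views.
    Returns (best_mesh_id, {input_index: pose_id}) chosen by IoU voting."""
    votes = {}           # mesh_id -> cumulative best-IoU score
    best_per_input = {}  # input_index -> (mesh_id, pose_id, iou)
    for i, mask in enumerate(input_masks):
        for mesh_id, views in library.items():
            for pose_id, ref in views:
                s = mask_iou(mask, ref)
                if i not in best_per_input or s > best_per_input[i][2]:
                    best_per_input[i] = (mesh_id, pose_id, s)
        m, _, s = best_per_input[i]
        votes[m] = votes.get(m, 0.0) + s
    best_mesh = max(votes, key=votes.get)   # mesh with the highest total score
    poses = {i: p for i, (m, p, _) in best_per_input.items() if m == best_mesh}
    return best_mesh, poses
```

Inputs matched to different meshes thus still receive a single consistent initial mesh, while each keeps the pose of its own highest-IoU library image.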
S2, feed the input images to be trained and their initial poses into NeRF for training; in the first round of iterations, the mesh is used as the ground-truth label for supervision so that the density network first learns an initial geometric shape, and the loss function is the cross entropy between the output of the density network and the ground-truth density label;
In the first 5,000 iterations, 3D point coordinates are fed in and the similar mesh serves as the ground-truth label for supervision, so that the density network learns an initial geometric shape; the loss function is the cross entropy between the output of the density network and the ground-truth density label.
Regarding how the ground-truth density label of a point is obtained from the mesh: since most current CAD model files are in obj format, the obj file is first converted into a single-layer, hole-free mesh representation; this embodiment uses the TSDF method. With the single-layer, hole-free mesh, the occupancy O of any 3D point X is computed as below, where 1 indicates that the point belongs to the object and 0 indicates that it does not:
O(X): R^3 → [0, 1]    (1)
The computation proceeds as follows: a ray is cast from the current point X and the number of intersections between the ray and the mesh is counted; if the number of intersections is even, the point lies outside the mesh, otherwise it lies inside. The occupancy of points inside the mesh is set to 1 and that of points outside is set to 0. At the same time, the output of the density network is normalized to a value between 0 and 1.
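The parity test above can be sketched as follows. The ray/triangle routine is the standard Moeller-Trumbore algorithm; the mesh layout (a list of vertex triples) and the fixed ray direction are assumptions for illustration, not requirements of the patent.

```python
import numpy as np

def ray_hits_triangle(orig, direc, v0, v1, v2, eps=1e-9):
    """Moeller-Trumbore ray/triangle intersection; True if the ray hits."""
    e1, e2 = v1 - v0, v2 - v0
    p = np.cross(direc, e2)
    det = np.dot(e1, p)
    if abs(det) < eps:            # ray parallel to the triangle plane
        return False
    inv = 1.0 / det
    s = orig - v0
    u = np.dot(s, p) * inv
    if u < 0 or u > 1:
        return False
    q = np.cross(s, e1)
    v = np.dot(direc, q) * inv
    if v < 0 or u + v > 1:
        return False
    return np.dot(e2, q) * inv > eps   # intersection in front of the origin

def occupancy(point, triangles, direc=(1.0, 0.0, 0.0)):
    """O(X): 1 if X lies inside the watertight mesh, else 0, by the parity
    of ray/mesh intersections (even -> outside, odd -> inside)."""
    point = np.asarray(point, float)
    direc = np.asarray(direc, float)
    hits = sum(ray_hits_triangle(point, direc, *map(np.asarray, tri))
               for tri in triangles)
    return 1 if hits % 2 == 1 else 0
```

For robustness in practice the ray direction should avoid grazing mesh edges, since an edge hit can be counted twice.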
S3, in the second round of iterations, introduce a deformation network into the density network of NeRF; the deformation network consists of a deformation part and a correction part, which output the deformation and the correction value of a 3D point, respectively. The input point coordinates plus the deformation are fed into the density network for training to obtain a coarse density, and the correction value is then added to the coarse density to obtain the learned accurate density; the loss function is the reprojection error between the rendered image and the input image;
After 5,000 iterations, a deformation network is introduced into the density network. It consists of a deformation part and a correction part, which output the deformation and the correction value of a 3D point, respectively. The input point coordinates plus the deformation are fed into the density network for training to obtain a coarse density, and the correction value is then added to obtain the learned accurate density; the point coordinates are also fed in again at the end of the density network to provide additional positional information. From iteration 5,000 to 7,500 the density network, now including the deformation network (unlike the first 5,000 iterations), continues to be trained; the loss function is the reprojection error between the rendered image and the input image, and the deformation and correction value are regularized to keep the deformation as small as possible.
S4, in the third round of iterations, optimize the input poses of the training images while continuing to train the density network: the camera rotation angle R and translation distance T are parameterized and converted into a transformation matrix that multiplies the initial pose matrix to obtain the optimized pose;
From iteration 7,500 to 10,000, the density network continues to be trained while the input poses of the training images are optimized: the camera rotation angle R and translation distance T are parameterized and converted into a transformation matrix that multiplies the initial pose matrix to obtain the optimized pose.
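One way to realize this R/T parameterization is an axis-angle (Rodrigues) rotation plus a translation, assembled into a 4x4 transform that left-multiplies the initial pose. The patent does not specify the exact parameterization, so this is a sketch under that assumption.

```python
import numpy as np

def rodrigues(r):
    """Axis-angle vector r (shape (3,)) -> 3x3 rotation matrix."""
    theta = np.linalg.norm(r)
    if theta < 1e-12:
        return np.eye(3)            # no rotation
    k = r / theta                   # unit rotation axis
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def refine_pose(initial_pose, r, t):
    """Apply the learned correction (r: rotation parameters, t: translation)
    to a 4x4 initial camera pose by left-multiplying a transform matrix."""
    delta = np.eye(4)
    delta[:3, :3] = rodrigues(np.asarray(r, float))
    delta[:3, 3] = t
    return delta @ initial_pose
```

During training, r and t would be per-image optimizable parameters initialized to zero so the refined pose starts at the retrieved initial pose.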
S5, in the fourth round of iterations, fix the optimized poses and train the density network and the color network of NeRF.
Specifically, in this embodiment the density network is fixed first and the color network is trained; after the error converges, the color network is fixed and the density network is trained, alternating in this way until the error converges.
After 10,000 iterations of training, the poses are fixed and the color network and the density network are trained. The density network outputs a feature vector into the color network so that the color network can exploit the related density information; in addition, the viewing direction is fed into the color network. Because the two networks have a chicken-and-egg relationship and are difficult to train simultaneously, the density network is fixed first while the color network is trained; after the error converges, the color network is fixed and the density network is trained, alternating until the error converges. Since this procedure monotonically reduces the error, convergence of the optimization is ensured. The loss function here remains the same as that used from iteration 5,000 to 7,500.
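The alternating schedule can be written as a small driver loop. The phase callbacks below are placeholders for the actual NeRF training steps (each trains one network with the other frozen and returns the error afterwards); the convergence threshold and round cap are illustrative assumptions.

```python
def alternate_training(color_phase, density_phase, max_rounds=10, tol=1e-4):
    """Alternating optimization: color_phase() trains the color network with
    the density network frozen, density_phase() does the reverse; alternate
    until the error stops improving between full rounds."""
    prev = float("inf")
    for _ in range(max_rounds):
        err = color_phase()      # density network frozen
        err = density_phase()    # color network frozen
        if prev - err < tol:     # no further improvement: converged
            break
        prev = err
    return err
```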
After training, the test procedure is relatively simple: a new view angle is specified by interpolation, and the image at that view angle is rendered by volume rendering.
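The volume rendering used here is the standard NeRF quadrature, C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i with transmittance T_i = exp(-sum_{j<i} sigma_j * delta_j). A sketch that renders one ray from precomputed network samples (sample values are assumed given):

```python
import numpy as np

def render_ray(sigmas, colors, deltas):
    """NeRF volume-rendering quadrature along one ray.
    sigmas: (N,) densities; colors: (N, 3) RGB; deltas: (N,) step sizes."""
    sigmas = np.asarray(sigmas, float)
    deltas = np.asarray(deltas, float)
    colors = np.asarray(colors, float)
    alpha = 1.0 - np.exp(-sigmas * deltas)   # per-sample opacity
    # transmittance: probability the ray reaches each sample unoccluded
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = trans * alpha                  # contribution of each sample
    return (weights[:, None] * colors).sum(axis=0)
```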
According to the novel view synthesis method based on a neural radiance field of this embodiment, an input image is matched against a database of about 30 meshes containing nearly 100 images rendered from each mesh at different distances and view angles, and the pose of the mesh image most similar to the input image is taken as the initial pose. On the one hand, by introducing a similar mesh as the initialization, only a small number (about 10) of images are required as input, without any additional input information. On the other hand, the input images are not required to carry accurate poses: this embodiment builds a mini database from which a mesh and view angle similar to the input image are retrieved, so no other cumbersome pose-estimation procedure is needed.
A deformation network is introduced into the NeRF pipeline, enabling effective deformation of the object's implicit field and synthesis of novel views; initialization from a similar mesh avoids the common failure mode of initializing to an all-white result, and the rapid learning of the density field greatly shortens overall training time, yielding a faster and more efficient three-dimensional reconstruction technique.
Further, unlike most NeRF works, which train the density network and the color network simultaneously, this embodiment adopts an alternating training strategy that optimizes the color network and the density network separately at certain iteration counts; this performs better than training both networks at once, reduces the number of parameters trained at a time, and lets the two networks complement and correct each other.
Example two
Referring to fig. 3, the present embodiment discloses a new view angle synthesizing device based on nerve radiation, which includes the following units:
the device comprises an initial mesh and pose acquisition unit, a pose acquisition unit and a pose processing unit, wherein the initial mesh and pose acquisition unit is used for acquiring an input image to be trained, and acquiring an initial pose of a mesh image corresponding to the image from a triangular surface patch library according to the input image as an initial pose of the input image;
Considering that accurate poses for real image data are difficult to obtain, this embodiment first captures images with the relevant equipment and then builds a triangular-patch mesh library for the captured images. The library contains 30 meshes of common shapes and, for each mesh, nearly 100 images rendered at different distances and different viewing angles.
Specifically, about 30 meshes of common shapes can be selected from the ShapeNet dataset to form a mini database, containing for each mesh nearly 100 images rendered at different distances and different viewing angles.
Specifically, the initial mesh and pose acquisition unit operates as follows: each input image to be trained is looked up in the triangular-patch mesh library, and the mesh and pose most similar to the training object are found quickly by computing and comparing IoU scores. If the retrieval results show that different input images correspond to different similar meshes, a voting scheme is adopted: the mesh with the highest score is selected as the initial mesh, and the pose of that mesh's image with the highest IoU value is taken as the initial pose of the input image.
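The retrieval-and-voting logic can be sketched as follows (illustrative Python; the database layout and the silhouette-mask IoU are assumptions, since the patent does not specify how the IoU is computed):

```python
import numpy as np
from collections import Counter

def iou(mask_a, mask_b):
    """Intersection-over-union of two boolean silhouette masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union > 0 else 0.0

def retrieve_initial_pose(input_masks, database):
    """database: list of (mesh_id, pose, rendered_mask) entries.
    Returns (mesh_id, pose): per-image best IoU, then a vote across images."""
    per_image_best = []
    for mask in input_masks:
        scores = [(iou(mask, entry_mask), mesh_id, pose)
                  for mesh_id, pose, entry_mask in database]
        per_image_best.append(max(scores))        # (score, mesh_id, pose)
    # Vote: the mesh chosen by the most input images wins.
    votes = Counter(mesh_id for _, mesh_id, _ in per_image_best)
    best_mesh, _ = votes.most_common(1)[0]
    # Initial pose: the highest-IoU hit among the winning mesh's matches.
    best = max(p for p in per_image_best if p[1] == best_mesh)
    return best_mesh, best[2]
```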
a first-round iterative training unit, configured to input the input image to be trained and the initial pose into NeRF for training; in the first round of iterations, the mesh is used as the ground-truth label for supervision, so that the density network first learns an initial geometric shape, and the loss function is the cross entropy between the density network's output and the ground-truth density label;
the prior 5000 iterations are input into 3D point coordinates, and the similar mesh is used as a correct label for supervision, so that the density network learns an initial geometric shape, and the loss function obtains cross entropy of the output value of the density network and the correct label value of the density.
Regarding how the ground-truth density labels of points are obtained from the mesh: since most current CAD model files are in obj format, the obj file is first converted into a single-layer, hole-free (watertight) mesh representation; this embodiment uses the TSDF method. After the single-layer hole-free mesh is obtained, the occupancy O of any 3D point X is computed as below, where 1 indicates that the 3D point belongs to the object and 0 that it does not:
O(X) : R^3 → [0, 1]    (1)
The computation proceeds as follows: a ray is cast from the current point X, and the number of intersection points between the ray and the mesh is counted. If the ray intersects the mesh an even number of times, the point is judged to lie outside the mesh; otherwise it lies inside. The occupancy of points inside the mesh is set to 1 and that of points outside to 0. At the same time, the output of the density network is normalized to a value between 0 and 1.
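The parity test can be illustrated as follows (a minimal NumPy sketch using the Möller–Trumbore ray-triangle intersection on a toy watertight mesh; in practice a mesh-library routine on the TSDF-converted mesh would be used):

```python
import numpy as np

def ray_hits_triangle(origin, direction, tri, eps=1e-9):
    """Möller–Trumbore: does the ray origin + t*direction (t > 0) hit tri?"""
    v0, v1, v2 = tri
    e1, e2 = v1 - v0, v2 - v0
    h = np.cross(direction, e2)
    a = np.dot(e1, h)
    if abs(a) < eps:                 # ray parallel to the triangle's plane
        return False
    f = 1.0 / a
    s = origin - v0
    u = f * np.dot(s, h)
    if u < 0.0 or u > 1.0:
        return False
    q = np.cross(s, e1)
    v = f * np.dot(direction, q)
    if v < 0.0 or u + v > 1.0:
        return False
    return f * np.dot(e2, q) > eps   # intersection must be in front of origin

def occupancy(point, triangles, direction=np.array([1.0, 0.3, 0.7])):
    """O(X) in {0, 1}: cast one ray and count mesh crossings;
    an odd count means the point lies inside the watertight mesh."""
    hits = sum(ray_hits_triangle(point, direction, tri) for tri in triangles)
    return 1 if hits % 2 == 1 else 0
```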
a second-round iterative training unit, configured to introduce a deformation network into the NeRF density network in the second round of iterations. The deformation network consists of a deformation part and a correction part, which output the deformation and the correction value of a 3D point respectively; the input point's coordinates plus the deformation are fed into the density network for training to obtain a coarse density, and the correction value is then added to the coarse density to obtain the learned accurate density. The loss function is the reprojection error between the rendered image and the input image;
After 5000 iterations, a deformation network is introduced into the density network. This network consists of a deformation part and a correction part, outputting the deformation and the correction value of 3D points respectively; the input point's coordinates plus the deformation are fed into the density network for training to obtain a coarse density, and the correction value is then added to obtain the learned accurate density. The point coordinates are also re-injected at the end of the density network to provide additional positional information. Then, from iteration 5000 to 7500, the density network (now including the deformation network) continues to be learned; unlike the first 5000 iterations, the loss function is the reprojection error between the rendered image and the input image, and the deformation and correction value are regularized to keep the deformation as small as possible.
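The deformation-plus-correction flow can be sketched as follows (a minimal NumPy stand-in with illustrative layer sizes; the real networks are larger, trained by backpropagation, and the re-injection of point coordinates at the end of the density network is omitted here):

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(weights, x):
    """Tiny ReLU MLP used as a stand-in for the real networks."""
    for W, b in weights[:-1]:
        x = np.maximum(x @ W + b, 0.0)
    W, b = weights[-1]
    return x @ W + b

def init(sizes):
    return [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

# Illustrative sizes: the deformation net outputs a 3-D offset plus a
# 1-D correction; the density net maps a 3-D point to a scalar density.
deform_net = init([3, 16, 4])
density_net = init([3, 16, 1])

def predict_density(x):
    out = mlp(deform_net, x)
    delta, correction = out[:, :3], out[:, 3:]
    coarse = mlp(density_net, x + delta)   # deformed point into density net
    return coarse + correction             # corrected, "accurate" density
```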
a third-round iterative training unit, configured to, in the third round of iterations, optimize the input pose of the training images while continuing to learn the density network: the camera's rotation angle R and translation distance T are parameterized and converted into a transformation matrix, which multiplies the initial-pose matrix to obtain the optimized pose;
From iteration 7500 to 10,000, the density network continues to be learned while the input pose of the training images is optimized: the camera's rotation angle R and translation distance T are parameterized and converted into a transformation matrix that multiplies the initial-pose matrix to obtain the optimized pose.
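The pose-refinement step can be sketched as follows (illustrative NumPy; the rotation is parameterized here as an axis-angle vector, one common choice — the patent does not specify the exact parameterization):

```python
import numpy as np

def rodrigues(rvec):
    """Axis-angle vector -> 3x3 rotation matrix (Rodrigues' formula)."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def refine_pose(initial_pose, rvec, tvec):
    """Build a 4x4 transform from the learnable (R, T) parameters and
    multiply the initial-pose matrix to obtain the optimized pose."""
    delta = np.eye(4)
    delta[:3, :3] = rodrigues(np.asarray(rvec, float))
    delta[:3, 3] = tvec
    return delta @ initial_pose
```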
a fourth-round iterative training unit, configured to fix the optimized pose and train the density network and color network of NeRF in the fourth round of iterations.
Specifically, in this embodiment, the density network is fixed first and the color network is trained; after the error converges, the color network is fixed and the density network is trained, alternating in this way until the error converges.
After 10,000 iterations the pose is fixed, and the color network and density network begin to be learned. The density network outputs a feature layer into the color network so that color learning can exploit the related density information; in addition, the viewing direction must be input to the color network. Because the two networks have a chicken-and-egg dependency and are hard to train simultaneously, the density network is fixed first while the color network is trained; after the error converges, the color network is fixed while the density network is trained, alternating until the error converges. Since this process monotonically reduces the error, the optimization result is guaranteed. The loss function here remains the same as that used from iteration 5000 to 7500.
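The alternating schedule just described can be sketched as a coordinate-descent loop (illustrative Python on a toy objective; in the patent the loss is the reprojection error and the two parameter blocks are the density and color network weights):

```python
import numpy as np

def alternating_fit(loss_grad, theta_density, theta_color,
                    lr=0.1, inner=50, outer=10, tol=1e-8):
    """Freeze one block's parameters while stepping the other, then swap,
    repeating until the loss stops improving."""
    prev = np.inf
    loss = prev
    for _ in range(outer):
        for _ in range(inner):                 # density fixed, train color
            _, g_c, loss = loss_grad(theta_density, theta_color)
            theta_color = theta_color - lr * g_c
        for _ in range(inner):                 # color fixed, train density
            g_d, _, loss = loss_grad(theta_density, theta_color)
            theta_density = theta_density - lr * g_d
        if prev - loss < tol:
            break
        prev = loss
    return theta_density, theta_color, loss

# Toy coupled objective standing in for the reprojection error:
# L = (d - 2)^2 + (c - 3)^2 + 0.1 * (d - c)^2
def toy_loss_grad(d, c):
    loss = (d - 2)**2 + (c - 3)**2 + 0.1 * (d - c)**2
    g_d = 2 * (d - 2) + 0.2 * (d - c)
    g_c = 2 * (c - 3) - 0.2 * (d - c)
    return g_d, g_c, loss
```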
After training, the test procedure is straightforward: a new viewing direction is obtained by interpolation, and the image at that view is then rendered by volume rendering.
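The volume rendering used at test time can be illustrated for a single ray (a minimal NumPy sketch of the standard NeRF quadrature; ray sampling and the actual networks are omitted):

```python
import numpy as np

def volume_render(sigmas, colors, deltas):
    """Composite per-sample densities/colors along one ray with the
    standard quadrature: C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i,
    where T_i is the transmittance accumulated before sample i."""
    alphas = 1.0 - np.exp(-sigmas * deltas)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0), weights
```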
Example three
Referring to fig. 4, fig. 4 is a schematic structural diagram of the neural-radiance-field-based novel view synthesis apparatus according to this embodiment. The neural-radiance-field-based novel view synthesis apparatus 20 of this embodiment includes a processor 21, a memory 22, and a computer program stored in the memory 22 and executable on the processor 21. When executing the computer program, the processor 21 implements the steps of the method embodiments described above; alternatively, the processor 21 may implement the functions of the modules/units in the device embodiments described above.
Illustratively, the computer program may be partitioned into one or more modules/units that are stored in the memory 22 and executed by the processor 21 to complete the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions for describing the execution of the computer program in the new view angle synthesizing device 20 based on neural radiation fields. For example, the computer program may be divided into modules in the second embodiment, and specific functions of each module refer to the working process of the apparatus described in the foregoing embodiment, which is not described herein.
The neural-radiance-field-based novel view synthesis apparatus 20 may include, but is not limited to, the processor 21 and the memory 22. Those skilled in the art will appreciate that the schematic diagram is merely an example of the apparatus 20 and does not limit it; the apparatus may include more or fewer components than illustrated, combine certain components, or use different components. For example, the apparatus 20 may also include input-output devices, network access devices, buses, and the like.
The processor 21 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The processor 21 is the control center of the apparatus 20 and connects the various parts of the entire apparatus using various interfaces and lines.
The memory 22 may be used to store the computer program and/or modules; the processor 21 implements the various functions of the apparatus 20 by running or executing the computer program and/or modules stored in the memory 22 and invoking data stored in the memory 22. The memory 22 may mainly include a program storage area and a data storage area: the program storage area may store an operating system and application programs required for at least one function (such as a sound playing function, an image playing function, etc.), while the data storage area may store data created according to the use of the device (such as audio data, a phonebook, etc.). In addition, the memory 22 may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
If the modules/units integrated in the apparatus 20 are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the present invention may implement all or part of the flow of the methods of the above embodiments by instructing the relevant hardware through a computer program, which may be stored in a computer-readable storage medium; when executed by the processor 21, the computer program implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source-code form, object-code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content of the computer-readable medium may be appropriately adjusted according to the requirements of legislation and patent practice in each jurisdiction; for example, in certain jurisdictions, in accordance with legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
It should be noted that the above-described apparatus embodiments are merely illustrative; units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units — they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the apparatus embodiments provided by the present invention, the connection relations between modules indicate communication connections between them, which may be specifically implemented as one or more communication buses or signal lines. Those of ordinary skill in the art can understand and implement the present invention without creative effort.
The foregoing description of the preferred embodiments is not intended to limit the invention; any modifications, equivalent substitutions, and improvements made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (8)

1. A novel view synthesis method based on a neural radiance field, characterized in that the method comprises the following steps:
s1, acquiring an input image to be trained, and acquiring an initial pose of a mesh image which is most similar to the input image from a triangular surface patch library according to the input image as the initial pose of the input image;
s2, inputting the input image to be trained and the initial pose into NeRF for training, and monitoring by taking the mesh as a correct label in the first round of iteration, so that the density network firstly learns an initial geometric shape, and a loss function is used for solving cross entropy of an output value of the density network and a correct label value of the density;
s3, in the second round of iteration, introducing a deformation network into a density network of the NeRF, wherein the deformation network consists of a deformation part and a correction part, respectively outputting deformation quantity and correction value of a 3D point, then inputting coordinates of the input point and the deformation quantity into the density network for training to obtain coarse density, and then adding the correction value to the coarse density to obtain learned accurate density; the loss function adopts a reprojection error function of the image generated by rendering and the input image;
s4, in the third iteration, the input pose of the training image is optimized while the density network is continuously learned, the rotation angle R of the camera and the translation distance T are parameterized, and then the rotation angle R and the translation distance T are converted into a transformation matrix for multiplying the matrix of the initial pose to obtain the optimized pose;
s5, in a fourth round of iteration, fixing the optimized pose, and training a density network and a color network of the NeRF; the step S5 specifically comprises the following steps: in the fourth iteration, fixing the optimized pose, fixing the density network of the NeRF, training the color network of the NeRF, fixing the color network of the NeRF after the error is converged, and training the density network of the NeRF, so that the training is performed alternately until the error is converged.
2. The method according to claim 1, characterized in that: the step S1 specifically comprises: looking up each input image to be trained in the triangular-patch mesh library using the mesh information, and quickly finding the mesh and pose most similar to the training object by computing and comparing IoU scores.
3. The method according to claim 2, characterized in that: if the retrieval results show that the input images correspond to different similar meshes, a voting scheme is adopted: the mesh with the highest score is selected as the initial mesh, and the pose of that mesh's image with the highest IoU value is taken as the initial pose of the input image.
4. The method according to claim 1, characterized in that:
in the step S2, obtaining the ground-truth density labels of points from the mesh specifically comprises: converting the obj file into a single-layer hole-free mesh representation; after the single-layer hole-free mesh is obtained, computing the occupancy of any 3D point; the computation proceeds as follows: a ray is cast from the current point X, and the number of intersection points between the ray and the mesh is counted; if the ray intersects the mesh an even number of times, the point is judged to lie outside the mesh, otherwise the point lies inside the mesh; the occupancy of points inside the mesh is set to 1 and the occupancy of points outside the mesh is set to 0.
5. A novel view synthesis device based on a neural radiance field, characterized in that it comprises the following units:
the device comprises an initial mesh and pose acquisition unit, a pose acquisition unit and a pose processing unit, wherein the initial mesh and pose acquisition unit is used for acquiring an input image to be trained, and acquiring the initial pose of a mesh image which is most similar to the input image from a triangular surface patch library according to the input image as the initial pose of the input image;
the first round of iterative training unit is used for inputting the input image to be trained and the initial pose into NeRF for training, and in the first round of iteration, the mesh is used as a correct label for supervision, so that the density network firstly learns an initial geometric shape, and a loss function is used for solving cross entropy of the output value of the density network and the correct label value of the density;
the second round of iterative training unit is used for introducing a deformation network into a NeRF density network in the second round of iteration, wherein the deformation network consists of a deformation part and a correction part, the deformation quantity and the correction value of the 3D point are respectively output, then the coordinates of the input point are added with the deformation quantity to be input into the density network for training to obtain coarse density, and the coarse density is added with the correction value to obtain learned accurate density; the loss function adopts a reprojection error function of the image generated by rendering and the input image;
the third round of iterative training unit is used for optimizing the input pose of the training image while continuously learning the density network in the third round of iteration, parameterizing the rotation angle R of the camera and the translation distance T, and then converting the parameterized rotation angle R into a transformation matrix for multiplying the matrix of the initial pose to obtain the optimized pose;
a fourth-round iterative training unit, configured to fix the optimized pose and train the density network and the color network of the NeRF in a fourth-round iteration; the fourth-wheel iterative training unit specifically comprises: in the fourth iteration, fixing the optimized pose, fixing the density network of the NeRF, training the color network of the NeRF, fixing the color network of the NeRF after the error is converged, and training the density network of the NeRF, so that the training is performed alternately until the error is converged.
6. The apparatus according to claim 5, characterized in that: the initial mesh and pose acquisition unit specifically: looks up each input image to be trained in the triangular-patch mesh library using the mesh information, and quickly finds the mesh and pose most similar to the training object by computing and comparing IoU scores; if the retrieval results show that the input images correspond to different similar meshes, a voting scheme is adopted: the mesh with the highest score is selected as the initial mesh, and the pose of that mesh's image with the highest IoU value is taken as the initial pose of the input image.
7. The apparatus according to claim 5, characterized in that: in the first-round iterative training unit, obtaining the ground-truth density labels of points from the mesh comprises: converting the obj file into a single-layer hole-free mesh representation; after the single-layer hole-free mesh is obtained, computing the occupancy of any 3D point; the computation proceeds as follows: a ray is cast from the current point X, and the number of intersection points between the ray and the mesh is counted; if the ray intersects the mesh an even number of times, the point is judged to lie outside the mesh, otherwise the point lies inside the mesh; the occupancy of points inside the mesh is set to 1 and the occupancy of points outside the mesh is set to 0.
8. A non-volatile memory having instructions stored thereon, characterized in that: the instructions, when executed by a processor, implement the neural-radiance-field-based novel view synthesis method according to any one of claims 1-4.
CN202310433953.6A 2023-04-21 2023-04-21 New view angle synthesis method, device and memory based on nerve radiation field Active CN116168137B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310433953.6A CN116168137B (en) 2023-04-21 2023-04-21 New view angle synthesis method, device and memory based on nerve radiation field

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310433953.6A CN116168137B (en) 2023-04-21 2023-04-21 New view angle synthesis method, device and memory based on nerve radiation field

Publications (2)

Publication Number Publication Date
CN116168137A CN116168137A (en) 2023-05-26
CN116168137B true CN116168137B (en) 2023-07-11

Family

ID=86411741

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310433953.6A Active CN116168137B (en) 2023-04-21 2023-04-21 New view angle synthesis method, device and memory based on nerve radiation field

Country Status (1)

Country Link
CN (1) CN116168137B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110021069A (en) * 2019-04-15 2019-07-16 武汉大学 A kind of method for reconstructing three-dimensional model based on grid deformation
CN112991493A (en) * 2021-04-09 2021-06-18 华南理工大学 Gray level image coloring method based on VAE-GAN and mixed density network
WO2022100379A1 (en) * 2020-11-16 2022-05-19 华南理工大学 Object attitude estimation method and system based on image and three-dimensional model, and medium
CN114998548A (en) * 2022-05-31 2022-09-02 北京非十科技有限公司 Image reconstruction method and system
CN115512036A (en) * 2022-09-28 2022-12-23 浙江大学 Novel editable view synthesis method based on intrinsic nerve radiation field
CN115620085A (en) * 2022-10-11 2023-01-17 南京大学 Neural radiation field rapid optimization method based on image pyramid
CN115909015A (en) * 2023-02-15 2023-04-04 苏州浪潮智能科技有限公司 Construction method and device of deformable nerve radiation field network

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20240005590A1 (en) * 2020-11-16 2024-01-04 Google Llc Deformable neural radiance fields
EP4150581A1 (en) * 2020-11-16 2023-03-22 Google LLC Inverting neural radiance fields for pose estimation
CN112613609B (en) * 2020-12-18 2022-05-06 中山大学 Nerve radiation field enhancement method based on joint pose optimization
CN113706714B (en) * 2021-09-03 2024-01-05 中科计算技术创新研究院 New view angle synthesizing method based on depth image and nerve radiation field
CN114972632A (en) * 2022-04-21 2022-08-30 阿里巴巴达摩院(杭州)科技有限公司 Image processing method and device based on nerve radiation field
CN114863035B (en) * 2022-07-05 2022-09-20 南京理工大学 Implicit representation-based three-dimensional human motion capturing and generating method
CN115690324A (en) * 2022-11-15 2023-02-03 广州中思人工智能科技有限公司 Neural radiation field reconstruction optimization method and device based on point cloud
CN115797571B (en) * 2023-02-03 2023-04-14 天津大学 New visual angle synthesis method of 3D stylized scene
CN115951784B (en) * 2023-03-08 2023-05-12 南京理工大学 Method for capturing and generating motion of wearing human body based on double nerve radiation fields

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110021069A (en) * 2019-04-15 2019-07-16 武汉大学 A kind of method for reconstructing three-dimensional model based on grid deformation
WO2022100379A1 (en) * 2020-11-16 2022-05-19 华南理工大学 Object attitude estimation method and system based on image and three-dimensional model, and medium
CN112991493A (en) * 2021-04-09 2021-06-18 华南理工大学 Gray level image coloring method based on VAE-GAN and mixed density network
CN114998548A (en) * 2022-05-31 2022-09-02 北京非十科技有限公司 Image reconstruction method and system
CN115512036A (en) * 2022-09-28 2022-12-23 浙江大学 Novel editable view synthesis method based on intrinsic nerve radiation field
CN115620085A (en) * 2022-10-11 2023-01-17 南京大学 Neural radiation field rapid optimization method based on image pyramid
CN115909015A (en) * 2023-02-15 2023-04-04 苏州浪潮智能科技有限公司 Construction method and device of deformable nerve radiation field network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Vehicle detection system based on adversarial learning and depth estimation; Xu Yuan; Zhai Chunyan; Wang Guoliang; Journal of Liaoning Petrochemical University (03); 83-90 *
Light-field super-resolution reconstruction fusing global and local views; Deng Wu; Zhang Xudong; Xiong Wei; Wang Yizhi; Application Research of Computers (05); 1549-1555 *

Also Published As

Publication number Publication date
CN116168137A (en) 2023-05-26

Similar Documents

Publication Publication Date Title
Gadelha et al. 3d shape induction from 2d views of multiple objects
Flynn et al. Deepstereo: Learning to predict new views from the world's imagery
CN112927359B (en) Three-dimensional point cloud completion method based on deep learning and voxels
CN113962858B (en) Multi-view depth acquisition method
CN114863038B (en) Real-time dynamic free visual angle synthesis method and device based on explicit geometric deformation
CN116958453B (en) Three-dimensional model reconstruction method, device and medium based on nerve radiation field
CN110942512B (en) Indoor scene reconstruction method based on meta-learning
WO2022198684A1 (en) Methods and systems for training quantized neural radiance field
CN116071278A (en) Unmanned aerial vehicle aerial image synthesis method, system, computer equipment and storage medium
Kang et al. Competitive learning of facial fitting and synthesis using uv energy
CN116416376A (en) Three-dimensional hair reconstruction method, system, electronic equipment and storage medium
US20240095999A1 (en) Neural radiance field rig for human 3d shape and appearance modelling
Rabby et al. BeyondPixels: A comprehensive review of the evolution of neural radiance fields
CN114463408A (en) Free viewpoint image generation method, device, equipment and storage medium
CN115984949B (en) Low-quality face image recognition method and equipment with attention mechanism
CN116168137B (en) New view angle synthesis method, device and memory based on nerve radiation field
CN116385667A (en) Reconstruction method of three-dimensional model, training method and device of texture reconstruction model
CN114742950B (en) Ship shape 3D digital reconstruction method and device, storage medium and electronic equipment
CN115239559A (en) Depth map super-resolution method and system for fusion view synthesis
CN115375839A (en) Multi-view hair modeling method and system based on deep learning
Li et al. Point-Based Neural Scene Rendering for Street Views
Salvador et al. Multi-view video representation based on fast Monte Carlo surface reconstruction
CN111932670A (en) Three-dimensional human body self-portrait reconstruction method and system based on single RGBD camera
CN113034671B (en) Traffic sign three-dimensional reconstruction method based on binocular vision
CN115994966B (en) Multi-view image generation method and device, readable storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant