CN114882158B - Method, apparatus, device and readable medium for NERF optimization based on attention mechanism - Google Patents


Info

Publication number
CN114882158B
Authority
CN
China
Prior art keywords
sampling points
light beam
sampling
color
rgb
Prior art date
Legal status
Active
Application number
CN202210610962.3A
Other languages
Chinese (zh)
Other versions
CN114882158A (en
Inventor
王鹏飞
Current Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202210610962.3A priority Critical patent/CN114882158B/en
Publication of CN114882158A publication Critical patent/CN114882158A/en
Application granted granted Critical
Publication of CN114882158B publication Critical patent/CN114882158B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/005 General purpose rendering architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2219/00 Indexing scheme for manipulating 3D models or images for computer graphics
    • G06T2219/20 Indexing scheme for editing of 3D models
    • G06T2219/2012 Colour editing, changing, or manipulating; Use of colour codes


Abstract

The invention provides a method, apparatus, device and readable medium for NERF optimization based on an attention mechanism. The method comprises: acquiring a plurality of pictures taken at different positions in a 3D scene, together with information about the pictures, and setting a parameter so that pixels in the pictures are processed in units of pixel patches; processing the coordinate dimension of the light beams according to the parameter, and using an attention mechanism to encode the coordinate information of all pixels in one patch in a single pass; uniformly sampling a plurality of sampling points on the light beam, and obtaining the RGB value of each sampling point from the information in the coordinate dimension of the light beam; and performing volume rendering on the plurality of sampling points to synthesize an RGB color, selecting fine sampling points based on the probability that each point contributes to the beam color, and performing volume rendering on the fine sampling points to obtain the RGB color value. The scheme can speed up model training and inference and improve the rendering quality of the model.

Description

Method, apparatus, device and readable medium for NERF optimization based on attention mechanism
Technical Field
The present invention relates to the field of computers, and more particularly to a method, apparatus, device and readable medium for NERF optimization based on an attention mechanism.
Background
Rendering in computer graphics refers to using a computer to simulate the optical process of taking a photograph in the real physical world. Research on rendering usually also covers 3D scene modeling and representation, because a scene can be rendered only once it has been represented in the computer. The rendering techniques in wide use today are based on computer-graphics methods such as rasterization, ray casting, and ray/path tracing, which simulate how light travels once 3D modeling is complete and display the resulting colors on a screen. As industries such as games and film demand ever higher image quality, ray/path tracing algorithms have become steadily more accurate, and the industry accelerates them with hardware such as GPUs, but the trade-off between accuracy and efficiency still troubles many applications.
In recent years, deep-learning-based computer vision techniques such as object tracking and image segmentation have developed rapidly. In 2019-2020, many excellent works combined computer graphics with deep learning (neural networks), a significant portion of them addressing the reconstruction and rendering of 3D scenes, such as Occupancy Field and DeepSDF in 2019 and NERF in 2020.
NERF (neural radiance field) is among the most influential of these works, and a large number of NERF-based improvements have appeared in the two years since. NERF achieves implicit reconstruction and multi-view rendering of a 3D scene, and has potential for 3D modeling, rendering and visualization in digital-twin scenes, digital human face and pose reconstruction, animation and other metaverse scenarios. The basic idea of NERF is to use a neural network as an implicit representation of a 3D scene, replacing traditional representations such as point clouds, meshes, voxels and TSDF, so that projected images at any angle and any position can be rendered directly through the network. Its main contributions are: 1) a method for representing continuous scenes with complex geometry and materials as a 5D neural radiance field (Neural Radiance Field) parameterized by an MLP network; 2) an improved differentiable rendering method based on classical volume rendering (Volume Rendering), which obtains RGB images through differentiable rendering and uses them as the optimization target. This part includes an acceleration strategy that uses hierarchical sampling to allocate the capacity of the MLP to the visible content of the scene.
The main training and inference process is as follows:
1. Create a NERF model (initialize the MLP model).
2. Acquire and preprocess the light beams. The beam dimensions are (N, ro+rd, H, W, 3), where N is the total number of samples in the data set, H and W are the image resolution, ro is the beam origin, rd is the beam direction, and 3 denotes the 3D coordinates. The RGB color corresponding to each beam is then appended, so the dimensions become (N, ro+rd+rgb, H, W, 3).
3. Select a batch of beams and sample uniformly along each beam; compute the RGB and opacity a of each sampling point with the MLP, and perform volume rendering over the points on the beam to synthesize a color. Fine sampling is then performed according to the opacity-derived weights of the coarse sampling points; the RGB and opacity a of the new points are again computed with the MLP, and the newly sampled points are volume rendered to synthesize a color.
The above is the forward pass. For back-propagation, the RGB image information appended to the beams serves as the ground truth (gt); it is compared with the rendered RGB to obtain the loss, which is then back-propagated.
Attention mechanisms are widely used in computer vision and NLP. Their core idea is to establish closer links between related quantities. Attention is a natural ability of the human brain: when we look at a picture, we first scan it quickly and then lock onto the region that deserves focus. The input to an attention mechanism is a set of related quantities, and the output is the probability with which each quantity contributes to the whole.
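The volume rendering step in the forward pass described above can be illustrated with a minimal sketch in plain Python. It assumes each sampling point already carries an RGB value and an opacity a in [0, 1]; the function name `composite` and the sample values are illustrative assumptions, not part of the patent.

```python
def composite(rgbs, alphas):
    """Front-to-back compositing of the samples along one beam.

    rgbs:   list of (r, g, b) tuples, ordered from near to far
    alphas: per-sample opacity in [0, 1], in the same order
    Returns the composited color and the per-sample weights.
    """
    transmittance = 1.0          # fraction of light not yet absorbed
    color = [0.0, 0.0, 0.0]
    weights = []
    for rgb, a in zip(rgbs, alphas):
        w = transmittance * a    # contribution of this sample to the beam color
        weights.append(w)
        for c in range(3):
            color[c] += w * rgb[c]
        transmittance *= 1.0 - a
    return tuple(color), weights


# Illustrative input: a half-opaque red sample in front of an opaque green one.
color, weights = composite(
    rgbs=[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    alphas=[0.5, 1.0],
)
```

The per-sample weights returned here are the quantities that the fine sampling stage relies on: samples with larger weights contribute more to the beam color.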
In the prior art represented by FastNeRF, NERF inference is accelerated by trading storage for computation. Trading storage for computation is the most effective approach when the algorithm itself is left unchanged, but most neural networks today run inference on GPUs, so keeping the cached data in video memory is not economical, and the memory communication overhead is relatively large. The basic idea of FastNeRF is to pre-store the output values of the NeRF representation function, so that rendering requires only table lookups and no deep-model computation. However, the original NeRF takes a 5D coordinate as input; even at a resolution of 1024 per dimension, 1024^5 (that is, 1024 T) voxel features would need to be stored. To reduce the stored values to a scale that modern graphics cards can handle, FastNeRF exploits the characteristics of scene rendering to decompose the NeRF model into two representation networks, one for the 3D voxel position and one for the 2D viewing direction, which are computed separately and then recombined to form the voxel color features.
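The storage estimate above can be checked with a few lines of arithmetic. The sketch below counts cache cells only; using the same resolution of 1024 for both the position grid and the direction grid is an assumption made here for illustration, and the variable names are hypothetical.

```python
bins = 1024                 # resolution per dimension, as in the text
dims = 5                    # 5D input: 3D position plus 2D viewing direction

entries = bins ** dims      # cells in a dense 5D cache
tera = 2 ** 40              # binary "T"
entries_in_tera = entries // tera

# Factorized cache: one 3D position grid plus one 2D direction grid
# (assumption: both grids use the same per-dimension resolution).
factorized = bins ** 3 + bins ** 2
reduction = entries // factorized
```

With these numbers the dense cache has 2^50 cells, i.e. 1024 T, while the factorized cache has roughly 2^30 cells, which is why the decomposition brings the cache within reach of a graphics card.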
In the prior art represented by AutoInt, a neural network replaces the numerical integration itself: rather than evaluating the integrand at many input values, the integral is read off from a network whose derivative has been trained. This still does not solve the problem that rays must be computed one by one; for a 1080P image (1920 x 1080), about 2 million rays need to be computed. AutoInt treats the projection of a ray in volume rendering as a definite integral and defines a neural network G to represent the integral; differentiating G yields a corresponding derivative network D. The networks G and D therefore share their network parameters. AutoInt first trains the derivative network D and then substitutes the optimized parameters into the integral network G. Given the start and end points of a projected ray, the volume projection is the difference of G at the two points, i.e. two forward evaluations of G determine the projection value.
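The idea of replacing a sum over many samples with two evaluations of an integral network can be sketched as follows. The closed-form functions below are stand-ins chosen for illustration (assumption: exp(-t) plays the role of the trained derivative network D, and its antiderivative plays the role of G); no actual network or AutoInt API is used.

```python
import math


def derivative_net(t):
    # Stand-in for the trained derivative network D (the integrand).
    return math.exp(-t)


def integral_net(t):
    # Stand-in for the integral network G, whose derivative is D.
    return -math.exp(-t)


def integrate_two_evals(t_near, t_far):
    # Two forward evaluations of G replace a sum over many samples.
    return integral_net(t_far) - integral_net(t_near)


def integrate_quadrature(t_near, t_far, n=10_000):
    # Reference value: midpoint rule with n evaluations of D.
    dt = (t_far - t_near) / n
    return sum(derivative_net(t_near + (i + 0.5) * dt) for i in range(n)) * dt


exact = integrate_two_evals(0.0, 2.0)
approx = integrate_quadrature(0.0, 2.0)

# Per-ray cost still scales with the number of rays in the image.
rays_1080p = 1920 * 1080
```

Both routes give the same integral, but the per-ray saving does not change the number of rays: a 1080P frame still requires one such computation for each of its 2,073,600 pixels.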
Disclosure of Invention
In view of the above, an object of the embodiments of the present invention is to provide a method, apparatus, device and readable medium for NERF optimization based on an attention mechanism. The technical solution of the invention can speed up model training and inference and improve the rendering quality of the model.
With the above object in view, according to one aspect of the present invention, there is provided a method for NERF optimization based on an attention mechanism, comprising the steps of:
acquiring a plurality of pictures taken at different positions in a 3D scene, together with information about the pictures, and setting a parameter so that pixels in the pictures are processed in units of pixel patches;
processing the coordinate dimension of the light beams according to the parameter, and using an attention mechanism to encode the coordinate information of all pixels in one patch in a single pass;
uniformly sampling a plurality of sampling points on the light beam, and obtaining the RGB value of each of the plurality of sampling points from the information in the coordinate dimension of the light beam;
and performing volume rendering on the plurality of sampling points to synthesize an RGB color, selecting fine sampling points based on the probability that each point contributes to the beam color, and performing volume rendering on the fine sampling points to obtain the RGB color value.
According to one embodiment of the present invention, setting the parameter so that pixels in the pictures are processed in units of pixel patches includes:
setting the pixel patch parameter to 5*5, meaning that 5 pixels in the horizontal direction by 5 pixels in the vertical direction, 25 pixels in total, are processed as one pixel patch.
According to one embodiment of the present invention, uniformly sampling a plurality of sampling points on the light beam, and obtaining the RGB value of each of the plurality of sampling points from the information in the coordinate dimension of the light beam includes:
uniformly sampling 64 sampling points on the light beam processed by the MLP model;
and inputting the 3D coordinates in the coordinate dimension of the light beam into the MLP model to obtain the transparency and a feature vector, concatenating the feature vector with the direction information obtained from the attention mechanism and inputting the result into the MLP model to obtain the RGB value of one sampling point, and repeating these steps until the RGB values of all 64 sampling points are obtained.
According to one embodiment of the present invention, performing volume rendering on the plurality of sampling points to synthesize an RGB color, selecting fine sampling points based on the probability that each point contributes to the beam color, and performing volume rendering on the fine sampling points to obtain the RGB color value includes:
performing volume rendering on the 64 sampling points on the light beam to synthesize an RGB color, and using an attention mechanism to calculate, for each of the 64 sampling points, the probability that it contributes to the color of the light beam;
sorting the calculated probability values, and selecting the 16 sampling points with the highest probability values;
taking 8 sampling points around each of the 16 selected sampling points as a center, giving 128 fine sampling points in total;
obtaining the RGB values of the 128 fine sampling points, and performing volume rendering on the fine sampling points to obtain the RGB color value.
According to another aspect of the present invention there is also provided an apparatus for NERF optimization based on an attention mechanism, the apparatus comprising:
a setting module configured to acquire a plurality of pictures taken at different positions in a 3D scene, together with information about the pictures, and to set a parameter so that pixels in the pictures are processed in units of pixel patches;
a processing module configured to process the coordinate dimension of the light beams according to the parameter, and to use an attention mechanism to encode the coordinate information of all pixels in one patch in a single pass;
a calculating module configured to uniformly sample a plurality of sampling points on the light beam, and to obtain the RGB value of each of the plurality of sampling points from the information in the coordinate dimension of the light beam;
a synthesizing module configured to perform volume rendering on the plurality of sampling points to synthesize an RGB color, to select fine sampling points based on the probability that each point contributes to the beam color, and to perform volume rendering on the fine sampling points to obtain the RGB color value.
According to one embodiment of the invention, the setting module is further configured to:
set the pixel patch parameter to 5*5, meaning that 5 pixels in the horizontal direction by 5 pixels in the vertical direction, 25 pixels in total, are processed as one pixel patch.
According to one embodiment of the invention, the calculating module is further configured to:
uniformly sample 64 sampling points on the light beam processed by the MLP model;
and input the 3D coordinates in the coordinate dimension of the light beam into the MLP model to obtain the transparency and a feature vector, concatenate the feature vector with the direction information obtained from the attention mechanism and input the result into the MLP model to obtain the RGB value of one sampling point, and repeat these steps until the RGB values of all 64 sampling points are obtained.
According to one embodiment of the invention, the synthesizing module is further configured to:
perform volume rendering on the 64 sampling points on the light beam to synthesize an RGB color, and use an attention mechanism to calculate, for each of the 64 sampling points, the probability that it contributes to the color of the light beam;
sort the calculated probability values, and select the 16 sampling points with the highest probability values;
take 8 sampling points around each of the 16 selected sampling points as a center, giving 128 fine sampling points in total;
obtain the RGB values of the 128 fine sampling points, and perform volume rendering on the fine sampling points to obtain the RGB color value.
In another aspect of the embodiments of the present invention, there is also provided a computer apparatus including:
at least one processor; and
a memory storing computer instructions executable on the processor, the instructions, when executed by the processor, performing the steps of any of the methods described above.
In another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the methods described above.
The invention has the following beneficial technical effects. The NERF optimization method based on an attention mechanism provided by the embodiments of the invention acquires a plurality of pictures taken at different positions in a 3D scene, together with information about the pictures, and sets a parameter so that pixels in the pictures are processed in units of pixel patches; processes the coordinate dimension of the light beams according to the parameter, and uses an attention mechanism to encode the coordinate information of all pixels in one patch in a single pass; uniformly samples a plurality of sampling points on the light beam, and obtains the RGB value of each sampling point from the information in the coordinate dimension of the light beam; and performs volume rendering on the plurality of sampling points to synthesize an RGB color, selects fine sampling points based on the probability that each point contributes to the beam color, and performs volume rendering on the fine sampling points to obtain the RGB color value. This technical scheme can speed up model training and inference and improve the rendering quality of the model.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are necessary for the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention and that other embodiments may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a method of attention-based NERF optimization in accordance with one embodiment of the invention;
FIG. 2 is a schematic diagram of an apparatus for attention-based NERF optimization in accordance with one embodiment of the present invention;
FIG. 3 is a schematic diagram of a computer device according to one embodiment of the invention;
FIG. 4 is a schematic diagram of a computer-readable storage medium according to one embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention will be described in further detail with reference to the accompanying drawings.
With the above object in mind, a first aspect of embodiments of the present invention proposes an embodiment of a method for NERF optimization based on an attention mechanism. Fig. 1 shows a schematic flow chart of the method.
As shown in fig. 1, the method may include the steps of:
S1, acquiring a plurality of pictures taken at different positions in a 3D scene, together with information about the pictures, and setting a parameter so that pixels in the pictures are processed in units of pixel patches.
First, 100 pictures taken at different positions in the 3D scene and the corresponding camera pose information are acquired. A NERF model, i.e. an MLP (fully connected neural network) model, is initialized and the beams are processed. The initial beam dimensions are still (N, ro+rd, H, W, 3), which makes it easy to obtain the direction and coordinate information for each pixel. To reduce the number of rays cast into the 3D scene, a pixel-patch strategy is proposed: for example, 5*5 pixels are treated as one patch, meaning that 5 pixels horizontally by 5 pixels vertically, 25 pixels in total, are processed as one pixel patch. The ray direction for each pixel, i.e. roi and rdi, is determined from each coordinate point in the patch and the projection point of the light source.
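The patch grouping in step S1 can be sketched in plain Python as follows. The helper name `split_into_patches` and the 20 x 30 image size are illustrative assumptions; the sketch also assumes the image size is a multiple of the patch size, which the patent does not state.

```python
def split_into_patches(height, width, patch=5):
    """Return a list of patches; each patch is a list of (row, col) pixels.

    Assumes height and width are multiples of `patch`.
    """
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append([
                (top + dr, left + dc)
                for dr in range(patch)
                for dc in range(patch)
            ])
    return patches


patches = split_into_patches(height=20, width=30)
pixels_per_patch = len(patches[0])   # 5 * 5 pixels
beams = len(patches)                 # (20 / 5) * (30 / 5) patch beams
```

Each patch then stands for one fused beam, so the number of beams to process drops by a factor equal to the number of pixels in a patch.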
S2, processing the coordinate dimension of the light beams according to the parameter, and using an attention mechanism to encode the coordinate information of all pixels in one patch in a single pass.
The coordinate dimension is then reshaped to (N, ro+rd, 25, H/5, W/5, 3), where 25 is the number of pixels in a patch; this value can also be set as required. Subsequent network training operates on patches, and N*H/5*W/5 is taken as the total number of beams. For example, with 4096 patches the patch data has dimension 4096 x (ro+rd) x 25 x 3. The attention mechanism (two fully connected layers) encodes the coordinate information of the 25 pixels in a single pass, and the encoded vector replaces the previous single-point coordinate information, so the output data has dimension 4096 x (ro+rd) x 1 x 3 (that is, the information of one patch point represents the information of 25 pixels).
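One possible reading of the patch encoding in step S2 is sketched below: two fully connected layers score each of the 25 pixels, a softmax turns the scores into attention weights, and the weighted sum collapses the 25 x 3 input into a single 3-vector. The patent does not specify the layer sizes, the activation, or the pooling, so the hidden width, the ReLU, the softmax pooling and all names here are assumptions.

```python
import math
import random


def linear(x, weight, bias):
    # Fully connected layer; weight has shape (out, in).
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def encode_patch(coords, w1, b1, w2, b2):
    """Collapse a patch of 3D vectors (25 x 3) into a single 3D vector."""
    scores = []
    for xyz in coords:
        hidden = [max(0.0, h) for h in linear(xyz, w1, b1)]  # FC layer 1 + ReLU
        scores.append(linear(hidden, w2, b2)[0])             # FC layer 2 -> scalar score
    attn = softmax(scores)                                   # weights over the 25 pixels
    encoded = [sum(a * xyz[k] for a, xyz in zip(attn, coords)) for k in range(3)]
    return encoded, attn


rng = random.Random(0)
hidden_dim = 8  # hypothetical hidden width
w1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(hidden_dim)]
b1 = [0.0] * hidden_dim
w2 = [[rng.uniform(-1, 1) for _ in range(hidden_dim)]]
b2 = [0.0]

# Illustrative input: 25 direction vectors (rd) for one patch.
patch_dirs = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(25)]
encoded, attn = encode_patch(patch_dirs, w1, b1, w2, b2)
```

The same function would be applied to the origins (ro) of the patch, giving the 1 x 3 output per component described above. In a trained model the weights would be learned rather than random.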
S3, uniformly sampling a plurality of sampling points on the light beam, and obtaining the RGB value of each of the plurality of sampling points from the information in the coordinate dimension of the light beam.
64 sampling points are sampled uniformly on the beam processed by the MLP model. Note that this beam is not an actual light beam: it fuses the direction and coordinate information of all pixels in the patch. The 64 sampling points obtained by uniform sampling are input into the MLP network, which outputs the 4-dimensional values (transparency and RGB) of the patch at each sampling point. Following the input handling of NERF, the 3D coordinates are first fed into the fully connected network to obtain the 25 transparency values a of the patch plus one feature vector; the feature vector is then concatenated with the direction information obtained from the attention mechanism and passed through fully connected layers to obtain the 25 RGB values of the patch. Repeating this 64 times gives the RGB values of the 64 sampling points, each of which covers 25 pixels.
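The uniform sampling in step S3 can be sketched as follows, assuming the usual parameterization of a point on a beam as origin + t * direction. The near and far bounds and the function name `uniform_samples` are assumptions; the patent only states that 64 points are sampled uniformly.

```python
def uniform_samples(origin, direction, near, far, n=64):
    """Return n depths and the matching 3D points, evenly spaced on [near, far]."""
    step = (far - near) / (n - 1)
    depths = [near + i * step for i in range(n)]
    points = [
        tuple(o + t * d for o, d in zip(origin, direction))
        for t in depths
    ]
    return depths, points


# Illustrative beam along the z axis with hypothetical near/far bounds.
depths, points = uniform_samples(
    origin=(0.0, 0.0, 0.0), direction=(0.0, 0.0, 1.0), near=2.0, far=6.0
)
```

Each of the 64 points would then be passed through the MLP, which in the scheme above returns values for all 25 pixels of the patch at once.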
S4, performing volume rendering on the plurality of sampling points to synthesize an RGB color, selecting fine sampling points based on the probability that each point contributes to the beam color, and performing volume rendering on the fine sampling points to obtain the RGB color value.
Volume rendering is performed on the 64 sampling points on the beam to synthesize an RGB color. For the coarse sampling points on the beam, an attention mechanism determines a probability value for each point; the input to the attention is the transparency values of the pixels in the patch, and the output is a probability. Finer sampling along the projected ray is then performed according to these probabilities: specifically, the 16 sampling points with the highest probability are selected, and 8 sampling points are taken around each of them as a center, for 128 sampling points in total. The RGB values of the 128 fine sampling points are obtained, and volume rendering is performed on them to obtain the RGB color value. According to the experiments reported, the method can speed up training and inference by about 20 times without loss of accuracy.
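The fine sampling in step S4 can be sketched as follows. The per-sample scores stand in for the output of the attention mechanism, and placing the 8 fine samples evenly within half a coarse interval on either side of each center is an assumption; the patent only states that 8 points are taken around each of the 16 selected centers.

```python
import math


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def fine_depths(coarse_depths, scores, top_k=16, per_center=8):
    """Pick the top_k most probable coarse samples and resample around them."""
    probs = softmax(scores)
    # Indices of the top_k coarse samples with the highest probability.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    spacing = coarse_depths[1] - coarse_depths[0]
    fine = []
    for i in top:
        center = coarse_depths[i]
        for j in range(per_center):
            # Spread evenly within half a coarse interval on either side.
            offset = ((j + 0.5) / per_center - 0.5) * spacing
            fine.append(center + offset)
    return sorted(fine), sorted(top)


coarse = [2.0 + i * (4.0 / 63) for i in range(64)]
# Hypothetical attention scores peaking near coarse sample 40.
scores = [-abs(i - 40) / 4.0 for i in range(64)]
fine, chosen = fine_depths(coarse, scores)
```

The 128 fine depths cluster around the part of the beam that contributes most to its color, and they would be evaluated and composited in the same way as the coarse samples.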
The technical scheme of the invention mainly aims to reduce the number of rays cast into the 3D scene. Based on the continuity of physical materials, pixels are grouped into patches and an attention mechanism is used to render all the pixels in a patch together. For each projected ray, an attention mechanism is applied after the coarse sampling stage to evaluate each point's contribution to the color, and fine sampling is performed accordingly to complete the color calculation.
This technical scheme can speed up model training and inference and improve the rendering quality of the model.
First, to reduce the number of rays cast into the 3D scene, a pixel-patch strategy is proposed: for example, 5*5 pixels are treated as one patch, and the ray direction, i.e. roi and rdi, is determined from each coordinate point in the patch and the projection point of the light source. The coordinates and directions are encoded with an attention mechanism, and the encoded vector replaces the previous single-point coordinate information. With the input of the MLP network thus determined, the fully connected layers output a transparency vector and a supplementary vector; the transparency vector represents the transparency of the pixels within the patch. The supplementary vector is merged with the direction information encoded by the attention mechanism to obtain an RGB vector, which likewise represents the RGB information of the pixels within the patch. In this way, the attention mechanism fuses and encodes the coordinates and directions of all pixels in the patch to capture the relevant position and direction information, and the fitting capacity of the MLP is then used to infer, in a single pass, the RGB and transparency of the pixels within a patch of continuous material.
Secondly, for each projected ray, an attention mechanism is used after the coarse sampling stage to estimate each point's contribution to the color, and fine sampling is performed accordingly to complete the color calculation. A single MLP inference yields the RGB and transparency of all points in the patch, so coarse sampling provides the information of all patch points at the 64 sampling positions. For the coarse sampling points on the ray, the attention mechanism determines a probability value for each point; its input is the transparency values of the pixels in the patch and its output is a probability. Finer sampling along the projected ray is then performed according to these probabilities, and the color value of each pixel is finally computed with the volume rendering method of NERF.
The proposed attention-based method can greatly reduce the number of projected rays while keeping the accuracy unchanged, thereby reducing the time required for training and inference. Using attention is also consistent with the fact that adjacent materials in 3D space are likely to have the same or correlated colors.
In a preferred embodiment of the present invention, setting the parameter so that pixels in the pictures are processed in units of pixel patches includes:
setting the pixel patch parameter to 5*5, meaning that 5 pixels in the horizontal direction by 5 pixels in the vertical direction, 25 pixels in total, are processed as one pixel patch.
In a preferred embodiment of the present invention, uniformly sampling a plurality of sampling points on the light beam, and obtaining the RGB value of each of the plurality of sampling points from the information in the coordinate dimension of the light beam includes:
uniformly sampling 64 sampling points on the light beam processed by the MLP model;
and inputting the 3D coordinates in the coordinate dimension of the light beam into the MLP model to obtain the transparency and a feature vector, concatenating the feature vector with the direction information obtained from the attention mechanism and inputting the result into the MLP model to obtain the RGB value of one sampling point, and repeating these steps until the RGB values of all 64 sampling points are obtained.
In a preferred embodiment of the present invention, performing volume rendering on the plurality of sampling points to synthesize an RGB color, selecting fine sampling points based on the probability that each point contributes to the beam color, and performing volume rendering on the fine sampling points to obtain the RGB color value includes:
performing volume rendering on the 64 sampling points on the light beam to synthesize an RGB color, and using an attention mechanism to calculate, for each of the 64 sampling points, the probability that it contributes to the color of the light beam;
sorting the calculated probability values, and selecting the 16 sampling points with the highest probability values;
taking 8 sampling points around each of the 16 selected sampling points as a center, giving 128 fine sampling points in total;
obtaining the RGB values of the 128 fine sampling points, and performing volume rendering on the fine sampling points to obtain the RGB color value.
The invention makes full use of the attention mechanism and the spatial continuity of physical materials: where the MLP network previously encoded one position at a time to produce its RGB and transparency value a, it now computes the RGB and transparency values a of several positions simultaneously.
It should be noted that those skilled in the art will understand that all or part of the procedures of the methods of the above embodiments may be implemented by a computer program instructing related hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the procedures of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like. The computer program embodiments described above may achieve the same or similar effects as any of the method embodiments described above.
Furthermore, the method disclosed according to the embodiment of the present invention may also be implemented as a computer program executed by a CPU, and the computer program may be stored in a computer-readable storage medium. When the computer program is executed by the CPU, it performs the functions defined above in the methods disclosed in the embodiments of the present invention.
With the above object in mind, a second aspect of the embodiments of the present invention provides an apparatus for NERF optimization based on an attention mechanism. As shown in Fig. 2, the apparatus 200 includes:
the setting module is configured to acquire a plurality of pictures shot at different positions in the 3D scene and information of the pictures, and set parameters for processing pixels in the pictures by taking pixel patch as a unit;
the processing module is configured to process the coordinate dimension of the light beam according to the parameters, and to encode the coordinate information of all pixels in one patch in a single pass by using an attention mechanism;
the computing module is configured to uniformly sample a plurality of sampling points in the light beam, and to obtain the RGB value of each of the plurality of sampling points according to the information of the coordinate dimension of the light beam;
the synthesizing module is configured to perform volume rendering synthesis on the plurality of sampling points to obtain a color RGB, select a fine sampling point based on the probability of contribution of each point to the beam color, and perform volume rendering synthesis on the fine sampling point to obtain a value of the color RGB.
In a preferred embodiment of the invention, the setting module is further configured to:
set the pixel patch parameter to 5*5, meaning that a block of 5 pixels in the horizontal direction by 5 pixels in the vertical direction (25 pixels in total) in the picture is processed as one pixel patch.
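Grouping the pixels of a picture into 5*5 patches can be sketched as follows. The function name `split_into_patches` is hypothetical, and dropping the leftover rows and columns when the picture size is not a multiple of 5 is an assumption; the source does not say how image borders are handled.

```python
def split_into_patches(height, width, patch=5):
    """Return one list of (row, col) pixel coordinates per patch x patch block."""
    patches = []
    # Step through the picture in non-overlapping blocks; incomplete border blocks are skipped.
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            patches.append(
                [(top + r, left + c) for r in range(patch) for c in range(patch)]
            )
    return patches
```

For a 10 by 15 picture this gives 2 * 3 = 6 patches of 25 pixels each, and each patch is the unit handed to the attention-based encoding step.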
In a preferred embodiment of the invention, the computing module is further configured to:
uniformly sample 64 sampling points on the light beam processed by the MLP model;
and input the 3D coordinates in the coordinate dimension of the light beam into the MLP model to obtain an output transparency and a characterization vector, concatenate the characterization vector with the direction information obtained from the attention mechanism, input the concatenated vector into the MLP model to obtain the RGB value of one sampling point, and repeat these steps until the RGB values of all 64 sampling points are obtained.
In a preferred embodiment of the invention, the synthesis module is further configured to:
perform volume rendering on the 64 sampling points on the light beam to synthesize a color RGB, and use an attention mechanism to calculate the probability that each of the 64 sampling points contributes to the color of the light beam;
sort the calculated probability values, and select the 16 sampling points with the highest probability values;
re-sample 8 sampling points around each of the 16 selected sampling points, taken as centers, to obtain 128 fine sampling points in total;
obtain the RGB values of the 128 fine sampling points, and perform volume rendering synthesis on the fine sampling points to obtain the value of the color RGB.
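The source does not spell out the volume rendering formula. A common choice, assumed here, is the discrete quadrature used in the original NeRF formulation, where each sample's weight is its opacity multiplied by the transmittance accumulated in front of it. Treating the transparency output as a volume density, and giving the last sample a very large interval (a convention found in NeRF implementations), are both assumptions; `volume_render` is a hypothetical name.

```python
import math


def volume_render(depths, sigmas, colors):
    """Composite per-sample colors along one light beam into a single RGB value."""
    transmittance = 1.0  # fraction of light not yet absorbed by earlier samples
    pixel = [0.0, 0.0, 0.0]
    weights = []
    for i in range(len(depths)):
        # Distance to the next sample; the last sample gets an effectively infinite interval.
        delta = depths[i + 1] - depths[i] if i + 1 < len(depths) else 1e10
        alpha = 1.0 - math.exp(-sigmas[i] * delta)  # opacity of this segment
        weight = transmittance * alpha
        weights.append(weight)
        pixel = [p + weight * c for p, c in zip(pixel, colors[i])]
        transmittance *= 1.0 - alpha
    return pixel, weights
```

The same function applies to the 64 coarse sampling points and to the 128 fine sampling points; only the input lists change. The returned weights sum to at most 1 and indicate how much each sample contributes to the final color.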
Based on the above object, a third aspect of the embodiments of the present invention provides a computer device. Fig. 3 is a schematic diagram of an embodiment of the computer device provided by the present invention. As shown in Fig. 3, the embodiment includes at least one processor S21 and a memory S22, the memory S22 storing computer instructions S23 executable on the processor, and the instructions, when executed by the processor, performing the following method:
acquiring a plurality of pictures shot at different positions in a 3D scene and information of the pictures, and setting parameters for processing pixels in the pictures by taking a pixel patch as a unit;
processing the coordinate dimension of the light beam according to the parameters, and encoding the coordinate information of all pixels in one patch in a single pass by using an attention mechanism;
uniformly sampling a plurality of sampling points in the light beam, and obtaining RGB values of each sampling point in the plurality of sampling points according to the information of the coordinate dimension of the light beam;
and performing volume rendering synthesis on the plurality of sampling points to obtain a color RGB, selecting a fine sampling point based on the probability of contribution of each point to the beam color, and performing volume rendering synthesis on the fine sampling point to obtain a color RGB value.
In a preferred embodiment of the present invention, setting parameters that process pixels in a picture in units of pixel patch includes:
setting the pixel patch parameter to 5*5, meaning that a block of 5 pixels in the horizontal direction by 5 pixels in the vertical direction (25 pixels in total) in the picture is processed as one pixel patch.
In a preferred embodiment of the present invention, uniformly sampling a plurality of sampling points in a light beam, and obtaining an RGB value of each of the plurality of sampling points according to information of a coordinate dimension of the light beam includes:
uniformly sampling 64 sampling points on the light beam processed by the MLP model;
and inputting the 3D coordinates in the coordinate dimension of the light beam into the MLP model to obtain an output transparency and a characterization vector, concatenating the characterization vector with the direction information obtained from the attention mechanism, inputting the concatenated vector into the MLP model to obtain the RGB value of one sampling point, and repeating these steps until the RGB values of all 64 sampling points are obtained.
In a preferred embodiment of the present invention, performing volume rendering on a plurality of sampling points to synthesize color RGB, selecting a fine sampling point based on a probability that each point contributes to a beam color, and performing volume rendering on the fine sampling point to obtain a value of color RGB includes:
performing volume rendering on the 64 sampling points on the light beam to synthesize a color RGB, and using an attention mechanism to calculate the probability that each of the 64 sampling points contributes to the color of the light beam;
sorting the calculated probability values, and selecting the 16 sampling points with the highest probability values;
re-sampling 8 sampling points around each of the 16 selected sampling points, taken as centers, to obtain 128 fine sampling points in total;
obtaining the RGB values of the 128 fine sampling points, and performing volume rendering synthesis on the fine sampling points to obtain the value of the color RGB.
Based on the above object, a fourth aspect of the embodiments of the present invention proposes a computer-readable storage medium. Fig. 4 is a schematic diagram of an embodiment of a computer-readable storage medium provided by the present invention. As shown in fig. 4, the computer-readable storage medium S31 stores a computer program S32 that, when executed by a processor, performs the method as described above.
Furthermore, the method disclosed according to the embodiment of the present invention may also be implemented as a computer program executed by a processor, which may be stored in a computer-readable storage medium. The above-described functions defined in the methods disclosed in the embodiments of the present invention are performed when the computer program is executed by a processor.
Furthermore, the above-described method steps and system units may also be implemented using a controller and a computer-readable storage medium storing a computer program for causing the controller to implement the above-described steps or unit functions.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
In one or more exemplary designs, the functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code. Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one location to another. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a general purpose or special purpose computer or a general purpose or special purpose processor. Further, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The serial numbers of the foregoing embodiments of the present invention are for the purpose of description only and do not represent the advantages or disadvantages of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, and the program may be stored in a computer readable storage medium, where the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Those of ordinary skill in the art will appreciate that: the above discussion of any embodiment is merely exemplary and is not intended to imply that the scope of the disclosure of embodiments of the invention, including the claims, is limited to such examples; combinations of features of the above embodiments or in different embodiments are also possible within the idea of an embodiment of the invention, and many other variations of the different aspects of the embodiments of the invention as described above exist, which are not provided in detail for the sake of brevity. Therefore, any omission, modification, equivalent replacement, improvement, etc. of the embodiments should be included in the protection scope of the embodiments of the present invention.

Claims (6)

1. A method of NERF optimization based on an attention mechanism, comprising the steps of:
acquiring a plurality of pictures shot at different positions in a 3D scene and information of the pictures, and setting parameters for processing pixels in the pictures by taking a pixel patch as a unit;
processing the coordinate dimension of the light beam according to the parameters, and coding the coordinate information of all pixel points in 1 patch once by using an attention mechanism;
uniformly sampling a plurality of sampling points in a light beam, obtaining RGB values of each sampling point in the plurality of sampling points according to information of the coordinate dimension of the light beam, wherein uniformly sampling the plurality of sampling points in the light beam, obtaining RGB values of each sampling point in the plurality of sampling points according to information of the coordinate dimension of the light beam comprises uniformly sampling 64 sampling points on the light beam processed by an MLP model, putting 3D coordinates in the coordinate dimension of the light beam into the MLP model to obtain output transparency and characterization vectors, putting direction information obtained in a characterization vector splicing attention mechanism into the MLP model to obtain RGB values of one sampling point, and repeating the steps until the RGB values of 64 sampling points are obtained;
performing volume rendering synthesis on a plurality of sampling points to obtain a color RGB, selecting a fine sampling point based on the probability of contribution of each point to the color of the light beam, performing volume rendering synthesis on the fine sampling points to obtain a color RGB value, wherein the plurality of sampling points are subjected to volume rendering synthesis on the color RGB, selecting the fine sampling point based on the probability of contribution of each point to the color of the light beam, performing volume rendering synthesis on the fine sampling points to obtain the color RGB value comprises performing volume rendering synthesis on 64 sampling points on the light beam, calculating the probability value of contribution of each of the 64 sampling points to the color of the light beam by using an attention mechanism, sorting the calculated probability values, selecting 16 sampling points with the highest probability value, re-acquiring 8 sampling points with the 16 sampling points as the center to obtain 128 fine sampling points, acquiring RGB values of the 128 fine sampling points, and performing volume rendering synthesis on the fine sampling points to obtain the color RGB value.
2. The method of claim 1, wherein setting parameters that process pixels in a picture in pixel patch units comprises:
the parameters of the pixel patch are set to 5*5, where 5*5 represents that 25 pixels in the picture are processed as one pixel patch by multiplying 5 pixels in the horizontal direction by 5 pixels in the vertical direction.
3. An apparatus for NERF optimization based on an attention mechanism, the apparatus comprising:
the setting module is configured to acquire a plurality of pictures shot at different positions in the 3D scene and information of the pictures, and set parameters for processing pixels in the pictures by taking a pixel patch as a unit;
the processing module is configured to process the coordinate dimension of the light beam according to the parameters and encode the coordinate information of all pixel points in the 1 patch once by using an attention mechanism;
the computing module is configured to uniformly sample a plurality of sampling points in a light beam, obtain the RGB value of each sampling point in the plurality of sampling points according to the information of the coordinate dimension of the light beam, uniformly sample 64 sampling points on the light beam processed by the MLP model, put the 3D coordinate in the coordinate dimension of the light beam into the MLP model to obtain the output transparency and the characterization vector, put the direction information obtained in the characterization vector splicing attention mechanism into the MLP model to obtain the RGB value of one sampling point, and repeatedly execute the computing module until the RGB value of 64 sampling points is obtained;
the synthesizing module is configured to perform volume rendering synthesis on a plurality of sampling points to obtain color RGB, select fine sampling points based on the probability of contribution of each point to the color of the light beam, perform volume rendering synthesis on the fine sampling points to obtain color RGB values, perform volume rendering synthesis on 64 sampling points on the light beam to obtain color RGB, calculate probability values of contribution of each of the 64 sampling points to the color of the light beam by using an attention mechanism, sort the calculated probability values, select 16 sampling points with highest probability values, acquire 8 sampling points with the 16 sampling points as centers to obtain 128 fine sampling points, acquire RGB values of the 128 fine sampling points, and perform volume rendering synthesis on the fine sampling points to obtain color RGB values.
4. The apparatus of claim 3, wherein the setup module is further configured to:
the parameters of the pixel patch are set to 5*5, where 5*5 represents that 25 pixels in the picture are processed as one pixel patch by multiplying 5 pixels in the horizontal direction by 5 pixels in the vertical direction.
5. A computer device, comprising:
at least one processor; and
a memory storing computer instructions executable on the processor, which when executed by the processor, perform the steps of the method of any one of claims 1-2.
6. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method of any one of claims 1-2.
CN202210610962.3A 2022-05-31 2022-05-31 Method, apparatus, device and readable medium for NERF optimization based on attention mechanism Active CN114882158B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210610962.3A CN114882158B (en) 2022-05-31 2022-05-31 Method, apparatus, device and readable medium for NERF optimization based on attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210610962.3A CN114882158B (en) 2022-05-31 2022-05-31 Method, apparatus, device and readable medium for NERF optimization based on attention mechanism

Publications (2)

Publication Number Publication Date
CN114882158A CN114882158A (en) 2022-08-09
CN114882158B true CN114882158B (en) 2024-01-09

Family

ID=82679028

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210610962.3A Active CN114882158B (en) 2022-05-31 2022-05-31 Method, apparatus, device and readable medium for NERF optimization based on attention mechanism

Country Status (1)

Country Link
CN (1) CN114882158B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116228855A (en) * 2022-12-30 2023-06-06 北京鉴智科技有限公司 Visual angle image processing method and device, electronic equipment and computer storage medium
CN117058049B (en) * 2023-05-04 2024-01-09 广州图语信息科技有限公司 New view image synthesis method, synthesis model training method and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107330966A (en) * 2017-06-21 2017-11-07 杭州群核信息技术有限公司 A kind of rendering intent and device
CN110880162A (en) * 2019-11-22 2020-03-13 中国科学技术大学 Snapshot spectrum depth combined imaging method and system based on deep learning
CN114119849A (en) * 2022-01-24 2022-03-01 阿里巴巴(中国)有限公司 Three-dimensional scene rendering method, device and storage medium

Also Published As

Publication number Publication date
CN114882158A (en) 2022-08-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant