CN114882158A - Method, device, equipment and readable medium for NERF optimization based on attention mechanism

Publication number
CN114882158A
Authority
CN
China
Prior art keywords
sampling points
light beam
rgb
sampling
points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210610962.3A
Other languages
Chinese (zh)
Other versions
CN114882158B (en
Inventor
王鹏飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202210610962.3A priority Critical patent/CN114882158B/en
Publication of CN114882158A publication Critical patent/CN114882158A/en
Application granted granted Critical
Publication of CN114882158B publication Critical patent/CN114882158B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/005 General purpose rendering architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2219/00 Indexing scheme for manipulating 3D models or images for computer graphics
    • G06T2219/20 Indexing scheme for editing of 3D models
    • G06T2219/2012 Colour editing, changing, or manipulating; Use of colour codes

Abstract

The invention provides a method, a device, equipment and a readable medium for NERF optimization based on an attention mechanism. The method comprises the following steps: acquiring a plurality of pictures shot at different positions in a 3D scene and information about the pictures, and setting a parameter for processing the pixels in the pictures in units of pixel patches; processing the coordinate dimension of the light beam according to the parameter, and encoding the coordinate information of all pixel points in one patch in a single pass using an attention mechanism; uniformly sampling a plurality of sampling points on the light beam, and obtaining the RGB value of each sampling point from the information in the coordinate dimension of the light beam; and performing volume rendering on the plurality of sampling points to synthesize a color RGB value, selecting fine sampling points based on the probability that each point contributes to the light beam color, and performing volume rendering on the fine sampling points to obtain the RGB value of the color. The scheme of the invention can accelerate model training and inference and improve the rendering effect of the model.

Description

Method, apparatus, device and readable medium for NERF optimization based on attention mechanism
Technical Field
The present invention relates to the field of computers, and more particularly to a method, apparatus, device and readable medium for NERF optimization based on an attention mechanism.
Background
Rendering in computer graphics refers to the computer simulation of the optical process of taking a picture in the real physical world. Research on rendering technology usually also covers 3D scene modeling and representation, because rendering can only be completed once the 3D scene has been represented in a computer. The rendering technologies widely used at present are based on computer graphics methods such as rasterization, ray casting and ray/path tracing, which simulate the propagation of rays after 3D modeling is complete and display the resulting colors on a screen. As the requirements for image quality increase in industries such as games and movies, algorithms such as ray/path tracing have become increasingly accurate, and these industries also accelerate the computation with hardware such as GPUs, but the trade-off between accuracy and efficiency still troubles many applications.
In recent years, computer vision techniques based on deep learning have been developed greatly, such as target tracking/image segmentation. In 2019-.
NERF (Neural Radiance Field) is among the most influential of these works to date, and numerous NERF-based improvements have appeared over the past two years. NERF realizes implicit reconstruction and multi-view rendering of a 3D scene, and has the potential to be used for 3D modeling, rendering and visualization in digital twin scenes, digital face and pose reconstruction, animation and other metaverse digital scenes. The basic idea of NERF is to use a neural network as an implicit representation of a 3D scene, instead of the traditional point cloud, mesh, voxel, TSDF, etc., and to render a projected image at any position and any angle directly through this network. Its main contributions are: 1) a method for representing scenes with complex, continuous geometry and materials using a 5D Neural Radiance Field, parameterized by an MLP network; 2) a differentiable rendering method improved from classical volume rendering, through which RGB images can be obtained and used as the optimization target. This part includes an acceleration strategy that uses hierarchical sampling to allocate the capacity of the MLP to the visible content regions. The main training and inference process comprises the following steps: 1. A NERF model is created (MLP model initialization). 2. The light beams are acquired and preprocessed. The dimension of the light beams is (N, ro + rd, H, W, 3), where N represents the total number of samples in the data set, H and W represent the resolution, ro represents the light beam origin, rd represents the light beam direction, and 3 represents the 3D coordinates; the RGB color corresponding to each light beam is then added to the light beam dimension, which becomes (N, ro + rd + RGB, H, W, 3). 3. A batch of light beam information is selected and the light beams are uniformly sampled; the defined MLP is used to calculate the RGB and opacity a of each sampling point, and the sampling points on the light beam are volume rendered to synthesize a color; the light beam is then resampled according to the opacity-based weights of the sampling points, and the newly sampled points are volume rendered into a composite color, again using the MLP to calculate RGB and opacity a. The above is forward inference. The loss used in back propagation takes the RGB image information added to the light beam as the ground truth (gt): the loss is obtained by comparing it with the calculated RGB, and back propagation is then performed.
The attention mechanism has numerous applications in the fields of computer vision and NLP. Its core idea is to focus on establishing relationships between the quantities that are most closely related. Attention is a natural ability of the human brain: when we see a picture, we first scan it quickly and then lock onto the target area that needs to be focused on. The input to the attention mechanism is a plurality of related quantities, and the output is the probability that each related quantity contributes to the whole.
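The volume rendering step in the process above can be sketched as follows. This is a minimal illustration of the standard NeRF quadrature, not code from the patent; the function names `uniform_samples` and `composite_rgb`, the density symbol `sigma`, and the near/far bounds are assumptions introduced here.

```python
import numpy as np

def uniform_samples(near, far, num_samples=64):
    # Evenly spaced depths along one light beam (assumed near/far bounds).
    return np.linspace(near, far, num_samples)

def composite_rgb(rgb, sigma, t_vals):
    """Composite per-sample colors along one light beam.

    rgb:    (S, 3) colors at the S sampling points
    sigma:  (S,)   non-negative densities at the sampling points
    t_vals: (S,)   increasing depths of the sampling points
    """
    # Spacing between adjacent samples; the last interval is treated as very long.
    deltas = np.append(np.diff(t_vals), 1e10)
    # Opacity contributed by each interval.
    alpha = 1.0 - np.exp(-sigma * deltas)
    # Transmittance: fraction of light that reaches sample i unblocked.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    # Per-sample weights; these are also what hierarchical resampling uses.
    weights = trans * alpha
    color = (weights[:, None] * rgb).sum(axis=0)
    return color, weights
```

The returned `weights` are the quantities on which the coarse-to-fine resampling of step 3 is based.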
In the prior art represented by FastNeRF, the inference process of NERF is accelerated by trading storage for computation. Trading storage for computation is the most effective approach when the algorithm itself is not changed, but most current neural networks perform inference on a GPU (graphics processing unit), where storing the data in video memory is neither economical nor practical, and the communication overhead of keeping the data in main memory is relatively high. The basic idea of FastNeRF is to store the output values of all NeRF characterization functions in advance, so that no deep model computation is needed during rendering and a table lookup suffices. However, the original NeRF takes a 5D coordinate input, and even at a resolution of 1024 per dimension, 1024^5 (i.e. 1024T) voxel features would need to be saved. To reduce the amount of stored values to a scale that a modern graphics card can handle, FastNeRF exploits the characteristics of scene rendering to decompose the NeRF model into two characterization networks, one for the 3D voxel position and one for the 2D projection view direction, which are calculated separately and then combined to form the voxel color feature.
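The storage figures above can be checked with a few lines of arithmetic. The sketch counts grid cells only; the per-cell payload of a real cache is not given in the text, and the variable names are illustrative.

```python
RES = 1024          # grid resolution per input dimension, as in the text
TERA = 1024 ** 4    # binary "T"

# Caching the full 5D function: one entry per (x, y, z, theta, phi) cell.
full_5d_entries = RES ** 5

# Factorized cache: a 3D position grid plus a 2D view-direction grid.
factorized_entries = RES ** 3 + RES ** 2

print(full_5d_entries // TERA)               # 1024, i.e. "1024T" entries
print(factorized_entries / full_5d_entries)  # fraction of the 5D cache size
```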
In the prior art represented by AutoInt, the numerical integration process is replaced by a neural network, that is, the integral is solved from forward-propagation evaluations of a network at the endpoint input values, where the network is obtained by training its derivative. This still cannot avoid calculating a large number of rays one by one: for example, a 1080P image (1920 × 1080) requires about 2 million rays. AutoInt regards the projection of a ray in volume rendering as a definite integral, defines a corresponding neural network G to represent the integration process, and then differentiates G to obtain the corresponding derivative network D. The networks G and D share their network parameters. AutoInt first trains the derivative network D and then substitutes the optimized parameters into the integral network G. Given the start and end points of the projected ray, the volume projection can be calculated as the difference between the values of G at the two points, i.e. two forward evaluations of G determine the projection value.
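The relationship between the integral network G and the derivative network D can be illustrated with a toy example. The closed-form `G` below is a hand-written stand-in for a trained network, and all names are assumptions; the point is only that two evaluations of G reproduce what quadrature obtains from many evaluations of D.

```python
import numpy as np

def G(t):
    # Toy stand-in for the integral network; a real model would be an MLP.
    return np.tanh(t) + 0.5 * t

def D(t):
    # Derivative of G with respect to t, written by hand for this toy G.
    return 1.0 - np.tanh(t) ** 2 + 0.5

def integral_by_quadrature(a, b, n=10_000):
    # Numerical integration: n evaluations of D along the ray segment.
    t = np.linspace(a, b, n)
    y = D(t)
    return np.sum((y[1:] + y[:-1]) * np.diff(t)) / 2.0  # trapezoid rule

def integral_by_two_evaluations(a, b):
    # Fundamental theorem of calculus: only two forward passes of G.
    return G(b) - G(a)
```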
Disclosure of Invention
In view of this, an object of the embodiments of the present invention is to provide a method, an apparatus, a device, and a readable medium for NERF optimization based on an attention mechanism. The technical solution of the present invention can accelerate model training and inference and improve the rendering effect of the model.
In view of the above objects, according to one aspect of the present invention, there is provided a method of NERF optimization based on attention mechanism, comprising the steps of:
acquiring a plurality of pictures shot at different positions in a 3D scene and information of the pictures, and setting parameters for processing pixels in the pictures by taking a pixel patch as a unit;
processing the coordinate dimension of the light beam according to the parameters, and encoding the coordinate information of all the pixel points in one patch in a single pass by using an attention mechanism;
uniformly sampling a plurality of sampling points in the light beam, and obtaining the RGB value of each sampling point in the plurality of sampling points according to the information of the coordinate dimension of the light beam;
and performing volume rendering synthesis on the plurality of sampling points to obtain RGB values, selecting fine sampling points based on the probability of contribution of each point to the light beam color, and performing volume rendering synthesis on the fine sampling points to obtain RGB values of the color.
According to an embodiment of the present invention, setting a parameter for processing a pixel in a picture in units of a pixel patch includes:
the parameter of the pixel patch is set to 5 × 5, where 5 × 5 represents that 25 pixels in total, which are 5 pixels in the horizontal direction by 5 pixels in the vertical direction in the picture, are processed as one pixel patch.
According to an embodiment of the present invention, uniformly sampling a plurality of sampling points in a light beam, and obtaining an RGB value of each of the plurality of sampling points according to information of a coordinate dimension of the light beam includes:
uniformly sampling 64 sampling points on the light beam processed by the MLP model;
feeding the 3D coordinates in the coordinate dimension of the light beam into the MLP model to obtain the output transparency and a characterization vector, concatenating the characterization vector with the direction information obtained from the attention mechanism and feeding the result into the MLP model to obtain the RGB value of one sampling point, and repeating this step until the RGB values of 64 sampling points are obtained.
According to one embodiment of the present invention, volume rendering and synthesizing the color RGB for a plurality of sampling points, and selecting a fine sampling point based on a probability of contribution of each point to a beam color, and volume rendering and synthesizing the fine sampling point to obtain a value of the color RGB includes:
carrying out volume rendering on 64 sampling points on the light beam to synthesize color RGB, and calculating probability values of contribution of each of the 64 sampling points to the light beam color by using an attention mechanism;
sorting the calculated probability values, and selecting 16 sampling points with the highest probability values;
acquiring 8 sampling points centered on each of the 16 sampling points to obtain 128 fine sampling points;
the RGB values of 128 fine sampling points are acquired, and volume rendering synthesis is performed on the fine sampling points to obtain the RGB values of the colors.
According to another aspect of the present invention, there is also provided an apparatus for NERF optimization based on an attention mechanism, the apparatus comprising:
the setting module is configured to acquire a plurality of pictures shot at different positions in a 3D scene and information of the pictures, and set parameters for processing pixels in the pictures by taking pixel patch as a unit;
the processing module is configured to process the coordinate dimensions of the light beams according to the parameters, and encode the coordinate information of all the pixel points in one patch in a single pass by using an attention mechanism;
the calculation module is configured to uniformly sample a plurality of sampling points in the light beam and obtain the RGB value of each sampling point in the plurality of sampling points according to the information of the coordinate dimension of the light beam;
and the synthesis module is configured to perform volume rendering synthesis on the plurality of sampling points to synthesize colors RGB, select fine sampling points based on the probability of contribution of each point to the light beam color, and perform volume rendering synthesis on the fine sampling points to obtain values of the colors RGB.
According to an embodiment of the invention, the setting module is further configured to:
the parameter of the pixel patch is set to 5 × 5, where 5 × 5 represents that 25 pixels in total, which are 5 pixels in the horizontal direction by 5 pixels in the vertical direction in the picture, are processed as one pixel patch.
According to one embodiment of the invention, the computing module is further configured to:
uniformly sampling 64 sampling points on the light beam processed by the MLP model;
feeding the 3D coordinates in the coordinate dimension of the light beam into the MLP model to obtain the output transparency and a characterization vector, concatenating the characterization vector with the direction information obtained from the attention mechanism and feeding the result into the MLP model to obtain the RGB value of one sampling point, and repeating this step until the RGB values of 64 sampling points are obtained.
According to an embodiment of the invention, the synthesis module is further configured to:
carrying out volume rendering on 64 sampling points on the light beam to synthesize color RGB, and calculating probability values of contribution of each of the 64 sampling points to the light beam color by using an attention mechanism;
sorting the calculated probability values, and selecting 16 sampling points with the highest probability values;
acquiring 8 sampling points centered on each of the 16 sampling points to obtain 128 fine sampling points;
the RGB values of 128 fine sampling points are acquired, and volume rendering synthesis is performed on the fine sampling points to obtain the RGB values of the colors.
In another aspect of an embodiment of the present invention, there is also provided a computer apparatus including:
at least one processor; and
a memory storing computer instructions executable on the processor, the instructions when executed by the processor implementing the steps of any of the methods described above.
In another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium storing a computer program, which when executed by a processor implements the steps of any one of the above-mentioned methods.
The invention has the following beneficial technical effects. The NERF optimization method based on the attention mechanism provided by the embodiments of the invention acquires a plurality of pictures shot at different positions in a 3D scene and information about the pictures, and sets a parameter for processing the pixels in the pictures in units of pixel patches; processes the coordinate dimension of the light beam according to the parameter, and encodes the coordinate information of all pixel points in one patch in a single pass using an attention mechanism; uniformly samples a plurality of sampling points on the light beam, and obtains the RGB value of each sampling point from the information in the coordinate dimension of the light beam; and performs volume rendering on the plurality of sampling points to synthesize a color RGB value, selects fine sampling points based on the probability that each point contributes to the light beam color, and performs volume rendering on the fine sampling points to obtain the RGB value of the color. This technical scheme can accelerate model training and inference and improve the rendering effect of the model.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other embodiments can be obtained by using the drawings without creative efforts.
FIG. 1 is a schematic flow diagram of a method of attention-based NERF optimization in accordance with one embodiment of the present invention;
FIG. 2 is a schematic diagram of an apparatus for NERF optimization based on an attention mechanism, according to one embodiment of the present invention;
FIG. 3 is a schematic diagram of a computer device according to one embodiment of the present invention;
FIG. 4 is a schematic diagram of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
In view of the above objects, a first aspect of embodiments of the present invention proposes an embodiment of a method for NERF optimization based on an attention mechanism. Fig. 1 shows a schematic flow diagram of the method.
As shown in fig. 1, the method may include the steps of:
S1: acquire a plurality of pictures taken at different positions in the 3D scene and information about the pictures, and set a parameter for processing the pixels in the pictures in units of pixel patches.
First, 100 pictures taken at different shooting positions in the 3D scene and the corresponding camera pose information are acquired. A NERF model, i.e. an MLP (fully connected neural network) model, is initialized and the light beams are processed. The initial light beam dimension is still (N, ro + rd, H, W, 3), so that the direction and coordinate information of each pixel point can be obtained conveniently. To reduce the number of projected rays in the 3D scene, a pixel patch strategy is proposed: for example, 5 × 5 pixel points are regarded as one patch, where 5 × 5 means that 25 pixels in total (5 pixels horizontally by 5 pixels vertically in the picture) are processed as one pixel patch. Each coordinate point in the patch, together with the light source projection point, determines the direction of the projected ray, i.e. ro_i and rd_i.
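The grouping of pixels into 5 × 5 patches can be sketched as a reshape. This is an illustration under the assumption that H and W are divisible by the patch size; `to_patches` is a hypothetical helper name, not part of the patent.

```python
import numpy as np

def to_patches(per_pixel, patch=5):
    """Group an (H, W, C) per-pixel array into (num_patches, patch*patch, C).

    per_pixel could hold, for example, the ray direction rd of every pixel.
    """
    h, w, c = per_pixel.shape
    if h % patch or w % patch:
        raise ValueError("H and W must be divisible by the patch size")
    # Split each image axis into (block index, offset inside the block).
    x = per_pixel.reshape(h // patch, patch, w // patch, patch, c)
    # Bring the two block indices to the front: (H/p, W/p, p, p, C).
    x = x.transpose(0, 2, 1, 3, 4)
    # One row per patch, patch*patch pixels per row.
    return x.reshape(-1, patch * patch, c)
```

With a 5 × 5 patch, the number of light beams to process drops from H × W to (H/5) × (W/5) per picture, i.e. by a factor of 25.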
S2: process the coordinate dimension of the light beam according to the parameter, and encode the coordinate information of all the pixel points in one patch in a single pass using the attention mechanism.
The coordinate dimension is adjusted to (N, ro + rd, 25, H/5, W/5, 3), where 25 is the number of pixels in one patch; this value can also be set as required. Patches are then used for subsequent network training, and the total number of light beams becomes N × (H/5) × (W/5). A batch of patches has a shape such as 4096 × (ro + rd) × 25 × 3. The coordinate information of the 25 pixels is encoded in a single pass by an attention mechanism (two fully connected layers), and the encoded vector replaces the previous single-point coordinate information, so that the data output for the batch is 4096 × (ro + rd) × 1 × 3 (i.e. the information of one patch represents the information of 25 pixels).
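One possible reading of this encoding step is sketched below. The text only states that the attention mechanism consists of two fully connected layers; scoring each pixel with those two layers, normalizing with a softmax and taking a weighted sum is an assumption made here, as are the names `init_attention` and `encode_patch` and the hidden width of 16.

```python
import numpy as np

def init_attention(seed=0, in_dim=3, hidden=16):
    # Two fully connected layers that map each 3-vector to a scalar score.
    rng = np.random.default_rng(seed)
    return {
        "w1": rng.normal(0.0, 0.1, (in_dim, hidden)), "b1": np.zeros(hidden),
        "w2": rng.normal(0.0, 0.1, (hidden, 1)), "b2": np.zeros(1),
    }

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def encode_patch(rays, params):
    """rays: (B, 2, P, 3), ro and rd stacked on axis 1, P pixels per patch.

    Returns (B, 2, 1, 3): one fused origin and direction per patch.
    """
    h = np.maximum(rays @ params["w1"] + params["b1"], 0.0)  # (B, 2, P, hidden)
    scores = h @ params["w2"] + params["b2"]                 # (B, 2, P, 1)
    attn = softmax(scores, axis=2)                           # weights over pixels
    return (attn * rays).sum(axis=2, keepdims=True)
```

For a batch of shape 4096 × 2 × 25 × 3 this returns 4096 × 2 × 1 × 3, matching the shapes described above.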
S3: uniformly sample a plurality of sampling points on the light beam, and obtain the RGB value of each of the sampling points from the information in the coordinate dimension of the light beam.
64 sampling points are uniformly sampled on the light beam processed by the MLP model. Note that the light beam at this point is not an actual light beam; it fuses the direction and coordinate information of all pixel points in the patch. The 64 sampling points obtained by uniform sampling are input into the MLP network to obtain the 4-dimensional output (transparency and RGB) of the patch at each sampling point. That is, following the input processing of NERF, the 3D coordinates are fed into the fully connected neural network to output the transparency a of the patch (25 pixels) and a characterization vector; the characterization vector is then concatenated with the direction information obtained from the attention mechanism and fed into a fully connected neural network to output the RGB values of the patch (25 pixels). This step is repeated 64 times to obtain the RGB values of 64 sampling points, each of which covers 25 pixels.
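The two-stage query described above might look like the following sketch. Layer widths, activations, random initialization and the names `init_mlp` and `query_patch` are all assumptions; the text specifies only the data flow (position to transparency and characterization vector, then characterization vector plus direction to RGB, for all 25 pixels at once).

```python
import numpy as np

P = 25     # pixels per patch
FEAT = 32  # assumed width of the characterization vector

def init_mlp(seed=0, hidden=64):
    rng = np.random.default_rng(seed)
    def layer(n_in, n_out):
        return rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out)
    return {
        "trunk": layer(3, hidden),      # 3D position -> hidden features
        "alpha": layer(hidden, P),      # transparency for each pixel of the patch
        "feat": layer(hidden, FEAT),    # characterization vector
        "rgb": layer(FEAT + 3, P * 3),  # [feature, direction] -> RGB per pixel
    }

def dense(x, wb):
    w, b = wb
    return x @ w + b

def query_patch(points, direction, params):
    """points: (S, 3) sample positions on one fused patch beam; direction: (3,)."""
    h = np.maximum(dense(points, params["trunk"]), 0.0)
    alpha = np.log1p(np.exp(dense(h, params["alpha"])))  # softplus keeps it >= 0
    feat = dense(h, params["feat"])
    d = np.broadcast_to(direction, (points.shape[0], 3))
    logits = dense(np.concatenate([feat, d], axis=1), params["rgb"])
    rgb = 1.0 / (1.0 + np.exp(-logits))                  # sigmoid keeps RGB in [0, 1]
    return alpha, rgb.reshape(-1, P, 3)
```

The sketch evaluates all 64 sampling points in one batched call instead of looping 64 times; the result is the same set of per-sample outputs.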
S4: perform volume rendering on the plurality of sampling points to synthesize a color RGB value, select fine sampling points based on the probability that each point contributes to the light beam color, and perform volume rendering on the fine sampling points to obtain the RGB value of the color.
Volume rendering is performed on the 64 sampling points on the light beam to synthesize the color RGB, and an attention mechanism is used to determine a probability value for each coarsely sampled point on the ray, where the input of the attention is the transparency value of the pixel points in the patch and the output is the probability. A finer sampling is then formed on the projected ray according to the probability: specifically, the 16 sampling points with the highest probability are selected, and 8 sampling points are taken centered on each of them, giving 128 fine sampling points. The RGB values of the 128 fine sampling points are acquired, and volume rendering is performed on the fine sampling points to obtain the RGB value of the color. Experiments show that the method can improve training and inference speed by about 20 times without loss of precision.
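The selection of fine sampling points can be sketched as follows. The softmax normalization of the attention scores and the placement of the 8 points within half a coarse interval of each center are assumptions; the text fixes only the counts (64 coarse points, top 16, 8 per center, 128 in total). `fine_sample` is a hypothetical name.

```python
import numpy as np

def fine_sample(t_coarse, scores, top_k=16, per_center=8):
    """t_coarse: (S,) uniformly spaced coarse depths; scores: (S,) attention scores."""
    # Softmax turns the scores into contribution probabilities.
    p = np.exp(scores - scores.max())
    p /= p.sum()
    # Indices of the top_k most probable coarse samples.
    top = np.argsort(p)[-top_k:]
    # Spread per_center points symmetrically within half a coarse interval.
    half = 0.5 * (t_coarse[1] - t_coarse[0])
    offsets = ((np.arange(per_center) + 0.5) / per_center * 2.0 - 1.0) * half
    t_fine = (t_coarse[top][:, None] + offsets[None, :]).ravel()
    return np.sort(t_fine), p
```

The sorted `t_fine` depths would then be queried with the MLP and composited by volume rendering in the same way as the coarse samples.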
The technical scheme of the invention mainly serves to reduce the number of projected rays in the 3D scene. Based on the continuity of physical materials, it forms pixel patches and uses an attention mechanism to complete the rendering of all pixel points; after the coarse sampling stage is finished, it uses the attention mechanism to evaluate the contribution rate of each sampling point on a single projected ray to the color, and performs fine sampling according to the contribution rate to complete the single-point color calculation.
This technical scheme can accelerate model training and inference and improve the rendering effect of the model.
First, the technical scheme of the invention reduces the number of projected rays in the 3D scene through the pixel patch strategy: for example, 5 × 5 pixel points are regarded as one patch, and each coordinate point in the patch together with the light source projection point determines the direction of the projected ray, i.e. ro_i and rd_i. The coordinates and directions are encoded using an attention mechanism, and the encoded vector replaces the previous single-point coordinate information and serves as the input of the MLP network. A transparency and a supplementary vector are obtained through the fully connected layers, where the transparency is a vector representing the transparency information of the pixel points within the patch. The supplementary vector is further combined with the direction information encoded by the attention mechanism to obtain an RGB vector, which likewise represents the RGB information of the pixel points within the patch. In this way, the attention mechanism performs fusion coding on the coordinates and direction of each pixel in the patch to obtain the relevant position and direction information, and the fitting capability of the MLP is then used to infer the RGB and transparency information of all pixel points within the continuous-material patch in one pass.
Second, after the coarse sampling stage of a single projected ray is finished, an attention mechanism is used to evaluate the contribution rate to the color, and fine sampling is performed accordingly to complete the single-point color calculation. After one inference pass of the MLP, the RGB and transparency information of all points in the patch are obtained, so coarse sampling yields the information of all points on the 64 sampled patches. An attention mechanism is then used to determine a probability value for each coarsely sampled point on the ray, where the input of the attention is the transparency value of the pixel points in the patch and the output is the probability. Sampling is then performed on the projected ray according to the probability, and the color value of each pixel point is finally calculated using the volume rendering method of NERF.
With the attention mechanism method provided by the invention, the number of projected rays can be greatly reduced while the precision remains unchanged, which reduces the time required for training and inference. The attention mechanism is also consistent with the observation that adjacent materials in 3D space have, with high probability, the same color expression and correlated colors.
In a preferred embodiment of the present invention, setting a parameter for processing a pixel in a picture in units of a pixel patch includes:
the parameter of the pixel patch is set to 5 × 5, where 5 × 5 represents that 25 pixels in total, which are 5 pixels in the horizontal direction by 5 pixels in the vertical direction in the picture, are processed as one pixel patch.
In a preferred embodiment of the present invention, uniformly sampling a plurality of sampling points in the light beam, and obtaining the RGB value of each of the plurality of sampling points according to the information of the coordinate dimension of the light beam includes:
uniformly sampling 64 sampling points on the light beam processed by the MLP model;
feeding the 3D coordinates in the coordinate dimension of the light beam into the MLP model to obtain the output transparency and a characterization vector, concatenating the characterization vector with the direction information obtained from the attention mechanism and feeding the result into the MLP model to obtain the RGB value of one sampling point, and repeating this step until the RGB values of 64 sampling points are obtained.
In a preferred embodiment of the present invention, volume rendering and synthesizing the color RGB for a plurality of sampling points, and selecting the fine sampling points based on the probability of the color contribution of each point to the light beam, and volume rendering and synthesizing the fine sampling points to obtain the values of the color RGB includes:
carrying out volume rendering on 64 sampling points on the light beam to synthesize color RGB, and calculating probability values of contribution of each of the 64 sampling points to the light beam color by using an attention mechanism;
sorting the calculated probability values, and selecting 16 sampling points with the highest probability values;
acquiring 8 sampling points centered on each of the 16 sampling points to obtain 128 fine sampling points;
the RGB values of 128 fine sampling points are acquired, and volume rendering synthesis is performed on the fine sampling points to obtain the RGB values of the colors.
The method makes full use of the attention mechanism and the spatial continuity of physical materials. It changes the previous process, in which the MLP network encodes one position at a time to produce RGB and the transparency value a, into a process in which RGB and the transparency value a are calculated for a plurality of positions simultaneously, which accelerates training and inference. During fine sampling on the projected ray, the attention mechanism is used to obtain the probability that each point contributes to the color and fine sampling is performed accordingly, which improves the rendering effect of the model.
It should be noted that, as will be understood by those skilled in the art, all or part of the processes in the methods of the above embodiments may be implemented by instructing relevant hardware through a computer program, and the above programs may be stored in a computer-readable storage medium, and when executed, the programs may include the processes of the embodiments of the methods as described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like. The embodiments of the computer program may achieve the same or similar effects as any of the above-described method embodiments corresponding thereto.
Furthermore, the method disclosed according to an embodiment of the present invention may also be implemented as a computer program executed by a CPU, and the computer program may be stored in a computer-readable storage medium. The computer program, when executed by the CPU, performs the above-described functions defined in the method disclosed in the embodiments of the present invention.
In view of the above objects, a second aspect of the embodiments of the present invention proposes an apparatus for NERF optimization based on an attention mechanism. As shown in Fig. 2, the apparatus 200 includes:
the setting module is configured to acquire a plurality of pictures shot at different positions in a 3D scene and information of the pictures, and set parameters for processing pixels in the pictures by taking a pixel patch as a unit;
the processing module is configured to process the coordinate dimension of the light beam according to the parameters, and preliminarily encode the coordinate information of all the pixel points in one patch by using an attention mechanism;
the calculation module is configured to uniformly sample a plurality of sampling points in the light beam and obtain the RGB value of each sampling point in the plurality of sampling points according to the information of the coordinate dimension of the light beam;
and the synthesis module is configured to perform volume rendering synthesis on the plurality of sampling points to synthesize the color RGB, select fine sampling points based on the probability that each point contributes to the color of the light beam, and perform volume rendering synthesis on the fine sampling points to obtain the RGB color values.
In a preferred embodiment of the present invention, the setting module is further configured to:
set the parameter of the pixel patch to 5 × 5, where 5 × 5 indicates that 25 pixels in total (5 pixels in the horizontal direction by 5 pixels in the vertical direction in the picture) are processed as one pixel patch.
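As an illustration of processing a picture in 5 × 5 pixel patches, the sketch below splits an image array into non-overlapping patches of 25 pixels each. The function name `split_into_patches` is hypothetical, and cropping rows and columns that do not fill a whole patch is an assumption; the text does not say how image borders are handled.

```python
import numpy as np

def split_into_patches(image, patch=5):
    """Split an (H, W, C) image into non-overlapping patch x patch blocks.

    Rows and columns that do not fill a whole patch are cropped
    (assumption: border handling is not described in the source).
    Returns an array of shape (num_patches, patch * patch, C).
    """
    h, w, c = image.shape
    h_crop, w_crop = h - h % patch, w - w % patch
    img = image[:h_crop, :w_crop]
    blocks = img.reshape(h_crop // patch, patch, w_crop // patch, patch, c)
    # Reorder to (patch_row, patch_col, y, x, C), then flatten each patch.
    return blocks.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch, c)

# Usage: a 12 x 17 RGB image yields 2 x 3 = 6 patches of 25 pixels each.
image = np.arange(12 * 17 * 3).reshape(12, 17, 3)
patches = split_into_patches(image)
```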
In a preferred embodiment of the invention, the calculation module is further configured to:
uniformly sample 64 sampling points on the light beam processed by the MLP model;
and feed the 3D coordinates in the coordinate dimension of the light beam into an MLP model to obtain the output transparency and the output characterization vector, concatenate the characterization vector with the direction information obtained by the attention mechanism and feed the result into the MLP model to obtain the RGB value of one sampling point, and repeat this step until the RGB values of the 64 sampling points are obtained.
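The two-stage query described above can be sketched as follows: 64 depths are sampled uniformly along one light beam, a first network maps each 3D coordinate to a density-like value and a characterization vector, and a second layer maps the characterization vector concatenated with direction information to RGB. Everything concrete here is an assumption made for illustration: the layer sizes, the random untrained weights, the near and far bounds, the use of a non-negative density in place of the "transparency" named in the text, and the use of the raw viewing direction in place of direction information produced by an attention mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(n_in, n_out):
    # Small random weights and zero bias; untrained, for shape checking only.
    return rng.normal(size=(n_in, n_out)) * 0.1, np.zeros(n_out)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

FEAT = 32                      # hypothetical characterization-vector width
W1, b1 = dense(3, 64)          # 3D coordinate -> hidden
W2, b2 = dense(64, 1 + FEAT)   # hidden -> density + characterization vector
W3, b3 = dense(FEAT + 3, 3)    # characterization vector ++ direction -> RGB

def query_ray(origin, direction, near=2.0, far=6.0, n_samples=64):
    """Return depths, densities and RGB values for one light beam."""
    t = np.linspace(near, far, n_samples)        # uniform sampling
    pts = origin + t[:, None] * direction        # (n_samples, 3)
    h = relu(pts @ W1 + b1) @ W2 + b2
    sigma = relu(h[:, 0])                        # non-negative density
    feat = h[:, 1:]                              # characterization vector
    dirs = np.broadcast_to(direction, pts.shape)
    rgb = sigmoid(np.concatenate([feat, dirs], axis=1) @ W3 + b3)
    return t, sigma, rgb

t, sigma, rgb = query_ray(np.zeros(3), np.array([0.0, 0.0, -1.0]))
```

The sketch evaluates all 64 points in one batched matrix product instead of repeating the step point by point; the result is the same set of 64 RGB values.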
In a preferred embodiment of the invention, the synthesis module is further configured to:
perform volume rendering on the 64 sampling points on the light beam to synthesize the color RGB, and use an attention mechanism to calculate, for each of the 64 sampling points, the probability that it contributes to the color of the light beam;
sort the calculated probability values, and select the 16 sampling points with the highest probability values;
acquire 8 sampling points centered on each of the 16 selected sampling points, yielding 16 × 8 = 128 fine sampling points;
acquire the RGB values of the 128 fine sampling points, and perform volume rendering synthesis on the fine sampling points to obtain the RGB color values.
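For reference, the volume rendering synthesis used in NeRF-style methods is commonly written as the discrete quadrature below. The source text does not give the formula, so its use here is an assumption, as are the symbols: c_i is the RGB value of sampling point i, σ_i its density, δ_i the distance to the next sampling point, and N the number of sampling points (64 in the coarse pass, 128 in the fine pass). Under this reading, the "transparency" output by the MLP model would correspond to the opacity α_i.

```latex
\hat{C} = \sum_{i=1}^{N} w_i \, c_i, \qquad
w_i = T_i \, \alpha_i, \qquad
\alpha_i = 1 - \exp(-\sigma_i \delta_i), \qquad
T_i = \prod_{j=1}^{i-1} (1 - \alpha_j)
```

The weight w_i measures how much sampling point i contributes to the final color of the light beam, which is the kind of per-point contribution that the fine sampling step ranks.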
In view of the above object, a third aspect of the embodiments of the present invention provides a computer device. Fig. 3 is a schematic diagram of an embodiment of a computer device provided by the present invention. As shown in Fig. 3, the embodiment includes: at least one processor S21; and a memory S22 storing computer instructions S23 executable on the processor, the instructions, when executed by the processor, implementing the following method:
acquiring a plurality of pictures shot at different positions in a 3D scene and information of the pictures, and setting parameters for processing pixels in the pictures by taking a pixel patch as a unit;
processing the coordinate dimension of the light beam according to the parameters, and performing preliminary encoding on the coordinate information of all the pixel points in one patch by using an attention mechanism;
uniformly sampling a plurality of sampling points in the light beam, and obtaining the RGB value of each sampling point in the plurality of sampling points according to the information of the coordinate dimension of the light beam;
and performing volume rendering synthesis on the plurality of sampling points to obtain RGB values, selecting fine sampling points based on the probability of contribution of each point to the light beam color, and performing volume rendering synthesis on the fine sampling points to obtain RGB values of the color.
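Before any sampling can happen, each pixel of a patch has to be turned into a light beam with an origin and a direction. The sketch below does this for one 5 × 5 patch under assumptions that are not stated in the source: a pinhole camera with focal length `focal` in pixels, a 4 × 4 camera-to-world matrix `c2w` as the "information of the pictures", and the common NeRF convention that the camera looks along its negative z axis. The function name `patch_rays` is hypothetical.

```python
import numpy as np

def patch_rays(row0, col0, height, width, focal, c2w, patch=5):
    """Origins and unit directions of the light beams through one patch.

    (row0, col0) is the top-left pixel of the patch. A pinhole camera that
    looks along -z is assumed; c2w is a 4 x 4 camera-to-world matrix.
    """
    rows, cols = np.meshgrid(np.arange(row0, row0 + patch),
                             np.arange(col0, col0 + patch), indexing="ij")
    # Directions in camera space, one per pixel of the patch.
    dirs = np.stack([(cols - 0.5 * width) / focal,
                     -(rows - 0.5 * height) / focal,
                     -np.ones_like(cols, dtype=float)], axis=-1)
    # Rotate into world space and normalize to unit length.
    dirs = dirs.reshape(-1, 3) @ c2w[:3, :3].T
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    # All beams of one picture share the camera position as origin.
    origins = np.broadcast_to(c2w[:3, 3], dirs.shape).copy()
    return origins, dirs

# Usage: the 25 light beams of the patch starting at pixel (10, 20).
origins, dirs = patch_rays(10, 20, 100, 100, 50.0, np.eye(4))
```

The resulting arrays have shape (25, 3), so the 25 beams of a patch can be encoded and sampled together.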
In a preferred embodiment of the present invention, setting parameters for processing pixels in the picture by taking a pixel patch as a unit includes:
setting the parameter of the pixel patch to 5 × 5, where 5 × 5 indicates that 25 pixels in total (5 pixels in the horizontal direction by 5 pixels in the vertical direction in the picture) are processed as one pixel patch.
In a preferred embodiment of the present invention, uniformly sampling a plurality of sampling points in the light beam and obtaining the RGB value of each of the plurality of sampling points according to the information of the coordinate dimension of the light beam includes:
uniformly sampling 64 sampling points on the light beam processed by the MLP model;
and feeding the 3D coordinates in the coordinate dimension of the light beam into an MLP model to obtain the output transparency and the output characterization vector, concatenating the characterization vector with the direction information obtained by the attention mechanism and feeding the result into the MLP model to obtain the RGB value of one sampling point, and repeating this step until the RGB values of the 64 sampling points are obtained.
In a preferred embodiment of the present invention, performing volume rendering synthesis on the plurality of sampling points to synthesize the color RGB, selecting fine sampling points based on the probability that each point contributes to the color of the light beam, and performing volume rendering synthesis on the fine sampling points to obtain the RGB color values includes:
performing volume rendering on the 64 sampling points on the light beam to synthesize the color RGB, and using an attention mechanism to calculate, for each of the 64 sampling points, the probability that it contributes to the color of the light beam;
sorting the calculated probability values, and selecting the 16 sampling points with the highest probability values;
acquiring 8 sampling points centered on each of the 16 selected sampling points, yielding 16 × 8 = 128 fine sampling points;
acquiring the RGB values of the 128 fine sampling points, and performing volume rendering synthesis on the fine sampling points to obtain the RGB color values.
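The final volume rendering synthesis over the 128 fine sampling points can be sketched with the standard NeRF compositing rule. This is an assumed stand-in, since the text does not spell out the compositing formula; the function name `composite` and the treatment of the last interval as open-ended are illustrative choices.

```python
import numpy as np

def composite(t, sigma, rgb):
    """Composite samples along one light beam into a single RGB color.

    t: sorted depths (N,), sigma: non-negative densities (N,),
    rgb: per-sample colors (N, 3). Returns the color and per-sample weights.
    """
    # Distance to the next sample; the last interval is treated as open-ended.
    delta = np.diff(t, append=t[-1] + 1e10)
    alpha = 1.0 - np.exp(-sigma * delta)          # opacity of each interval
    # Transmittance: fraction of light that reaches sample i unabsorbed.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = trans * alpha                        # contribution of each sample
    color = (weights[:, None] * rgb).sum(axis=0)
    return color, weights

# Usage: a single opaque red sample among 128 otherwise empty ones.
t = np.linspace(2.0, 6.0, 128)
sigma = np.zeros(128)
rgb = np.zeros((128, 3))
sigma[40], rgb[40] = 1e3, [1.0, 0.0, 0.0]
color, weights = composite(t, sigma, rgb)
```

The returned weights sum to at most 1 and indicate how much each sampling point contributes to the color of the light beam.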
In view of the above object, a fourth aspect of the embodiments of the present invention proposes a computer-readable storage medium. FIG. 4 is a schematic diagram illustrating an embodiment of a computer-readable storage medium provided by the present invention. As shown in fig. 4, the computer readable storage medium S31 stores a computer program S32 which, when executed by a processor, performs the method as described above.
Furthermore, the methods disclosed according to embodiments of the present invention may also be implemented as a computer program executed by a processor, and the computer program may be stored in a computer-readable storage medium. When executed by the processor, the computer program performs the above-described functions defined in the methods disclosed in embodiments of the invention.
Further, the above method steps and system elements may also be implemented using a controller and a computer readable storage medium for storing a computer program for causing the controller to implement the functions of the above steps or elements.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.
In one or more exemplary designs, the functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code. Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a general-purpose or special-purpose computer, or a general-purpose or special-purpose processor. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The serial numbers of the embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, where the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure of the embodiments of the present invention, including the claims, is limited to these examples. Within the idea of the embodiments of the present invention, technical features of the above embodiment or of different embodiments may also be combined, and there are many other variations of the different aspects of the embodiments of the present invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.

Claims (10)

1. A method of NERF optimization based on an attention mechanism, comprising the steps of:
acquiring a plurality of pictures shot at different positions in a 3D scene and information of the pictures, and setting parameters for processing pixels in the pictures by taking a pixel patch as a unit;
processing the coordinate dimension of the light beam according to the parameters, and performing preliminary encoding on the coordinate information of all the pixel points in one patch by using an attention mechanism;
uniformly sampling a plurality of sampling points in the light beam, and obtaining the RGB value of each sampling point in the plurality of sampling points according to the information of the coordinate dimension of the light beam;
and performing volume rendering synthesis on the plurality of sampling points to obtain RGB values, selecting fine sampling points based on the probability of contribution of each point to the light beam color, and performing volume rendering synthesis on the fine sampling points to obtain RGB values of the color.
2. The method of claim 1, wherein setting parameters for processing pixels in the picture by taking a pixel patch as a unit comprises:
setting the parameter of the pixel patch to 5 × 5, where 5 × 5 indicates that 25 pixels in total (5 pixels in the horizontal direction by 5 pixels in the vertical direction in the picture) are processed as one pixel patch.
3. The method of claim 1, wherein uniformly sampling a plurality of sampling points in the light beam and obtaining the RGB value of each of the plurality of sampling points according to the information of the coordinate dimension of the light beam comprises:
uniformly sampling 64 sampling points on the light beam processed by the MLP model;
and feeding the 3D coordinates in the coordinate dimension of the light beam into an MLP model to obtain the output transparency and the output characterization vector, concatenating the characterization vector with the direction information obtained by the attention mechanism and feeding the result into the MLP model to obtain the RGB value of one sampling point, and repeating this step until the RGB values of the 64 sampling points are obtained.
4. The method of claim 3, wherein performing volume rendering synthesis on the plurality of sampling points to synthesize the color RGB, selecting fine sampling points based on the probability that each point contributes to the color of the light beam, and performing volume rendering synthesis on the fine sampling points to obtain the RGB color values comprises:
performing volume rendering on the 64 sampling points on the light beam to synthesize the color RGB, and using an attention mechanism to calculate, for each of the 64 sampling points, the probability that it contributes to the color of the light beam;
sorting the calculated probability values, and selecting the 16 sampling points with the highest probability values;
acquiring 8 sampling points centered on each of the 16 selected sampling points to obtain 128 fine sampling points;
acquiring the RGB values of the 128 fine sampling points, and performing volume rendering synthesis on the fine sampling points to obtain the RGB color values.
5. An apparatus for NERF optimization based on an attention mechanism, the apparatus comprising:
the setting module is configured to acquire a plurality of pictures shot at different positions in a 3D scene and information of the pictures, and set parameters for processing pixels in the pictures by taking a pixel patch as a unit;
the processing module is configured to process the coordinate dimension of the light beam according to the parameters, and encode the coordinate information of all the pixel points in 1 patch by using an attention mechanism;
the calculation module is configured to uniformly sample a plurality of sampling points in the light beam and obtain the RGB value of each sampling point in the plurality of sampling points according to the information of the coordinate dimension of the light beam;
and the synthesis module is configured to perform volume rendering synthesis on the plurality of sampling points to synthesize colors RGB, select fine sampling points based on the probability of contribution of each point to the light beam color, and perform volume rendering synthesis on the fine sampling points to obtain values of the colors RGB.
6. The apparatus of claim 5, wherein the setup module is further configured to:
set the parameter of the pixel patch to 5 × 5, where 5 × 5 indicates that 25 pixels in total (5 pixels in the horizontal direction by 5 pixels in the vertical direction in the picture) are processed as one pixel patch.
7. The apparatus of claim 5, wherein the computing module is further configured to:
uniformly sample 64 sampling points on the light beam processed by the MLP model;
and feed the 3D coordinates in the coordinate dimension of the light beam into an MLP model to obtain the output transparency and the output characterization vector, concatenate the characterization vector with the direction information obtained by the attention mechanism and feed the result into the MLP model to obtain the RGB value of one sampling point, and repeat this step until the RGB values of the 64 sampling points are obtained.
8. The apparatus of claim 7, wherein the synthesis module is further configured to:
perform volume rendering on the 64 sampling points on the light beam to synthesize the color RGB, and use an attention mechanism to calculate, for each of the 64 sampling points, the probability that it contributes to the color of the light beam;
sort the calculated probability values, and select the 16 sampling points with the highest probability values;
acquire 8 sampling points centered on each of the 16 selected sampling points to obtain 128 fine sampling points;
acquire the RGB values of the 128 fine sampling points, and perform volume rendering synthesis on the fine sampling points to obtain the RGB color values.
9. A computer device, comprising:
at least one processor; and
a memory storing computer instructions executable on the processor, the instructions when executed by the processor implementing the steps of the method of any one of claims 1 to 4.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 4.
CN202210610962.3A 2022-05-31 2022-05-31 Method, apparatus, device and readable medium for NERF optimization based on attention mechanism Active CN114882158B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210610962.3A CN114882158B (en) 2022-05-31 2022-05-31 Method, apparatus, device and readable medium for NERF optimization based on attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210610962.3A CN114882158B (en) 2022-05-31 2022-05-31 Method, apparatus, device and readable medium for NERF optimization based on attention mechanism

Publications (2)

Publication Number Publication Date
CN114882158A (en) 2022-08-09
CN114882158B (en) 2024-01-09

Family

ID=82679028

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210610962.3A Active CN114882158B (en) 2022-05-31 2022-05-31 Method, apparatus, device and readable medium for NERF optimization based on attention mechanism

Country Status (1)

Country Link
CN (1) CN114882158B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117058049A (en) * 2023-05-04 2023-11-14 广州图语信息科技有限公司 New view image synthesis method, synthesis model training method and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107330966A (en) * 2017-06-21 2017-11-07 杭州群核信息技术有限公司 A kind of rendering intent and device
CN110880162A (en) * 2019-11-22 2020-03-13 中国科学技术大学 Snapshot spectrum depth combined imaging method and system based on deep learning
CN114119849A (en) * 2022-01-24 2022-03-01 阿里巴巴(中国)有限公司 Three-dimensional scene rendering method, device and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117058049A (en) * 2023-05-04 2023-11-14 广州图语信息科技有限公司 New view image synthesis method, synthesis model training method and storage medium
CN117058049B (en) * 2023-05-04 2024-01-09 广州图语信息科技有限公司 New view image synthesis method, synthesis model training method and storage medium

Also Published As

Publication number Publication date
CN114882158B (en) 2024-01-09

Similar Documents

Publication Publication Date Title
US11055828B2 (en) Video inpainting with deep internal learning
CN109255831B (en) Single-view face three-dimensional reconstruction and texture generation method based on multi-task learning
EP1059611A1 (en) Image processing apparatus
US20130016097A1 (en) Virtual Camera System
Li et al. Dual-scale single image dehazing via neural augmentation
CN111832745A (en) Data augmentation method and device and electronic equipment
CN115298708A (en) Multi-view neural human body rendering
CN114882158B (en) Method, apparatus, device and readable medium for NERF optimization based on attention mechanism
CN115205463A (en) New visual angle image generation method, device and equipment based on multi-spherical scene expression
Zhang et al. Depth of field rendering using multilayer-neighborhood optimization
Yuan et al. Neural radiance fields from sparse RGB-D images for high-quality view synthesis
CN116563459A (en) Text-driven immersive open scene neural rendering and mixing enhancement method
Kwak et al. View synthesis with sparse light field for 6DoF immersive video
Luo et al. Defocus to focus: Photo-realistic bokeh rendering by fusing defocus and radiance priors
Song et al. Harnessing low-frequency neural fields for few-shot view synthesis
CN108924528A (en) A kind of binocular stylization real-time rendering method based on deep learning
Qiu et al. RDNeRF: relative depth guided NeRF for dense free view synthesis
CN116863053A (en) Point cloud rendering enhancement method based on knowledge distillation
WO2024007181A1 (en) Dynamic scene three-dimensional reconstruction method and system based on multi-scale space-time coding
CN116228855A (en) Visual angle image processing method and device, electronic equipment and computer storage medium
Inamoto et al. Fly through view video generation of soccer scene
Evain et al. A lightweight neural network for monocular view generation with occlusion handling
Peng et al. PDRF: progressively deblurring radiance field for fast scene reconstruction from blurry images
Chen et al. Single-view Neural Radiance Fields with Depth Teacher
CN116250002A (en) Single image 3D photography with soft layering and depth aware restoration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant