CN114240785A - Denoising method and system for ray tracing rendering continuous frames - Google Patents

Denoising method and system for ray tracing rendering continuous frames

Info

Publication number
CN114240785A
Authority
CN
China
Prior art keywords
denoising
time
result
scale
pixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111551086.3A
Other languages
Chinese (zh)
Inventor
王璐
曾言
徐延宁
孟祥旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN202111551086.3A priority Critical patent/CN114240785A/en
Publication of CN114240785A publication Critical patent/CN114240785A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 - Image enhancement or restoration
    • G06T 5/70 - Denoising; Smoothing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 - Image enhancement or restoration
    • G06T 5/73 - Deblurring; Sharpening
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/20 - Analysis of motion
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20084 - Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention provides a denoising method and system for ray tracing rendering continuous frames. The denoising method comprises the following steps: acquiring the dual motion vector of the current frame and the denoising result of the previous frame, and computing the warped reprojection of the previous frame; acquiring the noise map and auxiliary features of the current frame, combining them with the warped reprojection of the previous frame, and concatenating them into a tensor; inputting the tensor into a space-time multi-scale denoising network to obtain the denoising result of the current frame. The space-time multi-scale denoising network comprises a kernel prediction network and a space-time multi-scale mixing network, which combine the temporal filtering result and the spatial multi-scale mixing result to obtain the denoising result of the current frame. While guaranteeing a high-quality denoising result, the over-blurring, shading aliasing and trailing present in the prior art are eliminated.

Description

Denoising method and system for ray tracing rendering continuous frames
Technical Field
The invention belongs to the technical field of post-processing denoising for highly realistic rendering, and particularly relates to a denoising method and system for ray tracing rendering continuous frames.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Ray tracing techniques based on Monte Carlo (MC) integration are widely used in physics-based photorealistic rendering. Conventional MC path tracing requires time-consuming computation and a large number of samples per pixel (spp) to obtain an acceptable rendering result; however, owing to the real-time frame-rate requirement (>30 FPS) and the limitations of current hardware, real-time ray tracing (RTRT) can only use 1 spp, and such an extremely low sampling rate leads to high variance that appears visually as disturbing noise.
Some approaches, such as temporal anti-aliasing (TAA) and neural-network-based post-processing denoising methods, have been proposed so that a continuous sequence of noise-free ray-traced rendered frames can be obtained at interactive or even real-time rates. Owing to the characteristics of ray tracing rendering of consecutive frames, flicker between consecutive frames must be removed in addition to noise. Motion vectors introduce historical frame information for reuse in the time domain and play a key role in removing inter-frame flicker. The conventional screen-space motion vector (as used in SVGF) is a two-dimensional vector that points from each pixel coordinate of the current frame to the corresponding pixel coordinate of the object in the previous frame, but it fails at shadows, glossy material reflections, and motion occlusion. A motion occlusion region, in particular, is defined as the background region that was occluded in the previous frame and has just been revealed in the current frame as the moving object moves. In this region the SVGF motion vector always points to the moving foreground object of the previous frame, which makes the temporal information unreusable and causes severe trailing (ghosting).
Disclosure of Invention
In order to solve the technical problems in the background art, the invention provides a denoising method and system for ray tracing rendering continuous frames, which can remove shading aliasing, over-blurring and trailing and obtain a high-quality denoising result while producing a noise-free, temporally stable frame sequence.
In order to achieve the purpose, the invention adopts the following technical scheme:
the first aspect of the present invention provides a method for denoising ray tracing rendering continuous frames, comprising:
acquiring a double motion vector of a current frame and a denoising result of a previous frame, and calculating a distortion reprojection of the previous frame;
acquiring a noise image and auxiliary characteristics of a current frame, combining with the distortion reprojection of the previous frame, and splicing into a tensor;
inputting the tensor into a space-time multi-scale denoising network to obtain a denoising result of the current frame; the space-time multi-scale denoising network comprises a nuclear prediction network and a space-time multi-scale mixing network;
the kernel prediction network is used for extracting the features of the tensor to obtain a pixel-by-pixel space kernel, a pixel-by-pixel time kernel, a time mixing weight and an interlayer mixing weight;
the space-time multi-scale mixing network applies pixel-by-pixel space kernels to the multi-scale noise map, and obtains a space multi-scale mixing result by weighting and summing interlayer mixing weights; and meanwhile, applying the pixel-by-pixel time kernel to the distortion reprojection of the previous frame to obtain a time filtering result, using the time mixing weight to perform weighted summation on the space multi-scale mixing result and the time filtering result to obtain a space-time multi-scale mixing result, and taking the space-time multi-scale mixing result as the denoising result of the current frame.
Further, the specific method for computing the warped reprojection of the previous frame is as follows:
creating a two-dimensional tensor storing pixel coordinates and adding it directly to the dual motion vectors to obtain the pixel-coordinate mapping between the current frame and the previous frame;
and using the pixel-coordinate mapping to quickly align the denoising result of the previous frame with the current frame, obtaining the warped reprojection of the previous frame.
Further, the loss function of the space-time multi-scale denoising network is the sum of the spatial loss, the temporal loss and the motion occlusion loss.
Further, the motion occlusion loss is calculated as follows:
obtaining the motion vectors of the current frame and several adjacent frames, as well as the denoising result and the reference image of the current frame;
calculating the time-domain progressive weighted mask of the current frame based on the motion vectors of the current frame and the adjacent frames;
multiplying the time-domain progressive weighted mask element-wise (Hadamard product) with the denoising result of the current frame and with the reference image;
and calculating the motion occlusion loss from the masked denoising result and the masked reference image.
Further, the time-domain progressive weighted mask of the current frame is a weighted sum of several masks;
and each mask is the difference between the dual motion vector of the current frame or of an adjacent frame and the corresponding SVGF motion vector.
Further, the auxiliary features include the albedo, depth and normal of the current frame.
A second aspect of the present invention provides a denoising system for ray tracing rendering continuous frames, comprising:
a warped reprojection calculation module configured to: acquire the dual motion vector of the current frame and the denoising result of the previous frame, and compute the warped reprojection of the previous frame;
a stitching module configured to: acquire the noise map and auxiliary features of the current frame, combine them with the warped reprojection of the previous frame, and concatenate them into a tensor;
a spatio-temporal multi-scale denoising module configured to: input the tensor into a space-time multi-scale denoising network to obtain the denoising result of the current frame;
the space-time multi-scale denoising network comprises a kernel prediction network and a space-time multi-scale mixing network;
the kernel prediction network is used for extracting features from the tensor to obtain a pixel-by-pixel spatial kernel, a pixel-by-pixel temporal kernel, a temporal mixing weight and inter-layer mixing weights;
the space-time multi-scale mixing network applies the pixel-by-pixel spatial kernels to the multi-scale noise maps and performs a weighted summation with the inter-layer mixing weights to obtain the spatial multi-scale mixing result; meanwhile, it applies the pixel-by-pixel temporal kernel to the warped reprojection of the previous frame to obtain the temporal filtering result; the temporal mixing weight is then used to weight and sum the spatial multi-scale mixing result and the temporal filtering result to obtain the space-time multi-scale mixing result, which is taken as the denoising result of the current frame.
A third aspect of the present invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program performs the steps of the above method for denoising ray tracing rendered continuous frames.
A fourth aspect of the present invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor; when executing the program, the processor implements the steps of the above method for denoising ray tracing rendered continuous frames.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a denoising method for ray tracing rendering continuous frames, which can be suitable for real-time denoising under sparse MC path tracing indirect illumination, can remove coloring aliasing and over-blurring existing in the existing post-processing denoising technology and trailing phenomena caused by using SVGFs (scalable vector graphics) motion vectors on the premise of obtaining a sequence frame which is free of noise and stable in time, and obtains a high-quality denoising result.
The invention provides a denoising method for ray tracing rendering continuous frames, which uses a double-motion vector method in a time reprojection stage, takes a denoising result of a previous frame as input, uses double-motion vectors to rapidly distort and align the double-motion vectors with a current frame, and obtains a distorted reprojected image of the previous frame which is more reasonable than the prior network-based real-time denoising technology.
The invention provides a denoising method for ray tracing rendering continuous frames, which provides a mask-based motion occlusion loss function, can fully utilize SVGF motion vectors and time-reliable semantic information of double motion vectors, quickly calculate and obtain a weighted mask with gradual time domain, accelerate convergence of a network at a motion occlusion area, and eliminate a tailing phenomenon of the area.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention without limiting it.
Fig. 1 is a schematic diagram of dual motion vectors according to a first embodiment of the present invention;
FIG. 2 is a schematic diagram of a motion occlusion region of a warped re-projection obtained by different motion vectors according to a first embodiment of the present invention;
FIG. 3 is a diagram of the visualization result of different motion vectors and the generated warped reprojection result according to the first embodiment of the present invention;
FIG. 4 is a schematic diagram of a mask used in the loss function according to the first embodiment of the present invention;
FIG. 5 is a flowchart illustrating a method for denoising ray tracing rendered continuous frames according to a first embodiment of the present invention;
FIG. 6 is a diagram of the kernel prediction network architecture according to the first embodiment of the present invention;
FIG. 7 is a comparison graph of denoising results according to the first embodiment of the present invention.
Detailed Description
The invention is further described with reference to the following figures and examples.
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well unless the context clearly indicates otherwise, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of the stated features, steps, operations, devices, components and/or combinations thereof.
Example one
The embodiment provides a denoising method for ray tracing rendering continuous frames, which can be divided into four main stages: a temporal reprojection stage, a kernel prediction stage, a spatio-temporal multi-scale mixing stage and a cyclic feedback stage. The flow chart is shown in Fig. 5, and the method specifically comprises the following steps:
step 1, time re-projection stage: obtaining a time-reliable dual motion vector (TRMV) of a current frame and a denoising result of a previous frame, and calculating a distortion reprojection of the previous frame
Figure BDA0003417211950000061
In the process, the final denoising result of the previous frame is used as input, and the time-reliable double motion vectors are used for performing pixel-by-pixel mapping on the input, so that a distorted reprojection image of the previous frame which is more reasonable and aligned with the current frame is rapidly generated and is respectively used for the input of a neural network and the subsequent pixel-by-pixel weighted mixing.
S101: obtain from the ray tracing renderer the two-dimensional temporally reliable dual motion vector (m_x, m_y) of the current frame. Both components have the resolution of the rendered image (1280 × 720). Its value is computed as shown in Fig. 1 and denotes the offset between the pixel position in the current frame and the corresponding pixel position in the previous frame of the object surface on which the shading point lies, where m_x and m_y are the offsets of the pixel in the current frame relative to the previous frame along the x-axis and y-axis, respectively. On occlusion regions, the dual motion vectors obtained with the computation of Fig. 1 differ from conventional motion vectors (SVGF motion vectors).
S102: obtain the denoising result of the previous frame from the recurrent loop of the network.
S103: create a two-dimensional tensor storing pixel coordinates and add it directly to the dual motion vectors obtained in S101 to obtain, according to equation (1), the pixel-coordinate mapping between the current frame and the previous frame:

p_{i-1} = p_i + (m_x, m_y)   (1)

Then, using the grid_sample function of the PyTorch framework in bilinear interpolation mode and the pixel-coordinate mapping just obtained, the denoising result of the previous frame obtained in S102 is quickly aligned with the current frame, yielding the warped reprojection of the previous frame.
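For illustration, the following is a minimal PyTorch sketch of the warping in S103. The tensor names, shapes, padding mode and the sign convention of the dual motion vector are not specified by this embodiment and are chosen here as assumptions; only grid_sample with bilinear interpolation is taken from the description above.

```python
import torch
import torch.nn.functional as F

def warp_previous_frame(prev_denoised, dual_mv):
    """Warp the previous frame's denoising result to the current frame.

    prev_denoised: (1, 3, H, W) denoising result of frame i-1
    dual_mv:       (1, 2, H, W) dual motion vector (m_x, m_y) in pixels,
                   pointing from current-frame pixels to previous-frame pixels
    """
    _, _, h, w = prev_denoised.shape
    # Two-dimensional tensor storing the pixel coordinates of the current frame
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32),
                            indexing="ij")
    # Equation (1): previous-frame coordinates = current coordinates + dual MV
    src_x = xs + dual_mv[:, 0]          # (1, H, W)
    src_y = ys + dual_mv[:, 1]
    # grid_sample expects sampling coordinates normalized to [-1, 1]
    grid = torch.stack((2.0 * src_x / (w - 1) - 1.0,
                        2.0 * src_y / (h - 1) - 1.0), dim=-1)  # (1, H, W, 2)
    return F.grid_sample(prev_denoised, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)
```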
The key step of the above procedure is S101, and Fig. 1 shows how the SVGF motion vector and the temporally reliable dual motion vector are each computed. As shown in Fig. 1, the moving object (blue circle) moves to the left, and the color of point x_i in the motion occlusion region needs to be obtained. The SVGF motion vector gives the correspondence x_i → y (pointing to the moving object in the previous frame). The dual motion vector instead uses the motion of the occluding object to track the movement y → z from the previous frame to the current frame; then, from the positional relation between x_i and z (red dotted arrow), the position x_{i-1} in the previous frame is found, giving the correspondence x_i → x_{i-1}. This process can be expressed simply by equations (2) to (4), where P is the viewport transformation matrix multiplied by the model-view-projection transformation matrix of each frame, and T is the geometric transformation matrix between frames.

x_{i-1} = y + (x_i - z)   (2)

[Equations (3) and (4): formula images not reproduced; they express the tracking of y to z through the per-frame transformation matrices P and the inter-frame geometric transformations T.]
Fig. 2 illustrates how the motion occlusion regions of the warped reprojections obtained with the two motion vectors differ, and Fig. 3 shows the same content visually. A conventional motion vector hugs the edge of the moving object, whereas the dual motion vector leaves a gap on the side opposite to the occluding object's motion. Consequently, in a motion occlusion region the former (the SVGF motion vector) produces a repeated copy of the moving object whose color usually differs greatly from the background, while the latter (the dual motion vector) reuses background information and avoids this large color difference, further reducing the likelihood of trailing in the final denoising result.
The warped reprojection obtained in step 1 is superior to that of other prior art using SVGF motion vectors, because the repeated background region is generally much closer in color to the occluded background than to the foreground moving object, providing more effective and accurate historical temporal information.
Step 2: obtain the 1 spp noise map of the current frame and the auxiliary features of the geometry buffer (G-buffer) (albedo, normal and depth), combine them with the warped reprojection of the previous frame obtained in step 1, and concatenate them into one tensor; input the tensor into the neural-network-based space-time multi-scale denoising network to obtain the denoising result of the current frame. The space-time multi-scale denoising network comprises a kernel prediction network (KPCN) and a space-time multi-scale mixing network.
A multi-scale hierarchical kernel prediction network built on the PyTorch framework is used. Taking as input the MC path tracing rendering result and geometry buffer (G-buffer) produced with the OptiX framework, together with the warped reprojection image of the previous frame from the additional temporal reprojection stage, the hierarchical kernel prediction network is trained to produce spatio-temporal multi-scale pixel-by-pixel kernel weights and mixing weights, which are quickly applied to the current-frame noise maps at different scales and to the warped reprojection image of the previous frame, outputting a high-quality denoised sequence free of noise, flicker and trailing. The specific steps are as follows:
step 201, kernel prediction stage: collecting the current frame 1spp noise map, G buffer assistant features (albedo, normal and depth) and the warped re-projection obtained in step 1
Figure BDA0003417211950000082
And (2) serially splicing a (720, 1280, N) tensor according to the third dimension as an input, sending the tensor into a kernel prediction network to extract the characteristics of the input tensor, and respectively outputting pixel-by-pixel kernels (including a pixel-by-pixel space kernel, a pixel-by-pixel time kernel, a time mixing weight and an interlayer mixing weight) with corresponding image sizes in the three layers from the bottom to the top. Fig. 6 shows details of a kernel prediction network, which is an encoder-decoder structure, using convolution layers of 3 × 3 kernel size and a Relu activation function, using 2 × 2 max pooling downsampling for the first half of each layer, bilinear interpolation upsampling for the second half of each layer, and using a skip connection between layers of the same scale, forming a symmetric U-type network structure. The core prediction network comprises three output layers: a first output layer, i.e., a last layer output (720, 1280, 51 × 3) core, a second output layer, i.e., a second to last layer output (360, 640, 26 × 3) core, and a third output layer, i.e., a third to last layer output (180, 320, 26 × 3) core. Where 51 is 5 × 5 spatial kernel +5 × 5 temporal kernel + temporal blending weight, 26 is 5 × 5 spatial kernel + inter-layer blending weight, and 3 is the number of RGB channels.
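A minimal PyTorch sketch of such a kernel prediction network is given below for illustration. The three output heads, the 3 × 3 convolutions with ReLU, the 2 × 2 max pooling, the bilinear upsampling and the skip connections follow the description above; the channel widths, the number of encoder/decoder levels and the input channel count are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(cin, cout):
    # 3x3 convolutions with ReLU activations, as described above
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(inplace=True))

class KernelPredictionNet(nn.Module):
    """U-shaped encoder-decoder predicting per-pixel kernels and mixing
    weights at three scales. The channel widths (32/64/128) and the number
    of levels are illustrative assumptions."""
    def __init__(self, in_channels):
        super().__init__()
        self.enc1 = conv_block(in_channels, 32)
        self.enc2 = conv_block(32, 64)
        self.enc3 = conv_block(64, 128)
        self.dec3 = conv_block(128, 128)
        self.dec2 = conv_block(128 + 64, 64)
        self.dec1 = conv_block(64 + 32, 32)
        # Output heads: finest layer 51*3 channels (5x5 spatial kernel +
        # 5x5 temporal kernel + temporal mixing weight, per RGB channel);
        # the two coarser layers 26*3 channels (5x5 spatial kernel +
        # inter-layer mixing weight, per RGB channel).
        self.out1 = nn.Conv2d(32, 51 * 3, 1)    # (720, 1280, 51*3)
        self.out2 = nn.Conv2d(64, 26 * 3, 1)    # (360, 640, 26*3)
        self.out3 = nn.Conv2d(128, 26 * 3, 1)   # (180, 320, 26*3)

    def forward(self, x):                        # x: (N, C, 720, 1280)
        e1 = self.enc1(x)
        e2 = self.enc2(F.max_pool2d(e1, 2))      # 2x2 max-pooling downsampling
        e3 = self.enc3(F.max_pool2d(e2, 2))
        d3 = self.dec3(e3)
        up2 = F.interpolate(d3, scale_factor=2, mode="bilinear", align_corners=False)
        d2 = self.dec2(torch.cat([up2, e2], dim=1))   # skip connection
        up1 = F.interpolate(d2, scale_factor=2, mode="bilinear", align_corners=False)
        d1 = self.dec1(torch.cat([up1, e1], dim=1))   # skip connection
        return self.out1(d1), self.out2(d2), self.out3(d3)
```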
Step 202, spatio-temporal multi-scale mixing stage: this stage mainly performs spatial hierarchical mixing and temporal mixing. The space-time multi-scale mixing network applies the three levels of pixel-by-pixel spatial kernels obtained in step 201 to the noise maps of the corresponding sizes and scales, and obtains the spatial multi-scale mixing result by a weighted summation with the inter-layer mixing weights; meanwhile, it applies the pixel-by-pixel temporal kernel to the warped reprojection of the previous frame to obtain the temporal filtering result; the temporal mixing weight is then used to weight and sum the spatial multi-scale mixing result and the temporal filtering result to obtain the spatio-temporal multi-scale mixing result, which is taken as the denoising result of the current frame.
For the spatial mixing, as shown in equation (5), for each inter-layer mix, c denotes the relatively coarse scale, f the relatively fine scale, D the 2 × 2 max-pooling downsampling, U the bilinear upsampling, and i the filtering result after applying a kernel. Using the three 5 × 5 pixel-by-pixel spatial kernels obtained in step 201, the spatial kernels are applied separately to the noise-map inputs at the original size, 1/2 of the original size and 1/4 of the original size to obtain i, and a weighted summation with the inter-layer mixing weights α gives the spatial multi-scale mixing result.
Specifically, the third output layer and the second output layer are mixed first. For this mix, the noise map at 1/2 of the original size (360, 640, 3) of the second output layer is the relatively fine scale, and the noise map at 1/4 of the original size (180, 320, 3) of the third output layer is the relatively coarse scale. The (360, 640, 25 × 3) spatial kernel obtained by the kernel prediction network at the second output layer is applied to the second-output-layer noise map (360, 640, 3) to obtain i_f. Similarly, the (180, 320, 25 × 3) kernel obtained at the third output layer is applied to the third-output-layer noise map (180, 320, 3) to obtain i_c. Finally, the (180, 320, 1) inter-layer mixing weight α_c obtained at the third output layer of the kernel prediction network is used for mixing (because the resolution of the inter-layer mixing weight is the relatively coarse scale, i_f must be downsampled before the weight is applied and then upsampled, whereas i_c is upsampled after the weight is applied directly). The mixed result of the third and second output layers is then hierarchically mixed with the first output layer in the same way. The formula is as follows:

o_p = i_{f,p} - U[α_c[D i_f]]_p + U[α_c[i_c]]_p   (5)

where p is the pixel index at the relatively fine scale and i_{f,p} is the filtering result of the relatively fine scale at pixel p. U[α_c[D i_f]]_p is the value at pixel p of U[α_c[D i_f]], obtained by downsampling the fine-scale filtering result i_f to D i_f, applying the relatively coarse-scale inter-layer mixing weight α_c, and restoring the result to the fine-scale size with the upsampling U. U[α_c[i_c]]_p is the value at pixel p of U[α_c[i_c]], obtained by applying the inter-layer mixing weight α_c to the coarse-scale filtering result i_c at the relatively coarse scale and then upsampling it to the relatively fine-scale resolution. The spatial hierarchical mixing stage thus yields the spatial multi-scale mixing result.
For the temporal mixing, the 5 × 5 pixel-by-pixel temporal kernel obtained in step 201 is applied to the warped reprojection of the previous frame obtained in stage 1 to obtain the temporal filtering result. The temporal mixing weight α_t obtained in step 201 is then used to weight and sum the temporal filtering result with the spatial multi-scale mixing result, giving the spatio-temporal multi-scale mixing result, i.e. the final denoised output of the current frame. The formula is as follows:

[Equation (6): formula image not reproduced; it expresses the α_t-weighted summation of the temporal filtering result and the spatial multi-scale mixing result.]
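A short sketch of this temporal step follows, reusing apply_per_pixel_kernel from the previous sketch. Because the formula image of equation (6) is not reproduced here, the convex combination alpha_t * temporal + (1 - alpha_t) * spatial is an assumption consistent with a single predicted mixing weight, not a verbatim transcription of the equation.

```python
import torch

def spatiotemporal_blend(spatial_result, warped_prev, temporal_kernel, alpha_t):
    """Temporal filtering and final spatio-temporal mix.

    spatial_result:  (N, 3, H, W) spatial multi-scale mixing result
    warped_prev:     (N, 3, H, W) warped reprojection of the previous frame
    temporal_kernel: (N, 75, H, W) predicted 5x5 per-pixel temporal kernel
    alpha_t:         (N, 1, H, W) temporal mixing weight
    """
    # Temporal filtering: apply the per-pixel temporal kernel to the warped
    # reprojection (apply_per_pixel_kernel is defined in the sketch above).
    temporal_result = apply_per_pixel_kernel(warped_prev, temporal_kernel)
    # Weighted summation; the convex-combination form is an assumption.
    return alpha_t * temporal_result + (1.0 - alpha_t) * spatial_result
```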
step 203, a cyclic feedback stage, according to the formula (7), performing the space loss of L1 and the time loss of L1 (delta I) on the space-time multi-scale mixing result obtained in the step 202i=Ii-Ii-1,ΔRi=Ri-Ri-1) And motion occlusion loss LoccIs calculated, back-propagated, and the blended result is stored as in the next iteration
Figure BDA0003417211950000109
For use in step 1. That is, the loss function of the spatio-temporal multi-scale denoising network is the sum of the spatial loss, the temporal loss and the motion occlusion loss:
L=L1(Ii,Ri)+L1(ΔIi,ΔRi)+Locc (7)
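For illustration, a sketch of the loss of equation (7) in PyTorch; the tensor shapes and function names are assumptions, and the occlusion term follows the Hadamard-masked L1 described below.

```python
import torch
import torch.nn.functional as F

def total_loss(I_i, I_prev, R_i, R_prev, occ_mask):
    """Loss of equation (7): L1 spatial loss + L1 temporal loss + motion
    occlusion loss. Image tensors are (N, 3, H, W); occ_mask is the
    time-domain progressive weighted mask, broadcastable to the image shape."""
    spatial = F.l1_loss(I_i, R_i)
    temporal = F.l1_loss(I_i - I_prev, R_i - R_prev)       # ΔI_i vs ΔR_i
    occlusion = F.l1_loss(occ_mask * I_i, occ_mask * R_i)  # Hadamard-masked L1
    return spatial + temporal + occlusion
```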
The motion occlusion loss L_occ is computed as follows: for each frame of the continuous frame sequence, a single-layer mask is obtained from that frame's conventional screen-space motion vector and its temporally reliable dual motion vector; the additional masks of the previous frame and of the frame before it are superimposed with progressively decreasing time-domain weights, and the result is used in the loss between the current-frame denoising result and the reference image. The calculation proceeds as follows:
(1) For the i-th frame, obtain the denoising result I_i of the current frame, the 2048 spp reference image R_i and the following six motion vectors (the motion vectors of the current frame and several adjacent frames): the SVGF motion vectors and the dual motion vectors of the i-th, (i-1)-th and (i-2)-th frames. Some special cases arise, e.g. for the 1st frame of a sequence only two motion vectors are available, and for the 2nd frame only four.
(2) Following the method shown in Fig. 4, masks are computed quickly from the motion vectors obtained in step (1); each mask is the difference between the dual motion vector and the SVGF motion vector of the current frame or an adjacent frame. Specifically, subtracting the SVGF motion vector from the dual motion vector of the i-th frame gives the mask M_i; likewise, subtracting the SVGF motion vector from the dual motion vector of the (i-1)-th frame gives the mask M_{i-1}, and subtracting the SVGF motion vector from the dual motion vector of the (i-2)-th frame gives the mask M_{i-2}. For the special cases, e.g. the 1st frame of a sequence, M_{i-1} and M_{i-2} are set directly to M_i, and for the 2nd frame M_{i-2} is set directly to M_{i-1}.
(3) For progressive accumulation over the time domain, the masks M_i, M_{i-1} and M_{i-2} obtained in step (2) are given different weights (1 for M_i, 0.8 for M_{i-1} and 0.6 for M_{i-2}) and then added directly to obtain the final time-domain progressive weighted mask of the i-th frame.
(4) According to equations (9) and (10), the Hadamard product ⊙ is applied to I_i and R_i obtained in step (1) and to the time-domain progressive weighted mask M obtained in step (3), multiplying elements at the same matrix positions to obtain the masked denoising result I'_i of the current frame and the masked reference image R'_i. According to equation (8), the L1 loss between I'_i and R'_i is computed to obtain the motion occlusion loss L_occ, which is incorporated into the final loss function.

I'_i = M ⊙ I_i   (9)
R'_i = M ⊙ R_i   (10)
L_occ = L1(I'_i, R'_i)   (8)
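The mask construction and the masked L1 term can be sketched in PyTorch as follows; turning the motion-vector difference into a binary per-pixel mask is an assumption, while the per-frame weights 1, 0.8 and 0.6 follow the description above.

```python
import torch

def occlusion_mask(dual_mv, svgf_mv):
    """Single-frame mask: difference between the dual motion vector and the
    SVGF motion vector; it is nonzero only in motion occlusion regions.
    Reducing the 2-channel difference to one binary channel is an assumption."""
    diff = (dual_mv - svgf_mv).abs().sum(dim=1, keepdim=True)  # (N, 1, H, W)
    return (diff > 0).float()

def progressive_weighted_mask(mvs_i, mvs_im1, mvs_im2):
    """Time-domain progressive weighted mask of the i-th frame.
    Each argument is a (dual_mv, svgf_mv) pair for one frame; the weights
    1 / 0.8 / 0.6 follow the description above."""
    m_i   = occlusion_mask(*mvs_i)
    m_im1 = occlusion_mask(*mvs_im1)
    m_im2 = occlusion_mask(*mvs_im2)
    return 1.0 * m_i + 0.8 * m_im1 + 0.6 * m_im2

def motion_occlusion_loss(I_i, R_i, mask):
    # Equations (8)-(10): Hadamard-masked L1 between result and reference
    return torch.nn.functional.l1_loss(mask * I_i, mask * R_i)
```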
Because the occluded region of each frame is narrow and elongated relative to the entire input tensor, the network does not spontaneously pay extra attention to these regions, so they converge insufficiently (at least less than elsewhere) and residual trailing artifacts remain. The motion occlusion loss function provided by this embodiment uses the time-domain progressive weighted mask to restrict the range of the loss computation and raise its weight, which increases the network's attention to these regions, speeds up their convergence, and eliminates the residual trailing.
In this example, the PyTorch framework, Xavier initialization and the Adam optimizer are used, with an initial learning rate of 0.0001. Each training run is 200 epochs. The 1 spp noise maps, 2048 spp reference images, auxiliary feature buffers (albedo, depth, normal) and the two kinds of motion vectors are obtained with the NVIDIA OptiX framework.
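As a sketch of this training configuration (reusing the KernelPredictionNet class from the earlier sketch), the optimizer, initialization and learning rate follow the description above, while the input channel count is an assumption.

```python
import torch
import torch.nn as nn

def init_weights(m):
    # Xavier initialization for convolutional layers
    if isinstance(m, nn.Conv2d):
        nn.init.xavier_uniform_(m.weight)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

# Assumed input channels: 3 noise + 3 albedo + 1 depth + 3 normal
# + 3 warped reprojection = 13 (an assumption, not stated in the text).
model = KernelPredictionNet(in_channels=13)
model.apply(init_weights)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # initial learning rate 0.0001
num_epochs = 200
```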
The data set of this example was captured from 4 different viewpoints of the scene, 281 clips in total, 142 for training and 139 for testing. Each clip contains 10 consecutive frames; the first 9 frames are used for training and the last frame for validation. For the first frame of each clip, the warped reprojection and the denoising result of the previous frame are initialized to the noisy input.
The graphics processor is an NVIDIA GeForce RTX 2080 Ti with 12 GB of video memory, and all rendering results of this embodiment were obtained at a 1280 × 720 screen resolution. The denoising method of this embodiment runs at a real-time rate with an average denoising time of 7.9 ms per frame, which can be broken down into four parts: the network extracts features and predicts the pixel-by-pixel kernels (3.1 ms), the kernels are applied to the multi-scale images (3.4 ms), the spatial and temporal filtering results are mixed (0.8 ms), and the temporal reprojection is computed (0.6 ms).
Compared with the prior art, both the numerical metrics and the visual results are clearly improved: this embodiment reduces the mean squared error (MSE) to 0.0002 and visibly removes the over-blurring, shading aliasing and trailing present in the prior art. Fig. 7 shows the denoising results of this embodiment.
The invention uses a neural-network-based spatio-temporal multi-scale network for temporal and spatial post-processing denoising of indirect illumination from MC path tracing at a real-time sparse sample count (1 spp). Noise and inter-frame flicker are removed by the spatio-temporal multi-scale kernel prediction neural network, and a high-quality, noise-free, temporally stable continuous frame sequence is obtained at a real-time rate; compared with the prior art, shading aliasing and over-blurring are resolved. Besides generating high-quality noise-free images and maintaining temporal stability, a reasonable warped reprojection image of the previous frame is generated by the temporally reliable dual motion vectors (TRMV), and the residual trailing is reduced by the motion occlusion loss function, so the method is suitable for dynamic scenes containing moving objects.
Example two
The embodiment provides a denoising system for ray tracing rendering of continuous frames, which specifically comprises the following modules:
a warped reprojection calculation module configured to: acquire the dual motion vector of the current frame and the denoising result of the previous frame, and compute the warped reprojection of the previous frame;
a stitching module configured to: acquire the noise map and auxiliary features of the current frame, combine them with the warped reprojection of the previous frame, and concatenate them into a tensor;
a spatio-temporal multi-scale denoising module configured to: input the tensor into a space-time multi-scale denoising network to obtain the denoising result of the current frame;
the space-time multi-scale denoising network comprises a kernel prediction network and a space-time multi-scale mixing network;
the kernel prediction network is used for extracting features from the tensor to obtain a pixel-by-pixel spatial kernel, a pixel-by-pixel temporal kernel, a temporal mixing weight and inter-layer mixing weights;
the space-time multi-scale mixing network applies the pixel-by-pixel spatial kernels to the multi-scale noise maps and performs a weighted summation with the inter-layer mixing weights to obtain the spatial multi-scale mixing result; meanwhile, it applies the pixel-by-pixel temporal kernel to the warped reprojection of the previous frame to obtain the temporal filtering result; the temporal mixing weight is then used to weight and sum the spatial multi-scale mixing result and the temporal filtering result to obtain the space-time multi-scale mixing result, which is taken as the denoising result of the current frame.
The loss function of the space-time multi-scale denoising network is the sum of the spatial loss, the temporal loss and the motion occlusion loss.
It should be noted that, each module in the present embodiment corresponds to each step in the first embodiment one to one, and the specific implementation process is the same, which is not described herein again.
EXAMPLE III
The present embodiment provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps in a method for denoising ray-tracing rendered consecutive frames as described in the first embodiment above.
Example four
The present embodiment provides a computer device, which includes a memory, a processor and a computer program stored in the memory and executable on the processor, and the processor executes the program to implement the steps in the denoising method for ray tracing rendering continuous frames as described in the first embodiment.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A denoising method for ray tracing rendering continuous frames, characterized by comprising the following steps:
acquiring the dual motion vector of the current frame and the denoising result of the previous frame, and computing the warped reprojection of the previous frame;
acquiring the noise map and auxiliary features of the current frame, combining them with the warped reprojection of the previous frame, and concatenating them into a tensor;
inputting the tensor into a space-time multi-scale denoising network to obtain the denoising result of the current frame; the space-time multi-scale denoising network comprises a kernel prediction network and a space-time multi-scale mixing network;
the kernel prediction network is used for extracting features from the tensor to obtain a pixel-by-pixel spatial kernel, a pixel-by-pixel temporal kernel, a temporal mixing weight and inter-layer mixing weights;
the space-time multi-scale mixing network applies the pixel-by-pixel spatial kernels to the multi-scale noise maps and performs a weighted summation with the inter-layer mixing weights to obtain the spatial multi-scale mixing result; meanwhile, it applies the pixel-by-pixel temporal kernel to the warped reprojection of the previous frame to obtain the temporal filtering result; and the temporal mixing weight is used to weight and sum the spatial multi-scale mixing result and the temporal filtering result to obtain the space-time multi-scale mixing result, which is taken as the denoising result of the current frame.
2. The method of claim 1, wherein the warped reprojection of the previous frame is computed by:
creating a two-dimensional tensor storing pixel coordinates and adding it directly to the dual motion vectors to obtain the pixel-coordinate mapping between the current frame and the previous frame;
and using the pixel-coordinate mapping to quickly align the denoising result of the previous frame with the current frame, obtaining the warped reprojection of the previous frame.
3. The method of claim 1, wherein the loss function of the spatiotemporal multi-scale denoising network is a sum of spatial loss, temporal loss, and motion occlusion loss.
4. The method as claimed in claim 3, wherein the motion occlusion loss is calculated by:
obtaining the motion vectors of the current frame and several adjacent frames, as well as the denoising result and the reference image of the current frame;
calculating the time-domain progressive weighted mask of the current frame based on the motion vectors of the current frame and the adjacent frames;
multiplying the time-domain progressive weighted mask element-wise (Hadamard product) with the denoising result of the current frame and with the reference image;
and calculating the motion occlusion loss from the masked denoising result and the masked reference image.
5. The method of claim 4, wherein the time-domain progressive weighted mask of the current frame is a weighted sum of a plurality of masks;
and each mask is the difference between the dual motion vector of the current frame or of an adjacent frame and the corresponding SVGF motion vector.
6. The method of claim 1, wherein the auxiliary features include albedo, depth and normal of the current frame.
7. A denoising system for ray tracing rendering continuous frames, characterized by comprising:
a warped reprojection calculation module configured to: acquire the dual motion vector of the current frame and the denoising result of the previous frame, and compute the warped reprojection of the previous frame;
a stitching module configured to: acquire the noise map and auxiliary features of the current frame, combine them with the warped reprojection of the previous frame, and concatenate them into a tensor;
a spatio-temporal multi-scale denoising module configured to: input the tensor into a space-time multi-scale denoising network to obtain the denoising result of the current frame;
the space-time multi-scale denoising network comprises a kernel prediction network and a space-time multi-scale mixing network;
the kernel prediction network is used for extracting features from the tensor to obtain a pixel-by-pixel spatial kernel, a pixel-by-pixel temporal kernel, a temporal mixing weight and inter-layer mixing weights;
the space-time multi-scale mixing network applies the pixel-by-pixel spatial kernels to the multi-scale noise maps and performs a weighted summation with the inter-layer mixing weights to obtain the spatial multi-scale mixing result; meanwhile, it applies the pixel-by-pixel temporal kernel to the warped reprojection of the previous frame to obtain the temporal filtering result; and the temporal mixing weight is used to weight and sum the spatial multi-scale mixing result and the temporal filtering result to obtain the space-time multi-scale mixing result, which is taken as the denoising result of the current frame.
8. The system of claim 7, wherein the loss function of the spatiotemporal multi-scale denoising network is a sum of spatial loss, temporal loss, and motion occlusion loss.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of a method for denoising ray-tracing rendered successive frames according to any one of claims 1 to 6.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the steps in a method of ray-tracing denoising for rendering successive frames as claimed in any of claims 1-6.
CN202111551086.3A 2021-12-17 2021-12-17 Denoising method and system for ray tracing rendering continuous frames Pending CN114240785A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111551086.3A CN114240785A (en) 2021-12-17 2021-12-17 Denoising method and system for ray tracing rendering continuous frames

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111551086.3A CN114240785A (en) 2021-12-17 2021-12-17 Denoising method and system for ray tracing rendering continuous frames

Publications (1)

Publication Number Publication Date
CN114240785A true CN114240785A (en) 2022-03-25

Family

ID=80757700

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111551086.3A Pending CN114240785A (en) 2021-12-17 2021-12-17 Denoising method and system for ray tracing rendering continuous frames

Country Status (1)

Country Link
CN (1) CN114240785A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115187491A (en) * 2022-09-08 2022-10-14 阿里巴巴(中国)有限公司 Image noise reduction processing method, image filtering processing method and device
CN115187491B (en) * 2022-09-08 2023-02-17 阿里巴巴(中国)有限公司 Image denoising processing method, image filtering processing method and device

Similar Documents

Publication Publication Date Title
CN109003282B (en) Image processing method and device and computer storage medium
CN107025632B (en) Image super-resolution reconstruction method and system
US7876378B1 (en) Method and apparatus for filtering video data using a programmable graphics processor
CN110634147B (en) Image matting method based on bilateral guide up-sampling
US9202258B2 (en) Video retargeting using content-dependent scaling vectors
US9565414B2 (en) Efficient stereo to multiview rendering using interleaved rendering
CN107993208A (en) It is a kind of based on sparse overlapping group prior-constrained non local full Variational Image Restoration method
KR101987079B1 (en) Method for removing noise of upscaled moving picture with dynamic parameter based on machine learning
KR100860968B1 (en) Image-resolution-improvement apparatus and method
Li et al. Underwater image high definition display using the multilayer perceptron and color feature-based SRCNN
KR20190059157A (en) Method and Apparatus for Improving Image Quality
WO2005031653A1 (en) Generation of motion blur
JP2013250891A (en) Super-resolution method and apparatus
WO2014008329A1 (en) System and method to enhance and process a digital image
Briedis et al. Neural frame interpolation for rendered content
CN114240785A (en) Denoising method and system for ray tracing rendering continuous frames
Zuo et al. View synthesis with sculpted neural points
Yu et al. Learning to super-resolve blurry images with events
CN114022809A (en) Video motion amplification method based on improved self-coding network
EP3939248B1 (en) Re-timing objects in video via layered neural rendering
Takeda et al. Spatiotemporal video upscaling using motion-assisted steering kernel (mask) regression
Qin An improved super resolution reconstruction method based on initial value estimation
Soh et al. Joint high dynamic range imaging and super-resolution from a single image
CN111899166A (en) Medical hyperspectral microscopic image super-resolution reconstruction method based on deep learning
Zhang et al. Video superresolution reconstruction using iterative back projection with critical-point filters based image matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination