CN114240785A - Denoising method and system for ray tracing rendering continuous frames - Google Patents

Denoising method and system for ray tracing rendering continuous frames

Info

Publication number
CN114240785A
Authority
CN
China
Prior art keywords
denoising
time
result
scale
pixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111551086.3A
Other languages
Chinese (zh)
Inventor
王璐
曾言
徐延宁
孟祥旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN202111551086.3A priority Critical patent/CN114240785A/en
Publication of CN114240785A publication Critical patent/CN114240785A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 - Image enhancement or restoration
    • G06T 5/70 - Denoising; Smoothing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 - Image enhancement or restoration
    • G06T 5/73 - Deblurring; Sharpening
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/20 - Analysis of motion
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20084 - Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention provides a denoising method and system for ray tracing rendering continuous frames. The denoising method comprises the following steps: acquiring the dual motion vector of the current frame and the denoising result of the previous frame, and computing the warped reprojection of the previous frame; acquiring the noise map and auxiliary features of the current frame, combining them with the warped reprojection of the previous frame, and concatenating them into a tensor; inputting the tensor into a space-time multi-scale denoising network to obtain the denoising result of the current frame. The space-time multi-scale denoising network comprises a kernel prediction network and a space-time multi-scale mixing network, which combine the temporal filtering result and the spatial multi-scale mixing result to obtain the denoising result of the current frame. While guaranteeing a high-quality denoising result, the over-blurring, shading aliasing and trailing present in the prior art are eliminated.

Description

Denoising method and system for ray tracing rendering continuous frames
Technical Field
The invention belongs to the technical field of post-processing denoising for highly realistic rendering, and particularly relates to a denoising method and system for ray tracing rendering continuous frames.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Ray tracing techniques based on Monte Carlo (MC) integration are widely used in physics-based photorealistic rendering. Conventional MC path tracing requires time-consuming computation and a large number of samples per pixel (spp) to obtain an acceptable rendering result; however, owing to the real-time frame-rate requirement (>30 FPS) and the limitations of current hardware, real-time ray tracing (RTRT) can only use 1 spp, and such an extremely low sampling rate leads to high variance that appears visually as disturbing noise.
Some approaches, such as temporal anti-aliasing (TAA) and neural-network-based post-processing denoising methods, have been proposed so that a continuous sequence of noise-free ray-traced rendered frames can be obtained at interactive or even real-time rates. Owing to the characteristics of ray tracing rendering of consecutive frames, flicker between consecutive frames must be removed in addition to noise. Motion vectors introduce historical frame information for reuse in the time domain and play a key role in removing inter-frame flicker. The conventional screen-space motion vector (as used in SVGF) is a two-dimensional vector that points from each pixel coordinate of the current frame to the corresponding pixel coordinate of the object in the previous frame, but it fails at shadows, glossy material reflections, and motion occlusion. A motion occlusion region, in particular, is defined as the background region that was occluded in the previous frame and has just been revealed in the current frame as the moving object moves. In this region the SVGF motion vector always points to the moving foreground object of the previous frame, which makes the temporal information unreusable and causes severe trailing (ghosting).
Disclosure of Invention
In order to solve the technical problems in the background art, the invention provides a denoising method and system for ray tracing rendering continuous frames, which can remove shading aliasing, over-blurring and trailing and obtain a high-quality denoising result while producing a noise-free, temporally stable frame sequence.
In order to achieve the purpose, the invention adopts the following technical scheme:
the first aspect of the present invention provides a method for denoising ray tracing rendering continuous frames, comprising:
acquiring a double motion vector of a current frame and a denoising result of a previous frame, and calculating a distortion reprojection of the previous frame;
acquiring a noise image and auxiliary characteristics of a current frame, combining with the distortion reprojection of the previous frame, and splicing into a tensor;
inputting the tensor into a space-time multi-scale denoising network to obtain a denoising result of the current frame; the space-time multi-scale denoising network comprises a nuclear prediction network and a space-time multi-scale mixing network;
the kernel prediction network is used for extracting the features of the tensor to obtain a pixel-by-pixel space kernel, a pixel-by-pixel time kernel, a time mixing weight and an interlayer mixing weight;
the space-time multi-scale mixing network applies pixel-by-pixel space kernels to the multi-scale noise map, and obtains a space multi-scale mixing result by weighting and summing interlayer mixing weights; and meanwhile, applying the pixel-by-pixel time kernel to the distortion reprojection of the previous frame to obtain a time filtering result, using the time mixing weight to perform weighted summation on the space multi-scale mixing result and the time filtering result to obtain a space-time multi-scale mixing result, and taking the space-time multi-scale mixing result as the denoising result of the current frame.
Further, the specific method for computing the warped reprojection of the previous frame is as follows:
creating a two-dimensional tensor storing pixel coordinates and adding it directly to the dual motion vectors to obtain the pixel-coordinate mapping between the current frame and the previous frame;
and using the pixel-coordinate mapping to quickly align the denoising result of the previous frame with the current frame, obtaining the warped reprojection of the previous frame.
Further, the loss function of the space-time multi-scale denoising network is the sum of the spatial loss, the temporal loss and the motion occlusion loss.
Further, the motion occlusion loss is calculated as follows:
obtaining the motion vectors of the current frame and several adjacent frames, as well as the denoising result and the reference image of the current frame;
calculating the time-domain progressive weighted mask of the current frame based on the motion vectors of the current frame and the adjacent frames;
multiplying the time-domain progressive weighted mask element-wise (Hadamard product) with the denoising result of the current frame and with the reference image;
and calculating the motion occlusion loss from the masked denoising result and the masked reference image.
Further, the time-domain progressive weighted mask of the current frame is a weighted sum of several masks;
and each mask is the difference between the dual motion vector of the current frame or of an adjacent frame and the corresponding SVGF motion vector.
Further, the auxiliary features include the albedo, depth and normal of the current frame.
A second aspect of the present invention provides a denoising system for ray tracing rendering continuous frames, comprising:
a warped reprojection calculation module configured to: acquire the dual motion vector of the current frame and the denoising result of the previous frame, and compute the warped reprojection of the previous frame;
a stitching module configured to: acquire the noise map and auxiliary features of the current frame, combine them with the warped reprojection of the previous frame, and concatenate them into a tensor;
a spatio-temporal multi-scale denoising module configured to: input the tensor into a space-time multi-scale denoising network to obtain the denoising result of the current frame;
the space-time multi-scale denoising network comprises a kernel prediction network and a space-time multi-scale mixing network;
the kernel prediction network is used for extracting features from the tensor to obtain a pixel-by-pixel spatial kernel, a pixel-by-pixel temporal kernel, a temporal mixing weight and inter-layer mixing weights;
the space-time multi-scale mixing network applies the pixel-by-pixel spatial kernels to the multi-scale noise maps and performs a weighted summation with the inter-layer mixing weights to obtain the spatial multi-scale mixing result; meanwhile, it applies the pixel-by-pixel temporal kernel to the warped reprojection of the previous frame to obtain the temporal filtering result; the temporal mixing weight is then used to weight and sum the spatial multi-scale mixing result and the temporal filtering result to obtain the space-time multi-scale mixing result, which is taken as the denoising result of the current frame.
A third aspect of the present invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program performs the steps of the above method for denoising ray tracing rendered continuous frames.
A fourth aspect of the present invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor; when executing the program, the processor implements the steps of the above method for denoising ray tracing rendered continuous frames.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a denoising method for ray tracing rendering continuous frames, which can be suitable for real-time denoising under sparse MC path tracing indirect illumination, can remove coloring aliasing and over-blurring existing in the existing post-processing denoising technology and trailing phenomena caused by using SVGFs (scalable vector graphics) motion vectors on the premise of obtaining a sequence frame which is free of noise and stable in time, and obtains a high-quality denoising result.
The invention provides a denoising method for ray tracing rendering continuous frames, which uses a double-motion vector method in a time reprojection stage, takes a denoising result of a previous frame as input, uses double-motion vectors to rapidly distort and align the double-motion vectors with a current frame, and obtains a distorted reprojected image of the previous frame which is more reasonable than the prior network-based real-time denoising technology.
The invention provides a denoising method for ray tracing rendering continuous frames, which provides a mask-based motion occlusion loss function, can fully utilize SVGF motion vectors and time-reliable semantic information of double motion vectors, quickly calculate and obtain a weighted mask with gradual time domain, accelerate convergence of a network at a motion occlusion area, and eliminate a tailing phenomenon of the area.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention without limiting it.
Fig. 1 is a schematic diagram of dual motion vectors according to a first embodiment of the present invention;
FIG. 2 is a schematic diagram of a motion occlusion region of a warped re-projection obtained by different motion vectors according to a first embodiment of the present invention;
FIG. 3 is a diagram of the visualization result of different motion vectors and the generated warped reprojection result according to the first embodiment of the present invention;
FIG. 4 is a schematic diagram of a mask used in the loss function according to the first embodiment of the present invention;
FIG. 5 is a flowchart illustrating a method for denoising ray tracing rendered continuous frames according to a first embodiment of the present invention;
FIG. 6 is a diagram of the kernel prediction network architecture according to the first embodiment of the present invention;
FIG. 7 is a comparison graph of denoising results according to the first embodiment of the present invention.
Detailed Description
The invention is further described with reference to the following figures and examples.
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well unless the context clearly indicates otherwise, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of the stated features, steps, operations, devices, components and/or combinations thereof.
Example one
The embodiment provides a denoising method for ray tracing rendering continuous frames, which can be divided into four main stages: a temporal reprojection stage, a kernel prediction stage, a spatio-temporal multi-scale mixing stage and a cyclic feedback stage. The flow chart is shown in Fig. 5, and the method specifically comprises the following steps:
step 1, time re-projection stage: obtaining a time-reliable dual motion vector (TRMV) of a current frame and a denoising result of a previous frame, and calculating a distortion reprojection of the previous frame
Figure BDA0003417211950000061
In the process, the final denoising result of the previous frame is used as input, and the time-reliable double motion vectors are used for performing pixel-by-pixel mapping on the input, so that a distorted reprojection image of the previous frame which is more reasonable and aligned with the current frame is rapidly generated and is respectively used for the input of a neural network and the subsequent pixel-by-pixel weighted mixing.
S101: obtain from the ray tracing renderer the two-dimensional temporally reliable dual motion vector (m_x, m_y) of the current frame. Both components have the resolution of the rendered image (1280 × 720). Its value is computed as shown in Fig. 1 and denotes the offset between the pixel position in the current frame and the corresponding pixel position in the previous frame of the object surface on which the shading point lies, where m_x and m_y are the offsets of the pixel in the current frame relative to the previous frame along the x-axis and y-axis, respectively. On occlusion regions, the dual motion vectors obtained with the computation of Fig. 1 differ from conventional motion vectors (SVGF motion vectors).
S102: obtain the denoising result of the previous frame from the recurrent loop of the network.
S103: create a two-dimensional tensor storing pixel coordinates and add it directly to the dual motion vectors obtained in S101 to obtain, according to equation (1), the pixel-coordinate mapping between the current frame and the previous frame:

p_{i-1} = p_i + (m_x, m_y)   (1)

Then, using the grid_sample function of the PyTorch framework in bilinear interpolation mode and the pixel-coordinate mapping just obtained, the denoising result of the previous frame obtained in S102 is quickly aligned with the current frame, yielding the warped reprojection of the previous frame.
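For illustration, the following is a minimal PyTorch sketch of the warping in S103. The tensor names, shapes, padding mode and the sign convention of the dual motion vector are not specified by this embodiment and are chosen here as assumptions; only grid_sample with bilinear interpolation is taken from the description above.

```python
import torch
import torch.nn.functional as F

def warp_previous_frame(prev_denoised, dual_mv):
    """Warp the previous frame's denoising result to the current frame.

    prev_denoised: (1, 3, H, W) denoising result of frame i-1
    dual_mv:       (1, 2, H, W) dual motion vector (m_x, m_y) in pixels,
                   pointing from current-frame pixels to previous-frame pixels
    """
    _, _, h, w = prev_denoised.shape
    # Two-dimensional tensor storing the pixel coordinates of the current frame
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32),
                            indexing="ij")
    # Equation (1): previous-frame coordinates = current coordinates + dual MV
    src_x = xs + dual_mv[:, 0]          # (1, H, W)
    src_y = ys + dual_mv[:, 1]
    # grid_sample expects sampling coordinates normalized to [-1, 1]
    grid = torch.stack((2.0 * src_x / (w - 1) - 1.0,
                        2.0 * src_y / (h - 1) - 1.0), dim=-1)  # (1, H, W, 2)
    return F.grid_sample(prev_denoised, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)
```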
The key step of the above procedure is S101, and Fig. 1 shows how the SVGF motion vector and the temporally reliable dual motion vector are each computed. As shown in Fig. 1, the moving object (blue circle) moves to the left, and the color of point x_i in the motion occlusion region needs to be obtained. The SVGF motion vector gives the correspondence x_i → y (pointing to the moving object in the previous frame). The dual motion vector instead uses the motion of the occluding object to track the movement y → z from the previous frame to the current frame; then, from the positional relation between x_i and z (red dotted arrow), the position x_{i-1} in the previous frame is found, giving the correspondence x_i → x_{i-1}. This process can be expressed simply by equations (2) to (4), where P is the viewport transformation matrix multiplied by the model-view-projection transformation matrix of each frame, and T is the geometric transformation matrix between frames.

x_{i-1} = y + (x_i - z)   (2)

[Equations (3) and (4): formula images not reproduced; they express the tracking of y to z through the per-frame transformation matrices P and the inter-frame geometric transformations T.]
Fig. 2 illustrates how the motion occlusion regions of the warped reprojections obtained with the two motion vectors differ, and Fig. 3 shows the same content visually. A conventional motion vector hugs the edge of the moving object, whereas the dual motion vector leaves a gap on the side opposite to the occluding object's motion. Consequently, in a motion occlusion region the former (the SVGF motion vector) produces a repeated copy of the moving object whose color usually differs greatly from the background, while the latter (the dual motion vector) reuses background information and avoids this large color difference, further reducing the likelihood of trailing in the final denoising result.
The warped reprojection obtained in step 1 is superior to that of other prior art using SVGF motion vectors, because the repeated background region is generally much closer in color to the occluded background than to the foreground moving object, providing more effective and accurate historical temporal information.
Step 2: obtain the 1 spp noise map of the current frame and the auxiliary features of the geometry buffer (G-buffer) (albedo, normal and depth), combine them with the warped reprojection of the previous frame obtained in step 1, and concatenate them into one tensor; input the tensor into the neural-network-based space-time multi-scale denoising network to obtain the denoising result of the current frame. The space-time multi-scale denoising network comprises a kernel prediction network (KPCN) and a space-time multi-scale mixing network.
A multi-scale hierarchical kernel prediction network built on the PyTorch framework is used. Taking as input the MC path tracing rendering result and geometry buffer (G-buffer) produced with the OptiX framework, together with the warped reprojection image of the previous frame from the additional temporal reprojection stage, the hierarchical kernel prediction network is trained to produce spatio-temporal multi-scale pixel-by-pixel kernel weights and mixing weights, which are quickly applied to the current-frame noise maps at different scales and to the warped reprojection image of the previous frame, outputting a high-quality denoised sequence free of noise, flicker and trailing. The specific steps are as follows:
step 201, kernel prediction stage: collecting the current frame 1spp noise map, G buffer assistant features (albedo, normal and depth) and the warped re-projection obtained in step 1
Figure BDA0003417211950000082
And (2) serially splicing a (720, 1280, N) tensor according to the third dimension as an input, sending the tensor into a kernel prediction network to extract the characteristics of the input tensor, and respectively outputting pixel-by-pixel kernels (including a pixel-by-pixel space kernel, a pixel-by-pixel time kernel, a time mixing weight and an interlayer mixing weight) with corresponding image sizes in the three layers from the bottom to the top. Fig. 6 shows details of a kernel prediction network, which is an encoder-decoder structure, using convolution layers of 3 × 3 kernel size and a Relu activation function, using 2 × 2 max pooling downsampling for the first half of each layer, bilinear interpolation upsampling for the second half of each layer, and using a skip connection between layers of the same scale, forming a symmetric U-type network structure. The core prediction network comprises three output layers: a first output layer, i.e., a last layer output (720, 1280, 51 × 3) core, a second output layer, i.e., a second to last layer output (360, 640, 26 × 3) core, and a third output layer, i.e., a third to last layer output (180, 320, 26 × 3) core. Where 51 is 5 × 5 spatial kernel +5 × 5 temporal kernel + temporal blending weight, 26 is 5 × 5 spatial kernel + inter-layer blending weight, and 3 is the number of RGB channels.
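A minimal PyTorch sketch of such a kernel prediction network is given below for illustration. The three output heads, the 3 × 3 convolutions with ReLU, the 2 × 2 max pooling, the bilinear upsampling and the skip connections follow the description above; the channel widths, the number of encoder/decoder levels and the input channel count are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(cin, cout):
    # 3x3 convolutions with ReLU activations, as described above
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(inplace=True))

class KernelPredictionNet(nn.Module):
    """U-shaped encoder-decoder predicting per-pixel kernels and mixing
    weights at three scales. The channel widths (32/64/128) and the number
    of levels are illustrative assumptions."""
    def __init__(self, in_channels):
        super().__init__()
        self.enc1 = conv_block(in_channels, 32)
        self.enc2 = conv_block(32, 64)
        self.enc3 = conv_block(64, 128)
        self.dec3 = conv_block(128, 128)
        self.dec2 = conv_block(128 + 64, 64)
        self.dec1 = conv_block(64 + 32, 32)
        # Output heads: finest layer 51*3 channels (5x5 spatial kernel +
        # 5x5 temporal kernel + temporal mixing weight, per RGB channel);
        # the two coarser layers 26*3 channels (5x5 spatial kernel +
        # inter-layer mixing weight, per RGB channel).
        self.out1 = nn.Conv2d(32, 51 * 3, 1)    # (720, 1280, 51*3)
        self.out2 = nn.Conv2d(64, 26 * 3, 1)    # (360, 640, 26*3)
        self.out3 = nn.Conv2d(128, 26 * 3, 1)   # (180, 320, 26*3)

    def forward(self, x):                        # x: (N, C, 720, 1280)
        e1 = self.enc1(x)
        e2 = self.enc2(F.max_pool2d(e1, 2))      # 2x2 max-pooling downsampling
        e3 = self.enc3(F.max_pool2d(e2, 2))
        d3 = self.dec3(e3)
        up2 = F.interpolate(d3, scale_factor=2, mode="bilinear", align_corners=False)
        d2 = self.dec2(torch.cat([up2, e2], dim=1))   # skip connection
        up1 = F.interpolate(d2, scale_factor=2, mode="bilinear", align_corners=False)
        d1 = self.dec1(torch.cat([up1, e1], dim=1))   # skip connection
        return self.out1(d1), self.out2(d2), self.out3(d3)
```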
Step 202, spatio-temporal multi-scale mixing stage: this stage mainly performs spatial hierarchical mixing and temporal mixing. The space-time multi-scale mixing network applies the three levels of pixel-by-pixel spatial kernels obtained in step 201 to the noise maps of the corresponding sizes and scales, and obtains the spatial multi-scale mixing result by a weighted summation with the inter-layer mixing weights; meanwhile, it applies the pixel-by-pixel temporal kernel to the warped reprojection of the previous frame to obtain the temporal filtering result; the temporal mixing weight is then used to weight and sum the spatial multi-scale mixing result and the temporal filtering result to obtain the spatio-temporal multi-scale mixing result, which is taken as the denoising result of the current frame.
For the spatial mixing, as shown in equation (5), for each inter-layer mix, c denotes the relatively coarse scale, f the relatively fine scale, D the 2 × 2 max-pooling downsampling, U the bilinear upsampling, and i the filtering result after applying a kernel. Using the three 5 × 5 pixel-by-pixel spatial kernels obtained in step 201, the spatial kernels are applied separately to the noise-map inputs at the original size, 1/2 of the original size and 1/4 of the original size to obtain i, and a weighted summation with the inter-layer mixing weights α gives the spatial multi-scale mixing result.
Specifically, the third output layer and the second output layer are mixed first. For this mix, the noise map at 1/2 of the original size (360, 640, 3) of the second output layer is the relatively fine scale, and the noise map at 1/4 of the original size (180, 320, 3) of the third output layer is the relatively coarse scale. The (360, 640, 25 × 3) spatial kernel obtained by the kernel prediction network at the second output layer is applied to the second-output-layer noise map (360, 640, 3) to obtain i_f. Similarly, the (180, 320, 25 × 3) kernel obtained at the third output layer is applied to the third-output-layer noise map (180, 320, 3) to obtain i_c. Finally, the (180, 320, 1) inter-layer mixing weight α_c obtained at the third output layer of the kernel prediction network is used for mixing (because the resolution of the inter-layer mixing weight is the relatively coarse scale, i_f must be downsampled before the weight is applied and then upsampled, whereas i_c is upsampled after the weight is applied directly). The mixed result of the third and second output layers is then hierarchically mixed with the first output layer in the same way. The formula is as follows:

o_p = i_{f,p} - U[α_c[D i_f]]_p + U[α_c[i_c]]_p   (5)

where p is the pixel index at the relatively fine scale and i_{f,p} is the filtering result of the relatively fine scale at pixel p. U[α_c[D i_f]]_p is the value at pixel p of U[α_c[D i_f]], obtained by downsampling the fine-scale filtering result i_f to D i_f, applying the relatively coarse-scale inter-layer mixing weight α_c, and restoring the result to the fine-scale size with the upsampling U. U[α_c[i_c]]_p is the value at pixel p of U[α_c[i_c]], obtained by applying the inter-layer mixing weight α_c to the coarse-scale filtering result i_c at the relatively coarse scale and then upsampling it to the relatively fine-scale resolution. The spatial hierarchical mixing stage thus yields the spatial multi-scale mixing result.
For the temporal mixing, the 5 × 5 pixel-by-pixel temporal kernel obtained in step 201 is applied to the warped reprojection of the previous frame obtained in stage 1 to obtain the temporal filtering result. The temporal mixing weight α_t obtained in step 201 is then used to weight and sum the temporal filtering result with the spatial multi-scale mixing result, giving the spatio-temporal multi-scale mixing result, i.e. the final denoised output of the current frame. The formula is as follows:

[Equation (6): formula image not reproduced; it expresses the α_t-weighted summation of the temporal filtering result and the spatial multi-scale mixing result.]
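A short sketch of this temporal step follows, reusing apply_per_pixel_kernel from the previous sketch. Because the formula image of equation (6) is not reproduced here, the convex combination alpha_t * temporal + (1 - alpha_t) * spatial is an assumption consistent with a single predicted mixing weight, not a verbatim transcription of the equation.

```python
import torch

def spatiotemporal_blend(spatial_result, warped_prev, temporal_kernel, alpha_t):
    """Temporal filtering and final spatio-temporal mix.

    spatial_result:  (N, 3, H, W) spatial multi-scale mixing result
    warped_prev:     (N, 3, H, W) warped reprojection of the previous frame
    temporal_kernel: (N, 75, H, W) predicted 5x5 per-pixel temporal kernel
    alpha_t:         (N, 1, H, W) temporal mixing weight
    """
    # Temporal filtering: apply the per-pixel temporal kernel to the warped
    # reprojection (apply_per_pixel_kernel is defined in the sketch above).
    temporal_result = apply_per_pixel_kernel(warped_prev, temporal_kernel)
    # Weighted summation; the convex-combination form is an assumption.
    return alpha_t * temporal_result + (1.0 - alpha_t) * spatial_result
```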
step 203, a cyclic feedback stage, according to the formula (7), performing the space loss of L1 and the time loss of L1 (delta I) on the space-time multi-scale mixing result obtained in the step 202i=Ii-Ii-1,ΔRi=Ri-Ri-1) And motion occlusion loss LoccIs calculated, back-propagated, and the blended result is stored as in the next iteration
Figure BDA0003417211950000109
For use in step 1. That is, the loss function of the spatio-temporal multi-scale denoising network is the sum of the spatial loss, the temporal loss and the motion occlusion loss:
L=L1(Ii,Ri)+L1(ΔIi,ΔRi)+Locc (7)
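For illustration, a sketch of the loss of equation (7) in PyTorch; the tensor shapes and function names are assumptions, and the occlusion term follows the Hadamard-masked L1 described below.

```python
import torch
import torch.nn.functional as F

def total_loss(I_i, I_prev, R_i, R_prev, occ_mask):
    """Loss of equation (7): L1 spatial loss + L1 temporal loss + motion
    occlusion loss. Image tensors are (N, 3, H, W); occ_mask is the
    time-domain progressive weighted mask, broadcastable to the image shape."""
    spatial = F.l1_loss(I_i, R_i)
    temporal = F.l1_loss(I_i - I_prev, R_i - R_prev)       # ΔI_i vs ΔR_i
    occlusion = F.l1_loss(occ_mask * I_i, occ_mask * R_i)  # Hadamard-masked L1
    return spatial + temporal + occlusion
```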
The motion occlusion loss L_occ is computed as follows: for each frame of the continuous frame sequence, a single-layer mask is obtained from that frame's conventional screen-space motion vector and its temporally reliable dual motion vector; the additional masks of the previous frame and of the frame before it are superimposed with progressively decreasing time-domain weights, and the result is used in the loss between the current-frame denoising result and the reference image. The calculation proceeds as follows:
(1) For the i-th frame, obtain the denoising result I_i of the current frame, the 2048 spp reference image R_i and the following six motion vectors (the motion vectors of the current frame and several adjacent frames): the SVGF motion vectors and the dual motion vectors of the i-th, (i-1)-th and (i-2)-th frames. Some special cases arise, e.g. for the 1st frame of a sequence only two motion vectors are available, and for the 2nd frame only four.
(2) Following the method shown in Fig. 4, masks are computed quickly from the motion vectors obtained in step (1); each mask is the difference between the dual motion vector and the SVGF motion vector of the current frame or an adjacent frame. Specifically, subtracting the SVGF motion vector from the dual motion vector of the i-th frame gives the mask M_i; likewise, subtracting the SVGF motion vector from the dual motion vector of the (i-1)-th frame gives the mask M_{i-1}, and subtracting the SVGF motion vector from the dual motion vector of the (i-2)-th frame gives the mask M_{i-2}. For the special cases, e.g. the 1st frame of a sequence, M_{i-1} and M_{i-2} are set directly to M_i, and for the 2nd frame M_{i-2} is set directly to M_{i-1}.
(3) For progressive accumulation over the time domain, the masks M_i, M_{i-1} and M_{i-2} obtained in step (2) are given different weights (1 for M_i, 0.8 for M_{i-1} and 0.6 for M_{i-2}) and then added directly to obtain the final time-domain progressive weighted mask of the i-th frame.
(4) According to equations (9) and (10), the Hadamard product ⊙ is applied to I_i and R_i obtained in step (1) and to the time-domain progressive weighted mask M obtained in step (3), multiplying elements at the same matrix positions to obtain the masked denoising result I'_i of the current frame and the masked reference image R'_i. According to equation (8), the L1 loss between I'_i and R'_i is computed to obtain the motion occlusion loss L_occ, which is incorporated into the final loss function.

I'_i = M ⊙ I_i   (9)
R'_i = M ⊙ R_i   (10)
L_occ = L1(I'_i, R'_i)   (8)
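The mask construction and the masked L1 term can be sketched in PyTorch as follows; turning the motion-vector difference into a binary per-pixel mask is an assumption, while the per-frame weights 1, 0.8 and 0.6 follow the description above.

```python
import torch

def occlusion_mask(dual_mv, svgf_mv):
    """Single-frame mask: difference between the dual motion vector and the
    SVGF motion vector; it is nonzero only in motion occlusion regions.
    Reducing the 2-channel difference to one binary channel is an assumption."""
    diff = (dual_mv - svgf_mv).abs().sum(dim=1, keepdim=True)  # (N, 1, H, W)
    return (diff > 0).float()

def progressive_weighted_mask(mvs_i, mvs_im1, mvs_im2):
    """Time-domain progressive weighted mask of the i-th frame.
    Each argument is a (dual_mv, svgf_mv) pair for one frame; the weights
    1 / 0.8 / 0.6 follow the description above."""
    m_i   = occlusion_mask(*mvs_i)
    m_im1 = occlusion_mask(*mvs_im1)
    m_im2 = occlusion_mask(*mvs_im2)
    return 1.0 * m_i + 0.8 * m_im1 + 0.6 * m_im2

def motion_occlusion_loss(I_i, R_i, mask):
    # Equations (8)-(10): Hadamard-masked L1 between result and reference
    return torch.nn.functional.l1_loss(mask * I_i, mask * R_i)
```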
Because the occluded region of each frame is narrow and elongated relative to the entire input tensor, the network does not spontaneously pay extra attention to these regions, so they converge insufficiently (at least less than elsewhere) and residual trailing artifacts remain. The motion occlusion loss function provided by this embodiment uses the time-domain progressive weighted mask to restrict the range of the loss computation and raise its weight, which increases the network's attention to these regions, speeds up their convergence, and eliminates the residual trailing.
In this example, the PyTorch framework, Xavier initialization and the Adam optimizer are used, with an initial learning rate of 0.0001. Each training run is 200 epochs. The 1 spp noise maps, 2048 spp reference images, auxiliary feature buffers (albedo, depth, normal) and the two kinds of motion vectors are obtained with the NVIDIA OptiX framework.
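As a sketch of this training configuration (reusing the KernelPredictionNet class from the earlier sketch), the optimizer, initialization and learning rate follow the description above, while the input channel count is an assumption.

```python
import torch
import torch.nn as nn

def init_weights(m):
    # Xavier initialization for convolutional layers
    if isinstance(m, nn.Conv2d):
        nn.init.xavier_uniform_(m.weight)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

# Assumed input channels: 3 noise + 3 albedo + 1 depth + 3 normal
# + 3 warped reprojection = 13 (an assumption, not stated in the text).
model = KernelPredictionNet(in_channels=13)
model.apply(init_weights)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # initial learning rate 0.0001
num_epochs = 200
```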
The data set of this example was captured from 4 different viewpoints of the scene, 281 clips in total, 142 for training and 139 for testing. Each clip contains 10 consecutive frames; the first 9 frames are used for training and the last frame for validation. For the first frame of each clip, the warped reprojection and the denoising result of the previous frame are initialized to the noisy input.
The graphics processor is an NVIDIA GeForce RTX 2080 Ti with 12 GB of video memory, and all rendering results of this embodiment were obtained at a 1280 × 720 screen resolution. The denoising method of this embodiment runs at a real-time rate with an average denoising time of 7.9 ms per frame, which can be broken down into four parts: the network extracts features and predicts the pixel-by-pixel kernels (3.1 ms), the kernels are applied to the multi-scale images (3.4 ms), the spatial and temporal filtering results are mixed (0.8 ms), and the temporal reprojection is computed (0.6 ms).
Compared with the prior art, both the numerical metrics and the visual results are clearly improved: this embodiment reduces the mean squared error (MSE) to 0.0002 and visibly removes the over-blurring, shading aliasing and trailing present in the prior art. Fig. 7 shows the denoising results of this embodiment.
The invention uses a neural-network-based spatio-temporal multi-scale network for temporal and spatial post-processing denoising of indirect illumination from MC path tracing at a real-time sparse sample count (1 spp). Noise and inter-frame flicker are removed by the spatio-temporal multi-scale kernel prediction neural network, and a high-quality, noise-free, temporally stable continuous frame sequence is obtained at a real-time rate; compared with the prior art, shading aliasing and over-blurring are resolved. Besides generating high-quality noise-free images and maintaining temporal stability, a reasonable warped reprojection image of the previous frame is generated by the temporally reliable dual motion vectors (TRMV), and the residual trailing is reduced by the motion occlusion loss function, so the method is suitable for dynamic scenes containing moving objects.
Example two
The embodiment provides a denoising system for ray tracing rendering of continuous frames, which specifically comprises the following modules:
a warped reprojection calculation module configured to: acquire the dual motion vector of the current frame and the denoising result of the previous frame, and compute the warped reprojection of the previous frame;
a stitching module configured to: acquire the noise map and auxiliary features of the current frame, combine them with the warped reprojection of the previous frame, and concatenate them into a tensor;
a spatio-temporal multi-scale denoising module configured to: input the tensor into a space-time multi-scale denoising network to obtain the denoising result of the current frame;
the space-time multi-scale denoising network comprises a kernel prediction network and a space-time multi-scale mixing network;
the kernel prediction network is used for extracting features from the tensor to obtain a pixel-by-pixel spatial kernel, a pixel-by-pixel temporal kernel, a temporal mixing weight and inter-layer mixing weights;
the space-time multi-scale mixing network applies the pixel-by-pixel spatial kernels to the multi-scale noise maps and performs a weighted summation with the inter-layer mixing weights to obtain the spatial multi-scale mixing result; meanwhile, it applies the pixel-by-pixel temporal kernel to the warped reprojection of the previous frame to obtain the temporal filtering result; the temporal mixing weight is then used to weight and sum the spatial multi-scale mixing result and the temporal filtering result to obtain the space-time multi-scale mixing result, which is taken as the denoising result of the current frame.
The loss function of the space-time multi-scale denoising network is the sum of the spatial loss, the temporal loss and the motion occlusion loss.
It should be noted that, each module in the present embodiment corresponds to each step in the first embodiment one to one, and the specific implementation process is the same, which is not described herein again.
EXAMPLE III
The present embodiment provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps in a method for denoising ray-tracing rendered consecutive frames as described in the first embodiment above.
Example four
The present embodiment provides a computer device, which includes a memory, a processor and a computer program stored in the memory and executable on the processor, and the processor executes the program to implement the steps in the denoising method for ray tracing rendering continuous frames as described in the first embodiment.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A denoising method for ray tracing rendering continuous frames, characterized by comprising the following steps:
acquiring the dual motion vector of the current frame and the denoising result of the previous frame, and computing the warped reprojection of the previous frame;
acquiring the noise map and auxiliary features of the current frame, combining them with the warped reprojection of the previous frame, and concatenating them into a tensor;
inputting the tensor into a space-time multi-scale denoising network to obtain the denoising result of the current frame; the space-time multi-scale denoising network comprises a kernel prediction network and a space-time multi-scale mixing network;
the kernel prediction network is used for extracting features from the tensor to obtain a pixel-by-pixel spatial kernel, a pixel-by-pixel temporal kernel, a temporal mixing weight and inter-layer mixing weights;
the space-time multi-scale mixing network applies the pixel-by-pixel spatial kernels to the multi-scale noise maps and performs a weighted summation with the inter-layer mixing weights to obtain the spatial multi-scale mixing result; meanwhile, it applies the pixel-by-pixel temporal kernel to the warped reprojection of the previous frame to obtain the temporal filtering result; and the temporal mixing weight is used to weight and sum the spatial multi-scale mixing result and the temporal filtering result to obtain the space-time multi-scale mixing result, which is taken as the denoising result of the current frame.
2. The method of claim 1, wherein the warped reprojection of the previous frame is computed by:
creating a two-dimensional tensor storing pixel coordinates and adding it directly to the dual motion vectors to obtain the pixel-coordinate mapping between the current frame and the previous frame;
and using the pixel-coordinate mapping to quickly align the denoising result of the previous frame with the current frame, obtaining the warped reprojection of the previous frame.
3. The method of claim 1, wherein the loss function of the spatiotemporal multi-scale denoising network is a sum of spatial loss, temporal loss, and motion occlusion loss.
4. The method as claimed in claim 3, wherein the motion occlusion loss is calculated by:
obtaining the motion vectors of the current frame and several adjacent frames, as well as the denoising result and the reference image of the current frame;
calculating the time-domain progressive weighted mask of the current frame based on the motion vectors of the current frame and the adjacent frames;
multiplying the time-domain progressive weighted mask element-wise (Hadamard product) with the denoising result of the current frame and with the reference image;
and calculating the motion occlusion loss from the masked denoising result and the masked reference image.
5. The method of claim 4, wherein the time-domain progressive weighted mask of the current frame is a weighted sum of a plurality of masks;
and each mask is the difference between the dual motion vector of the current frame or of an adjacent frame and the corresponding SVGF motion vector.
6. The method of claim 1, wherein the auxiliary features include albedo, depth and normal of the current frame.
7. A denoising system for ray tracing rendering continuous frames, characterized by comprising:
a warped reprojection calculation module configured to: acquire the dual motion vector of the current frame and the denoising result of the previous frame, and compute the warped reprojection of the previous frame;
a stitching module configured to: acquire the noise map and auxiliary features of the current frame, combine them with the warped reprojection of the previous frame, and concatenate them into a tensor;
a spatio-temporal multi-scale denoising module configured to: input the tensor into a space-time multi-scale denoising network to obtain the denoising result of the current frame;
the space-time multi-scale denoising network comprises a kernel prediction network and a space-time multi-scale mixing network;
the kernel prediction network is used for extracting features from the tensor to obtain a pixel-by-pixel spatial kernel, a pixel-by-pixel temporal kernel, a temporal mixing weight and inter-layer mixing weights;
the space-time multi-scale mixing network applies the pixel-by-pixel spatial kernels to the multi-scale noise maps and performs a weighted summation with the inter-layer mixing weights to obtain the spatial multi-scale mixing result; meanwhile, it applies the pixel-by-pixel temporal kernel to the warped reprojection of the previous frame to obtain the temporal filtering result; and the temporal mixing weight is used to weight and sum the spatial multi-scale mixing result and the temporal filtering result to obtain the space-time multi-scale mixing result, which is taken as the denoising result of the current frame.
8. The system of claim 7, wherein the loss function of the spatiotemporal multi-scale denoising network is a sum of spatial loss, temporal loss, and motion occlusion loss.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of a method for denoising ray-tracing rendered successive frames according to any one of claims 1 to 6.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the steps in a method of ray-tracing denoising for rendering successive frames as claimed in any of claims 1-6.
CN202111551086.3A 2021-12-17 2021-12-17 Denoising method and system for ray tracing rendering continuous frames Pending CN114240785A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111551086.3A CN114240785A (en) 2021-12-17 2021-12-17 Denoising method and system for ray tracing rendering continuous frames

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111551086.3A CN114240785A (en) 2021-12-17 2021-12-17 Denoising method and system for ray tracing rendering continuous frames

Publications (1)

Publication Number Publication Date
CN114240785A true CN114240785A (en) 2022-03-25

Family

ID=80757700

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111551086.3A Pending CN114240785A (en) 2021-12-17 2021-12-17 Denoising method and system for ray tracing rendering continuous frames

Country Status (1)

Country Link
CN (1) CN114240785A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115187491A (en) * 2022-09-08 2022-10-14 阿里巴巴(中国)有限公司 Image noise reduction processing method, image filtering processing method and device
CN115187491B (en) * 2022-09-08 2023-02-17 阿里巴巴(中国)有限公司 Image denoising processing method, image filtering processing method and device

Similar Documents

Publication Publication Date Title
CN109003282B (en) Image processing method and device and computer storage medium
CN107025632B (en) Image super-resolution reconstruction method and system
US7876378B1 (en) Method and apparatus for filtering video data using a programmable graphics processor
CN110634147B (en) Image matting method based on bilateral guide up-sampling
US9202258B2 (en) Video retargeting using content-dependent scaling vectors
US9565414B2 (en) Efficient stereo to multiview rendering using interleaved rendering
CN107993208A (en) It is a kind of based on sparse overlapping group prior-constrained non local full Variational Image Restoration method
KR101987079B1 (en) Method for removing noise of upscaled moving picture with dynamic parameter based on machine learning
KR100860968B1 (en) Image-resolution-improvement apparatus and method
Li et al. Underwater image high definition display using the multilayer perceptron and color feature-based SRCNN
KR20190059157A (en) Method and Apparatus for Improving Image Quality
WO2005031653A1 (en) Generation of motion blur
JP2013250891A (en) Super-resolution method and apparatus
WO2014008329A1 (en) System and method to enhance and process a digital image
Briedis et al. Neural frame interpolation for rendered content
CN114240785A (en) Denoising method and system for ray tracing rendering continuous frames
Zuo et al. View synthesis with sculpted neural points
Yu et al. Learning to super-resolve blurry images with events
CN114022809A (en) Video motion amplification method based on improved self-coding network
EP3939248B1 (en) Re-timing objects in video via layered neural rendering
Takeda et al. Spatiotemporal video upscaling using motion-assisted steering kernel (mask) regression
Qin An improved super resolution reconstruction method based on initial value estimation
Soh et al. Joint high dynamic range imaging and super-resolution from a single image
CN111899166A (en) Medical hyperspectral microscopic image super-resolution reconstruction method based on deep learning
Zhang et al. Video superresolution reconstruction using iterative back projection with critical-point filters based image matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination