CN113096248B - Photon collection method and photon mapping rendering method based on shared video memory optimization - Google Patents


Info

Publication number: CN113096248B (application number CN202110339915.5A; earlier publication CN113096248A)
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: photon, photons, thread, grid, video memory
Legal status: Active (granted)
Inventors: 周闻达 (Zhou Wenda), 段元兴 (Duan Yuanxing), 李胜 (Li Sheng)
Original and current assignee: Peking University
Application filed by Peking University

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20: Finite element generation, e.g. wire-frame surface description, tessellation
    • G06T17/205: Re-meshing
    • G06T1/00: General purpose image data processing
    • G06T1/60: Memory management
    • G06T15/00: 3D [Three Dimensional] image rendering
    • G06T15/10: Geometric effects
    • G06T15/20: Perspective computation
    • G06T15/205: Image-based rendering


Abstract

The invention discloses a photon collection method and a photon mapping rendering method based on shared video memory optimization. The photon collection method comprises the following steps: 1) dividing the whole three-dimensional scene to be rendered into a hash grid; 2) emitting photons from the light source and, when a photon intersects a diffuse reflection surface in the scene, recording its position, energy and incident direction; 3) calculating the hash value corresponding to each recorded photon position, reordering the photons in ascending order of hash value, and generating an index array; 4) starting from the viewpoint, emitting a ray to each pixel, calculating the first intersection point p of the ray with a diffuse reflection surface of the scene, and recording the position and incident direction at p; 5) dividing the pixels to be calculated into groups and assigning a thread to each pixel; 6) judging whether the hit points of the pixels in a group lie in the same grid and, if so, accelerating with the shared video memory.

Description

Photon collection method and photon mapping rendering method based on shared video memory optimization
Technical Field
The invention belongs to the technical field of software, and relates to a photon collection method and a photon mapping rendering method based on shared video memory (GPU shared memory) optimization in photon mapping.
Background
Ray tracing is the standard approach to global illumination in high-fidelity rendering, but its inherent computational complexity means the Monte Carlo method must be adopted to simulate illumination on complex surface materials. The Monte Carlo method suffers from noise when the number of samples is insufficient, and even a strong neural-network denoising model cannot directly produce good results when too little information is available.
The photon mapping algorithm is one of many variants of the ray tracing algorithm. It originally derived from backward ray tracing; Jensen later refined the algorithm using k-nearest-neighbor estimation (Jensen, H.W.: Global Illumination using Photon Maps. In Rendering Techniques '96, Springer, 1996, pp. 21-30). Photon mapping is a two-pass method. The first pass, called the photon tracing pass, has the light source emit photons carrying energy (also called flux) into the scene, where the photons then interact with surfaces. On intersecting a non-specular surface, a photon dwells and is recorded with a certain probability, stored in a data structure called a photon map. The second pass, called the rendering pass, emits rays from the camera into the scene and, at the intersection points, estimates radiance using the neighboring photons in the photon map (a process called photon density estimation). Unlike Monte Carlo ray tracing, whose defect is noise, the photon mapping method is prone to bias, which manifests as blur. However, in some applications with higher requirements on speed, such as those for immersive Virtual Reality (VR), the human visual system is more sensitive to noise than to blur, since the display glasses in a virtual reality helmet are very close to the eye (i.e., are a near-eye display device). Therefore, although conventional photon mapping is an off-line rendering method, it has the potential to increase speed significantly at the cost of some rendering quality (introducing a certain amount of blur), and can thus be applied in systems that demand speed but tolerate a certain amount of image blur.
Aiming at this application requirement on rendering speed, the invention adopts the photorealistic rendering mode of photon mapping to avoid pixel-level noise, obtains global illumination rendering, improves the visual effect and at the same time achieves a better rendering speed.
Disclosure of Invention
Aiming at the technical problems in the prior art, the invention aims to provide a photon collection method based on shared video memory optimization in photon mapping.
Aiming at the photon tracing and photon collection steps of the photon mapping method, the invention provides a hash-grid-based storage structure and index for photon data in video memory, together with a corresponding fast k-nearest-neighbor (KNN) photon search method for adjacent pixels based on shared video memory. The invention designs a fast photon data access method based on shared video memory that benefits fast photon collection during KNN search, and the method rests on two facts: first, the photon collection processes of neighboring pixels may visit the same region; second, the shared memory on the hardware device is accessed much faster than the global memory. The method mainly divides the pixels of the rendered imaging plane into groups and judges whether the pixels belonging to the same group access the same region; if so, the work is divided among them: a portion of the photons in global video memory is loaded into shared video memory, each thread then accesses the shared video memory for its own calculation, and the whole process repeats until all photons in the region have been loaded.
The main process of the invention is as follows:
1) dividing the whole three-dimensional scene to be rendered into a hash grid;
2) emitting photons from the light source, tracking the traversal of each photon through the scene and, when a photon intersects a diffuse reflection surface, recording its position, energy and incident direction;
3) calculating the hash value corresponding to each recorded photon position, reordering the photons in ascending order of hash value and simultaneously generating an index array;
4) emitting rays from the viewpoint, tracking their traversal through the scene and calculating the first intersection point p of each ray with a diffuse reflection surface of the scene;
5) dividing the pixels into groups, performing density estimation based on shared video memory within each group, and finally calculating the radiance corresponding to each pixel (illumination calculation) to obtain the rendered image.
The technical scheme of the invention is as follows:
a photon collection method based on shared video memory optimization in photon mapping comprises the following steps:
1) dividing the whole three-dimensional scene to be rendered into a hash grid;
2) the light source emits photons, the traversal of each photon through the scene is tracked and, when a photon intersects a diffuse reflection surface in the scene, its position, energy and incident direction are recorded;
3) the hash value corresponding to each recorded photon position is calculated, and the photons are reordered in ascending order of hash value to generate an index array;
4) starting from the viewpoint, a ray is emitted to each pixel and its traversal through the scene is tracked; the first intersection point p of the ray with a diffuse reflection surface of the scene is calculated, and the position and incident direction at p are recorded;
5) the pixels to be calculated are divided into groups and a thread is assigned to each pixel;
6) whether the hit points of the pixels in a group lie in the same grid is judged; if so, the thread group corresponding to that group accesses the same video memory interval, and the shared video memory is used for acceleration: photon information in global video memory is loaded into shared video memory, a thread synchronization is performed after loading completes, and each thread then traverses all photons in shared video memory in a loop to complete photon collection.
Further, in step 6), if the hit points of the pixels in one group are not located in the same grid, each thread in the thread group corresponding to the group performs the calculation of the corresponding pixel according to the traditional flow, and the photon collection is completed.
Further, the method for determining whether the hit points of the pixels in a group are located in the same grid is as follows: first, two shared variables, hashValue and flag, are set. hashValue is initialized by thread No. 0 of the thread group corresponding to the group, its value being the grid number of the hit point of the pixel corresponding to thread No. 0; flag is also initialized by thread No. 0, with value 0. After initialization, the threads in the thread group are synchronized once. Each thread in the group then calculates the grid number of the hit point of its own pixel and compares it with hashValue; if the two are not equal, the current thread and thread No. 0 have hit two different grids, and the thread sets flag to 1 as a mark.
Further, in step 6), the method for accelerating by using the shared video memory to complete photon collection comprises:
61) loading photon information from global video memory into shared video memory in batches; during each batch, each thread loads m photons from global video memory into shared video memory and, after a synchronization, accesses the photons in shared video memory; the size of the shared video memory buffer is m × groupSize photons;
62) each thread maintains a heap of size k, storing the k photons closest to the hit point of that thread's pixel; each time the thread fetches a photon from shared video memory, if the heap currently holds fewer than k photons, the newly accessed photon is inserted into the heap; if the heap already holds k photons and the newly accessed photon is closer than the farthest photon in the heap, the farthest photon is deleted from the heap and the newly accessed photon is inserted;
63) and after all batches are loaded, the photon set stored in the heap of each thread in the thread group is the photon set of the corresponding pixel.
Further, the method for dividing the whole three-dimensional scene to be rendered into the hash grids comprises the following steps: firstly, generating a bounding box of a three-dimensional scene to be rendered, setting the origin of a grid network as the vertex of the bounding box with the minimum coordinate in each dimension, and marking as gridOrigin, and setting the size of the grid network as the size of the bounding box; dividing the size of the scene by the side length of the grid and rounding upwards to obtain the grid number of each dimension of the grid, and recording as gridsize; the coordinates (p.x, p.y, p.z) of any given point p in the bounding box in the world coordinate system are then mapped to the number of the grid to which it belongs.
Further, the grid number hash(p) corresponding to a given point p is calculated with the hash function hash(p) = z × gridSize.x × gridSize.y + y × gridSize.x + x, where x is the offset of the x coordinate p.x of the given point p relative to the x coordinate of the mesh origin, divided by the grid side length and rounded down; y is the offset of the y coordinate p.y relative to the y coordinate of the mesh origin, divided by the grid side length and rounded down; and z is the offset of the z coordinate p.z relative to the z coordinate of the mesh origin, divided by the grid side length and rounded down. gridSize.x is the number of grids in the x-dimension of the grid mesh, and gridSize.y is the number of grids in the y-dimension.
Further, the index array is a starting subscript array startIdx, in which array element startIdx[i] records the number of photons in all grids with numbers less than i; the photons in the grid numbered n occupy the subscripts startIdx[n] through startIdx[n+1] in the photon map, not including startIdx[n+1] itself.
Further, the side length of the grid is the photon search radius r.
A photon mapping rendering method is characterized in that density estimation is carried out on photons collected by the method, the radiance corresponding to each pixel is calculated, and a rendered image is obtained.
Compared with the prior art, the invention has the following positive effects:
Since photon collection is a core step of photon mapping rendering and one of the major bottlenecks affecting rendering speed, increasing photon collection speed has a significant impact on rendering speed. The method of the invention improves photon collection speed in photon mapping and thereby gains an advantage in rendering performance. In addition, the method is also suitable for VR-oriented binocular rendering: since the images of the two eyes are highly similar in binocular rendering, the rendering efficiency of the method, which exploits the similarity of adjacent images, improves further, reflecting the superiority of the method.
In the conventional implementation, each pixel performs its own round of accesses to global video memory. In the algorithm of the present invention, those accesses are redirected to the same number of shared video memory accesses, and only one cooperative round of global video memory access remains. Since shared memory is accessed much faster than global memory, the algorithm achieves an acceleration.
Drawings
FIG. 1 is a flow chart of a photon mapping rendering method of the present invention;
FIG. 2 is a flow chart of step 5;
FIG. 3 is a schematic diagram of pixel division;
(a) is a schematic diagram of a pixel array with 4 pixels in width and height and its index number,
(b) for a grouping scheme of the pixel array in (a),
(c) 4 pixels for the pixel group (0, 0) constituting the upper right corner in (b);
FIG. 4 is a schematic view of the acceleration principle;
(a) the method is a traditional video memory access method, and the method (b) is a video memory access method of the invention.
Detailed Description
The present invention will be described in further detail below with reference to specific examples and the accompanying drawings.
The technical scheme adopted by the invention is shown in figure 1, and the method comprises the following steps:
step 1: the three-dimensional scene space is evenly divided into a large number of cubic grids, numbered from 0, which form a grid network. For convenience, the side length of each grid is set to the photon search radius r, so that each point, when searching, only needs to visit the 3 × 3 × 3 region centered on the small cube containing it in order to find all photons that could contribute.
The entire grid origin position and size are first calculated. Objects in the whole three-dimensional scene usually consist of triangular meshes and can be contained in a bounding box, the origin of the grid mesh can be set as the vertex of the bounding box with the minimum coordinate in each dimension and is marked as gridOrigin, and the size of the grid mesh can be set as the size of the bounding box and is also the size of the scene. An additional grid can be added on both sides of each dimension of the cubic grid network of the scene to avoid the problem of boundary check during the program running process.
Dividing the size of the scene by the side length of the grid and rounding up to obtain the grid number of each dimension of the grid, which is marked as gridSize, and the grid numbers in the three dimensions of x, y and z are respectively marked as gridsize.x, gridsize.y and gridsize.z.
In order to map the coordinates (p.x, p.y, p.z) of any given point p in the bounding box corresponding to the scene in the world coordinate system to the number of the grid to which the given point p belongs, the following hash function is adopted:
hash(p)=z×gridSize.x×gridSize.y+y×gridSize.x+x
the function value is the grid number to which the point p belongs, where
x = floor((p.x - gridOrigin.x) / r)
y = floor((p.y - gridOrigin.y) / r)
z = floor((p.z - gridOrigin.z) / r)
that is, each coordinate of p is offset by the corresponding coordinate of the mesh origin, divided by the grid side length r and rounded down. gridSize.x is the number of grids in the x-dimension of the grid mesh, and gridSize.y is the number of grids in the y-dimension. The hash function determines the order in which the hash grids are numbered: x-direction first, then y-direction, and finally z-direction.
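The mapping from a world-space point to its grid number can be sketched as below. This is an illustrative host-side version; the names gridHash, Vec3 and GridSize are assumptions, not from the patent.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };       // a point, or the grid origin gridOrigin
struct GridSize { int x, y, z; };      // number of grids per dimension

// hash(p) = z*gridSize.x*gridSize.y + y*gridSize.x + x, where x, y, z are
// the point's offsets from the grid origin divided by the side length r
// and rounded down.
int gridHash(const Vec3& p, const Vec3& gridOrigin, double r, const GridSize& gs) {
    int x = (int)std::floor((p.x - gridOrigin.x) / r);
    int y = (int)std::floor((p.y - gridOrigin.y) / r);
    int z = (int)std::floor((p.z - gridOrigin.z) / r);
    return z * gs.x * gs.y + y * gs.x + x;  // numbering order: x, then y, then z
}
```

For a 4 × 4 × 4 grid with side length 1, stepping one cell in x, y or z changes the number by 1, 4 and 16 respectively, matching the numbering order described above.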
Step 2: the method comprises the steps of emitting photons carrying energy from a light source to a scene, enabling the photons to interact with the scene, determining whether the interaction result is reflection, refraction or absorption according to a Russian roulette mode, and recording the energy, intersection position and incidence direction of the photons during each interaction with a diffuse reflection surface.
2-1. recording of photons
The tracing of each photon path is performed by a separate thread. When a photon hits a diffuse reflecting surface, a record of the photon needs to be generated in the array. To avoid conflicts where multiple threads attempt to write to the same location at the same time, there are two solutions.
First, all threads can share an index variable, and each time a photon is to be recorded by a thread, an atomic add (AtomicAdd) operation is used for the index variable, which is provided by a hardware device and is not interrupted, so that each thread obtains a different value each time. According to the obtained value of the index variable, the thread can record the information of the current photon at the corresponding position of the array.
Second, a sufficiently large array is allocated and an exclusive interval is allocated for each thread, and the location of each write is determined by each thread. This is done without any collision at the time of writing, at the cost of using an array that is large enough (number of emitted photons multiplied by the maximum number of reflections) and initializing its content to 0 in its entirety.
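The first recording scheme can be sketched on the host side as follows; std::atomic plays the role of the uninterruptible atomic add (on the GPU this would be CUDA's atomicAdd), and the names PhotonRecord and recordPhoton are illustrative assumptions.

```cpp
#include <atomic>
#include <vector>

// Illustrative photon record: position, energy (flux) and incident direction.
struct PhotonRecord { double pos[3]; double power[3]; double dir[3]; };

// Index variable shared by all tracing threads.
std::atomic<int> nextSlot{0};

// Each thread calls this when a photon lands on a diffuse surface.
// fetch_add is atomic, so every caller receives a distinct slot and
// no two threads ever write the same array entry.
int recordPhoton(std::vector<PhotonRecord>& photonArray, const PhotonRecord& rec) {
    int slot = nextSlot.fetch_add(1);
    photonArray[slot] = rec;
    return slot;
}
```

The second scheme trades this atomic for memory: each thread writes into its own pre-allocated interval of a larger array, so no synchronization is needed at all.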
2-2. Russian roulette
Russian roulette is a commonly used method in photon mapping that gives an unbiased result while avoiding an exponential increase in the number of photons. Consider first a photon whose energy has only one component. For a surface with diffuse reflection coefficient d and specular reflection coefficient s (s + d ≤ 1), a random variable ξ uniformly distributed in the interval [0, 1] is generated, and whether the photon is absorbed, diffusely reflected or specularly reflected is determined by the following relation:
0 ≤ ξ < d: diffuse reflection
d ≤ ξ < d + s: specular reflection
d + s ≤ ξ ≤ 1: absorption
If the photon energy has multiple components (e.g., RGB), the diffuse and specular reflection coefficients need to be recombined:
Pd = (dr·Pr + dg·Pg + db·Pb) / (Pr + Pg + Pb)
Ps = (sr·Pr + sg·Pg + sb·Pb) / (Pr + Pg + Pb)
where (dr, dg, db) are the diffuse reflection coefficients of the RGB components, (sr, sg, sb) the specular reflection coefficients, and (Pr, Pg, Pb) the photon energy in the RGB components; Pd and Ps denote the combined diffuse and specular reflection coefficients respectively. The Russian roulette then becomes:
0 ≤ ξ < Pd: diffuse reflection
Pd ≤ ξ < Pd + Ps: specular reflection
Pd + Ps ≤ ξ ≤ 1: absorption
The energy of the outgoing photon on each component must also be adjusted: it is multiplied by the reflection coefficient of the corresponding component and divided by the corresponding combined coefficient. Taking the R component of a photon as an example, if the determined reflection type is diffuse reflection, the outgoing energy in the R channel is modified to Pr·dr/Pd.
If the determined interaction result is "absorb," then the tracking task for the current path ends and the thread returns. If the result is "specular reflection," the direction of the reflected photon is determined following the law of reflection of light and the path of the reflected photon is continued to be traced. If the result of the decision is "diffuse reflection", then one direction is sampled as the direction of reflection based on the BRDF of the current surface, and then tracking continues.
The number of reflections of the photon needs to be limited, and the tracking process can be ended when the energy of each component of the photon is smaller than a certain threshold, because even if tracking is performed again, the contribution to the final result is small. A maximum number of reflections may also be set and when the number of photon reflections exceeds this value, the thread is terminated.
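The multi-component Russian roulette decision can be sketched as follows. This is a host-side illustration under the combined-coefficient formulation described above; the names, and the choice to return Pd and Ps through out-parameters (so the caller can rescale the outgoing energy), are assumptions.

```cpp
enum class Interaction { Diffuse, Specular, Absorbed };

struct RGB { double r, g, b; };  // per-channel coefficients or photon energy

// Decide the photon's fate from a uniform random number xi in [0, 1].
// d, s: diffuse and specular reflection coefficients per RGB channel;
// P: photon energy per channel. Pd and Ps are the combined coefficients,
// returned so the caller can scale the outgoing energy (e.g. Pr*dr/Pd).
Interaction roulette(const RGB& d, const RGB& s, const RGB& P, double xi,
                     double* Pd_out, double* Ps_out) {
    double total = P.r + P.g + P.b;
    double Pd = (d.r * P.r + d.g * P.g + d.b * P.b) / total;
    double Ps = (s.r * P.r + s.g * P.g + s.b * P.b) / total;
    *Pd_out = Pd;
    *Ps_out = Ps;
    if (xi < Pd)      return Interaction::Diffuse;   // 0 <= xi < Pd
    if (xi < Pd + Ps) return Interaction::Specular;  // Pd <= xi < Pd + Ps
    return Interaction::Absorbed;                    // Pd + Ps <= xi <= 1
}
```

With d = (0.5, 0.5, 0.5), s = (0.2, 0.2, 0.2) and equal channel energies, Pd = 0.5 and Ps = 0.2, so ξ = 0.3 yields diffuse reflection, ξ = 0.6 specular reflection, and ξ = 0.8 absorption.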
And step 3: creating a photonic graph and an index array
3-1 reordering Generation of Photonic maps
In this step, the hash values corresponding to the positions of all recorded photons are calculated, and all the photons are reordered from small to large according to the hash values.
Since the hash value can be recomputed from a photon's position, it need not be stored in the photon record. On the other hand, to minimize the bandwidth overhead of repeatedly moving photon records during sorting, a temporary array can be used, each element of which contains two fields: the hash value of a photon and a pointer to that photon. This array is reordered by hash value with a sorting algorithm. The temporary array is then scanned once, and the photons are read from the corresponding video memory locations into a new array according to the pointer field; this new array is the photon map. All photon data collected during the photon collection process described in the subsequent steps comes from the photon map.
3-2. array of initial subscripts
In addition, a starting subscript array startIdx must be generated, in which array element startIdx[i] records the number of photons in all grids with numbers smaller than i. Since the photons in the photon map are ordered by grid number, startIdx[i] is also the starting subscript of the photons located in the grid numbered i, which makes accessing photon data in the photon map convenient. The photons in the grid numbered n occupy the subscripts startIdx[n] through startIdx[n+1] (not included).
The starting index array can be obtained by:
in the first step, an auxiliary array temp is used, where temp [ i ] represents the number of photons with hash value i (i.e., the number of photons in the grid numbered i). In order to calculate the value of temp array, the hash value of each photon needs to be calculated, and each time a hash value h is obtained, the value of temp [ h ] is added with 1. This is repeated until all photons have been counted.
And secondly, after the temp array is completely calculated, calculating the value of the initial subscript array according to the following formula:
startIdx[i+1]=startIdx[i]+temp[i]
starting from 0, i in this formula is updated in ascending order, with startIdx[0] initialized to 0.
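The two steps above (a histogram pass followed by a running sum) amount to the counting phase of a counting sort. A minimal sketch, with illustrative names:

```cpp
#include <vector>

// Build the starting-subscript array from the photons' hash values.
// photonHashes[j] is the grid number of photon j; numGrids is the total
// number of grid cells. Returns an array of numGrids + 1 entries, where
// the photons of grid i occupy subscripts [startIdx[i], startIdx[i+1]).
std::vector<int> buildStartIdx(const std::vector<int>& photonHashes, int numGrids) {
    // First step: temp[i] = number of photons with hash value i.
    std::vector<int> temp(numGrids, 0);
    for (int h : photonHashes) temp[h] += 1;
    // Second step: startIdx[i+1] = startIdx[i] + temp[i], i ascending from 0.
    std::vector<int> startIdx(numGrids + 1, 0);
    for (int i = 0; i < numGrids; ++i)
        startIdx[i + 1] = startIdx[i] + temp[i];
    return startIdx;
}
```

For hash values {2, 0, 2, 1, 2} over 3 grids, the result is {0, 1, 2, 5}: grid 2's photons occupy subscripts 2 through 4 of the sorted photon map.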
3-3. neighbor offset look-up table
Given the number of a grid, the photon collection process must traverse the 3 × 3 × 3 region centered on that grid. The offsets of the neighbor grid numbers relative to the central grid number are therefore computed in advance and stored in a neighbor offset lookup table (neighbor offset lookup table), replacing dynamic computation with a table lookup where needed. The table contains the following set:
{z×gridSize.x×gridSize.y+y×gridSize.x+x|x,y,z=-1,0,1}
in fact, since grid numbers are consecutive in the x direction, only the smallest offset of each of the 9 rows of neighbors in the x direction needs to be stored; the offsets of the other 18 neighbors can be obtained from these 9 by a single addition. The simplified offset table contains the following set:
{z×gridSize.x×gridSize.y+y×gridSize.x-1|y,z=-1,0,1}
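Both the full 27-entry table and the simplified 9-entry table can be precomputed as sketched below (function names are illustrative; gsx and gsy stand for gridSize.x and gridSize.y):

```cpp
#include <vector>

// Full table: number offsets of all 27 cells of the 3x3x3 region
// relative to the central cell, enumerated z, then y, then x.
std::vector<int> neighborOffsets(int gsx, int gsy) {
    std::vector<int> offsets;
    for (int z = -1; z <= 1; ++z)
        for (int y = -1; y <= 1; ++y)
            for (int x = -1; x <= 1; ++x)
                offsets.push_back(z * gsx * gsy + y * gsx + x);
    return offsets;
}

// Simplified table: only the x = -1 offset of each of the 9 (y, z) rows;
// the other two entries of a row are this value plus 1 and plus 2,
// because grid numbers are consecutive in the x direction.
std::vector<int> simplifiedOffsets(int gsx, int gsy) {
    std::vector<int> offsets;
    for (int z = -1; z <= 1; ++z)
        for (int y = -1; y <= 1; ++y)
            offsets.push_back(z * gsx * gsy + y * gsx - 1);
    return offsets;
}
```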
and 4, step 4: calculating the intersection point p of the ray emitted by the viewpoint and the diffuse reflection surface of the scene
The method of this step is the same as typical ray tracing, starting from a viewpoint, emitting rays to each pixel, then tracing the rays in the scene, if hitting the surface of specular reflection, calculating a new direction according to the law of reflection of light and performing recursive tracing; if a diffuse reflection surface is hit, information on the position of the intersection point p, the incident direction, etc. is recorded.
In conventional implementations, the density estimate is calculated inside the ray tracing channel. To make the photon collection process based on shared video memory optimization clearer and better suited to parallel execution, the photon collection work is stripped out of the ray tracing channel. The information of the hit point p can then be used directly when each thread performs its pixel radiance calculation, without a synchronization after the ray tracing task finishes.
And 5: dividing pixels into groups, performing density estimation based on shared video memory, and finally calculating the corresponding radiance of each pixel (illumination calculation) and obtaining a rendered image
In this step, the invention calculates, in parallel, the radiance of the pixel corresponding to each position p obtained in step 4. The calculation of each pixel is executed by a dedicated thread; the threads are divided into groups, and the calculation of pixels in the same group can be accelerated using the shared memory. The flow chart of grouped density estimation based on shared video memory is shown in fig. 2.
5-1. grouping
The sub-step divides the pixels to be calculated into groups
As shown in fig. 3(a), the pixels to be calculated can be regarded as a two-dimensional array, each pixel has an index value index, and they are sequentially assigned according to the principle of x-direction priority.
These pixels are divided into groups with length blockDim.x in the x direction and blockDim.y in the y direction, so that each group contains blockDim.x × blockDim.y pixels, as shown in fig. 3(b). After the division, the groups form a new coarse-grained grid with gridDim.x columns and gridDim.y rows in total. The column of a group in the new grid is denoted blockIdx.x, and its row blockIdx.y. The two numbers within each group in the figure represent blockIdx.x and blockIdx.y, respectively.
As shown in fig. 3(c), inside each group the pixels also form a grid; the dotted line indicates that the pixel group at the upper right corner of fig. 3(b) is formed of the 4 pixels in fig. 3(c). The column of a pixel within its pixel group is denoted threadIdx.x, and its row threadIdx.y. The two numbers within each pixel in the figure represent threadIdx.x and threadIdx.y, respectively.
The above blockDim, blockIdx, threadIdx, gridDim, etc. are built-in variables provided by CUDA kernel functions and can be used directly inside a kernel.
The thread corresponding to each pixel can determine the index value of the pixel it computes according to the formula: index = blockIdx.x × blockDim.x + threadIdx.x + (blockIdx.y × blockDim.y + threadIdx.y) × blockDim.x × gridDim.x.
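The index formula can be checked on the host by passing the CUDA built-ins in explicitly (the function name pixelIndex is an assumption):

```cpp
// index = blockIdx.x*blockDim.x + threadIdx.x
//       + (blockIdx.y*blockDim.y + threadIdx.y) * blockDim.x * gridDim.x
// i.e. global column plus global row times the image width in pixels.
int pixelIndex(int blockIdxX, int blockIdxY, int blockDimX, int blockDimY,
               int threadIdxX, int threadIdxY, int gridDimX) {
    return blockIdxX * blockDimX + threadIdxX
         + (blockIdxY * blockDimY + threadIdxY) * blockDimX * gridDimX;
}
```

For the 4 × 4 pixel array of fig. 3 with 2 × 2 groups, thread (0, 0) of group (1, 0) computes pixel 2, thread (0, 0) of group (0, 1) computes pixel 8, and thread (1, 1) of group (1, 1) computes pixel 15, consistent with x-direction-first index assignment.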
5-2. coincidence judgment
This step determines whether the hit points of the pixels in a group are located in the same grid, and determines which scheme to use for photon collection.
In order to calculate the radiance of a pixel, the photons around the corresponding point p need to be collected for density estimation. Using hash(p), the number of the grid containing the point p can be computed. Since the photon search radius and the grid side length are set to the same value r, photons in grids outside the set range (for example, the 3 × 3 × 3 grid range) cannot lie within the search radius around the grid containing p, so only the photons in that 3 × 3 × 3 range need to be visited one by one.
When the hit points corresponding to all threads in a thread group lie in the same grid, the threads will next access the same video memory interval, and thus the shared video memory can be used for acceleration.
In order to judge the above condition, two shared variables hashValue and flag are used (i.e. the threads of a group jointly maintain one hashValue and one flag, and each thread only needs to query the shared flag). hashValue is initialized by thread 0 of the thread group to the grid number of the hit point of the pixel corresponding to thread 0. flag is also initialized by thread 0, to the value 0. After initialization, the threads in the thread group must be synchronized once to ensure that every thread starts the following operations only after initialization has finished.
Next, each thread computes the grid number of the hit point of its own pixel and compares it with hashValue. If the two are not equal, the current thread and thread 0 have hit two different grids, the above condition cannot be met, and the thread sets flag to 1 as a mark. The modification of flag need not be made mutually exclusive, because even if simultaneous writes collide, the final value of flag is still 1. After this step, the threads in the thread group must be synchronized again to ensure that all threads have performed the comparison before starting the following operations.
Then each thread checks the value of flag. If the hit point of at least one thread in the group differs from the others (i.e. flag is 1), the shared video memory is not used, and each thread computes its pixel according to the conventional flow (the operations of step 5-3 below); otherwise, if flag is 0, the shared video memory is used (the operations of step 5-4 below).
5-3. collection of unshared video memory
1) This sub-step determines the range around position p within which photons will be accessed.
The number of the grid containing the point p to be calculated is hash(p), and with the array offset obtained in step 3, the number of the ith grid around it is hash(p) + offset[i]. The starting address of the photons of this grid in the video memory is start = startIdx[hash(p) + offset[i]], and the photons from end = startIdx[hash(p) + offset[i] + 1] onwards belong to the next grid. Therefore, to access the photons in the ith grid, it suffices to traverse the video memory over the address interval from start to end (not including end).
If the simplified offset array is used, only nine directions need to be accessed; the starting address of the photons in the kth direction in the video memory is start = startIdx[hash(p) + offset[k]], and the photons from end = startIdx[hash(p) + offset[k] + 3] onwards lie beyond the three consecutive grids covered by that direction.
2) This sub-step is the k-nearest-neighbour search step (the kNN step).
In the density estimation of the photon mapping algorithm, the k photons nearest a given position are typically obtained with a k-nearest-neighbours (kNN) search. For this purpose, a max-heap of size k (hereinafter "heap") is maintained, which has the advantage of easily tracking the maximum distance between the photons in the heap and the given point. If the number of photons currently in the heap is less than k, a newly accessed photon is inserted into the heap directly; if the heap already contains k photons and the newly accessed photon is closer than the farthest photon in the heap, the farthest photon is deleted from the heap and the newly accessed photon is inserted.
After the traversal is completed, the set of photons that will finally participate in the contribution calculation is stored in the heap.
5-4. Collection Using shared video memory
If flag is 0, the above condition is satisfied, and the grid number of every thread's hit point equals hashValue. Each thread loads a part of the photons to be traversed from the global video memory into the shared video memory; loading here means reading the photon information in the global video memory and storing it in the shared video memory. After loading finishes, one thread synchronization operation is required, and then each thread traverses all photons in the shared video memory in a loop.
Compared with the global video memory, the shared video memory has a much smaller capacity, so all photons cannot be loaded into it at once; they can only be loaded in batches. In each batch, each thread loads a fixed number of photons (for simplicity, typically 1 photon) from the global video memory into the shared video memory, and after a synchronization operation the threads access the photons in the shared video memory. Since a thread group contains groupSize threads, only a shared video memory buffer large enough for groupSize photons needs to be allocated. In this way, the relevant photon data in the global video memory is loaded into the shared video memory batch by batch, in a loop.
FIG. 4(b) is a schematic diagram of the algorithm, with small circles representing the threads performing the computation. The solid lines indicate that in the first round of loading, part of the photons in the global video memory are read into the shared video memory. The dotted lines indicate that each thread independently traverses all photons read into the shared video memory. The dashed lines indicate that in the second round of loading, another part of the photons in the global video memory is read into the shared video memory; after that, the shared-memory accesses represented by the dotted lines are executed again. This process loops until all the photon data of the common grid determined in step 5-2 has been loaded into the shared video memory and accessed by the threads.
It should be noted that the number of photons associated with the common grid is not necessarily an exact integer multiple of groupSize, so during the loading of the last batch some threads may have no photon to load. Even so, these threads must still take part in the synchronization operation and then access the photons in the shared video memory; otherwise an error would result.
The shared-memory scheme can still use kNN: each thread maintains a max-heap (hereinafter "heap") of size k storing the k photons closest to the hit point corresponding to the current thread. Each time a thread fetches a photon from the shared video memory, if the number of photons currently in the heap is less than k, the newly accessed photon is inserted into the heap directly; if the heap already contains k photons and the newly accessed photon is closer to the hit point of the current thread's pixel than the farthest photon in the heap, the farthest photon is deleted and the newly accessed photon is inserted.
When this step is completed, the heap for each thread holds the set of photons that will ultimately participate in the contribution calculations for that thread.
5-5. Density estimation
This sub-step performs density estimation on the collected photons to obtain the radiance value. Under the assumption that the area near the point p is locally flat, the radiance L_r at the point p can be estimated with the following formula:

L_r(p, ω) ≈ (1 / (π r²)) · Σ_{i=1}^{n} f_r(p, ω, ω_i) Φ_i

where n is the number of photons used in the estimate, r is the radius of the sphere containing these n photons, f_r is the BRDF reflection coefficient, Φ_i is the energy carried by the ith photon, and ω and ω_i denote the outgoing direction and the incident direction of the ith photon, respectively.
The above embodiments are only intended to illustrate the technical solution of the present invention and not to limit the same, and a person skilled in the art can modify the technical solution of the present invention or substitute the same without departing from the spirit and scope of the present invention, and the scope of the present invention should be determined by the claims.

Claims (9)

1. A photon collection method based on shared video memory optimization in photon mapping comprises the following steps:
1) dividing the whole three-dimensional scene to be rendered into a Hash grid;
2) the method comprises the steps that a light source emits photons, the traversing process of the photons in a scene is tracked, and when the photons are intersected with a diffuse reflection surface in the scene, the position, the energy and the incident direction information of the photons are recorded;
3) calculating corresponding hash values according to the positions of the photons in the record, and reordering the photons in the order from small to large according to the hash values to generate an index array;
4) starting from a viewpoint, emitting a ray to each pixel, tracking the traversal process of the ray in a scene, calculating an intersection point p of the first ray on a path and a scene diffuse reflection surface, and recording the position and the incident direction of the intersection point p;
5) dividing pixels needing to be calculated into groups and distributing a thread for each pixel;
6) judging whether the hit points of the pixels in one group are positioned in the same grid, if so, accessing the same video memory interval by the corresponding thread group of the group, and accelerating by using the shared video memory: and loading photon information in the global video memory into the shared video memory, carrying out synchronous operation of a thread after the loading is finished, and traversing and accessing all photons in the shared video memory by each thread in a circulating manner to finish photon collection.
2. The method as claimed in claim 1, wherein in step 6), if the hit points of the pixels in a group are not located in the same grid, the threads in the thread group corresponding to the group perform the calculation of the corresponding pixels according to the conventional process, thereby completing the photon collection.
3. A method as claimed in claim 1 or 2, wherein the method of determining whether the hit points of the pixels in a group are located in the same grid is: firstly, setting two shared variables, namely hashValue and flag; hashValue is initialized by thread 0 of the thread group corresponding to the group, and its value is the grid number of the hit point of the pixel corresponding to thread 0; flag is also initialized by thread 0, and its value is 0; after initialization is finished, the threads in the thread group are synchronized once; then each thread in the thread group calculates the grid number of the hit point of its corresponding pixel and compares it with hashValue; if the two are not equal, it is judged that the current thread and thread 0 have hit two different grids, and flag is set to 1 as a mark.
4. The method of claim 1, wherein in step 6), the acceleration is performed by using a shared video memory, and the photon collection is completed by:
61) loading photon information in the global video memory into the shared video memory in batches; in each batch of the loading process, each thread loads m photons from the global video memory into the shared video memory, and after the threads complete a synchronization operation, each thread accesses the photons in the shared video memory; the size of the shared video memory is m × groupSize photons;
62) each thread maintains a heap of size k for storing the k photons closest to the hit point of the pixel corresponding to the current thread; each time the thread fetches a photon from the shared video memory, if the number of photons in the current heap is less than k, the newly accessed photon is inserted into the heap; if the heap already contains k photons and the newly accessed photon is closer than the farthest photon in the heap, the farthest photon in the heap is deleted and the newly accessed photon is inserted into the heap;
63) and after all batches are loaded, the photon set stored in the heap of each thread in the thread group is the photon set of the corresponding pixel.
5. The method of claim 1, wherein the method of dividing the entire three-dimensional scene to be rendered into the hash grids is: firstly, generating a bounding box of a three-dimensional scene to be rendered, setting the origin of a grid network as the vertex of the bounding box with the minimum coordinate in each dimension, and marking as gridOrigin, and setting the size of the grid network as the size of the bounding box; dividing the size of the scene by the side length of the grid and rounding upwards to obtain the grid number of each dimension of the grid, and recording as gridsize; the coordinates (p.x, p.y, p.z) of any given point p in the bounding box in the world coordinate system are then mapped to the number of the grid to which it belongs.
6. The method according to claim 5, wherein the grid number hash(p) corresponding to a given point p is calculated with the hash function hash(p) = z × gridsize.x × gridsize.y + y × gridsize.x + x; where x is the offset of the x coordinate p.x of the given point p relative to the x coordinate of the grid origin, divided by the grid side length and rounded down; y is the offset of the y coordinate p.y of the given point p relative to the y coordinate of the grid origin, divided by the grid side length and rounded down; and z is the offset of the z coordinate p.z of the given point p relative to the z coordinate of the grid origin, divided by the grid side length and rounded down; gridsize.x is the number of grids in the x dimension of the grid mesh, and gridsize.y is the number of grids in the y dimension of the grid mesh.
7. The method of claim 1, wherein the index array is a startIdx array, wherein array element startIdx[i] records the number of photons in all grids numbered less than i; the subscripts in the photon map for the photons in the grid numbered n run from startIdx[n] to startIdx[n+1], but do not include startIdx[n+1] itself.
8. The method of claim 1, wherein the side length of the grid is a photon search radius r.
9. A photon mapping rendering method, wherein the photons collected by the method of claim 1 are used for density estimation, and the radiance corresponding to each pixel is calculated to obtain a rendered image.
CN202110339915.5A 2021-03-30 2021-03-30 Photon collection method and photon mapping rendering method based on shared video memory optimization Active CN113096248B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110339915.5A CN113096248B (en) 2021-03-30 2021-03-30 Photon collection method and photon mapping rendering method based on shared video memory optimization


Publications (2)

Publication Number Publication Date
CN113096248A CN113096248A (en) 2021-07-09
CN113096248B true CN113096248B (en) 2022-05-03

Family

ID=76671215



Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101826214A (en) * 2010-03-29 2010-09-08 中山大学 Photon mapping-based global illumination method
CN104200509A (en) * 2014-08-19 2014-12-10 山东大学 Photon mapping accelerating method based on point cache
CN108961372A (en) * 2018-03-27 2018-12-07 北京大学 A kind of gradual Photon Mapping method examined based on statistical model


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Photon Mapping Parallel Based On Shared Memory System; He Huaiqing et al.; 2009 Sixth International Conference on Computer Graphics, Imaging and Visualization; 2009; pp. 69-73 *
A fast global illumination rendering algorithm based on spatial index technology (一种基于空间索引技术的全局光照快速绘制算法); Xiong Dehua et al.; Computer Applications and Software (计算机应用与软件); 2011-04-15; Vol. 28, No. 04; pp. 267-270, 279 *
Research on photon mapping algorithm based on uniform spatial grids (基于空间均匀网格的光子映射算法研究); Wang Haibo et al.; Software Guide (软件导刊); 2016-12; Vol. 15, No. 12; pp. 19-21 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant