CN113096248B - Photon collection method and photon mapping rendering method based on shared video memory optimization - Google Patents


Info

Publication number: CN113096248B (application number CN202110339915.5A; earlier publication CN113096248A)
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: photon, photons, thread, grid, video memory
Legal status: Active (granted)
Inventors: 周闻达 (Zhou Wenda), 段元兴 (Duan Yuanxing), 李胜 (Li Sheng)
Original and current assignee: Peking University
Application filed by Peking University

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20: Finite element generation, e.g. wire-frame surface description, tessellation
    • G06T17/205: Re-meshing
    • G06T1/00: General purpose image data processing
    • G06T1/60: Memory management
    • G06T15/00: 3D [Three Dimensional] image rendering
    • G06T15/10: Geometric effects
    • G06T15/20: Perspective computation
    • G06T15/205: Image-based rendering


Abstract

The invention discloses a photon collection method and a photon mapping rendering method based on shared video memory optimization. The photon collection method comprises the following steps: 1) dividing the whole three-dimensional scene to be rendered into a hash grid; 2) emitting photons from the light source and, when a photon intersects a diffuse reflection surface in the scene, recording its position, energy and incident direction; 3) calculating the hash value corresponding to each recorded photon position, reordering the photons in ascending order of hash value, and generating an index array; 4) starting from the viewpoint, emitting a ray to each pixel, calculating the first intersection point p of the ray with a diffuse reflection surface of the scene, and recording the position and incident direction at p; 5) dividing the pixels to be calculated into groups and assigning a thread to each pixel; 6) judging whether the hit points of the pixels in a group lie in the same grid and, if so, accelerating with the shared video memory.

Description

Photon collection method and photon mapping rendering method based on shared video memory optimization
Technical Field
The invention belongs to the technical field of software, and relates to a photon collection method and a photon mapping rendering method based on shared video memory (GPU shared memory) optimization in photon mapping.
Background
Ray tracing is the standard approach to global illumination in high-fidelity rendering, but its inherent computational complexity means the Monte Carlo method must be adopted to simulate illumination on complex surface materials. The Monte Carlo method suffers from noise when the number of samples is insufficient, and even a strong neural-network denoising model cannot directly produce good results when too little information is available.
The photon mapping algorithm is one of many variants of the ray tracing algorithm. It originally derived from backward ray tracing; Jensen later refined the algorithm using k-nearest-neighbor estimation (Jensen, H.W.: Global Illumination using Photon Maps. In Rendering Techniques '96, Springer, 1996, pp. 21-30). Photon mapping is a two-pass method. The first pass, called the photon tracing pass, has the light source emit photons carrying energy (also called flux) into the scene, where the photons then interact with surfaces. On intersecting a non-specular surface, a photon dwells and is recorded with a certain probability, stored in a data structure called a photon map. The second pass, called the rendering pass, emits rays from the camera into the scene and, at the intersection points, estimates radiance using the neighboring photons in the photon map (a process called photon density estimation). Unlike Monte Carlo ray tracing, whose defect is noise, the photon mapping method is prone to bias, which manifests as blur. However, in some applications with higher requirements on speed, such as those for immersive Virtual Reality (VR), the human visual system is more sensitive to noise than to blur, since the display glasses in a virtual reality helmet are very close to the eye (i.e., are a near-eye display device). Therefore, although conventional photon mapping is an off-line rendering method, it has the potential to increase speed significantly at the cost of some rendering quality (introducing a certain amount of blur), and can thus be applied in systems that demand speed but tolerate a certain amount of image blur.
Aiming at this application requirement on rendering speed, the invention adopts the photorealistic rendering mode of photon mapping to avoid pixel-level noise, obtains global illumination rendering, improves the visual effect and at the same time achieves a better rendering speed.
Disclosure of Invention
Aiming at the technical problems in the prior art, the invention aims to provide a photon collection method based on shared video memory optimization in photon mapping.
Aiming at the photon tracing and photon collection steps of the photon mapping method, the invention provides a hash-grid-based storage structure and index for photon data in video memory, together with a corresponding fast k-nearest-neighbor (KNN) photon search method for adjacent pixels based on shared video memory. The invention designs a fast photon data access method based on shared video memory that benefits fast photon collection during KNN search, and the method rests on two facts: first, the photon collection processes of neighboring pixels may visit the same region; second, the shared memory on the hardware device is accessed much faster than the global memory. The method mainly divides the pixels of the rendered imaging plane into groups and judges whether the pixels belonging to the same group access the same region; if so, the work is divided among them: a portion of the photons in global video memory is loaded into shared video memory, each thread then accesses the shared video memory for its own calculation, and the whole process repeats until all photons in the region have been loaded.
The main process of the invention is as follows:
1) dividing the whole three-dimensional scene to be rendered into a hash grid;
2) emitting photons from the light source, tracking the traversal of each photon through the scene and, when a photon intersects a diffuse reflection surface, recording its position, energy and incident direction;
3) calculating the hash value corresponding to each recorded photon position, reordering the photons in ascending order of hash value and simultaneously generating an index array;
4) emitting rays from the viewpoint, tracking their traversal through the scene and calculating the first intersection point p of each ray with a diffuse reflection surface of the scene;
5) dividing the pixels into groups, performing density estimation based on shared video memory within each group, and finally calculating the radiance corresponding to each pixel (illumination calculation) to obtain the rendered image.
The technical scheme of the invention is as follows:
a photon collection method based on shared video memory optimization in photon mapping comprises the following steps:
1) dividing the whole three-dimensional scene to be rendered into a hash grid;
2) the light source emits photons, the traversal of each photon through the scene is tracked and, when a photon intersects a diffuse reflection surface in the scene, its position, energy and incident direction are recorded;
3) the hash value corresponding to each recorded photon position is calculated, and the photons are reordered in ascending order of hash value to generate an index array;
4) starting from the viewpoint, a ray is emitted to each pixel and its traversal through the scene is tracked; the first intersection point p of the ray with a diffuse reflection surface of the scene is calculated, and the position and incident direction at p are recorded;
5) the pixels to be calculated are divided into groups and a thread is assigned to each pixel;
6) whether the hit points of the pixels in a group lie in the same grid is judged; if so, the thread group corresponding to that group accesses the same video memory interval, and the shared video memory is used for acceleration: photon information in global video memory is loaded into shared video memory, a thread synchronization is performed after loading completes, and each thread then traverses all photons in shared video memory in a loop to complete photon collection.
Further, in step 6), if the hit points of the pixels in one group are not located in the same grid, each thread in the thread group corresponding to the group performs the calculation of the corresponding pixel according to the traditional flow, and the photon collection is completed.
Further, the method for determining whether the hit points of the pixels in a group are located in the same grid is as follows: first, two shared variables, hashValue and flag, are set. hashValue is initialized by thread No. 0 of the thread group corresponding to the group, its value being the grid number of the hit point of the pixel corresponding to thread No. 0; flag is also initialized by thread No. 0, with value 0. After initialization, the threads in the thread group are synchronized once. Each thread in the group then calculates the grid number of the hit point of its own pixel and compares it with hashValue; if the two are not equal, the current thread and thread No. 0 have hit two different grids, and the thread sets flag to 1 as a mark.
Further, in step 6), the method for accelerating by using the shared video memory to complete photon collection comprises:
61) loading photon information from global video memory into shared video memory in batches; during each batch, each thread loads m photons from global video memory into shared video memory and, after a synchronization, accesses the photons in shared video memory; the size of the shared video memory buffer is m × groupSize photons;
62) each thread maintains a heap of size k, storing the k photons closest to the hit point of that thread's pixel; each time the thread fetches a photon from shared video memory, if the heap currently holds fewer than k photons, the newly accessed photon is inserted into the heap; if the heap already holds k photons and the newly accessed photon is closer than the farthest photon in the heap, the farthest photon is deleted from the heap and the newly accessed photon is inserted;
63) and after all batches are loaded, the photon set stored in the heap of each thread in the thread group is the photon set of the corresponding pixel.
Further, the method for dividing the whole three-dimensional scene to be rendered into the hash grids comprises the following steps: firstly, generating a bounding box of a three-dimensional scene to be rendered, setting the origin of a grid network as the vertex of the bounding box with the minimum coordinate in each dimension, and marking as gridOrigin, and setting the size of the grid network as the size of the bounding box; dividing the size of the scene by the side length of the grid and rounding upwards to obtain the grid number of each dimension of the grid, and recording as gridsize; the coordinates (p.x, p.y, p.z) of any given point p in the bounding box in the world coordinate system are then mapped to the number of the grid to which it belongs.
Further, the grid number hash(p) corresponding to a given point p is calculated with the hash function hash(p) = z × gridSize.x × gridSize.y + y × gridSize.x + x, where x is the offset of the x coordinate p.x of the given point p relative to the x coordinate of the mesh origin, divided by the grid side length and rounded down; y is the offset of the y coordinate p.y relative to the y coordinate of the mesh origin, divided by the grid side length and rounded down; and z is the offset of the z coordinate p.z relative to the z coordinate of the mesh origin, divided by the grid side length and rounded down. gridSize.x is the number of grids in the x-dimension of the grid mesh, and gridSize.y is the number of grids in the y-dimension.
Further, the index array is a starting subscript array startIdx, in which array element startIdx[i] records the number of photons in all grids with numbers less than i; the photons in the grid numbered n occupy the subscripts startIdx[n] through startIdx[n+1] in the photon map, not including startIdx[n+1] itself.
Further, the side length of the grid is the photon search radius r.
A photon mapping rendering method is characterized in that density estimation is carried out on photons collected by the method, the radiance corresponding to each pixel is calculated, and a rendered image is obtained.
Compared with the prior art, the invention has the following positive effects:
Since photon collection is a core step of photon mapping rendering and one of the major bottlenecks affecting rendering speed, increasing photon collection speed has a significant impact on rendering speed. The method of the invention improves photon collection speed in photon mapping and thereby gains an advantage in rendering performance. In addition, the method is also suitable for VR-oriented binocular rendering: since the images of the two eyes are highly similar in binocular rendering, the rendering efficiency of the method, which exploits the similarity of adjacent images, improves further, reflecting the superiority of the method.
In the conventional implementation, each pixel performs its own round of accesses to global video memory. In the algorithm of the present invention, those accesses are redirected to the same number of shared video memory accesses, and only one cooperative round of global video memory access remains. Since shared memory is accessed much faster than global memory, the algorithm achieves an acceleration.
Drawings
FIG. 1 is a flow chart of a photon mapping rendering method of the present invention;
FIG. 2 is a flow chart of step 5;
FIG. 3 is a schematic diagram of pixel division;
(a) is a schematic diagram of a pixel array with 4 pixels in width and height and its index number,
(b) for a grouping scheme of the pixel array in (a),
(c) 4 pixels for the pixel group (0, 0) constituting the upper right corner in (b);
FIG. 4 is a schematic view of the acceleration principle;
(a) the method is a traditional video memory access method, and the method (b) is a video memory access method of the invention.
Detailed Description
The present invention will be described in further detail below with reference to specific examples and the accompanying drawings.
The technical scheme adopted by the invention is shown in figure 1, and the method comprises the following steps:
step 1: the three-dimensional scene space is evenly divided into a large number of cubic grids, numbered from 0, which form a grid network. For convenience, the side length of each grid is set to the photon search radius r, so that each point, when searching, only needs to visit the 3 × 3 × 3 region centered on the small cube containing it in order to find all photons that could contribute.
The entire grid origin position and size are first calculated. Objects in the whole three-dimensional scene usually consist of triangular meshes and can be contained in a bounding box, the origin of the grid mesh can be set as the vertex of the bounding box with the minimum coordinate in each dimension and is marked as gridOrigin, and the size of the grid mesh can be set as the size of the bounding box and is also the size of the scene. An additional grid can be added on both sides of each dimension of the cubic grid network of the scene to avoid the problem of boundary check during the program running process.
Dividing the size of the scene by the side length of the grid and rounding up to obtain the grid number of each dimension of the grid, which is marked as gridSize, and the grid numbers in the three dimensions of x, y and z are respectively marked as gridsize.x, gridsize.y and gridsize.z.
In order to map the coordinates (p.x, p.y, p.z) of any given point p in the bounding box corresponding to the scene in the world coordinate system to the number of the grid to which the given point p belongs, the following hash function is adopted:
hash(p)=z×gridSize.x×gridSize.y+y×gridSize.x+x
the function value is the grid number to which the point p belongs, where
x = floor((p.x - gridOrigin.x) / r)
y = floor((p.y - gridOrigin.y) / r)
z = floor((p.z - gridOrigin.z) / r)
that is, each coordinate of p is offset by the corresponding coordinate of the mesh origin, divided by the grid side length r and rounded down. gridSize.x is the number of grids in the x-dimension of the grid mesh, and gridSize.y is the number of grids in the y-dimension. The hash function determines the order in which the hash grids are numbered: x-direction first, then y-direction, and finally z-direction.
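The mapping from a world-space point to its grid number can be sketched as below. This is an illustrative host-side version; the names gridHash, Vec3 and GridSize are assumptions, not from the patent.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };       // a point, or the grid origin gridOrigin
struct GridSize { int x, y, z; };      // number of grids per dimension

// hash(p) = z*gridSize.x*gridSize.y + y*gridSize.x + x, where x, y, z are
// the point's offsets from the grid origin divided by the side length r
// and rounded down.
int gridHash(const Vec3& p, const Vec3& gridOrigin, double r, const GridSize& gs) {
    int x = (int)std::floor((p.x - gridOrigin.x) / r);
    int y = (int)std::floor((p.y - gridOrigin.y) / r);
    int z = (int)std::floor((p.z - gridOrigin.z) / r);
    return z * gs.x * gs.y + y * gs.x + x;  // numbering order: x, then y, then z
}
```

For a 4 × 4 × 4 grid with side length 1, stepping one cell in x, y or z changes the number by 1, 4 and 16 respectively, matching the numbering order described above.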
Step 2: the method comprises the steps of emitting photons carrying energy from a light source to a scene, enabling the photons to interact with the scene, determining whether the interaction result is reflection, refraction or absorption according to a Russian roulette mode, and recording the energy, intersection position and incidence direction of the photons during each interaction with a diffuse reflection surface.
2-1. recording of photons
The tracing of each photon path is performed by a separate thread. When a photon hits a diffuse reflecting surface, a record of the photon needs to be generated in the array. To avoid conflicts where multiple threads attempt to write to the same location at the same time, there are two solutions.
First, all threads can share an index variable, and each time a photon is to be recorded by a thread, an atomic add (AtomicAdd) operation is used for the index variable, which is provided by a hardware device and is not interrupted, so that each thread obtains a different value each time. According to the obtained value of the index variable, the thread can record the information of the current photon at the corresponding position of the array.
Second, a sufficiently large array is allocated and an exclusive interval is allocated for each thread, and the location of each write is determined by each thread. This is done without any collision at the time of writing, at the cost of using an array that is large enough (number of emitted photons multiplied by the maximum number of reflections) and initializing its content to 0 in its entirety.
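The first recording scheme can be sketched on the host side as follows; std::atomic plays the role of the uninterruptible atomic add (on the GPU this would be CUDA's atomicAdd), and the names PhotonRecord and recordPhoton are illustrative assumptions.

```cpp
#include <atomic>
#include <vector>

// Illustrative photon record: position, energy (flux) and incident direction.
struct PhotonRecord { double pos[3]; double power[3]; double dir[3]; };

// Index variable shared by all tracing threads.
std::atomic<int> nextSlot{0};

// Each thread calls this when a photon lands on a diffuse surface.
// fetch_add is atomic, so every caller receives a distinct slot and
// no two threads ever write the same array entry.
int recordPhoton(std::vector<PhotonRecord>& photonArray, const PhotonRecord& rec) {
    int slot = nextSlot.fetch_add(1);
    photonArray[slot] = rec;
    return slot;
}
```

The second scheme trades this atomic for memory: each thread writes into its own pre-allocated interval of a larger array, so no synchronization is needed at all.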
2-2. Russian roulette
Russian roulette is a commonly used method in photon mapping that gives an unbiased result while avoiding an exponential increase in the number of photons. Consider first a photon whose energy has only one component. For a surface with diffuse reflection coefficient d and specular reflection coefficient s (s + d ≤ 1), a random variable ξ uniformly distributed in the interval [0, 1] is generated, and whether the photon is absorbed, diffusely reflected or specularly reflected is determined by the following relation:
0 ≤ ξ < d: diffuse reflection
d ≤ ξ < d + s: specular reflection
d + s ≤ ξ ≤ 1: absorption
If the photon energy has multiple components (e.g., RGB), the diffuse and specular reflection coefficients need to be recombined:
Pd = (dr·Pr + dg·Pg + db·Pb) / (Pr + Pg + Pb)
Ps = (sr·Pr + sg·Pg + sb·Pb) / (Pr + Pg + Pb)
where (dr, dg, db) are the diffuse reflection coefficients of the RGB components, (sr, sg, sb) the specular reflection coefficients, and (Pr, Pg, Pb) the photon energy in the RGB components; Pd and Ps denote the combined diffuse and specular reflection coefficients respectively. The Russian roulette then becomes:
0 ≤ ξ < Pd: diffuse reflection
Pd ≤ ξ < Pd + Ps: specular reflection
Pd + Ps ≤ ξ ≤ 1: absorption
The energy of the outgoing photon on each component must also be adjusted: it is multiplied by the reflection coefficient of the corresponding component and divided by the corresponding combined coefficient. Taking the R component of a photon as an example, if the determined reflection type is diffuse reflection, the outgoing energy in the R channel is modified to Pr·dr/Pd.
If the determined interaction result is "absorb," then the tracking task for the current path ends and the thread returns. If the result is "specular reflection," the direction of the reflected photon is determined following the law of reflection of light and the path of the reflected photon is continued to be traced. If the result of the decision is "diffuse reflection", then one direction is sampled as the direction of reflection based on the BRDF of the current surface, and then tracking continues.
The number of reflections of the photon needs to be limited, and the tracking process can be ended when the energy of each component of the photon is smaller than a certain threshold, because even if tracking is performed again, the contribution to the final result is small. A maximum number of reflections may also be set and when the number of photon reflections exceeds this value, the thread is terminated.
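The multi-component Russian roulette decision can be sketched as follows. This is a host-side illustration under the combined-coefficient formulation described above; the names, and the choice to return Pd and Ps through out-parameters (so the caller can rescale the outgoing energy), are assumptions.

```cpp
enum class Interaction { Diffuse, Specular, Absorbed };

struct RGB { double r, g, b; };  // per-channel coefficients or photon energy

// Decide the photon's fate from a uniform random number xi in [0, 1].
// d, s: diffuse and specular reflection coefficients per RGB channel;
// P: photon energy per channel. Pd and Ps are the combined coefficients,
// returned so the caller can scale the outgoing energy (e.g. Pr*dr/Pd).
Interaction roulette(const RGB& d, const RGB& s, const RGB& P, double xi,
                     double* Pd_out, double* Ps_out) {
    double total = P.r + P.g + P.b;
    double Pd = (d.r * P.r + d.g * P.g + d.b * P.b) / total;
    double Ps = (s.r * P.r + s.g * P.g + s.b * P.b) / total;
    *Pd_out = Pd;
    *Ps_out = Ps;
    if (xi < Pd)      return Interaction::Diffuse;   // 0 <= xi < Pd
    if (xi < Pd + Ps) return Interaction::Specular;  // Pd <= xi < Pd + Ps
    return Interaction::Absorbed;                    // Pd + Ps <= xi <= 1
}
```

With d = (0.5, 0.5, 0.5), s = (0.2, 0.2, 0.2) and equal channel energies, Pd = 0.5 and Ps = 0.2, so ξ = 0.3 yields diffuse reflection, ξ = 0.6 specular reflection, and ξ = 0.8 absorption.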
And step 3: creating a photonic graph and an index array
3-1 reordering Generation of Photonic maps
In this step, the hash values corresponding to the positions of all recorded photons are calculated, and all the photons are reordered from small to large according to the hash values.
Since the hash value can be recomputed from a photon's position, it need not be stored in the photon record. On the other hand, to minimize the bandwidth overhead of repeatedly moving photon records during sorting, a temporary array can be used, each element of which contains two fields: the hash value of a photon and a pointer to that photon. This array is reordered by hash value with a sorting algorithm. The temporary array is then scanned once, and the photons are read from the corresponding video memory locations into a new array according to the pointer field; this new array is the photon map. All photon data collected during the photon collection process described in the subsequent steps comes from the photon map.
3-2. array of initial subscripts
In addition, a starting subscript array startIdx must be generated, in which array element startIdx[i] records the number of photons in all grids with numbers smaller than i. Since the photons in the photon map are ordered by grid number, startIdx[i] is also the starting subscript of the photons located in the grid numbered i, which makes accessing photon data in the photon map convenient. The photons in the grid numbered n occupy the subscripts startIdx[n] through startIdx[n+1] (not included).
The starting index array can be obtained by:
in the first step, an auxiliary array temp is used, where temp [ i ] represents the number of photons with hash value i (i.e., the number of photons in the grid numbered i). In order to calculate the value of temp array, the hash value of each photon needs to be calculated, and each time a hash value h is obtained, the value of temp [ h ] is added with 1. This is repeated until all photons have been counted.
And secondly, after the temp array is completely calculated, calculating the value of the initial subscript array according to the following formula:
startIdx[i+1]=startIdx[i]+temp[i]
starting from 0, i in this formula is updated in ascending order, with startIdx[0] initialized to 0.
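The two steps above (a histogram pass followed by a running sum) amount to the counting phase of a counting sort. A minimal sketch, with illustrative names:

```cpp
#include <vector>

// Build the starting-subscript array from the photons' hash values.
// photonHashes[j] is the grid number of photon j; numGrids is the total
// number of grid cells. Returns an array of numGrids + 1 entries, where
// the photons of grid i occupy subscripts [startIdx[i], startIdx[i+1]).
std::vector<int> buildStartIdx(const std::vector<int>& photonHashes, int numGrids) {
    // First step: temp[i] = number of photons with hash value i.
    std::vector<int> temp(numGrids, 0);
    for (int h : photonHashes) temp[h] += 1;
    // Second step: startIdx[i+1] = startIdx[i] + temp[i], i ascending from 0.
    std::vector<int> startIdx(numGrids + 1, 0);
    for (int i = 0; i < numGrids; ++i)
        startIdx[i + 1] = startIdx[i] + temp[i];
    return startIdx;
}
```

For hash values {2, 0, 2, 1, 2} over 3 grids, the result is {0, 1, 2, 5}: grid 2's photons occupy subscripts 2 through 4 of the sorted photon map.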
3-3. neighbor offset look-up table
Given the number of a grid, the photon collection process must traverse the 3 × 3 × 3 region centered on that grid. The offsets of the neighbor grid numbers relative to the central grid number are therefore computed in advance and stored in a neighbor offset lookup table (neighbor offset lookup table), replacing dynamic computation with a table lookup where needed. The table contains the following set:
{z×gridSize.x×gridSize.y+y×gridSize.x+x|x,y,z=-1,0,1}
in fact, since grid numbers are consecutive in the x direction, only the smallest offset of each of the 9 rows of neighbors in the x direction needs to be stored; the offsets of the other 18 neighbors can be obtained from these 9 by a single addition. The simplified offset table contains the following set:
{z×gridSize.x×gridSize.y+y×gridSize.x-1|y,z=-1,0,1}
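Both the full 27-entry table and the simplified 9-entry table can be precomputed as sketched below (function names are illustrative; gsx and gsy stand for gridSize.x and gridSize.y):

```cpp
#include <vector>

// Full table: number offsets of all 27 cells of the 3x3x3 region
// relative to the central cell, enumerated z, then y, then x.
std::vector<int> neighborOffsets(int gsx, int gsy) {
    std::vector<int> offsets;
    for (int z = -1; z <= 1; ++z)
        for (int y = -1; y <= 1; ++y)
            for (int x = -1; x <= 1; ++x)
                offsets.push_back(z * gsx * gsy + y * gsx + x);
    return offsets;
}

// Simplified table: only the x = -1 offset of each of the 9 (y, z) rows;
// the other two entries of a row are this value plus 1 and plus 2,
// because grid numbers are consecutive in the x direction.
std::vector<int> simplifiedOffsets(int gsx, int gsy) {
    std::vector<int> offsets;
    for (int z = -1; z <= 1; ++z)
        for (int y = -1; y <= 1; ++y)
            offsets.push_back(z * gsx * gsy + y * gsx - 1);
    return offsets;
}
```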
and 4, step 4: calculating the intersection point p of the ray emitted by the viewpoint and the diffuse reflection surface of the scene
The method of this step is the same as typical ray tracing, starting from a viewpoint, emitting rays to each pixel, then tracing the rays in the scene, if hitting the surface of specular reflection, calculating a new direction according to the law of reflection of light and performing recursive tracing; if a diffuse reflection surface is hit, information on the position of the intersection point p, the incident direction, etc. is recorded.
In conventional implementations, the density estimate is calculated inside the ray tracing channel. To make the photon collection process based on shared video memory optimization clearer and better suited to parallel execution, the photon collection work is stripped out of the ray tracing channel. The information of the hit point p can then be used directly when each thread performs its pixel radiance calculation, without a synchronization after the ray tracing task finishes.
And 5: dividing pixels into groups, performing density estimation based on shared video memory, and finally calculating the corresponding radiance of each pixel (illumination calculation) and obtaining a rendered image
In this step, the invention calculates, in parallel, the radiance of the pixel corresponding to each position p obtained in step 4. The calculation of each pixel is executed by a dedicated thread; the threads are divided into groups, and the calculation of pixels in the same group can be accelerated using the shared memory. The flow chart of grouped density estimation based on shared video memory is shown in fig. 2.
5-1. grouping
The sub-step divides the pixels to be calculated into groups
As shown in fig. 3(a), the pixels to be calculated can be regarded as a two-dimensional array, each pixel has an index value index, and they are sequentially assigned according to the principle of x-direction priority.
These pixels are divided into groups with length blockDim.x in the x direction and blockDim.y in the y direction, so that each group contains blockDim.x × blockDim.y pixels, as shown in fig. 3(b). After the division, the groups form a new coarse-grained grid with gridDim.x columns and gridDim.y rows in total. The column of a group in the new grid is denoted blockIdx.x, and its row blockIdx.y. The two numbers within each group in the figure represent blockIdx.x and blockIdx.y, respectively.
As shown in fig. 3(c), inside each group the pixels also form a grid; the dotted line indicates that the pixel group at the upper right corner of fig. 3(b) is formed of the 4 pixels in fig. 3(c). The column of a pixel within its pixel group is denoted threadIdx.x, and its row threadIdx.y. The two numbers within each pixel in the figure represent threadIdx.x and threadIdx.y, respectively.
The above blockDim, blockIdx, threadIdx, gridDim, etc. are built-in variables provided by CUDA kernel functions and can be used directly inside a kernel.
The thread corresponding to each pixel can determine the index value of the pixel it computes according to the formula: index = blockIdx.x × blockDim.x + threadIdx.x + (blockIdx.y × blockDim.y + threadIdx.y) × blockDim.x × gridDim.x.
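The index formula can be checked on the host by passing the CUDA built-ins in explicitly (the function name pixelIndex is an assumption):

```cpp
// index = blockIdx.x*blockDim.x + threadIdx.x
//       + (blockIdx.y*blockDim.y + threadIdx.y) * blockDim.x * gridDim.x
// i.e. global column plus global row times the image width in pixels.
int pixelIndex(int blockIdxX, int blockIdxY, int blockDimX, int blockDimY,
               int threadIdxX, int threadIdxY, int gridDimX) {
    return blockIdxX * blockDimX + threadIdxX
         + (blockIdxY * blockDimY + threadIdxY) * blockDimX * gridDimX;
}
```

For the 4 × 4 pixel array of fig. 3 with 2 × 2 groups, thread (0, 0) of group (1, 0) computes pixel 2, thread (0, 0) of group (0, 1) computes pixel 8, and thread (1, 1) of group (1, 1) computes pixel 15, consistent with x-direction-first index assignment.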
5-2. coincidence judgment
This step determines whether the hit points of the pixels in a group are located in the same grid, and determines which scheme to use for photon collection.
In order to calculate the radiance of a pixel, the photons around the corresponding point p need to be collected for density estimation. Using hash(p), the number of the grid containing the point p can be computed. Since the photon search radius and the grid side length are set to the same value r, photons in grids outside the set range (for example, the 3 × 3 × 3 grid range) cannot lie within the search radius around the grid containing p, so only the photons in that 3 × 3 × 3 range need to be visited one by one.
When the hit points corresponding to all threads in a thread group lie in the same grid, the threads will next access the same video memory interval, and thus the shared video memory can be used for acceleration.
In order to judge the above condition, two shared variables hashValue and flag are used (i.e. the threads of a group jointly maintain one hashValue and one flag, and each thread only needs to query the shared flag). hashValue is initialized by thread 0 of the thread group to the grid number of the hit point of the pixel corresponding to thread 0. flag is also initialized by thread 0, to the value 0. After initialization, the threads in the thread group must be synchronized once to ensure that every thread starts the following operations only after initialization has finished.
Next, each thread computes the grid number of the hit point of its own pixel and compares it with hashValue. If the two are not equal, the current thread and thread 0 have hit two different grids, the above condition cannot be met, and the thread sets flag to 1 as a mark. The modification of flag need not be made mutually exclusive, because even if simultaneous writes collide, the final value of flag is still 1. After this step, the threads in the thread group must be synchronized again to ensure that all threads have performed the comparison before starting the following operations.
Then each thread checks the value of flag. If the hit point of at least one thread in the group differs from the others (i.e. flag is 1), the shared video memory is not used, and each thread computes its pixel according to the conventional flow (the operations of step 5-3 below); otherwise, if flag is 0, the shared video memory is used (the operations of step 5-4 below).
5-3. collection of unshared video memory
1) This sub-step determines the range around position p within which photons will be accessed.
The number of the grid containing the point p to be calculated is hash(p), and with the array offset obtained in step 3, the number of the ith grid around it is hash(p) + offset[i]. The starting address of the photons of this grid in the video memory is start = startIdx[hash(p) + offset[i]], and the photons from end = startIdx[hash(p) + offset[i] + 1] onwards belong to the next grid. Therefore, to access the photons in the ith grid, it suffices to traverse the video memory over the address interval from start to end (not including end).
If the simplified offset array is used, only nine directions need to be accessed; the starting address of the photons in the kth direction in the video memory is start = startIdx[hash(p) + offset[k]], and the photons from end = startIdx[hash(p) + offset[k] + 3] onwards lie beyond the three consecutive grids covered by that direction.
2) This sub-step is the k-nearest-neighbour search step (the kNN step).
In the density estimation of the photon mapping algorithm, the k photons nearest a given position are typically obtained with a k-nearest-neighbours (kNN) search. For this purpose, a max-heap of size k (hereinafter "heap") is maintained, which has the advantage of easily tracking the maximum distance between the photons in the heap and the given point. If the number of photons currently in the heap is less than k, a newly accessed photon is inserted into the heap directly; if the heap already contains k photons and the newly accessed photon is closer than the farthest photon in the heap, the farthest photon is deleted from the heap and the newly accessed photon is inserted.
After the traversal is completed, the set of photons that will finally participate in the contribution calculation is stored in the heap.
5-4. Collection Using shared video memory
If flag is 0, the above condition is satisfied, and the grid number of every thread's hit point equals hashValue. Each thread loads a part of the photons to be traversed from the global video memory into the shared video memory; loading here means reading the photon information in the global video memory and storing it in the shared video memory. After loading finishes, one thread synchronization operation is required, and then each thread traverses all photons in the shared video memory in a loop.
Compared with the global video memory, the shared video memory has a much smaller capacity, so all photons cannot be loaded into it at once; they can only be loaded in batches. In each batch, each thread loads a fixed number of photons (for simplicity, typically 1 photon) from the global video memory into the shared video memory, and after a synchronization operation the threads access the photons in the shared video memory. Since a thread group contains groupSize threads, only a shared video memory buffer large enough for groupSize photons needs to be allocated. In this way, the relevant photon data in the global video memory is loaded into the shared video memory batch by batch, in a loop.
FIG. 4(b) is a schematic diagram of the algorithm, with small circles representing the threads performing the computation. The solid lines indicate that in the first round of loading, part of the photons in the global video memory are read into the shared video memory. The dotted lines indicate that each thread independently traverses all photons read into the shared video memory. The dashed lines indicate that in the second round of loading, another part of the photons in the global video memory is read into the shared video memory; after that, the shared-memory accesses represented by the dotted lines are executed again. This process loops until all the photon data of the common grid determined in step 5-2 has been loaded into the shared video memory and accessed by the threads.
It should be noted that the number of photons associated with the common grid is not necessarily an exact integer multiple of groupSize, so during the loading of the last batch some threads may have no photon to load. Even so, these threads must still take part in the synchronization operation and then access the photons in the shared video memory; otherwise an error would result.
The shared-memory scheme can still use kNN: each thread maintains a max-heap (hereinafter "heap") of size k storing the k photons closest to the hit point corresponding to the current thread. Each time a thread fetches a photon from the shared video memory, if the number of photons currently in the heap is less than k, the newly accessed photon is inserted into the heap directly; if the heap already contains k photons and the newly accessed photon is closer to the hit point of the current thread's pixel than the farthest photon in the heap, the farthest photon is deleted and the newly accessed photon is inserted.
When this step is completed, the heap for each thread holds the set of photons that will ultimately participate in the contribution calculations for that thread.
5-5. Density estimation
This sub-step performs density estimation on the collected photons to obtain the radiance value. Under the assumption that the area near the point p is locally flat, the radiance L_r at the point p can be estimated with the following formula:

L_r(p, ω) ≈ (1 / (π r²)) · Σ_{i=1}^{n} f_r(p, ω, ω_i) Φ_i

where n is the number of photons used in the estimate, r is the radius of the sphere containing these n photons, f_r is the BRDF reflection coefficient, Φ_i is the energy carried by the ith photon, and ω and ω_i denote the outgoing direction and the incident direction of the ith photon, respectively.
The above embodiments are only intended to illustrate the technical solution of the present invention and not to limit the same, and a person skilled in the art can modify the technical solution of the present invention or substitute the same without departing from the spirit and scope of the present invention, and the scope of the present invention should be determined by the claims.

Claims (9)

1. A photon collection method based on shared video memory optimization in photon mapping comprises the following steps:
1) dividing the whole three-dimensional scene to be rendered into a Hash grid;
2) the method comprises the steps that a light source emits photons, the traversing process of the photons in a scene is tracked, and when the photons are intersected with a diffuse reflection surface in the scene, the position, the energy and the incident direction information of the photons are recorded;
3) calculating corresponding hash values according to the positions of the photons in the record, and reordering the photons in the order from small to large according to the hash values to generate an index array;
4) starting from a viewpoint, emitting a ray to each pixel, tracking the traversal process of the ray in a scene, calculating an intersection point p of the first ray on a path and a scene diffuse reflection surface, and recording the position and the incident direction of the intersection point p;
5) dividing pixels needing to be calculated into groups and distributing a thread for each pixel;
6) judging whether the hit points of the pixels in one group are positioned in the same grid, if so, accessing the same video memory interval by the corresponding thread group of the group, and accelerating by using the shared video memory: and loading photon information in the global video memory into the shared video memory, carrying out synchronous operation of a thread after the loading is finished, and traversing and accessing all photons in the shared video memory by each thread in a circulating manner to finish photon collection.
2. The method as claimed in claim 1, wherein in step 6), if the hit points of the pixels in a group are not located in the same grid, the threads in the thread group corresponding to the group perform the calculation of the corresponding pixels according to the conventional process, thereby completing the photon collection.
3. A method as claimed in claim 1 or 2, wherein the method of determining whether the hit points of the pixels in a group are located in the same grid is: firstly, setting two shared variables, namely hashValue and flag; hashValue is initialized by thread 0 of the thread group corresponding to the group, and its value is the grid number of the hit point of the pixel corresponding to thread 0; flag is also initialized by thread 0, and its value is 0; after initialization is finished, the threads in the thread group are synchronized once; then each thread in the thread group calculates the grid number of the hit point of its corresponding pixel and compares it with hashValue; if the two are not equal, it is judged that the current thread and thread 0 have hit two different grids, and flag is set to 1 as a mark.
4. The method of claim 1, wherein in step 6), the acceleration is performed by using a shared video memory, and the photon collection is completed by:
61) loading photon information in the global video memory into the shared video memory in batches; in each batch of the loading process, each thread loads m photons from the global video memory into the shared video memory, and after the threads complete a synchronization operation, each thread accesses the photons in the shared video memory; the size of the shared video memory is m × groupSize photons;
62) each thread maintains a heap of size k for storing the k photons closest to the hit point of the pixel corresponding to the current thread; each time the thread fetches a photon from the shared video memory, if the number of photons in the current heap is less than k, the newly accessed photon is inserted into the heap; if the heap already contains k photons and the newly accessed photon is closer than the farthest photon in the heap, the farthest photon in the heap is deleted and the newly accessed photon is inserted into the heap;
63) and after all batches are loaded, the photon set stored in the heap of each thread in the thread group is the photon set of the corresponding pixel.
5. The method of claim 1, wherein the method of dividing the entire three-dimensional scene to be rendered into the hash grids is: firstly, generating a bounding box of a three-dimensional scene to be rendered, setting the origin of a grid network as the vertex of the bounding box with the minimum coordinate in each dimension, and marking as gridOrigin, and setting the size of the grid network as the size of the bounding box; dividing the size of the scene by the side length of the grid and rounding upwards to obtain the grid number of each dimension of the grid, and recording as gridsize; the coordinates (p.x, p.y, p.z) of any given point p in the bounding box in the world coordinate system are then mapped to the number of the grid to which it belongs.
6. The method according to claim 5, wherein the grid number hash(p) corresponding to a given point p is calculated with the hash function hash(p) = z × gridsize.x × gridsize.y + y × gridsize.x + x; where x is the offset of the x coordinate p.x of the given point p relative to the x coordinate of the grid origin, divided by the grid side length and rounded down; y is the offset of the y coordinate p.y of the given point p relative to the y coordinate of the grid origin, divided by the grid side length and rounded down; and z is the offset of the z coordinate p.z of the given point p relative to the z coordinate of the grid origin, divided by the grid side length and rounded down; gridsize.x is the number of grids in the x dimension of the grid mesh, and gridsize.y is the number of grids in the y dimension of the grid mesh.
7. The method of claim 1, wherein the index array is a startIdx array, wherein array element startIdx[i] records the number of photons in all grids numbered less than i; the subscripts in the photon map for the photons in the grid numbered n run from startIdx[n] to startIdx[n+1], but do not include startIdx[n+1] itself.
8. The method of claim 1, wherein the side length of the grid is a photon search radius r.
9. A photon mapping rendering method, wherein the photons collected by the method of claim 1 are used for density estimation, and the radiance corresponding to each pixel is calculated to obtain a rendered image.
CN202110339915.5A 2021-03-30 2021-03-30 Photon collection method and photon mapping rendering method based on shared video memory optimization Active CN113096248B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110339915.5A CN113096248B (en) 2021-03-30 2021-03-30 Photon collection method and photon mapping rendering method based on shared video memory optimization


Publications (2)

Publication Number Publication Date
CN113096248A CN113096248A (en) 2021-07-09
CN113096248B true CN113096248B (en) 2022-05-03

Family

ID=76671215



Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101826214A (en) * 2010-03-29 2010-09-08 中山大学 Photon mapping-based global illumination method
CN104200509A (en) * 2014-08-19 2014-12-10 山东大学 Photon mapping accelerating method based on point cache
CN108961372A (en) * 2018-03-27 2018-12-07 北京大学 A kind of gradual Photon Mapping method examined based on statistical model


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Photon Mapping Parallel Based On Shared Memory System; He Huaiqing et al.; 2009 Sixth International Conference on Computer Graphics, Imaging and Visualization; 2009; pp. 69-73 *
A fast global illumination rendering algorithm based on spatial index technology (一种基于空间索引技术的全局光照快速绘制算法); Xiong Dehua et al.; Computer Applications and Software (计算机应用与软件); 2011-04-15; Vol. 28, No. 04; pp. 267-270, 279 *
Research on photon mapping algorithm based on uniform spatial grids (基于空间均匀网格的光子映射算法研究); Wang Haibo et al.; Software Guide (软件导刊); 2016-12; Vol. 15, No. 12; pp. 19-21 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant