GB2493425A - Constructing an acceleration structure - Google Patents

Constructing an acceleration structure

Publication number
GB2493425A
Authority
GB
United Kingdom
Prior art keywords
primitives
another embodiment
acceleration structure
split
construction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1212642.1A
Other versions
GB201212642D0 (en
Inventor
Kirill Vladimirovich Garanzha
Jacopo Pantaleoni
David Kirk Mcallister
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nvidia Corp
Original Assignee
Nvidia Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nvidia Corp
Publication of GB201212642D0
Publication of GB2493425A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/06 Ray-tracing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/005 Tree description, e.g. octree, quadtree
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2210/00 Indexing scheme for image generation or computer graphics
    • G06T2210/12 Bounding box

Abstract

A system, method, and computer program product are provided for constructing an acceleration structure. In use, a plurality of primitives associated with a scene is identified and an acceleration structure is constructed, utilizing the primitives. The structure may include a hierarchical linearized bounding volume hierarchy with a plurality of nodes, including child nodes representing bounding boxes located within a parent node's bounding box and leaf nodes representing one or more primitives residing within respective parent bounding boxes. The construction may include sorting the primitives along a space-filling curve that spans a bounding box of the scene, where the curve is determined by calculating a Morton code of a centroid of each primitive and the sorting may use a least significant digit radix sorting algorithm. The primitives may be clustered using a run-length encoding compression algorithm, the primitives may be partitioned within each cluster, and the construction may be performed entirely on a GPU.

Description

SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR CONSTRUCTING AN ACCELERATION STRUCTURE
FIELD OF THE INVENTION
[0001] The present invention relates to rendering images, and more particularly to performing ray tracing.
BACKGROUND
[0002] Traditionally, ray tracing has been used to generate images within a displayed scene. For example, intersections between a plurality of rays and a plurality of primitives of the displayed scene may be determined in order to render images associated with the primitives. However, current techniques for performing ray tracing have been associated with various limitations.
[0003] For example, current methods for performing ray tracing may inefficiently construct acceleration structures used in association with the ray tracing. This may result in time-intensive construction of acceleration structures that are associated with large numbers of primitives.
[0004] There is thus a need for addressing these and/or other issues associated with the prior art.
SUMMARY
[0005] A system, method, and computer program product are provided for constructing an acceleration structure. In use, a plurality of primitives associated with a scene is identified. Additionally, an acceleration structure is constructed, utilizing the primitives.
BRIEF DESCRIPTION OF THE DRAWINGS
[0006] Figure 1 shows a method for constructing an acceleration structure, in accordance with one embodiment.
[0007] Figure 2 shows a task queue system used in performing partitioning during the construction of an acceleration structure, in accordance with another embodiment.
[0008] Figure 3 shows a sorting of a group of primitives using Morton codes, in accordance with yet another embodiment.
[0009] Figure 4 shows a plurality of middle-split queues corresponding to the sorting performed in Figure 3, in accordance with yet another embodiment.
[0010] Figure 5 shows a data flow visualization of a SAH binning procedure, in accordance with yet another embodiment.
[0011] Figure 6 illustrates an exemplary system in which the various architecture and/or functionality of the various previous embodiments may be implemented.
DETAILED DESCRIPTION
[0012] Figure 1 shows a method 100 for constructing an acceleration structure, in accordance with one embodiment. As shown in operation 102, a plurality of primitives associated with a scene is identified. In one embodiment, the scene may include a scene that is in the process of being rendered. For example, the scene may be in the process of being rendered using ray tracing. In another embodiment, the plurality of primitives may be included within the scene. For example, the scene may be composed of the plurality of the primitives. In yet another embodiment, the plurality of primitives may include a plurality of triangles. Of course, however, the plurality of primitives may include any primitives used to perform ray tracing.
[0013] Additionally, as shown in operation 104, an acceleration structure is constructed, utilizing the primitives. In one embodiment, the acceleration structure may include a bounding volume hierarchy (BVH). In another embodiment, the acceleration structure may include a linearized bounding volume hierarchy (LBVH). In yet another embodiment, the acceleration structure may include a hierarchical linearized bounding volume hierarchy (HLBVH).
[0014] In another embodiment, the acceleration structure may include a plurality of nodes. For example, the acceleration structure may include a hierarchy of nodes, where child nodes represent bounding boxes located within respective parent node bounding boxes, and where leaf nodes represent one or more primitives that reside within respective parent bounding boxes. In this way, the acceleration structure may include a bounding volume hierarchy which may organize the primitives into a plurality of hierarchical boxes to be used during ray tracing.
[0015] Further, in one embodiment, constructing the acceleration structure may include sorting the primitives. For example, the primitives may be sorted along a space-filling curve (e.g., a Morton curve, a Hilbert curve, etc.) that spans a bounding box of the scene. In another embodiment, the space-filling curve may be determined by calculating a Morton code of a centroid of each primitive in the scene (e.g., an average location in the middle of the primitive may be transformed from three dimensional (3D) coordinates into a one dimensional coordinate associated with a recursively designed Morton curve, etc.).
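The mapping from a primitive's centroid to a Morton code can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes 10 bits per axis (matching the 30-bit curve mentioned later in the description), and the names Vec3, Aabb, expandBits10, quantize10 and mortonCode30, as well as the choice of x as the most significant axis, are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct Vec3 { float x, y, z; };
struct Aabb { Vec3 lo, hi; };  // scene bounding box (assumed non-degenerate)

// Spread the low 10 bits of v so that two zero bits follow each original bit.
inline std::uint32_t expandBits10(std::uint32_t v) {
    v &= 0x000003FFu;
    v = (v | (v << 16)) & 0xFF0000FFu;
    v = (v | (v << 8))  & 0x0300F00Fu;
    v = (v | (v << 4))  & 0x030C30C3u;
    v = (v | (v << 2))  & 0x09249249u;
    return v;
}

// Map a coordinate inside [lo, hi] to an integer grid cell in [0, 1023].
inline std::uint32_t quantize10(float v, float lo, float hi) {
    assert(hi > lo);
    float t = (v - lo) / (hi - lo);
    t = std::min(std::max(t, 0.0f), 1.0f);
    return std::min<std::uint32_t>(static_cast<std::uint32_t>(t * 1024.0f), 1023u);
}

// Interleave the quantized centroid coordinates into a 30-bit Morton code.
// Assumption: x occupies the most significant bit of every 3-bit group.
inline std::uint32_t mortonCode30(const Vec3& centroid, const Aabb& scene) {
    const std::uint32_t x = quantize10(centroid.x, scene.lo.x, scene.hi.x);
    const std::uint32_t y = quantize10(centroid.y, scene.lo.y, scene.hi.y);
    const std::uint32_t z = quantize10(centroid.z, scene.lo.z, scene.hi.z);
    return (expandBits10(x) << 2) | (expandBits10(y) << 1) | expandBits10(z);
}
```

Because the bits of the three axes are interleaved, any prefix of 3k bits names a voxel of a coarser grid, which is the property the later clustering and middle-split steps rely on.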
[0016] In another example, the sorting may be performed utilizing a least significant digit radix sorting algorithm. In another embodiment, constructing the acceleration structure may include forming clusters of primitives (e.g., coarse clusters of primitives, etc.) within the scene. For example, the clusters may be formed utilizing a run-length encoding compression algorithm.
[0017] Further still, in one embodiment, constructing the acceleration structure may include partitioning primitives within each formed cluster. For example, constructing the acceleration structure may include partitioning all primitives within each cluster using spatial middle splits (e.g., LBVH-style spatial middle splits, etc.). In another example, constructing the acceleration structure may include creating a tree (e.g., a top-level tree, etc.), utilizing the clusters. For example, constructing the acceleration structure may include creating a top-level tree by partitioning the clusters (e.g., utilizing a binned surface area heuristic (SAH), a SAH-optimized tree construction algorithm, etc.). In another embodiment, the SAH may utilize a parallel binning scheme.
[0018] Also, in one embodiment, partitioning the primitives and the clusters may be performed utilizing one or more task queues. For example, a task queue system may be used to parallelize work during the construction of the acceleration structure (e.g., by creating a pipeline, etc.). In another embodiment, the acceleration structure may be constructed utilizing one or more algorithms. For example, sorting the primitives, forming the clusters of the primitives, partitioning the primitives, and creating the tree may all be performed utilizing one or more algorithms.
[0019] Additionally, in one embodiment, constructing the acceleration structure may be performed utilizing a graphics processing unit (GPU). For example, a GPU may perform the entire construction of the acceleration structure. In this way, the transfer of data between the GPU and system memory associated with a central processing unit (CPU) may be avoided, which may decrease the time necessary to construct the acceleration structure.
[0020] More illustrative information will now be set forth regarding various optional architectures and features with which the foregoing framework may or may not be implemented, per the desires of the user. It should be strongly noted that the following information is set forth for illustrative purposes and should not be construed as limiting in any manner. Any of the following features may be optionally incorporated with or without the exclusion of other features described.
[0021] Figure 2 shows a task queue system 200 used in performing partitioning during the construction of an acceleration structure, in accordance with another embodiment. As an option, the present task queue system 200 may be carried out in the context of the functionality of Figure 1. Of course, however, the task queue system 200 may be implemented in any desired environment. It should also be noted that the aforementioned definitions may apply during the present description.
[0022] As shown, the task queue system 200 includes a plurality of warps 202A and 202B that each fetch sets of tasks to process (e.g., from an input queue, etc.). In one embodiment, each of the plurality of warps 202A and 202B may include a unit of work (e.g., a physical SIMT unit of work on a GPU, etc.). In another embodiment, each individual task may correspond to processing a single node during the construction of an acceleration structure.
[0023] Additionally, in one embodiment, at run time, each of the plurality of warps 202A and 202B may continue to fetch sets of tasks to process from the input queue, where each set may contain one task per thread. Additionally, each of the plurality of warps 202A and 202B may use a single global memory atomic add per warp to update the queue head. Further, each thread in each of the plurality of warps 202A and 202B computes a number of output tasks 204 that it will generate.
[0024] Further still, after each thread in each of the plurality of warps 202A and 202B has computed the number of output tasks 204 that it will generate, all threads in each of the plurality of warps 202A and 202B participate in a warp-wide prefix sum 206 to compute the offset of their output tasks relative to the common base of each of the plurality of warps 202A and 202B. In one embodiment, the first thread in each of the plurality of warps 202A and 202B may perform a single global memory atomic add to compute a base address in an output queue of the plurality of warps 202A and 202B. Also, in one embodiment, a separate queue may be used per level, which may enable all the processing to be performed inside a single kernel call, while at the same time producing a breadth-first tree layout.
[0025] In one embodiment, constructing the acceleration structure may include using one or more algorithms to create both a standard LBVH and a higher quality SAH hybrid. See, for example, "HLBVH: Hierarchical LBVH construction for real-time ray tracing of dynamic geometry," (Pantaleoni et al., High-Performance Graphics 2010, ACM Siggraph / Eurographics Symposium Proceedings, Eurographics, 87-95), which is hereby incorporated by reference in its entirety, and which describes methods for constructing an LBVH and an HLBVH.
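The output-slot reservation described for the task queue system 200 (a warp-wide prefix sum followed by one atomic add per warp) can be emulated on a host as below. This is a sequential sketch under stated assumptions: a "warp" is modeled as a vector of per-thread output counts, and OutputQueue and reserveOutputSlots are hypothetical names, not part of any GPU API.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Stand-in for the output queue's tail counter held in global memory.
struct OutputQueue {
    std::atomic<int> tail{0};
};

// Emulates one warp. countsPerThread[t] is the number of output tasks
// thread t will emit. Returns, per thread, the absolute queue slot of
// its first output task.
std::vector<int> reserveOutputSlots(const std::vector<int>& countsPerThread,
                                    OutputQueue& queue) {
    std::vector<int> offsets(countsPerThread.size());

    // Warp-wide exclusive prefix sum: offset of each thread relative to the
    // warp's common base.
    int total = 0;
    for (std::size_t t = 0; t < countsPerThread.size(); ++t) {
        offsets[t] = total;
        total += countsPerThread[t];
    }

    // A single atomic add per warp reserves space for all of its outputs.
    const int base = queue.tail.fetch_add(total);

    for (int& offset : offsets) {
        offset += base;
    }
    return offsets;
}
```

Reserving space once per warp rather than once per thread keeps contention on the shared counter low, which appears to be the motivation for the single atomic add in the description.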
[0026] Additionally, in another embodiment, constructing the acceleration structure may include sorting primitives along a 30-bit Morton curve that spans a bounding box of a scene. See, for example, "Fast bvh construction on GPUs," (Lauterbach et al., Comput. Graph. Forum 28, 2, 375-384), which is hereby incorporated by reference in its entirety, and which describes methods for sorting primitives and constructing BVHs. In yet another embodiment, the primitives may be sorted utilizing a brute force algorithm (e.g., a least-significant digit radix sorting algorithm, etc.).
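A least-significant digit radix sort over Morton keys might look like the following host-side sketch. It assumes 8-bit digits (four passes over a 32-bit key) and carries a payload array of primitive indices along with the keys; radixSortByKey is a hypothetical name, and a GPU build would use a parallel sorting primitive instead.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sorts keys in ascending order and applies the same permutation to values.
// Each pass is a stable counting sort on one 8-bit digit, starting with the
// least significant digit.
void radixSortByKey(std::vector<std::uint32_t>& keys,
                    std::vector<std::uint32_t>& values) {
    assert(keys.size() == values.size());
    const std::size_t n = keys.size();
    std::vector<std::uint32_t> keysTmp(n), valuesTmp(n);

    for (int shift = 0; shift < 32; shift += 8) {
        // offsets[d + 1] first counts the keys whose current digit equals d.
        std::array<std::size_t, 257> offsets{};
        for (std::size_t i = 0; i < n; ++i) {
            ++offsets[((keys[i] >> shift) & 0xFFu) + 1];
        }
        // After the prefix sum, offsets[d] is the first output slot of digit d.
        for (std::size_t d = 0; d < 256; ++d) {
            offsets[d + 1] += offsets[d];
        }
        // Scatter in input order, which keeps each pass stable.
        for (std::size_t i = 0; i < n; ++i) {
            const std::size_t dst = offsets[(keys[i] >> shift) & 0xFFu]++;
            keysTmp[dst] = keys[i];
            valuesTmp[dst] = values[i];
        }
        keys.swap(keysTmp);
        values.swap(valuesTmp);
    }
}
```

Stability of every pass is what makes the digit-by-digit approach correct: ties on a higher digit preserve the order established by the lower digits.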
[0027] In still another embodiment, Morton codes may be observed to define a hierarchical grid, where each 3n-bit code identifies a unique voxel in a regular grid with 2^n entries per side, and where, in one embodiment, the first 3m bits of the code identify the parent voxel in the coarser grid with 2^m subdivisions per side. Utilizing this observation, coarse clusters of objects may be formed, each containing the objects falling in one 3m-bit bin. In another embodiment, the grid in which the unique voxel is identified may include different amounts of entries per side. In yet another embodiment, forming the coarse clusters of objects may be performed utilizing an instance of a run-length encoding compression algorithm, and may be implemented with a single compaction operation.
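Forming coarse clusters from sorted Morton codes amounts to run-length encoding their 3m-bit prefixes. The sketch below is a sequential illustration of that idea; Cluster and formClusters are hypothetical names, and the codes are assumed to be 3n bits wide and already sorted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One run of consecutive primitives sharing the same 3m-bit prefix.
struct Cluster {
    std::uint32_t prefix;  // top 3m bits of the Morton code
    std::size_t first;     // index of the first primitive in the run
    std::size_t count;     // number of primitives in the run
};

// Run-length encodes the 3m-bit prefixes of sorted 3n-bit Morton codes.
std::vector<Cluster> formClusters(const std::vector<std::uint32_t>& sortedCodes,
                                  int n, int m) {
    assert(m > 0 && m <= n && 3 * n <= 32);
    const int shift = 3 * (n - m);  // low bits that distinguish fine voxels
    std::vector<Cluster> clusters;
    for (std::size_t i = 0; i < sortedCodes.size(); ++i) {
        const std::uint32_t prefix = sortedCodes[i] >> shift;
        if (clusters.empty() || clusters.back().prefix != prefix) {
            clusters.push_back({prefix, i, 1});  // a new run starts here
        } else {
            ++clusters.back().count;             // extend the current run
        }
    }
    return clusters;
}
```

On a parallel device, the run starts can be flagged independently per element and gathered with one compaction, which is consistent with the single compaction operation mentioned above.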
[0028] Further, in one embodiment, after the clusters are identified, all the primitives may be partitioned inside each cluster (e.g., using LBVH-style spatial middle splits, etc.). In another embodiment, a top-level tree may then be created, where the clusters may be partitioned with a binned SAH builder. See, for example, "On fast Construction of SAH based Bounding Volume Hierarchies," (Wald, I., In Proceedings of the 2007 Eurographics/IEEE Symposium on Interactive Ray Tracing, Eurographics), which is hereby incorporated by reference in its entirety, and which describes methods for partitioning clusters.
[0029] Further still, in one embodiment, both the spatial middle split partitioning and the SAH builder may rely on an efficient task queue system (e.g., the task queue system 200, etc.), which may parallelize work over the individual nodes of the output hierarchies.
[0030] Also, in one embodiment, middle split hierarchy emission may be performed. For example, it may be noted that each node in the hierarchy may correspond to a consecutive range of primitives sorted by their Morton codes, and that splitting a node may require finding the first element in the range whose code differs from the preceding element. Additionally, in another embodiment, complex machinery may be avoided by reverting to a standard ordering that may be used on a serial device. For example, each node may be mapped to a single thread, and each thread may be allowed to find its own split plane.
[0031] In yet another embodiment, instead of looping through the entire range of primitives in the node, it may be observed that it is possible to reformulate the problem as a simple binary search. For example, it may be determined that if a node is located at a level l, the Morton codes of the primitives of the node may have the exact same set of high l-1 bits. In another embodiment, the first bit p ≥ l by which the first and last Morton codes in the node's range differ may be determined. In still another embodiment, a binary search may be performed to locate the first Morton code that contains a 1 at bit p.
[0032] In this way, for a node containing N primitives, the algorithm may find the split plane by touching only O(log2(N)) memory cells, instead of the entire set of N Morton codes.
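The binary-search formulation of the middle split can be sketched as follows. This is an illustrative host-side version with a hypothetical function name, findSplit; note that it indexes bits from the least significant end, whereas the description counts bit positions from the most significant end.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// For the node covering sorted codes[first..last] (inclusive), returns the
// index of the first code whose highest differing bit is set. Returns first
// when all codes in the range are equal, i.e. the middle split fails.
std::size_t findSplit(const std::vector<std::uint32_t>& codes,
                      std::size_t first, std::size_t last) {
    assert(first <= last && last < codes.size());
    const std::uint32_t diff = codes[first] ^ codes[last];
    if (diff == 0) {
        return first;  // no differing bit: nothing to split on
    }
    // Highest bit p in which the first and last codes differ.
    int p = 31;
    while (((diff >> p) & 1u) == 0) {
        --p;
    }
    // Invariant: codes[lo] has bit p clear and codes[hi] has bit p set.
    // This holds initially because the range is sorted and all higher bits
    // are shared across the range.
    std::size_t lo = first;
    std::size_t hi = last;
    while (hi - lo > 1) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if ((codes[mid] >> p) & 1u) {
            hi = mid;
        } else {
            lo = mid;
        }
    }
    return hi;
}
```

Only the two end codes and the probed midpoints are read, which gives the logarithmic number of memory accesses noted above.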
[0033] Additionally, in one embodiment, middle splits may sometimes fail, which may lead to occasional large leaves. In another embodiment, when such a failure is detected, the leaves may be split by the object-median. In yet another embodiment, after the topology of the BVH has been computed, a bottom-up refitting procedure may be run to compute the bounding boxes of each node in the tree. This process may be simplified by the fact that the BVH is stored in breadth-first order. In another embodiment, one kernel launch may be used per tree level, and one thread may be used per node in the level.
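The bottom-up refitting pass over a breadth-first layout might be sketched as below. The outer loop plays the role of one kernel launch per level and the inner loop the role of one thread per node; the Node layout, the negative-index leaf convention, and the levelOffsets array are assumptions made for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Box {
    float lo[3];
    float hi[3];
};

// Node of a BVH stored in breadth-first order. Assumption: left < 0 marks a
// leaf whose box is already set; otherwise left and right index the children.
struct Node {
    int left;
    int right;
    Box box;
};

inline Box merge(const Box& a, const Box& b) {
    Box r;
    for (int k = 0; k < 3; ++k) {
        r.lo[k] = std::min(a.lo[k], b.lo[k]);
        r.hi[k] = std::max(a.hi[k], b.hi[k]);
    }
    return r;
}

// levelOffsets[k] is the index of the first node of level k, and the final
// entry equals nodes.size(). Levels are visited from deepest to root, so the
// children of a node are always refitted before the node itself.
void refitBottomUp(std::vector<Node>& nodes,
                   const std::vector<std::size_t>& levelOffsets) {
    assert(!levelOffsets.empty() && levelOffsets.back() == nodes.size());
    for (std::size_t level = levelOffsets.size() - 1; level-- > 0;) {
        for (std::size_t i = levelOffsets[level]; i < levelOffsets[level + 1]; ++i) {
            Node& n = nodes[i];
            if (n.left < 0) {
                continue;  // leaf: keep its precomputed box
            }
            n.box = merge(nodes[static_cast<std::size_t>(n.left)].box,
                          nodes[static_cast<std::size_t>(n.right)].box);
        }
    }
}
```

Nodes within one level never depend on each other, which is why a level can be processed fully in parallel.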
[0034] Figure 3 shows a sorting 300 of a group of primitives using Morton codes, in accordance with another embodiment. As an option, the present sorting 300 may be carried out in the context of the functionality of Figures 1-4. Of course, however, the sorting 300 may be implemented in any desired environment. It should also be noted that the aforementioned definitions may apply during the present description.
[0035] As shown, centroids of a plurality of bounded primitives 302A-J located within a two-dimensional projection are each assigned Morton codes (e.g., four-bit Morton codes, etc.). Additionally, the plurality of bounded primitives 302A-J are sorted into a sequence of rows 306A-J, where the assigned Morton codes are used as keys. For example, for every respective primitive of sequence 306A-J, the Morton code bits are shown in separate rows 308. Additionally, binary search partitions 310 are made to the sequence of rows 306A-J. Further, Figure 4 shows a plurality of middle-split queues 402A-E corresponding to the sorting 300 performed in Figure 3, in accordance with another embodiment.
[0036] Additionally, in one embodiment, a SAH-optimized tree construction algorithm may be run over the coarse clusters defined by the first 3m bits of the Morton curve. In one embodiment, m may be between 5 and 7. Of course, however, m may include any integer. In another embodiment, the construction algorithm may run in a bounded memory footprint. For example, if N clusters are processed, space may be preallocated only for 2N-1 nodes.
[0037] Table 1 illustrates pseudo-code for the SAH binning procedure associated with the optimized tree construction algorithm. Of course, it should be noted that the pseudo-code shown in Table 1 is set forth for illustrative purposes only, and thus should not be construed as limiting in any manner.
Table 1
    int qin = 0;
    int numQElems = 1;
    hltop_queue_init(queue[qin], Clusters, numClusters);
    while (numQElems > 0)
    {
        // init all bins (empty bounding boxes, reset counters)
        bins_init(queue[qin], numQElems);
        // compute bin statistics
        accumulate_bins(queue[qin], Clusters, numClusters);
        int output_counter = 0;
        // compute best splits
        sah_split(queue[qin], numQElems, queue[1 - qin], &output_counter, BvhReferences, numBvhNodes);
        // distribute clusters to their new split task
        distribute_clusters(queue[qin], Clusters, numClusters);
        numQElems = output_counter;
        numBvhNodes += output_counter;
        qin = 1 - qin;
        BvhLevelOffset[numBvhLevels++] = numBvhNodes;
    }
[0038] In one embodiment, in a pass, a cluster from the prior pass (with its aggregate bounding box) may be treated as a primitive. In another embodiment, the computation may be split into split tasks organized in a single input queue and a single output queue. In yet another embodiment, each task may correspond to a node that needs to be split, and may be described by three input fields (e.g., the node's bounding box, the number of clusters inside the node, and the node ID).
[0039] Additionally, in one embodiment, two additional fields may be computed on the fly (e.g., the best split plane and the ID of the first child split task). In another embodiment, these fields may be stored in a structure of arrays (SOA) format, which may keep a number (e.g., five, etc.) of separate arrays indexed by a task ID. In yet another embodiment, an array (e.g., cluster_split_id, etc.) may be kept that maps each cluster to the current node (i.e. split task, etc.) it belongs to, where the array may be updated with every splitting operation.
[0040] Further, in one embodiment, the loop in Table 1 may start by assigning all clusters to the root node, which may form a split-task 0. Then, for each loop iteration, binning, SAH evaluation, and cluster distribution steps may be performed. For example, each node's bounding box may be split into M (e.g., M including an integer such as eight, etc.) slab-shaped bins in each dimension. See, for example, "Ray Tracing Deformable Scenes using Dynamic Bounding Volume Hierarchies," (Wald, et al., ACM Transactions on Graphics 26, 1, 485-493), which is hereby incorporated by reference in its entirety, and which describes methods for splitting node bounding boxes.
[0041] Further still, in another embodiment, a bin may store an initially empty bounding box and a count. In yet another embodiment, each cluster's bounding box may be accumulated into the bin containing its centroid, and the count of the number of clusters falling within the bin may be atomically incremented. In still another embodiment, this procedure may be executed in parallel across the clusters, where each thread may look at a single cluster and may accumulate its bounding box into the corresponding bin within the corresponding split-task, using atomic min/max to grow the bins' bounding boxes.
[0042] Also, in one embodiment, for each split-task in the input queue, the surface area metric may be evaluated for all the split planes in each dimension between the uniformly distributed bins, and the best one may be selected. In another embodiment, if the split-task contains a single cluster, the subdivision may be stopped; otherwise, two output split-tasks may be created, where bounding boxes corresponding to the left and right subspaces may be determined by the SAH split.
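Evaluating the surface area metric over the planes between bins can be sketched for a single axis as follows. This is a sequential illustration under stated assumptions: the bins are taken as already accumulated, the cost of a plane is taken to be area times count summed over both sides, and Box3, Bin, Split and bestSahSplit are hypothetical names. A full builder would repeat this for each of the three dimensions and keep the cheapest plane.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

constexpr float kInf = std::numeric_limits<float>::infinity();

struct Box3 { float lo[3]; float hi[3]; };

inline Box3 emptyBox() { return {{kInf, kInf, kInf}, {-kInf, -kInf, -kInf}}; }

inline void grow(Box3& a, const Box3& b) {
    for (int k = 0; k < 3; ++k) {
        a.lo[k] = std::min(a.lo[k], b.lo[k]);
        a.hi[k] = std::max(a.hi[k], b.hi[k]);
    }
}

inline float surfaceArea(const Box3& b) {
    const float dx = b.hi[0] - b.lo[0];
    const float dy = b.hi[1] - b.lo[1];
    const float dz = b.hi[2] - b.lo[2];
    return 2.0f * (dx * dy + dy * dz + dz * dx);
}

struct Bin { Box3 box; int count; };      // accumulated cluster boxes and count
struct Split { int plane; float cost; };  // plane k separates bins [0,k) and [k,M)

// Returns the cheapest plane, or plane == -1 if no plane has clusters on
// both sides.
Split bestSahSplit(const std::vector<Bin>& bins) {
    const std::size_t m = bins.size();
    assert(m >= 2);
    // Right-to-left sweep: cost and count of bins [k, m) for every plane k.
    std::vector<float> rightCost(m, 0.0f);
    std::vector<int> rightCount(m, 0);
    Box3 acc = emptyBox();
    int cnt = 0;
    for (std::size_t k = m; k-- > 1;) {
        if (bins[k].count > 0) { grow(acc, bins[k].box); cnt += bins[k].count; }
        rightCount[k] = cnt;
        rightCost[k] = cnt > 0 ? surfaceArea(acc) * static_cast<float>(cnt) : 0.0f;
    }
    // Left-to-right sweep: combine with the left side and keep the minimum.
    Split best{-1, kInf};
    acc = emptyBox();
    cnt = 0;
    for (std::size_t k = 1; k < m; ++k) {
        if (bins[k - 1].count > 0) { grow(acc, bins[k - 1].box); cnt += bins[k - 1].count; }
        if (cnt == 0 || rightCount[k] == 0) continue;  // one side would be empty
        const float cost = surfaceArea(acc) * static_cast<float>(cnt) + rightCost[k];
        if (cost < best.cost) best = {static_cast<int>(k), cost};
    }
    return best;
}
```

With M bins there are only M-1 candidate planes per axis, so the evaluation cost is independent of the number of clusters once the bins are filled.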
[0043] In addition, in one embodiment, the mapping between clusters and split-tasks may be updated, where each cluster may be mapped to one of the two output split-tasks generated by its previous owner. In order to determine the new split-task ID, the i-th cluster's bin ID may be compared to the value stored in the best split field of the corresponding split-task. Table 2 illustrates pseudo-code for a comparison of the i-th cluster's bin ID to the value stored in the best split field of the corresponding split-task. Of course, it should be noted that the pseudo-code shown in Table 2 is set forth for illustrative purposes only, and thus should not be construed as limiting in any manner.
Table 2
    int old_id = cluster_split_id[i];
    int bin_id = cluster_bin_id[i];
    int split_id = queue[in].best_split[old_id];
    int new_id = queue[in].new_task[old_id];
    cluster_split_id[i] = new_id + (bin_id < split_id ? 0 : 1);
[0044] Further, in one embodiment, there may be some flexibility in the order of the algorithm phases. For example, refitting may be performed separately for bottom-level and top-level phases to trade off cluster bounding box precision against parallelism.
[0045] Figure 5 shows a data flow visualization 500 of a SAH binning procedure, in accordance with another embodiment. As an option, the present data flow visualization 500 may be carried out in the context of the functionality of Figures 1-4. Of course, however, the data flow visualization 500 may be implemented in any desired environment. It should also be noted that the aforementioned definitions may apply during the present description.
[0046] As shown, clusters 502A and 502B contribute to forming the bin statistics 504 of their parent node. Additionally, nodes in the input task queue 506 are split, generating two entries 508A and 508B into the output queue 510.
[0047] Additionally, in one embodiment, specialized builders for clusters of fine intricate geometry (e.g., hair, fur, foliage, etc.) may be integrated. In another embodiment, this work may be easily integrated with triangle splitting strategies. See, for example, "Early split clipping for bounding volume hierarchies," (Ernst, et al., Symposium on Interactive Ray Tracing, 0, 73-78), which is hereby incorporated by reference in its entirety, and which describes triangle splitting strategies. In yet another embodiment, compress-sort-decompress techniques may be re-incorporated in order to exploit coherence internal to the mesh.
[0048] In this way, HLBVH may be implemented based on generic task queues, which may include a flexible paradigm of work dispatching that may be used to build simple and fast parallel algorithms. Additionally, in one embodiment, the same mechanism may be used to implement a massively parallel binned SAH builder for the high quality HLBVH variant. In another embodiment, the HLBVH implementation may be performed entirely on the GPU. In this way, synchronization and memory copies between the CPU and GPU may be eliminated. For example, when accounting for the elimination of these overheads, the resulting builder may be faster (e.g., 5-10 times faster, etc.) than previous techniques. In another example, when considering just the kernel times alone, the resulting builder may also be faster (e.g., up to 3 times faster, etc.) than previous techniques.
[0049] Additionally, in one embodiment, high quality bounding volume hierarchies may be produced in real-time even for moderately complex models. In another embodiment, the algorithms may be faster than previous HLBVH implementations. This may be possible thanks to a general simplification offered by the adoption of work queues, which may allow a significant reduction in the number of high latency kernel launches and may reduce data transformation passes.
[0050] Further, in one embodiment, hierarchical linear bounding volume hierarchies (HLBVHs) may be capable of reconstructing the spatial index needed for ray tracing in real-time, even in the presence of millions of fully dynamic triangles. In another embodiment, the aforementioned algorithms may enable a simpler and faster variant of HLBVH, where all the complex bookkeeping of prefix sums, compaction and partial breadth-first tree traversal needed for spatial partitioning may be replaced with an elegant pipeline built on top of efficient work queues and binary search. In yet another embodiment, the new algorithm may be both faster and more memory efficient, which may remove the need for temporary storage of geometry data for intermediate computations. Also, in one embodiment, the same pipeline may be extended to parallelize the construction of the top-level SAH optimized tree on the GPU, which may eliminate round-trips to the CPU, thereby accelerating the overall construction speed (e.g., by a factor of five to ten times, etc.).
[0051] In another embodiment, a novel variant of hierarchical linear bounding volume hierarchies (HLBVHs) may be provided that is simple, fast and easy to generalize. In one embodiment, an ad-hoc, complex mix of prefix-sums, compaction and partial breadth-first tree traversal primitives used to perform an actual object partitioning step may be replaced with a single, elegant pipeline based on efficient work-queues. In this way, the original HLBVH algorithm may be simplified, and superior speeds may be offered. Additionally, in one embodiment, the new pipeline may also remove the need for all additional temporary storage that may have been previously required.
[0052] Further still, in one embodiment, a surface area heuristic (SAH) optimized HLBVH hybrid may be parallelized. For example, the added flexibility of a task-based pipeline may be combined with the efficiency of a parallel binning scheme. In this way, a speedup factor of up to ten times traditional methods may be obtained. Additionally, by parallelizing the entire pipeline, all acceleration structure construction may be run on the GPU, which may eliminate costly copies between CPU and GPU memory spaces.
[0053] Also, in one embodiment, all algorithms used to construct the acceleration structure may be implemented using CUDA parallel computing architecture. See, for example, "Scalable parallel programming with cuda," (Nickolls, et al., ACM Queue 6, 2, 40-53), which is hereby incorporated by reference in its entirety, and which describes implementations of parallel computing with CUDA. Additionally, the construction of the acceleration structure may be performed utilizing efficient sorting primitives. See, for example, "Revisiting sorting for GPGPU stream architectures," (Merrill, et al., Tech. Rep. CS2010-03, Department of Computer Science, University of Virginia, February), which is hereby incorporated by reference in its entirety, and which describes efficient sorting primitives.
[0054] Additionally, in one embodiment, constructing the acceleration structure may include constructing a BVH. For example, a 3D extent of a scene may be discretized using n bits per dimension, and each point may be assigned a linear coordinate along a space-filling Morton curve of order n (which may be computed by interleaving the binary digits of the discretized coordinates). In another embodiment, primitives may then be sorted according to the Morton code of their centroid. In still another embodiment, the hierarchy may be built by grouping the primitives in clusters with the same 3n bit code, then grouping the clusters with the same 3(n-1) high order bits, and so on, until a complete tree is built. In yet another embodiment, the 3m high order bits of a Morton code may identify the parent voxel in a coarse grid with 2^m divisions per side, such that this process may correspond to splitting the primitives recursively in the spatial middle, from top to bottom.
[0055] Further, in one embodiment, HLBVH may improve on the basic algorithm in multiple ways. For example, it may provide a faster construction algorithm applying a compress-sort-decompress strategy to exploit spatial and temporal coherence in the input mesh. In another example, it may introduce a high-quality hybrid builder, in which the top of the hierarchy is built using a Surface Area Heuristic (SAH) sweep builder over the clusters defined by the voxelization at level m. See, for example, "Automatic creation of object hierarchies for ray tracing," (Goldsmith, et al., IEEE Computer Graphics and Applications 7, 5, 14-20), which is hereby incorporated by reference in its entirety, and which describes an exemplary SAH.
[0056] In another embodiment, a custom scheduler may be built based on task-queues to implement a light-weight threading model, which may avoid overheads of built in hardware threads support. See, for example, Past Construction of SAM BV}ls on the Intel Many Integrated Core (MIC) Architecture," (Wald, 1, IEEE Transactions on Visualization and Computer Graphics), which is hereby incorporated by reference in its entirety, and which describes a parallel binned-SAN BVFI builder optimized for a prototype many core architecture, [0037) Figure 6 illustrates an exemplary system 600 in which the various architecture and/or functionality of the various previous embodiments may be implemented. As shown, a system 600 is provided including at least one host processor 601 which is connected to a communication bus 602. The system 600 also includes a main memory 604. Control logic (software) and data are stored in the main memory 604 which may take the form of random access memory (RAM).
[0058] The system 600 also includes a graphics processor 606 and a display 608, i.e. a computer monitor. In one embodiment, the graphics processor 606 may include a plurality of shader modules, a rasterization module, etc. Each of the foregoing modules may even be situated on a single semiconductor platform to form a graphics processing unit (GPU).
[0059] In the present description, a single semiconductor platform may refer to a sole unitary semiconductor-based integrated circuit or chip. It should be noted that the term single semiconductor platform may also refer to multi-chip modules with increased connectivity which simulate on-chip operation, and make substantial improvements over utilizing a conventional central processing unit (CPU) and bus implementation. Of course, the various modules may also be situated separately or in various combinations of semiconductor platforms per the desires of the user.
[0060] The system 600 may also include a secondary storage 610. The secondary storage 610 includes, for example, a hard disk drive and/or a removable storage drive, representing a floppy disk drive, a magnetic tape drive, a compact disk drive, etc. The removable storage drive reads from and/or writes to a removable storage unit in a well known manner.
[0061] Computer programs, or computer control logic algorithms, may be stored in the main memory 604 and/or the secondary storage 610. Such computer programs, when executed, enable the system 600 to perform various functions. Memory 604, storage 610 and/or any other storage are possible examples of computer-readable media.
[0062] In one embodiment, the architecture and/or functionality of the various previous figures may be implemented in the context of the host processor 601, graphics processor 606, an integrated circuit (not shown) that is capable of at least a portion of the capabilities of both the host processor 601 and the graphics processor 606, a chipset (i.e. a group of integrated circuits designed to work and sold as a unit for performing related functions, etc.), and/or any other integrated circuit for that matter.
[0063] Still yet, the architecture and/or functionality of the various previous figures may be implemented in the context of a general computer system, a circuit board system, a game console system dedicated for entertainment purposes, an application-specific system, and/or any other desired system. For example, the system 600 may take the form of a desktop computer, lap-top computer, and/or any other type of logic. Still yet, the system 600 may take the form of various other devices including, but not limited to a personal digital assistant (PDA) device, a mobile phone device, a television, etc.
[0064] Further, while not shown, the system 600 may be coupled to a network (e.g. a telecommunications network, local area network (LAN), wireless network, wide area network (WAN) such as the Internet, peer-to-peer network, cable network, etc.) for communication purposes.
[0065] While various embodiments have been described above, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of a preferred embodiment should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.
GB1212642.1A 2011-08-04 2012-07-16 Constructing an acceleration structure Withdrawn GB2493425A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US13/198,656 US20130033507A1 (en) 2011-08-04 2011-08-04 System, method, and computer program product for constructing an acceleration structure

Publications (2)

Publication Number Publication Date
GB201212642D0 GB201212642D0 (en) 2012-08-29
GB2493425A GB2493425A (en) 2013-02-06

Family

ID=46799698

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1212642.1A Withdrawn GB2493425A (en) 2011-08-04 2012-07-16 Constructing an acceleration structure

Country Status (6)

Country Link
US (1) US20130033507A1 (en)
JP (1) JP2013037691A (en)
KR (1) KR20130016120A (en)
CN (1) CN103106681A (en)
DE (1) DE102012213292A1 (en)
GB (1) GB2493425A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107635750A (en) * 2015-07-31 2018-01-26 惠普发展公司,有限责任合伙企业 Part for 3D printing structure envelope, which is arranged, to be determined
EP3319047A1 (en) * 2016-11-04 2018-05-09 Samsung Electronics Co., Ltd. Method and apparatus for generating acceleration structure
EP3675049A1 (en) * 2018-12-28 2020-07-01 Intel Corporation Apparatus and method for acceleration data structure refit

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105957134B (en) * 2011-08-05 2019-11-08 想象技术有限公司 The method and apparatus for creating and updating for 3-D scene acceleration structure
US9595074B2 (en) * 2011-09-16 2017-03-14 Imagination Technologies Limited Multistage collector for outputs in multiprocessor systems
US9424685B2 (en) 2012-07-31 2016-08-23 Imagination Technologies Limited Unified rasterization and ray tracing rendering environments
GB2546020B (en) * 2012-11-02 2017-08-30 Imagination Tech Ltd Method of scheduling discrete productions of geometry
US20140340412A1 (en) * 2013-05-14 2014-11-20 The Provost, Fellows, Foundation Scholars, & the other members of Board, et al. Hardware unit for fast sah-optimized bvh constrution
US9946658B2 (en) 2013-11-22 2018-04-17 Nvidia Corporation Memory interface design having controllable internal and external interfaces for bypassing defective memory
US8817026B1 (en) 2014-02-13 2014-08-26 Raycast Systems, Inc. Computer hardware architecture and data structures for a ray traversal unit to support incoherent ray traversal
US9990758B2 (en) * 2014-03-31 2018-06-05 Intel Corporation Bounding volume hierarchy generation using a heterogeneous architecture
US20160139919A1 (en) * 2014-11-14 2016-05-19 Intel Corporation Machine Level Instructions to Compute a 3D Z-Curve Index from 3D Coordinates
US20160139924A1 (en) * 2014-11-14 2016-05-19 Intel Corporation Machine Level Instructions to Compute a 4D Z-Curve Index from 4D Coordinates
US9984492B2 (en) * 2015-04-02 2018-05-29 Qualcomm Incorporated Efficient hierarchy traversal in ray tracing applications
KR102537530B1 (en) * 2015-10-26 2023-05-26 삼성전자 주식회사 Method and apparatus for generating acceleration structure
KR102570584B1 (en) * 2015-12-02 2023-08-24 삼성전자 주식회사 System and Method for constructing a Bounding Volume Hierarchy Tree
US10559125B2 (en) * 2015-12-02 2020-02-11 Samsung Electronics Co., Ltd. System and method of constructing bounding volume hierarchy tree
KR102604737B1 (en) 2016-01-11 2023-11-22 삼성전자주식회사 METHOD AND APPARATUS for generating acceleration structure
EP3206190A1 (en) * 2016-02-15 2017-08-16 Thomson Licensing Device and process for improving efficiency of image rendering
TWI636422B (en) * 2016-05-06 2018-09-21 國立臺灣大學 Indirect illumination method and 3d graphics processing device
KR20180069461A (en) * 2016-12-15 2018-06-25 삼성전자주식회사 Method and apparatus for generating acceleration structure
GB2612147A (en) * 2022-01-12 2023-04-26 Imagination Tech Ltd Building an acceleration structure for use in ray tracing

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100097372A1 (en) * 2008-10-17 2010-04-22 Caustic Graphics, Inc. Synthetic acceleration shapes for use in ray tracing
CN101819675A (en) * 2010-04-19 2010-09-01 浙江大学 Method for quickly constructing bounding volume hierarchy (BVH) based on GPU
CN101819684A (en) * 2010-04-12 2010-09-01 长春理工大学 Spatial acceleration structure for virtual three-dimensional scene of animated film and creation and update method thereof
US20110050698A1 (en) * 2006-09-19 2011-03-03 Caustic Graphics, Inc. Architectures for parallelized intersection testing and shading for ray-tracing rendering
WO2011053181A1 (en) * 2009-10-30 2011-05-05 Intel Corporation Graphics rendering using a hierarchical acceleration structure

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6570578B1 (en) * 1998-04-03 2003-05-27 Avid Technology, Inc. System for automatic generation of selective partial renderings of complex scenes
US6583787B1 (en) * 2000-02-28 2003-06-24 Mitsubishi Electric Research Laboratories, Inc. Rendering pipeline for surface elements
EP2052366A2 (en) * 2006-08-15 2009-04-29 Mental Images GmbH Simultaneous simulation of markov chains using quasi-monte carlo techniques
KR101550477B1 (en) * 2008-03-21 2015-09-04 이메지네이션 테크놀로지스 리미티드 Architectures for parallelized intersection testing and shading for ray-tracing rendering
US9483864B2 (en) * 2008-12-05 2016-11-01 International Business Machines Corporation System and method for photorealistic imaging using ambient occlusion
US8669977B2 (en) * 2009-10-01 2014-03-11 Intel Corporation Hierarchical mesh quantization that facilitates efficient ray tracing

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"KD-Tree Acceleration Structures for a GPU Raytracer",Tim Foley, Jeremy Sugerman, Proc of Graphics Hardware 2005, http://graphics.stanford.edu/papers/gpu_kdtree/kdtree.pdf *
"Rendering Complex Scenes with Memory-Coherent Ray Tracing", graphics.stanford.edu/papers/coherentrt/coherentrt-figs.pdf, To appear in Proc of SIGGRAPH 1997. *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107635750A (en) * 2015-07-31 2018-01-26 惠普发展公司,有限责任合伙企业 Part for 3D printing structure envelope, which is arranged, to be determined
US10518473B2 (en) 2015-07-31 2019-12-31 Hewlett-Packard Development Company, L.P. Parts arrangement determination for a 3D printer build envelope
CN107635750B (en) * 2015-07-31 2020-11-27 惠普发展公司,有限责任合伙企业 Method and apparatus for determining the arrangement of components to be printed in a build envelope
EP3319047A1 (en) * 2016-11-04 2018-05-09 Samsung Electronics Co., Ltd. Method and apparatus for generating acceleration structure
US10460506B2 (en) 2016-11-04 2019-10-29 Samsung Electronics Co., Ltd. Method and apparatus for generating acceleration structure
EP3675049A1 (en) * 2018-12-28 2020-07-01 Intel Corporation Apparatus and method for acceleration data structure refit
EP3798992A1 (en) * 2018-12-28 2021-03-31 INTEL Corporation Apparatus and method for acceleration data structure refit
US11501484B2 (en) 2018-12-28 2022-11-15 Intel Corporation Apparatus and method for acceleration data structure refit

Also Published As

Publication number Publication date
GB201212642D0 (en) 2012-08-29
US20130033507A1 (en) 2013-02-07
JP2013037691A (en) 2013-02-21
CN103106681A (en) 2013-05-15
DE102012213292A8 (en) 2013-04-18
DE102012213292A1 (en) 2013-02-07
KR20130016120A (en) 2013-02-14

Similar Documents

Publication Publication Date Title
GB2493425A (en) Constructing an acceleration structure
Garanzha et al. Simpler and faster HLBVH with work queues
US8072460B2 (en) System, method, and computer program product for generating a ray tracing data structure utilizing a parallel processor architecture
US7002571B2 (en) Grid-based loose octree for spatial partitioning
Bédorf et al. A sparse octree gravitational N-body code that runs entirely on the GPU processor
US9721320B2 (en) Fully parallel in-place construction of 3D acceleration structures and bounding volume hierarchies in a graphics processing unit
US8570322B2 (en) Method, system, and computer program product for efficient ray tracing of micropolygon geometry
US8773422B1 (en) System, method, and computer program product for grouping linearly ordered primitives
US8922550B2 (en) System and method for constructing a bounding volume hierarchical structure
US20090109219A1 (en) Real-time mesh simplification using the graphics processing unit
US20130235050A1 (en) Fully parallel construction of k-d trees, octrees, and quadtrees in a graphics processing unit
JP4858795B2 (en) Instant ray tracing
Liu et al. Exact and adaptive signed distance fields computation for rigid and deformable models on gpus
CN112102467A (en) Parallel octree generation and device based on GPU and electronic equipment
Zellmann et al. A linear time BVH construction algorithm for sparse volumes
Zellmann et al. Rapid kd Tree Construction for Sparse Volume Data.
EP3319047A1 (en) Method and apparatus for generating acceleration structure
Hastings et al. Optimization of large-scale, real-time simulations by spatial hashing
Zellmann et al. Hybrid grids for sparse volume rendering
CN106570934A (en) Modeling method for spatial implicit function in large-scale scene
Du et al. DCCD: Distributed N-Body Rigid Continuous Collision Detection for Large-Scale Virtual Environments
Sanjurjo et al. Parallel global illumination method based on a nonuniform partitioning of the scene
KR100674428B1 (en) Adaptive load balancing apparatus using a combination of hierarchical data structures for parallel volume rendering
Chen et al. BADF: Bounding Volume Hierarchies Centric Adaptive Distance Field Computation for Deformable Objects on GPUs
KR101651827B1 (en) The Voxelization Method of Objects using File handling and Parallel Processing

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)