WO2022040472A1 - System and method for accelerated ray tracing - Google Patents

System and method for accelerated ray tracing

Info

Publication number
WO2022040472A1
Authority
WO
WIPO (PCT)
Prior art keywords
rtu
acceleration structure
ray
intersection
shader
Prior art date
Application number
PCT/US2021/046777
Other languages
English (en)
Inventor
Mark Evan Cerny
Original Assignee
Sony Interactive Entertainment LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Interactive Entertainment LLC
Priority to CN202180050800.2A (CN116157841A)
Priority to KR1020237008976A (KR20230078645A)
Priority to JP2023512444A (JP2023538127A)
Priority to EP21859164.2A (EP4200811A1)
Publication of WO2022040472A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/06 Ray-tracing
    • G06T15/005 General purpose rendering architectures
    • G06T2210/00 Indexing scheme for image generation or computer graphics
    • G06T2210/21 Collision detection, intersection

Definitions

  • the application relates generally to systems and methods for accelerated ray tracing.
  • Ray tracing is used to simulate optical effects in computer-generated 3D graphics by tracing the path of light from the eye of an imaginary observer (typically the camera location) to virtual objects in the graphics. Ray tracing produces optical effects with a higher degree of realism than other techniques such as rasterization but at a greater computational cost. This means that in real-time applications such as video games, ray tracing presents challenges because rendering speed is critical.
  • a method for graphics processing includes executing, on a graphics processing unit (GPU), a shader program that performs ray tracing of a 3D environment represented by an acceleration structure.
  • the method includes using a hardware-implemented ray tracing unit (RTU) within the GPU that traverses the acceleration structure at the request of the shader program, and using, at the shader program, results of the acceleration structure traversal.
  • the acceleration structure traversal by the RTU can be asynchronous with respect to the shader program.
  • the results of the acceleration structure traversal by the RTU include the detection of intersection between a ray and bounding volumes contained within the acceleration structure.
  • the RTU processing includes maintenance of a stack used in the acceleration structure traversal.
  • the acceleration structure may be a hierarchy with a plurality of levels.
  • the results of the acceleration structure traversal by the RTU may include detection of a transition from a higher level to a lower level within the plurality of levels of the acceleration structure.
  • the results of the acceleration structure traversal by the RTU can also include detection of a transition from a lower level to a higher level within the plurality of levels of the acceleration structure.
  • the acceleration structure traversal by the RTU may include handling of transitions between the plurality of levels of the acceleration structure.
  • the results of the acceleration structure traversal by the RTU can include detection of intersection between a ray and primitives contained within the acceleration structure.
  • the results of the acceleration structure traversal by the RTU can include detection of the earliest intersection between a ray and primitives contained within the acceleration structure.
  • the results of the acceleration structure traversal by the RTU may include a sorting by the RTU of the intersections detected by the RTU, by distance of the intersections from ray origin, such that the RTU detects a first intersection between a ray and a primitive as it traverses the acceleration structure, the RTU detects a second intersection between the ray and a primitive as it traverses the acceleration structure, and when communicating results from the RTU to the shader program, the second intersection result is communicated before the first intersection result.
  • the shader program and RTU subsequently communicate regarding the results of the shader program’s hit testing between the ray and the primitive.
  • the shader program and RTU may subsequently communicate regarding the shader program’s determination of whether or not to ignore the intersection, and/or the shader program’s determination of the location of the intersection along the ray.
  • a graphics processing unit includes at least one processor core adapted to execute a software-implemented shader, and at least one hardware-implemented ray tracing unit (RTU) separate from the processor core and adapted to traverse an acceleration structure to identify intersections of rays with objects represented in the acceleration structure to generate results and return the results to the shader for identification by the shader of hits associated with the intersections.
  • the RTU may include hardware circuitry to identify the intersections and the shader can be adapted to identify the hits using software.
  • the shader can be configured with instructions executable by the processor core to shade pixels in 3D computer graphics.
  • the RTU may include hardware circuitry to implement traversal logic to traverse the acceleration structure.
  • the RTU may include hardware circuitry to implement stack management of a stack used in traversal of the acceleration structure.
  • the RTU may include hardware circuitry to sort the intersections by distance from an origin.
  • the RTU is adapted to identify the intersections asynchronously with the shader identifying the hits.
  • the shader may include instructions executable by the processor core to read status of the RTU.
  • the RTU may include hardware circuitry to transform a ray from the coordinate space used by a higher level of an acceleration structure with a plurality of levels to the coordinate space used by a lower level of the acceleration structure.
  • the RTU also may include hardware circuitry to transform a ray from the coordinate space used by a lower level of an acceleration structure with a plurality of levels to the coordinate space used by a higher level of the acceleration structure, and/or restore ray attributes to the ray attributes used when traversing the higher level of the acceleration structure.
  • the RTU may include hardware circuitry to identify a first intersection between a first ray and a first bounding volume contained within the acceleration structure and the shader may include instructions executable to determine whether to ignore the first intersection, and responsive to a determination not to ignore the first intersection, identify a location of the first intersection along the first ray.
  • the processor core and RTU can be supported on a common semiconductor die.
  • Plural processor cores and plural RTUs can be on the common semiconductor die.
  • in another aspect, an assembly includes at least one processor core adapted to execute at least one shader to shade pixels in graphics.
  • the assembly also includes at least one raytracing unit (RTU) separate from the processor core.
  • the RTU includes hardware circuitry to identify intersections of rays with objects represented in an acceleration structure for identification of hits associated with the intersections by the processor core, implement logic for traversing the acceleration structure, and implement management of a data stack used in traversing the acceleration structure.
  • Figure 1 illustrates an acceleration structure
  • Figure 2 illustrates further details of a multi-level acceleration structure
  • FIG. 3 illustrates a simplified graphics processing unit (GPU)
  • FIG. 4 illustrates an example GPU with ray tracing units (RTU) having hardware circuitry for identifying ray intersections, traversal logic for traversing the acceleration structure, and stack management circuitry, along with a traversal diagram;
  • Figures 4A, 4B, and 4C illustrate example logic consistent with Figure 4 in example flow chart format
  • Figure 5 illustrates two example GPUs executing asynchronous processing between the shaders and the RTUs
  • Figure 5A illustrates example logic consistent with Figure 5 in example flow chart format
  • Figure 6 illustrates a GPU with a traversal diagram showing collaborative processing when hit testing is not required
  • Figure 7 illustrates further examples involving multi-level acceleration structures
  • Figure 7A illustrates example logic consistent with Figure 7 in example flow chart format
  • Figure 8 illustrates a GPU with a traversal diagram showing intersections determined by the shader.
  • a system herein may include server and client components which may be connected over a network such that data may be exchanged between the client and server components.
  • the client components may include one or more computing devices including game consoles such as Sony PlayStation® or a game console made by Microsoft or Nintendo or other manufacturer, virtual reality (VR) headsets, augmented reality (AR) headsets, portable televisions (e.g. smart TVs, Internet-enabled TVs), portable computers such as laptops and tablet computers, and other mobile devices including smart phones and additional examples discussed below.
  • These client devices may operate with a variety of operating environments.
  • client computers may employ, as examples, Linux operating systems, operating systems from Microsoft, or a Unix operating system, or operating systems produced by Apple, Inc., or Google.
  • These operating environments may be used to execute one or more browsing programs, such as a browser made by Microsoft or Google or Mozilla or other browser program that can access websites hosted by the Internet servers discussed below.
  • an operating environment according to present principles may be used to execute one or more computer game programs.
  • Servers and/or gateways may include one or more processors executing instructions that configure the servers to receive and transmit data over a network such as the Internet. Or a client and server can be connected over a local intranet or a virtual private network.
  • a server or controller may be instantiated by a game console such as a Sony PlayStation®, a personal computer, etc.
  • servers and/or clients can include firewalls, load balancers, temporary storages, and proxies, and other network infrastructure for reliability and security.
  • servers may form an apparatus that implements methods of providing a secure community such as an online social website to network members.
  • a processor or processor core may be a single- or multi-chip processor that can execute logic by means of various lines such as address lines, data lines, and control lines and registers and shift registers.
  • Logic may be represented herein in various forms including flow charts without limiting present principles. For example, state logic may be used where appropriate.
  • a system having at least one of A, B, and C includes systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.
  • an acceleration structure 10 is shown that is a data structure representing a three-dimensional (3D) environment of computer-generated objects such as may be used for films, computer simulations such as computer games, etc.
  • the architecture of the acceleration structure 10 shown in the example of Figure 1 is a bounding volume hierarchy (BVH), which is a tree structure with a root node 12, internal nodes 14, and leaves 16.
  • the root node 12 and internal nodes 14 contain a number of bounding volumes, each of which corresponds to a child of the node.
  • Leaves 16 contain one or more primitives, or other types of geometry.
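  • The tree layout described above can be sketched as a small data structure. This is a minimal illustration, not the layout used by the application: the class names `Box`, `Leaf`, and `Node`, the use of axis-aligned boxes, and the helper `count_primitives` are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

Vec3 = Tuple[float, float, float]

@dataclass
class Box:
    """Axis-aligned bounding volume (an assumption; the text does not fix the shape)."""
    lo: Vec3
    hi: Vec3

@dataclass
class Leaf:
    primitives: List[str]  # primitive IDs; a real leaf would reference geometry

@dataclass
class Node:
    """Root or internal node: one bounding volume per child."""
    bounds: List[Box] = field(default_factory=list)
    children: List[Union["Node", Leaf]] = field(default_factory=list)

def count_primitives(node) -> int:
    """Walk the tree and count the primitives stored in its leaves."""
    if isinstance(node, Leaf):
        return len(node.primitives)
    return sum(count_primitives(child) for child in node.children)
```

  • Storing each child's bounding volume in the parent, as the description states, lets a traversal test all children of a node before visiting any of them.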
  • Figure 2 illustrates an example multi-level acceleration structure 200 with two levels for simplicity.
  • the upper level (the “top level acceleration structure”) 202 contains no primitives in its leaves 204; instead, its leaves 204 are references to the lower level (the “bottom level acceleration structures”) 206.
  • the bottom level acceleration structure 206 is different from the top level 202 in that the leaves 208 of the bottom level 206 contain primitives.
  • the top-level acceleration structure 202 may have bounding volumes (and in some applications, primitives) in world space coordinates.
  • Bottom level acceleration structures 206 each have their own coordinate spaces. This allows, for example, the 3D object represented in the bottom level acceleration structure “Y” to appear two times, each in a different location and orientation.
  • Figures 1 and 2 illustrate non-limiting examples of acceleration structures that may be used consistent with present principles.
  • Figure 3 is a simplified diagram of a graphics processor unit (GPU) 300 that includes a semiconductor die 302 supporting one or more processor cores 304 (in the example, plural processor cores 304), and one or more (in the example shown, plural) intersection engines 306.
  • the intersection engines assist in ray tracing.
  • the GPU 300 may also include a number of other functional units such as caches 308, memory interfaces 310, rasterizers 312, and render backends 314.
  • a processor core 304 executes a software-implemented shader program (also referred to herein as “shader”) to shoot rays through the 3D environment represented by, e.g., the acceleration structure 200 in Figure 2, by initializing the ray and then traversing the acceleration structure, successively colliding the ray with the bounding volumes and primitives that the acceleration structure contains.
  • the intersection engines 306, which may be implemented by dedicated hardware, may compute the intersection of the ray with bounding volumes and primitives. Thus, identifying ray-bounding volume and ray-primitive intersections can be offloaded to the intersection engines 306.
  • a processor core 304 transfers the node or leaf address to an intersection engine 306, along with a description of the ray, and after the intersection engine computes the intersection between ray and bounding volume or primitive it returns the results to the processor core.
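  • As an illustration of the kind of test an intersection engine performs, the following sketch checks a ray against an axis-aligned box using the widely known slab method. The application does not specify the bounding volume shape or the algorithm; the function name `ray_aabb` and its parameters are assumptions made for this example.

```python
import math

def ray_aabb(origin, direction, lo, hi, t_max=math.inf):
    """Return the entry distance if the ray hits the box within [0, t_max], else None."""
    t_near, t_far = 0.0, t_max
    for o, d, a, b in zip(origin, direction, lo, hi):
        if d == 0.0:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < a or o > b:
                return None
            continue
        t0, t1 = (a - o) / d, (b - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        # Shrink the interval in which the ray is inside all slabs seen so far.
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return None
    return t_near
```

  • The `t_max` argument models a ray of limited length, so a box that lies beyond the end of the ray is reported as a miss.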
  • FIG. 4 illustrates a GPU 400 on a common semiconductor die 402 that may be implemented in a rendering device 404 that may be, without limitation, a computer simulation or game console, a computer server streaming games to end users, devices associated with computer-enhanced graphics in films, and the like.
  • the GPU 400 includes one or more (in the example shown, plural) processor cores 406 for executing one or more shader programs and for communicating with one or more (in the example shown, plural) ray tracing units (RTU) 408.
  • the RTU are implemented by hardware with hardware circuitry for executing the tasks disclosed herein.
  • a RTU 408 can traverse an acceleration structure using its own hardware-implemented traversal logic.
  • a RTU 408 may include one or more intersection engine circuitry 410 that can calculate the intersection of a ray with a bounding volume or primitive (such as a triangle), traversal logic circuitry 412 to traverse the acceleration structure, and stack management circuitry 414 for maintaining the stack used in traversal of the acceleration structure.
  • a RTU 408 may include other subunits as well.
  • Figure 4A illustrates high level logic and Figure 4B illustrates logic in greater granularity that may be implemented by the hardware circuitry in a RTU 408.
  • the shader being executed on a processor core passes the root node of the acceleration structure and ray information to the RTU 408.
  • the RTU traverses the acceleration structure using its traversal logic circuitry 412 and stack management circuitry 414, identifying intersections of rays with bounding volumes and primitives.
  • One or more of the intersections are communicated to the shader at block 418.
  • the shader identifies hits of any intersections at block 420 and using the overall information gained thereby, shades pixels for rendering at block 422.
  • the RTU can include circuitry to execute the stack management and traversal of the acceleration structure illustrated in Figure 4B. If it is determined at state 424 that the current node is a root node or internal node, the RTU moves to block 426 to check the intersection of the ray with each bounding volume contained in the current node. If it is determined at state 428 that there are multiple intersections, they are sorted at block 430 so that the earliest (i.e., closest to origin of ray) intersection is first, and the rest are in order of distance from the origin of the ray. Essentially, the intersections are sorted shortest to longest (from the origin of the ray).
  • the child nodes corresponding to the second and subsequent intersections are pushed on the stack at block 432. From block 432 or from state 428 if the test there was negative, the circuitry moves to block 434 to continue processing with the child node corresponding to the first intersection, or if there are no intersections, processing continues with the node popped from the top of the stack. If there is no node to pop because the stack is empty, then the acceleration structure traversal is complete. If it is determined at state 424 that the current node is a leaf, the RTU checks the intersection of the ray with the primitive or primitives contained in the leaf at block 436.
  • the traversal may be stackless.
  • Bold arrows show the traversal of the acceleration structure.
  • Bold solid boxes depict the nodes for which the RTU has found an intersection between the ray and the node’s bounding volume and bold dashed boxes depict the nodes for which the RTU has found an intersection between the ray and the primitive contained in the leaf.
  • Leaves in the traversal that are not marked with bold dashed outlines are leaves for which the RTU did not find an intersection between the ray and the primitive contained in the leaf.
  • traversal begins by processing root node A to identify intersections of the ray with the bounding volumes it contains.
  • bounding volumes corresponding to A’s children E and J are identified as intersections by the processing of the root node A, and E and J are therefore pushed to the stack.
  • the bounding volume corresponding to A’s child B is also identified as an intersection from the processing of root node A, and at 444 B is then processed to identify intersections of the ray with its bounding volumes.
  • This identifies primitive D, which is pushed to the stack at 446, and primitive C, which is processed at 448. In the example shown it is determined by this processing that there is not an intersection between the ray and primitive C.
  • Primitive D is popped from the stack and processed at 450 to identify an intersection with the ray; in this example case, an intersection between the ray and primitive D is identified, and the primitive is passed to the shader program to identify whether or not there was a “hit” between ray and primitive (as described below and in Figure 4C).
  • bounding volume E is then popped from the stack and processed to identify intersections.
  • primitives H and I, which are identified from processing bounding volume E, are pushed to the stack, and bounding volume F is processed to identify intersections at 456.
  • Primitive G is next processed at 458, an intersection is identified, and primitive G is passed to the shader program for hit testing.
  • primitive H is popped from the stack and processed at 460; no intersection is found.
  • primitive I is popped from the stack and processed at 462 (an intersection is identified and primitive I is passed to the shader program for hit testing) and then bounding volume J is popped from the stack and processed at 464. This leads to identifying primitive K, which is processed at 466; no intersection is identified.
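  • The stack-based traversal of Figure 4B, applied to the example tree walked through above, can be sketched as follows. The tree encoding and all distances are assumptions chosen only to reproduce the visiting order in the text; `traverse` is a software stand-in for the RTU's traversal and stack management circuitry, and ray shortening is omitted here.

```python
# Hypothetical encoding of the example tree: internal node -> children.
CHILDREN = {"A": ["B", "E", "J"], "B": ["C", "D"], "E": ["F", "H", "I"],
            "F": ["G"], "J": ["K"]}
# Assumed entry distance of the ray into each child's bounding volume.
BOX_T = {"B": 1.0, "E": 2.0, "J": 3.0, "C": 1.0, "D": 7.0,
         "F": 2.0, "H": 2.5, "I": 3.0, "G": 4.0, "K": 3.5}
# Assumed ray/primitive distances; a missing key means no intersection.
PRIM_T = {"D": 9.0, "G": 6.0, "I": 4.0}

def traverse(children, box_t, prim_t, root):
    """Return (nodes in visiting order, primitives the ray intersects)."""
    stack, node = [], root
    visited, found = [], []
    while node is not None:
        visited.append(node)
        if node in children:
            # Root or internal node: test each child's bounding volume,
            # then sort the intersections nearest first.
            hits = sorted((c for c in children[node] if c in box_t), key=box_t.get)
            # Push the second and later intersections so the nearest is on top.
            stack.extend(reversed(hits[1:]))
            node = hits[0] if hits else None
        else:
            # Leaf: test the primitive it contains.
            if node in prim_t:
                found.append(node)
            node = None
        if node is None and stack:
            node = stack.pop()
    return visited, found  # the loop ends when the stack is empty
```

  • With the assumed data, `traverse(CHILDREN, BOX_T, PRIM_T, "A")` visits A, B, C, D, E, F, G, H, I, J, K in that order and reports intersections with primitives D, G, and I, matching the walkthrough.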
  • the shader program running on a processor core and a RTU collaborate to perform ray tracing.
  • the above example shows processing when the primitives are partially transparent, e.g., they are triangles representing the foliage of a tree.
  • the shader program’s objective is to identify the earliest (i.e. closest to origin of ray) intersection with a non-transparent portion of the 3D environment represented by the acceleration structure (a “hit”).
  • the RTU passes primitive D to the shader program.
  • the shader identifies whether the ray hit an opaque portion of primitive D, meaning that it tests primitive D to determine if the ray passed through a transparent portion of the primitive, or hit a solid portion of the primitive. In this example, the ray hit a solid portion of the primitive.
  • the shader program notes primitive D as the earliest intersection encountered so far, and at block 474 passes the result of the intersection (a “hit”) to the RTU.
  • the RTU shortens the ray, as there is no point testing past the location of the intersection of the ray with primitive D.
  • Block 478 indicates that the RTU continues traversal of the acceleration structure, reaching primitive G, and determines that the ray intersected it.
  • the RTU passes G to the shader program for hit testing.
  • the shader program performs hit testing consistent with Figures 4 and 4C, determining that the ray passed through a transparent portion of primitive G and consequently there was not a hit. In some embodiments, the shader program passes the information that G was not a hit to the RTU. In other embodiments, as there was no hit, the shader program does not pass the information that primitive G was not a hit to the RTU.
  • the RTU continues traversal and reaches primitive H, as discussed in relation to Figure 4.
  • the RTU continues traversal and reaches primitive I, where it detects an intersection between the ray and the primitive. It passes primitive I to the shader program, which performs hit testing.
  • the shader program determines that there is a hit, and that the hit of primitive I is earlier (i.e., closer to the origin of the ray) than the hit of primitive D; the shader program therefore updates the earliest intersection encountered to primitive I.
  • the shader program also informs the RTU that there was a hit on primitive I.
  • the RTU shortens the ray based on the hit on primitive I, and continues its traversal of the acceleration structure, reaching primitive K. There is not an intersection between the ray and primitive K, so the RTU attempts to pop the next node to process from the stack, but the stack is empty. The RTU accordingly terminates processing (it has reached the end of the acceleration structure traversal) and notifies the shader program that it is done. The shader program now knows that the earliest hit is on primitive I, and it continues its processing accordingly.
  • processor core passes root node A to the RTU; the RTU passes primitive D to the shader program for hit testing, and the shader program reports a hit; the RTU passes primitive G to the shader program for hit testing, and the shader program reports a miss; the RTU passes primitive I to the shader program for hit testing, and the shader program reports a hit; the RTU informs the shader program that acceleration structure traversal is complete.
  • the above processing strategy may result in a significant improvement of ray tracing speed, as the shader program is only performing hit testing. It is not performing acceleration structure traversal or managing the corresponding stack.
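  • The division of labor described above, in which the RTU finds intersections and the shader only performs hit testing, can be sketched as follows. `ToyRTU`, `trace`, and the `is_opaque_at` callback are hypothetical names; traversal is reduced to a precomputed leaf visiting order, and the distances are assumed values consistent with the example (I nearest, then G, then D).

```python
import math

class ToyRTU:
    """Stand-in for the RTU: yields intersected primitives in traversal order."""

    def __init__(self, leaf_order, prim_t):
        self.leaf_order = leaf_order
        self.prim_t = prim_t          # primitive -> assumed intersection distance
        self.t_max = math.inf         # current ray length

    def intersections(self):
        for prim in self.leaf_order:
            t = self.prim_t.get(prim)
            # Intersections beyond the end of the shortened ray are not reported.
            if t is not None and t < self.t_max:
                yield prim, t

    def report_hit(self, t):
        self.t_max = min(self.t_max, t)   # shorten the ray

def trace(rtu, is_opaque_at):
    """Shader side: hit-test each reported intersection, keep the earliest hit."""
    closest = None
    for prim, t in rtu.intersections():   # RTU passes a primitive to the shader
        if is_opaque_at(prim, t):         # shader-side hit test
            closest = (prim, t)
            rtu.report_hit(t)             # shader informs the RTU of the hit
    return closest                        # generator exhausted: traversal complete
```

  • Because the RTU only reports intersections closer than the current ray length, every confirmed hit is automatically earlier than the previous one, so the shader does not need to compare distances itself in this sketch.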
  • phrases such as “passes node A,” “passes primitive G,” and the like describe any strategy for communication, including without limitation passing a pointer to the node or primitive, or passing an ID for the node or primitive.
  • phrases such as “the RTU informs the shader program” or “the shader program informs the RTU” likewise refer to any strategy for communication, including without limitation “push” strategies such as setting a register and ringing a doorbell, or interrupt driven communication, and “pull” strategies such as reading or polling the status of the other unit.
  • a GPU 500 includes one or more processor cores 502 and one or more RTUs 504 operating asynchronously from the shader program.
  • the GPU 500 can be substantially identical in configuration and operation to the GPU 400 shown in Figure 4 with the exceptions noted below.
  • Figure 5A illustrates at block 506 that the shader program sends the root node A and ray information to the RTU to initiate processing.
  • the RTU begins traversing the acceleration structure asynchronously to shader operation, with the shader reading status of the RTU periodically at block 510 and the RTU reporting status at block 512 as it continues its traversal.
  • the shader passes hit identifications to the RTU at block 514 to enable the RTU to shorten its rays.
  • left pointing arrows are requests for information by the shader program
  • the right pointing arrows are transfers of information from the shader program to the RTU. More specifically, as indicated at 516, the shader program sends the root node A to the RTU and reads status from the RTU and receives the status “WIP” at 518, meaning that the RTU has not found any intersections yet. The shader program reads status again, and receives the status “D” at 520, meaning that the RTU has found an intersection with primitive D.
  • the shader program performs hit testing, finds that primitive D was hit by the ray, and informs the RTU of this hit at 522 so that the RTU can shorten the ray.
  • the read/report process continues as the RTU traverses the acceleration structure asynchronously with the shader issuing reads.
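  • The read/report exchange can be sketched with a polled status interface. This is a simplified model under stated assumptions: real hardware would run concurrently with the shader, whereas here asynchrony is simulated by letting the RTU process a fixed number of leaves per status read. `PolledRTU`, `trace_polling`, and the status strings are names invented for the example.

```python
import math
from collections import deque

class PolledRTU:
    def __init__(self, leaf_order, prim_t, steps_per_read=1):
        self.todo = deque(leaf_order)   # leaves still to be processed
        self.prim_t = prim_t            # primitive -> assumed intersection distance
        self.t_max = math.inf           # current ray length
        self.pending = deque()          # intersections awaiting hit testing
        self.steps = steps_per_read

    def _advance(self):
        # Simulated progress made by the RTU since the previous read.
        for _ in range(min(self.steps, len(self.todo))):
            prim = self.todo.popleft()
            t = self.prim_t.get(prim)
            if t is not None and t < self.t_max:
                self.pending.append((prim, t))

    def read_status(self):
        self._advance()
        while self.pending:
            prim, t = self.pending.popleft()
            if t < self.t_max:          # skip results made stale by a later hit
                return prim, t
        return "WIP" if self.todo else "DONE"

    def report_hit(self, t):
        self.t_max = min(self.t_max, t)

def trace_polling(rtu, is_opaque_at):
    closest, log = None, []
    while True:
        status = rtu.read_status()
        log.append(status if isinstance(status, str) else status[0])
        if status == "DONE":
            return closest, log
        if status == "WIP":
            continue
        prim, t = status
        if is_opaque_at(prim, t):
            closest = (prim, t)
            rtu.report_hit(t)
```

  • With the leaf order C, D, G, H, I, K and assumed distances D = 9, G = 6, I = 4, the shader sees the statuses WIP, D, G, WIP, I, DONE and ends with primitive I as the earliest hit.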
  • In the bottom half of Figure 5 is an example of another embodiment, in which the RTU sorts intersections by distance. This can result in improvement of ray tracing speed.
  • the closest intersection to the ray origin is “I”. It is a hit as indicated at 524.
  • the next closest intersection to the ray origin is “G”. It is a miss.
  • the farthest intersection from the ray origin is “D”. It is a hit as indicated previously.
  • the shader program sends root node A and ray information to the RTU to initiate processing, and then the shader program reads status from the RTU and receives the status “WIP” at 526 meaning that the RTU has not found any intersections yet.
  • the shader program reads status again, and receives the status “D” at 528, meaning that the RTU has found an intersection with primitive D.
  • the shader program performs hit testing, finds that primitive D was hit by the ray, and informs the RTU of this hit at 530 so that the RTU can shorten the ray.
  • While the shader program was performing hit testing, the RTU traversal determined that there were intersections with primitives G and I, and that primitive I is closer to the ray origin than primitive G (i.e., the RTU sorted the intersections by distance from ray origin). When the shader program next reads status, it is informed that it should perform hit testing on primitive I (not G) at 532, as primitive I is the closest known intersection to the ray origin. As indicated at 534, the shader program performs hit testing, finds that primitive I was hit by the ray, and informs the RTU of this hit. While the shader program was performing hit testing, the RTU finished traversal of the acceleration structure without finding any more intersections.
  • As primitive G’s intersection is farther from the ray origin than primitive I’s, primitive G is discarded.
  • When the shader program next reads status, it is informed at 536 that traversal of the acceleration structure is complete, and no more primitives require hit testing.
  • Such sorting of intersections may also be of value when performing ray tracing of environments with translucency; in that case, it is beneficial for the intersections to be sorted so that the shader program is informed first about the intersection farthest from the ray origin.
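  • The sorted reporting described above can be sketched with a priority queue of pending intersections. `SortingRTU`, its `advance` method (which models how far the RTU got between status reads), and the distances are assumptions for illustration; the application does not state how the hardware performs the sort.

```python
import heapq
import math

class SortingRTU:
    def __init__(self, leaf_order, prim_t):
        self.todo = list(reversed(leaf_order))  # pop() takes the next leaf
        self.prim_t = prim_t                    # primitive -> assumed distance
        self.t_max = math.inf                   # current ray length
        self.heap = []                          # (distance, primitive), nearest first

    def advance(self, steps):
        """Model asynchronous progress: process up to `steps` leaves."""
        for _ in range(min(steps, len(self.todo))):
            prim = self.todo.pop()
            t = self.prim_t.get(prim)
            if t is not None and t < self.t_max:
                heapq.heappush(self.heap, (t, prim))

    def read_status(self):
        while self.heap:
            t, prim = heapq.heappop(self.heap)
            if t < self.t_max:
                return prim, t
            # Farther than a confirmed hit: discard without hit testing.
        return "WIP" if self.todo else "DONE"

    def report_hit(self, t):
        self.t_max = min(self.t_max, t)

def demo():
    rtu = SortingRTU(["C", "D", "G", "H", "I", "K"], {"D": 9.0, "G": 6.0, "I": 4.0})
    log = []
    rtu.advance(1); log.append(rtu.read_status())   # C: nothing found yet
    rtu.advance(1); log.append(rtu.read_status())   # D found
    rtu.report_hit(9.0)                             # shader: D is a hit
    rtu.advance(4); log.append(rtu.read_status())   # G and I found; I is nearer
    rtu.report_hit(4.0)                             # shader: I is a hit
    log.append(rtu.read_status())                   # G discarded, traversal done
    return log
```

  • In this run the shader never hit-tests primitive G, which is the saving the sorted embodiment describes. For the translucency case, a farthest-first order could be obtained in this sketch by pushing negated distances, though that variant is not shown.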
  • Figure 6 illustrates examples of collaborative processing when hit testing is not required by showing an acceleration traversal diagram 600 executed by the RTU and communication diagrams 602, 604 between one or more processor cores 606 and one or more RTU 608 of a GPU such as any of the GPUs described in Figure 4 or Figure 5 or elsewhere herein.
  • if hit testing is not required, such as when primitives in the acceleration structure are opaque, then the processing performed by the shader program is further reduced, resulting in increased performance.
  • a primitive may be indicated as being opaque to the RTU by a flag or other means.
  • the RTU can track the earliest intersection between the ray and primitives, without the need for the shader program to perform hit testing.
  • the acceleration traversal diagram 600 is essentially the same example shown in Figure 4 but in Figure 6 the leaves that the ray intersects are shown in bold solid borders as opposed to bold dashed borders.
  • the communication diagram 602 in the middle of Figure 6 shows collaborative ray tracing between the shader program and the RTU when locating the earliest hit. Since the RTU can track the earliest intersection, it reports a status of work-in-progress (“WIP”) until traversal of the acceleration structure is complete, at which point it reports a status of “done” as indicated at 612 and provides I as the earliest primitive that the ray intersects.
  • the communication diagram 604 at the bottom of Figure 6 shows collaborative ray tracing between the shader program and the RTU when the shader program wants to know if there was an intersection but does not need to know the details of the intersection; this is typical of ray tracing for shadows and ambient occlusions.
  • the shader processing ends as soon as it locates a primitive that the ray intersects; in this case it is D, as indicated at 614.
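  • The two query styles in Figure 6 can be contrasted in a short sketch, assuming every primitive is opaque so that no shader-side hit test is needed. The function names and the flattened leaf order are assumptions; the distances are the same illustrative values used for the earlier example.

```python
import math

def closest_hit(leaf_order, prim_t):
    """Earliest-hit query: the RTU itself tracks the nearest intersection."""
    best, t_max = None, math.inf
    for prim in leaf_order:
        t = prim_t.get(prim)
        if t is not None and t < t_max:
            best, t_max = prim, t   # intersection counts as a hit; shorten the ray
    return best

def any_hit(leaf_order, prim_t):
    """Shadow or occlusion query: stop at the first intersection found."""
    for prim in leaf_order:
        if prim_t.get(prim) is not None:
            return prim
    return None
```

  • With leaf order C, D, G, H, I, K and distances D = 9, G = 6, I = 4, `closest_hit` returns I after a full traversal, while `any_hit` returns D and stops early.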
  • Figure 7 illustrates an acceleration structure 700 with a top-level acceleration structure (TLAS) 702 and a set of bottom level acceleration structures (BLAS) 704 similar to those shown in Figure 2, except as noted below.
  • Figure 7 also shows a communication diagram between one or more processor cores 706 and one or more RTU 708 of a GPU to illustrate an example of collaborative processing between shader program and RTU when the acceleration structure is a multi-level hierarchy.
  • the TLAS 702 has leaves (such as X 710) that each give a link to a respective BLAS (such as BLAS X, one of the set of BLAS 704). As discussed previously, the TLAS 702 uses world space coordinates. Each BLAS in the set of BLAS 704 has leaves such as D 712 that contain primitives, and as also discussed previously, each BLAS in the set of BLAS 704 has its own coordinate space. In this example, the shader program’s goal is to determine the earliest intersection, and all primitives in the acceleration structure are opaque.
  • the desired traversal of the acceleration structure 700 that is implemented by the RTU 708 is as follows. Processing of the root node A of the TLAS 702 identifies ray intersections with the bounding volumes representing children B and E; E in the TLAS 702 is pushed to the stack, maintained in this case by the RTU, and the bounding volumes contained within B of the TLAS 702 are processed to identify ray intersections. An intersection with the ray is found for the bounding volume within B that corresponds to leaf X; this in turn leads to processing leaf X, which represents the BLAS X of the set of BLAS 704.
  • the coordinate space for X is different from world space coordinates (its primitives have their own coordinate space), so the ray attributes such as ray origin must be transformed.
  • the root node of BLAS X is processed to identify ray intersections, leading to processing of C; when processing C, an intersection is identified with the bounding volume for leaf node D 712.
  • the processing of leaf node D 712 identifies an intersection with primitive D. Recall that, since primitives in Figure 7 are assumed to be opaque, an intersection is automatically counted as a hit. Because D is counted as a hit, the RTU shortens the ray to the length from the ray origin to D.
  • E in the TLAS 702 is popped from the stack.
  • the coordinate space for the BLAS portion X will no longer be used, so the ray attributes for world space must be restored. Ray length must be preserved in this process.
  • E is processed to identify ray intersections, and an intersection is identified for the bounding volume corresponding to leaf Z; this in turn leads to processing leaf Z, which represents the BLAS Z of the set of BLAS 704.
  • the ray attributes such as ray origin must be transformed to the coordinate space of Z.
  • the root node of BLAS Z is processed to identify ray intersections, leading to processing of F and then to the processing of leaf G, which in the example shown is identified as an intersection (and, thus, a hit in the “opaque” example shown).
  • the ray is shortened to the length between the origin and the primitive of G.
  • the RTU detects transitions from the TLAS 702 to BLAS within the set of BLAS 704 and from BLAS within the set of BLAS 704 to the TLAS 702, and the shader program performs the ray transformation updates and passes the result to the RTU.
  • the communication steps are as shown in Figure 7, beginning at 714 in which the shader program sends the root node A and ray information to the RTU to initiate processing.
  • the shader program reads status from the RTU and receives the status work in progress (“WIP”), meaning that the RTU has not found any intersections yet.
  • the shader program reads status again, and at 718 receives the status “enter BLAS X,” indicating that the RTU has detected a transition to BLAS X in its traversal of the acceleration structure 700.
  • the shader program transforms the ray attributes such as origin to the BLAS X coordinate space and sends (720) the ray attributes and BLAS root node X to the RTU.
  • the RTU traverses BLAS X and discovers an intersection between the ray and primitive D and shortens the ray accordingly. Recall that this example assumes the primitive is opaque, so there is no need for the shader unit to perform hit testing.
  • the shader program reads status, and receives at 722 the status “exit BLAS,” indicating the RTU processing of the BLAS is complete. As indicated at 724, the shader program sends the world space ray attributes to the RTU.
  • the shader program reads status again, and as indicated at 726 receives the status “enter BLAS Z.”
  • the shader program transforms the ray attributes such as origin to the BLAS Z coordinate space and sends at 728 the ray attributes and BLAS root node Z to the RTU.
  • the RTU traverses BLAS Z and discovers an intersection between the ray and primitive G and shortens the ray accordingly.
  • when the shader program next reads status, it is informed at 730 that traversal of the acceleration structure is complete, and that the earliest intersection was with primitive G.
  • the RTU can handle the detected transitions from TLAS to BLAS and BLAS to TLAS, updating ray attributes as needed.
  • all processing is performed by the RTU, so after the shader program sends root node A to the RTU, the shader program will read status “WIP” until the RTU traversal of the acceleration structure is complete, at which point the shader program will read status “done” and learn that the earliest intersection was with primitive G.
  • Figure 7A illustrates the coordinate transition logic discussed above in terms of logic that may be executed entirely by the RTU or by collaboration between the shader and the RTU.
  • This example illustrates an acceleration structure with two levels.
  • ray-bounding volume intersection determinations are initially executed in world space.
  • the logic moves to block 736 to convert the ray to the coordinate space specific to the BLAS and process the ray at block 738 for intersection identification.
  • the logic moves to block 742 to convert the ray to world space (or restore the ray attributes as they were in world space) and process the ray at block 744 for intersection identification.
  • the TLAS may not be in world coordinate space but in its own specific coordinate space.
  • Figure 8 illustrates an example of intersection determination performed by the shader executing on one or more processor cores 800 communicating with one or more RTUs 802 of a GPU.
  • the RTU determines whether the ray under test intersects the primitive, e.g., the primitive is a triangle and the RTU’s intersection engine can determine intersection between a ray and a triangle.
  • the geometry associated with a leaf N 804 in an acceleration structure 806 is such that the RTU’s intersection engine cannot compute the intersection, e.g., it is a sphere.
  • the shader program and RTU collaborate to perform ray tracing as follows.
  • the RTU traverses the acceleration structure 806 as discussed elsewhere herein.
  • the RTU tests the ray against the bounding volumes in node M 808 and determines that the ray intersects the bounding volume corresponding to leaf N 804.
  • when the shader program reads status, it receives at 810 the status that the bounding volume for N has been intersected.
  • the shader program performs hit testing between the ray and the sphere contained in leaf N and determines that there has been a hit. It informs the RTU that there has been a hit at 812, and also informs the RTU of the location of the hit so that the ray can be shortened accordingly by the RTU.
  • the fact that the primitive in leaf N is a sphere may be identified by the RTU and/or shader on the basis of a flag or other indicator associated with the primitive that it is a sphere (or other geometry beyond the ability of the RTU to process).
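
The two query modes described for Figure 6 (earliest hit versus "was there any intersection") can be illustrated with a minimal sketch. This is not the patented hardware logic; it assumes opaque primitives, and the names `Ray`, `earliest_hit`, `any_hit`, and the distances assigned to D and I are hypothetical values chosen only so that D is reached first in traversal order while I is nearer to the ray origin.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

# (primitive name, distance along the ray), listed in traversal order.
Candidate = Tuple[str, float]


@dataclass
class Ray:
    t_max: float = math.inf  # current ray length; shortened on each accepted hit


def earliest_hit(ray: Ray, candidates: Iterable[Candidate]) -> Optional[str]:
    """Closest-hit query: visit every candidate and keep the nearest one."""
    best = None
    for name, t in candidates:
        if t < ray.t_max:      # opaque primitive: an intersection counts as a hit
            ray.t_max = t      # shorten the ray so farther primitives are rejected
            best = name
    return best                # would be reported together with status "done"


def any_hit(ray: Ray, candidates: Iterable[Candidate]) -> Optional[str]:
    """Shadow / ambient-occlusion style query: stop at the first intersection."""
    for name, t in candidates:
        if t < ray.t_max:
            return name
    return None


# Hypothetical distances: D is found first in traversal order, I is nearer.
order = [("D", 5.0), ("I", 3.0)]
closest = earliest_hit(Ray(), order)
first = any_hit(Ray(), order)
```

With these assumed distances the closest-hit query settles on I after visiting both candidates, while the any-hit query stops at D, which mirrors the outcomes indicated at 612 and 614.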
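
The Figure 7 handshake (steps 714 through 730) can be sketched from the shader program's side as a status-polling loop. The sketch below is illustrative only: `ToyRTU` merely replays the status sequence described above instead of traversing a real acceleration structure, each BLAS is assumed to differ from world space by a pure translation, and the tables `BLAS_ORIGIN` and `BLAS_HIT` hold hypothetical values. The ray length is kept on the RTU side, which is one way to satisfy the requirement that it be preserved across transitions.

```python
import math

# Hypothetical world-space position of each BLAS origin (translation-only).
BLAS_ORIGIN = {"X": (10.0, 0.0, 0.0), "Z": (0.0, -4.0, 0.0)}
# Hypothetical (primitive, distance) the toy RTU "finds" in each BLAS.
BLAS_HIT = {"X": ("D", 6.0), "Z": ("G", 4.0)}


class ToyRTU:
    """Stand-in for the hardware RTU; it replays the Figure 7 status sequence."""

    def __init__(self):
        self._script = iter([("WIP", None), ("enter BLAS", "X"),
                             ("exit BLAS", None), ("enter BLAS", "Z"),
                             ("done", None)])
        self.t_max = math.inf   # ray length survives every coordinate change
        self.earliest = None
        self.received = []      # log of (node, origin) sent by the shader

    def send(self, node, origin, direction):
        self.received.append((node, origin))
        if node in BLAS_HIT:                 # a BLAS root: pretend to traverse it
            prim, t = BLAS_HIT[node]
            if t < self.t_max:               # opaque: intersection is a hit
                self.t_max, self.earliest = t, prim

    def read_status(self):
        status, arg = next(self._script)
        return (status, self.earliest) if status == "done" else (status, arg)


def trace(rtu, origin, direction):
    """Shader-side loop: poll status, transform the ray at each transition."""
    rtu.send("A", origin, direction)                       # 714: root node + ray
    while True:
        status, arg = rtu.read_status()
        if status == "enter BLAS":                         # 718 / 726
            local = tuple(o - b for o, b in zip(origin, BLAS_ORIGIN[arg]))
            rtu.send(arg, local, direction)                # 720 / 728
        elif status == "exit BLAS":                        # 722
            rtu.send(None, origin, direction)              # 724: world-space ray
        elif status == "done":                             # 730
            return arg
        # status "WIP": nothing to do yet, poll again


rtu = ToyRTU()
result = trace(rtu, (0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
```

In the variant where the RTU handles the transitions itself, the same loop would only ever observe "WIP" followed by "done".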
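
The coordinate conversion of Figure 7A (blocks 736 and 742) can be illustrated with a small affine-transform sketch. It assumes, as is common practice but not stated in the text, that each BLAS carries a 3x4 world-to-BLAS matrix; `transform_ray`, `point_at`, and the matrix values are hypothetical. The point of the example is that if the transformed direction is not renormalized, the ray parameter t (and therefore the shortened ray length) means the same thing in both spaces, and that leaving the BLAS only requires restoring the saved world-space attributes.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def transform_ray(m: Sequence[Sequence[float]], origin: Vec3, direction: Vec3):
    """Apply a 3x4 affine matrix: origin as a point, direction as a vector."""
    o = tuple(sum(m[r][c] * origin[c] for c in range(3)) + m[r][3]
              for r in range(3))
    d = tuple(sum(m[r][c] * direction[c] for c in range(3))
              for r in range(3))        # no translation, no renormalization
    return o, d


def point_at(origin: Vec3, direction: Vec3, t: float) -> Vec3:
    """Point on the ray at parameter t."""
    return tuple(o + t * d for o, d in zip(origin, direction))


# Hypothetical world-to-BLAS matrix: uniform scale by 2, then shift +1 in x.
WORLD_TO_BLAS = [[2.0, 0.0, 0.0, 1.0],
                 [0.0, 2.0, 0.0, 0.0],
                 [0.0, 0.0, 2.0, 0.0]]

world_ray = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0))      # saved for the exit step
local_ray = transform_ray(WORLD_TO_BLAS, *world_ray)  # block 736: enter BLAS

t_hit = 3.0                                           # hit found inside the BLAS
local_point = point_at(*local_ray, t_hit)

restored_ray = world_ray                              # block 742: restore world
world_point = point_at(*restored_ray, t_hit)          # same t, world space
```

Here `local_point` equals the image of `world_point` under the matrix, so a hit distance found in BLAS space can be carried back to world space unchanged.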
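
For the Figure 8 case, the shader program performs the hit test that the RTU's intersection engine cannot. A standard ray-sphere test is sketched below as one possible form of that step; the function name `ray_sphere_hit` and its parameters are assumptions, and the text above does not specify which algorithm the shader uses. The returned distance is the kind of hit location the shader would report at 812 so that the RTU can shorten the ray.

```python
import math
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def ray_sphere_hit(origin: Vec3, direction: Vec3, center: Vec3,
                   radius: float, t_max: float = math.inf) -> Optional[float]:
    """Return the nearest t in (0, t_max) where the ray meets the sphere."""
    oc = tuple(o - c for o, c in zip(origin, center))
    a = dot(direction, direction)
    b = 2.0 * dot(oc, direction)
    c = dot(oc, oc) - radius * radius
    disc = b * b - 4.0 * a * c           # discriminant of a*t^2 + b*t + c = 0
    if a == 0.0 or disc < 0.0:
        return None                      # degenerate ray or no real root: miss
    root = math.sqrt(disc)
    for t in ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)):
        if 0.0 < t < t_max:              # in front of the origin, within length
            return t
    return None


# Shader-side step after status "bounding volume for N intersected" (810):
t_hit = ray_sphere_hit((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, 5.0), 1.0)
t_miss = ray_sphere_hit((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 3.0, 5.0), 1.0)
```

Passing the current ray length as `t_max` lets the shader reject a sphere that lies beyond an earlier, nearer hit before reporting anything back to the RTU.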

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Generation (AREA)

Abstract

A graphics processing unit (GPU) (300, 400, 500) includes one or more processor cores (304, 406, 502) adapted to execute a software-implemented shader program, and one or more hardware-implemented ray tracing units (RTU) (408, 410, 504) adapted to traverse an acceleration structure to calculate intersections of rays with bounding volumes and graphics primitives. The RTU implements traversal logic to traverse (416) the acceleration structure, stack management, and other tasks to relieve the burden on the shader, communicating (418) intersections to the shader, which then calculates (420) whether the intersection hit a transparent or opaque portion of the intersected object. Thus, one or more processing cores within the GPU perform accelerated ray tracing by offloading aspects of processing to the RTU, which traverses the acceleration structure within which the 3D environment is represented.
PCT/US2021/046777 2020-08-20 2021-08-19 System and method for accelerated ray tracing WO2022040472A1 (fr)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN202180050800.2A CN116157841A (zh) 2020-08-20 2021-08-19 用于加速光线跟踪的系统和方法
KR1020237008976A KR20230078645A (ko) 2020-08-20 2021-08-19 가속 광선 추적을 위한 시스템 및 방법
JP2023512444A JP2023538127A (ja) 2020-08-20 2021-08-19 加速レイトレーシングのシステムおよび方法
EP21859164.2A EP4200811A1 (fr) 2020-08-20 2021-08-19 Système et procédé de lancer de rayons accéléré

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US16/998,195 2020-08-20
US16/998,195 US11704859B2 (en) 2020-08-20 2020-08-20 System and method for accelerated ray tracing

Publications (1)

Publication Number Publication Date
WO2022040472A1 true WO2022040472A1 (fr) 2022-02-24

Family

ID=80269708

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/046777 WO2022040472A1 (fr) 2020-08-20 2021-08-19 Système et procédé de lancer de rayons accéléré

Country Status (6)

Country Link
US (1) US11704859B2 (fr)
EP (1) EP4200811A1 (fr)
JP (1) JP2023538127A (fr)
KR (1) KR20230078645A (fr)
CN (1) CN116157841A (fr)
WO (1) WO2022040472A1 (fr)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2599124A (en) * 2020-09-24 2022-03-30 Imagination Tech Ltd Memory allocation for recursive processing in a ray tracing system
CN112230931B (zh) * 2020-10-22 2021-11-02 上海壁仞智能科技有限公司 适用于图形处理器的二次卸载的编译方法、装置和介质
US20220309734A1 (en) * 2021-03-29 2022-09-29 Samsung Electronics Co., Ltd. Apparatus and method with graphics processing
US20230281955A1 (en) 2022-03-07 2023-09-07 Quidient, Llc Systems and methods for generalized scene reconstruction

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110234583A1 (en) * 2010-01-04 2011-09-29 Reuven Bakalash Method and apparatus for parallel ray-tracing employing modular space division
US20160070820A1 (en) * 2014-09-04 2016-03-10 Nvidia Corporation Short stack traversal of tree data structures
US20200051318A1 (en) * 2018-08-10 2020-02-13 Nvidia Corporation Method for forward progress and programmable timeouts of tree traversal mechanisms in hardware
WO2020051315A1 (fr) * 2018-09-06 2020-03-12 Ppj, Llc Systèmes de lit réglables à cadre de lit articulé rotatif
US10699370B1 (en) * 2018-12-28 2020-06-30 Intel Corporation Apparatus and method for a compressed stack representation for hierarchical acceleration structures of arbitrary widths
US20200327712A1 (en) * 2019-04-11 2020-10-15 Siliconarts, Inc. Graphics processing apparatus based on hybrid gpu architecture

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9607425B2 (en) * 2014-10-17 2017-03-28 Qualcomm Incorporated Ray-box intersection testing using dot product-based fixed function logic
KR102604737B1 (ko) * 2016-01-11 2023-11-22 삼성전자주식회사 가속 구조를 생성하는 방법 및 장치
EP3923246A3 (fr) 2016-03-21 2022-06-15 Imagination Technologies Limited Traitement d'une hiérarchie pour le rendu d'une scène
US10417807B2 (en) 2017-07-13 2019-09-17 Imagination Technologies Limited Hybrid hierarchy of bounding and grid structures for ray tracing
US10482650B2 (en) * 2017-07-27 2019-11-19 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E. V. Methods, computer program and apparatus for an ordered traversal of a subset of nodes of a tree structure and for determining an occlusion of a point along a ray in a raytracing scene
US11200724B2 (en) * 2017-12-22 2021-12-14 Advanced Micro Devices, Inc. Texture processor based ray tracing acceleration method and system
US10740952B2 (en) * 2018-08-10 2020-08-11 Nvidia Corporation Method for handling of out-of-order opaque and alpha ray/primitive intersections
US11138009B2 (en) 2018-08-10 2021-10-05 Nvidia Corporation Robust, efficient multiprocessor-coprocessor interface
US10580196B1 (en) * 2018-08-10 2020-03-03 Nvidia Corporation Method for continued bounding volume hierarchy traversal on intersection without shader intervention
US10867429B2 (en) * 2018-08-10 2020-12-15 Nvidia Corporation Query-specific behavioral modification of tree traversal
US10929948B2 (en) 2018-12-28 2021-02-23 Intel Corporation Page cache system and method for multi-agent environments
US11481953B2 (en) * 2019-05-28 2022-10-25 Advanced Micro Devices, Inc. Command processor based multi dispatch scheduler
US10964086B2 (en) * 2019-08-27 2021-03-30 Apical Limited Graphics processing
US11087522B1 (en) * 2020-03-15 2021-08-10 Intel Corporation Apparatus and method for asynchronous ray tracing
US11276224B2 (en) * 2020-04-17 2022-03-15 Samsung Electronics Co., Ltd. Method for ray intersection sorting
US11508112B2 (en) * 2020-06-18 2022-11-22 Nvidia Corporation Early release of resources in ray tracing hardware
US11238640B2 (en) * 2020-06-26 2022-02-01 Advanced Micro Devices, Inc. Early culling for ray tracing
US11367242B2 (en) * 2020-07-30 2022-06-21 Apple Inc. Ray intersect circuitry with parallel ray testing

Also Published As

Publication number Publication date
CN116157841A (zh) 2023-05-23
EP4200811A1 (fr) 2023-06-28
US11704859B2 (en) 2023-07-18
JP2023538127A (ja) 2023-09-06
KR20230078645A (ko) 2023-06-02
US20220058854A1 (en) 2022-02-24

Similar Documents

Publication Publication Date Title
US11704859B2 (en) System and method for accelerated ray tracing
US11928772B2 (en) Method for forward progress and programmable timeouts of tree traversal mechanisms in hardware
US11164360B2 (en) Method for handling of out-of-order opaque and alpha ray/primitive intersections
US8243061B2 (en) Image processing apparatus and method of controlling operation of same
EP3958216A2 (fr) Système et procédé de traçage de rayon accéléré avec fonctionnement asynchrone et transformation de rayon
US20200051317A1 (en) Method for forward progress and programmable timeouts of tree traversal mechanisms in hardware
US7932905B2 (en) Method, apparatus, and computer readable medium for light energy accounting in ray tracing
US11418852B2 (en) Detecting latency anomalies from pipeline components in cloud-based systems
CN113808245B (zh) 用于遍历光线追踪加速结构的增强技术
CN113808241B (zh) 共享顶点的射线追踪图元的硬件加速
US11315303B2 (en) Graphics processing
CN114078077A (zh) 使用会话性能元数据评估定性流体验
Wang et al. Efficient and Reliable Self‐Collision Culling Using Unprojected Normal Cones
Goldiez et al. Real-time visual simulation on PCs
WO2011073361A1 (fr) Système à microarchitecture et procédé pour le lancer de rayons et la détection de collision
CN117726732A (zh) 减少包围体层次结构中的假阳性光线遍历
CN117726496A (zh) 使用光线剪裁减少假阳性光线遍历
Morvan et al. High performance gpu‐based proximity queries using distance fields
Fu et al. Dynamic shadow rendering with shadow volume optimization
Morvan et al. Efficient Image‐Based Proximity Queries with Object‐Space Precision
CN116109756A (zh) 光线追踪方法、装置、设备及存储介质
CN117726743A (zh) 使用点退化剔除减少假阳性光线遍历
CN115761123A (zh) 三维模型处理方法、装置、电子设备以及存储介质
Hong et al. Mathematical Approaches for Collision Detection in Fundamental Game Objects
Wu et al. A scalable framework for distributed virtual reality using heterogeneous processors

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21859164

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2023512444

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2021859164

Country of ref document: EP

Effective date: 20230320