CN117788676A - Ray tracing - Google Patents
Ray tracing
- Publication number: CN117788676A (application CN202311247079.3A)
- Authority: CN (China)
- Prior art keywords: ray, data, components, identifier, intersection
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/06—Ray-tracing
- G06T15/005—General purpose rendering architectures
- G06T2210/00—Indexing scheme for image generation or computer graphics
- G06T2210/21—Collision detection, intersection
- G06T2210/36—Level of detail
Abstract
The present invention relates to ray tracing. A computer-implemented method is provided for converting ray data of a ray into a ray representation, wherein the ray representation is a compressed representation of the ray data, and wherein the ray data includes three directional components and three position components of the ray. The method includes identifying which of the three directional components of the ray data has the greatest magnitude and defining the axis of the identified directional component as the long axis of the ray. The method also includes determining a transition position on the ray at which the position component along the long axis is zero, and rescaling the three directional components of the ray such that the magnitude of the directional component along the long axis is 1. The ray representation includes: (i) two position components of the transition position along the axes other than the long axis, and (ii) two rescaled directional components along the axes other than the long axis.
Description
Cross Reference to Related Applications
The present application claims priority from UK patent applications GB 2214122.0 and GB 2214124.6, filed on 27 September 2022, which are incorporated herein by reference in their entirety.
Background
Ray tracing is a computational rendering technique for generating an image of a scene by tracing paths of light through the scene, typically from the perspective of a camera. A path of light traced through the scene is referred to as a ray. Each ray to be traced is modeled as originating from the viewpoint and passing through a pixel into the scene. As a ray traverses the scene, it may intersect objects within the scene. The interaction between a ray and an object it intersects can be modeled to create realistic visual effects. For example, in response to determining that a ray intersects an object, a shader program may be executed for the intersection. A shader program is a portion of computer code; a programmer can write shader programs to define how the system reacts to intersections, which may, for example, cause one or more secondary rays to be emitted into the scene. As another example, a shader program may cause one or more rays to be emitted into the scene to determine whether the object is in shadow at the intersection.
Rendering an image of a scene using ray tracing may involve a large number of intersection tests. In real-life ray tracing systems, billions of intersection tests may be performed to render a single image of a scene. To reduce the number of intersection tests that need to be performed, the ray tracing system may generate acceleration structures. The acceleration structure includes a plurality of nodes, each node representing a region (e.g., volume) within the scene. The acceleration structures are typically hierarchical, forming a tree structure such that they include multiple levels of nodes. The nodes near the top of the acceleration structure represent a relatively large area in the scene. For example, the root node of the acceleration structure may represent the entire scene. The nodes near the bottom of the acceleration structure represent relatively small areas in the scene. The leaf nodes of the acceleration structure represent regions in the scene that at least partially enclose one or more primitives (e.g., triangles) and include pointers to the primitives they enclose.
Traditionally, intersection testing is performed on a ray using an acceleration structure by first testing the ray for intersection with the root node of the acceleration structure. If the ray is found to intersect a parent node (such as the root node), then testing can proceed to the child nodes of that parent node. Conversely, if the ray is found not to intersect a parent node, intersection testing of that parent node's child nodes can be avoided, thereby reducing the amount of computation performed. If the ray is found to intersect a leaf node, the ray can be tested against the objects within the region represented by the leaf node, to determine which object(s) the ray intersects. The objects may be represented using primitives, which are the geometric units in the system.
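The traversal described above can be sketched as follows. This is a minimal, illustrative 2D sketch, not the patented implementation: the names `Node`, `hits_box`, and `traverse` are hypothetical, and the slab-based box test is one standard way of implementing the ray/box intersection.

```python
class Node:
    """A node of a toy hierarchical acceleration structure."""
    def __init__(self, box, children=(), primitives=()):
        self.box = box                    # ((xmin, ymin), (xmax, ymax))
        self.children = list(children)
        self.primitives = list(primitives)

def hits_box(origin, direction, box):
    """2D slab test: does the ray O + D*t (t >= 0) intersect the box?"""
    t_near, t_far = 0.0, float('inf')
    for axis in range(2):
        if direction[axis] == 0:
            # Ray is parallel to this slab: it must start inside it.
            if not (box[0][axis] <= origin[axis] <= box[1][axis]):
                return False
            continue
        t0 = (box[0][axis] - origin[axis]) / direction[axis]
        t1 = (box[1][axis] - origin[axis]) / direction[axis]
        t_near = max(t_near, min(t0, t1))
        t_far = min(t_far, max(t0, t1))
    return t_near <= t_far

def traverse(node, origin, direction, found):
    """Descend into a node's children only if the ray hits the node's box."""
    if not hits_box(origin, direction, node.box):
        return                            # prune the whole subtree
    found.extend(node.primitives)         # leaf nodes report their primitives
    for child in node.children:
        traverse(child, origin, direction, found)
```

Pruning a non-intersected parent node skips its entire subtree, which is what reduces the number of intersection tests performed.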
Ray tracing operations are typically highly computationally intensive. It is therefore desirable to increase the speed of these operations, or to reduce their latency. Further improvements to ray tracing techniques include reducing the hardware area required to perform the processing operations.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
There is provided a computer-implemented method for converting ray data of a ray into a ray representation, wherein the ray representation is a compressed representation of the ray data, and wherein the ray data comprises three directional components and three location components of the ray, the method comprising:
identifying which of the three directional components of the ray data has the greatest magnitude and defining an axis of the identified directional component as the long axis of the ray;
determining a transition position on the ray at which the position component along the long axis is zero; and
rescaling the three directional components of the ray such that the magnitude of the directional component along the long axis is 1;
wherein the ray representation comprises: (i) two position components of the transition position along the axes other than the long axis, and (ii) two rescaled directional components along the axes other than the long axis.
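The conversion steps above can be sketched as follows. This is an illustrative Python sketch of the claimed steps, not the patented implementation; all function and variable names are hypothetical.

```python
def compress_ray(origin, direction):
    """Compress a ray O + D*t (3 position + 3 direction components) into
    the representation described above: the identity of the 'long' axis,
    plus two off-axis position components and two off-axis direction
    components. Sketch only; names are illustrative."""
    # 1. Identify which direction component has the greatest magnitude.
    major = max(range(3), key=lambda i: abs(direction[i]))
    # 2. Find the transition position: the point on the ray where the
    #    position component along the long axis is zero.
    t = -origin[major] / direction[major]
    transition = [origin[i] + direction[i] * t for i in range(3)]
    # 3. Rescale the direction so the long-axis component has magnitude 1.
    scale = abs(direction[major])
    rescaled = [direction[i] / scale for i in range(3)]
    # Keep only the two components along axes other than the long axis;
    # the dropped components are implied (zero position, unit direction).
    other = [i for i in range(3) if i != major]
    return (major,
            [transition[i] for i in other],
            [rescaled[i] for i in other])
```

The dropped long-axis components carry no information (the position is zero by construction and the direction magnitude is 1), which is why only four scalar components plus the axis indication need to be stored.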
The ray representation may include exactly two directional components and exactly two location components.
The ray representation may also include an indication of the long axis.
The indication of the long axis may comprise two bits.
The ray data may further include a minimum distance component and a maximum distance component, and the method may further include rescaling the minimum distance component and the maximum distance component based on the transition position of the ray and the rescaling of the three direction components.
Rescaling the three directional components of the ray may result in a value of +1 for the directional component along the long axis.
The method may further include converting the ray representation into a quantized ray identifier by generating a data packet of the ray representation, the data packet including data indicating the long axis of the ray, the two position components of the transition position, and the two rescaled directional components.
The quantized ray identifier may have a fixed bit width.
The data in the data packet of the quantized ray identifier may include no more than three bits to indicate each of the two rescaled directional components along an axis other than the long axis.
The data in the data packet of the quantized ray identifier may include no more than five bits to indicate each of the two position components of the transition position along the axes other than the long axis.
The quantized ray identifier may identify a ray set, each ray in the ray set including similar location and direction components.
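Under the bit budgets mentioned above (two bits for the long axis, at most five bits per position component, at most three bits per direction component), the packing might look like the following illustrative sketch. The quantization ranges and all names here are assumptions, not taken from the patent.

```python
def quantize(value, bits, lo=-1.0, hi=1.0):
    """Quantize a float in [lo, hi) to an unsigned integer of `bits` bits.
    The [lo, hi) range is an illustrative assumption."""
    span = (1 << bits) - 1
    q = int((value - lo) / (hi - lo) * span)
    return max(0, min(span, q))

def pack_ray_id(axis, positions, directions):
    """Pack a ray representation into a fixed-width quantized identifier:
    2 bits for the long axis, 5 bits per off-axis position component,
    and 3 bits per off-axis direction component (18 bits total)."""
    word = axis & 0x3
    for p in positions:               # two off-axis transition coordinates
        word = (word << 5) | quantize(p, 5)
    for d in directions:              # two off-axis rescaled directions
        word = (word << 3) | quantize(d, 3)
    return word
```

Because the components are coarsely quantized, many similar rays map to the same identifier, which is what lets the identifier name a set of rays with similar position and direction components.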
The method may further include generating a hash of the quantized ray identifier to represent the ray representation.
Generating the hash may include performing a logical exclusive-or (XOR) operation on bits of the quantized ray identifier to reduce the number of bits of the quantized ray identifier.
The hash may include eight bits.
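One way to realize such an exclusive-or hash is to XOR-fold successive 8-bit slices of the identifier into a single byte. The following is an illustrative sketch under that assumption, not the patented hash.

```python
def hash_ray_id(ray_id, out_bits=8):
    """Reduce a quantized ray identifier to `out_bits` bits by
    exclusive-or'ing together successive `out_bits`-wide slices."""
    mask = (1 << out_bits) - 1
    h = 0
    while ray_id:
        h ^= ray_id & mask    # fold the low slice into the hash
        ray_id >>= out_bits   # move on to the next slice
    return h
```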
The ray representation may be used to store an indication of the ray in a cache that stores data for intersection testing, the data being used by the ray tracing system to render an image of a scene.
There is provided a computer system for converting ray data of a ray into a ray representation, wherein the ray representation is a compressed representation of the ray data, wherein the ray data comprises three directional components and three location components of the ray, the computer system comprising processing logic configured to:
identifying which of the three directional components of the ray data has the greatest magnitude and defining the axis of the identified directional component as the long axis of the ray;
determining a transition position on the ray at which the position component along the long axis is zero; and
rescaling the three directional components of the ray such that the magnitude of the directional component along the long axis is 1;
wherein the ray representation comprises: (i) two position components of the transition position along the axes other than the long axis, and (ii) two rescaled directional components along the axes other than the long axis.
The ray representation may include exactly two directional components and exactly two location components.
The ray representation may also include an indication of the long axis.
The ray data may also include a minimum distance component and a maximum distance component, and the processing logic may be further configured to rescale the minimum distance component and the maximum distance component based on the transition position of the ray and the rescaling of the three directional components.
The processing logic may be further configured to: the ray representation is converted to a quantized ray identifier by generating a data packet of the ray representation that includes data indicating a long axis of the ray, two position components of the transition position, and two rescaled direction components.
The computer system may further include a cache, wherein the ray representation is used to store an indication of the ray in the cache, and wherein the ray tracing system is configured to retrieve data from the cache for intersection testing, the data being used to render an image of the scene.
A computer-implemented method is provided for converting a ray representation into ray data for a ray, wherein the ray representation is a compressed representation of the ray data and comprises: (i) two position components of the transition position of the ray, (ii) two directional components of the ray, and (iii) an indication of the long axis of the ray, the method comprising:
inserting a third position component of the ray according to the indication of the long axis of the ray, wherein the value of the third position component is zero; and
a third directional component of the ray is inserted according to the indication of the long axis of the ray, wherein the magnitude of the third directional component is 1.
The ray representations may be generated according to any of the methods described herein.
The method may further include adding a further bit to each of the two direction components and the two position components of the ray representation, wherein the further bit is a least significant bit.
The ray representation may further include a minimum distance component and a maximum distance component, and the method may further include adding a further bit to each of the minimum distance component and the maximum distance component, wherein the further bit is a least significant bit.
The method may further include adding a sign to the third directional component of the ray data based on the minimum distance component and the maximum distance component.
The method may further include reordering the minimum distance component and the maximum distance component to determine which of the two components is closest to the origin of the ray.
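The inverse conversion described above can be sketched as follows. The `sign` parameter stands in for the sign recovery from the minimum and maximum distance components; all names are illustrative, and this is a sketch rather than the patented implementation.

```python
def decompress_ray(axis, positions, directions, sign=1.0):
    """Reconstruct (origin, direction) from a ray representation by
    re-inserting the long-axis components that were dropped during
    compression: the transition position has a zero coordinate along
    the long axis, and the long-axis direction component has magnitude
    1 (its sign is recovered separately, here passed in as `sign`)."""
    origin = list(positions)
    origin.insert(axis, 0.0)              # third position component is zero
    direction = list(directions)
    direction.insert(axis, sign * 1.0)    # third direction component: magnitude 1
    return origin, direction
```

Applied to the output of the compression sketch given earlier, this recovers a ray that is equivalent to the original (same line in space, rescaled parametrization).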
There is provided a computer system for converting a ray representation into ray data for a ray, wherein the ray representation is a compressed representation of the ray data and comprises: (i) two position components of the transition position of the ray, (ii) two directional components of the ray, and (iii) an indication of the long axis of the ray, the computer system comprising processing logic configured to:
inserting a third position component of the ray according to the indication of the long axis of the ray, wherein the value of the third position component is zero; and
a third directional component of the ray is inserted according to the indication of the long axis of the ray, wherein the magnitude of the third directional component is 1.
The computer system may be a ray tracing system.
A computer system configured to perform any of the methods described herein may be provided.
The computer systems described herein may be embodied in hardware on an integrated circuit. A method of manufacturing a computer system as described herein at an integrated circuit manufacturing system may be provided. An integrated circuit definition data set may be provided that, when processed in an integrated circuit manufacturing system, configures the system to manufacture a computer system as described herein. A non-transitory computer-readable storage medium having stored thereon a computer-readable description of a computer system as described herein may be provided that, when processed in an integrated circuit manufacturing system, causes the integrated circuit manufacturing system to manufacture an integrated circuit embodying the computer system.
An integrated circuit manufacturing system may be provided, the integrated circuit manufacturing system comprising: a non-transitory computer readable storage medium having stored thereon a computer readable description of a computer system as described herein; a layout processing system configured to process the computer-readable description to generate a circuit layout description of an integrated circuit embodying the computer system; and an integrated circuit generation system configured to fabricate the computer system according to the circuit layout description.
A computer-implemented method of performing intersection testing in a ray tracing system may be provided, wherein intersection testing is performed on each ray of a plurality of rays for a node of a hierarchical acceleration structure, wherein intersection testing for each ray comprises:
in response to identifying, in a memory, an indication of a re-entry point associated with a ray identifier that is associated with the ray, retrieving the indication of the re-entry point from the memory, the re-entry point being a node of the hierarchical acceleration structure at which an intersection was identified for a previously tested ray associated with the ray identifier; and
from the re-entry point, performing an intersection test of the ray against the set of nodes of the hierarchical acceleration structure.
A ray tracing system may be provided that is configured to perform intersection testing on each of a plurality of rays for a node of a hierarchical acceleration structure, the system comprising:
a memory configured to store one or more indications of re-entry points associated with the ray identifier; and
processing logic configured to, for each of the rays:
in response to identifying, in the memory, an indication of a re-entry point associated with a ray identifier that is associated with the ray, retrieve from the memory the indication of the re-entry point, the re-entry point being a node of the hierarchical acceleration structure at which an intersection was identified for a previously tested ray associated with the ray identifier; and
from the retrieved re-entry point, perform an intersection test of the ray against the nodes of the hierarchical acceleration structure.
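The re-entry mechanism described in these claims can be sketched as a small cache keyed by ray identifier: a new ray whose identifier matches a previously tested, similar ray starts traversal from the node where that ray found an intersection, rather than from the root. This is an illustrative sketch with hypothetical names, not the patented implementation.

```python
class ReentryCache:
    """Map from quantized ray identifiers to the acceleration-structure
    node at which a previously tested, similar ray found an intersection."""
    def __init__(self):
        self.entries = {}

    def start_node(self, ray_id, root):
        # Begin traversal at the recorded re-entry point if one exists,
        # otherwise fall back to the root of the acceleration structure.
        return self.entries.get(ray_id, root)

    def record_hit(self, ray_id, node):
        # Remember where a ray with this identifier found its intersection.
        self.entries[ray_id] = node
```

Because coherent rays tend to intersect the same geometry, starting near a previous hit can avoid re-traversing the upper levels of the hierarchy.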
Computer readable code may be provided which is configured to cause any of the methods described herein to be performed when the code is run. A non-transitory computer readable storage medium may be provided having stored thereon computer readable instructions that, when executed at a computer system, cause the computer system to perform any of the methods described herein.
As will be apparent to those skilled in the art, the above features may be combined as appropriate, and may be combined with any of the aspects of the examples described herein.
Drawings
Examples will now be described in detail with reference to the accompanying drawings, in which:
FIG. 1 illustrates a ray tracing system configured to perform intersection testing on each of a plurality of rays for a node of a hierarchical acceleration structure;
FIG. 2A illustrates an example of regions and primitives within a scene against which a ray is to be tested for intersection;
FIG. 2B illustrates a hierarchical acceleration structure for representing the region shown in FIG. 2A;
FIG. 3 illustrates a layout of a memory that may be used to store an indication of re-entry points that may be used for intersection testing;
FIG. 4 illustrates a first example of a computer-implemented method of performing intersection testing in a ray tracing system;
FIGS. 5A and 5B illustrate a second example of a computer-implemented method of performing intersection testing in a ray tracing system;
FIG. 6 shows a graphical representation of a ray, and a graphical representation of the ray representation of that ray;
FIGS. 7A and 7B illustrate examples of quantized ray representations;
FIG. 8 illustrates an exemplary method for generating a hash of a quantized ray identifier that represents a ray representation;
FIG. 9 illustrates a computer-implemented method for converting ray data of a ray into a ray representation;
FIG. 10 illustrates a computer-implemented method for converting a ray representation into ray data;
FIG. 11 illustrates a computer system in which a computing system as described herein is implemented; and
FIG. 12 illustrates an integrated circuit manufacturing system for generating an integrated circuit embodying a computing system.
The figures illustrate various examples. Skilled artisans will appreciate that element boundaries (e.g., blocks, groups of blocks, or other shapes) illustrated in the figures represent one example of boundaries. In some examples, it may be the case that one element may be designed as a plurality of elements, or that a plurality of elements may be designed as one element. Where appropriate, common reference numerals have been used throughout the various figures to indicate like features.
Detailed Description
FIG. 1 illustrates a ray tracing system 100 configured to perform intersection testing. The ray tracing system includes a ray tracing unit 102 and a memory 104. The ray tracing unit 102 includes a processing module 106, an intersection test module 108, and processing logic 110. The intersection test module 108 includes one or more box intersection test units 112 and one or more primitive intersection test units 114. In operation, the ray tracing unit 102 receives geometric data defining objects within a 3D scene. The processing module 106 is configured to generate an acceleration structure based on the geometric data and to send the acceleration structure to the memory 104 for storage therein. After the acceleration structure has been stored in the memory 104, the intersection test module 108 can retrieve nodes of the acceleration structure from the memory 104 and perform ray intersection tests against the retrieved nodes. The results of the intersection tests are provided to the processing logic 110, which is configured to process those results to determine rendering values of an image representing the 3D scene. The rendering values determined by the processing logic 110 may be passed back to the memory 104 for storage therein, to represent the image of the 3D scene.
FIG. 2A illustrates a scene 200 rendered using a ray tracing method. The scene may be rendered using an acceleration structure. For simplicity of explanation, the scene is illustrated in FIG. 2A as a two-dimensional scene; however, it should be understood that ray tracing methods are most commonly used to render scenes with more than two dimensions, such as three-dimensional scenes, and the following description applies equally to three-dimensional scenes and acceleration structures. FIG. 2B shows an acceleration structure for representing the regions shown in FIG. 2A. The acceleration structure is hierarchical, meaning that it comprises multiple levels of nodes. The acceleration structure may be arranged as a tree, and may therefore include one or more sub-trees, where each sub-tree descends from a different node.
The scene 200 is divided into a plurality of regions. Each region of the scene may be an axis-aligned box that partitions the scene into constituent components. In some examples, when the scene is two-dimensional, each region of the scene may be a quadrant of the scene; in examples where the scene is three-dimensional, each region of the scene may be an octant of the scene. Each region of the scene covers a different area (or volume) of the scene, and each region may be further divided into sub-regions, where each sub-region covers a non-overlapping portion of its region. Each region or sub-region of the scene may cover a different level of detail of the scene. A level of detail, in the context of this application, refers to a level in the hierarchy of the acceleration structure used to process the scene; that is, the level of detail of a node is related to the number of levels between that node and the root node. Each size of region corresponds to a level of the acceleration structure. For example, the first level comprises a first region 202. The first region 202 is the only region in the first level and covers the entire scene. The first region 202 may be represented by the root node 202' of the acceleration structure (as shown in FIG. 2B), and the root node 202' may be associated with a first level of detail of the scene. In the case where the scene in FIG. 2A is a two-dimensional scene, the first region 202 may be further divided into four sub-regions or quadrants 204a, 204b, 204c, 204d, which may be associated with a second level of detail of the scene. The four sub-regions 204a, 204b, 204c, 204d of the scene 200 are represented by child nodes 204a', 204b', 204c', 204d' of the root node 202' (as shown in FIG. 2B). An exemplary sub-region 204a of the first region 202 is shown using dashed shading.
The sub-regions 204a, 204b, 204c, 204d of the scene have a higher level of detail than the first region 202 represented by the root node. Accordingly, the sub-regions 204a, 204b, 204c, 204d are represented by child nodes 204a', 204b', 204c', 204d' of the acceleration structure at a second level of detail (as shown in FIG. 2B). In the case where the scene in FIG. 2A is a two-dimensional scene, each of the sub-regions 204a, 204b, 204c, 204d may be further divided into four sub-regions or quadrants. For example, the sub-region 204a may be further divided into sub-regions 206a, 206b, 206c, 206d, which may be associated with a third level of detail of the scene. The sub-regions 206a, 206b, 206c, 206d of the scene 200 are represented by child nodes 206a', 206b', 206c', 206d' of node 204a'. An exemplary sub-region 206a of sub-region 204a is shown using hatched shading. The sub-region 206a of the scene has a higher level of detail than the sub-regions 204a, 204b, 204c, 204d. Accordingly, the sub-regions 206a, 206b, 206c, 206d are represented by child nodes of the acceleration structure at a third level of detail. The scene shown in FIG. 2A is divided into three different levels of detail; the sub-regions 206a, 206b, 206c, 206d therefore comprise the highest level of detail of the scene. The sub-region 206a may be associated with a leaf node of the acceleration structure.
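The repeated quadrant subdivision described above can be sketched as follows for the two-dimensional case; the function name and the box encoding as corner pairs are illustrative choices.

```python
def subdivide(box):
    """Split an axis-aligned 2D region, given as ((x0, y0), (x1, y1)),
    into its four quadrants, as in the division of region 202 into
    sub-regions 204a-204d (and of 204a into 206a-206d)."""
    (x0, y0), (x1, y1) = box
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    return [((x0, y0), (mx, my)),   # lower-left quadrant
            ((mx, y0), (x1, my)),   # lower-right quadrant
            ((x0, my), (mx, y1)),   # upper-left quadrant
            ((mx, my), (x1, y1))]   # upper-right quadrant
```

In the three-dimensional case the same idea splits a box into eight octants; applying the split recursively yields one level of the hierarchy per subdivision step.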
Scene 200 also includes a plurality of primitives 208a through 208g, located within the regions of the scene. The primitives are the geometric units in the system and may be, for example, convex polygons. In FIG. 2A, each primitive 208a through 208g is shown as a triangle; however, in other examples a primitive may be another shape, such as a square, rectangle, pentagon, or hexagon. Some primitives may not be convex polygons, or even polygons at all: for example, a primitive may be a disk or some other surface or volume. The primitives shown in FIG. 2B are not actually part of the hierarchical acceleration structure, but they are shown to illustrate how the primitives relate to nodes of the hierarchical acceleration structure. The nodes of the hierarchical acceleration structure represent regions of the scene to be rendered. The term "tree node" refers to a node that has pointers to other nodes in the hierarchical acceleration structure (i.e., a tree node has child nodes). Referring to FIG. 2B, nodes 202' and 204a' through 204d' are tree nodes of the hierarchical acceleration structure; nodes 206a' through 206p' are leaf nodes; and triangles 208a' through 208g' are not nodes of the hierarchical acceleration structure, but are shown in FIG. 2B to indicate which leaf nodes have pointers to primitives. Note that while the hierarchy shown in FIG. 2B includes nodes that do not contain primitives (i.e., nodes 206a', 206b', 206d', 206e', 206g', 206h', 206j', 206m', 206n' and 206o'), in some examples these nodes are culled from the acceleration structure, or are never built in the first place, because they do not contain any geometry to be tested against rays.
In the primary examples described herein, a leaf node represents a region of a scene that may include one or more primitives, and the leaf node includes a respective one or more pointers to the one or more primitives. In these examples, the region represented by a leaf node may be an axis-aligned bounding box around the one or more primitives (e.g., the most tightly fitting axis-aligned bounding box). However, in other examples, the primitives may be considered part of the acceleration structure, such that the leaf nodes are the primitives in the scene.
A ray (r) may be defined as r = O + Dt, where O is a vector representing the origin of the ray, D is a vector representing the direction of the ray, and t represents the distance from the origin along the ray. An exemplary ray traversing the scene 200 is indicated by reference numeral 210 in fig. 2A. Ray 210 has an origin 214, which may or may not correspond to the origin of the scene. Ray 210 may have a defined range within the scene. Ray 210 may intersect more than one of primitives 208a through 208g in the scene. Thus, after the primitive intersection phase, if an intersection is found between the ray and a primitive, the intersection selection phase determines whether the new intersection or an old intersection of the ray should be selected. Typically, the closer of the two intersections is selected (i.e., the first intersection that the ray "physically" encounters in the scene is selected). The term "closer" may mean closer to the ray origin in implementations where the intersection distance can only be positive (i.e., the intersection distance extends in front of the ray origin). In implementations where the intersection distance may be positive or negative (i.e., may extend behind the ray origin), the term "closer" may mean closer to negative infinity. In some examples, deterministic tie resolution may be used to determine which of the two intersections is to be selected. Note that, contrary to conventional geometric measurements, the "distance" of an intersection referred to herein is typically measured as a multiple of the ray length. The 'ray length' may be given by the magnitude of the ray's direction vector D. The selected intersection is then used for further processing of the ray, while the unselected intersections are discarded. This process is also known as hidden surface determination, visible surface determination (VSD), hidden surface removal (HSR), or occlusion culling (OC).
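The ray parameterization r = O + Dt and the intersection-selection rule described above can be illustrated with a minimal Python sketch. The function names are assumptions; intersection distances are assumed non-negative here, so "closer" means a smaller t value:

```python
def point_on_ray(origin, direction, t):
    # Evaluate r = O + D*t component-wise.
    return tuple(o + d * t for o, d in zip(origin, direction))

def select_intersection(t_old, t_new):
    # With non-negative intersection distances, the "closer" intersection is
    # the one with the smaller t; deterministic tie resolution keeps the
    # existing intersection on an exact tie.
    return t_new if t_new < t_old else t_old
```

For example, a 2D ray with origin (0, 0) and direction (1, 2) passes through (0.5, 1.0) at t = 0.5, and of two candidate intersections at t = 3.0 and t = 1.5 the one at t = 1.5 is kept.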
The current closest intersection point is an attribute of each ray and thus may be stored as ray data. In some cases, the intersection determination may be based on whether the distance along the ray at which the intersection occurred is between the minimum and maximum culling distances of the ray (also referred to as a distance range or culling interval), which may be referred to as t_min and t_max. Although the initial values of t_min and t_max are predetermined prior to the intersection test, their values may be dynamically changed from these initial values during the intersection test. For example, during intersection testing, t_max and/or t_min may be updated to a value equal to the distance between the ray origin and the determined intersection. This approach has the advantage that the total number of distance values stored for the ray can be reduced during intersection testing. For example, consider a ray that is initially limited by its minimum culling distance t_min and maximum culling distance t_max. The intersection distance for the "closest" intersection point of the ray determined during the intersection test may be defined as t_int. Because t_int is the "closest" intersection distance, it is less than or equal to t_max, so the ray data for the ray can be compacted during intersection testing by using t_int to replace t_max, thereby shortening the effective distance range of the ray and allowing more intersections to be rejected, so that traversing entire subtrees outside the shortened effective distance range of the ray can be avoided. Furthermore, by using t_int to replace t_max, the effective distance range of the ray can be defined by the two distance values t_min and t_int, which means that after the first intersection of the ray, t_max is redundant and does not require storage.
Traditionally, intersection testing of rays can be performed recursively using an acceleration structure by first testing the intersection of a ray with the root node 202' of the acceleration structure. If a ray is found to intersect a parent node, then the test may proceed to the child node (or child nodes) of the parent node. In contrast, if a ray is found not to intersect a parent node, intersection testing of the child nodes of the parent node may be avoided, thereby saving computational effort. If a ray is found to intersect a leaf node, the ray may be tested against objects within the region represented by the leaf node to determine which object(s) the ray intersects. For the exemplary ray 210 traversing the scene 200, an intersection test is first performed on the root node 202' of the acceleration structure corresponding to the first region 202 of the scene. Based on this first intersection test, it is determined that the ray 210 intersects (or passes through) the first region 202. Thus, intersection tests must be performed at a finer level of detail within the first region in order to determine whether there are any primitive intersections within the first region 202. Then, intersection tests are performed on the child nodes 204a' to 204d' of the root node (i.e., the nodes corresponding to the sub-regions 204a to 204d). In fig. 2A, ray 210 intersects (or passes through) sub-regions 204a, 204b, and 204c. The ray does not intersect sub-region 204d. Thus, further intersection tests are performed on nodes corresponding to regions within sub-regions 204a, 204b, and 204c. For example, for region 204a, the child nodes of node 204a' will be tested. As described above, in some examples, this subset of child nodes includes only child node 206c' corresponding to region 206c, as this is the only child node including geometry to be tested.
In other examples, it will be determined from this test that the ray intersects (or passes through) sub-regions 206a, 206b, and 206c, but does not intersect sub-region 206d. Similarly, an intersection test will be performed on a subset of the available child nodes of node 204b' and, based on this test, it may be determined that the ray intersects (or passes through) the sub-region 206h, but does not intersect the sub-regions 206e, 206f, or 206g. Similarly, an intersection test will be performed on a subset of the available child nodes of node 204c' and it may be determined from this test that the ray intersects (or passes through) the sub-regions 206i, 206j, and 206k, but does not intersect the sub-region 206l. An intersection test may then be performed to determine if there are any primitive intersections in the regions corresponding to the child nodes for which an intersection has been determined. When the intersection test is completed, it will be revealed that ray 210 intersects primitives 208a, 208b, 208d and 208e, with the intersection at primitive 208a being the closest intersection. In this context, the "closest" intersection refers to the intersection with the smallest 't' value. In some implementations, the value of t (in the ray equation r = O + Dt) may be constrained to be non-negative, such that the "closest" intersection refers to the intersection closest to ray origin 214; while in some other implementations, the value of t may be allowed to be negative, such that the "closest" intersection refers to the intersection closest to minus infinity in the direction of the ray.
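The recursive traversal walked through above, in which an entire subtree is pruned whenever the ray misses the corresponding region, can be sketched as follows. This is a hedged Python illustration: nodes are represented as (region, children, primitives) tuples, and the ray/region predicate `intersects` is passed in as an assumed placeholder for a real ray/box test:

```python
def traverse(node, ray, intersects, hits):
    region, children, primitives = node
    if not intersects(ray, region):
        return hits                      # miss: prune the whole subtree
    for child in children:
        traverse(child, ray, intersects, hits)
    if not children:
        hits.extend(primitives)          # leaf: record candidate primitives
    return hits
```

With a toy predicate that treats the ray as the set of regions it passes through, only primitives in regions the ray actually enters are collected; children of missed regions are never visited.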
A disadvantage of performing intersection testing in a recursive manner as explained above is that it is computationally intensive. That is, for each ray traversing the scene 200, each level of the acceleration structure must be tested in order to confirm the final intersection point of the ray. That is, ray 210 is first tested against the root node 202' corresponding to the first region 202, and then testing is performed against the nodes of each level in the acceleration structure until an intersection with a leaf node is found. In fig. 2A, ray 210 has a primitive intersection in sub-region 204a at a smaller distance along the ray (i.e., at a smaller value of t) than the intersections of ray 210 with sub-regions 204b and 204c, and ray 210 does not intersect sub-region 204d. Thus, testing of ray 210 against child nodes 204b', 204c', and 204d' may be considered redundant, as testing against these child nodes will not ultimately reveal any primitive intersections of ray 210 that are closer than the primitive intersection within sub-region 204a. The computational cost of an intersection testing system can be reduced, and its efficiency improved, by designing a method that reduces the number of intersection tests performed on nodes of the acceleration structure that cannot produce an intersection at an intersection distance along the ray closer than that of another intersection.
The improvement may be achieved by using a re-entry point for the ray for which the intersection test is to be performed. The re-entry point may be described as a node of the hierarchical acceleration structure for which an intersection has been identified for a previously tested ray. In other words, the re-entry point indicates a node that includes a leaf node or primitive that has intersected the previously tested ray. The re-entry point may be a leaf node of the acceleration structure. The re-entry point may be the root node of the acceleration structure. The re-entry point may be a tree node of the acceleration structure that is associated with a level of detail between the level of detail of the root node and the level of detail of the leaf nodes. The re-entry point selected for a ray to be tested against the acceleration structure may indicate a node containing primitives that have intersected a previously tested ray. The previously tested ray may be a ray similar to the ray to be tested. More specifically, the ray data of the previously tested ray may be similar to the data of the incoming ray.
The advantage of using a re-entry point is that when a new ray is to be processed, it is possible to perform intersection tests on this new ray in the acceleration structure starting from the re-entry point instead of starting from the root node. The likelihood of the new ray intersecting a primitive located within the re-entry point is high, because the ray to be tested will be similar to the previous ray in order to be associated with the same re-entry point as the previous ray. In a preferred example, the re-entry point of the new ray to be tested is not the root node. In this example, by using the re-entry point, the ray tracing system may perform a preliminary intersection test from a child node of the acceleration structure that is not the root node. The impact of this on the ray intersection test efficiency varies depending on the type of ray to be tested. For an occluded ray, for which the result of the intersection test only needs to determine whether the ray intersects any object in the scene (also referred to as an "any hit" ray), testing of all nodes outside the subtree defined by the re-entry point can be avoided by finding an early intersection within the subtree defined by the re-entry point. For a non-occluded ray, for which the result of the intersection test should determine the closest intersection between the ray and an object in the scene (also referred to as a "closest hit" ray), testing of some nodes outside the subtree defined by the re-entry point may be avoided by finding an early intersection in the subtree. A more detailed description of occluded and non-occluded rays is provided below. The purpose of using re-entry points is therefore to find intersections as quickly as possible, thereby minimizing the number of intersection tests performed and thus minimizing computational intensity.
The indication of the re-entry point to be used by the ray tracing system shown in fig. 1 may be stored in a memory as shown in fig. 3. Memory 300 of fig. 3 may store an indication of a re-entry point that is accessed by the ray tracing system of fig. 1 and used for intersection testing. Memory 300 is a storage device that holds data for immediate use by a computer or related computer hardware and digital electronic devices. Memory 300 may be any type of storage device suitable for holding data to be accessed by a ray tracing system. The memory 300 may be a main memory. Memory 300 may form part of the main memory 104 shown in fig. 1. Alternatively, the memory 300 may be a cache memory. The memory may be referred to as a likely hit cache because it stores an indication of a re-entry point indicating a likely hit between a ray and a node in the acceleration structure.
The difference between main memory and cache memory is that main memory is more suitable for long-term storage of data than cache memory. The advantage of memory 300 being a cache memory is that a cache memory is more suitable for updating and replacing stored values with updated data. More specifically, where memory 300 is a cache, a new indication of a re-entry point may be stored in the cache, and an existing indication of a re-entry point that is no longer needed by the ray tracing system may be replaced. The cache memory may also have finer addressing than the main memory, which means that a smaller portion of data may be written to the cache in a single read/write request. Furthermore, the cache memory is configured to store data such that the ray tracing unit may retrieve the data from the cache without having to retrieve the data directly from main memory 104. Access to the cache is faster, i.e., has lower latency, than a corresponding access to the main memory. This means that transactions on the cache (such as read or write transactions) can be completed much faster than corresponding transactions on the main memory. This helps to reduce delay in the system.
Where the memory is a cache, it may be a direct-mapped cache. A direct-mapped cache determines the cache line associated with an entry by using, as its address, an index generated as a result of a hash of the identifier of the entry. A direct-mapped cache includes multiple sets (or columns) of memory, but only a single way (or row) of memory. If a line in the direct-mapped cache was previously occupied by a memory entry when a new entry needs to be stored, the old entry is replaced to make room for the new entry. The advantage of a direct-mapped cache over other types of caches is that it allows fast access to data. Alternatively, the cache may be an associative cache (also referred to as a fully associative cache). In an associative cache, an entry may be placed at any location in the cache. An associative cache includes a single set (or column) of memory but includes multiple ways (or rows) of memory. The advantage of memory 300 being an associative cache is that the associative cache provides a high degree of flexibility (i.e., minimizes cache conflicts) for mapping entries, but at the cost of slower access than other caches.
In a third example, the cache may be a set associative cache. A set associative cache is a hybrid of the direct-mapped and associative caches, in which the benefits and disadvantages of direct mapping and associativity are traded off against each other. A set associative cache is arranged to combine a plurality of cache lines together to create a set of cache lines (also referred to as a set of ways) that can be mapped to by a plurality of entries having the same index address, generated as a result of a hash of the identifier of the entry. A set associative cache is advantageous because it provides a tradeoff between the access speed of a direct-mapped cache and the flexible nature of the associative cache. Thus, a set associative cache is faster to access than an associative cache and is more flexible than a direct-mapped cache. The set associative cache may be referred to as an n-way set associative cache. An n-way set associative cache includes multiple sets (or columns) of memory, each set having n ways (or rows).
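The three cache organizations described above can be sketched with one parameterized structure: an n-way set associative cache, in which n = 1 degenerates to a direct-mapped cache and a single set degenerates to a fully associative cache. This is an illustrative Python sketch, not the disclosed hardware; the hash-based indexing and FIFO eviction policy are assumptions:

```python
class SetAssociativeCache:
    # n-way set associative: num_ways=1 gives a direct-mapped cache,
    # num_sets=1 gives a fully associative cache.

    def __init__(self, num_sets, num_ways):
        self.num_sets = num_sets
        self.num_ways = num_ways
        self.sets = [[] for _ in range(num_sets)]

    def _index(self, key):
        return hash(key) % self.num_sets     # illustrative hash-based index

    def insert(self, key, value):
        ways = self.sets[self._index(key)]
        for entry in ways:
            if entry[0] == key:
                entry[1] = value             # update an existing entry
                return
        if len(ways) == self.num_ways:
            ways.pop(0)                      # evict the oldest way (FIFO sketch)
        ways.append([key, value])

    def lookup(self, key):
        for k, v in self.sets[self._index(key)]:
            if k == key:
                return v
        return None                          # cache miss
```

With num_ways = 1 every collision on a set index evicts the previous occupant, matching the direct-mapped behaviour described above.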
In one example, the memory 300 is remote from the ray tracing unit 102 of fig. 1. In an alternative example, the memory 300 is provided at the ray tracing unit. In further examples, the memory may be disposed adjacent to the ray tracing unit in the data path, so as to be between the ray tracing unit 102 and the memory 104. The data in memory may be accessed, for example by the ray tracing unit, via a read transaction. Data may be written to memory via a write transaction. For example, data may be written by a first processor to a first cache memory and/or the main memory.
Memory 300 may include multiple memory cache lines. A representative memory cache line is indicated by reference numeral 302. In fig. 3, memory 300 includes eight cache lines. In alternative examples, the memory may include more or fewer memory cache lines. A memory cache line provides a separate storage area in the cache. Individual memory cache lines within memory 300 are individually addressable. In addition, the memory cache lines may be identical. Each memory cache line is adapted to store data indicating a ray identifier 304 and an indication of a re-entry point 306 associated with the ray identifier. The ray identifier 304 for a ray may be the original ray data for the ray, or may comprise a coarser representation than the original ray data. The ray identifier will be described in further detail below. The ray identifier for a ray may summarize, quantize, or compress the ray data for the ray. By summarizing the ray data, the same ray identifier may correspond to the data of multiple rays having similar data. The indication of the re-entry point may be an index to the re-entry point. The index to a re-entry point may otherwise be referred to as a reference to the re-entry point or a pointer to the re-entry point. Each indication of a re-entry point in memory is stored with a corresponding ray identifier. Each indication of a re-entry point is associated with a corresponding ray identifier. A read request to the memory 300 may be performed using the ray identifier. More specifically, memory 300 may be searched to determine whether a ray identifier contained within a read request is present in memory. If the ray identifier exists in memory, the indication of the re-entry point associated with that identifier may be retrieved from memory. The indication of the re-entry point may then be used by ray tracing system 100.
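The read and write operations on the likely hit cache described above can be sketched as follows. This is a hypothetical Python illustration: a plain dictionary stands in for the cache lines, and the identifier and index values are invented for illustration only:

```python
likely_hit_cache = {}   # ray identifier -> indication (index) of a re-entry point

def read_reentry(ray_identifier):
    # A read request: search the memory for the identifier and, if present,
    # retrieve the associated indication of a re-entry point (else a miss).
    return likely_hit_cache.get(ray_identifier)

def write_reentry(ray_identifier, node_index):
    # A write request: store the indication with its corresponding identifier.
    likely_hit_cache[ray_identifier] = node_index

# Several similar rays can map onto one identifier (a one-to-many relation).
write_reentry(("quantized-origin", "quantized-direction"), 42)
```

A lookup with an identifier not present in the cache returns a miss, in which case traversal would fall back to starting from the root node.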
A computer-implemented method of performing intersection testing in a ray tracing system using an indication of a re-entry point stored in memory 300 of fig. 3 is shown in fig. 4. The method may be performed by ray tracing system 100 of fig. 1. In a more specific example, the method may be performed by the intersection test module 108 of the ray tracing system in FIG. 1. Intersection tests may be performed on each ray of the plurality of rays for nodes of the hierarchical acceleration structure. Intersection testing of each of the rays includes at least three steps. The method begins when data for a new ray on which an intersection test is to be performed is received by the ray tracing system. When data of a new ray is received, a read request is sent to the memory. The read request may include a ray identifier associated with the new ray. The read request forms part of a read operation performed on the memory to determine whether an entry exists in the memory corresponding to the new ray.
Each ray identifier in memory 300 is stored with an indication of the corresponding re-entry point for that identifier. As described above, the same ray identifier may correspond to data for multiple rays having similar data. In other words, not every ray has a unique ray identifier, but rather there is a one-to-many relationship between the ray identifier and the ray. If there is a ray identifier corresponding to the identifier of the new ray stored in memory, there is also an indication of the re-entry point associated with that identifier in memory. In fig. 4, a first step S402 includes identifying an indication of a re-entry point associated with a ray identifier associated with a new ray in memory. In other words, if the result of a read operation to memory reveals that a ray identifier corresponding to the identifier in the read request exists in memory, then the re-entry point associated with that identifier is considered to be associated with the new ray to be processed. If an indication of a re-entry point is identified, the indication of the re-entry point is read from memory. That is, at step S404, an indication of the re-entry point is extracted from the memory. As already mentioned above, a re-entry point is a node of the hierarchical acceleration structure that has identified an intersection for a previously tested ray associated with a ray identifier. Once the indication of the re-entry point has been fetched from memory, at S406, an intersection test of the ray for the nodes of the hierarchical acceleration structure is performed using the re-entry point identified by the indication. More specifically, from the re-entry point, an intersection test of rays is performed for nodes of a subtree of the acceleration structure. In other words, the intersection test of rays is performed only for a set of nodes in the acceleration structure, where the set of nodes includes nodes of the subtree starting from the re-entry point.
As described above, the re-entry point may be used to identify a node (e.g., node 204a' in fig. 2B) within which there may be an intersection for a ray traversing the hierarchy. The initial intersection test may be performed beginning at the re-entry point, avoiding nodes at lower levels of detail in the acceleration structure (e.g., root node 202' in fig. 2B). In the targeted intersection test, testing of other nodes at the same level of detail as the re-entry point (e.g., nodes 204b' through 204d' in fig. 2B) and of the subtrees descending from these other nodes (i.e., including nodes at higher levels of detail than the re-entry point, such as child nodes 206e' through 206p' in fig. 2B) may also be avoided. Thus, a ray tracing system using re-entry points has a potential "head start" over systems using conventional intersection testing. This approach is particularly useful for intersection testing performed on occluded rays, as it can be used to quickly identify a single intersection of a ray within the scene, after which no further intersection testing needs to be performed. It may also be beneficial for performing intersection tests on non-occluded rays that have one or more intersections within the subtree of the acceleration structure. For these non-occluded rays, the traversal time is reduced by performing a targeted intersection test within the subtree starting from the re-entry point (where there may be at least one intersection), rather than traversing the entire hierarchy. Due to the high cost of testing non-occluded rays (in terms of box/primitive tests), it is very beneficial to reduce the traversal time of the intersection tests of these rays. Thus, a significant cost benefit can be achieved even with a small reduction in the time taken to perform intersection testing on non-occluded rays.
Unlike non-occluded rays, intersection tests performed on occluded rays need only determine whether a ray intersects any object in the scene, and need not identify which intersection is the closest intersection. An example of an occluded ray is a shadow ray that is used to determine whether there is any occlusion between the ray origin and a light source. Thus, to complete the intersection test of an occluded ray, only a single intersection of the ray with a primitive is required. Thus, in the case of an occluded ray, if an intersection is determined from the targeted intersection test of the ray using a re-entry point, no further intersection tests need to be performed on the ray. That is, testing of all nodes outside the subtree defined by the re-entry point is avoided by finding an intersection in the subtree. Thus, the use of re-entry points can greatly speed up the processing of the ray. For non-occluded rays, testing of some nodes outside the subtree defined by the re-entry point can be avoided by finding an early intersection in the subtree, as this will reduce the t_max value sooner, so that subtrees and nodes lying beyond the (reduced) t_max value, which would originally have been tested, need not be tested.
A second example of a computer-implemented method of performing intersection testing using a re-entry point is shown in fig. 5A and 5B. The method of fig. 5A and 5B may be considered a more detailed example of the method shown in fig. 4. As with the method shown in fig. 4, the methods of fig. 5A and 5B may be performed by the ray tracing system 100 of fig. 1. In a more specific example, the method may be performed by the intersection test module 108 of fig. 1. The methods of fig. 5A and 5B are performed when data for a new ray to be processed is received by the ray tracing system. When data for a new ray is received, the ray tracing system accesses memory 300 to determine whether an indication of a re-entry point for the ray exists in memory. More specifically, processing logic of the ray tracing unit may use the ray identifier of the new ray to access memory 300 to determine whether an indication of a re-entry point corresponding to the ray identifier is present in memory. The processing logic may correspond to logic 110 shown in fig. 1. If a corresponding indication is identified in memory, the re-entry point corresponding to the indication is used as the starting point from which the intersection test for the new ray is performed.
The method of fig. 5A and 5B begins at step S502, where a new ray for traversal is extracted by ray tracing system 100. More specifically, the new ray may be extracted by ray tracing unit 102. The ray may be extracted by intersection test module 108 of ray tracing unit 102. Extracting a new ray for traversal means that ray data for the new ray is received by the ray tracing system. The ray data defines the new ray to be processed. The new ray will be tested against the nodes of the hierarchical acceleration structure.
Once the ray data for the new ray has been extracted, at step S532, the ray identifier for the ray is determined. The ray identifier may be the same for multiple rays that have similar data. More specifically, the ray identifier may identify a collection of rays, each ray in the collection having a single indication of a re-entry point. The collection of rays may be similar rays. The ray identifier may comprise a coarser (i.e., quantized) version of the ray data than the original ray data of the new ray, such that the ray identifier may be used to identify rays similar to the new ray. Step S532 may also include the optional steps of converting the ray data into a ray representation (S504) and converting the ray representation into a quantized ray representation (S506). These steps are indicated as optional in fig. 5A by their dashed outlines. Steps S504 and S506 will be explained in further detail below.
At step S508, the stored rays are compared with the new ray. In this way, step S508 includes performing a similarity check to determine whether the new ray is sufficiently similar to a stored ray to warrant the new ray using the re-entry point corresponding to the stored ray. Step S508 may be implemented in different ways. In a first implementation (as shown in fig. 5A), in step S508, a lookup of the stored ray identifiers is performed by the ray tracing system. The lookup of the stored ray identifiers may be performed by ray tracing unit 102. The lookup of the stored ray identifiers is performed on the memory 300. In this step, a read operation on memory 300 is performed using the ray identifier associated with the new ray to be processed. The ray identifier may be, or form part of, the ray data received at step S502. Alternatively, the ray identifier may be generated in steps S504 and S506. Step S508 may determine whether a ray identifier (and thus an indication of a re-entry point) corresponding to the new ray is stored in memory. In the implementation of step S508 shown in fig. 5A, the similarity check is performed by using first quantized ray data (e.g., a quantized ray representation) generated for the new ray. This step may include performing an 'equal comparison' (see discussion of step S530 below) for each previously stored identifier, to compare the first quantized ray data to the previously stored identifier as second quantized ray data. If the first quantized ray data and the second quantized ray data are identified as identical, an indication of a stored re-entry point associated with the stored identifier may be used for the new ray. The advantages of this implementation (as compared to the alternative implementation mentioned below) are that it provides implicit handling of similarity, a more compact ray identifier (and thus reduced storage requirements), and non-overlapping data sets.
In an alternative implementation, the similarity check of step S508 may be performed using the ray data (e.g., original ray data, ray representations, or quantized ray representations) of the two rays (the stored ray and the new ray), for example by comparing a difference metric between the two sets of ray data defining the two rays to a threshold. If the difference metric between the ray data is (strictly or non-strictly) below the threshold, this indicates that the two rays are 'similar' and the stored re-entry point may be used for the new ray. If the difference metric between the two sets of ray data is (non-strictly or strictly) above the threshold, this indicates that the two rays are not 'similar' and the stored re-entry point will not be used for the new ray. The 'difference metric' may be any suitable indication of similarity (or difference) between the two sets of ray data defining the two rays, e.g., based on the difference between their origin vectors and/or the difference between their direction vectors. The threshold may be a strictly positive value.
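Both implementations of the similarity check can be sketched as follows. This is a hypothetical Python illustration; the quantization step size, the particular difference metric (sum of origin and direction distances), and the threshold are all assumptions:

```python
import math

def quantized_identifier(origin, direction, step=1.0):
    # First implementation: a coarser (quantized) version of the ray data;
    # similar rays collide onto an equal identifier.
    quantize = lambda v: tuple(math.floor(c / step) for c in v)
    return (quantize(origin), quantize(direction))

def is_similar(ray_a, ray_b, threshold=0.25):
    # Second implementation: a difference metric over the raw ray data,
    # compared (here strictly) against a threshold.
    (o_a, d_a), (o_b, d_b) = ray_a, ray_b
    return math.dist(o_a, o_b) + math.dist(d_a, d_b) < threshold
```

Under the first scheme two nearby rays produce identical identifiers, so equality of identifiers is the similarity test; under the second, the raw data are compared directly.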
At step S510, it is confirmed whether a ray identifier corresponding to the ray data of the new ray for traversal is stored in the memory. If the answer is yes, an indication of a re-entry point associated with the ray identifier is found in memory. This may correspond to step S402 of fig. 4. If an indication of a re-entry point is found in the memory, the method proceeds to step S512 of fig. 5A. If the answer to the question at step S510 is no, then an indication of a re-entry point associated with the ray identifier does not exist in memory. In this case, the method proceeds to step S520 of fig. 5B, which will be described in further detail below.
At step S512, where it is confirmed that the ray identifier of the new ray for traversal is stored in the memory, the indication of the re-entry point stored together with the ray identifier is fetched from the memory. The fetching of the indication in step S512 may correspond to step S404 in fig. 4. As described above, a re-entry point is a node of the hierarchical acceleration structure for which an intersection has been identified for a previously tested ray associated with the ray identifier. In other words, a re-entry point is a point in the acceleration structure associated with an area of the scene on which intersection testing of rays is to be performed. The indication of the re-entry point may be an identifier of the node. That is, the indication of the re-entry point may include data identifying the node within the acceleration structure that corresponds to the point.
At step S514, an intersection test is performed on the new ray by traversing the ray through the acceleration structure from the re-entry point that has been fetched from memory. In other words, using the re-entry point corresponding to the fetched indication as a starting point, the new ray traverses through the acceleration structure. The re-entry point is a node in the acceleration structure. Thus, traversal of the acceleration structure is performed on the subtree associated with the node of the re-entry point. The subtree associated with the node of the re-entry point is the subtree in the hierarchy having the node of the re-entry point as its root node. The subtree associated with the node of the re-entry point may be defined as a set of nodes. In other words, the ray traverses through the acceleration structure, through each of the children in the set of nodes defined by the re-entry point, down to the primitives of the re-entry point (assuming primitive intersections are found; otherwise, if necessary, the ray traverses through the children of the re-entry point to determine that no primitive intersections occur). The purpose of performing ray traversal in this manner is to provide an opportunity to determine, within the node of the re-entry point, at least one primitive intersecting the ray. Once the ray has fully traversed the subtree of the re-entry point, step S514 is complete. Step S514 may correspond to S406 in fig. 4. It will be appreciated that the phrase 'fully traversed' does not necessarily mean that each node of a tree or subtree has been tested, but rather indicates that the tree/subtree has been traversed to the point where: (a) the ray is known not to intersect any primitive within the subtree, (b) a single hit has been identified (for an occlusion ray), or (c) the closest intersection of the ray with a primitive within the subtree has been identified (for a non-occlusion ray).
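The subtree traversal of step S514 can be sketched as follows. The one-dimensional interval bounds, the Node class, and the (name, t) primitive records are simplifying assumptions used to keep the example small; a real implementation would test three-dimensional bounding volumes:

```python
class Node:
    """A node of a hierarchical acceleration structure (minimal sketch).

    'bounds' is a range on a single axis to keep the example small;
    'primitives' is non-empty only for leaf nodes.
    """
    def __init__(self, bounds, children=(), primitives=()):
        self.bounds = bounds                # (lo, hi) interval the node covers
        self.children = list(children)
        self.primitives = list(primitives)  # (name, t_hit) pairs

def traverse_subtree(ray_t_range, node):
    """Traverse the subtree rooted at 'node' (e.g., a re-entry point) and
    return every primitive hit whose t value lies inside the ray's range."""
    t_min, t_max = ray_t_range
    lo, hi = node.bounds
    if hi < t_min or lo > t_max:            # ray misses this node's region
        return []
    hits = [(name, t) for name, t in node.primitives if t_min <= t <= t_max]
    for child in node.children:
        hits.extend(traverse_subtree(ray_t_range, child))
    return hits
```

The subtree is 'fully traversed' once this recursion returns; for an occlusion ray the recursion could instead stop at the first hit.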
During intersection testing of a ray against a node or against a primitive, if there is an intersection at a point on the ray at which the value of t is t_int, then a minimum culling distance t_min is used such that if t_int < t_min (or, in an alternative, if t_int ≤ t_min) then the intersection is not accepted, and a maximum culling distance t_max is used such that if t_int > t_max (or, in an alternative, if t_int ≥ t_max) then the intersection is not accepted. Once S514 is complete, and an intersection has been accepted at a value t_int of t, the ray's t_max may be adjusted (i.e., reduced) such that it corresponds to the identified intersection (i.e., t_max is set equal to t_int). It is then not necessary to perform subsequent intersection tests at points on the ray at which the value of t is greater than t_max. Using the re-entry point means that intersections in the subtree of the re-entry point are found early during intersection testing, and this early reduction of t_max may result in fewer intersection tests needing to be performed, thereby reducing the complexity, latency, and power consumption of performing intersection tests on non-occlusion rays, for example.
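A minimal sketch of the culling-distance logic just described, covering both the strict and non-strict comparison variants; the function names are illustrative:

```python
def accept_intersection(t_int, t_min, t_max, strict=True):
    """Apply the minimum and maximum culling distances described above.

    With strict=True an intersection at t_int is rejected when
    t_int < t_min or t_int > t_max; the non-strict variant also rejects
    t_int == t_min and t_int == t_max.
    """
    if strict:
        return not (t_int < t_min or t_int > t_max)
    return t_min < t_int < t_max

def update_t_max(t_max, t_int):
    """Once an intersection is accepted at t_int, reduce the ray's maximum
    culling distance so later tests beyond the hit can be skipped."""
    return min(t_max, t_int)
```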
The results of the intersection test at step S514 may reveal that the ray intersects within the node of the re-entry point once (e.g., for an occlusion ray), does not intersect, or intersects multiple times (only for a non-occlusion ray). The results of the intersection test may be stored by the ray tracing system and may be used subsequently to perform processing operations on the scene. An example of a subsequent processing operation that may be performed on a scene is a shader operation. The results of the intersection test performed at step S514 can also be used to adjust the maximum distance t_max of the ray such that, conceptually, the ray does not extend to t values greater than t_max, where t_max represents the t value of the ray at the closest identified intersection.
In some examples, after step S514, no further intersection testing of the rays is performed. That is, for some rays, once a single intersection has been identified, the intersection test may be considered complete. An example of a ray for which the intersection test is deemed complete after a single intersection is identified is an occlusion ray. However, in most ray tracing operations, it is necessary to consider multiple intersections between the ray and the scene to be processed. This is because, for non-occluding rays, the closest intersection of the ray with the scene is sought. For these rays, there is no guarantee that the intersection identified according to the intersection test from the re-entry point is the closest intersection. Thus, to explicitly determine the closest intersection, a wider portion of the acceleration structure must also be traversed by the ray. This is done at step S516.
At step S516, the remainder of the acceleration structure is traversed. In other words, at step S516, intersection testing of the ray is performed for nodes of the hierarchical acceleration structure that were not tested by the intersection test from the re-entry point. The remainder of the acceleration structure may be traversed starting from the root node. The term "remainder of the acceleration structure" refers to the remaining nodes that are not included in the subtree associated with the node of the re-entry point (i.e., its complement). In other words, during step S516, the subtree including the re-entry point is skipped and is therefore not traversed. This is because the subtree with the re-entry point as its root has already been traversed at step S514. In some examples, indications of the intermediate nodes that lead from the root node of the hierarchy to the re-entry point may be derived from the indication of the re-entry point stored in memory. If the starting node of the subtree including the re-entry point is identified by reference A and the root node is identified by reference R, the intermediate nodes may be considered to be the nodes A_1, A_2, ... A_i leading from the root node R to the re-entry point A. If a ray is determined to intersect node A, it follows that the ray also necessarily intersects nodes R, A_1, A_2, ... A_i. If indications of the nodes R, A_1, A_2, ... A_i can be derived from the re-entry point, the testing of these intermediate nodes may also be skipped at step S516. The intersection test at step S516 includes testing the ray against nodes of the hierarchical acceleration structure that are not in the set of nodes defined by the re-entry point. The root node corresponds to a region of the scene that covers the entire scene (see 202 in fig. 2A). Thus, traversing the acceleration structure from the root node means that the entire scene has been considered by the end of the subsequent intersection test in step S516.
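The skipping behavior of step S516 can be sketched as follows. Identifying nodes by name strings, and the omission of the actual bounding-volume tests at each visited node, are simplifications for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)

def traverse_remainder(node, reentry_name, ancestor_names, tested):
    """Traverse every node of the hierarchy except the subtree rooted at the
    re-entry point. Ancestors of the re-entry point (R, A_1, ... A_i) are
    descended through without re-testing, since the ray is already known to
    intersect them; every other visited node is appended to 'tested'."""
    if node.name == reentry_name:
        return                           # subtree already traversed in step S514
    if node.name in ancestor_names:
        for child in node.children:      # known hit: descend without re-testing
            traverse_remainder(child, reentry_name, ancestor_names, tested)
        return
    tested.append(node.name)             # this node would actually be tested
    for child in node.children:
        traverse_remainder(child, reentry_name, ancestor_names, tested)
```

In a full implementation each appended node would first be tested against the ray (with the reduced t_max), and its children descended into only on a hit.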
As mentioned above, if one or more intersections between a node and the ray are identified during the intersection test at step S514, the maximum distance (t_max) of the ray may be adjusted (i.e., reduced). The reduced t_max limits the distance range of the ray during the intersection test performed in step S516. This means that the closest intersection of the ray is determined to be no further than t_max from the origin. This in turn means that nodes corresponding to regions of the scene further from the origin than t_max (or, more generally, further from minus infinity if t is allowed to be negative) do not have to be tested at step S516. Thus, when testing the remainder of the acceleration structure at S516, efficiency of the intersection test may be achieved by avoiding the need to test nodes corresponding to regions of the scene further from the origin than the closest intersection identified using the re-entry point.
In fig. 5A, steps S516 and S514 are performed sequentially. That is, step S516 is performed after step S514. Thus, in FIG. 5A, the acceleration structure is traversed from the re-entry point of the ray, and then the remainder of the acceleration structure, which does not include the subtree descending from the re-entry point, is traversed. Thus, the further intersection test of S516 may be performed after the intersection test of the ray against the nodes of the acceleration structure starting from the re-entry point corresponding to the fetched indication of the re-entry point. An advantage of this approach is that if there is an intersection within the node of the re-entry point, the intersection will be identified before the subsequent test is performed in step S516. Thus, t_max may be adjusted after the targeted intersection test but before the subsequent intersection test (i.e., the test for the remainder of the acceleration structure), which may accelerate the subsequent test by avoiding testing nodes corresponding to regions of the scene that could only intersect the ray at points with t values greater than t_max. Another advantage is that computational power can be reduced by performing the intersection test operations sequentially. In an alternative example, steps S516 and S514 may be performed in parallel. That is, the intersection test for the remainder of the acceleration structure may be performed concurrently with the intersection test of the ray against the set of nodes of the acceleration structure beginning at the re-entry point. An advantage of this approach is that it reduces the total amount of time taken to perform intersection tests on each of the rays. That is, if steps S514 and S516 are performed simultaneously, the total amount of time to complete these steps decreases.
Furthermore, in the parallel approach, it is still likely that any intersection in the subtree descending from the re-entry point is determined faster than in the conventional approach (i.e., without using the re-entry point at all), and this determination can still be used to adjust t_max and thereby speed up the traversal of the remainder of the acceleration structure by avoiding the need to test nodes that would otherwise be tested without the earlier intersection determination (although fewer unnecessary tests are likely to be avoided than with the sequential approach). For example, even in an implementation where steps S514 and S516 are performed in parallel, testing of the subtree descending from the re-entry point may begin (in step S514) before the parent node of the re-entry point is tested as part of step S516. Steps S514 and S516 may be performed with multiple instructions (e.g., volume/primitive tests) and/or multiple data (e.g., rays), i.e., SISD, MISD, SIMD or MIMD. If multiple rays are in flight at the same time, e.g., for parallel primitive testing, a ray may not benefit from re-entry point data generated in step S514 from other rays traversed at the same time. Other forms of parallelism may similarly reduce the overall benefit.
At step S518, it is determined whether there are additional rays for which intersection tests are to be performed. That is, it is determined whether there are more rays traversing the acceleration structure. In other words, it is determined whether there are any rays with data to be extracted by the ray tracing system. If there are additional rays to be extracted by the ray tracing unit, the method returns to step S502 where the next ray for traversal is extracted. Similar to the new ray described above, the next ray is associated with a ray identifier. Steps S502 to S516 are then performed for the next ray. If no additional rays traverse the acceleration structure, the method ends at step S536.
As described above, a ray identifier for a ray is an identifier that includes data representing the ray (either uniquely or non-uniquely). In one example, the ray identifier may correspond exactly to the original ray data of the ray. In alternative examples, the ray identifier may not correspond exactly to the original ray data. The ray identifier may include a coarser representation of the ray than the original ray data (e.g., a quantized representation of the ray). When it includes a coarser representation of the ray than the original data, the ray identifier may not uniquely identify the ray. However, the ray identifier may still represent the ray by defining a set or "bucket" of raw data values for the original ray data. The ray identifier may summarize, quantize, or compress the ray data. The same ray identifier may correspond to the data of multiple rays having similar data. The ray identifier may also be referred to as a ray bucket identifier because it may identify a set or "bucket" of similar rays. The ray identifier may alternatively be referred to as a ray representation, or, when the ray data is quantized, as a quantized ray representation.
The ray identifier may be a fixed bit width value based on the position and direction data of the ray. In other words, the ray identifier may be a data packet that includes a fixed number of bits. Each ray identifier may include the same number of bits. An advantage of each of the ray identifiers comprising the same number of bits is that the identifiers may be stored in the same cache line of memory 300. The ray identifier may include fewer bits than the original ray data. The ray identifier may comprise 18 bits. The ray identifier may include any selectable number of bits. An advantage of having a low bit width ray identifier is that it saves storage resources in the memory 300. The number of bits of the ray identifier should be chosen so as to save space in memory while also ensuring that the ray identifiers adequately represent the rays they identify. The number of bits should ideally be chosen to ensure that each ray identifier adequately represents a group of rays, both identifying that group and distinguishing it from other groups of rays, and also to ensure that not too many repeated indications of re-entry points are stored in memory (i.e., the same indication of a re-entry point being stored for different ray identifiers). In order to keep the number of repeated indications of re-entry points stored in memory low (and thus the amount of data stored in memory low), it is expected that the ray identifiers will include fewer bits than the original ray data they represent.
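A possible sketch of an 18-bit ray identifier. The split into six 3-bit quantized components and the assumed scene bounds are illustrative choices, not taken from the present disclosure:

```python
def ray_identifier(origin, direction, scene_min=0.0, scene_max=100.0):
    """Pack a ray's position and direction into an 18-bit identifier by
    quantizing each of the six components to 3 bits. Rays whose components
    fall into the same quantization buckets share an identifier, so the
    identifier names a 'bucket' of similar rays rather than a unique ray.
    The 3-bits-per-component split and the scene bounds are assumptions."""
    def quantize(value, lo, hi, bits=3):
        levels = (1 << bits) - 1
        clamped = min(max(value, lo), hi)
        return round((clamped - lo) / (hi - lo) * levels)

    identifier = 0
    for component in origin:
        identifier = (identifier << 3) | quantize(component, scene_min, scene_max)
    for component in direction:
        identifier = (identifier << 3) | quantize(component, -1.0, 1.0)
    return identifier  # fits in 18 bits
```

Because all identifiers have the same fixed width, they pack uniformly into cache lines; nearby rays with similar directions collapse onto the same identifier, which is exactly what allows a stored re-entry point to be reused.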
If it is determined at step S510 that no ray identifier corresponding to the ray data of the next ray for traversal is stored in the memory, the method of fig. 5A proceeds to the method of fig. 5B. In this case, no ray identifier corresponding to the new ray to be tested is stored in the memory, and thus no indication of a re-entry point for the new ray is stored in memory. This means that the targeted intersection test from a re-entry point cannot be performed on the new ray.
At step S520, the ray tracing unit is used to traverse the ray through the acceleration structure from the root node. That is, the ray is traversed from the node corresponding to the region covering the entire scene (see 202 in FIG. 2A). As described above, traversing a ray through an acceleration structure from the root node can be considered the conventional method of performing intersection testing. If there is no indication of a re-entry point stored for the ray, the traversal of the ray is performed from the root node, since there is no re-entry point data for the ray to provide a potential "head start" as to where an intersection of the ray may exist in the scene. The traversal of the ray through the scene begins at the root node and, where an intersection is detected, testing continues down through the one or more subtrees branching from the root node until testing has been performed against the primitives of the acceleration structure.
At step S522, it is determined whether the ray has intersected a primitive in the scene. More specifically, it is determined whether an intersection between the ray and the primitive has been detected. If no intersection with the primitive is detected, no further processing of the ray needs to be performed. If the ray does not intersect the primitive, no re-entry point can be generated for the ray. Thus, the method proceeds to step S518, where it is confirmed whether there are any additional rays for the ray tracing system to test, as described above.
If it is determined that the ray did intersect a primitive in the scene, a re-entry point is generated for the ray at step S524. The re-entry point is a node that the ray intersects. Thus, the re-entry point of the ray corresponds to a region of the scene within which an intersection between a primitive and the ray has been identified. The re-entry point may be a node at any suitable level of detail in the acceleration structure. Thus, a re-entry point may be associated with a region of any size in the scene. The size of the region, or the level of detail of the node used for the re-entry point, may be predetermined, as explained in further detail below. Alternatively, the level of detail of the node used for the re-entry point may be determined dynamically during intersection testing. The 'level of detail' of a node indicates the level of the node within the acceleration structure, e.g., relative to the root node of the acceleration structure.
After step S524, a ray identifier for the new ray is determined at step S534. As with the ray identifier generated at step S532, the ray identifier generated at step S534 may include a coarser version of the ray data than the original ray data of the new ray so that the ray identifier may be used in the future to identify rays similar to the new ray. Step S534 may also include optional steps S526 and S528, which will be described in further detail below. Once the re-entry point has been generated, an indication of the re-entry point (e.g., a pointer to the re-entry point) may be generated and committed to memory 300 at step S530. Thus, an indication of the re-entry point may be stored in memory. As described above, the memory 300 may be a cache. The indication of the re-entry point is stored with the ray identifier of the ray for which the re-entry point has been generated. The indication of the re-entry point, once stored in memory, may be accessed by the ray tracing system when processing subsequent rays. More specifically, a ray having similar location and direction data as the ray for which the re-entry point is stored may access memory using the same ray identifier as the ray identifier.
At steps S522 and S524, an intersection test is performed on the ray, and a re-entry point for the ray is generated. The identified primitive intersection of the ray is used to generate the re-entry point. The indication of the re-entry point is then stored in memory along with the ray identifier. In the example shown in fig. 5B, an indication of the re-entry point is stored in memory for a ray for which one has not yet been stored. In an alternative example, the ray may already have an indication of a re-entry point stored in memory. However, it may be useful to store the most recent or closest indication of a re-entry point in memory. Thus, the indication of a new re-entry point may replace the indication of the old re-entry point in memory. That is, the indication of the old re-entry point may be discarded in favor of the indication of the new re-entry point. Another example of a situation in which a new indication of a re-entry point replaces an old indication of a re-entry point in memory may occur when an intersection is identified as the closest intersection during the traversal of the remainder of the acceleration structure in step S516 of fig. 5A. Alternatively, a replacement may occur whenever any new intersection is identified (such a replacement may be restricted to occlusion rays, since the "closest" intersection need not be identified for these rays). In this case, steps S522 to S530 may be performed for the re-entry point corresponding to the intersection identified at step S516. Instead of discarding the old re-entry point, the indication of the new re-entry point may be stored alongside the indication of the old re-entry point, provided that these indications are dissimilar. For example, for each primitive intersection identified in steps S514, S516, or S520, a re-entry point may be stored in memory regardless of whether the intersection is the closest intersection.
In the case where multiple re-entry points are stored for the same ray identifier, then in a subsequent intersection test either (a) one of the re-entry points is selected as the starting point (e.g., based on having the "closest" intersection distance relative to the direction of the ray, skipping only those re-entry points whose intersection distances are outside the distance range of the ray), or (b) each re-entry point is selected as a starting point such that there are multiple distinct starting points (i.e., multiple steps similar to step S514, in each of which traversal is performed with respect to the subtree within the acceleration structure having the corresponding re-entry point as its root). For option (a), this may involve storing an indication of the intersection distance in memory along with the indication of each re-entry point. At the lookup of step S508, these indications may be compared to the distance range and direction of the new ray so as to skip any re-entry points that are outside the distance range of the new ray. For option (b), the intersection tests from the multiple re-entry points may be performed sequentially or in parallel. After traversing from the multiple re-entry points, the "remainder" of the hierarchy to be traversed may be the remaining nodes of the hierarchy that are not included within any subtree of any of the re-entry points. In a further example, where the rays are generally coherent (e.g., have similar directions), the indication of the new re-entry point may be discarded in favor of the indication of an existing re-entry point, with the indication of the old re-entry point remaining stored in memory. In this example, the indication of the new re-entry point may correspond to an intersection that is "further" from the origin, relative to the direction of the current ray, than the intersection corresponding to the old re-entry point.
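Option (a) above can be sketched as follows; the (node_id, hit_distance) storage layout and the function name are assumptions for illustration:

```python
def select_reentry_point(stored, ray_t_range):
    """Option (a): among multiple re-entry points stored for one ray
    identifier, pick the one with the closest recorded intersection
    distance, skipping any whose distance lies outside the new ray's
    [t_min, t_max] range. 'stored' is a list of (node_id, hit_distance)."""
    t_min, t_max = ray_t_range
    in_range = [(node, d) for node, d in stored if t_min <= d <= t_max]
    if not in_range:
        return None  # fall back to conventional traversal from the root
    return min(in_range, key=lambda entry: entry[1])[0]
```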
This example may include storing an indication of the primitive intersection distance for each re-entry point in addition to the indication of the re-entry point and the ray identifier.
The methods of fig. 4, 5A, and 5B enable re-entry points to be generated, stored, and used to speed up the intersection testing of rays against an acceleration structure. The advantage of using re-entry points to perform intersection testing in order to render an image of a scene can be seen in fig. 2A. For ray 210 in FIG. 2A, suppose that memory 300 stores an identifier of the ray and an indication of a re-entry point corresponding to sub-region 204a of the scene. That is, the re-entry point for the ray identifier of ray 210 would be node 204a'. When ray data for ray 210 is received by the ray tracing system, the identifier of ray 210 is used to perform a memory read operation. The ray identifier of ray 210 is identified in memory, and the indication of the re-entry point is fetched from memory and used to indicate the corresponding re-entry point, i.e., node 204a' associated with spatial region (e.g., area, volume) 204a. In the example of FIG. 2A, the targeted intersection test is performed within the subtree starting from node 204a'. The subtree starting from node 204a' includes the child nodes 206a' to 206d' of node 204a'. It is not necessary to perform the initial intersection test on nodes that do not descend from node 204a'. That is, in step S514, it is not necessary to perform the targeted intersection test on any node in the subtrees descending from nodes 204b' to 204d'. By performing the intersection test from region 204a, it is determined that the ray intersects node 204a' and also intersects child nodes 206a', 206b', and 206c'. It is then determined that the ray intersects primitives 208a and 208b in child node 206c'. The closest intersection determined for the ray using the re-entry point is the intersection with primitive 208a. Determining that the ray intersects primitives 208a and 208b reduces t_max such that it is equal to the value of t at the point at which the ray intersects the leading edge of primitive 208a (as indicated by reference numeral 212).
If the ray is an occlusion ray, no further intersection tests are performed. If the ray is not an occlusion ray, further intersection testing of the ray should be performed after the testing from the re-entry point is complete, e.g., from the root node. Because t_max has been reduced, after step S514 is complete it is not necessary to perform subsequent intersection tests of the ray against nodes corresponding to regions of the scene that extend beyond t_max. Thus, in this example, subsequent intersection tests from root node 202' against child nodes 204b' to 204d' may be avoided. In particular, if, prior to performing the test of the ray against nodes 204b', 204c', and 204d', t_max has been decreased to a value equal to t on ray 210 at the point where it intersects primitive 208a (i.e., at point 212), then the ray will not be found to intersect node 204b', 204c', or 204d', and consequently no intersection test will be performed for ray 210 with respect to sub-regions 206e' to 206p' or with respect to primitives 208c' to 208g'. By using re-entry points and finding close intersections early, the number of tests required to find the closest possible intersection of a ray is significantly reduced. In practice, the number of intersection tests that must be performed per ray may be reduced by 10% to 15%. Equivalently, the number of rays that can be processed per unit time can be increased by 10% to 15%.
The re-entry point of a ray may be the root node of the acceleration structure. Preferably, however, the re-entry point is not the root node of the acceleration structure. In other words, the re-entry point of a ray is preferably a node other than the root node of the hierarchical acceleration structure (i.e., at least one level of detail higher than the root node). The reason for this is that the intersection test can be accelerated if some of the nodes in the acceleration structure can be skipped. If the root node were the re-entry point, the intersection test from the re-entry point would be performed on the entire acceleration structure (i.e., all subtrees descending from the root node). If the re-entry point is not the root node, the traversal from the re-entry point excludes the lower levels of detail of the hierarchy (such as the root node), which increases processing speed.
As already mentioned above, the re-entry point of a ray is the node of the hierarchical acceleration structure that has identified an intersection for a previously tested ray. In other words, the re-entry point indicates a node containing primitives that intersect the previously tested ray. While the re-entry point may be a leaf node of the acceleration structure (i.e., a node having a pointer to the primitive or the primitive itself), it is preferred that the re-entry point be a tree node at least one level of detail below the leaf node. The re-entry point is determined from the intersection between the ray and the primitive.
Each level of detail in the acceleration structure may be associated with an integer N. In the context of the present application, N is an absolute value defining the number of levels between the root node and a child node in the acceleration structure. N may increase as the levels of the hierarchy progress further from the root node. In other words, N may increase as the level of detail in the acceleration structure increases. For example, N of the root node may be 0. N of a child node extending from the root node may be 1. The total number of levels in a hierarchy may vary depending on the complexity of the hierarchy. The re-entry points may be nodes that are all at the same level of the hierarchy in the acceleration structure. In other words, each re-entry point generated for the acceleration structure may be at the same level of detail in the acceleration structure. The level of detail of the re-entry points may be fixed in the acceleration structure. Where the level of detail of the re-entry points is fixed, the indication of a re-entry point may be compressed when stored (e.g., by deleting the least significant bits of the indication). This is useful when some bits are used only to distinguish nodes at levels of detail higher than that of the re-entry points.
Two methods have been devised for determining a fixed level of detail for the re-entry points in an acceleration structure. These methods are described below. It should be appreciated that although two methods are described below, other methods are possible. The first method is called the top-down method. In this method, the re-entry point has a level of detail that is M levels greater than the level of detail of the root node of the acceleration structure (i.e., where N of the root node is 0, M = N). A larger level means a higher level of detail. For example, the re-entry point may have a level of detail two levels greater than the level of detail of the root node. This means that for each ray intersection determined by the ray tracing system, the re-entry point of the ray will be the node that includes the intersection and that has a level of detail two levels greater than the level of detail of the root node. Where the level of detail of the root node is 0, the level of detail of the re-entry point will be 2. If the level of detail of the leaf node is lower than the calculated level of detail that is M levels greater than the level of detail of the root node, the re-entry point may instead be clamped to a level of detail that is zero levels or one level lower than the level of detail of the leaf node.
The second method for determining the level of detail of the re-entry points is referred to as the bottom-up method. In this method, the re-entry point has a level of detail that is L levels less than the level of detail of the primitive in the acceleration structure that was intersected by the previously tested ray associated with the ray identifier. For example, the re-entry point may have a level of detail two levels less than the level of detail of the intersected primitive. This means that for each ray intersection determined by the ray tracing system, the re-entry point of the ray will be the node that includes the intersection and that has a level of detail two levels less than the level of detail of the intersected primitive. A smaller level means a lower level of detail. If the level of detail of the primitive is 4, the level of detail of the re-entry point will be 2. If the level of detail of the root node is higher than the calculated level of detail that is L levels less than that of the primitive, the re-entry point may instead be clamped to a level of detail that is zero levels or one level higher than the level of detail of the root node.
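The two fixed-level rules, including clamping at the leaf and root levels, can be sketched as follows (levels numbered per the N convention above, with 0 at the root); the function names and the use of a simple min/max clamp are illustrative assumptions:

```python
def reentry_level_top_down(root_level, leaf_level, m):
    """Top-down rule: the re-entry point sits M levels below the root,
    clamped so it never falls past the leaf level."""
    return min(root_level + m, leaf_level)

def reentry_level_bottom_up(root_level, primitive_level, l):
    """Bottom-up rule: the re-entry point sits L levels above the
    intersected primitive, clamped so it never rises past the root."""
    return max(primitive_level - l, root_level)
```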
The re-entry points may be predetermined during the acceleration structure build process, or "pre-baked" onto the triangle primitives. Pre-baking of re-entry points involves a trade-off between hierarchy build time/cost and traversal time/cost. The top-down method is suited to top-down building of the acceleration structure. The bottom-up method is suited to bottom-up building of the acceleration structure. Both methods can be implemented by a streaming build. This means that the build processes each node only once and then flushes that node downstream. Alternatively, the re-entry point may be determined dynamically during intersection testing. Dynamic determination of the re-entry point may be combined with the top-down method described above. This can be accomplished by noting that rays typically traverse the hierarchy top-down. A reference to the most recent non-primitive node for which the ray was intersection tested may be stored until the required level of detail is reached. Once the desired level of detail is reached, the re-entry point is not updated.
For dynamic determination of a re-entry point at a current level of detail, memory 300 may store a current success indication that records, for rays that found their stored ray identifier in the cache, whether those rays went on to find an intersection in the subtree of the re-entry point node determined by the current level of detail. The success indication may be weighted by the level of detail of the re-entry point within the hierarchy. Each ray that finds its ray identifier in the cache may decrease the current success indication, and each ray that meets the success criterion may increase it (e.g., the success indication may be the ratio of successful rays to all rays that found their ray identifier in the cache). The success indication is weighted by level of detail because a higher level of detail (i.e., lower in the acceleration structure) implies fewer subtree intersection tests than a lower level of detail. The function used to weight the success indication by level of detail may not predict the box or triangle intersection savings perfectly, but a heuristically derived approximation may be sufficient for reasonable results. The dynamic re-entry point may "walk" through the acceleration structure (i.e., increment or decrement its level) in an attempt to find a better level of detail. That is, after a predetermined number of rays have been processed and a success indication accumulated, the indicated level of detail is either incremented or decremented. This yields a first success indication. After processing the same number of rays at the incremented/decremented level of detail, a second success indication is generated. The first and second success indications are then compared.
If the second success indication is better than or equal to the first, the level of detail associated with the second success indication is saved in memory and the next indicated level of detail is moved another level in the same direction. If the second success indication is worse than the first, the first level of detail is re-established and the process is repeated in the opposite direction.
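The walking heuristic described above can be sketched as follows (a Python sketch under assumptions: the window size, the 1/(1+level) weighting function, and the class interface are all illustrative, not values from this document):

```python
class DynamicReentryLevel:
    """Sketch of the 'walking' re-entry level heuristic."""

    def __init__(self, level: int, min_level: int, max_level: int, window: int = 1024):
        self.level = level
        self.min_level = min_level
        self.max_level = max_level
        self.window = window          # rays per accumulation window
        self.direction = +1           # current walk direction
        self.prev_indication = None   # success indication of the previous window
        self.hits = 0                 # rays meeting the success criterion
        self.lookups = 0              # rays that found their identifier in the cache

    def record(self, ray_succeeded: bool) -> None:
        self.lookups += 1
        if ray_succeeded:
            self.hits += 1
        if self.lookups == self.window:
            self._step()

    def _weight(self, level: int) -> float:
        # Heuristic weight: higher levels of detail (lower in the structure)
        # imply smaller subtrees, so a hit there saves fewer tests.
        return 1.0 / (1 + level)

    def _step(self) -> None:
        indication = (self.hits / self.lookups) * self._weight(self.level)
        self.hits = self.lookups = 0
        if self.prev_indication is not None and indication < self.prev_indication:
            # Worse than before: re-establish the previous level and
            # walk in the opposite direction from now on.
            self.level -= self.direction
            self.direction = -self.direction
        else:
            self.prev_indication = indication
        self.level = min(self.max_level, max(self.min_level, self.level + self.direction))
```
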
Note that in the foregoing description, the term 'level of detail' has been related to the number of levels between a node and a root node. Thus, it is contemplated that all nodes at the same level of detail will represent the same amount of space based on equal subdivision of space from a parent node to its child nodes. However, acceleration structure optimization policies may cause nodes to be repositioned within an acceleration structure such that even if an initial acceleration structure defines nodes at the same level of detail as representing the same amount of space, this may not be true in an optimized acceleration structure derived from the initial structure. Thus, it is also noted that in some cases it may be beneficial to select a re-entry point that represents a fixed size (i.e., a fixed amount of space) rather than a fixed number of levels from the root node. Those skilled in the art will understand how the methods disclosed herein can be adapted accordingly.
The re-entry point may have to be adjusted for instancing. Instancing partitions the acceleration structure of a scene into a single top-level acceleration structure and one or more bottom-level acceleration structures that are reached during ray traversal. The acceleration structure is partitioned by one or more instance transformation nodes, each associated with an instance transformation matrix. The instance transformation nodes are leaf nodes of the top-level acceleration structure. During traversal of the acceleration structure, at an instance transformation node, the ray is transformed into instance space using the inverse of the instance transformation matrix. If instancing is used, an extended re-entry point reference is required, which may require additional bits of storage. The extended re-entry point encodes the location, within the top-level acceleration structure, of the instance transformation node of the bottom-level acceleration structure (thereby implicitly indicating the instance transformation and/or its inverse), and the location of the primitive within the bottom-level acceleration structure. The ray identifier is generated from the untransformed ray attributes so that the ray identifier can be matched regardless of the bottom-level acceleration structure in which the intersection occurs.
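An extended re-entry point reference of this kind could be packed into a single integer, as sketched below. The field widths are illustrative assumptions, not values from this document:

```python
def pack_extended_reentry(instance_node: int, bottom_node: int,
                          instance_bits: int = 23, bottom_bits: int = 23) -> int:
    """Pack a top-level instance-transform-node location and a bottom-level
    node location into one extended re-entry reference."""
    assert 0 <= instance_node < (1 << instance_bits)
    assert 0 <= bottom_node < (1 << bottom_bits)
    return (instance_node << bottom_bits) | bottom_node

def unpack_extended_reentry(ref: int, instance_bits: int = 23, bottom_bits: int = 23):
    """Recover the two node locations from a packed reference."""
    return ref >> bottom_bits, ref & ((1 << bottom_bits) - 1)
```
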
Memory 300 may be updated periodically as new re-entry points for rays are generated. Memory 300 may have a sufficient number of cache lines to store all ray identifiers and re-entry points for the rays to be tested by the ray tracing system. Alternatively, memory 300 may have a limited number of cache lines, such that the number of ray identifiers for all rays to be tested by the system exceeds the number of cache lines in the memory. In this latter example, existing entries in memory may need to be overwritten at some point to make room for new ray identifiers and indications of re-entry points generated during intersection testing. Such overwriting can be performed in a variety of ways. In a first example, for a direct-mapped cache, the single cache line corresponding to the new ray identifier is overwritten, because there is only one way per set. In a second example, a "walk" eviction policy may be implemented for set-associative or fully associative caches. In this example, after each new commit (initialization or replacement), the eviction iterator "walks" through all the ways in the set corresponding to the new ray identifier before any way is replaced again. This ensures that each entry in the cache lines of the memory has the same lifetime. More specifically, each entry in a cache line has a lifetime of w commits, where w is the number of ways per set in the cache (e.g., equal to the cache size for a fully associative cache). The lifetime is independent of any subsequent lookup of or update to the cache line (i.e., a read operation addressing an entry in the cache line). The walk eviction policy is advantageous because it is computationally simple to implement.
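The walk policy for one cache set can be sketched as follows (a minimal Python sketch; the class interface is illustrative):

```python
class WalkEvictionSet:
    """Sketch of the 'walk' eviction policy for one cache set: an iterator
    walks the ways in order, so each entry survives exactly w commits."""

    def __init__(self, num_ways: int):
        self.ways = [None] * num_ways
        self.next_way = 0

    def commit(self, ray_id, reentry_point) -> None:
        # Overwrite the way the iterator points at, then advance it.
        self.ways[self.next_way] = (ray_id, reentry_point)
        self.next_way = (self.next_way + 1) % len(self.ways)

    def lookup(self, ray_id):
        # Lookups do not move the iterator: an entry's lifetime is
        # independent of reads, as described above.
        for entry in self.ways:
            if entry is not None and entry[0] == ray_id:
                return entry[1]
        return None
```

With two ways, a third commit overwrites the first entry regardless of how often it has been read, which is what gives every entry the same lifetime.
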
In a third example, for set-associative or fully associative caches, the memory may use a "least recently used" eviction policy. In this policy, a tree of way indices is maintained that partially orders the ways of each set in the cache based on cache commits and/or lookups. For the set corresponding to a new ray identifier, each new ray identifier submitted to the cache may replace, or initialize (when the set does not yet contain valid data for the identifier), the contents of the way with the minimum way index encoded by the tree. The ordering of ray identifiers in the tree changes depending on when cache entries are committed and/or looked up. For example, if a cache line is accessed by a write or read operation, it is reordered towards the top of the tree, ahead of ways that have been accessed less recently. The least-recently-used eviction policy has the advantage that it is more accurate in selecting which entries to overwrite, as it takes into account how recently different memory entries have been used.
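The behaviour of one LRU set can be sketched as follows. This is a simplification: the tree of way indices described above is replaced by an ordered dictionary that tracks recency directly, which yields the same replacement decisions for a small set:

```python
from collections import OrderedDict

class LruSet:
    """Sketch of a least-recently-used set: both commits and lookups move an
    entry to the most-recently-used position; the LRU way is replaced."""

    def __init__(self, num_ways: int):
        self.num_ways = num_ways
        self.entries = OrderedDict()   # ray_id -> re-entry point

    def commit(self, ray_id, reentry_point) -> None:
        if ray_id in self.entries:
            self.entries.move_to_end(ray_id)
        elif len(self.entries) == self.num_ways:
            self.entries.popitem(last=False)   # evict least recently used
        self.entries[ray_id] = reentry_point

    def lookup(self, ray_id):
        if ray_id not in self.entries:
            return None
        self.entries.move_to_end(ray_id)       # a read also refreshes recency
        return self.entries[ray_id]
```

Unlike the walk policy, a read here extends an entry's lifetime, which is what makes the policy more accurate at the cost of extra bookkeeping.
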
The ray identifier may indicate ray data. The ray identifier may not include exactly the same data as the ray. As described above, the ray representations may be compressed or generalized representations of ray data. The compressed ray identifier may represent a plurality of rays. The plurality of rays identified by the compressed ray identifier may be similar rays. Where the ray identifier is a compressed representation of ray data, the ray identifier may be referred to as a ray representation. Generating a ray representation from the raw ray data is shown in step S504 of fig. 5A and step S526 of fig. 5B. As shown in step S506 of fig. 5A and step S528 of fig. 5B, the ray representation may be further transformed into a quantized ray representation. The steps of transforming the raw ray data into a ray representation are shown in fig. 9. This method may be performed by processing logic of ray tracing system 100.
The raw ray data, or uncompressed data, of a ray may include three direction components and three position components. That is, the ray may extend through three-dimensional space. The ray may have a first component extending along the x-axis, a second component extending along the y-axis, and a third component extending along the z-axis. In an alternative example, the ray data may include two direction components and two position components; in other words, the ray may extend through a two-dimensional space. In further examples, the ray may have more than three direction components and more than three position components (e.g., four position components and four direction components).
An exemplary illustration of an uncompressed ray is shown in fig. 6. Ray 602 is shown in two dimensions. The two-dimensional space of fig. 6 has a y-axis extending in the vertical direction and an x-axis extending in the horizontal direction. As can be seen in fig. 6, ray 602 extends in only two dimensions. This may be the case; however, in alternative examples, a ray may extend in additional directions. For example, the ray may extend in a third direction. In an example where the ray extends in a third direction, the space of fig. 6 may be a three-dimensional space. That is, the space may have a z-axis that extends in a direction perpendicular to both the x-axis and the y-axis (i.e., perpendicular to the plane of the page, either into or out of it). In this latter example, the space in fig. 6 is shown with only two axes for simplicity.
Each of the position components of the ray is used to represent the origin of the ray. More specifically, each position component of a ray is a coordinate of the origin of the ray. The origin of a ray may be considered the point from which the ray originates or begins. In fig. 6, the origin of ray 602 is shown by reference numeral 604. For example, if the ray is a three-dimensional ray, the origin of the ray is identified by three position components: the x-component (P_x, representing the position of the origin along the x-axis), the y-component (P_y, representing the position of the origin along the y-axis) and the z-component (P_z, representing the position of the origin along the z-axis). Each of the direction components of the ray is a vector component representing the amount by which the ray extends from the origin in that direction or in the opposite direction. In other words, each direction component of the ray consists of a magnitude, representing the extent of the ray along that axis, and a sign, representing whether the ray extends in the positive or negative direction. The direction of the ray is shown by the arrow extending from ray 602. For example, if the ray is a three-dimensional ray, the ray has three direction components: the x-component (D_x, representing the magnitude and sign of the ray along the x-axis), the y-component (D_y, representing the magnitude and sign of the ray along the y-axis) and the z-component (D_z, representing the magnitude and sign of the ray along the z-axis). The ray may also have culling distances t_min and t_max, indicating the minimum and maximum culling distances of the ray, respectively. In fig. 6, the minimum culling distance of the ray is denoted by reference numeral 606, and the maximum culling distance is denoted by reference numeral 608. The intersection determination for the ray may be based on whether the distance at which the ray intersects lies between the minimum and maximum culling distances of the ray. This test may be inclusive or exclusive at either or both end points of the culling range.
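The inclusive/exclusive choice can be expressed as a small predicate (a sketch; the function and parameter names are illustrative):

```python
def within_cull_range(t: float, t_min: float, t_max: float,
                      min_inclusive: bool = True, max_inclusive: bool = True) -> bool:
    """Accept an intersection at parametric distance t only if it lies
    between the culling distances. Whether each end point counts as inside
    is a per-implementation choice, exposed here as flags."""
    lo_ok = t >= t_min if min_inclusive else t > t_min
    hi_ok = t <= t_max if max_inclusive else t < t_max
    return lo_ok and hi_ok
```
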
All position and direction components of a ray may be used to derive an identifier of the ray. For example, where the ray is a three-dimensional ray, the ray may be represented by each of its three direction and three position components. The ray may also be represented using its t_min and t_max values. However, it may be beneficial to identify the ray using a (compressed) ray identifier that is equal to, or quantized from, the ray representation. One reason is that fewer storage resources are required to store compressed data in a memory such as memory 300. Another reason is that a (compressed) ray identifier may be used to identify multiple rays, grouping the rays together. This means that similar or even equivalent rays, or rays having similar or even equivalent ray data, may be identified by a common identifier. Using the same identifier to identify similar rays means that those rays can be processed similarly. More specifically, similar rays identified by the same ray identifier may be treated identically.
A method for compressing ray data to generate a ray representation is shown in fig. 9. This method may correspond to step S504 of fig. 5A and/or step S526 of fig. 5B. The method starts at step S902, where the direction component of the ray data having the greatest magnitude is identified. In other words, the direction component of the ray with the greatest absolute value is identified. The direction component of the ray having the greatest magnitude extends along one of the axes of the space through which the ray travels. Where the ray is a three-dimensional ray, the direction component of the ray having the greatest magnitude extends along one of the x-axis, y-axis, and z-axis (i.e., is one of D_x, D_y, and D_z). If two or three components share the same greatest magnitude, a tie-breaking rule must be used to guarantee a deterministic selection.
After determining the direction component of the ray data having the greatest magnitude, at step S904 the axis of the identified direction component is defined as the long axis of the ray. Thus, the long axis of a ray is defined as the axis along which the direction component of the ray has the greatest magnitude. For example, if the direction component of the ray data having the greatest magnitude is the x-axis component (D_x), the long axis of the ray will be the x-axis. Step S904 may be expressed mathematically as follows:
From the above representation, it can be seen that during the transformation of the raw ray data into a ray representation, the x, y and z axes of the ray are replaced with the u, v and w axes. More specifically, the short axes of the ray are mapped to the u-axis and the v-axis, and the long axis of the ray is mapped to the w-axis. Thus, in the above equation, d_x, d_y and d_z are the direction components of the raw ray data along the x-axis, y-axis, and z-axis, respectively, and d_u, d_v and d_w are the permuted values of the original direction components: d_w is the permuted direction component along the long axis, and d_u and d_v are the permuted direction components along the short axes.
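The axis selection and permutation of steps S902 and S904 can be sketched as follows (a minimal Python sketch; the tie-breaking priority and the assignment of the two short axes to u and v are assumptions, as the original equation figure is not reproduced here):

```python
def permute_to_long_axis(d):
    """Steps S902-S904: pick the direction component of greatest magnitude
    as the long (w) axis and permute the remaining axes onto u and v."""
    dx, dy, dz = d
    mags = [abs(dx), abs(dy), abs(dz)]
    long_axis = mags.index(max(mags))   # ties resolved toward the lower index
    if long_axis == 0:      # x is long: (u, v, w) = (y, z, x)
        du, dv, dw = dy, dz, dx
    elif long_axis == 1:    # y is long: (u, v, w) = (z, x, y)
        du, dv, dw = dz, dx, dy
    else:                   # z is long: (u, v, w) = (x, y, z)
        du, dv, dw = dx, dy, dz
    return long_axis, (du, dv, dw)
```
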
At step S906, after the long axis of the ray has been determined, a transition position is determined at which the position component of the ray along the long axis is 0. As described above, the position components of a ray represent the origin of the ray. Therefore, at step S906, the ray data is translated such that the value of the ray's position component along the long axis becomes 0. The values of the position components along the axes other than the long axis are transformed in accordance with the transformation of the long-axis position component. An advantage of reducing the ray's position component along the long axis to 0 is that, if the ray data of every ray is adjusted in this way, the value of one position coordinate of each ray can be assumed to be 0. The position component of each ray along the long axis can therefore be removed from the ray data, which compresses it. Step S906 may be illustrated mathematically as follows:
In the above equation, p_x, p_y and p_z are the position components of the raw ray data along the x-axis, y-axis, and z-axis, respectively. p_u, p_v and p_w are the permuted values of the original position components, where p_w is the permuted position component along the long axis and p_u and p_v are the permuted position components along the short axes. P_u and P_v are the transition position components of the ray along the short axes. The value E represents the scene range, which provides a bounding size for the scene geometry. In other words, E is a real number such that the cube [-E, E]^3 contains the entire scene. In some implementations, E can be a power of 2.
At step S908, the three direction components of the ray are rescaled such that the value of the direction component along the long axis is 1. This is possible because, for a valid ray direction, the component along the long axis is guaranteed to be non-zero. In other words, the magnitude of each of the ray's direction components is normalized so that it lies between 0 and 1 (inclusive). The greatest direction-component magnitude of the compressed ray is the component along the long axis, i.e., 1. An advantage of rescaling the ray's long-axis direction component to 1 is that, if the ray data of every ray is adjusted in this way, the value of one direction coordinate of each ray can be assumed to be 1. The direction component of each ray along the long axis can therefore be removed from the ray data, which compresses it. This compression is essentially lossless with respect to the position and direction of the ray: the mathematical operations are reversible (i.e., bijective) over the infinite lines associated with those rays, although some loss may be introduced by the limited precision/accuracy of floating-point arithmetic operations (e.g., addition, subtraction, multiplication, division, etc.). Step S908 may be illustrated mathematically as follows:
D_u and D_v are the rescaled direction components of the ray along the short axes. Although step S908 is shown in fig. 9 as occurring after step S906, in alternative examples step S906 may be performed after step S908 or concurrently with it. Performing step S908 before step S906 may be computationally less intensive. The resulting ray representation generated by the processing logic includes (i) the two position components of the transition position of the ray along the axes other than the long axis, and (ii) the two rescaled direction components of the ray along the axes other than the long axis. The ray representation includes neither a position component nor a direction component along the long axis of the ray. The axes that are not the long axis may otherwise be referred to as the short axes. The ray representation may also include a value indicating the long axis of the ray, and may further include the sign of the direction component along the long axis.
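Steps S906 and S908 can be sketched together as follows (a minimal Python sketch assuming the ray has already been permuted into (u, v, w) axes with w the long axis; it omits the scene-range (E) scaling and any sign handling, and the interface is illustrative):

```python
def compress_ray(p, d):
    """Slide the origin along the ray until p_w = 0 (step S906), then
    rescale the direction so d_w = 1 (step S908). The compressed
    representation keeps only the four remaining components."""
    pu, pv, pw = p
    du, dv, dw = d
    assert dw != 0.0, "a valid ray has a non-zero long-axis direction component"
    # S906: parametric distance to the point where the long-axis position is 0.
    t = -pw / dw
    Pu = pu + t * du
    Pv = pv + t * dv
    # S908: rescale so the long-axis direction component becomes 1.
    Du = du / dw
    Dv = dv / dw
    return (Pu, Pv), (Du, Dv)   # p_w = 0 and d_w = 1 are implicit
```

Because w is the long axis, |D_u| and |D_v| never exceed 1 after the rescale.
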
An example of a ray that has been compressed using the method of fig. 9 is shown by reference numeral 610 in fig. 6. Compressed ray 610 in fig. 6 is a compressed version of ray 602. As shown in fig. 6, the long axis of ray 602 is the y-axis. That is, the direction component of ray 602 having the greatest magnitude is the component extending along the y-axis. The z-axis is not shown in fig. 6; however, for the purposes of this example, it is assumed that the ray extends to a lesser extent along the z-axis than along the y-axis. To compress the data of ray 602, the direction component of the ray data having the greatest magnitude is first confirmed to be the y-component. The y-axis is then defined as the long axis of the ray. Once the y-axis has been defined as the long axis, the transition position on the ray where the position component along the y-axis is 0 can be determined. This is illustrated in fig. 6 by the origin 612 of the ray projected onto the x-axis. Were the space of fig. 6 to include three dimensions, ray origin 612 would instead be projected onto the x-z plane. By projecting ray origin 612 onto the x-z plane, the ray's position component in the y-direction (P_w) is set to 0. The remaining position components of the ray in the x and z directions are transformed into (P_u, P_v) to account for the projection of the ray onto the axial plane.
The direction components of the ray are also rescaled so that the new direction component along the y-axis (D_w) has a value of 1.
Rescaling the ray means that the length of the ray along the y-axis is reduced to 1. This may be achieved by dividing the direction component by itself. The remaining direction components of the ray are rescaled according to the long-axis component, such that the new direction components (D_u, D_v) each have a magnitude of no more than 1. Each of the direction components may be rescaled by dividing its value by the value of the (unscaled) direction component along the long axis.
The resulting compressed ray 610 fits within a square (or, in three-dimensional space, a cube) as shown in fig. 6. The tip of the rescaled direction of compressed ray 610 lies on the boundary of the box, which extends ±1 unit in each direction around the projected position of the compressed ray. As already mentioned above, compressed ray 610 has a position component (P_w) with a value of 0 and a direction component (D_w) with a value of 1. If a ray representation is generated according to the above method for every ray processed by the ray tracing system, there is no need to store the values of these components. This means that the amount of data that must be stored to identify the ray is reduced, and thus the memory resources required to store the ray data are reduced. It also means that equivalent rays (i.e., rays that are part of the same infinite line through the scene) map onto a single ray representation.
It has been mentioned that a compressed ray may otherwise be referred to as a ray representation. As described above, the uncompressed ray may be a three-dimensional ray; that is, the raw data of the ray may include three position components and three direction components. A ray representation compressed according to the method of fig. 9 comprises two fewer components than the uncompressed ray: one fewer direction component and one fewer position component. This means that the ray representation comprises exactly two direction components and two position components. In other examples, for example where the ray is a two-dimensional ray, further components of the ray may be compressed away such that the ray representation includes only one direction component, or only one position component. The uncompressed ray may include more than three position components and/or more than three direction components. In that case, the compressed ray may have more than two direction components and/or more than two position components, but will still have fewer position components and/or direction components than the uncompressed ray.
In addition to the two direction components and the two position components, the compressed ray (or ray representation) may also include an indication of the long axis of the ray. The indication of the long axis of the ray may comprise at least two bits (e.g., two or three bits). In other words, the compressed ray may include data identifying the axis along which the direction component of the ray has the greatest magnitude. An advantage of storing an indication of the long axis of the ray is that the indication can be used to decompress the ray data. A further advantage is that the indication can be used as part of the ray identifier and as an input to a hash function (outlined below) that generates a set index for a memory storing ray data, in order to distinguish dissimilar rays having different long axes. The indication of the long axis may also be used to further identify the ray, since it distinguishes rays whose principal component extends along the x-axis from rays whose principal component extends along another axis.
The compressed ray data may be stored using a predetermined number of bits. Using a predetermined number of bits to store the ray data means that the data can be stored uniformly within a memory resource such as the memory of fig. 3. Each of the position and direction components of the raw ray data may consist of a predetermined number of bits. For example, each position and direction component of the raw ray data may comprise 32 bits, so that the raw data of the ray comprises 192 bits in total. By reducing the number of components included in the compressed ray data by two, the number of bits of the ray identifier (or ray representation) may be reduced by 64 bits. Where the compressed ray data includes an indication of the long axis, the indication may comprise two bits. A two-bit indication of the long axis is advantageous because a two-bit field allows four distinct combinations of bit values. Thus, where the ray is a three-dimensional or four-dimensional ray, a distinct identifier value for each axis along which the ray may extend can be covered by a two-bit indicator. Where the ray is a two-dimensional ray, the identifier value for each axis can be covered by a single-bit indicator. The compressed ray data may comprise 130 bits in total (four 32-bit direction/position components and a two-bit indication of the long axis).
In addition to a magnitude, each of the position and direction components of the ray may carry a positive or negative sign. The sign of each position or direction component indicates the direction in which that component extends. For example, in fig. 6, ray 602 travels in the positive direction (left to right) along the x-axis, so the direction component of the ray along the x-axis carries a positive sign. Ray 602 travels in the negative direction (top to bottom) along the y-axis, so the direction component of the ray along the y-axis carries a negative sign. In an example, the sign carried by the direction component of the ray along the long axis may be removed. This is illustrated in fig. 6, where the direction of compressed ray 610 is opposite to the direction of uncompressed ray 602, such that the compressed ray extends in the positive direction (bottom to top) along the y-axis rather than the negative direction. The direction of the ray along each of the short axes (the x-axis and z-axis in the case of fig. 6) is also reversed. This ensures that the line represented by the direction of the ray is unchanged when the sign of the long-axis direction is removed. An advantage of removing the long-axis sign from the ray data is that it further reduces the amount of data in the ray representation, and hence the memory resources required to store ray representations in a memory such as memory 300. At the same time, omitting the sign indicating the positive/negative direction of the ray along the long axis may have little effect on the efficiency of the result, as it does not affect the infinite undirected line containing the ray. Where the sign is removed, the direction of the ray along the long axis is assumed to be positive. Thus, in step S908 of fig.
9, the rescaling of the three direction components may be such that the value of the direction component along the long axis is +1. This method step is particularly relevant to the data stored in the memory of fig. 3, since it has been determined empirically that this sign bit has no significant effect on the efficiency of that memory, especially when there is reasonable coherence in the input ray data.
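The sign removal described above can be sketched as follows (a Python sketch under the assumption that the transition position, being a point on the unchanged line, is unaffected by the flip; only the short-axis direction components are negated):

```python
def remove_long_axis_sign(P, D, d_w_sign: float):
    """If the long-axis direction is negative, flip the ray so that d_w
    becomes +1. Flipping the short-axis direction components keeps the
    represented (undirected) line unchanged."""
    if d_w_sign < 0:
        return P, (-D[0], -D[1])
    return P, D
```
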
In addition to the direction and position components, the compressed ray data may also include minimum and maximum distance components representing a non-empty interval. That is, the compressed ray data may also include an indication of the ray's t_min and t_max, which together represent the distance range of the ray; t_min and t_max may therefore be referred to as distance range components. Where the compressed ray data includes an indication of t_min and t_max, the compression method may further include rescaling and translating the minimum and maximum distance components of the ray, generating the values T_0 and T_1 based on the ray's transition position and rescaled direction, respectively. Rescaling and translating the minimum and maximum distance range components of the ray can be expressed mathematically as follows:
T_0 = d_w * t_min + p_w
T_1 = d_w * t_max + p_w
where T_0 is the compressed value of t_min and T_1 is the compressed value of t_max.
More specifically, the minimum and maximum distance components of the ray may be rescaled and translated by determining the correct number of ray lengths from the new ray position to the old ray positions indicated by t_min and t_max. That is, referring to fig. 6, by determining the correct number of ray lengths from origin 612 to t_min 606, the original value of t_min may be rescaled and translated into the value T_0 of compressed ray 610. By determining the correct number of ray lengths from origin 612 to t_max 608, t_max may be rescaled and translated into the value T_1 of compressed ray 610. The correct number of ray lengths is computed relative to the new (rescaled) direction magnitudes (D_u, D_v, D_w) of the compressed ray data, where the value of D_w is 1. As already mentioned above, t_min is the minimum value of t on the untransformed ray, and t_max is the maximum value of t on the untransformed ray. If t_min and/or t_max is negative, t_min is not necessarily closer to the ray origin than t_max. That is, the "closest" intersection of a ray with a primitive may be the intersection with the t-value closest to minus infinity, and not necessarily the intersection with the t-value closest to 0 (i.e., the intersection closest to the origin). If the sign of the direction component of the ray along the long axis is negative, the transformed values of t_min and t_max (i.e., T_0 and T_1) may have their signs flipped during rescaling and translation. For example, where the negative long-axis direction component of the ray is inverted to a positive one (i.e., the negative sign is removed), the signs of the transformed values T_0 and T_1 are also inverted. This can be seen in fig. 6, where the direction of compressed ray 610 along the long (y) axis has been reversed. T_0 and T_1 may therefore be interchanged to ensure that T_0 is less than T_1.
If the ray data is to be subsequently decompressed from the representation (i.e., the original values of t_min and t_max are to be recovered from the compressed values T_0 and T_1), T_0 and T_1 should not be interchanged unless the sign of the long-axis component of the ray is stored separately in the compressed representation. If the ray data is not to be decompressed, the order of T_0 and T_1 may be interchanged during computation of the ray representation to ensure that they represent a non-empty interval. In fig. 6, because the compressed ray travels in the opposite direction, the minimum distance component of the ray is now the component previously labelled t_max 608, which has the value T_1 in the compressed ray representation, and the maximum distance component of the ray is now the component previously labelled t_min 606, which has the value T_0 in the compressed ray representation.
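The rescaling and translation of the culling distances, including the swap for a negative long-axis direction, can be sketched as follows (a Python sketch; it assumes the long-axis sign is not stored, so the values are swapped to keep the interval non-empty):

```python
def compress_cull_distances(t_min: float, t_max: float, d_w: float, p_w: float):
    """T_0 = d_w * t_min + p_w and T_1 = d_w * t_max + p_w, using the
    permuted (but unrescaled) long-axis direction d_w and position p_w.
    A negative d_w reverses the interval, so the values are swapped to
    ensure T_0 < T_1."""
    T0 = d_w * t_min + p_w
    T1 = d_w * t_max + p_w
    if d_w < 0:
        T0, T1 = T1, T0
    return T0, T1
```
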
Where the uncompressed ray data represents an eight-dimensional ray (i.e., a ray having three position components, three direction components, and two distance-range components), the eight-dimensional ray may comprise uncompressed data with a bit width of 256 bits. By removing one direction component and one position component from the ray data, the ray can be reduced to 194 bits (six 32-bit components plus a two-bit representation of the long axis), i.e., a saving of 62 bits.
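The compression steps described so far (selecting the long axis, sliding the origin to the transition position, rescaling the direction, and re-expressing the range components) can be sketched as follows. This is a minimal illustration, not the reference implementation: the function name and the tuple layout are assumptions, and the T_0/T_1 swap at the end corresponds to the case where the order is allowed to be interchanged.

```python
def compress_ray(origin, direction, t_min, t_max):
    """Compress a ray (3 position + 3 direction + 2 range components) into
    2 position, 2 direction and 2 range components plus a long-axis index."""
    # 1. The long axis is the direction component with the largest magnitude.
    maj = max(range(3), key=lambda i: abs(direction[i]))
    d_maj = direction[maj]

    # 2. Rescale the direction so its long-axis component becomes 1.
    scale = 1.0 / d_maj
    dir2 = [direction[i] * scale for i in range(3) if i != maj]

    # 3. Slide the origin along the ray to the "transition position", where
    #    the long-axis position component is zero.
    t0 = -origin[maj] / d_maj
    pos2 = [origin[i] + t0 * direction[i] for i in range(3) if i != maj]

    # 4. Re-express t_min/t_max as distances from the new origin, measured
    #    in units of the rescaled direction (whose long-axis component is 1).
    T0 = (t_min - t0) * d_maj
    T1 = (t_max - t0) * d_maj
    if T0 > T1:          # a negative d_maj flips the ordering
        T0, T1 = T1, T0
    return maj, pos2, dir2, T0, T1
```

For example, a ray with origin (1, 2, 3) and direction (0.5, 2.0, 1.0) has the y-axis as its long axis; the point P + t·D equals P' + T·D' with T = (t - t0)·d_maj, so the two parametrizations describe the same set of points.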
To further compress the ray data, the ray representation may be converted into a quantized ray identifier. Converting the ray representation to a quantized ray identifier or quantized ray representation may correspond to step S506 in FIG. 5A and/or step S528 in FIG. 5B. The quantized ray identifier may be, or may be contained within, a data packet that includes data indicative of various characteristics of the ray. For example, the quantized ray identifier may include data indicating the long axis of the ray and/or the two position components of the transition position and/or the two rescaled direction components and/or the two rescaled and translated distance-range components. Thus, the quantized ray identifier may be used to identify the most important characteristics of the ray.
The quantized ray identifier may have a fixed bit width; in other words, it may have a predetermined size or be formed of a predetermined number of bits. An advantage of quantized ray identifiers of fixed bit width is that each identifier, together with a fixed-bit-width indication of the re-entry point, can fit in the same cache line of memory 300. In one example, the quantized ray identifier has a bit width of 18 bits, so an entry in the cache line (comprising the ray identifier and an indication of the re-entry point) may fit in 64 bits (18 bits for the ray identifier and 46 bits for the re-entry point), i.e., 8 bytes. A memory bank comprising 256 sets, each of 2 ways of 8 bytes, uses memory resources equivalent to those of a small L1 cache or a large L0 cache. In an example where the unquantized ray representation has a bit width of 130 bits, the bit width of the data may be significantly reduced by quantization to form the quantized ray representation. In another example, the quantized ray identifier has a bit width of 128 bits, comprising 15 bits for each quantized direction component and 24 bits for each position/range component, providing a 50% compression ratio relative to the original ray data. To form the quantized ray representation, each component of the ray data may be reduced to a predetermined number of bits. In a first approach, this may be done by first converting the floating-point minor direction components into a fixed-point format before quantizing them. This may be accomplished by offsetting each floating-point minor direction component of the ray data by a value of 3. Each minor direction component of the compressed ray is known to lie in the closed interval [-1, 1]; shifting each of these components up by 3 places them in the interval [2, 4].
Clamping to the half-open interval [2, 4) ensures that all such floating-point values of the ray data have equal exponents, and are therefore defined purely by their mantissas. Thus, the required number of most significant mantissa bits of each component may be taken as the quantized fixed-point value of that component. In a second approach, where the values of the components of the quantized ray identifier are already fixed-point values, the required number of most significant bits may be taken from the component values as the quantized values of the components. In the first approach, the minor position/range components can also be brought into the interval [-1, 1] by rescaling them by (a multiple of) the scene range size E, so that they can be treated identically to the minor direction components.
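The "+3 shift" of the first approach can be sketched concretely for IEEE 754 binary32 values. Every float in [2, 4) has the same exponent field (biased value 128), so the value is carried entirely by the 23 mantissa bits, and the top few mantissa bits form the fixed-point code. The function names and the clamping convention below are illustrative assumptions.

```python
import struct

def quantize_component(x, bits):
    """Quantize x in [-1, 1] to `bits` bits by shifting it into [2, 4),
    where every float32 shares one exponent, then keeping the most
    significant mantissa bits."""
    shifted = x + 3.0
    raw = struct.unpack('<I', struct.pack('<f', shifted))[0]
    if raw >= (129 << 23):           # x == 1.0 maps to 4.0; clamp into [2, 4)
        return (1 << bits) - 1
    mantissa = raw & 0x7FFFFF        # 23 explicit mantissa bits
    return mantissa >> (23 - bits)   # keep the top `bits` bits

def dequantize_component(q, bits):
    """Approximate inverse: rebuild a float in [2, 4) and undo the shift."""
    mantissa = q << (23 - bits)
    raw = (128 << 23) | mantissa     # exponent field for the interval [2, 4)
    return struct.unpack('<f', struct.pack('<I', raw))[0] - 3.0
```

For instance, 0.0 shifts to 3.0 (mantissa fraction 0.5), so its top 5 mantissa bits give the mid-range code 16, and -1.0 shifts to 2.0, giving code 0.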
In a third approach, each component of the ray data may be reduced to a predetermined number of bits to form the quantized ray representation while remaining in a floating-point format. Quantizing the position and range components of the ray identifier to a shorter floating-point format may include reducing the number of (a) exponent bits, (b) mantissa bits, or (c) both exponent and mantissa bits of each component. The number of exponent bits in each direction/position/range component may be reduced to 0, which effectively provides a fixed-point representation of the components. Alternatively, the number of mantissa bits in each direction/position/range component may be reduced to 0. This is particularly useful where the full range of the floating-point format is required (e.g., because the scene range cannot be retrieved) but a minimal number of bits per component is to be used.
A first example of a quantized ray identifier 700A is shown in FIG. 7A. The quantized ray identifier may correspond to the unquantized but compressed ray 610 of FIG. 6. The quantized ray identifier may be a data packet comprising a plurality of portions; in FIG. 7A, the quantized ray representation comprises five portions. The first portion 702 of the packet, denoted MAJ, may identify the long axis of the ray, and may be formed of two bits. The second portion 704 of the packet, denoted POSU, may identify a first position component (P_u). The third portion 706 of the packet, denoted DIRU, may identify a first direction component of the ray (D_u). The fourth portion 708 of the packet, denoted POSV, may identify a second position component (P_v). The fifth portion 710 of the packet, denoted DIRV, may identify a second direction component (D_v). Each of the position components may have a bit width of five bits; in other words, the data packet of the quantized ray identifier may include no more than five bits to indicate each of the two position components of the transition position along the axes other than the long axis (i.e., the minor axes). Each of the direction components may have a bit width of no more than three bits; in other words, the data packet may include no more than three bits to indicate each of the two rescaled direction components along the minor axes. Thus, the bit width of each position and direction component can be significantly reduced from 32 bits per component to 5 bits per component or fewer. In another example, the quantized ray identifier may include 15 bits for each of its direction components and 24 bits for each of its position components. Limiting the bit widths of the position and direction components in this way has the advantage that the memory resources required for storing quantized ray identifiers can be minimized.
As described above, the data packet of the quantized ray identifier may have a total bit width of 18 bits. A further benefit of position and direction components with limited bit widths is that the resulting compression of the ray data enables similar rays to be grouped together under a single ray identifier. The ordering of the portions within the packet may differ from the ordering shown in FIG. 7A.
A second example of a quantized ray identifier 700B is shown in fig. 7B. The ray identifier 700B is identical to the identifier 700A, except that the packet includes two additional portions. That is, the ray identifier 700B may be a data packet including a total of seven portions. The sixth portion 712 of the packet, denoted RNG0, may identify a first (minimum) distance range component of the ray. The seventh portion 714 of the packet, denoted RNG1, may identify a second (maximum) distance range component of the ray. Like the location components POSU and POSV, each of the distance range components in the quantized ray identifier 700B may include 24 bits.
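As a concrete illustration of the 18-bit packet of FIG. 7A (2-bit MAJ, 5-bit POSU/POSV, 3-bit DIRU/DIRV), the fields can be packed into a single integer. The field ordering and helper names below are assumptions for illustration; as noted above, the ordering of the portions within the packet may vary.

```python
def pack_ray_id(maj, pos_u, dir_u, pos_v, dir_v):
    """Pack the five portions of the FIG. 7A identifier into 18 bits."""
    assert maj < 4 and pos_u < 32 and pos_v < 32 and dir_u < 8 and dir_v < 8
    return (maj << 16) | (pos_u << 11) | (dir_u << 8) | (pos_v << 3) | dir_v

def unpack_ray_id(rid):
    """Recover (MAJ, POSU, DIRU, POSV, DIRV) from the 18-bit packet."""
    return ((rid >> 16) & 0x3, (rid >> 11) & 0x1F, (rid >> 8) & 0x7,
            (rid >> 3) & 0x1F, rid & 0x7)
```

With the 46-bit re-entry-point indication mentioned above, such a packed identifier fills an 8-byte cache-line entry.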
The method of FIG. 9 may further include generating a hash result of the quantized ray identifier representing the unquantized/quantized ray representation. The result of the hash may also be referred to as a set index, which may be used to identify where matching data is stored in a cache, such as a direct-mapped cache. The set index may be used to identify an entry stored in the memory on which a read or write operation may be performed. The hash result has a smaller bit width than the quantized ray identifier from which it was generated, such that the hash is a many-to-one mapping. In one example, where the quantized ray identifier has a bit width of 18 bits, the hash may be a mapping from 18 bits to 8 bits; that is, the hash result may comprise 8 bits. An 8-bit set index for referencing the memory allows a smaller number of sets, and hence a smaller memory, e.g., a smaller cache.
FIG. 8 illustrates a hashing method for generating a hash result of a quantized ray identifier. The hashing method comprises mixing and merging the components of the quantized ray identifier to form a set index equal to the hashed ray identifier. Mixing and merging the components of the quantized ray identifier may also be described as combining the ray identifier components using a bitwise binary operation. Combining the components may be accomplished by performing logical exclusive-or operations on the bits of the quantized ray identifier to reduce the number of bits of the identifier. The logical exclusive-or is a binary operation, and is well suited to generating a hashed ray identifier because any change in an input bit is reflected in a change in an output bit. The logical exclusive-or operates on the individual bits of the quantized ray identifier, i.e., it is a bitwise operation.
The quantized ray representation includes a first bit 802 of the long axis, a second bit 804 of the long axis, one or more bits 806/810 of a first position component of the ray (POSU[0] and POSU[4:1]), one or more bits 812 of a second position component of the ray (POSV), one or more bits 808 of a first direction component of the ray (DIRU), and one or more bits 814 of a second direction component of the ray (DIRV). The bits of the first position component may be separated into the zeroth bit 806 of the first position component and the remaining bits 810 of the first position component. The bitwise exclusive-or operation combines the first bit 802 of the long axis with the least significant bit 806 of the first position component in a first sub-operation. Meanwhile, in a second sub-operation, the second bit 804 of the long axis is combined with the most significant bit of the remaining bits 810 of the first position component. In an alternative example shown in FIG. 8, the first bit of the long axis (i.e., M0) may be combined with the most significant bit of the remaining bits of the first position component in the second sub-operation; in this alternative example, the second bit of the long axis (i.e., M1) is combined with the least significant bit of the first position component in the first sub-operation. The purpose of the first two sub-operations is to combine the long-axis bits 802 and 804 with the most significant bits of the position coordinates POSU and POSV of the ray identifier, respectively, which becomes clear in the third sub-operation: the results of the first and second combinations are combined with the bits representing the second position component and the second direction component. The resulting bits from this further operation are combined (e.g., bitwise exclusive-or) together to generate the set index, i.e., the hashed ray identifier.
The purpose of the third sub-operation is to maximize the bit coverage of the two direction components by minimizing their mutual overlap. This maximizes the change in the set index given a change of any size in the direction of the quantized ray. Similarly, the bit coverage of the two position components is maximized by minimizing their mutual overlap, thereby maximizing the change in the set index given a change of any size in the quantized ray position. At the same time, the mutual overlap of the least significant bits of the ray identifier components is minimized, so as to maximize the change in the set index given a small change in any subset of the ray identifier components (i.e., the components quantizing both ray direction and position). The result of the third sub-operation is the hashed ray identifier 816.
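An XOR-folding hash in this spirit can be sketched as below for the 18-bit identifier layout assumed earlier. This is not the exact wiring of FIG. 8: the particular shifts are assumptions, chosen only to illustrate the stated goals of folding the long-axis bits into the high position bits and keeping the overlap between the position pair and the direction pair small.

```python
def hash_ray_id(rid):
    """Fold an 18-bit quantized ray identifier down to an 8-bit set index
    using bitwise exclusive-or (a many-to-one mapping)."""
    maj   = (rid >> 16) & 0x3
    pos_u = (rid >> 11) & 0x1F
    dir_u = (rid >> 8) & 0x7
    pos_v = (rid >> 3) & 0x1F
    dir_v = rid & 0x7
    # Fold the two long-axis bits into high position bits, then XOR the
    # position/direction pairs at offsets that spread their least
    # significant bits across the 8-bit index.
    h = (pos_u ^ (maj << 3)) | (dir_u << 5)
    h ^= (pos_v << 3) | dir_v
    return h & 0xFF
```

Because the mapping is 18-to-8 bits, distinct identifiers can collide; the design goal is only that small changes in any component perturb the set index.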
The validity of combining the ray identifier components by the bitwise exclusive-or operation described above depends on the following assumptions:
coherent rays will typically have nearly the same position but different directions;
coherent rays will typically have nearly the same direction but different positions;
coherent rays will typically have similar but different directions and positions;
the least significant bits of the ray identifier component will change more frequently than the most significant bits of the ray identifier component; and
given a (non-uniform) random variation of the input bits, as determined by the instantaneous distribution of the rays, a uniform random variation of the output bits is achieved, resulting in a hashed ray identifier with optimal utilization of the memory (e.g., cache) in terms of minimizing cache conflicts.
As described above, the ray representation may be used to store an identifier of a ray in a memory (such as the memory shown in FIG. 3). The memory may be a cache. The cache may store data for intersection testing used by the ray tracing system to render images of a scene. An advantage of using ray identifiers for a memory as shown in FIG. 3 is that the data loss associated with grouping the ray data of different but similar rays under the same ray identifier is minimized. More specifically, this is achieved by first converting the six-dimensional floating-point ray data of the ray into a four-dimensional floating-point ray representation, and second by generating a quantized ray representation from that ray representation. In other words, the ray representation generation techniques described herein may be used to represent multiple similar rays using the same identifier while ensuring that an accurate representation of each ray is maintained.
In some examples, it may be necessary to convert the compressed ray data back into uncompressed ray data; in other words, to convert the ray representation into the ray data of the ray. A method for decompressing compressed ray data is shown in FIG. 10. The data to be decompressed using the method shown in FIG. 10 may be data that has been compressed using the method shown in FIG. 9; alternatively, it may be data that has been compressed by other means. As already mentioned above, the compressed ray data is a compressed representation of the ray data and comprises: (i) two position components of the transition position of the ray, and (ii) two direction components of the ray. Where the data is to be decompressed, the data also includes an indication of the long axis of the ray and the sign of the third direction component of the ray. The data may also include the two distance-range components of the ray.
The method of FIG. 10 begins at step S1002, where a third position component of the ray is inserted into the ray data according to the indication of the long axis of the ray. The third position component of the ray corresponds to the position component that was removed during compression, and represents the position of the origin of the ray along the long axis. During ray compression, the origin of the ray was shifted such that the value of this third position component of the origin is 0. Therefore, the value of the position component added to the ray data during decompression is 0. The indication of the long axis of the ray indicates where within the ray data the new position component is inserted. For example, if the long axis is the x-axis, the new position component is inserted as the x-component of the position in the decompressed ray data. The two position components of the compressed ray data become the other two position components (e.g., the y and z position components) of the decompressed ray data.
At step S1004, a third direction component of the ray is inserted into the ray data according to the indication of the long axis of the ray. The third direction component of the ray corresponds to the direction component that was removed during compression, and represents the magnitude of the direction of the ray along the long axis. As described above, after compression the value of the third direction component is 1. Therefore, the value of the direction component added to the ray data during decompression is 1. The indication of the long axis of the ray indicates where the new direction component is inserted into the ray data. For example, if the long axis is the x-axis, the new direction component is inserted as the x-component of the direction in the decompressed ray data. The two direction components of the compressed ray data become the other two direction components (e.g., the y and z direction components) of the decompressed ray data.
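Steps S1002 and S1004 can be sketched as follows; the function name is an assumption, and the sign handling and bit padding described below are omitted for clarity.

```python
def decompress_ray(maj, pos2, dir2):
    """Rebuild a 3-component origin and direction from the two compressed
    position components and two compressed direction components: the
    long-axis position is 0 and the long-axis direction magnitude is 1
    by construction."""
    origin = list(pos2)
    origin.insert(maj, 0.0)      # S1002: the removed position component is 0
    direction = list(dir2)
    direction.insert(maj, 1.0)   # S1004: the removed direction component is 1
    return origin, direction
```

For example, a compressed ray with long axis y (maj = 1), position components (0.5, 2.0) and direction components (0.25, 0.5) decompresses to origin (0.5, 0, 2.0) and direction (0.25, 1, 0.5).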
The method of decompressing may further include adding additional bits to each of the two direction components and the two position components of the ray representation, wherein the additional bits are least significant bits. The additional bits are added to pad the bit widths of the position and direction components back to the widths of the original direction and position components. For example, if each compressed direction component has a bit width of 15 and each uncompressed direction component has a bit width of 32, 17 additional bits may be added to increase the number of bits in these components to 32. Similarly, if each compressed position component has a bit width of 24 and each uncompressed position component has a bit width of 32, 8 additional bits may be added to increase the number of bits in these components to 32. These additional "padding" bits may be appended to the whole component in a fixed-point format, or to the mantissa only in a floating-point format. The additional "padding" bits may be one of the following:
all 0s (the minimum representation of a group of rays);
a 0 followed by 1s (a middle representation of a group of rays);
a 1 followed by 0s (a middle representation of a group of rays);
all 1s (the maximum representation of a group of rays).
Where a floating-point component of the ray identifier was converted to a corresponding fixed-point component, this conversion must be undone during decompression to regenerate the floating-point component, either before or after the "padding" bits are appended. If the component of the original (uncompressed) ray data was already a fixed-point component, this is not necessary.
The ray representation may also include a minimum distance component and a maximum distance component. Where the ray representation includes these additional components, the method further comprises adding further bits to the minimum distance component (t_min) and the maximum distance component (t_max), wherein the further bits are least significant bits. Similarly to the above, additional bits are added to t_min and t_max so as to pad the bit widths of these components to their original widths. The additional "padding" bits may be any of the example bit combinations provided above.
The method may further comprise decoding the sign of the third direction component (i.e., the main direction component) of the ray data from the compressed values of t_min and t_max (i.e., T_0 and T_1). Each direction component of the ray must then be multiplied by this sign of the third direction component to undo the inversion of the ray direction performed during compression (i.e., the inversion applied if the sign was negative). In other words, after the sign is restored to the third direction component, it may be used to flip (e.g., exclusive-or) the signs of the remaining direction components. This method step may be performed on compressed data in which a negative sign of the main direction component was removed during compression. The method step may also be performed on compressed data that includes indications of t_min and t_max (i.e., T_0 and T_1). Where T_0 and T_1 are stored for the compressed ray data, an analysis is performed to determine whether the value of T_1 is greater than the value of T_0. As described above, provided the values of t_min and t_max differ, the compressed values T_0 and T_1 of t_min and t_max may end up out of order after compression with a negative sign. Where t_min is equal to t_max, as an initial compression step, one of the values of t_min and t_max may be perturbed by the smallest possible amount, e.g., by decrementing t_min or incrementing t_max respectively, to ensure that the two values are distinct (sufficient precision may need to be used to ensure that the two values remain distinct after being rescaled and translated into T_0 and T_1). If T_0 is less than T_1, the sign preceding the direction component on the long axis is determined to be positive. If T_0 is greater than T_1, the sign preceding the direction component on the long axis is determined to be negative.
Once the correct sign has been identified, it is placed in front of its corresponding direction component. All ray direction components are then multiplied by the sign of the main direction component. In other words, in addition to the main direction component, the remaining minor direction components of the ray are also multiplied by the sign of the main direction component. This reverses the overall ray direction and ensures that the reversal of ray direction applied during compression is undone. Finally, by re-establishing the correct order, T_0 and T_1 are assigned to the decompressed t_min and t_max. Alternatively, this sign bit may be stored separately and not encoded in t_min and t_max.
The method of FIG. 10 may further include, when assigning the decompressed minimum distance component t_min and maximum distance component t_max, reordering the compressed distance values T_0 and T_1 according to which of the two values is closest to minus infinity.
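The sign decoding and range reordering just described can be sketched as below. The convention used (T_0 > T_1 implies a negative long-axis sign) follows the discussion above, but the function name and return shape are assumptions for illustration.

```python
def decode_sign_and_ranges(T0, T1):
    """Recover the long-axis direction sign from the stored order of the
    compressed range values, and reorder them so the returned minimum is
    closest to minus infinity."""
    if T0 > T1:
        # The values ended up out of order, so the long-axis direction
        # component was negative before compression removed its sign.
        return -1.0, T1, T0
    return 1.0, T0, T1
```

The returned sign would then multiply all three direction components of the decompressed ray, undoing the direction reversal applied during compression.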
The decompressed ray data obtained according to the method shown in FIG. 10 may not be exactly the same as the ray data that was originally compressed. Thus, the decompression method of FIG. 10 may be described as a lossy decompression technique. Although the result is not exactly the same as the original data, the difference between the decompressed data and the original data will be small, and obtaining the decompressed data is still useful. The use of lossy data is particularly beneficial in image and video compression applications: lossy compression allows a significant reduction in data size compared to lossless compression techniques, without materially affecting the visual or perceived quality of the final result. In ray tracing applications, particularly for the direction components of the ray, the compression method described above discards some number of (least significant) bits from the ray representation, on the assumption that the original level of precision is higher than required. For the direction components of the ray, fixed-point encoding may be more beneficial than floating-point encoding. This is because fixed-point encoding propagates errors in the direction components more uniformly around the unit sphere of the compressed ray, whereas floating-point encoding concentrates maximum precision around the great circles of the axis planes and around the poles of the axes. Removing this extra level of precision has little effect on the end result achieved with the ray identifier. Thus, lossy compression techniques are suitable for use with the ray representations described herein, because the loss of precision is negligible compared to the reduction in data size achieved.
The methods described in FIGS. 9 and 10 may be performed by a ray tracing system, such as the system shown in FIG. 1. That is, the ray tracing system of FIG. 1 may be configured to convert the ray data of a ray into a ray representation. As described above, the ray representation is a compressed representation of the ray data. As also described above, the ray data may include the three direction components and the three position components of the ray. More specifically, the method described in FIG. 9 may be performed by processing logic included within the ray tracing system of FIG. 1. The processing logic may be configured to perform the steps shown in FIG. 9. In other words, the processing logic is configured to: identify which of the three direction components of the ray data has the greatest magnitude, and define the axis of the identified direction component as the long axis of the ray; determine a transition position on the ray at which the position component along the long axis is zero; and rescale the three direction components of the ray such that the value of the direction component along the long axis is 1. The resulting ray representation generated by the processing logic includes (i) the two position components of the transition position of the ray along the axes other than the long axis, and (ii) the two rescaled direction components of the ray along the axes other than the long axis.
The processing logic of the ray tracing system in FIG. 1 may be further configured to rescale and translate the minimum and maximum distance components based on the transition position and the rescaled direction of the ray. The processing logic may be further configured to convert the ray representation into a (quantized) ray identifier by generating a data packet of the ray representation, the data packet comprising data indicating the long axis of the ray, the two (quantized) position components of the transition position, and the two (quantized) rescaled direction components.
Similarly, the method described in FIG. 10 may be performed by processing logic included within the ray tracing system of FIG. 1. That is, the ray tracing system of FIG. 1 may be configured to convert a ray representation into the ray data of a ray, wherein the ray representation is a compressed representation of the ray data and comprises: (i) two position components of the transition position of the ray, (ii) two direction components of the ray, and (iii) an indication of the long axis of the ray. The processing logic of the system may be configured to insert a third position component of the ray according to the indication of the long axis of the ray, wherein the value of the third position component is 0, and to insert a third direction component of the ray according to the indication of the long axis of the ray, wherein the magnitude of the third direction component is 1.
The ray tracing system of fig. 1 may also include a memory, such as the memory shown in fig. 3. The memory of the ray tracing system may be a cache. This cache may be referred to as a likely hit cache. Where the ray tracing system includes a memory such as that shown in FIG. 3, the ray representations generated by the system may be used to store an indication of the ray in the cache. The ray tracing system may be further configured to retrieve data from the cache for an intersection test, the intersection test for rendering an image of the scene. In other words, the ray representations generated by the ray tracing system may be used to perform ray tracing operations.
While the methods of FIGS. 9 and 10 are described herein as being used with the memory shown in FIG. 3, it should be understood that the compression and decompression techniques may have other applications. One purpose of ray property compression is to reduce storage requirements whenever ray data is stored in memory, particularly local memory in a ray tracing system (e.g., by making a cache narrower because the width of the ray data is reduced, or by making a cache shorter because more rays can be packed into a single cache line). Another purpose of ray property compression is to simplify the arithmetic in ray/geometry intersection testing by reducing or simplifying arithmetic operations. This can be achieved in three ways. First, rays having a unit direction component and/or a zero position component may reduce the number of multipliers and/or adders, respectively, required for intersection testing of the ray data. Second, fixed-point formats (e.g., for the direction components) may reduce hardware complexity, because fixed-point arithmetic is generally cheaper than floating-point arithmetic at the same bit width. Third, reducing the number of bits of the ray components may reduce the width required for any arithmetic operators (e.g., multipliers, adders, etc.), thereby reducing the overall area cost.
FIG. 12 illustrates a computer system in which the processing systems described herein may be implemented. The computer system may be or include a ray tracing system as shown in FIG. 1. The computer system comprises a CPU 1102, a GPU 1104, a memory 1106, a Neural Network Accelerator (NNA) 1108, and other devices 1114, such as a display 1116, speakers 1118, and a camera 1122. A processing block 1110 is implemented on the CPU 1102. In other examples, one or more of the depicted components may be omitted from the system, and/or the processing block 1110 may be implemented on the GPU 1104 or within the NNA 1108. The components of the computer system can communicate with each other via a communications bus 1120. A store 1112 is implemented as part of the memory 1106. The computing systems of FIGS. 1-12 are shown as comprising a number of functional blocks. This is schematic only and is not intended to imply a strict division between different logic elements of such entities. Each functional block may be provided in any suitable manner. It is to be understood that intermediate values described herein as being formed by a computing system need not be physically generated by the computing system at any point, and may merely represent logical values which conveniently describe the processing performed by the computing system between its inputs and outputs.
The computing systems described herein may be embodied in hardware on an integrated circuit. The computing systems described herein may be configured to perform any of the methods described herein. In general, any of the functions, methods, techniques, or components described above may be implemented in software, firmware, hardware (e.g., fixed logic circuitry) or any combination thereof. The terms "module," "functionality," "component," "element," "unit," "block," and "logic" may be used herein to generally represent software, firmware, hardware, or any combination thereof. In the case of a software implementation, the module, functionality, component, element, unit, block or logic represents program code that performs specified tasks when executed on a processor. The algorithms and methods described herein may be executed by one or more processors executing code that causes the processors to perform the algorithms/methods. Examples of a computer-readable storage medium include Random Access Memory (RAM), read-only memory (ROM), optical disks, flash memory, hard disk memory, and other memory devices that can store instructions or other data using magnetic, optical, and other techniques and that can be accessed by a machine.
The terms "computer program code" and "computer-readable instructions" as used herein refer to any kind of executable code for a processor, including code expressed in a machine language, an interpreted language, or a scripting language. Executable code includes binary code, machine code, bytecode, code defining an integrated circuit (such as a hardware description language or netlist), and code expressed in a programming language such as C, Java, or OpenCL. Executable code may be, for example, any kind of software, firmware, script, module, or library which, when suitably executed, processed, interpreted, compiled, or run in a virtual machine or other software environment, causes a processor of the computer system at which the executable code is supported to perform the tasks specified by the code.
A processor, computer, or computer system may be any kind of device, machine, or dedicated circuit, or collection or portion thereof, with processing capability such that it can execute instructions. A processor may be or include any kind of general-purpose or dedicated processor, such as a CPU, GPU, NNA, System-on-Chip, state machine, media processor, Application-Specific Integrated Circuit (ASIC), programmable logic array, Field-Programmable Gate Array (FPGA), or the like. A computer or computer system may comprise one or more processors.
The present invention is also intended to cover software defining a configuration of hardware as described herein, such as HDL (hardware description language) software, as used for designing integrated circuits or for configuring programmable chips to perform desired functions. That is, a computer readable storage medium may be provided having encoded thereon computer readable program code in the form of an integrated circuit definition data set that, when processed (i.e., run) in an integrated circuit manufacturing system, configures the system to manufacture a computing system configured to perform any of the methods described herein, or to manufacture a computing system comprising any of the devices described herein. The integrated circuit definition data set may be, for example, an integrated circuit description.
Accordingly, a method of manufacturing a computing system as described herein at an integrated circuit manufacturing system may be provided. Furthermore, an integrated circuit definition data set may be provided that, when processed in an integrated circuit manufacturing system, causes a method of manufacturing a computing system to be performed.
The integrated circuit definition data set may be in the form of computer code, for example as a netlist, code for configuring a programmable chip, or as a hardware description language defining hardware suitable for manufacture at any level in an integrated circuit, including as Register Transfer Level (RTL) code, as high-level circuit representations such as Verilog or VHDL, and as low-level circuit representations such as OASIS (RTM) and GDSII. Higher-level representations which logically define hardware suitable for manufacture in an integrated circuit (such as RTL) may be processed at a computer system configured for generating a manufacturing definition of an integrated circuit in the context of a software environment comprising definitions of circuit elements and rules for combining those elements in order to generate the manufacturing definition of an integrated circuit so defined by the representation. As is typically the case with software executing at a computer system so as to define a machine, one or more intermediate user steps (e.g., providing commands, variables, etc.) may be required in order for a computer system configured for generating a manufacturing definition of an integrated circuit to execute code defining an integrated circuit so as to generate the manufacturing definition of that integrated circuit.
An example of processing an integrated circuit definition data set at an integrated circuit manufacturing system, so as to configure the system to manufacture a computing system, will now be described with reference to FIG. 12.
FIG. 12 illustrates an example of an Integrated Circuit (IC) fabrication system 1202 configured to fabricate a computing system as described in any of the examples herein. In particular, IC fabrication system 1202 includes layout processing system 1204 and integrated circuit generation system 1206. IC fabrication system 1202 is configured to receive an IC definition data set (e.g., defining a computing system as described in any of the examples herein), process the IC definition data set, and generate an IC from the IC definition data set (e.g., embodying a computing system as described in any of the examples herein). The processing of the IC definition data set configures the IC fabrication system 1202 to fabricate an integrated circuit embodying a computing system as described in any of the examples herein.
Layout processing system 1204 is configured to receive and process the IC definition data set to determine a circuit layout. Methods of determining a circuit layout from an IC definition data set are known in the art and may involve, for example, synthesizing RTL code to determine a gate-level representation of the circuit to be generated, e.g. in terms of logical components (such as NAND, NOR, AND, OR, MUX, and FLIP-FLOP components). A circuit layout can be determined from the gate-level representation of the circuit by determining positional information for the logical components. This may be done automatically or with user involvement in order to optimize the circuit layout. When the layout processing system 1204 has determined a circuit layout, it may output the circuit layout definition to the IC generation system 1206. The circuit layout definition may be, for example, a circuit layout description.
As is known in the art, the IC generation system 1206 generates an IC according to a circuit layout definition. For example, the IC generation system 1206 may implement a semiconductor device fabrication process to generate the IC, which may involve a multi-step sequence of photolithographic and chemical processing steps during which electronic circuits are gradually created on a wafer made of semiconducting material. The circuit layout definition may be in the form of a mask which can be used in a lithographic process for generating an IC according to the circuit definition. Alternatively, the circuit layout definition provided to the IC generation system 1206 may be in the form of computer-readable code which the IC generation system 1206 can use to form a suitable mask for use in generating the IC.
The different processes performed by IC fabrication system 1202 may all be implemented in one location, e.g., by one party. Alternatively, IC fabrication system 1202 may be a distributed system, such that some of the processes may be performed at different locations and by different parties. For example, some of the following stages may be performed in different locations and/or by different parties: (i) synthesizing RTL code representing the IC definition data set to form a gate-level representation of the circuit to be generated; (ii) generating a circuit layout based on the gate-level representation; (iii) forming a mask in accordance with the circuit layout; and (iv) fabricating the integrated circuit using the mask.
In other examples, processing the integrated circuit definition data set at the integrated circuit manufacturing system may configure the system to manufacture the computing system without processing the integrated circuit definition data set to determine the circuit layout. For example, an integrated circuit definition dataset may define a configuration of a reconfigurable processor such as an FPGA, and processing of the dataset may configure the IC manufacturing system to generate (e.g., by loading configuration data into the FPGA) the reconfigurable processor having the defined configuration.
In some embodiments, the integrated circuit manufacturing definition data set, when processed in the integrated circuit manufacturing system, may cause the integrated circuit manufacturing system to generate an apparatus as described herein. For example, an apparatus as described herein may be manufactured by configuring an integrated circuit manufacturing system in the manner described above with reference to fig. 12 through an integrated circuit manufacturing definition dataset.
In some examples, the integrated circuit definition dataset may contain software running on or in combination with hardware defined at the dataset. In the example shown in fig. 12, the IC generation system may be further configured by the integrated circuit definition data set to load firmware onto the integrated circuit in accordance with the program code defined at the integrated circuit definition data set at the time of manufacturing the integrated circuit or to otherwise provide the integrated circuit with the program code for use with the integrated circuit.
Specific implementations of the concepts set forth in the present application in devices, apparatuses, modules, and/or systems (and in the methods implemented herein) may provide improved performance over known implementations. Performance improvements may include one or more of increased computational performance, reduced latency, increased throughput, and/or reduced power consumption. During the manufacture of such devices, apparatuses, modules, and systems (e.g., in integrated circuits), performance improvements may be traded against the physical implementation, thereby improving the method of manufacture. For example, a performance improvement may be traded against layout area, matching the performance of a known implementation but using less silicon. This may be done, for example, by reusing functional blocks in a serialized fashion or by sharing functional blocks between elements of a device, apparatus, module, and/or system. Conversely, the concepts described herein that give rise to improvements in the physical implementation of devices, apparatuses, modules, and systems (e.g., reduced silicon area) may be traded for improved performance. This may be done, for example, by manufacturing multiple instances of a module within a predefined area budget.
The applicant hereby discloses in isolation each individual feature described herein and any combination of two or more such features, to the extent that such features or combinations are capable of being carried out based on the present specification as a whole in light of the common general knowledge of a person skilled in the art, irrespective of whether such features or combinations of features solve any problems disclosed herein. In view of the foregoing description it will be evident to a person skilled in the art that various modifications may be made within the scope of the invention.
Claims (20)
1. A computer-implemented method for converting ray data of a ray into a ray representation, wherein the ray representation is a compressed representation of the ray data, and wherein the ray data includes three directional components and three location components of the ray, the method comprising:
identifying which of the three directional components of the ray data has the greatest magnitude and defining an axis of the identified directional component as the long axis of the ray;
determining a transition position on the ray for which the position component along the long axis is 0; and
rescaling the three directional components of the ray such that the magnitude of the directional component along the long axis is 1;
wherein the ray representation comprises: (i) Two positional components of the transition position along an axis other than the long axis, and (ii) two rescaled directional components along an axis other than the long axis.
2. The method of claim 1, wherein the ray representation includes exactly two directional components and exactly two location components.
3. The method of claim 1 or claim 2, wherein the ray representation further comprises an indication of the long axis.
4. The method of claim 3, wherein the indication of the long axis comprises two bits.
5. The method of any of the preceding claims, wherein the ray data further comprises a minimum distance component and a maximum distance component, and the method further comprises rescaling the minimum distance component and the maximum distance component based on the transition position and based on rescaling of the three directional components of the ray.
6. The method of any of the preceding claims, wherein the three directional components of the ray are rescaled such that the directional component along the long axis has a value of +1.
7. The method of any of the preceding claims, further comprising converting the ray representation into a quantized ray identifier by generating a data packet of the ray representation, the data packet comprising data indicative of the long axis of the ray, the two position components of the transition position, and the two rescaled direction components.
8. The method of claim 7, wherein the quantized ray identifier has a fixed bit width.
9. The method of claim 7 or 8, wherein the data in the data packet of the quantized ray identifier includes no more than three bits to indicate each of the two rescaled directional components along the axis that is not the long axis.
10. The method of any of claims 7 to 9, wherein the data in the data packet of the quantized ray identifier comprises no more than five bits to indicate each of the two position components of the transition location along the axis that is not the long axis.
11. The method of any of claims 7 to 10, wherein the quantized ray identifier identifies a collection of rays, each ray in the collection of rays comprising similar location and direction components.
12. The method of any of claims 7 to 11, further comprising generating a hash of the quantized ray identifier to represent the ray representation.
13. The method of claim 12, wherein generating the hash comprises performing a logical exclusive-or operation on bits of the quantized ray identifier to reduce a number of bits of the quantized ray identifier.
14. The method of claim 12 or 13, wherein the hash comprises eight bits.
15. The method of any of the preceding claims, wherein the ray representation is used to store an indication of the ray in a cache, the cache being used to store data for intersection testing, the data being used by a ray tracing system to render an image of a scene.
16. A computer system for converting ray data of a ray into a ray representation, wherein the ray representation is a compressed representation of the ray data, wherein the ray data includes three directional components and three location components of the ray, the computer system comprising processing logic configured to:
identifying which of the three directional components of the ray data has the greatest magnitude and defining an axis of the identified directional component as the long axis of the ray;
determining a transition position on the ray for which the position component along the long axis is 0; and
rescaling the three directional components of the ray such that the magnitude of the directional component along the long axis is 1;
wherein the ray representation comprises: (i) Two positional components of the transition position along an axis other than the long axis, and (ii) two rescaled directional components along an axis other than the long axis.
17. The computer system of claim 16, wherein the computer system is a ray tracing system.
18. A method of manufacturing a computer system as claimed in claim 16 or 17 using an integrated circuit manufacturing system.
19. A computer readable storage medium having computer readable code stored thereon, the computer readable code being configured to cause the method of any of claims 1 to 15 to be performed when the code is run.
20. A computer readable storage medium having stored thereon an integrated circuit definition data set that, when processed in an integrated circuit manufacturing system, configures the integrated circuit manufacturing system to manufacture the computer system of claim 16 or 17.
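The conversion recited in the claims above can be sketched in a few lines of code. The following Python is a minimal illustrative sketch of claims 1, 6-10, and 12-14, not the patented implementation: the function names, the quantization ranges (a ±1024-unit scene extent for the position components), and the choice to divide by the signed long-axis component are assumptions of this sketch. The bit widths (2 axis bits, 5 bits per position component, 3 bits per direction component, an 8-bit hash) follow claims 4, 9, 10, and 14.

```python
def compress_ray(origin, direction):
    """Sketch of claim 1: convert ray data (three position and three
    direction components) into the compressed ray representation."""
    # Identify which direction component has the greatest magnitude;
    # its axis is the "long axis" of the ray.
    major = max(range(3), key=lambda i: abs(direction[i]))
    # Determine the transition position: the point on the ray whose
    # position component along the long axis is 0.
    t = -origin[major] / direction[major]
    transition = [origin[i] + t * direction[i] for i in range(3)]
    # Rescale the direction so the long-axis component becomes +1
    # (claim 6); dividing by the signed component is this sketch's choice.
    rescaled = [d / direction[major] for d in direction]
    # Keep only the two off-axis components of each (claim 2), plus the
    # long-axis indication (claims 3-4).
    other = [i for i in range(3) if i != major]
    return (major,
            tuple(transition[i] for i in other),
            tuple(rescaled[i] for i in other))


def quantize_ray_id(major, transition2, rescaled2):
    """Sketch of claims 7-10: pack a fixed-width quantized ray identifier
    from 2 axis bits, two 5-bit position fields, and two 3-bit direction
    fields. The value ranges used here are assumptions."""
    def q(v, bits, lo, hi):
        # Clamp v to [lo, hi] and map it onto an unsigned 'bits'-bit integer.
        v = min(max(v, lo), hi)
        return min(int((v - lo) / (hi - lo) * (1 << bits)), (1 << bits) - 1)

    word = major                             # 2-bit long-axis indication
    for p in transition2:                    # assumed scene extent
        word = (word << 5) | q(p, 5, -1024.0, 1024.0)
    for d in rescaled2:                      # off-axis slopes lie in [-1, 1]
        word = (word << 3) | q(d, 3, -1.0, 1.0)
    return word                              # 2 + 2*5 + 2*3 = 18 bits


def hash_ray_id(identifier):
    """Sketch of claims 12-14: XOR-fold the bits of the quantized
    identifier down to an eight-bit hash."""
    h = 0
    while identifier:
        h ^= identifier & 0xFF
        identifier >>= 8
    return h
```

For example, a ray with origin (2, 5, 1) and direction (0.5, 4, 1) has the y axis as its long axis, a transition position of (1.375, 0, -0.25), and a rescaled direction of (0.125, 1, 0.25). Rays with similar positions and directions quantize to the same identifier (claim 11) and hence hash to the same cache tag (claim 15).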
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB2214124.6A (GB2617219A) | 2022-09-27 | 2022-09-27 | Ray tracing |
GB2214124.6 | 2022-09-27 | | |
GB2214122.0 | 2022-09-27 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117788676A true CN117788676A (en) | 2024-03-29 |
Family
ID=83978769
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311247079.3A Pending CN117788676A (en) | 2022-09-27 | 2023-09-25 | Ray tracing |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN117788676A (en) |
GB (1) | GB2617219A (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2624591A (en) * | 2022-09-27 | 2024-05-22 | Imagination Tech Ltd | Ray tracing |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9269182B2 (en) * | 2008-09-05 | 2016-02-23 | Nvidia Corporation | System and method for identifying entry points of a hierarchical structure |
US9984492B2 (en) * | 2015-04-02 | 2018-05-29 | Qualcomm Incorporated | Efficient hierarchy traversal in ray tracing applications |
US11282260B2 (en) * | 2020-06-09 | 2022-03-22 | Qualcomm Incorporated | Bounded volume hierarchy (BVH) tree traversal using spatial division |
- 2022-09-27: GB application GB2214124.6A filed (published as GB2617219A), status Pending
- 2023-09-25: CN application CN202311247079.3A filed (published as CN117788676A), status Pending
Also Published As
Publication number | Publication date |
---|---|
GB2617219A (en) | 2023-10-04 |
GB202214124D0 (en) | 2022-11-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---
 | PB01 | Publication | 