CN115861518B

CN115861518B - Ray intersection testing using quantization and interval representation

Info

Publication number: CN115861518B
Application number: CN202211113266.8A
Authority: CN
Inventors: C·A·伯恩斯
Original assignee: Apple Inc
Current assignee: Apple Inc
Priority date: 2021-09-24
Filing date: 2022-09-14
Publication date: 2023-12-08
Anticipated expiration: 2042-09-14
Also published as: CN115861518A; GB202318608D0; DE102022122793A1; CN117593439A; GB202212912D0; GB2612681A; TW202403671A; TWI822330B; TW202314645A; GB2612681B; KR20230043717A

Abstract

Techniques related to primitive intersection testing for ray tracing in a graphics processor are disclosed. In some embodiments, the graphics processor includes ray intersection circuitry configured to perform an intersection test that includes: quantizing a first representation of a primitive to generate a reduced-precision interval representation of the primitive; quantizing a first representation of a ray to generate a reduced-precision interval representation of the ray; and determining an initial intersection result, using interval arithmetic, based on the coordinates of the interval representation of the primitive and the coordinates of the interval representation of the ray. The initial intersection result may be conservative, such that a miss indicated by the initial intersection result is guaranteed not to be a hit for the first representation of the primitive and the first representation of the ray. The disclosed techniques may improve performance, reduce power consumption, or both, relative to conventional techniques.

Description

Ray intersection testing using quantization and interval representation

The present application claims priority from U.S. provisional patent application No. 63/248,143, filed on September 24, 2021, which is incorporated herein by reference in its entirety.

Background

Technical Field

The present disclosure relates generally to graphics processors, and more particularly to primitive intersection testing for ray tracing.

Description of related Art

In computer graphics, ray tracing is a rendering technique for generating an image by tracing the path of light as pixels in an image plane and simulating the effects of its encounters with virtual objects. Ray tracing may allow resolution of visibility in three dimensions between any two points in a scene, which is also the source of most of its computational expense. A typical ray tracer samples paths of light through the scene in the direction opposite to light propagation, starting from the camera and propagating into the scene, rather than from the light sources (this is sometimes referred to as "backward ray tracing"). Starting from the camera has the benefit of tracing only rays that are visible to the camera. Such a system can model a rasterizer, in which rays simply stop at the first surface and invoke a shader (analogous to a fragment shader) to compute a color. More commonly, secondary effects, in which illumination is exchanged between scene elements (such as diffuse inter-reflection and transmission), are also modeled. A shader that evaluates the reflective properties of a surface may invoke additional intersection queries (e.g., generating new rays) to capture incident illumination from other surfaces. This recursive process has many formulations, but is commonly referred to as path tracing.

Graphics processors implementing ray tracing typically provide more realistic scenes and lighting effects relative to traditional rasterization systems. Ray tracing, however, is generally computationally expensive. Improvements to ray tracing techniques may improve realism in a graphics scene, improve performance (e.g., allow more rays to be traced per frame, in a more complex scene, or both), reduce power consumption (which may be particularly important in battery powered devices), etc.

Ray intersection queries may be performed by shaders, dedicated hardware, or a combination of both. Different types of intersection queries may provide different types of information. For example, a "closest hit" query may locate the nearest intersected geometry along the ray and within the parametric interval in which the ray is valid (this may be the most common type of query). An "any hit" query may indicate whether any intersected geometry exists along the ray and within the parametric interval. This type of query may be used for shadow rays, for example, to determine whether a point in the scene has visibility to a light or is occluded. Once the intersected geometry has been determined, that geometry may be shaded based on the intersection, and further rays may in turn be generated for intersection testing, e.g., from reflective surfaces.

Motion blur is a phenomenon that occurs when an image being recorded changes during recording of a single exposure. For example, a photograph of a moving freight train with a sufficiently long exposure time may show that the train is blurred, while non-moving objects are not. In a computer graphics context, a graphics processor may simulate the motion blur effect of a frame of graphics data. In this context, an animated graphics primitive (e.g., triangle) may be modeled with a plurality of different positions during an open shutter interval (also referred to herein as a motion blur interval) of a virtual camera, and thus may affect pixel values at a plurality of positions in a frame to cause a blur effect.

Each ray is typically assigned an accurate time stamp, e.g. within a motion blur interval. When implementing both ray tracing and motion blur, testing of ray/primitive intersections can be expensive in terms of processor resources and power consumption.

Drawings

Fig. 1A is a diagram illustrating an overview of exemplary graphics processing operations, according to some embodiments.

Fig. 1B is a block diagram illustrating an exemplary graphics unit according to some embodiments.

Fig. 2A is a block diagram illustrating an exemplary low precision test circuit according to some embodiments.

Fig. 2B is a block diagram illustrating an exemplary intersection testing technique according to some embodiments.

FIG. 3 is a diagram illustrating an exemplary interval representation of various values used in an initial intersection test, according to some embodiments.

Fig. 4 is a diagram illustrating an exemplary interpolation circuit configured to generate an interval representation of interpolation primitives in a motion blur interval, in accordance with some embodiments.

Fig. 5 is a block diagram illustrating an exemplary shear factor circuit configured to generate a shear factor interval, according to some embodiments.

Fig. 6 is a diagram illustrating an exemplary circuit configured to translate and shear vertices using shear factor intervals, according to some embodiments.

Fig. 7 is a circuit diagram illustrating an exemplary circuit configured to generate initial intersection test results, according to some embodiments.

Fig. 8 is a diagram illustrating an exemplary circuit configured to generate a modified interval product in accordance with some embodiments.

Fig. 9 is a diagram illustrating an exemplary triangle pair and sequence pair processing circuit in accordance with some embodiments.

FIG. 10 is a diagram illustrating an exemplary boundary of a quantized primitive representation and regions of deterministic hits in accordance with some embodiments.

FIG. 11 is a diagram illustrating an exemplary test circuit configured to generate a hit or non-deterministic output according to some embodiments.

Fig. 12 is a circuit diagram illustrating an exemplary circuit configured to generate initial intersection test results, according to some embodiments.

Fig. 13 is a diagram illustrating an exemplary primitive test sequence according to different orderings (including ordering from the middle), in accordance with some embodiments.

Fig. 14 is a flow chart illustrating an exemplary method according to some embodiments.

Fig. 15 is a flow chart illustrating another exemplary method according to some embodiments.

Fig. 16 is a block diagram illustrating an exemplary computing device, according to some embodiments.

Fig. 17 is a diagram illustrating an exemplary application of the disclosed systems and devices according to some embodiments.

Fig. 18 is a block diagram illustrating an exemplary computer-readable medium storing circuit design information, according to some embodiments.

Detailed Description

In disclosed embodiments, a lower-precision hardware triangle test is performed first as a filter, and a higher-precision triangle test is performed only if the lower-precision test determines a potential hit. Such a low-precision test may be conservative (e.g., it may generate false hits but should not generate false misses). U.S. patent application Ser. No. 17/136,542, filed on December 29, 2020, and entitled "Primitive Testing for Ray Intersection at Multiple Precisions," is incorporated herein by reference in its entirety. The '542 patent application describes exemplary techniques for testing at different precisions, and how potential errors due to quantization of the inputs can be tracked throughout the reduced-precision test to ensure that the results are conservative.
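The filtering flow described above can be sketched in a few lines. This is a minimal software illustration, not the disclosed hardware; the function name `intersect_two_stage` and the two callables passed to it are hypothetical stand-ins.

```python
def intersect_two_stage(ray, prim, low_precision_test, full_precision_test):
    """Run a cheap conservative filter before the costly full-precision test.

    Both test callables are hypothetical stand-ins supplied by the caller.
    low_precision_test may report a hit for a pair that does not really
    intersect (a false hit), but it must never report a miss for a real hit.
    """
    if not low_precision_test(ray, prim):
        return False  # conservative miss: the full-precision test is skipped
    return full_precision_test(ray, prim)  # potential hit: confirm it
```

Because the filter never produces a false miss, skipping the second stage on a miss cannot change the final answer; it only saves work.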

The present disclosure uses interval arithmetic to track and bound potential quantization errors for hardware primitive tests that quantize one or more inputs. In some embodiments, the disclosed techniques may advantageously provide tighter error bounds than the embodiments of the '542 patent application. Additionally, in some implementations, the disclosed techniques may use reduced circuit area to perform primitive testing at a particular precision.

In addition, the disclosed embodiments discussed in detail below generate interpolated spatial coordinate intervals to represent moving triangles for conservative intersection testing for a given ray time in a motion blur interval. In addition, the disclosed techniques provide efficient encoding and processing techniques for moving and non-moving triangle pairs.

Still further, the disclosed techniques may use lower precision intersection tests to provide deterministic hit results without performing intersection tests with original precision (e.g., for "any hit" rays).

Finally, disclosed techniques for ordering the traversal of an acceleration data structure (e.g., ordering "from the middle" rather than front-to-back or back-to-front) may improve performance, reduce power consumption, or both, for traversal by certain types of rays.

Overview of graphics processing

Referring to FIG. 1A, a flow chart illustrating an exemplary process flow 100 for processing graphics data is shown. In some implementations, the transform and lighting process 110 may involve processing lighting information for vertices received from an application based on defined light source locations, reflectance, etc., assembling the vertices into polygons (e.g., triangles), and transforming the polygons to the correct size and orientation based on position in three-dimensional space. Clipping process 115 may involve discarding polygons or vertices that fall outside the viewable area. The rasterization process 120 may involve defining fragments within each polygon and assigning an initial color value to each fragment, e.g., based on texture coordinates of the polygon vertices. Fragments may specify attributes for the pixels they overlap, but actual pixel attributes may be determined based on combining multiple fragments (e.g., in a frame buffer), ignoring one or more fragments (e.g., if they are covered by other objects), or both. The shading process 130 may involve altering pixel components based on lighting, shadows, bump mapping, translucency, and the like. The shaded pixels may be assembled in the frame buffer 135. Modern GPUs typically include programmable shaders that allow application developers to customize shading and other processes. Thus, in various embodiments, the example elements of Fig. 1A may be performed in various orders, performed in parallel, or omitted. Additional processes may also be implemented.

Referring now to FIG. 1B, a simplified block diagram of an exemplary graphics unit 150 is shown, according to some embodiments. In the illustrated embodiment, the graphics unit 150 includes a programmable shader 160, a vertex pipe 185, a fragment pipe 175, a Texture Processing Unit (TPU) 165, an image writing unit 170, and a memory interface 180. In some implementations, graphics unit 150 is configured to process both vertex data and fragment data using programmable shader 160, which may be configured to process graphics data in parallel using multiple execution pipelines or instances.

In the illustrated embodiment, the vertex pipe 185 may include various fixed function hardware configured to process vertex data. The vertex pipe 185 may be configured to communicate with the programmable shader 160 to coordinate vertex processing. In the illustrated embodiment, the vertex pipe 185 is configured to send the processed data to the fragment pipe 175 or the programmable shader 160 for further processing.

In the illustrated embodiment, fragment pipe 175 may include various fixed-function hardware configured to process pixel data. Fragment pipe 175 may be configured to communicate with programmable shader 160 to coordinate fragment processing. Fragment pipe 175 may be configured to perform rasterization on polygons from the vertex pipe 185 or the programmable shader 160 to generate fragment data. The vertex pipe 185 and fragment pipe 175 may be coupled to the memory interface 180 (coupling not shown) to access graphics data.

In the illustrated embodiment, the programmable shader 160 is configured to receive vertex data from the vertex pipe 185 and fragment data from the fragment pipe 175 and the TPU 165. Programmable shader 160 may be configured to perform vertex processing tasks on vertex data, which may include various transformations and adjustments of vertex data. In the illustrated embodiment, the programmable shader 160 is also configured to perform fragment processing tasks on pixel data, such as texturing and shading. Programmable shader 160 may include multiple sets of multiple execution pipelines for processing data in parallel.

In some embodiments, the programmable shader includes pipelines configured to execute one or more different SIMD groups in parallel. Each pipeline may include various stages configured to perform operations (such as fetch, decode, issue, execute, etc.) in a given clock cycle. The concept of a processor "pipeline" is well understood and refers to dividing the "work" that a processor performs on instructions into multiple stages. In some embodiments, instruction decode, dispatch, execution (i.e., performance), and retirement may be examples of different pipeline stages. Different pipeline architectures may order these elements differently. Various pipeline stages perform such steps on instructions during one or more processor clock cycles, and then pass the instructions or operations associated with the instructions to other stages for further processing.

The term "SIMD group" is intended to be interpreted in accordance with its well-known meaning, including a set of threads for which processing hardware processes the same instruction in parallel using different input data for different threads. Various types of computer processors may include a set of pipelines configured to execute SIMD instructions. For example, graphics processors typically include a programmable shader core configured to execute instructions for a set of related threads in SIMD fashion. Other examples of names that may be used for SIMD groups include: wavefront, clique or warp. SIMD groups may be part of a larger thread group that may be split into multiple SIMD groups based on the parallel processing capabilities of the computer. In some embodiments, each thread is assigned to a hardware pipeline that fetches operands for the thread and performs specified operations in parallel with other pipelines of the set of threads. Note that a processor may have a large number of pipelines so that multiple separate SIMD groups may also execute in parallel. In some implementations, each thread has a private operand store, for example, in a register file. Thus, reading a particular register from a register file may provide a version of the register for each thread in the SIMD group.

In some implementations, a plurality of programmable shader units 160 are included in the GPU. In these implementations, the global control circuitry may assign work to different sub-portions of the GPU, which in turn may assign work to the shader cores for processing by the shader pipeline.

In the illustrated embodiment, the TPU 165 is configured to schedule fragment processing tasks from the programmable shader 160. In some embodiments, TPU 165 is configured to prefetch texture data and assign initial colors to fragments for further processing by programmable shader 160 (e.g., via memory interface 180). The TPU 165 may be configured to provide fragment components, for example, in a normalized integer format or a floating-point format. In some embodiments, TPU 165 is configured to provide fragments in groups of four (a "fragment quad") in a 2x2 format, to be processed by a group of four execution pipelines in programmable shader 160.

In some embodiments, the Image Writing Unit (IWU) 170 is configured to store processed tiles of an image and may perform operations on a rendered image before it is transferred for display or to memory for storage. In some implementations, the graphics unit 150 is configured to perform tile-based deferred rendering (TBDR). In tile-based rendering, different portions of screen space (e.g., squares or rectangles of pixels) may be processed separately. In various embodiments, the memory interface 180 may facilitate communication with one or more of various memory hierarchies.

In the illustrated example, the graphics unit 150 includes a Ray Intersection Accelerator (RIA) 190, which may include hardware configured to perform various ray intersection operations, as described in detail below.

Interval-based intersection test overview

Fig. 2A is a block diagram illustrating an exemplary quantization circuit and a low precision intersection test circuit according to some embodiments. In the illustrated embodiment, the graphics processor includes test circuitry 220.

In some embodiments, the quantization circuit is configured to quantize ray data and generate interval representations of the quantized values. In various embodiments, the upper and lower bounds of a generated interval are represented using lower precision than the input representation, but the interval is guaranteed to cover the original value at the input precision. Note that primitive data may also be stored in a quantized interval format (e.g., in an acceleration data structure).
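The covering guarantee can be illustrated with a short sketch. It assumes a fixed-point target format with `frac_bits` fractional bits; the function name and the use of exact rational arithmetic are illustrative choices, not details from the disclosure.

```python
import math
from fractions import Fraction


def quantize_to_interval(value, frac_bits):
    """Return integers (lo, hi), in units of 2**-frac_bits, such that
    lo * 2**-frac_bits <= value <= hi * 2**-frac_bits."""
    scaled = Fraction(value) * (1 << frac_bits)  # exact, no rounding error
    lo = math.floor(scaled)  # lower bound rounds toward negative infinity
    hi = math.ceil(scaled)   # upper bound rounds toward positive infinity
    return lo, hi


lo, hi = quantize_to_interval(0.3, 7)  # 0.3 * 128 = 38.4, so (38, 39)
```

Rounding the two bounds in opposite directions is what keeps the original value inside the interval regardless of how coarse the target format is.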

In the illustrated embodiment, the interval-arithmetic-based low-precision test circuit 220 is configured to generate conservative intersection results by performing interval arithmetic on the interval representations. A conservative intersection result may guarantee that a miss signaled by circuit 220 would not be a hit in a higher-precision intersection test (e.g., one operating on values at the input precision prior to quantization). In these embodiments, a positive output from circuit 220 indicates a potential hit.

In various implementations, performing lower precision initial intersection tests may advantageously improve performance, reduce power consumption, or both, relative to conventional techniques. In particular, misses or deterministic hits generated by the initial test may avoid the need to perform higher precision tests for a given ray and primitive. Thus, both improving the accuracy of the test (e.g., by tightening the error margin) and improving the performance or power consumption of the initial test itself may have technical advantages.

Fig. 2B is a flow chart illustrating a general exemplary intersection testing technique, according to some embodiments. In the illustrated embodiment, element 210 converts the ray direction into a lower-precision floating-point interval representation. Element 230 determines shear factors based on the quantized frame transform (used for quantization of the vertices, as discussed in detail below), and element 244 converts the shear factors to a fixed-point interval representation. Element 242 generates a fixed-point interval representation of the ray origin, also based on the quantized frame transform. Element 246 generates a fixed-point interval representation of the ray time. For motion blur processing, element 250 interpolates the quantized triangle vertices in time based on the ray time (this element may be omitted, or may pass the quantized triangle vertices straight through, when motion blur is not being performed). Element 260 transforms the vertices according to the shear factors and ray origin, and element 270 evaluates the edge equations to determine whether there is a miss or a potential hit. The various elements of Fig. 2B are explained in further detail below. The particular operations of Fig. 2B are included for illustrative purposes and are not intended to limit the scope of the present disclosure. In some embodiments, however, the disclosed operations may advantageously provide tight intervals using reasonable circuit area and power consumption.

Exemplary quantization interval representation of intersection test values

FIG. 3 is a diagram illustrating an exemplary interval representation of various values used in an initial intersection test, according to some embodiments. In the illustrated example, the interval is generated for vertex position, ray origin, direction and time, shear factor, and interpolation triangle vertices. It should be noted that these particular interval values are discussed for illustrative purposes and are not intended to limit the scope of the present disclosure. In other embodiments, the interval may be used to represent any of a variety of values used in determining the initial intersection result.

In the illustrated embodiment, for each quantized vertex position (e.g., for each of the three vertices of a triangle), three respective intervals are determined for the X-dimension, the Y-dimension, and the Z-dimension. A similar interval is determined for the ray origin and ray direction. In some embodiments supporting motion blur, the upper and lower limits of ray time are also determined.

In some embodiments that use shearing as part of the ray-triangle intersection test, upper and lower bounds are determined for two shear factors in the non-principal coordinate directions of the ray.

In some implementations supporting motion blur, the graphics processor determines X, Y and Z intervals for each vertex for an interpolation triangle corresponding to ray times within the motion blur interval. FIG. 4 is discussed in detail below and provides an exemplary technique for generating an interval representation of an interpolated triangle. Generally, more detailed techniques for determining various specific intervals are discussed in detail below.

As discussed in detail below, the data structure may represent triangles, moving triangles, triangle pairs, moving triangle pairs, or some combination thereof. In some embodiments, three vertices are used to represent a triangle, six vertices are used to represent a moving triangle, four vertices are used to represent a triangle pair, and eight vertices are used to represent a moving triangle pair.

In some embodiments, the quantized triangle coordinates are stored as unsigned integer values with fixed-point precision, rounded toward zero. These coordinates may correspond to a local coordinate system recorded in the acceleration data structure (ADS), for example, as discussed in the '542 patent application. The quantized value may be an N-bit value. In some embodiments, each coordinate value uses a number of bits that facilitates packing within a field of a particular size. As one example, 7-bit values for each quantized coordinate interval bound of a single triangle may be packed into two 64-bit fields (7-bit upper and lower bounds for x, y, and z, for each of the three vertices = 126 bits). In other embodiments, fixed-point encodings using various suitable numbers of bits may be utilized. In some embodiments, the unsigned values are transformed into a new coordinate system in which the values become signed integers. Note that in some cases, only one bound of an interval may be stored, while the other bound may be implicit. This may reduce storage requirements for certain parts of the processor.
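The 126-bit packing example can be sketched as follows. The text does not specify a field layout, so placing nine 7-bit values in each 64-bit word (63 bits used per word) is an assumption, as are the function names.

```python
def pack_triangle(bounds):
    """Pack 18 seven-bit interval bounds into two 64-bit words.

    Assumed layout (not from the source): nine values per word,
    value i of each word occupying bits [7*i, 7*i + 6].
    """
    if len(bounds) != 18 or any(not 0 <= b < 128 for b in bounds):
        raise ValueError("expected 18 values in the range [0, 127]")
    words = []
    for half in (bounds[:9], bounds[9:]):
        word = 0
        for i, b in enumerate(half):
            word |= b << (7 * i)
        words.append(word)
    return tuple(words)


def unpack_triangle(words):
    """Inverse of pack_triangle: recover the 18 seven-bit values."""
    return [(w >> (7 * i)) & 0x7F for w in words for i in range(9)]
```

Each word uses 63 of its 64 bits, which accounts for the 126 bits of payload mentioned above.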

In this context, if p is the quantized value of a triangle coordinate, the interval that represents the coordinate in the local quantized coordinate space is $\hat{p} = [p, p + \delta p]$. In some embodiments, δp represents one unit in the last place (ULP) in the quantization format. The original coordinate value before quantization is guaranteed to lie within this interval. For an N-bit fixed-point representation, δp is given by an expression that is not reproduced in this text. Generally, hatted quantities discussed herein (e.g., $\hat{p}$) refer to intervals.

Thus, a given non-moving triangle may be encoded using nine values (three vertices, each vertex having three lower coordinate limits, with the upper limit implicitly one ULP greater than the lower limit).

In some embodiments, a moving triangle is stored as two (or more) sets of coordinates, such as position p(0) at time t=0 and position p(1) at time t=1. This may define linear motion over the normalized time interval [0,1]. Note that multiple linear motions over sub-intervals may also be used to encode non-linear motion over a larger motion blur interval. In this case, a moving triangle may include more than two sets of coordinates. The moving triangle coordinates at time t may be represented using the interval $\hat{p}(t)$.

In some embodiments, as part of the low-precision intersection test, the ray time is quantized to a lower-precision interval $\hat{t} = [t, t + \delta t]$, where t is encoded with M bits of sub-interval resolution (e.g., with 1.0 implicitly at $2^M$). M may or may not correspond to the number of bits N used to represent the spatial coordinates of triangles (or the number of bits used to represent the spatial coordinates of rays). As with other quantized intervals, the original high-precision value is guaranteed to lie within the low-precision interval. In some embodiments, time is a fourth coordinate axis that is independent of the other coordinates, such as x, y, and z.
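A minimal sketch of this time quantization follows, assuming the ray time has already been normalized to [0, 1] and that 1.0 corresponds to 2^M steps. The function name and the choice to fold t = 1.0 into the last sub-interval are assumptions.

```python
import math
from fractions import Fraction


def quantize_ray_time(t, m_bits):
    """Return (t_lo, delta_t) as exact fractions with
    t_lo <= t <= t_lo + delta_t, for a normalized time t in [0, 1]."""
    steps = 1 << m_bits            # 1.0 is implicitly 2**M steps
    delta_t = Fraction(1, steps)
    # Round down, but keep t_lo inside [0, 1 - delta_t] so that t = 1.0
    # falls in the final sub-interval (assumption, see lead-in).
    k = min(math.floor(Fraction(t) * steps), steps - 1)
    return k * delta_t, delta_t


t_lo, delta_t = quantize_ray_time(0.3, 4)  # interval [1/4, 5/16]
```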

Exemplary interval-based motion blur processing

In some embodiments, interval interpolation circuitry is configured to reconstruct a conservative spatial interval $\hat{p}(\hat{t})$ for the moving triangle coordinates over the ray's quantized time interval $\hat{t}$. Fig. 4 is a diagram illustrating an exemplary interpolation circuit configured to generate an interval representation of an interpolated primitive in a motion blur interval, in accordance with some embodiments. The circuit 410 may perform the operations discussed above with reference to element 250 of Fig. 2B.

In the illustrated embodiment, interpolation circuit 410 is configured to receive the interval representation of ray time and the interval representation of the moving triangle (e.g., the x, y, and z intervals for each of the six vertices), and to generate an interval representation of the interpolation triangle (e.g., the x, y, and z intervals for each of the three vertices).

As one example, circuit 410 may determine the interpolated spatial coordinate interval as:

[equation not reproduced in this text]

Using the notation $p^0 = p(0)$ and $p^1 = p(1)$, the interpolated position coordinate interval $\hat{p}(\hat{t})$ that circuit 410 guarantees to cover the quantized time interval $[t, t + \delta t]$, for any $t \in [0, 1 - \delta t]$, is:

[equation not reproduced in this text]

where

$$z = p^0 (1 - t - \delta t) + p^1 t$$

In various embodiments, the equation may provide a good fit with reasonable performance and circuit area. In addition, the interval provided by this equation has been determined to be conservative.

In some embodiments, circuit 410 is configured to determine the $\hat{p}(\hat{t})$ interval according to this equation. Note that in other embodiments, the circuit may implement other equations to determine a conservative interpolated triangle interval; the equations disclosed herein are included for illustrative purposes and are not intended to limit the scope of the present disclosure.
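Since the interval equation itself does not survive in this text, the sketch below shows one conservative bound that is consistent with the z term above. The lower bound is z; the upper bound is derived here from first principles and is an assumption, not the patent's formula. It assumes non-negative quantized coordinates and one-ULP-wide endpoint intervals.

```python
from fractions import Fraction


def interpolate_interval(p0, p1, t, dt, dp=1):
    """Bound a coordinate that moves linearly from a value in [p0, p0 + dp]
    at time 0 to a value in [p1, p1 + dp] at time 1, over times [t, t + dt].

    Assumes p0, p1 >= 0 and 0 <= t <= 1 - dt.
    """
    # Lower bound: smallest endpoints with the smallest weight each can take.
    z = p0 * (1 - t - dt) + p1 * t
    # Upper bound (assumed, derived here): largest endpoints with the largest
    # weights, (p0 + dp) * (1 - t) + (p1 + dp) * (t + dt), rewritten around z.
    hi = z + (p0 + p1) * dt + dp * (1 + dt)
    return z, hi


lo, hi = interpolate_interval(10, 90, Fraction(1, 4), Fraction(1, 16))
```

A hardware implementation would additionally round the lower bound down and the upper bound up to the fixed-point grid, which only widens the interval and so preserves conservativeness.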

In various implementations, the interpolated triangle intervals may be tested using the initial low-precision intersection test, at least in a motion blur mode of operation. Thus, the various primitive inputs discussed below may be for conventional triangles or for interpolated triangles, e.g., depending on whether motion blur is in use. In addition, although the various techniques discussed herein use interval arithmetic, the disclosed interpolated-triangle technique for motion blur may also be used with other quantized representations and techniques (e.g., the techniques of the '542 patent application).

Exemplary shear factor determination

As discussed in the '542 patent application, a shear technique may be used to implement intersection testing. The following discussion uses this naming convention:

P: ray origin, floating-point object space

p: ray origin, fixed-point quantization space

D: ray direction, floating-point object space

v: triangle vertex coordinates, fixed-point quantization space

In some embodiments, the transformation into 2D shear space is given by:

[Equation 1 not reproduced in this text]

To perform these calculations with fixed-point arithmetic, the device may convert the object-space ray quantities P and D into their quantization-space counterparts p and d according to the following:

[Equations 2 and 3 not reproduced in this text]

Before proceeding further, the device may determine which axis of the scaled ray direction has the greatest magnitude and rotate the axis names such that this longest axis is in the third position ("z"). In addition, if that direction component is negative, the device may swap the other two axes to preserve handedness. The following discussion assumes that this renaming has been applied to all Cartesian quantities.
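The axis renaming step can be sketched as an index permutation. The function name and the tie-breaking rule (the first axis with the largest magnitude wins) are assumptions.

```python
def rename_axes(d):
    """Return index permutation (kx, ky, kz) for a 3-component direction d.

    kz is the axis of greatest magnitude; kx and ky follow it cyclically and
    are swapped when d[kz] is negative, which preserves handedness.
    """
    kz = max(range(3), key=lambda i: abs(d[i]))
    kx, ky = (kz + 1) % 3, (kz + 2) % 3  # cyclic rotation of axis names
    if d[kz] < 0:
        kx, ky = ky, kx
    return kx, ky, kz


perm = rename_axes((0.1, -2.0, 0.5))  # y is longest and negative
```

Applying the same permutation to the ray origin and to every vertex keeps all Cartesian quantities in the renamed frame.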

Substituting these into Equation 1 and simplifying yields:

[Equation 4 not reproduced in this text]

In the context of the disclosed interval techniques, the various values represented in Equation 4 are interval representations, as discussed above. Once in 2D shear space, the ray origin has been translated to the origin of the coordinate system and the ray direction is aligned with the z-axis, and the device can test the three directed edges of the 2D triangle represented by the three sheared coordinates v′ ∈ {A′, B′, C′} according to the following:

$$u = A'_x \cdot B'_y - A'_y \cdot B'_x$$

$$v = B'_x \cdot C'_y - B'_y \cdot C'_x$$

$$w = C'_x \cdot A'_y - C'_y \cdot A'_x$$

If u, v, and w all have the same sign, then the triangle covers the origin and thus the ray intersects the triangle, to within numerical precision.
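When the sheared coordinates are intervals, the edge functions u, v, and w above become intervals as well, and the sign test becomes a conservative one. The sketch below assumes each sheared vertex is already given as a pair of (lo, hi) integer intervals; the helper names and the exact miss criterion are illustrative rather than taken from the disclosure.

```python
def i_sub(a, b):
    """Interval subtraction: [a] - [b]."""
    return (a[0] - b[1], a[1] - b[0])


def i_mul(a, b):
    """Interval multiplication: min and max over the four corner products."""
    p = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(p), max(p))


def edge(a, b):
    """Interval enclosing a_x * b_y - a_y * b_x for 2D interval points."""
    return i_sub(i_mul(a[0], b[1]), i_mul(a[1], b[0]))


def initial_test(A, B, C):
    """Report 'miss' only when u, v, w cannot all share a sign for any
    values inside the input intervals; otherwise report 'potential hit'."""
    u, v, w = edge(A, B), edge(B, C), edge(C, A)
    can_all_be_nonneg = all(e[1] >= 0 for e in (u, v, w))
    can_all_be_nonpos = all(e[0] <= 0 for e in (u, v, w))
    if can_all_be_nonneg or can_all_be_nonpos:
        return "potential hit"
    return "miss"
```

Interval arithmetic always encloses the exact result, so if the exact u, v, and w share a sign, the corresponding interval bounds cannot rule that out and the function never reports a false miss. It may report a false hit, which the higher-precision test then resolves.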

Fig. 5 is a block diagram illustrating an exemplary shear factor circuit configured to generate shear factor intervals, according to some embodiments. In the illustrated embodiment, the shear factor circuitry (which may be included in the low precision test circuit 220) includes down-conversion circuits 510A-510C, subtraction circuits 520A-520B, reciprocal circuit 530, interval multiplication and scale adjustment circuits 540A-540B, and floating-point to fixed-point interval conversion circuits 550A-550B. In some embodiments, the circuit of Fig. 5 implements the functionality of element 230 of Fig. 2B.

In the illustrated embodiment, the down-conversion circuits 510 are configured to convert the x, y, and z direction components (after rotation, such that the longest axis is the z direction) into reduced-precision floating-point interval representations. In some implementations, the down-conversion rounds toward negative infinity (RTNI) to generate the lower interval bound and rounds toward positive infinity (RTPI) to generate the upper interval bound.
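Directed rounding during down-conversion can be sketched in software by truncating the significand in both directions. This illustration assumes a target format described only by its number of significant bits (`mant_bits`) and ignores exponent range limits; neither detail comes from the disclosure.

```python
import math
from fractions import Fraction


def down_convert(x, mant_bits):
    """Return floats (lo, hi), each with at most mant_bits significant bits,
    such that lo <= x <= hi."""
    if x == 0.0:
        return 0.0, 0.0
    m, e = math.frexp(x)  # x == m * 2**e with 0.5 <= |m| < 1
    scaled = Fraction(m) * (1 << mant_bits)
    lo = math.ldexp(float(math.floor(scaled)), e - mant_bits)  # RTNI
    hi = math.ldexp(float(math.ceil(scaled)), e - mant_bits)   # RTPI
    return lo, hi


lo, hi = down_convert(0.1, 8)  # brackets 0.1 with 8 significant bits
```

For negative inputs the floor moves the lower bound away from zero and the ceiling moves the upper bound toward zero, so the interval still brackets the input.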

In the illustrated embodiment, the subtraction circuits 520 are each configured to subtract the x or y scale value from the z scale value to generate, in an unsigned integer representation, the results of the divisions S_z/S_x and S_z/S_y. In some implementations, the scale values are powers of two, such that subtraction of the exponents corresponds to division. These scale factors may be determined based on the quantization frame of the primitive. In general, a set of quantized values may share a "quantization frame" of parameters that define the values. In some embodiments, the quantized values are represented as fixed-point offsets relative to a common origin and scale factor. Thus, the quantization frame may specify an origin (e.g., in x, y, and z coordinates) and scale factors (e.g., a power-of-two scale factor for each of the x, y, and z dimensions). The quantized primitive intervals discussed herein may be represented using fixed-point coordinates that are interpreted in the context of a quantization frame. Note that in the illustrated example, the output of circuit 520 is not an interval.
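The correspondence between exponent subtraction and division for power-of-two scale factors can be illustrated with the following minimal sketch. The function name `apply_scale_ratio` and the use of signed Python integers for the exponents are assumptions of this example.

```python
import math


def apply_scale_ratio(value, exp_z, exp_axis):
    """Multiply `value` by S_z / S_axis, where S_z = 2**exp_z and S_axis = 2**exp_axis.

    Only an exponent subtraction and an exponent adjustment are needed;
    no divider or general multiplier is involved.
    """
    return math.ldexp(value, exp_z - exp_axis)
```

For example, with S_z = 2^5 and S_x = 2^3 the ratio is 2^(5-3) = 4, so a value of 3.0 is scaled to 12.0.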

In the illustrated embodiment, reciprocal circuit 530 is configured to generate a reciprocal of the down-converted z-direction value.

In the illustrated embodiment, the interval product and scale adjustment circuits 540 are configured to perform an interval product operation on their inputs to generate an output in a reduced-precision floating-point interval format. In some embodiments, the circuits 540 are configured to clamp their outputs to the range [-1, 1]. In some implementations, the circuits 540 also apply the scaling from circuits 520, using an exponent adjustment to multiply by a power of two.

In the illustrated embodiment, the floating-point to fixed-point interval conversion circuits 550 are configured to convert the reduced-precision floating-point interval representations into fixed-point interval representations of the shear factors D_x·S_z/(D_z·S_x) and D_y·S_z/(D_z·S_y) (which are input to the circuit of fig. 6 discussed below).

Fig. 6 is a diagram illustrating an exemplary circuit configured to translate and shear vertices using the shear factor intervals, according to some embodiments. For example, fig. 6 may use interval arithmetic to implement the operations of equation (4) above. Fig. 6 may implement the operations discussed above with reference to element 260 of fig. 2B. In the illustrated embodiment, the circuitry receives vertex and ray position data in the form of intervals and is configured to perform interval subtraction and multiplication operations to generate translated and sheared vertices using the shear factor intervals generated by the circuit of fig. 5. In some embodiments, each output of fig. 6 is an interval; in fig. 7, the lower bound of such an interval is denoted with a minus sign (e.g., a_y-) and the upper bound with a plus sign (e.g., a_y+).

Fig. 7 is a block diagram illustrating an exemplary circuit configured to perform an initial reduced-precision intersection test, according to some embodiments. In some embodiments, the circuit of fig. 7 implements the functionality of element 270 of fig. 2B. For example, fig. 7 may perform operations corresponding to the above equations for u, v, and w based on the outputs of fig. 6 to generate an intersection result. Note that the circuit of fig. 7 differs from those equations in some respects. First, the circuit performs comparisons rather than subtractions (e.g., A′_x·B′_y < A′_y·B′_x rather than A′_x·B′_y - A′_y·B′_x), because only the sign is needed. Second, in the illustrated embodiment, the circuit of fig. 7 performs double-sided multiplications to provide a conservative test (e.g., considering only the "outer" part of the edge intervals), because the circuit does not know which way is "outward": it may be considering either a clockwise or a counter-clockwise winding of the triangle. The circuit 710 is configured to generate a modified interval product and is discussed in detail below with reference to fig. 8.

The exemplary AND and OR logic of fig. 7 provides a result indicating whether the reduced-precision test yields a deterministic miss. As shown, six double-sided edge tests may use 12 multipliers and 6 comparators, all fixed-point. Note that various circuits may be combined or merged; e.g., an adder and a subtractor may be implemented by a single component that performs both operations in parallel, and multipliers and a comparator may be merged to implement a single ab <= cd operation.
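As a non-limiting illustration of resolving an edge sign by comparison rather than subtraction, the following sketch evaluates the sign of a·b - c·d when all four operands are intervals. It uses a general four-product interval multiply for simplicity, which is an assumption of this sketch and not the multiplier arrangement of fig. 7; the names `interval_mul` and `edge_sign` are hypothetical.

```python
def interval_mul(a, b):
    """General interval product of a = (a_lo, a_hi) and b = (b_lo, b_hi)."""
    products = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(products), max(products))


def edge_sign(ax, ay, bx, by):
    """Sign of ax*by - ay*bx for interval inputs.

    Returns +1 or -1 when the sign holds for every choice of values inside
    the intervals, and 0 when the sign cannot be resolved.
    """
    p = interval_mul(ax, by)
    q = interval_mul(ay, bx)
    if p[0] > q[1]:      # smallest ax*by still exceeds largest ay*bx
        return 1
    if p[1] < q[0]:      # largest ax*by is still below smallest ay*bx
        return -1
    return 0
```

Integer inputs model fixed-point coordinates here, so the products and comparisons are exact.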

As discussed above, if there is a non-deterministic result (potential hit), the processor may perform a higher precision intersection test (e.g., using the original floating point representation).

Exemplary modified interval product

Typically, a signed interval product requires four multipliers, as defined:

In some embodiments, two multipliers are used to implement the interval product. To fully resolve the sign of a sum of interval products, the sign of both endpoints of each interval product must be resolved accurately. This can be accomplished using only two multipliers per interval product unless both interval inputs to the interval product span the origin. In that case, the hardware may raise an exception and the intersection test may record a potential hit. Empirical data suggests that such exceptions may be rare under typical workloads. Code Listing 1 uses only two hardware multipliers to implement the modified signed interval product.
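The following is a hedged software sketch of the idea, not a reproduction of the code listing referenced above. It assumes the conventional definition of the signed interval product as the minimum and maximum of the four endpoint products, selects two of those products based on the signs of the inputs, and returns `None` to model the exception case. The names `interval_mul4` and `interval_mul2` are hypothetical.

```python
def interval_mul4(a, b):
    """Reference interval product using four multiplications."""
    p = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(p), max(p))


def interval_mul2(a, b):
    """Interval product using two multiplications, chosen from the input signs.

    Returns None (modeling an exception) when both inputs straddle zero.
    """
    a_lo, a_hi = a
    b_lo, b_hi = b
    if a_lo >= 0:                      # a is non-negative
        if b_lo >= 0:
            return (a_lo * b_lo, a_hi * b_hi)
        if b_hi <= 0:
            return (a_hi * b_lo, a_lo * b_hi)
        return (a_hi * b_lo, a_hi * b_hi)
    if a_hi <= 0:                      # a is non-positive
        if b_lo >= 0:
            return (a_lo * b_hi, a_hi * b_lo)
        if b_hi <= 0:
            return (a_hi * b_hi, a_lo * b_lo)
        return (a_lo * b_hi, a_lo * b_lo)
    if b_lo >= 0:                      # a straddles zero, b does not
        return (a_lo * b_hi, a_hi * b_hi)
    if b_hi <= 0:
        return (a_hi * b_lo, a_lo * b_lo)
    return None                        # both straddle zero: exception case
```

In this sketch a caller would treat a `None` result as a potential hit, consistent with the conservative behavior described above.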


Fig. 8 illustrates one example of a circuit 810 configured to implement the modified signed interval product, according to some embodiments. In some embodiments, the circuit of fig. 8 is included in each corresponding element 710 of fig. 7. In this embodiment, routing circuitry 810 is controlled by the four sign bits of the inputs to route operands to the two multipliers, e.g., as set forth in Code Listing 1. In this example, the circuit 810 is also configured to detect the exception condition.

Exemplary encoding and processing techniques for triangle pairs

Fig. 9 is a diagram illustrating an exemplary triangle pair and sequential pair processing circuit, according to some embodiments. As shown, triangle pair 910 is a group of two triangles that share two vertices (vertex 1 and vertex 2 in the illustrated example). Thus, the two triangles may be defined by four vertices. Given that triangle pairs are common in various models, in some embodiments the processor is configured to store triangles using a triangle data structure with four vertices, which may reduce storage requirements.

In some embodiments, the processor includes a sequential pair processing circuit 920 configured to sequentially perform one or more operations on the triangles of a pair, e.g., processing one triangle in the pair before processing the second triangle in the pair. As one example, the operation may be the initial intersection test, but other circuits may use similar sequential techniques. This may provide efficient processing in implementations where the same triangle pair structure is used for all triangles but some structures hold data for only a single triangle. In these embodiments, if the data structure indicates that only one triangle is encoded, then sequential pair processing circuit 920 may skip the operation for the second triangle in the pair.
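One possible software model of such a four-vertex structure is sketched below. The class name `TrianglePair`, the `has_second` flag, and the particular vertex order chosen for the second triangle are assumptions for illustration; the text above only states that vertices 1 and 2 are shared.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple

Vertex = Tuple[float, float, float]


@dataclass
class TrianglePair:
    vertices: Tuple[Vertex, Vertex, Vertex, Vertex]  # v0..v3; v1 and v2 are shared
    has_second: bool = True                          # False when only one triangle is encoded


def triangles(pair: TrianglePair) -> Iterator[Tuple[Vertex, Vertex, Vertex]]:
    """Yield the triangles of a pair one after another (sequential processing)."""
    v0, v1, v2, v3 = pair.vertices
    yield (v0, v1, v2)
    if pair.has_second:
        # Assumed order that keeps the winding consistent across the shared edge.
        yield (v2, v1, v3)
```

A consumer that iterates over `triangles(pair)` naturally skips the second operation when the structure encodes a single triangle.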

Exemplary deterministic hit detection Using lower precision intersection testing

In some implementations, intersection test circuitry operating on quantized inputs may still provide deterministic information about whether a line corresponding to a ray intersects a primitive, which may be useful for certain types of rays. Thus, referring back to the example of FIG. 7, modified compare circuitry (in addition to or instead of the circuitry of FIG. 7) may be implemented to provide results indicating whether a deterministic hit or a non-deterministic hit occurred.

FIG. 10 is a diagram illustrating exemplary regions bounded by a quantized representation of a two-dimensional (e.g., sheared) triangle primitive. In the illustrated example, edge 1010 shows the exact edge, e.g., as it would be represented at the original precision. The outer boundary 1020 and the inner boundary 1030 show the bounds of the quantized representation, e.g., using an interval representation.

As shown, rays that fall in the region outside boundary 1020 are deterministic misses, e.g., as can be detected by the circuit of fig. 7. Rays that fall in the region between boundaries 1020 and 1030 are indeterminate (e.g., because the exact location of the triangle edge within that region is not known). Rays that fall in this region may require higher precision testing.

As shown, a ray that falls in the region inside boundary 1030 is a deterministic hit for the line corresponding to the ray. Note that an intersection detected by this test may not accurately indicate where the hit occurs, e.g., due to quantization. In addition, an intersection detected by this test may only indicate a hit for the line corresponding to the ray, e.g., due to quantization of the ray's valid interval.

However, in some implementations, it may be useful to determine a deterministic hit in the region within boundary 1030 even if there are limitations discussed above.
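The three regions can be modeled in software as a classification over per-edge sign results, as in the following minimal sketch. It assumes each edge test reports +1 or -1 when its sign is certain over the whole interval and 0 when it is not; the function name `classify` and the string labels are hypothetical.

```python
def classify(edge_signs):
    """Classify a ray against a quantized triangle from three edge sign results.

    Each entry is +1 or -1 for a certain sign, or 0 when the sign is unresolved.
    """
    has_positive = any(s > 0 for s in edge_signs)
    has_negative = any(s < 0 for s in edge_signs)
    if has_positive and has_negative:
        return "miss"            # certain disagreement: outside the outer boundary
    if all(s > 0 for s in edge_signs) or all(s < 0 for s in edge_signs):
        return "hit"             # certain agreement: inside the inner boundary
    return "indeterminate"       # higher precision testing may be needed
```

As noted above, a "hit" from such a test concerns the line corresponding to the ray, so additional conditions may be checked before the result is recorded.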

Fig. 11 is a block diagram illustrating an exemplary low precision test circuit 1120 configured to indicate either that there is a hit or that it cannot be determined whether there is a hit. Fig. 12, discussed in detail below, provides a detailed example of such a circuit. Note that circuit 1120 may also provide an output indicating either that there is a miss or that it cannot be determined whether there is a miss (e.g., if the circuits of fig. 7 and fig. 12 are combined).

In some implementations, the processor may skip the higher precision intersection test in certain cases where the output of circuit 1120 indicates a deterministic hit. In some embodiments, a ray query may be terminated in this way under the following conditions: the ray is an any-hit ray, the triangle is opaque, and the active ray interval completely covers at least one bounding volume that completely encloses the triangle. In some implementations, triangle opacity can be determined based on whether an alpha map is to be tested. Whether the active ray interval completely covers at least one bounding volume that completely encloses the triangle may be determined based on the traversal of the ADS (the structure of which allows determining which bounding volumes completely enclose the triangle) and slab test circuitry configured to test the traversed bounding volumes.

Under these conditions, the processor may record ray-triangle intersection hits without performing higher precision testing. This may advantageously improve performance, reduce power consumption, or both when any hit rays are processed. It should be noted that the conditions discussed above are included for illustrative purposes; in other embodiments, only a subset of these conditions may be checked, other conditions may be applied, and so forth.

FIG. 12 is a circuit diagram, similar to the diagram of fig. 7, showing a deterministic hit test circuit according to some embodiments. In the illustrated embodiment, the circuits 710 are configured as described above with reference to fig. 7 and fig. 8. However, their outputs are routed differently to the comparators in order to provide a hit or an indeterminate result. In some embodiments, the comparators, AND gates, and OR gates shown in fig. 12 are included in addition to the circuitry shown in fig. 7, such that the quantized intersection test circuit outputs two Boolean results for a given test.

Code Listing 2 provides exemplary operations that may be implemented by the circuit of fig. 12 or other similar circuitry.


Exemplary traversal techniques to potentially reduce intersection testing

Ray intersection computation is typically facilitated by an acceleration data structure (ADS). To implement ray intersection queries efficiently, a spatial data structure may reduce the number of ray-surface intersection tests, thereby accelerating the query process. A common class of ADS is the bounding volume hierarchy (BVH), in which surface primitives are enclosed in a hierarchy of geometric proxy volumes (e.g., boxes) that are cheaper to test for intersection. These volumes may be referred to as bounding regions. By traversing the data structure and performing proxy intersection tests along the way, the graphics processor locates a conservative set of candidate intersection primitives for a given ray. A common form of BVH uses 3D axis-aligned bounding boxes (AABBs). Once constructed, an AABB BVH may be used for all ray queries and is a view-independent structure. In some embodiments, these structures are constructed once for each distinct mesh in a scene, in the local object space or model space of that object, and rays are transformed from world space into the local space before traversing the BVH. This may allow geometry instancing of a single mesh with many rigid transforms and material properties (analogous to instancing in rasterization). Animated geometry typically requires the data structure to be rebuilt (sometimes with a less expensive update operation known as "refitting"). For non-real-time use cases, in which millions or billions of rays are traced against a single scene in a single frame, the cost of ADS construction is amortized to the point of being effectively "free". In a real-time context, however, there is typically a delicate trade-off between build cost and traversal cost, with more efficient structures typically being more costly to build.

In some embodiments, the intersection circuitry is configured to traverse a BVH ADS that uses 3D axis-aligned boxes as its bounding volumes. The ADS may have a maximum branching factor (e.g., 2, 4, 8, 16, etc.) and a flexible user-defined payload (e.g., the content at the leaves) that does not assume triangle geometry. In some embodiments, a depth-first search is performed, for example, as discussed in U.S. patent application Ser. No. 17/103,317, filed 11/24/2020, which is incorporated herein by reference in its entirety.

In some embodiments, RIA 190 is configured to use a modified ordering of the child nodes of a given node for a particular type of depth-first traversal. In some embodiments, the disclosed techniques are applied to secondary rays. A secondary ray is a ray that is cast from the intersection location between a first (traced) ray and a surface. Many any-hit rays are secondary rays, due to the types of effects that are typically implemented with any-hit rays (e.g., shadows). Thus, a secondary ray originates near the intersected surface and points away from that surface (and therefore does not intersect that particular surface).

Due to the nature of secondary rays, the inventors have recognized that traversing the child nodes of an intersected bounding volume from front to back or from back to front can often result in intersection tests that miss. For example, with front-to-back ordering, a ray may intersect the bounding volume of the primitive from which the secondary ray was reflected (triggering an intersection test) without actually intersecting that primitive.

FIG. 13 is a diagram comparing a front-to-back ordering of the intersected child nodes of an acceleration data structure with an ordering that starts from the middle, according to some embodiments. In the illustrated example, the secondary ray is a reflection based on the intersection of another ray (not shown) with primitive A. As shown, the ray ends at a light source (this may be because ray tracing typically traces rays backwards from the camera toward the light sources, to avoid processing irrelevant rays). In this example, the ray is an any-hit ray and intersects primitive C.

Consider the exemplary case in which the ray intersects the bounding volume of each of the exemplary primitives, and the exemplary primitives are all children of a node corresponding to a larger bounding volume. In this example, the traversal circuitry may use various orderings of the child nodes to decide which to search first in a depth-first search.

As shown, using front-to-back ordering, in which the bounding volumes nearer to the ray origin are traversed first, the intersection tests for primitives A and B result in misses before the hit on primitive C is finally detected and the query ends (because this is an any-hit ray). The miss for primitive A is unsurprising, given that the exemplary secondary ray was generated by another ray's intersection with primitive A.

Using an ordering that starts from the middle advantageously provides faster hit detection, which in this example requires two fewer intersection tests than the front-to-back ordering. As shown, starting from the middle of the ray results in a hit on primitive C, and the query may end without testing primitives D, A, or B.

In some embodiments, various techniques may be used to prioritize one or more intermediate nodes relative to the front and back nodes. As an example, consider a tree ADS with a branching factor N. The intersection circuitry may first sort the child nodes whose bounding volumes are intersected into front-to-back order. For M ≤ N intersected child nodes (indexed 0 through M-1), the intersection circuitry may then reorder the intersected nodes according to one of the following sequences, depending on whether M is odd or even.

If M is odd and division refers to integer division (e.g., 3/2=1), then the following is an exemplary reordered sequence of sub-indices:

M/2

M/2+1

M/2-1

M/2+2

M/2-2

...

M/2+M/2 = M-1

M/2-M/2 = 0

If M is even, then the following is an exemplary reordered sequence of child indices:

M/2

M/2-1

M/2+1

M/2-2

M/2+2

...

M/2+(M/2-1) = M-1

M/2-M/2 = 0

In some hardware implementations, for a maximum branching factor of N, the circuit may encode a table of reordered sequences, one for each value of M from 1 to N, to quickly determine the traversal order that starts from the middle. As one non-limiting example, if N = 8, the table may include the following sequences:

For M = 1: [0]

For M = 2: [1, 0]

For M = 3: [1, 2, 0]

For M = 4: [2, 1, 3, 0]

For M = 5: [2, 3, 1, 4, 0]

For M = 6: [3, 2, 4, 1, 5, 0]

For M = 7: [3, 4, 2, 5, 1, 6, 0]

For M = 8: [4, 3, 5, 2, 6, 1, 7, 0]
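As a non-limiting illustration, the exemplary sequences above can be generated in software as follows. The function name `middle_out_order` is hypothetical; the generator starts at index M/2 (integer division) and alternates outward, stepping toward higher indices first when M is odd and toward lower indices first when M is even.

```python
def middle_out_order(m):
    """Return the reordered child indices for m intersected children (m >= 1)."""
    mid = m // 2
    order = [mid]
    # Odd m: visit mid+1 before mid-1. Even m: visit mid-1 before mid+1.
    first, second = (1, -1) if m % 2 else (-1, 1)
    step = 1
    while len(order) < m:
        for direction in (first, second):
            index = mid + direction * step
            if 0 <= index < m:
                order.append(index)
        step += 1
    return order


# A table for a maximum branching factor of 8, built once and then looked up by m.
ORDER_TABLE = {m: middle_out_order(m) for m in range(1, 9)}
```

Precomputing the table mirrors the lookup approach described above, in which the sequence is selected by the number of intersected children rather than computed for each node.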

It should be noted that the particular sequences discussed herein are included for illustrative purposes and are not intended to limit the scope of the present disclosure. In other embodiments, various ordering may be implemented, with one or more internal child nodes prioritized over the front/back nodes.

In implementations that use a binary tree (N = 2), the traversal circuitry may alternate between back-to-front and front-to-back traversal orders when searching the child nodes at different levels of the tree (e.g., front-to-back at odd depths in the tree and back-to-front at even depths in the tree, or vice versa).

Exemplary method

FIG. 14 is a flowchart illustrating an exemplary method for performing an initial intersection test, according to some embodiments. The method shown in fig. 14 may be used in conjunction with any of the computer circuits, systems, devices, elements, or components disclosed herein, among others. In various embodiments, some of the illustrated method elements may be performed concurrently, in a different order than shown, or may be omitted. Additional method elements may also be performed as desired.

At 1410, in the illustrated embodiment, the graphics processor quantizes a first representation of a primitive to generate a reduced-precision interval representation of the primitive, wherein the interval representation includes interval values that are guaranteed to cover the corresponding values specified by the first representation of the primitive. In some embodiments, the quantization of the first representation of the primitive uses a fixed-point quantized representation that is rounded toward zero for the lower bound of an interval, and the lower bound plus one unit of least precision (ULP) for the upper bound of the interval.
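A minimal software sketch of this quantization, under stated assumptions, is shown below. It assumes a quantization frame consisting of an origin and a power-of-two scale (here given as an exponent), and it assumes the value is not below the origin, so that rounding toward zero yields a valid lower bound. The names `quantize_interval` and `dequantize` are hypothetical.

```python
import math


def quantize_interval(value, origin, scale_exp):
    """Return fixed-point bounds (q_lo, q_hi), in ULPs of 2**scale_exp from `origin`."""
    ulp = math.ldexp(1.0, scale_exp)
    offset = (value - origin) / ulp
    if offset < 0:
        raise ValueError("this sketch assumes value >= origin")
    q_lo = int(offset)   # truncation, i.e. round toward zero
    q_hi = q_lo + 1      # lower bound plus one ULP
    return q_lo, q_hi


def dequantize(q, origin, scale_exp):
    """Map a fixed-point offset back to the original coordinate space."""
    return origin + math.ldexp(float(q), scale_exp)
```

For example, with an origin of 1.0 and a ULP of 1/16, the value 1.3 is covered by the fixed-point interval [4, 5], which corresponds to [1.25, 1.3125].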

At 1420, in the illustrated embodiment, the graphics processor quantizes a first representation of a ray to generate a reduced-precision interval representation of the ray, wherein the interval representation includes interval values that are guaranteed to cover the corresponding values specified by the first representation of the ray. In some embodiments, the reduced-precision interval representation of the ray includes a quantized ray time that is represented as an interval. In some implementations, the circuitry generates the reduced-precision interval representation of the primitive based on first and second positions of the primitive at different points within a motion blur time interval, such that the reduced-precision interval representation of the primitive covers all possible positions of the primitive during the interval that represents the quantized ray time.
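As one hedged illustration of covering a moving primitive over a ray time interval, the following sketch bounds a single vertex coordinate. It assumes linear motion between a position at time 0 and a position at time 1, which is an assumption of this example rather than a stated property of the disclosed circuitry; the name `moving_coordinate_interval` is hypothetical, and floating-point rounding of the endpoints is not widened here.

```python
def moving_coordinate_interval(p0, p1, t_lo, t_hi):
    """Bound a coordinate that moves linearly from p0 (t = 0) to p1 (t = 1)
    over the ray time interval [t_lo, t_hi]."""
    at_lo = p0 + (p1 - p0) * t_lo
    at_hi = p0 + (p1 - p0) * t_hi
    # Linear motion is monotonic, so the extremes occur at the interval ends.
    return (min(at_lo, at_hi), max(at_lo, at_hi))
```

The resulting bounds could then be quantized outward so that the reduced-precision interval still covers every position the coordinate takes during the time interval.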

At 1430, in the illustrated embodiment, the graphics processor uses interval arithmetic to determine an initial intersection result based on the coordinates of the interval representation of the primitive and the coordinates of the interval representation of the ray, wherein a miss indicated by the initial intersection result is guaranteed not to be a hit for the first representation of the primitive and the first representation of the ray.

In some implementations, in response to an initial intersection result that indicates a potential hit, the graphics processor is configured to perform an intersection test using the first representation of the primitive and the first representation of the ray.

In some embodiments, shear factor circuitry generates an interval representation of a shear factor based on ray direction information and scale information, and generates sheared vertex intervals based on the quantized representation of the primitive and the interval representation of the shear factor. In some embodiments, the initial intersection result is based on the sheared vertex intervals. In some embodiments, the shear factor circuitry is configured to use: a first precision to represent the coordinate of the ray origin in the coordinate direction that provides a threshold (e.g., the largest) contribution to the ray direction vector (e.g., the axis renamed to the z-direction); and a second, higher precision to represent the coordinates of the ray origin in the other directions.

In some implementations, the first representation of primitives is a representation of a triangle pair that includes at most four vertices of two triangle primitives in the triangle pair, wherein the graphics processor includes circuitry configured to sequentially process triangles in the given triangle pair.

FIG. 15 is a flowchart illustrating an exemplary method for performing an initial intersection test, according to some embodiments. The method shown in fig. 15 may be used in conjunction with any of the computer circuits, systems, devices, elements, or components disclosed herein, among others. In various embodiments, some of the illustrated method elements may be performed concurrently, in a different order than shown, or may be omitted. Additional method elements may also be performed as desired.

At 1510, in the illustrated embodiment, the graphics processor performs an intersection test, wherein the intersection test operates on the reduced-precision representation of the ray generated by quantizing the initial representation of the ray and the reduced-precision representation of the primitive generated by quantizing the initial representation of the primitive. In an example embodiment, the intersection test generates a first result of the first ray and the first primitive, wherein the first result indicates that the first ray intersected the first primitive according to their initial representation. In some implementations, the intersection test can also generate a second result for the second ray and the first primitive, where the second result indicates that it is not possible to determine whether the second ray intersects the first primitive. The graphics processor may perform intersection testing on the second ray using the second ray and the initial representation of the first primitive. Intersection testing may be performed based on a traversal of an acceleration data structure comprising hierarchically arranged bounding volumes for at least a portion of a graphical scene.

At 1520, in the illustrated embodiment, the graphics processor records an intersection of the first ray with the first primitive based on the first result, without performing an intersection test that uses the initial representations of the first ray and the first primitive. In the illustrated embodiment, the intersection is recorded based on: the first result; a determination that the first primitive is opaque; and a determination that at least one bounding volume in the acceleration data structure encloses the entire first primitive and that the entire portion of the first ray enclosed by that bounding volume is active.

In some embodiments, the graphics processor is configured to record the intersection of the first ray based on the query for the first ray being an any-hit query (and may not record deterministic intersection results based on the reduced-precision test for other types of queries).

In some embodiments, the test circuit is further configured to output a result for the first ray and the first primitive that indicates either that, according to their initial representations, the first ray misses the first primitive, or that it cannot be determined whether the first ray misses the first primitive. For example, the processor may include the comparators and logic circuits of both fig. 7 and fig. 12. For the first ray and the first primitive in the example discussed above, this output would indicate that it cannot be determined whether the first ray misses the first primitive, because the other output indicates a deterministic hit.

In some embodiments, the processor uses a traversal order that starts from the middle for at least some types of rays. In some embodiments, the processor is configured to perform the intersection test based on a traversal (e.g., by traversal circuitry) of an acceleration data structure that includes nodes corresponding to hierarchically arranged bounding volumes. In particular, the processor may perform a depth-first search of the acceleration data structure and, for a set of child nodes of a first node in the acceleration data structure, select the next node for the depth-first search according to an ordering of the intersected bounding volumes of the set of child nodes, wherein the ordering begins with a bounding volume that is closer to the midpoint of the ray being tested than one or more front bounding volumes and one or more back bounding volumes.

In some embodiments, before determining the ordering, the processor determines the number of nodes in the set of child nodes, wherein the set of child nodes corresponds to the nodes that are each intersected by the ray being tested. For example, once the number of intersected child nodes is determined, the processor may access a lookup table to determine the ordering. In some embodiments, the ray being tested is an any-hit ray, and the traversal for the ray being tested ends in response to detecting an intersection. In some embodiments, subsequent nodes in the ordering alternate between forward and backward nodes relative to the starting node. As used herein, a "forward" node is closer to the end of the ray and a "backward" node is closer to the origin of the ray. The exemplary orderings discussed above with reference to fig. 13 are examples of alternating between forward and backward nodes starting from an intermediate node.
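The interaction between a lookup-table ordering and early termination for an any-hit ray can be modeled with the following simplified sketch. The tuple-based node format, the function names, and the four-primitive scene are hypothetical; the scene is loosely modeled on the fig. 13 discussion (primitives A, B, C, D in front-to-back order, with only C hit), and the children listed under an inner node are assumed to be those whose bounding volumes the ray already intersects.

```python
# Middle-first orderings for 1 to 4 intersected children (see the exemplary table).
ORDER_TABLE = {1: [0], 2: [1, 0], 3: [1, 2, 0], 4: [2, 1, 3, 0]}


def front_to_back(m):
    return list(range(m))


def middle_first(m):
    return ORDER_TABLE[m]


def any_hit(node, order_for, tested):
    """Depth-first any-hit query that stops at the first primitive hit.

    node is ("leaf", name, is_hit) or ("inner", [(entry_distance, child), ...]).
    Names of tested primitives are appended to `tested`.
    """
    if node[0] == "leaf":
        _, name, is_hit = node
        tested.append(name)
        return is_hit
    children = sorted(node[1], key=lambda entry: entry[0])  # front-to-back sort
    for index in order_for(len(children)):                  # then table reorder
        if any_hit(children[index][1], order_for, tested):
            return True                                     # any-hit: end the query
    return False


def example_scene():
    leaf = lambda name, is_hit: ("leaf", name, is_hit)
    return ("inner", [(0.0, leaf("A", False)), (1.0, leaf("B", False)),
                      (2.0, leaf("C", True)), (3.0, leaf("D", False))])
```

In this model, front-to-back ordering tests A, B, and C before the query ends, whereas the middle-first ordering tests only C, which is two fewer intersection tests.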

Example apparatus

Referring now to fig. 16, a block diagram of an exemplary embodiment of an exemplary device 1600 is shown. In some embodiments, elements of device 1600 may be included within a system-on-chip. In some embodiments, device 1600 may be included in a mobile device that may be battery powered. Thus, power consumption of device 1600 may be an important design consideration. In the illustrated embodiment, device 1600 includes fabric 1610, compute complex 1620, input/output (I/O) bridge 1650, cache/memory controller 1645, graphics unit 1675, and display unit 1665. In some embodiments, device 1600 may include other components (not shown) such as video processor encoders and decoders, image processing or recognition elements, computer vision elements, and the like, in addition to or in place of the components shown.

The fabric 1610 may include various interconnects, buses, muxes, controllers, etc., and may be configured to facilitate communication between the various elements of the device 1600. In some embodiments, portions of fabric 1610 may be configured to implement a variety of different communication protocols. In other embodiments, fabric 1610 may implement a single communication protocol, and elements coupled to fabric 1610 may convert internally from the single communication protocol to other communication protocols.

In the illustrated embodiment, compute complex 1620 includes bus interface unit (BIU) 1625, cache 1630, and cores 1635 and 1640. In various embodiments, compute complex 1620 may include various numbers of processors, processor cores, and caches. For example, compute complex 1620 may include 1, 2, or 4 processor cores, or any other suitable number. In one embodiment, cache 1630 is a set-associative L2 cache. In some implementations, the cores 1635 and 1640 may include internal instruction and data caches. In some embodiments, a coherence unit (not shown) in fabric 1610, cache 1630, or elsewhere in device 1600 may be configured to maintain coherence between the various caches of device 1600. BIU 1625 may be configured to manage communications between compute complex 1620 and other elements of device 1600. Processor cores, such as cores 1635 and 1640, may be configured to execute instructions of a particular instruction set architecture (ISA), which may include operating system instructions and user application instructions.

The cache/memory controller 1645 may be configured to manage data transfers between fabric 1610 and one or more caches and memory. For example, the cache/memory controller 1645 may be coupled to an L3 cache, which in turn may be coupled to system memory. In other implementations, the cache/memory controller 1645 may be directly coupled to memory. In some implementations, the cache/memory controller 1645 may include one or more internal caches.

As used herein, the term "coupled to" may refer to one or more connections between elements, and a coupling may include intermediate elements. For example, in FIG. 16, graphics unit 1675 may be described as being "coupled to" memory through fabric 1610 and cache/memory controller 1645. In contrast, in the illustrated embodiment of fig. 16, graphics unit 1675 is "directly coupled" to fabric 1610 because there are no intervening elements.

Graphics unit 1675 may include one or more processors, such as one or more graphics processing units (GPUs). For example, graphics unit 1675 may receive graphics-oriented instructions, such as Metal instructions. Graphics unit 1675 may execute special purpose GPU instructions or perform other operations based on the received graphics-oriented instructions. Graphics unit 1675 may generally be configured to process large blocks of data in parallel and may construct images in a frame buffer for output to a display, which may be included in the device or may be a separate device. Graphics unit 1675 may include transformation, lighting, triangle, and rendering engines in one or more graphics processing pipelines. Graphics unit 1675 may output pixel information for displaying an image. In various embodiments, graphics unit 1675 may include programmable shader circuitry, which may include highly parallel execution cores configured to execute graphics programs that may include pixel tasks, vertex tasks, and compute tasks (which may or may not be graphics-related).

In some embodiments, graphics unit 1675 includes circuitry 220, which may reduce power consumption, improve performance, or both, relative to a conventional GPU.

Display unit 1665 may be configured to read data from a frame buffer and provide a stream of pixel values for display. In some embodiments, display unit 1665 may be configured as a display pipeline. Additionally, display unit 1665 may be configured to blend multiple frames to produce an output frame. Further, display unit 1665 may include one or more interfaces (e.g., embedded display port (eDP)) for coupling to a user display (e.g., a touchscreen or an external display).

I/O bridge 1650 may include various elements configured to implement, for example, Universal Serial Bus (USB) communications, security, audio, and low-power always-on functionality. I/O bridge 1650 may also include interfaces such as pulse-width modulation (PWM), general-purpose input/output (GPIO), serial peripheral interface (SPI), and inter-integrated circuit (I2C). Various types of peripherals and devices may be coupled to device 1600 via I/O bridge 1650.

In some embodiments, device 1600 includes network interface circuitry (not explicitly shown), which may be connected to fabric 1610 or I/O bridge 1650. The network interface circuitry may be configured to communicate via various networks, which may be wired, wireless, or both. For example, the network interface circuitry may be configured to communicate via a wired local area network, a wireless local area network (e.g., via WiFi), or a wide area network (e.g., the Internet or a virtual private network). In some embodiments, the network interface circuitry is configured to communicate via one or more cellular networks that use one or more radio access technologies. In some embodiments, the network interface circuitry is configured to communicate using device-to-device communications (e.g., Bluetooth or WiFi Direct), etc. In various embodiments, the network interface circuitry may provide device 1600 with connectivity to various types of other devices and networks.

Exemplary application

Turning now to fig. 17, various types of systems are shown that may include any of the circuits, devices, or systems described above. The system or device 1700, which may incorporate or otherwise utilize one or more of the techniques described herein, may be used in a wide variety of fields. For example, the system or device 1700 may be used as part of the hardware of a system such as a desktop computer 1710, a laptop computer 1720, a tablet 1730, a cellular or mobile telephone 1740, or a television 1750 (or a set top box coupled to a television).

Similarly, the disclosed elements may be used with a wearable device 1760, such as a smart watch or health monitoring device. In many embodiments, the smart watch may implement a variety of different functions, for example, access to email, cellular service, calendar, health monitoring, etc. A wearable device may also be designed solely to perform health monitoring functions, such as monitoring a user's vital signs, performing epidemiological functions such as contact tracing, providing communication to emergency medical services, and so forth. Other types of devices are also contemplated, including devices worn on the neck, devices implantable in the human body, and glasses or helmets designed to provide computer-generated reality experiences, such as those based on augmented reality and/or virtual reality.

The system or device 1700 may also be used in a variety of other environments. For example, the system or device 1700 may be used in the context of a server computer system (such as a dedicated server) or on shared hardware implementing a cloud-based service 1770. Still further, the system or device 1700 may be implemented in a wide range of specialized everyday devices, including devices 1780 commonly found in the home, such as refrigerators, thermostats, security cameras, and the like. The interconnection of such devices is often referred to as "internet of things" (IoT). The elements may also be implemented in various modes of transportation. For example, the system or device 1700 may be used with control systems, guidance systems, entertainment systems, etc. of various types of vehicles 1790.

The applications illustrated in fig. 17 are merely exemplary and are not intended to limit potential future applications of the disclosed systems or devices. Other exemplary applications include, but are not limited to: portable gaming devices, music players, data storage devices, unmanned aerial vehicles, etc.

Exemplary computer readable Medium

The present disclosure has described various exemplary circuits in detail above. It is intended that the present disclosure cover not only embodiments that include such circuitry, but also a computer-readable storage medium that includes design information specifying such circuitry. Accordingly, the present disclosure is intended to support claims that cover not only an apparatus that includes the disclosed circuitry, but also a storage medium that specifies the circuitry in a format recognized by a manufacturing system configured to produce hardware (e.g., an integrated circuit) that includes the disclosed circuitry. Claims to such a storage medium are intended to cover, for example, an entity that produces a circuit design but does not itself fabricate the design.

Fig. 18 is a block diagram illustrating an exemplary non-transitory computer-readable storage medium storing circuit design information according to some embodiments. In the illustrated embodiment, semiconductor manufacturing system 1820 is configured to process design information 1815 stored on non-transitory computer readable medium 1810 and to manufacture integrated circuit 1830 based on the design information 1815.

The non-transitory computer readable storage medium 1810 may include any of a variety of suitable types of memory devices or storage devices. The non-transitory computer readable storage medium 1810 may be an installation medium, such as a CD-ROM, floppy disk, or tape device; computer system memory or random access memory, such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, etc.; nonvolatile memory, such as flash memory or magnetic media (e.g., a hard disk drive) or optical storage; registers; or other similar types of memory elements. The non-transitory computer readable storage medium 1810 may include other types of non-transitory memory or combinations thereof. The non-transitory computer readable storage medium 1810 may include two or more memory media that may reside in different locations, such as in different computer systems connected by a network.

Design information 1815 may be specified using any of a variety of suitable computer languages, including hardware description languages such as, but not limited to: VHDL, Verilog, SystemC, SystemVerilog, RHDL, M, MyHDL, etc. Design information 1815 may be usable by semiconductor manufacturing system 1820 to manufacture at least a portion of integrated circuit 1830. The format of design information 1815 may be recognized by at least one semiconductor manufacturing system 1820. In some embodiments, design information 1815 may also include one or more cell libraries that specify the synthesis, layout, or both, of integrated circuit 1830. In some embodiments, the design information is specified in whole or in part in the form of a netlist that specifies cell library elements and their connectivity. Design information 1815, taken alone, may or may not include sufficient information for manufacturing a corresponding integrated circuit. For example, design information 1815 may specify the circuit elements to be manufactured but not their physical layout. In this case, design information 1815 may need to be combined with layout information to actually manufacture the specified circuitry.

In various embodiments, integrated circuit 1830 may include one or more custom macrocells, such as memories, analog or mixed-signal circuits, and the like. In such cases, design information 1815 may include information related to the included macrocells. Such information may include, without limitation, a schematic capture database, mask design data, behavioral models, and device or transistor level netlists. As used herein, mask design data may be formatted according to the Graphic Data System (GDSII) format or any other suitable format.

Semiconductor manufacturing system 1820 may include any of a variety of suitable elements configured to manufacture integrated circuits. This may include, for example, elements for depositing semiconductor materials (e.g., on a wafer, which may include masking), removing materials, altering the shape of deposited materials, and modifying materials (e.g., by doping materials or by modifying dielectric constants using ultraviolet processing). Semiconductor manufacturing system 1820 may also be configured to perform various tests of the manufactured circuits for correct operation.

In various implementations, integrated circuit 1830 is configured to operate according to a circuit design specified by design information 1815, which may include performing any of the functions described herein. For example, integrated circuit 1830 may include any of the various elements shown in fig. 1B, fig. 2, fig. 4-9, fig. 11, fig. 12, and fig. 16. Additionally, integrated circuit 1830 may be configured to perform the various functions described herein in connection with the other components. Additionally, the functionality described herein may be performed by a plurality of connected integrated circuits.

As used herein, a phrase of the form "design information that specifies a design of a circuit configured to …" does not imply that the circuit in question must be manufactured in order for the element to be met. Rather, the phrase indicates that the design information describes a circuit that, when manufactured, would be configured to perform the indicated actions or would include the specified components.

***

The present disclosure includes references to "embodiments" or groups of "embodiments" (e.g., "some embodiments" or "various embodiments"). Embodiments are various implementations or examples of the disclosed concepts. References to "an embodiment," "one embodiment," "a particular embodiment," etc., do not necessarily refer to the same embodiment. Numerous possible embodiments are contemplated, including those specifically disclosed, as well as modifications and substitutions that fall within the spirit or scope of the present disclosure.

The present disclosure may discuss potential advantages that may arise from the disclosed embodiments. Not all implementations of these embodiments will necessarily exhibit any or all of the potential advantages. Whether a particular implementation realizes an advantage depends on many factors, some of which are outside the scope of this disclosure. Indeed, there are many reasons why an implementation that falls within the scope of the claims might not exhibit some or all of the disclosed advantages. For example, a particular implementation might include other circuitry outside the scope of the present disclosure that, in combination with one of the disclosed embodiments, negates or diminishes one or more of the disclosed advantages. Moreover, suboptimal design execution of a particular implementation (e.g., implementation techniques or tools) could also negate or diminish the disclosed advantages. Even assuming a skilled implementation, realization of the advantages may still depend on other factors, such as the environment in which the implementation is deployed. For example, inputs supplied to a particular implementation may prevent one or more problems addressed in the present disclosure from arising on a particular occasion, with the result that the benefit of its solution may not be realized. Given the existence of possible factors external to the present disclosure, any potential advantages described herein are not to be construed as claim limitations that must be met in order to demonstrate infringement. Rather, identification of such potential advantages is intended to illustrate the type or types of improvement available to designers having the benefit of the present disclosure. That such advantages are described permissively (e.g., stating that a particular advantage "may arise") is not intended to convey doubt about whether such advantages can in fact be realized, but rather to recognize the technical reality that realization of such advantages often depends on additional factors.

Unless stated otherwise, embodiments are non-limiting. That is, the disclosed embodiments are not intended to limit the scope of claims that are drafted based on this disclosure, even where only a single example is described with respect to a particular feature. The disclosed embodiments are intended to be illustrative rather than restrictive, absent any statements in the disclosure to the contrary. The application is thus intended to permit claims covering the disclosed embodiments, as well as such alternatives, modifications, and equivalents as would be apparent to a person skilled in the art having the benefit of this disclosure.

For example, features of the application may be combined in any suitable manner. Accordingly, new claims may be formulated to any such combination of features during prosecution of the present patent application (or of a patent application claiming priority thereto). In particular, with reference to the appended claims, features of dependent claims may be combined with features of other dependent claims, including claims dependent on other independent claims, where appropriate. Similarly, where appropriate, features from the respective independent claims may be combined.

Thus, while the appended dependent claims may be written such that each dependent claim depends from a single other claim, additional dependencies are also contemplated. Any combination of the dependent features consistent with the present disclosure is contemplated and may be claimed in this or another patent application. In short, the combinations are not limited to those specifically recited in the appended claims.

Where appropriate, it is also contemplated that claims drafted in one format or statutory type (e.g., apparatus) are intended to support corresponding claims of another format or statutory type (e.g., method).

***

Because the present disclosure is a legal document, various terms and phrases may be subject to administrative and judicial interpretation. Public notice is hereby given that the following paragraphs, as well as definitions provided throughout this disclosure, are to be used in determining how to interpret claims that are drafted based on this disclosure.

References to a singular form of an item (i.e., a noun or noun phrase preceded by "a," "an," or "the") are, unless context clearly dictates otherwise, intended to mean "one or more." Reference to "an item" in a claim thus does not, without accompanying context, preclude additional instances of the item. A "plurality" of items refers to a set of two or more of the items.

The word "may" is used herein in a permissive sense (i.e., having the potential to, being able to) and not in a mandatory sense (i.e., must).

The terms "comprising" and "including," and forms thereof, are open-ended and mean "including, but not limited to."

When the term "or" is used in this disclosure with respect to a list of options, it will generally be understood to be used in the inclusive sense unless the context provides otherwise. Thus, a recitation of "x or y" is equivalent to "x or y, or both," and thus covers 1) x but not y, 2) y but not x, and 3) both x and y. On the other hand, a phrase such as "either x or y, but not both" makes clear that "or" is being used in the exclusive sense.

The expression "w, x, y, or z, or any combination thereof" or "at least one of ... w, x, y, and z" is intended to cover all possibilities involving a single element up to the total number of elements in the set. For example, given the set [w, x, y, z], these phrases cover any single element of the set (e.g., w but not x, y, or z), any two elements (e.g., w and x but not y or z), any three elements (e.g., w, x, and y but not z), and all four elements. The phrase "at least one of ... w, x, y, and z" thus refers to at least one element of the set [w, x, y, z], thereby covering all possible combinations in this list of elements. The phrase is not to be interpreted as requiring at least one instance of w, at least one instance of x, at least one instance of y, and at least one instance of z.
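By way of illustration only, the combinations covered by the preceding paragraph can be enumerated. The following is a minimal sketch in Python; the function name `covered_combinations` is hypothetical and is not taken from this disclosure.

```python
from itertools import combinations

elements = ["w", "x", "y", "z"]


def covered_combinations(items):
    """Return every non-empty subset of items, smallest subsets first."""
    return [
        subset
        for size in range(1, len(items) + 1)
        for subset in combinations(items, size)
    ]


subsets = covered_combinations(elements)
# 4 single elements + 6 pairs + 4 triples + 1 full set
print(len(subsets))
```

Under this reading, a set of four elements yields fifteen covered combinations, none of which requires an instance of every element except the last.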

In this disclosure, various "labels" may precede nouns or noun phrases. Unless context provides otherwise, different labels used for a feature (e.g., "first circuit," "second circuit," "particular circuit," "given circuit," etc.) refer to different instances of the feature. Additionally, the labels "first," "second," and "third," when applied to a feature, do not imply any type of ordering (e.g., spatial, temporal, logical, etc.), unless stated otherwise.

The phrase "based on" is used to describe one or more factors that affect a determination. This term does not foreclose the possibility that additional factors may affect the determination. That is, a determination may be based solely on the specified factors or on the specified factors as well as other, unspecified factors. Consider the phrase "determine A based on B." This phrase specifies that B is a factor that is used to determine A or that affects the determination of A. This phrase does not foreclose that the determination of A may also be based on some other factor, such as C. This phrase is also intended to cover an embodiment in which A is determined based solely on B. As used herein, the phrase "based on" is synonymous with the phrase "based at least in part on."

The phrases "in response to" and "responsive to" describe one or more factors that trigger an effect. These phrases do not foreclose the possibility that additional factors may affect or otherwise trigger the effect, either jointly with the specified factors or independently of them. That is, an effect may be solely in response to those factors, or may be in response to the specified factors as well as other, unspecified factors. Consider the phrase "perform A in response to B." This phrase specifies that B is a factor that triggers the performance of A, or that triggers a particular result for A. This phrase does not foreclose that performing A may also be in response to some other factor, such as C. This phrase also does not foreclose that performing A may be jointly in response to B and C. This phrase is also intended to cover an embodiment in which A is performed solely in response to B. As used herein, the phrase "responsive to" is synonymous with the phrase "responsive at least in part to." Similarly, the phrase "in response to" is synonymous with the phrase "at least in part in response to."

***

Within this disclosure, different entities (which may variously be referred to as "units," "circuits," other components, etc.) may be described or claimed as "configured to" perform one or more tasks or operations. This formulation, [entity] configured to [perform one or more tasks], is used herein to refer to structure (i.e., something physical). More specifically, this formulation is used to indicate that the structure is arranged to perform the one or more tasks during operation. A structure can be said to be "configured to" perform some task even if the structure is not currently being operated. Thus, an entity described or recited as being "configured to" perform some task refers to something physical, such as a device, a circuit, or a system having a processor unit and a memory storing program instructions executable to implement the task. This phrase is not used herein to refer to something intangible.

In some cases, various units/circuits/components may be described herein as performing a set of tasks or operations. It should be understood that these entities are "configured to" perform those tasks/operations, even if not specifically indicated.

The term "configured to" is not intended to mean "configurable to". For example, an unprogrammed FPGA is not considered "configured to" perform a particular function. However, the unprogrammed FPGA may be "configurable" to perform this function. After appropriate programming, the FPGA can then be considered "configured to" perform a particular function.

For purposes of United States patent applications based on this disclosure, reciting in a claim that a structure is "configured to" perform one or more tasks is expressly intended not to invoke 35 U.S.C. § 112(f) for that claim element. Should Applicant wish to invoke Section 112(f) during prosecution of a United States patent application based on this disclosure, it will recite claim elements using the "means for [performing a function]" construct.

Different "circuits" may be described in this disclosure. These circuits, or "circuitry," constitute hardware that includes various types of circuit elements, such as combinational logic, clocked storage devices (e.g., flip-flops, registers, latches, etc.), finite state machines, memory (e.g., random access memory, embedded dynamic random access memory), programmable logic arrays, and so on. Circuitry may be custom designed or taken from standard libraries. In various implementations, circuitry can, as appropriate, include digital components, analog components, or a combination of both. Certain types of circuits may be commonly referred to as "units" (e.g., a decode unit, an arithmetic logic unit (ALU), a functional unit, a memory management unit (MMU), etc.). Such units also refer to circuits or circuitry.

Thus, the disclosed circuits/units/components and other elements shown in the figures and described herein include hardware elements, such as those described in the preceding paragraphs. In many cases, the internal arrangement of hardware elements in a particular circuit may be specified by describing the functionality of the circuit. For example, a particular "decode unit" may be described as performing a function that "processes the opcode of an instruction and routes the instruction to one or more of a plurality of functional units," meaning that the decode unit is "configured to" perform the function. The functional specification is sufficient to suggest a set of possible structures for the circuit to those skilled in the computer arts.

In various embodiments, as discussed in the preceding paragraph, circuits, units, and other elements may be defined by the functions or operations that they are configured to implement. The arrangement of such circuits/units/components with respect to each other and the manner in which they interact form a microarchitectural definition of the hardware that is ultimately manufactured in an integrated circuit or programmed into an FPGA to form a physical implementation of the microarchitectural definition. Thus, the microarchitectural definition is recognized by those skilled in the art as a structure from which many physical implementations may be derived, all of which fall within the broader structure described by the microarchitectural definition. That is, a skilled artisan presented with the microarchitectural definition supplied in accordance with this disclosure may, without undue experimentation and with the application of ordinary skill, implement the structure by coding the description of the circuits/units/components in a hardware description language (HDL) such as Verilog or VHDL. An HDL description is often expressed in a fashion that may appear to be functional. But to those skilled in the art, this HDL description is the manner that is used to transform the structure of a circuit, unit, or component to the next level of implementation detail. Such an HDL description may take the form of behavioral code (which is typically not synthesizable), register transfer language (RTL) code (which, in contrast to behavioral code, is typically synthesizable), or structural code (e.g., a netlist specifying logic gates and their connectivity). The HDL description may subsequently be synthesized against a library of cells designed for a given integrated circuit fabrication technology, and may be modified for timing, power, and other reasons to yield a final design database that is transmitted to a foundry to generate masks and ultimately produce the integrated circuit. Some hardware circuits, or portions thereof, may also be custom designed in a schematic editor and captured into the integrated circuit design along with synthesized circuitry. The integrated circuit may include transistors and other circuit elements (e.g., passive elements such as capacitors, resistors, inductors, etc.), as well as interconnect between the transistors and circuit elements. Some embodiments may implement multiple integrated circuits coupled together to implement the hardware circuits, and/or discrete elements may be used in some embodiments. Alternatively, the HDL design may be synthesized to a programmable logic array such as a field programmable gate array (FPGA) and may be implemented in the FPGA. This decoupling between the design of a group of circuits and the subsequent low-level implementation of those circuits commonly results in a scenario in which the circuit or logic designer never specifies a particular set of structures for the low-level implementation beyond a description of what the circuit is configured to do, because that process is performed at a different stage of the circuit implementation process.

The fact that many different low-level combinations of circuit elements may be used to implement the same specification of a circuit results in a large number of equivalent structures for that circuit. As noted, these low-level circuit implementations may vary according to changes in the fabrication technology, the foundry selected to manufacture the integrated circuit, the library of cells provided for a particular project, and so on. In many cases, the choices made by different design tools or methodologies to produce these different implementations may be arbitrary.

Furthermore, a single implementation of a particular functional specification of a circuit commonly includes a large number of devices (e.g., millions of transistors) for a given embodiment. Thus, the sheer volume of this information makes it impractical to provide a full recitation of the low-level structure used to implement a single embodiment, let alone the vast array of equivalent possible implementations. For this reason, the present disclosure describes the structure of circuits using the functional shorthand commonly employed in the industry.

Claims

1. An apparatus, comprising:

a graphics processor configured to determine whether a ray intersects a primitive in a graphics scene, wherein the graphics processor comprises:

ray intersection circuitry configured to perform an intersection test, the intersection test comprising:

quantizing a first representation of the primitive to generate a reduced-precision interval representation of the primitive, wherein the interval representation includes a lower limit and an upper limit defining an interval such that a corresponding value specified by the first representation of the primitive is guaranteed to fall within the lower limit and the upper limit;

quantizing a first representation of the ray to generate a reduced precision interval representation of the ray, wherein the interval representation includes interval values having upper and lower limits guaranteed to cover corresponding values specified by the first representation of the ray; and

determining, using an interval algorithm, an initial intersection result based on coordinates of the interval representation of the primitive and coordinates of the interval representation of the ray, wherein a miss indicated by the initial intersection result is guaranteed not to be a hit for the first representation of the primitive and the first representation of the ray.
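By way of illustration only, and not as a limitation of claim 1, a conservative test over intervals can be sketched as follows in Python. The sketch makes several assumptions that are not stated in the claim: the triangle's vertices have already been transformed into a ray-aligned 2D space in which the ray passes through the origin, any uncertainty in the ray's interval representation has been absorbed into the vertex intervals by that transform, coordinates are integer fixed-point codes (so the arithmetic is exact), and the well-known 2D edge-function test is used as the interval algorithm. The depth range check is omitted, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] over integer (fixed-point) codes."""
    lo: int
    hi: int

    def __sub__(self, other):
        # Smallest difference pairs our lower bound with their upper bound.
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        # The product range is spanned by the four endpoint products.
        products = (self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi)
        return Interval(min(products), max(products))


def edge(a, b):
    """Interval bound on the 2D edge function a.x * b.y - a.y * b.x."""
    return a[0] * b[1] - a[1] * b[0]


def initial_result(v0, v1, v2):
    """Classify a triangle given as three (x, y) pairs of Intervals."""
    edges = (edge(v0, v1), edge(v1, v2), edge(v2, v0))
    some_negative = any(e.hi < 0 for e in edges)  # certainly below zero
    some_positive = any(e.lo > 0 for e in edges)  # certainly above zero
    # Edge functions of provably opposite sign put the origin outside.
    return "miss" if some_negative and some_positive else "potential hit"


around = ((Interval(-4, -3), Interval(-4, -3)),
          (Interval(4, 5), Interval(-4, -3)),
          (Interval(0, 1), Interval(4, 5)))
away = ((Interval(10, 11), Interval(10, 11)),
        (Interval(20, 21), Interval(10, 11)),
        (Interval(10, 11), Interval(20, 21)))
print(initial_result(*around), "/", initial_result(*away))
```

Because a miss is reported only when two edge-function intervals are strictly of opposite sign, every exact triangle contained in the vertex intervals is also a miss; any uncertain case is reported as a potential hit.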

2. The apparatus of claim 1, further comprising a shear factor circuit configured to:

generate an interval representation of a shear factor based on ray direction information and scale information; and

generate sheared vertex intervals based on the quantized representation of the primitive and the interval representation of the shear factor;

wherein the initial intersection result is based on the sheared vertex intervals.
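By way of illustration only, applying an interval-valued shear factor to vertex coordinate intervals can be sketched as follows in Python. The sketch assumes the common ray-space shear x' = x + S * z, which the claim itself does not specify, and it takes the shear factor interval as a given input rather than deriving it from direction and scale information. Exact rational arithmetic (`fractions.Fraction`) is used so that no outward rounding is needed; the numeric values and function names are hypothetical.

```python
from fractions import Fraction


def interval_mul(a, b):
    """Product of intervals a = (lo, hi) and b = (lo, hi)."""
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(products), max(products))


def interval_add(a, b):
    """Sum of intervals a and b."""
    return (a[0] + b[0], a[1] + b[1])


def shear_coordinate(x, z, shear):
    """Interval bound on x + shear * z for interval inputs."""
    return interval_add(x, interval_mul(shear, z))


shear_x = (Fraction(-3, 8), Fraction(-1, 4))  # hypothetical shear factor interval
sheared = shear_coordinate((10, 11), (4, 5), shear_x)
print(sheared)
```

The resulting interval is wider than the input x interval, which shows how uncertainty in both the shear factor and the z coordinate accumulates in the transformed vertex.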

3. The apparatus of claim 2, wherein the shear factor circuit is configured to:

represent, using a first precision, a first coordinate of an origin of the ray in a coordinate direction that provides a threshold contribution to a ray direction vector; and

represent coordinates of the origin of the ray in other directions using a second, greater precision.

4. The apparatus of claim 1, wherein the quantization of the first representation of the primitive uses a fixed-point quantized representation rounded toward zero for a lower bound of the interval and uses the lower bound plus one unit of least precision (ULP) for an upper bound of the interval.
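By way of illustration only, the quantization recited in claim 4 can be sketched as follows in Python. The sketch assumes non-negative input values (for example, offsets measured from a bounding-region corner, which is an assumption and not a statement of the claim), so that rounding toward zero never places the lower bound above the true value. The function name and the `frac_bits` parameter are hypothetical.

```python
import math


def quantize_to_interval(value, frac_bits):
    """Return fixed-point codes (lower, upper) that bracket value.

    Guarantees lower / 2**frac_bits <= value <= upper / 2**frac_bits.
    """
    if value < 0:
        raise ValueError("sketch assumes non-negative inputs")
    scale = 1 << frac_bits             # codes per unit; one code step is one ULP
    lower = math.trunc(value * scale)  # round toward zero
    upper = lower + 1                  # lower bound plus one ULP
    return lower, upper


lower, upper = quantize_to_interval(0.3, 4)
print(lower, upper, lower / 16, upper / 16)
```

With four fractional bits, the value 0.3 is bracketed by the codes 4 and 5, that is, by 0.25 and 0.3125. Only the lower code needs to be stored, since the upper bound is implied.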

5. The apparatus of claim 1, wherein the first representation of the primitive is a representation of a triangle pair comprising at most four vertices of two triangle primitives in the triangle pair, wherein the graphics processor comprises circuitry configured to sequentially process triangles in a given triangle pair.

6. The apparatus of claim 1, wherein the reduced precision interval representation of the ray comprises a quantized ray time represented as an interval.

7. The apparatus of claim 6, further comprising:

circuitry configured to generate the reduced-precision interval representation of the primitive based on a first position and a second position of the primitive at different points within a motion blur time interval such that the reduced-precision interval representation of the primitive covers all possible positions of the primitive during the interval representing the quantized ray time.
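By way of illustration only, bounding a moving coordinate over a quantized ray time interval can be sketched as follows in Python. The sketch assumes linear motion between the first position and the second position and a time value normalized to the range 0 to 1; neither assumption is stated in claim 7. Exact rational arithmetic is used so that the bounds need no outward rounding, and all names are hypothetical.

```python
from fractions import Fraction


def lerp(p0, p1, t):
    """Position at normalized time t under linear motion from p0 to p1."""
    return p0 + (p1 - p0) * t


def coordinate_bounds(p0, p1, t_lo, t_hi):
    """Interval covering the coordinate for every time in [t_lo, t_hi].

    Linear motion is monotonic in t, so the extremes occur at the
    endpoints of the time interval.
    """
    a = lerp(p0, p1, t_lo)
    b = lerp(p0, p1, t_hi)
    return (min(a, b), max(a, b))


forward = coordinate_bounds(Fraction(0), Fraction(8), Fraction(1, 4), Fraction(1, 2))
backward = coordinate_bounds(Fraction(8), Fraction(0), Fraction(1, 4), Fraction(1, 2))
print(forward, backward)
```

A narrower time interval yields a tighter coordinate interval, so the precision of the quantized ray time directly affects how many primitives survive the initial test.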

8. The apparatus of claim 1, wherein, in response to a potential hit initial intersection result, the graphics processor is configured to perform an intersection test using the first representation of the primitive and the first representation of the ray.
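By way of illustration only, the two-stage flow of claim 8 can be sketched as follows in Python using a one-dimensional toy problem: a "primitive" is a segment on the number line and a "ray" is a single point. This simplification, the unit-step quantization, and all names are assumptions made for brevity and are not taken from the claims.

```python
import math


def quantized_bounds(value):
    """Integer interval [floor(value), floor(value) + 1] covering value."""
    lower = math.floor(value)
    return lower, lower + 1


def conservative_test(segment, point):
    """Reduced-precision test: returns False only when a miss is certain."""
    seg_lo = quantized_bounds(segment[0])[0]
    seg_hi = quantized_bounds(segment[1])[1]
    pt_lo, pt_hi = quantized_bounds(point)
    return pt_hi >= seg_lo and pt_lo <= seg_hi


def exact_test(segment, point):
    """Full-precision test on the original representation."""
    return segment[0] <= point <= segment[1]


def two_stage_hit(segment, point, stats):
    if not conservative_test(segment, point):
        stats["culled"] += 1     # miss proven cheaply; exact test skipped
        return False
    stats["exact_tests"] += 1    # potential hit: fall back to full precision
    return exact_test(segment, point)


stats = {"culled": 0, "exact_tests": 0}
segment = (2.3, 2.6)
results = [two_stage_hit(segment, p, stats) for p in (2.5, 2.8, 5.1)]
print(results, stats)
```

In this run the point 5.1 is rejected by the reduced-precision test alone, while 2.8 is a potential hit that the full-precision test then resolves as a miss.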

9. The apparatus of claim 1, wherein the apparatus is a computing device, the computing device further comprising:

a central processing unit;

a display; and

a network interface circuit.

10. A method, comprising:

quantizing, by a graphics processor, a first representation of a primitive to generate an interval representation of reduced precision of the primitive, wherein the interval representation includes a lower limit and an upper limit defining an interval such that a corresponding value specified by the first representation of the primitive is guaranteed to fall within the lower limit and the upper limit;

quantizing, by the graphics processor, a first representation of a ray to generate a reduced precision interval representation of the ray, wherein the interval representation includes interval values having an upper bound and a lower bound guaranteed to cover corresponding values specified by the first representation of the ray; and

determining, by the graphics processor, an initial intersection result based on coordinates of the interval representation of the primitive and coordinates of the interval representation of the ray using an interval algorithm, wherein a miss indicated by the initial intersection result is guaranteed not to be a hit of the first representation of the primitive and the first representation of the ray.

11. The method of claim 10, further comprising:

generating, by the graphics processor, an interval representation of a shear factor based on ray direction information and scale information; and

generating, by the graphics processor, sheared vertex intervals based on the quantized representation of the primitive and the interval representation of the shear factor;

wherein the initial intersection result is based on the sheared vertex intervals.
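The shear factor step of claim 11 might look like the following sketch. It assumes the shear takes the form familiar from watertight ray/triangle tests, x' = x - (dx/dz) * z, with a strictly positive major direction component; the claim does not spell out this formula, and the scale information it mentions is omitted here. The helper names are hypothetical, and `fractions.Fraction` is used so the bounds are exact.

```python
from fractions import Fraction as F


def isub(a, b):
    """Interval subtraction a - b."""
    return (a[0] - b[1], a[1] - b[0])


def imul(a, b):
    """Interval multiplication a * b."""
    p = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(p), max(p))


def shear_interval(d_axis, d_major):
    """Interval for d_axis / d_major, assuming d_major is strictly positive."""
    assert d_major[0] > 0
    q = (d_axis[0] / d_major[0], d_axis[0] / d_major[1],
         d_axis[1] / d_major[0], d_axis[1] / d_major[1])
    return (min(q), max(q))


def sheared_vertex(x, z, s):
    """Interval for x - s * z (assumed shear form, see lead-in)."""
    return isub(x, imul(s, z))


# Direction components known only as intervals after quantization.
s = shear_interval((F(1), F(2)), (F(4), F(8)))
print(s)  # (Fraction(1, 8), Fraction(1, 2))
print(sheared_vertex((10, 11), (4, 5), s))  # (Fraction(15, 2), Fraction(21, 2))
```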

12. The method of claim 10, wherein quantizing the first representation of the primitive uses a fixed-point quantized representation rounded toward zero for a lower bound of the interval and uses the lower bound plus one unit of least precision (ULP) for an upper bound of the interval.
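The quantization rule of claim 12 can be illustrated with a short Python sketch. It assumes non-negative inputs, for which rounding toward zero never exceeds the original value; how signed values are handled is not stated in the claim. `FRAC_BITS` and `quantize_toward_zero` are hypothetical names.

```python
FRAC_BITS = 4  # assumption: 4 fractional bits, so one ULP is 1/16


def quantize_toward_zero(x, frac_bits=FRAC_BITS):
    """Return (lo, hi) in ULPs with lo / 2**frac_bits <= x <= hi / 2**frac_bits.

    Assumes x >= 0. The lower bound is the fixed-point value truncated
    toward zero; the upper bound is always lo + 1 ULP, even when x is
    exactly representable, so no comparison or rounding-up logic is needed.
    """
    assert x >= 0
    lo = int(x * (1 << frac_bits))  # int() truncates toward zero
    return (lo, lo + 1)


print(quantize_toward_zero(2.3))  # (36, 37), i.e. [2.25, 2.3125]
print(quantize_toward_zero(1.0))  # (16, 17), i.e. [1.0, 1.0625]
```

Storing only the lower bound is sufficient under this rule, since the upper bound is implied.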

13. A non-transitory computer-readable storage medium having stored thereon design information specifying a design of at least a portion of a hardware integrated circuit in a format recognized by a semiconductor manufacturing system configured to use the design information to produce the circuit from the design, wherein the design information specifies that the circuit comprises:

14. The non-transitory computer-readable storage medium of claim 13, wherein the design information further specifies that the circuit comprises:

a shear factor circuit configured to:

wherein the initial intersection result is based on the sheared vertex intervals.

15. The non-transitory computer-readable storage medium of claim 14, wherein the shear factor circuit is configured to:

16. The non-transitory computer-readable storage medium of claim 13, wherein the quantization of the first representation of the primitive uses a fixed-point quantized representation rounded toward zero for a lower bound of the interval and uses the lower bound plus one unit of least precision (ULP) for an upper bound of the interval.

17. The non-transitory computer-readable storage medium of claim 13, wherein the first representation of the primitive is a representation of a triangle pair comprising at most four vertices of two triangle primitives in the triangle pair, wherein the graphics processor comprises circuitry configured to sequentially process triangles in a given triangle pair.

18. The non-transitory computer-readable storage medium of claim 13, wherein the reduced-precision interval representation of the ray includes a quantized ray time represented as an interval.

19. The non-transitory computer-readable storage medium of claim 18, wherein the design information further specifies that the circuit comprises:

20. The non-transitory computer-readable storage medium of claim 13, wherein, in response to a potential hit initial intersection result, the graphics processor is configured to perform an intersection test using the first representation of the primitive and the first representation of the ray.