CN117156149A

CN117156149A - Compression and decompression of sub-primitive presence indication for use in a rendering system

Info

Publication number: CN117156149A
Application number: CN202310611388.8A
Authority: CN
Inventors: S. Fenney
Original assignee: Imagination Technologies Ltd
Current assignee: Imagination Technologies Ltd
Priority date: 2022-05-30
Filing date: 2023-05-26
Publication date: 2023-12-01
Also published as: GB202207938D0; GB2613417A

Abstract

The present application relates to compression and decompression of sub-primitive presence indications for use in a rendering system. A method and a compression unit are provided for compressing a block of sub-primitive presence indications for use in a rendering system into a compressed data block. The block of sub-primitive presence indications comprises a plurality of sub-blocks of sub-primitive presence indications. A plurality of candidates for combinations of presence indications is identified. For each of the sub-blocks in the block of sub-primitive presence indications: one of the candidates is selected to represent the sub-block, and an index indicating the selected candidate is stored in the compressed data block.

Description

Compression and decompression of sub-primitive presence indication for use in a rendering system

Cross reference to related applications

The present application claims priority from UK patent application GB2207941.2, filed on 30 May 2022, and UK patent application GB2207938.8, filed on 30 May 2022, the disclosures of which are incorporated herein by reference in their entireties.

Technical Field

The present disclosure relates to techniques for compressing and/or decompressing sub-primitive presence indications in a rendering system.

Background

A rendering system may be used to generate an image of a scene. Two common rendering techniques are ray tracing and rasterization. In particular, ray tracing is a computational rendering technique for generating an image of a scene (e.g., a 3D scene) by tracing paths of light ('rays') through the scene, typically from the viewpoint of a camera. Each ray is modeled as originating from the camera and entering the scene through a pixel. As a ray traverses the scene, it may intersect objects within the scene. The interaction between a ray and an object it intersects can be modeled to create realistic visual effects. For example, in response to determining that a ray intersects an object, a shader program (i.e., a portion of computer code) may be executed for the intersection. A programmer may write a shader program to define how the system reacts to an intersection (which may, for example, result in one or more secondary rays being emitted into the scene), for example, to represent reflection of the ray from the intersected object or refraction of the ray through the object (e.g., if the object is transparent or translucent). As another example, the shader program may cause one or more rays to be emitted into the scene to determine whether the object is in shadow at the intersection point. The result of executing the shader program (and processing any associated secondary rays) may be the calculation of a color value for the pixel through which the ray passed.

Rendering an image of a scene using ray tracing may involve performing many intersection tests, e.g., billions of intersection tests to render a single image. To reduce the number of intersection tests that need to be performed, a ray tracing system may generate an acceleration structure, where each node of the acceleration structure represents a region within the scene. The acceleration structure is typically hierarchical (e.g., has a tree structure) such that it contains multiple levels of nodes, where nodes near the top of the acceleration structure represent relatively large regions in the scene (e.g., the root node may represent the entire scene), and nodes near the bottom of the acceleration structure represent relatively small regions in the scene. The leaf nodes of the acceleration structure represent regions in the scene that bound at least one primitive or part of a primitive, and have pointers to the bounded primitives.

An acceleration structure may be used to perform intersection testing of a ray (e.g., in a recursive manner) by first testing the ray for intersection with the root node of the acceleration structure. If the ray is found to intersect a parent node (e.g., the root node), then testing may proceed to the child nodes of that parent node. In contrast, if the ray is found not to intersect a parent node, intersection testing of the child nodes of that parent node can be avoided, thereby saving computational effort. If the ray is found to intersect a leaf node, the ray may be tested against the objects within the region represented by the leaf node to determine which object(s) the ray intersects. Objects may be represented using "primitives". Primitives are the geometric units of the system and may be, for example, convex polygons. Primitives are typically triangles, but they may also be other shapes, such as rectangles (the term "rectangle" is used herein to include "square"), pentagons or hexagons, or non-planar shapes such as spheres or bicubic patches, or shapes with curved sides, etc.
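The pruning behaviour described above can be illustrated with a minimal sketch. It is not taken from the application: the `Node` class, the `traverse` function and the use of one-dimensional intervals in place of bounding boxes (with a "ray" reduced to a single coordinate `x`) are all hypothetical simplifications chosen to keep the example short.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical 1D stand-in for a bounding box: the node covers [lo, hi].
    lo: float
    hi: float
    children: List["Node"] = field(default_factory=list)
    primitives: List[str] = field(default_factory=list)  # used by leaf nodes only


def traverse(node, x, counter):
    """Return the primitives in every leaf whose region contains x."""
    counter["tests"] += 1
    if not (node.lo <= x <= node.hi):
        return []  # miss: the whole subtree below this node is skipped
    if not node.children:
        return list(node.primitives)  # leaf: these primitives need testing
    found = []
    for child in node.children:
        found.extend(traverse(child, x, counter))
    return found


leaf_a = Node(0.0, 1.0, primitives=["tri0", "tri1"])
leaf_b = Node(1.5, 2.0, primitives=["tri2"])
left = Node(0.0, 2.0, children=[leaf_a, leaf_b])
right = Node(5.0, 9.0, children=[Node(5.0, 6.0, primitives=["tri3"]),
                                 Node(7.0, 9.0, primitives=["tri4"])])
root = Node(0.0, 9.0, children=[left, right])

counter = {"tests": 0}
result = traverse(root, 0.5, counter)
```

In this toy hierarchy of seven nodes, only five node tests are performed, because the miss on `right` means its two leaves are never visited.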

Primitives are typically simple geometric shapes so that it is straightforward to test whether a ray intersects a primitive. However, primitives may be used to represent more complex shapes. For example, a texture (e.g., a 2D image or a 3D volume) may be applied to a primitive, where the texture may have alpha values that determine opacity at different locations on the primitive, e.g., a maximum sampled alpha value (e.g., 255 for an 8-bit alpha value) means that the primitive is completely opaque at the sampled location, and a minimum sampled alpha value (e.g., 0) means that the primitive is completely transparent at the sampled location. Values between the minimum and maximum alpha values may represent partial opacity. For the purposes of intersection testing in a ray tracing system, if a ray intersects a primitive at a location where the primitive is completely transparent (i.e., at a location where the alpha value is zero), then the intersection is not accepted, i.e., the ray passes straight through the primitive. In this way, setting the alpha value to zero may be used to represent a hole in a primitive, i.e., a location on the primitive that is considered 'not present' as far as the intersection testing process is concerned. For intermediate alpha values, the system may choose to form a weighted sum of the objects behind the primitive and the shaded surface itself, or possibly to apply a threshold, which is commonly referred to in the art as an alpha test. Textures that include regions that are not present may be referred to as "punch-through textures", "alpha-tested textures", or "masked textures", and primitives to which these textures are applied may be referred to as "punch-through primitives", "alpha-tested transparent primitives", or "masked primitives". Punch-through primitives may be used to represent geometry with complex outlines or with large numbers of holes, such as foliage and chain-link fences, using a small number of primitives.

Note that a 'texture' is not necessarily an actual image; it may be computed 'on the fly'. Such computation may be performed by executing a 'shader' program. Thus, 'checking the texture' may also be understood to include these computational approaches.

FIG. 1 shows an example of two triangular primitives 102₁ and 102₂ that share an edge to form a quadrilateral. A texture representing a leaf is applied to both primitives. The texture has some regions (e.g., 104) that are completely transparent, so that they are not present for the purposes of intersection testing. The texture also has some regions (e.g., 106) that are opaque, so that they are present for the purposes of intersection testing. Finally, there may be a small number of regions (e.g., along the boundary between regions 104 and 106) that are partially transparent, which may be handled with approaches such as the two mentioned above for 'intermediate alpha' values. Different ray tracing systems may react differently to finding an intersection of a ray with a partially transparent region, e.g., the intersection may be treated as a hit, a miss, or a partial hit. One or more additional rays may be generated as a result of a partial hit.

When the intersection testing process finds that a ray intersects a punch-through primitive, the intersection testing process for the ray may be stalled while a shader program is executed on a programmable execution unit to determine whether the primitive is present at the point where the ray intersects it. The presence of the primitive at the intersection point is typically determined by the alpha channel of the texture mapped onto the primitive. The hand-off between the intersection testing process (which may be implemented in fixed-function hardware) and the shader program (which is executed on the programmable execution unit) introduces latency into the ray tracing system. For example, the fixed-function hardware that implements the intersection testing process may stall for thousands of clock cycles while a shader program executes on the programmable execution unit to determine the presence of a primitive at an intersection point. Thus, reducing the number of times a shader program needs to be executed to determine the presence of a punch-through primitive at an intersection point can significantly improve the performance of the ray tracing system. Doing so without increasing the number of primitives used to represent the geometry would be particularly beneficial, because increasing the number of primitives would increase processing costs in the ray tracing system, such as the costs of rendering, of modeling, and of updating the acceleration structure.

A paper entitled "Sub-triangle opacity masks for faster ray tracing of transparent objects" by Holger Gruen, Carsten Benthin, and Sven Woop (Proceedings of the ACM on Computer Graphics and Interactive Techniques, Volume 3, Issue 2, Article 18) proposes using sub-triangle opacity masks for ray tracing of alpha-tested transparent primitives. Each triangle primitive is subdivided into a set of uniformly sized sub-primitives. For example, FIG. 2 shows a triangle primitive 202 that is subdivided into 64 uniformly sized sub-primitives, labeled 0 through 63. The barycentric coordinates of the three vertices of triangle primitive 202 are labeled b = (0, 0, 1), b = (0, 1, 0), and b = (1, 0, 0). Any position within triangle primitive 202 can be uniquely identified with barycentric coordinates, which indicate which of the sub-primitives (0 to 63) the position lies within. For each sub-primitive (0 to 63), an evaluation is made in a preprocessing step to determine a sub-primitive presence indication, which indicates whether the sub-primitive is: (i) fully present, (ii) fully absent, or (iii) partially present. If a sub-primitive is partially present, then the texture needs to be checked, e.g., by executing a shader program, to determine whether a particular point within the sub-primitive is present or absent. The preprocessing step may be performed by an Application Programming Interface (API), or by a user as part of a process such as creating the primitives and textures. Each of the sub-primitive presence indications is represented with 2 bits to indicate one of the three presence states: (i) fully present, (ii) fully absent, or (iii) partially present. The "partially present" state may be referred to as a "check texture" state because presence at a location within a partially present sub-primitive is determined by checking the texture, i.e., by executing a shader program.

When an intersection is found between a ray and a primitive, the presence indications may be queried to determine whether to accept the intersection. The location of the intersection within the primitive, for example as indicated by barycentric coordinates, is used to identify the sub-primitive in which the intersection lies. If the presence indication of the identified sub-primitive indicates that the sub-primitive is fully present or fully absent, the intersection testing process can continue without executing a shader program to determine the presence of the primitive at the intersection point. However, if the presence indication of the identified sub-primitive indicates that the sub-primitive is partially present, then the texture is checked by executing a shader program to determine the presence of the primitive at the intersection point.
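The query described above can be sketched as follows. This is an illustrative sketch only: the numeric values chosen for the three states, the bit ordering within the packed word, and the names `Presence`, `pack`, `lookup` and `resolve_hit` are assumptions, and the mapping from barycentric coordinates to a sub-primitive number is taken as given (it is passed in as `sub_index`).

```python
from enum import IntEnum


class Presence(IntEnum):
    # Hypothetical 2-bit codes; the text only says that 2 bits are used.
    ABSENT = 0
    PRESENT = 1
    PARTIAL = 2  # the "check texture" state


def pack(states):
    """Pack one 2-bit presence indication per sub-primitive into an integer."""
    word = 0
    for i, state in enumerate(states):
        word |= int(state) << (2 * i)
    return word


def lookup(word, sub_index):
    """Read the presence indication of the sub-primitive numbered sub_index."""
    return Presence((word >> (2 * sub_index)) & 0b11)


def resolve_hit(word, sub_index, run_alpha_shader):
    """Accept or reject a hit; the shader callback runs only when needed."""
    state = lookup(word, sub_index)
    if state == Presence.PRESENT:
        return True
    if state == Presence.ABSENT:
        return False
    return run_alpha_shader()  # partially present: the texture must be checked


# 64 sub-primitives as in FIG. 2, i.e. 2 * 64 = 128 bits in total.
states = [Presence.ABSENT] * 64
states[3] = Presence.PRESENT
states[10] = Presence.PARTIAL
word = pack(states)

shader_calls = []


def fake_shader():
    # Stand-in for executing a shader program that samples the texture.
    shader_calls.append(1)
    return True
```

With this arrangement, only a hit in sub-primitive 10 causes `fake_shader` to be called; hits in fully present or fully absent sub-primitives are resolved from the packed word alone.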

The use of presence indications reduces the number of times a shader program needs to be executed to check the texture in order to decide whether to accept an intersection. In other words, the presence indications identify the regions of a primitive that are fully absent or fully present, so that the more expensive alpha test operation can be skipped where possible. Alpha testing (i.e., running a shader program to check the alpha value of the texture at the intersection point) is an expensive operation in terms of latency and power consumption.

If a primitive is subdivided into K sub-primitives, 2K bits are used for the presence indication of the primitive and these bits will be included with the remaining primitive data for the primitive during the intersection test. In the example shown in fig. 2, K is 64 such that 128 bits are used for the presence indication of primitive 202. This is a significant increase in the amount of primitive data used to describe the primitives.

Furthermore, UK patents GB2538856B and GB2522868B describe a rasterization rendering technique in which an opacity state map is used to indicate whether a block of texels of a texture is fully opaque, fully transparent, partially transparent, or a mixture of these states. The indications in the opacity state map may be used to accelerate the processing of punch-through primitives in a rasterization system. Similar to the presence indications described above with reference to a ray tracing system, each opacity state in the rasterization systems of GB2538856B and GB2522868B is represented with two bits.

Disclosure of Invention

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

There is provided a method of compressing a sub-primitive presence indication block for use in a rendering system (e.g. for use in intersection testing in a rendering system) into a compressed data block, wherein the sub-primitive presence indication block comprises a plurality of sub-primitive presence indication sub-blocks, the method comprising:

identifying a plurality of candidates for a combination of presence indications; and

for each of the sub-blocks in the sub-primitive presence indication block:

selecting one of the candidates to be used for representing the sub-block; and

storing, in the compressed data block, an index indicating the selected candidate.
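The steps above can be sketched in a deliberately simplified form. In this sketch every distinct sub-block pattern becomes a candidate, so the result is lossless and the codebook size is not bounded; that is a simplifying assumption, not the scheme of the application. The single-letter state codes ("P" for present, "A" for absent, "C" for check texture) and the function names are hypothetical.

```python
def split_into_sub_blocks(block, n):
    """Split a square block into n x n sub-blocks in row-major order.

    Each sub-block is returned as a flat tuple, also in row-major order.
    """
    size = len(block)
    sub_blocks = []
    for r in range(0, size, n):
        for c in range(0, size, n):
            sub_blocks.append(tuple(block[r + dr][c + dc]
                                    for dr in range(n) for dc in range(n)))
    return sub_blocks


def compress(block, n=2):
    sub_blocks = split_into_sub_blocks(block, n)
    # Step 1: identify candidates (here: every distinct pattern, first-seen order).
    candidates = list(dict.fromkeys(sub_blocks))
    # Step 2: for each sub-block, select a candidate and store its index.
    indices = [candidates.index(sb) for sb in sub_blocks]
    return {"codebook": candidates, "indices": indices}


# A 4 x 4 block of presence indications, i.e. four 2 x 2 sub-blocks.
block = [
    "PPAA",
    "PPAA",
    "PCAA",
    "CCAA",
]
compressed = compress(block)
```

Here the two all-absent sub-blocks on the right share one codebook entry, which is where the saving comes from when patterns repeat.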

The method may further include storing candidate data representing one or more of the plurality of candidates in the compressed data block.

The candidate data representing one or more of the plurality of candidates may be stored in a respective one or more entries in a codebook in the compressed data block.

Each of the sub-blocks may have four presence indications arranged in a 2 x 2 arrangement, and each of the one or more of the plurality of candidates represented by candidate data in the compressed data block may be represented by five bits of candidate data comprising:

one bit indicating one of: (i) a first palette of fully present and partially present, and (ii) a second palette of fully absent and partially present; and

for each of the four presence indications of the candidate, one bit indicating either partially present or the other state of the indicated palette.
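A five-bit candidate of this kind might be encoded and decoded as in the sketch below. It assumes that the two palettes are {fully present, partially present} and {fully absent, partially present}; the bit positions (palette in bit 4, one flag per indication in bits 0 to 3) and the letter codes are hypothetical choices made for illustration.

```python
PRESENT, ABSENT, PARTIAL = "P", "A", "C"


def encode_candidate(indications):
    """Encode four presence indications (a 2 x 2 candidate) into 5 bits."""
    others = {s for s in indications if s != PARTIAL}
    if len(others) > 1:
        # One palette cannot hold both fully present and fully absent.
        raise ValueError("candidate mixes fully present and fully absent")
    palette = 1 if ABSENT in others else 0  # assumed: 0 = present, 1 = absent
    bits = palette << 4
    for i, state in enumerate(indications):
        if state == PARTIAL:
            bits |= 1 << i  # 1 = partially present, 0 = other palette state
    return bits


def decode_candidate(bits):
    """Recover the four presence indications from 5 bits of candidate data."""
    other = ABSENT if (bits >> 4) & 1 else PRESENT
    return tuple(PARTIAL if (bits >> i) & 1 else other for i in range(4))
```

One consequence of this format is that a single 2 x 2 candidate cannot contain both a fully present and a fully absent indication, which the sketch reports as an error rather than silently approximating.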

The sub-primitive presence indication block may comprise 64 sub-blocks, each of which may comprise four sub-primitive presence indications, such that the sub-primitive presence indication block may have 256 presence indications.

Each of the sub-blocks may have sixteen presence indications including four 2 x 2 presence indication groups, and for each 2 x 2 presence indication group of the four 2 x 2 presence indication groups of sub-blocks, each of the one or more candidates represented by candidate data in the compressed data block may be represented by five bits of candidate data including:

one bit indicating one of the first palette and the second palette described above; and

for each of the four presence indications of the 2 x 2 group, one bit indicating either partially present or the other state of the indicated palette.

Each of the sub-blocks may have sixteen presence indications including four groups of 2 x 2 presence indications, and wherein each of the one or more of the plurality of candidates represented by candidate data in the compressed data block may be represented by:

for each of the four 2 x 2 presence indication groups of a sub-block, a sub-index indicating an entry in a sub-codebook; and

the sub-codebook, which comprises a plurality of entries for 2 x 2 presence indication groups, wherein each entry in the sub-codebook may comprise five bits of candidate data as described above.

The sub-primitive presence indication block may comprise 256 sub-blocks, each of which may comprise sixteen sub-primitive presence indications, such that the sub-primitive presence indication block may have 4096 presence indications.

The sub-primitive presence indication block may comprise 576 sub-blocks, each of which may comprise sixteen sub-primitive presence indications, such that the sub-primitive presence indication block may have 9216 presence indications.

At least one candidate of the plurality of candidates may not be represented by candidate data stored in the compressed data block.

Each of the at least one of the plurality of candidates not represented by candidate data stored in the compressed data block may represent a sub-block in which all of the sub-primitive presence indications are the same.

The sub-primitive presence indications may indicate the presence state of their respective sub-primitives as one of: (i) fully present, (ii) fully absent, or (iii) partially present.

The index for a sub-block may be a T-bit index, where T may be less than the number of sub-primitive presence indications in one of the sub-blocks. For example, each of the sub-blocks may include four sub-primitive presence indications, and T=3. As another example, each of the sub-blocks may include sixteen sub-primitive presence indications, and T=5.

The sub-primitive presence indication block may have a plurality of regions, each of the regions comprising a plurality of the sub-blocks, wherein one or more of the identified candidates may be region-specific candidates such that the region-specific candidates are candidates for sub-blocks within only one of the regions of the block.

One or more of the identified candidates may be global candidates such that the global candidates are candidates for sub-blocks within all of the regions of the block.

The candidate data stored in the compressed data block may have: (i) candidate data representing three global candidates, and (ii) candidate data representing two region-specific candidates for each of the regions.

The candidate data stored in the compressed data block may have candidate data representing 29 global candidates.

The regions of the block may be quadrants of the block.
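The figures given in this summary can be cross-checked with some simple arithmetic. The sketch below rests on one reading of the text that is an assumption rather than a statement from the application: that the candidates not stored in the compressed data block are the three uniform sub-blocks (all fully present, all fully absent, all partially present), and that a sub-block may refer to the global candidates, the region-specific candidates of its own region, and those three implied candidates. The function name `payload_bits` is hypothetical, and the bit count ignores any transform indications or other fields.

```python
def payload_bits(num_sub_blocks, index_bits, stored_candidates, bits_per_candidate):
    """Bits used by the per-sub-block indices plus the stored candidate data."""
    return num_sub_blocks * index_bits + stored_candidates * bits_per_candidate


IMPLIED_UNIFORM = 3  # assumed: one implied candidate per presence state

# 2 x 2 sub-blocks, T = 3: 3 global + 2 region-specific + 3 implied candidates
# exactly fill the 2**3 index values available to each sub-block.
fills_3_bit_index = (3 + 2 + IMPLIED_UNIFORM == 2 ** 3)

# 4 x 4 sub-blocks, T = 5: 29 global + 3 implied candidates fill 2**5 values.
fills_5_bit_index = (29 + IMPLIED_UNIFORM == 2 ** 5)

# Size estimate for a block of 64 sub-blocks of 2 x 2 indications, four quadrants.
regions = 4
stored = 3 + 2 * regions                      # 11 candidates with stored data
compressed_bits = payload_bits(64, 3, stored, 5)
uncompressed_bits = 256 * 2                   # 256 indications at 2 bits each
ratio = compressed_bits / uncompressed_bits
```

Under these assumptions the indices and candidate data occupy 247 bits against 512 bits uncompressed, a ratio of roughly 0.48.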

The method may further comprise storing in the compressed data block for each of one or more of the regions an indication of a transform to be applied to candidate data representing candidates for sub-blocks within the region.

The transformation may include one or both of rotation and reflection.

Each of the indications of the transformation may include three 1-bit flags: (i) a 90 degree rotation indication, (ii) a vertical reflection indication, and (iii) a horizontal reflection indication.
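As an illustration of how three 1-bit flags could drive such a transformation, the sketch below applies them to a 2 x 2 group stored as a flat tuple in row-major order. The direction of rotation (clockwise), the meaning of "vertical" and "horizontal" reflection (top-bottom and left-right swaps respectively), and the order in which the flags are applied are all assumptions; `apply_transform` is a hypothetical name.

```python
from itertools import product


def apply_transform(group, rot90, vflip, hflip):
    """Transform a 2 x 2 group given as (top-left, top-right, bottom-left, bottom-right)."""
    a, b, c, d = group
    if rot90:
        a, b, c, d = c, a, d, b  # rotate 90 degrees clockwise
    if vflip:
        a, b, c, d = c, d, a, b  # swap the top and bottom rows
    if hflip:
        a, b, c, d = b, a, d, c  # swap the left and right columns
    return (a, b, c, d)


# Enumerate the result of every flag combination on a group with distinct entries.
variants = {apply_transform((1, 2, 3, 4), r, v, h)
            for r, v, h in product((0, 1), repeat=3)}
```

With these conventions the eight flag combinations produce eight distinct arrangements, i.e. all rotations and reflections of a square, so one stored candidate can stand for up to eight oriented patterns.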

Selecting one of the candidates to be used for representing the sub-block may comprise selecting a candidate that is compatible with the presence indications in the sub-block.
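This summary does not define "compatible". One plausible interpretation, used purely as an assumption in the sketch below, is that a candidate is compatible if each of its indications either equals the corresponding indication of the sub-block or is "partially present", since a partially present indication only causes the texture to be checked and so cannot yield a wrong answer. The tie-breaking rule (fewest substituted indications) and all names are likewise hypothetical.

```python
PRESENT, ABSENT, PARTIAL = "P", "A", "C"


def is_compatible(candidate, sub_block):
    """Assumed rule: a candidate may replace any indication with PARTIAL, nothing else."""
    return all(c == s or c == PARTIAL for c, s in zip(candidate, sub_block))


def select_candidate(candidates, sub_block):
    """Return the index of the compatible candidate that changes the fewest indications."""
    best = None
    for index, candidate in enumerate(candidates):
        if not is_compatible(candidate, sub_block):
            continue
        cost = sum(c != s for c, s in zip(candidate, sub_block))
        if best is None or cost < best[0]:
            best = (cost, index)
    if best is None:
        raise ValueError("no compatible candidate for this sub-block")
    return best[1]


candidates = [
    (PRESENT, PRESENT, PRESENT, PRESENT),
    (PRESENT, PARTIAL, PARTIAL, PARTIAL),
    (PARTIAL, PARTIAL, PARTIAL, PARTIAL),
]
```

For the sub-block `("P", "P", "C", "C")`, the second candidate is chosen: it turns one fully present indication into a texture check, whereas the all-partial candidate would turn two.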

The sub-primitives may be rectangular or triangular.

The method may further comprise storing the compressed data blocks.

The rendering system may be a ray tracing system or a rasterization system.

There is provided a compression unit configured to compress a sub-primitive presence indication block for use in a rendering system (e.g. for use in an intersection test in a rendering system) into a compressed data block, wherein the sub-primitive presence indication block comprises a plurality of sub-primitive presence indication sub-blocks, the compression unit being configured to:

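identify a plurality of candidates for a combination of presence indications; and

for each of the sub-blocks in the sub-primitive presence indication block:

select one of the candidates to be used for representing the sub-block; and

store, in the compressed data block, an index indicating the selected candidate.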

A compression unit may be provided that is configured to perform any of the compression methods described herein.

A method of decompressing compressed data to determine one or more sub-primitive presence indications for use in a rendering system (e.g., for use in intersection testing in a rendering system) may be provided, the method comprising:

Receiving a compressed data block for a sub-primitive presence indication block, wherein the sub-primitive presence indication block comprises a plurality of sub-primitive presence indication sub-blocks, and wherein for each of the sub-blocks in the sub-primitive presence indication block, the compressed data block comprises an index indicating one of a plurality of candidates for a combination of presence indications;

reading an index for one of the sub-blocks in the sub-primitive presence indication block from the compressed data block;

obtaining candidate data representing at least a portion of the candidate indicated by the read index; and

determining one or more of the presence indications in the sub-block using the obtained candidate data.
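The decompression steps above can be sketched for the case where a single presence indication is wanted. The layout of the compressed data (a dictionary with an explicit `codebook` list and an `indices` list, sub-blocks in row-major order) is a hypothetical simplification for illustration and is not the compressed data block format of the application.

```python
def presence_at(compressed, row, col, block_width, n=2):
    """Return the presence indication at (row, col) without expanding the whole block."""
    sub_blocks_per_row = block_width // n
    # Identify the sub-block containing the requested indication.
    sub_block = (row // n) * sub_blocks_per_row + (col // n)
    # Read the index stored for that sub-block.
    index = compressed["indices"][sub_block]
    # Obtain the candidate data indicated by the index.
    candidate = compressed["codebook"][index]
    # Determine the requested indication from its position inside the sub-block.
    return candidate[(row % n) * n + (col % n)]


# Hypothetical compressed form of a 4 x 4 block made of four 2 x 2 sub-blocks.
compressed = {
    "codebook": [("P", "P", "P", "P"), ("A", "A", "A", "A"), ("P", "C", "C", "C")],
    "indices": [0, 1, 2, 1],
}
```

Only one index and one candidate are touched per query, which is the property that makes random access to a single sub-primitive cheap.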

A decompression unit may be provided that is configured to decompress the compressed data to determine one or more sub-primitive presence indications for use in a rendering system (e.g. for use in an intersection test in the rendering system), the decompression unit being configured to:

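receive a compressed data block for a sub-primitive presence indication block, wherein the sub-primitive presence indication block comprises a plurality of sub-primitive presence indication sub-blocks, and wherein for each of the sub-blocks in the sub-primitive presence indication block, the compressed data block comprises an index indicating one of a plurality of candidates for a combination of presence indications;

read an index for one of the sub-blocks in the sub-primitive presence indication block from the compressed data block;

obtain candidate data representing at least a portion of the candidate indicated by the read index; and

determine one or more of the presence indications in the sub-block using the obtained candidate data.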

The compression unit and/or decompression unit may be embodied in hardware on an integrated circuit. A method of manufacturing a compression unit or a decompression unit in an integrated circuit manufacturing system may be provided. An integrated circuit definition data set may be provided that, when processed in an integrated circuit manufacturing system, configures the system to manufacture a compression unit or a decompression unit. A non-transitory computer-readable storage medium having stored thereon a computer-readable description of a compression unit or a decompression unit that, when processed in an integrated circuit manufacturing system, causes the integrated circuit manufacturing system to manufacture an integrated circuit embodying the compression unit or the decompression unit may be provided.

An integrated circuit manufacturing system may be provided, the integrated circuit manufacturing system comprising: a non-transitory computer readable storage medium having stored thereon a computer readable description of a compression unit or a decompression unit; a layout processing system configured to process the computer readable description to generate a circuit layout description of an integrated circuit embodying the compression unit or the decompression unit; and an integrated circuit generation system configured to manufacture the compression unit or the decompression unit according to the circuit layout description.

A computer program code for performing any of the methods described herein may be provided. A non-transitory computer readable storage medium may be provided having stored thereon computer readable instructions that, when executed at a computer system, cause the computer system to perform any of the methods described herein.

As will be apparent to those skilled in the art, the above features may be suitably combined, and may be combined with any of the aspects of the examples described herein.

Drawings

Examples will now be described in detail with reference to the accompanying drawings, in which:

FIG. 1 shows a punch-through texture applied to two primitives forming a quadrilateral;

FIG. 2 shows a triangle primitive subdivided into 64 sub-primitives;

FIG. 3a illustrates a ray tracing system according to examples described herein;

FIG. 3b shows presence indications for sub-triangles within a triangle primitive;

FIG. 3c illustrates the presence indication shown in FIG. 3b, wherein sub-blocks having consistent presence status have been cleared;

FIG. 4 is a flow chart of a method of compressing a sub-primitive presence indication block into a compressed data block;

FIG. 5 illustrates an example of a compressed data block;

FIG. 6a shows a first object that is partially present;

FIG. 6b shows a presence indication block of the first object, the presence indication block comprising 64 2 x 2 presence indication sub-blocks;

FIG. 6c illustrates a presence indication block of a first object in which sub-blocks having consistent presence status have been cleared;

FIG. 6d illustrates the presence indication block of the first object shown in FIG. 6c, wherein some of the sub-blocks exhibiting symmetry are highlighted;

FIG. 7a shows a second object that is partially present;

FIG. 7b shows a presence indication block of the second object, the presence indication block comprising 64 2 x 2 presence indication sub-blocks;

FIG. 7c illustrates a presence indication block of a second object in which sub-blocks having consistent presence status have been cleared;

FIG. 7d illustrates the presence indication block of the second object shown in FIG. 7c, wherein some of the sub-blocks exhibiting symmetry are highlighted;

FIG. 8 is a flow chart of a method of decompressing compressed data to determine one or more sub-primitive presence indications for use in intersection testing;

FIG. 9 illustrates a first example of logic within a decompression unit for decompressing compressed data to determine one or more sub-primitive presence indications of a first or second object;

FIG. 10 illustrates an example of steps that may be performed to determine one or more of the presence indications in a sub-block using the obtained candidate data for the sub-block;

FIG. 11 illustrates a presence indication block of a third object that is partially present, the presence indication block comprising 256 4 x 4 presence indication sub-blocks;

FIG. 12 illustrates an example of logic within a decompression unit for decompressing compressed data to determine one or more sub-primitive presence indications for a third object;

FIG. 13 illustrates how candidate data may be formatted and obtained in another example within a decompression unit for decompressing compressed data to determine one or more sub-primitive presence indications of a third object;

FIG. 14 illustrates a computer system in which a compression unit and/or a decompression unit is implemented;

FIG. 15 illustrates an integrated circuit manufacturing system for generating an integrated circuit embodying a compression unit or a decompression unit; and

FIG. 16 shows a transformation unit for applying a transformation to sub-primitive presence indication sub-blocks.

The figures illustrate various examples. Skilled artisans will appreciate that element boundaries (e.g., blocks, groups of blocks, or other shapes) illustrated in the figures represent one example of boundaries. In some examples, it may be the case that one element may be designed as a plurality of elements, or that a plurality of elements may be designed as one element. Where appropriate, common reference numerals have been used throughout the various figures to indicate like features.

Detailed Description

The following description is presented by way of example to enable a person skilled in the art to make and use the invention. The invention is not limited to the embodiments described herein and various modifications to the disclosed embodiments will be apparent to those skilled in the art.

Embodiments will now be described by way of example only. In the present disclosure, the sub-primitive presence indication represents the presence state of the corresponding sub-primitive.

In the ray tracing system described in the background section above, each presence indication is stored with 2 bits, such that if a primitive is subdivided into K sub-primitives, 2K bits are used for the presence indication of the primitive. Reducing the amount of data used to represent the presence indication is beneficial in reducing the amount of memory required to store the presence indication and reducing the amount of data transferred between different components in the ray tracing system. Thus, a reduction in the amount of data used to represent the presence indication may reduce the latency, power consumption, and silicon area of the ray tracing system.

As a simple example of how to compress the presence indications, it is noted that two bits are used for each presence indication to indicate one of three presence states (fully present, fully absent or partially present), so if the presence information of multiple sub-primitives is combined, the presence indications of a group of sub-primitives can be represented with fewer than 2 bits per sub-primitive on average. As an example, the presence indications for a group of 5 sub-primitives (i.e., 3⁵ = 243 possible combinations of presence states) may be stored in 8 bits (i.e., 2⁸ = 256 possible encodings). In this simple example, if a primitive is subdivided into K sub-primitives, approximately 1.6K bits are used for the presence indications of the primitive. Compressing 2K bits to 1.6K bits represents a compression ratio of 80%, where the compression ratio is defined as the size of the compressed data divided by the size of the uncompressed data. Compressing the data to a greater extent results in a smaller compression ratio.
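The simple example above amounts to writing five three-state values as one base-3 number. A minimal sketch follows; the numeric codes 0, 1 and 2 for the three presence states and the function names are assumptions made for illustration.

```python
def pack5(states):
    """Pack five presence states (each 0, 1 or 2) into one value in 0..242."""
    if len(states) != 5 or any(s not in (0, 1, 2) for s in states):
        raise ValueError("expected five states, each 0, 1 or 2")
    value = 0
    for s in reversed(states):
        value = value * 3 + s  # base-3 digits, states[0] is least significant
    return value


def unpack5(value):
    """Recover the five presence states from a packed value."""
    states = []
    for _ in range(5):
        value, s = divmod(value, 3)
        states.append(s)
    return states


bits_per_sub_primitive = 8 / 5   # 1.6 bits, versus 2 bits uncompressed
ratio = bits_per_sub_primitive / 2
```

The largest packed value is 3⁵ - 1 = 242, which fits in one byte, and `ratio` evaluates to 0.8, matching the 80% figure in the text.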

In the examples described below, compression and decompression techniques are described that compress the presence indication to a greater extent (i.e., achieve a lower compression ratio) than the simple examples described above.

It is noted that having three states is attractive from a quality perspective, as opposed to a simpler scheme with only "fully present" and "fully absent" states, because a two-state system is likely to result in aliasing (i.e., jagged edges) unless very high resolution, and therefore memory-intensive, masks are used. Furthermore, while a two-state scheme may benefit from never having to run a shader to "check the texture", this also means that some use cases that do require partial transparency (e.g., modeling stained glass) will be handled suboptimally. Nevertheless, one skilled in the art could adapt the examples described below, in which a vector quantization approach is used, to a system having only two states.

Fig. 3a shows a ray tracing system 300 comprising a ray tracing unit 302 and a memory 304. Ray tracing system 300 also includes a geometric data source 303 and a ray data source 305. Ray tracing unit 302 includes a processing module 306, an intersection test module 308, and processing logic 310. The intersection test module 308 includes one or more box intersection test units 312, one or more primitive intersection test units 314, and a decompression unit 318. The geometric data source includes a compression unit 316. In operation, ray traced unit 302 receives geometry data defining objects within a 3D scene from geometry data source 303. Ray traced unit 302 also receives ray data from ray data source 305 that defines rays to be subjected to intersection testing. The light may be primary or secondary. The processing module 306 is configured to generate an acceleration structure based on the geometric data and send the acceleration structure to the memory 304 for storage therein. After the acceleration structure has been stored in the memory 304, the intersection test module 308 canThe nodes of the acceleration structure (e.g., including data defining an axis alignment box corresponding to the node) are retrieved from memory 304 to perform a ray intersection test for the retrieved nodes. The box intersection test unit 312 performs an intersection test to determine whether a ray intersects each of the bounding boxes corresponding to the nodes of the acceleration structure (where a miss may cull a large piece of the hierarchical acceleration structure). If a leaf node intersection is determined, primitive intersection test unit 314 performs one or more primitive intersection tests to determine which object(s), if any, the ray intersects. 
In this example, the primitives are triangles or pairs of triangles, but it is noted that in other examples, the primitives may be other shapes, e.g., other convex planar polygons, such as rectangles (which include squares), pentagons, hexagons, and the like. The results of the intersection test indicate which primitive in the scene the ray intersects, and these results may also indicate other intersection data, such as the location on the object where the ray intersects the object (e.g., defined in terms of barycentric coordinates), and may also indicate the distance along the ray at which the intersection occurred, such as the Euclidean distance or a (signed) multiple of the ray length. In some cases, the intersection determination may be based on whether the distance along the ray at which the intersection occurred is between the minimum clipping distance and the maximum clipping distance of the ray (which may be referred to as t_min and t_max). The results of the intersection test are provided to processing logic 310. Processing logic 310 is configured to process the results of the intersection test to determine rendering values for images representing the 3D scene. The rendering values determined by processing logic 310 may be transferred back to memory 304 for storage therein to represent images of the 3D scene.

In the examples described herein, ray tracing systems use acceleration structures in order to reduce the number of intersection tests that need to be performed on a ray for a primitive. It should be noted, however, that some other examples may not use acceleration structures and that rays may be simply tested for primitives without first attempting to reduce the number of intersection tests that need to be performed using acceleration structures.

When the primitive intersection test unit 314 of the intersection test module 308 determines that a ray intersects a primitive that is partially present, the intersection test module 308 will typically need to stall while a shader program is executed on the processing logic 310 to resolve the presence of the primitive at the intersection point. Some of these stalls may be avoided by using sub-primitive presence indications as described herein.

In the examples described below, compression and decompression of the sub-primitive presence indication is performed using a vector quantization method, which is a potentially lossy compression method. In this method, sub-primitive presence indication blocks are compressed into compressed data blocks for use in intersection testing in a ray tracing system. The inventors have realized that the distribution of presence indicators is rarely random because the primitives represent physical structures. Sub-primitives with a particular presence state are typically next to sub-primitives with the same presence state. This order of presence status distribution (i.e., non-randomness) may be used to achieve better compression of the presence indication block.

Note that in the example shown in fig. 3a, the compression unit 316 is implemented in the geometry data source 303, but in other examples, the compression unit 316 may be implemented in a different component than the geometry data source 303, and in some examples may be implemented in the ray tracing unit 302, e.g., as part of the intersection test module 308. Furthermore, in the example shown in fig. 3a, decompression unit 318 is implemented as part of intersection test module 308, but in other examples it may be implemented somewhere other than in the intersection test module 308.

A method of compressing a set of sub-triangles in a triangle primitive is described with reference to fig. 3b. The exemplary triangle 320 has been subdivided into 256 sub-triangle areas, and each of these sub-triangle areas has been shaded according to a corresponding presence indication, with light gray corresponding to 'completely absent', dark gray corresponding to 'completely present', and medium gray corresponding to 'check texture' (i.e., partially present). These sub-triangle areas are in turn grouped into larger triangular sets of 4 adjacent sub-triangles. Fig. 3b shows 64 of these larger triangular sets. These larger sets may be referred to as "vectors" or "sub-blocks". As an example, 322 is a vector comprising four 'completely present' sub-triangles, 324 is a vector of four 'completely absent' indications, and 326 is a vector of four 'check texture' indications. Other vectors are non-uniform: 328 has three 'completely absent' indications and one 'check texture' indication, while 330 has one 'completely absent' indication and three 'check texture' indications.

Furthermore, it is reasonable to assume that the presence map is continuous/band-limited, i.e., it cannot change immediately from 'completely present' to 'completely absent' without some transitional 'check texture' state. This follows either from interpolation in texture reconstruction or from the absence of non-representable frequency content (it being desirable to avoid aliasing). Under this assumption, fewer than 81 (i.e., 3⁴) patterns are possible for a vector. Enumerating the possible cases gives 31 cases, which can be uniquely indicated using 5 bits per vector. With the 64 vectors in the example, this uses 320 bits.
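The count of 31 can be checked by brute force. The following is a minimal sketch, assuming (as in the reasoning above) that a vector of four indications may never contain both a 'completely present' and a 'completely absent' state; the state labels FP, FA and PP and the function name are chosen here purely for illustration.

```python
from itertools import product

# FP = completely present, FA = completely absent, PP = check texture.
STATES = ("FP", "FA", "PP")

def allowed_patterns(n=4):
    """Return every n-indication vector that never mixes FP with FA."""
    return [v for v in product(STATES, repeat=n)
            if not ("FP" in v and "FA" in v)]

patterns = allowed_patterns()
# Smallest number of bits able to give each pattern a distinct code.
bits_per_vector = (len(patterns) - 1).bit_length()
total_bits = 64 * bits_per_vector  # 64 vectors in the example
```

Of the 81 unconstrained vectors, 16 use only FP/PP and 16 use only FA/PP, with the all-PP vector counted once, which gives 16 + 16 - 1 = 31.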

The inventors have appreciated that, within the set of vectors, the same 'patterns' frequently repeat. To make this more apparent, the very frequently occurring uniform cases in 2100 (e.g., those matching 2010, 2011, and 2012) have been 'culled' to produce 340, as shown in fig. 3c. Some of the remaining (non-uniform) vectors are denoted by reference numerals 342, 344, 346, 348, 350 and 352, respectively, in fig. 3c. Consider vector 342: the same pattern reappears at 344 and 346 and, allowing for reflection along the diagonal edge, the pattern also appears in the two adjacent vectors shown at 348. Similarly, 350 and 352 show three occurrences of a different pattern.

Where patterns repeat frequently, a scheme related to VQ (vector quantization) may be employed to represent the presence map more compactly, at the risk of introducing some additional 'check texture' states. In this example, each vector is replaced with a 3-bit index, which is used to index into a LUT/codebook. The codebook may represent a plurality of vectors, for example using 5 bits per vector. As an example, the first 3 index values (i.e., 0 to 2) may implicitly reference the three 'uniform' vectors, such as the vectors matching 322, 324, and 326, while five additional vectors, indexed by, for example, 3, 4, 5, 6, and 7, are stored explicitly at a cost of 5×5 bits. Thus, the entire map can be represented with 64×3+5×5=217 bits. The 256 uncompressed presence indications shown in fig. 3b are represented by 512 bits before compression, so compressing these presence indications to 217 bits represents a compression ratio of 42% (where compression ratio is defined as the compressed size of the data divided by the uncompressed size of the data). Since this scheme may be lossy, a vector may sometimes need to reference a different but 'compatible' vector in the codebook. The meaning of 'compatible' is defined later in this document.
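The arithmetic of this example can be reproduced as follows. This is a small sketch only; the function and parameter names are illustrative and do not come from the description.

```python
def vq_size_bits(num_vectors, index_bits, stored_vectors, bits_per_stored_vector):
    """Size of a VQ-coded presence map: one index per vector plus the explicit codebook."""
    return num_vectors * index_bits + stored_vectors * bits_per_stored_vector

compressed_bits = vq_size_bits(num_vectors=64, index_bits=3,
                               stored_vectors=5, bits_per_stored_vector=5)
uncompressed_bits = 256 * 2  # 256 presence indications at 2 bits each
compression_ratio = compressed_bits / uncompressed_bits  # compressed / uncompressed
```

The three uniform vectors cost nothing in this accounting because they are referenced implicitly by index values 0 to 2.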

The inventors have additionally appreciated the following. First, triangles are rarely isolated, but are typically connected to other triangles along shared edges, and thus can often be grouped at least into pairs, each pair sharing a common edge. A single triangle may be represented as a 'pair' in which the second triangle is degenerate or marked as absent. Second, the presence map of a pair of triangles may be represented as a square, with one of the diagonals corresponding to the shared edge. Note that although these triangles typically have different sizes in world or model space, one can consider their representation in, for example, barycentric coordinates, allowing the pair of triangles to be mapped continuously to a square. Due to this continuity, the presence in the local areas on either side of the diagonal should be the same. Third, indexing into a set of sub-squares in a square map is easier than indexing into sub-triangles in a triangular map. Taking fig. 1 as an example of a pair of adjacent triangles to which a leaf texture is applied, fig. 6b shows a 16 x 16 presence map of the triangle pair.

Another method of compressing a sub-primitive presence indication block into a compressed data block is now described with reference to the flowchart of fig. 4. The compression is performed by the compression unit 316. Fig. 5 illustrates an example of a compressed data block 500 resulting from performing the method illustrated in fig. 4.

In step S402, the compression unit 316 receives a sub-primitive presence indication block to be compressed. The sub-primitive presence indication block comprises a plurality of sub-blocks of sub-primitive presence indications. For example, fig. 6a shows a first object that is partially present. In this example, the object is a leaf and is represented by a pair of triangle primitives, which form a quadrilateral because they share an edge. Fig. 6b shows the presence indication block of the first object received at the compression unit 316. The object is divided into 256 sub-primitives arranged in a 16 x 16 square. The presence indication block comprises 64 2 x 2 sub-blocks of presence indications for the respective sub-primitives. In other examples, a block may have a different number of sub-primitives, which may be arranged in other shapes (e.g., rectangular or triangular), and/or a sub-block may have a different number of sub-primitives, which may likewise be arranged in other shapes. In fig. 6b, each presence indication is drawn with one of three hatchings to represent one of the three possible presence states. In particular, presence indications indicating that the respective sub-primitive is fully present are shown with dark hatching; presence indications indicating that the respective sub-primitive is fully absent are shown with light hatching; and presence indications indicating that the respective sub-primitive is partially present are shown with medium hatching. As described above, the presence indications received in step S402 may be determined in a preprocessing step, which may be performed by an Application Programming Interface (API) or as part of a process of creating primitives and textures, for example, by a user. 
Each of the (uncompressed) sub-primitive presence indications is represented with 2 bits to indicate one of three presence states: (i) complete presence, (ii) complete absence, or (iii) partial presence.
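The partitioning of a 16 x 16 block into 64 sub-blocks of 2 x 2 presence indications can be sketched as below. The row-major ordering of the sub-blocks and of the four indications inside each sub-block is an assumption made for this illustration, as are the names used; the example block is synthetic and does not reproduce fig. 6b.

```python
def split_into_subblocks(block, size=2):
    """Split a square block (list of rows) into size x size sub-blocks.

    Returns a list of ((sub_row, sub_col), vector) in row-major order, where
    vector lists the sub-block's indications row by row (assumed ordering).
    """
    n = len(block)
    subblocks = []
    for r in range(0, n, size):
        for c in range(0, n, size):
            vector = tuple(block[r + dr][c + dc]
                           for dr in range(size) for dc in range(size))
            subblocks.append(((r // size, c // size), vector))
    return subblocks

# Synthetic 16 x 16 block: fully absent except one partially present sub-primitive.
block = [["FA"] * 16 for _ in range(16)]
block[0][1] = "PP"
subblocks = split_into_subblocks(block)
uncompressed_bits = 16 * 16 * 2  # 2 bits per presence indication
```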

In step S404, the compression unit 316 identifies a plurality of candidates for combinations of presence indications. These candidates may be referred to as "vectors". Each combination of presence indications is a candidate for a sub-block, i.e., the combination may be referred to as a "sub-block combination" and the candidate may be referred to as a "sub-block candidate". A sub-block combination is a combination of the presence indications of a sub-block. This means that each candidate has the same number and arrangement of presence indications as each of the sub-blocks. For example, if the sub-blocks have presence indications in a 2×2 arrangement (as shown in fig. 6b), then each of the candidates has presence indications in a 2×2 arrangement.

In step S404, at least some of the candidates may be identified by analyzing the sub-primitive presence indications in the block to find common sub-blocks having the same combination of presence indications. In particular, the combinations of presence indications that repeat most frequently in the sub-primitive presence indication block may be selected as candidates, so as to maximize the number of sub-blocks that are represented exactly by a candidate.

Some candidates are frequently used in most presence indication blocks. In particular, sub-blocks in which all presence indications are in the same state (e.g., all fully present, all fully absent, or all partially present) are useful for most presence indication blocks. For example, in fig. 6b, of the 64 sub-blocks, 18 sub-blocks are entirely fully present, 11 sub-blocks are entirely fully absent, 13 sub-blocks are entirely partially present, and 22 sub-blocks have a mix of presence states. Candidates representing sub-blocks in which all presence indications are the same may be identified as candidates without having to analyze the presence indication block.

Fig. 6c shows the presence indication block of the first object after the 42 sub-blocks with a uniform presence state have been cleared, leaving the remaining 22 sub-blocks. These remaining sub-blocks are analyzed to identify suitable candidates for the sub-blocks of presence indications in the block.
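One simple way to carry out this analysis is a frequency count over the non-uniform sub-blocks. The sketch below is only an illustration under that assumption: the three uniform candidates are always included, and the most frequent remaining combinations are added. The function name, the parameter num_extra and the example data are hypothetical.

```python
from collections import Counter

# Uniform candidates, identified without analyzing the block.
UNIFORM = [("FP",) * 4, ("FA",) * 4, ("PP",) * 4]

def identify_candidates(subblock_vectors, num_extra=5):
    """Return the uniform candidates plus the most frequent non-uniform combinations."""
    counts = Counter(v for v in subblock_vectors if v not in UNIFORM)
    extra = [vector for vector, _ in counts.most_common(num_extra)]
    return UNIFORM + extra

# Synthetic example: 10 uniform sub-blocks and three distinct mixed combinations.
example = ([("FA",) * 4] * 10
           + [("FA", "FA", "FA", "PP")] * 3
           + [("PP", "FA", "FA", "FA")] * 2
           + [("FP", "PP", "FP", "FP")])
candidates = identify_candidates(example, num_extra=2)
```

With num_extra=2 the least frequent mixed combination is left without an exact match and would have to be represented by a compatible candidate instead.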

Note that in order to achieve compression at useful compression ratios, the number of candidates identified in step S404 is significantly less than the number of possible combinations of presence indications. For example, when there are four presence indications in a sub-block, and each presence indication may have one of three presence states, there are 81 possible combinations of presence indications for the sub-block (3⁴=81). However, as described above, assuming that a fully present sub-primitive cannot share an edge or a vertex with a fully absent sub-primitive, and noting that in the example shown in FIG. 6b all four sub-primitives within a sub-block share a vertex, a sub-block cannot include both a fully present sub-primitive and a fully absent sub-primitive. Thus, only 31 combinations of presence indications can occur. However, the number of candidates identified may be less than 31, for example, about 10.

In step S406, the compression unit 316 selects, for each of the sub-blocks in the sub-primitive presence indication block, one of the identified candidates to be used to represent the sub-block. Specifically, a candidate that is compatible (e.g., the most compatible candidate, or one of the equally most compatible candidates) with the presence indications in the sub-block is selected. If the presence indications in the sub-block match (i.e., are the same as) the presence indications in a candidate, the candidate is automatically fully compatible with the presence indications in the sub-block, and that candidate will be selected. However, some sub-blocks may not have an exactly matching candidate, since the number of candidates is significantly smaller than the number of possible combinations of presence indications in a sub-block. A presence state of fully present or fully absent in the sub-block is compatible with a presence state of partially present in the candidate, because the end result will be the same, although a check-texture operation will need to be performed. Thus, a candidate is compatible with the presence indications in a sub-block if the candidate is the same as the sub-block, or if it differs from the sub-block only in that one or more presence indications of fully present or fully absent in the sub-block are replaced by the partially present state in the candidate. This can be written mathematically by describing the candidate as a vector $A = (A_0, A_1, A_2, A_3)$ and the sub-block as a vector $B = (B_0, B_1, B_2, B_3)$, in which the components of each vector represent presence states as fully present (denoted "FP"), fully absent (denoted "FA"), or partially present (denoted "PP"). With this notation, vector $A$ is compatible with vector $B$ if and only if, for every component $i$:

$(A_i = B_i) \lor \big((A_i = \mathrm{PP}) \land B_i \in \{\mathrm{FP}, \mathrm{FA}\}\big)$.
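The compatibility condition translates directly into a per-component test. The following is a minimal sketch, with the presence states written as the strings "FP", "FA" and "PP" and a function name chosen for illustration.

```python
def is_compatible(candidate, subblock):
    """True if candidate A may represent sub-block B without rendering errors.

    Each component must either match exactly, or the candidate must hold PP
    where the sub-block holds FP or FA.
    """
    return all(a == b or (a == "PP" and b in ("FP", "FA"))
               for a, b in zip(candidate, subblock))

exact = is_compatible(("FA", "FA", "FA", "PP"), ("FA", "FA", "FA", "PP"))
relaxed = is_compatible(("FA", "FA", "PP", "PP"), ("FA", "FA", "FA", "PP"))
unsafe = is_compatible(("FA", "FA", "FA", "FA"), ("FA", "FA", "FA", "PP"))
```

The relation is not symmetric: a candidate may relax FP or FA to PP, but it may never turn PP into FP or FA, which is why the last call returns False.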

It is acceptable to represent a fully present or fully absent presence state as partially present, as this does not lead to rendering errors during intersection testing. In practice, it means that the intersection testing process will check the texture to determine the presence of the primitive at an intersection with the sub-primitive. The lossy nature of the compression therefore loses the opportunity to reduce the latency of the intersection testing process by using the sub-primitive presence indication, but it does not introduce rendering errors. It is not acceptable to represent a fully present or partially present presence state as fully absent, or a fully absent or partially present presence state as fully present, as this may lead to rendering errors during intersection testing. Given candidates X and Y that are both compatible with a given sub-block S, X may be considered more compatible than Y if X has a higher compatibility score, where the compatibility score between a candidate C and a sub-block/vector S is defined as:

A candidate in which all presence states are partially present is compatible with any sub-block, so this candidate can always be used as a fallback. However, it is preferable to retain as many FP and FA presence indications as possible, since these are what provide the benefit of using presence indications.
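Selection of the most compatible candidate can be sketched as follows. The scoring function used here is an assumption made for illustration only, not a definition taken from this description: it counts the components that the candidate reproduces exactly, which for compatible candidates rewards retaining FP and FA indications. All names are hypothetical.

```python
def is_compatible(candidate, subblock):
    """Per-component compatibility test (exact match, or PP in place of FP/FA)."""
    return all(a == b or (a == "PP" and b in ("FP", "FA"))
               for a, b in zip(candidate, subblock))

def assumed_score(candidate, subblock):
    """ASSUMED score: number of components reproduced exactly by the candidate."""
    return sum(a == b for a, b in zip(candidate, subblock))

def select_candidate(candidates, subblock):
    """Return the index of a highest-scoring compatible candidate, or None."""
    best = None  # (index, score) of the best compatible candidate so far
    for index, candidate in enumerate(candidates):
        if is_compatible(candidate, subblock):
            score = assumed_score(candidate, subblock)
            if best is None or score > best[1]:
                best = (index, score)
    return None if best is None else best[0]

candidates = [("FP",) * 4, ("FA",) * 4, ("PP",) * 4, ("FA", "FA", "PP", "PP")]
near_match = select_candidate(candidates, ("FA", "FA", "FA", "PP"))
fallback = select_candidate(candidates, ("FP", "FP", "PP", "FP"))
```

Because the all-PP candidate is present in the list, the function never returns None for these inputs; it returns the all-PP index when nothing better is compatible.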

Note that higher resolution presence maps typically have a higher proportion of fully present or fully absent indications (as opposed to the more 'expensive' check-texture indications). Thus, given a fixed memory/storage budget, a higher resolution but lossy scheme may be preferable from a performance standpoint to a lower resolution but lossless scheme.

Further, in step S406, the compression unit stores, in the compressed data block 500, an index for each of the sub-blocks indicating the candidate selected for that sub-block. In the compressed data block 500 shown in fig. 5, the indices are denoted 502. The index for a sub-block is a T-bit index. This means that the index for a sub-block may indicate one of 2^T different candidates for that sub-block. In the examples described herein, to achieve moderate compression, the value of T is less than the number of sub-primitive presence indications in one of the sub-blocks. As an example, when each of the sub-blocks includes four sub-primitive presence indications, T is set to less than or equal to 3 to achieve a good level of compression. To avoid quantizing the presence indications too coarsely, T is preferably not too low. As an example, T may be 3. When T=3, the index of a sub-block may indicate one of 8 different candidates for the sub-block. In other examples, there may be a different number of sub-primitive presence indications in a sub-block, and/or T may have a value other than 3.
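Storing one T-bit index per sub-block can be illustrated by packing the indices into a single integer. This is a sketch under assumed conventions (index 0 in the least significant bits, hypothetical function names); the description does not specify a bit layout for the indices 502.

```python
def pack_indices(indices, t=3):
    """Pack T-bit candidate indices into one integer, first index in the low bits."""
    word = 0
    for position, index in enumerate(indices):
        if not 0 <= index < (1 << t):
            raise ValueError("index does not fit in T bits")
        word |= index << (position * t)
    return word

def unpack_indices(word, count, t=3):
    """Recover `count` T-bit indices from a packed integer."""
    mask = (1 << t) - 1
    return [(word >> (position * t)) & mask for position in range(count)]

indices = [i % 8 for i in range(64)]        # one 3-bit index per sub-block
packed = pack_indices(indices, t=3)
restored = unpack_indices(packed, count=64, t=3)
index_bits = 64 * 3                         # storage used by the indices
```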

In step S408, the compression unit 316 stores, in the compressed data block 500, candidate data 504 representing one or more of the candidates. For example, candidate data representing one or more of the candidates may be stored in a respective one or more entries of a codebook in the compressed data block. In some examples, at least one of the candidates (e.g., the candidates representing sub-blocks whose sub-primitive presence indications are all the same) is not represented by candidate data stored in the compressed data block. For example, the entirely fully present, entirely fully absent, and entirely partially present candidates may be so common that they do not need to be stored explicitly in the codebook, whereas the other identified candidates are stored in the codebook.

In the example where each of the sub-blocks has four presence indications in a 2 x 2 arrangement (as shown in fig. 6b), all of the sub-primitives represented by a sub-block share a vertex (in the middle of the 2 x 2 arrangement). Gradients in the alpha values of textures are finite, so, since all sub-primitives in a sub-block meet at a single point, in this 2 x 2 example it is not possible to have both fully present sub-primitives and fully absent sub-primitives in the same sub-block. In this example, each of the candidates for which candidate data is stored in compressed data block 500 may be represented by five bits of candidate data:

- one bit indicating one of: (i) a first palette consisting of the fully present and partially present states, and (ii) a second palette consisting of the fully absent and partially present states; and

- one bit for each of the four presence indications of the candidate, indicating either the partially present state or the other state of the indicated palette.
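The five-bit representation can be sketched as an encode/decode pair. The bit positions chosen here (palette in bit 4, one bit per indication in bits 0 to 3, with a set bit meaning partially present) are assumptions for illustration; the description only fixes the number of bits and their roles.

```python
from itertools import product

def encode_candidate(vector):
    """Pack a 4-indication candidate into 5 bits (assumed bit layout)."""
    if "FP" in vector and "FA" in vector:
        raise ValueError("FP and FA cannot share a 2 x 2 sub-block")
    palette = 1 if "FA" in vector else 0   # 0: {FP, PP}, 1: {FA, PP}
    code = palette << 4
    for i, state in enumerate(vector):
        if state == "PP":
            code |= 1 << i                  # bit i set: partially present
    return code

def decode_candidate(code):
    """Expand a 5-bit code back into a 4-indication candidate."""
    other = "FA" if (code >> 4) & 1 else "FP"
    return tuple("PP" if (code >> i) & 1 else other for i in range(4))

ALL_PATTERNS = [v for v in product(("FP", "FA", "PP"), repeat=4)
                if not ("FP" in v and "FA" in v)]
round_trip_ok = all(decode_candidate(encode_candidate(v)) == v for v in ALL_PATTERNS)
```

Five bits give 32 codes for 31 patterns; in this layout the all-PP candidate has two encodings (either palette with all four indication bits set), which accounts for the spare code.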

Increasing the number of candidates in the codebook is useful because a larger number of candidates makes it more likely that a good match is available for each sub-block that a candidate is used to represent. This means that fewer fully present or fully absent presence indications need to be represented as partially present in a candidate. However, increasing the number of bits (T) used for each index is detrimental to the size of the compressed data. If the number of candidates in the codebook increases but T does not, then the number of candidates available for selection for any particular sub-block within the presence indication block remains at 2^T, but different candidates may be available for different ones of the sub-blocks within the presence indication block. In particular, different candidates may be more suitable for different sub-blocks based on the spatial locations of the sub-blocks within the block. The sub-primitive presence indication block may be divided into a plurality of regions (e.g., quadrants), each of which includes a plurality of sub-blocks. One or more of the identified candidates may be region-specific candidates, such that a region-specific candidate is a candidate for sub-blocks within only one of the regions of the block (or within a subset, but not all, of the regions). However, some of the identified candidates may be global candidates, such that a global candidate is a candidate for sub-blocks within all of the regions of the block. Note that choosing, based on their distribution, which candidates to place into a region-specific set or the global set is a heuristic that is likely to produce better results, but it is not mandatory.

For example, the candidate data 504 shown in compressed data block 500 in fig. 5 includes candidate data 508 for one or more global candidates, and candidate data 510₁, 510₂, 510₃ and 510₄ for one or more region-specific candidates for each of four regions in the presence indication block. In the examples described below, the candidate data 504 stored in compressed data block 500 has: (i) candidate data representing three global candidates, and (ii) candidate data representing two region-specific candidates for each of four regions. In other examples (not described in detail herein), the candidate data stored in a compressed data block has: (i) candidate data representing four global candidates, and (ii) candidate data representing one region-specific candidate for each of the four regions.
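The way global and region-specific candidates combine into the set of 2^T candidates available to a sub-block can be sketched as follows. The ordering assumed here (three implicit uniform candidates at indices 0 to 2, then three global candidates, then the two candidates specific to the sub-block's region) is an illustrative assumption, and the candidate values are made up for the example.

```python
UNIFORM = [("FP",) * 4, ("FA",) * 4, ("PP",) * 4]   # implicit, not stored

def candidates_for_region(global_candidates, region_candidates, region):
    """Build the candidate table seen by sub-blocks in `region` (assumed ordering)."""
    return UNIFORM + list(global_candidates) + list(region_candidates[region])

# Hypothetical stored candidate data: 3 global + 2 per quadrant.
global_candidates = [("FA", "FA", "FA", "PP"), ("FP", "FP", "FP", "PP"),
                     ("FA", "FA", "PP", "PP")]
region_candidates = {
    "upper_left":  [("PP", "FA", "PP", "PP"), ("FP", "PP", "PP", "PP")],
    "upper_right": [("FA", "PP", "PP", "PP"), ("PP", "FP", "PP", "PP")],
    "lower_left":  [("PP", "PP", "FA", "PP"), ("PP", "PP", "FP", "PP")],
    "lower_right": [("PP", "PP", "PP", "FA"), ("PP", "PP", "PP", "FP")],
}
table = candidates_for_region(global_candidates, region_candidates, "upper_right")
stored_candidates = len(global_candidates) + sum(len(v) for v in region_candidates.values())
```

Each sub-block still sees exactly 8 candidates, so a 3-bit index suffices, while the block as a whole draws on 3 + 3 + 4×2 = 14 distinct candidates, of which 11 are stored.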

In the example shown in fig. 6b, four regions (quadrants) are shown: (i) an upper left quadrant comprising the 16 sub-blocks in the upper left quarter of the block, (ii) an upper right quadrant comprising the 16 sub-blocks in the upper right quarter of the block, (iii) a lower left quadrant comprising the 16 sub-blocks in the lower left quarter of the block, and (iv) a lower right quadrant comprising the 16 sub-blocks in the lower right quarter of the block. Note that fig. 6c is essentially a simplified representation of fig. 6b in which, for purposes of illustration, the uniform sub-blocks (i.e., those that are all fully present, all fully absent, or all partially present) have been 'cleared'.

FIG. 6d illustrates the presence indication block of the first object shown in FIG. 6c, wherein some of the sub-blocks exhibiting symmetry are highlighted. Specifically, there are two sub-blocks in the upper left quadrant of the block that have been circled with solid lines, and each of these sub-blocks has three sub-primitive presence indications indicating fully absent sub-primitives and one sub-primitive presence indication indicating a partially present sub-primitive, the partially present sub-primitive being located at the lower right of the 2×2 sub-block. Similarly, there are two sub-blocks in the upper right quadrant of the block that have been circled with solid lines, and each has three sub-primitive presence indications indicating fully absent sub-primitives and one sub-primitive presence indication indicating a partially present sub-primitive, the partially present sub-primitive being located at the lower left of the 2 x 2 sub-block. In the lower left quadrant of the block there is one sub-block that has been circled with a solid line, and this sub-block has three sub-primitive presence indications indicating fully absent sub-primitives and one sub-primitive presence indication indicating a partially present sub-primitive, the partially present sub-primitive being located at the upper right of the 2 x 2 sub-block. In the lower right quadrant of the block there is one sub-block that has been circled with a solid line, and this sub-block has three sub-primitive presence indications indicating fully absent sub-primitives and one sub-primitive presence indication indicating a partially present sub-primitive, the partially present sub-primitive being located at the upper left of the 2 x 2 sub-block. It can be seen that although the sub-blocks are not identical, there is some symmetry between them. 
Specifically, relative to the sub-blocks indicated by the solid circles in the upper left quadrant: (i) the sub-blocks indicated by the solid circles in the upper right quadrant are rotated 90 degrees clockwise, (ii) the sub-block indicated by the solid circle in the lower left quadrant is rotated 90 degrees counterclockwise, and (iii) the sub-block indicated by the solid circle in the lower right quadrant is rotated 180 degrees.

Similarly, the sub-blocks indicated with dashed circles in fig. 6d also exhibit symmetry, depending on the quadrant in which each sub-block is located. Each of the sub-blocks indicated with dashed circles has three sub-primitive presence indications indicating fully present sub-primitives and one sub-primitive presence indication indicating a partially present sub-primitive. In the upper left quadrant of the block there is one sub-block that has been circled with a dashed line, and in this sub-block the partially present sub-primitive is located at the upper left of the 2 x 2 sub-block. There are two sub-blocks in the upper right quadrant of the block, one sub-block in the lower left quadrant of the block, and one sub-block in the lower right quadrant of the block that have been circled with dashed lines. Relative to the sub-block indicated with the dashed circle in the upper left quadrant: (i) the sub-blocks indicated by the dashed circles in the upper right quadrant are rotated 90 degrees clockwise, (ii) the sub-block indicated by the dashed circle in the lower left quadrant is rotated 90 degrees counterclockwise, and (iii) the sub-block indicated by the dashed circle in the lower right quadrant is rotated 180 degrees.

Sub-blocks within a block typically exhibit symmetry that depends on the quadrant in which the sub-block is located, because partially present textures and primitives will typically represent a shape with some symmetrical structure, such as a leaf or a fence, and the structure will tend to be located approximately in the center of the texture/primitive. This symmetry can be exploited to make better use of the global candidates. For example, a first global candidate may be stored and used (during decompression) for all of the sub-blocks indicated by the solid circles in fig. 6d, with a quadrant-dependent transformation (e.g., rotation and/or reflection) applied to the first global candidate such that it matches each of the sub-blocks indicated by the solid circles. Similarly, a second global candidate may be stored and used (during decompression) for all of the sub-blocks indicated by the dashed circles in fig. 6d, with a quadrant-dependent transformation (e.g., rotation and/or reflection) applied to the second global candidate such that it matches each of the sub-blocks indicated by the dashed circles.

In step S410, the compression unit 316 stores, in the compressed data block 500, for each of one or more of the regions, an indication of a transform to be applied to the candidate data representing candidates for sub-blocks within that region. The transform may include one or both of a rotation and a reflection. If there are R regions in a block, an indication of a transform may be stored for (R-1) of the regions. In the main example described herein, the regions are quadrants of the block (such that R=4), but in other examples, the regions may be something other than quadrants. The indications of the transforms are denoted 506 in fig. 5. For example, each indication of a transform may have 2 or 3 bits.
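Putting the pieces together, the size of a compressed data block can be estimated as below. This is a rough sketch that assumes the block contains only the three fields shown in fig. 5 (indices 502, candidate data 504 and transform indications 506), that the uniform candidates are implicit, and that each stored candidate takes 5 bits; the function and parameter names are illustrative.

```python
def compressed_block_bits(num_subblocks=64, index_bits=3, num_global=3,
                          num_regions=4, region_specific_per_region=2,
                          bits_per_candidate=5, transform_bits=2):
    """Estimated size in bits of indices + candidate data + transform indications."""
    index_total = num_subblocks * index_bits
    stored = num_global + num_regions * region_specific_per_region
    candidate_total = stored * bits_per_candidate
    transform_total = (num_regions - 1) * transform_bits  # none for the reference region
    return index_total + candidate_total + transform_total

size_2bit_transforms = compressed_block_bits(transform_bits=2)
size_3bit_transforms = compressed_block_bits(transform_bits=3)
```

Under these assumptions the 512-bit uncompressed block is reduced to about half its size, with 3-bit transform indications giving exactly 256 bits.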

As an example, referring to the examples shown in fig. 6a to 6d, no transform indication is stored for the upper left quadrant, a transform indication indicating a clockwise rotation of 90 degrees is stored for the upper right quadrant, a transform indication indicating a counterclockwise rotation of 90 degrees is stored for the lower left quadrant, and a transform indication indicating a rotation of 180 degrees is stored for the lower right quadrant.
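The quadrant-dependent rotations in this example can be sketched for a 2 x 2 candidate stored as (upper left, upper right, lower left, lower right). That component ordering, the quadrant names and the function names are assumptions made for the illustration.

```python
def rot90_cw(v):
    """Rotate a 2 x 2 candidate 90 degrees clockwise."""
    ul, ur, ll, lr = v
    return (ll, ul, lr, ur)

def rot90_ccw(v):
    """Rotate a 2 x 2 candidate 90 degrees counterclockwise."""
    ul, ur, ll, lr = v
    return (ur, lr, ul, ll)

def rot180(v):
    """Rotate a 2 x 2 candidate 180 degrees."""
    ul, ur, ll, lr = v
    return (lr, ll, ur, ul)

def identity(v):
    return tuple(v)

# Transform per quadrant for the first object; the upper left quadrant is the reference.
QUADRANT_TRANSFORM = {"upper_left": identity, "upper_right": rot90_cw,
                      "lower_left": rot90_ccw, "lower_right": rot180}

# Global candidate matching the solid-circled sub-blocks in the upper left quadrant:
# three fully absent indications and one partially present at the lower right.
base = ("FA", "FA", "FA", "PP")
per_quadrant = {q: t(base) for q, t in QUADRANT_TRANSFORM.items()}
```

The partially present indication moves to the lower left, upper right and upper left for the upper right, lower left and lower right quadrants respectively, so a single stored candidate serves all four quadrants.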

In step S412, the compression unit 316 outputs the compressed data block 500 for storage. The compressed data block 500 may be stored with the primitive data for the primitives, for example, in the geometry data source 303. The compressed data block may be passed to ray tracing unit 302 along with the primitive data for the primitives and may be stored in memory 304 and/or in memory within intersection test module 308 for use by primitive intersection test unit 314 as part of performing intersection tests of rays against the primitives.

As another example, fig. 7a shows a second object that is partially present. In this example, the object is a plant and is represented by a pair of triangle primitives that form a quadrilateral because they share an edge. Note that the pair of triangles actually used is smaller than the rectangle of the image shown, so that the pair of triangles bounds the visible portion of the plant more tightly. Specifically, the right hand side has been moved to the left. Fig. 7b shows the presence indication block of the second object received at the compression unit 316. The examples of presence indication blocks described herein (e.g., shown in fig. 6b, 7b, and 11) are trimmed from the top, left, right, and bottom to find the boundary of the opaque or partially transparent content. As in the first example described above, each of the (uncompressed) sub-primitive presence indications is represented with 2 bits to indicate one of three presence states: (i) fully present, (ii) fully absent, or (iii) partially present. The second object is divided into 256 sub-primitives arranged in a 16 x 16 square. The presence indication block comprises 64 2 x 2 sub-blocks of presence indications for the respective sub-primitives. In fig. 7b, there are no fully present sub-primitives, so each presence indication is drawn with one of two hatchings to represent one of the two remaining presence states. In particular, presence indications indicating that the respective sub-primitive is fully absent are shown with light hatching, and presence indications indicating that the respective sub-primitive is partially present are shown with medium hatching.

In fig. 7b, of the 64 sub-blocks, zero sub-blocks are entirely fully present, 25 sub-blocks are entirely fully absent, 22 sub-blocks are entirely partially present, and 17 sub-blocks have a mix of presence states. Fig. 7c shows the presence indication block of the second object after the 47 sub-blocks with a uniform presence state have been cleared, leaving the remaining 17 sub-blocks. These remaining sub-blocks may be analyzed to identify suitable candidates for the sub-blocks of presence indications in the block.

FIG. 7d illustrates the presence indication block of the second object shown in FIG. 7c, wherein some of the sub-blocks exhibiting symmetry are highlighted. Specifically, there are two sub-blocks in the upper left quadrant of the block that have been circled with solid lines, and each of these two sub-blocks has two sub-primitive presence indications indicating absent sub-primitives and two sub-primitive presence indications indicating partially present sub-primitives, the partially present sub-primitives being located at the top left and top right of the 2×2 sub-block. Similarly, there are three sub-blocks in the upper right quadrant of the block that have been circled with solid lines, and each of these three sub-blocks has two sub-primitive presence indications indicating absent sub-primitives and two sub-primitive presence indications indicating partially present sub-primitives, the partially present sub-primitives being located at the top left and bottom left of the 2×2 sub-block. In the lower left quadrant of the block there is one sub-block that has been circled with a solid line, and this sub-block has two sub-primitive presence indications indicating absent sub-primitives and two sub-primitive presence indications indicating partially present sub-primitives, the partially present sub-primitives being located at the top left and top right of the 2×2 sub-block. In the lower right quadrant of the block there is one sub-block that has been circled with a solid line, and this sub-block likewise has two sub-primitive presence indications indicating absent sub-primitives and two sub-primitive presence indications indicating partially present sub-primitives, the partially present sub-primitives being located at the top left and top right of the 2×2 sub-block.
It can be seen that, relative to the sub-blocks indicated with solid circles in the upper left quadrant: (i) the sub-blocks indicated with solid circles in the upper right quadrant are rotated 90 degrees counter-clockwise, (ii) the sub-block indicated with a solid circle in the lower left quadrant is not transformed, and (iii) the sub-block indicated with a solid circle in the lower right quadrant is not transformed. Note that for 4 quadrants only 3 transform indications need to be stored. In this example the upper left quadrant is the reference quadrant, i.e. no transform indication is stored for this quadrant; a transform indication indicating a 90 degree counter-clockwise rotation may be stored for the upper right quadrant, a transform indication indicating an identity transform may be stored for the lower left quadrant, and a transform indication indicating an identity transform may be stored for the lower right quadrant.

Note that there is another sub-block in the lower right quadrant (two sub-blocks above the circled one) that has two sub-primitive presence indications indicating partially present sub-primitives and two indicating absent sub-primitives, but the arrangement of the indications in this (un-circled) sub-block differs from the arrangement in the circled sub-block in the lower right quadrant. In the example detailed herein, a single transformation indication is stored for each quadrant, so in this example a single candidate cannot be used to exactly match both sub-blocks. The choice of candidates and transforms for the quadrants is a global optimization problem that ideally finds the best fit over all sub-blocks, where the best fit is defined as requiring as few forced compatible substitutions as possible. In this example, the un-circled sub-block may be represented using a different candidate (e.g., a compatible candidate, such as the fully partially present candidate). A success metric for the global optimization problem may be used to determine whether to accept a selection of candidates and transforms, where the success metric may be based on the proportion of presence indications that are represented exactly and/or on the final size of the compressed data block.

For example, each transform indication may be represented by 2 bits to indicate one transform of a set of four transforms. For example, the four transforms may include: (i) identity, i.e. no transformation, (ii) rotation by 90 degrees clockwise, (iii) rotation by 90 degrees counter-clockwise, and (iv) rotation by 180 degrees. In other examples, the transforms may include a reflection, such as a reflection about a vertical, horizontal, or diagonal axis. The set of four selectable transforms may be the same for each of the three quadrants (upper right, lower left, and lower right), or may be different for each of the three quadrants.

In other examples, each transformation indication may be represented with fewer or more than 2 bits. For example, each transformation indication may be represented with 3 bits, such that 8 options are available for each transformation indication. For example, the eight transforms may include: (i) identity transformation, (ii) rotation by 90 degrees clockwise, (iii) rotation by 90 degrees counter-clockwise, (iv) rotation by 180 degrees, (v) reflection about a vertical axis, (vi) reflection about a horizontal axis, (vii) reflection about the diagonal axis x = y, or (viii) reflection about the diagonal axis x = -y. A transformation is equivalent to a permutation of the presence indications within the sub-block, and may be implemented in the transformation unit with a matrix or simply with a chain of multiplexer units. Multiplexer units are easy to implement in hardware and introduce little latency into the processing.

Fig. 16 shows an example of such a transformation unit for applying a transformation to a sub-primitive presence indication sub-block. This unit takes as input an encoding of the transform comprising three 1-bit flags: "make 90° counter-clockwise rotation" 1602 (which may be referred to as a 90 degree rotation flag), "make up/down reflection" 1604 (which may be referred to as a vertical reflection flag), and "make left/right reflection" 1606 (which may be referred to as a horizontal reflection flag). It also takes as input four of the five bits encoding the sub-block, corresponding to the four indications, namely an upper left bit 1612, an upper right bit 1614, a lower left bit 1616, and a lower right bit 1618. The remaining, fifth bit of the 5-bit encoding (i.e., the palette bit that selects whether the sub-block is (a) fully and/or partially present or (b) fully absent and/or partially present) is not affected by this transform process and is not shown in fig. 16. Note that while the rotation is 90 degrees counter-clockwise in this example, in other examples the rotation may be 90 degrees clockwise.

The transform unit includes 3 layers, each layer including four 2-input multiplexer ("MUX") units. Each of these MUX units has a select input S and two data inputs A and B. A MUX unit outputs the value provided at input A if its S bit is zero, and otherwise (if S is one) outputs the value provided at input B. In the first layer, the select bit S of each of the four MUX units 1622, 1624, 1626, and 1628 is connected to the "make 90° counter-clockwise rotation" flag 1602. The indication bits 1612, 1614, 1616 and 1618 are routed to these MUX units (1622, 1624, 1626 and 1628) such that, if the "make 90° counter-clockwise rotation" flag is zero, the output of this layer (i.e., {Temp_1, Temp_2, Temp_3, Temp_4}) will effectively be the values {TL, TR, BL, BR}, i.e., an identity transform. However, if the "make 90° counter-clockwise rotation" flag is one, the output will effectively be {TR, BR, TL, BL}, which corresponds to a 90 degree rotation in the counter-clockwise direction (the upper right indication moves to the upper left, the upper left indication moves to the lower left, and so on).

The select inputs of the second layer of MUX units (1632, 1634, 1636, and 1638) are connected to the "make up/down reflection" flag 1604. These MUX units are connected to the respective outputs of layer 1, i.e., {Temp_1, Temp_2, Temp_3, Temp_4}, so that if the "make up/down reflection" flag is zero, an identity transformation is performed, i.e., the outputs of layer 2 {Temp_5, Temp_6, Temp_7, Temp_8} will have the values {Temp_1, Temp_2, Temp_3, Temp_4}. If the "make up/down reflection" flag is one, the outputs {Temp_5, Temp_6, Temp_7, Temp_8} will represent a reflection of the input about the horizontal axis, i.e., the output values {Temp_5, Temp_6, Temp_7, Temp_8} of layer 2 will effectively be the values {Temp_3, Temp_4, Temp_1, Temp_2}.

Similarly, the select inputs of the third layer of MUX units (1642, 1644, 1646, and 1648) are connected to the "make left/right reflection" flag 1606. These MUX units are connected to the respective outputs of layer 2, i.e., {Temp_5, Temp_6, Temp_7, Temp_8}, so that if the "make left/right reflection" flag is zero, an identity transformation is performed, i.e., the outputs of layer 3 {TL_out, TR_out, BL_out, BR_out} will have the values {Temp_5, Temp_6, Temp_7, Temp_8}. If the "make left/right reflection" flag is one, the outputs {TL_out, TR_out, BL_out, BR_out} will represent a reflection of the input about the vertical axis, i.e., the output values {TL_out, TR_out, BL_out, BR_out} of layer 3 will effectively be the values {Temp_6, Temp_5, Temp_8, Temp_7}. The outputs TL_out, TR_out, BL_out, BR_out from layer 3 are output from the transform unit.

The combination of these three layers allows 8 different transforms to be encoded, as summarized below:
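As a hedged illustration, the three multiplexer layers can be modelled in software and all eight flag combinations enumerated. The function names `mux` and `transform_unit` are hypothetical, and the wiring of the B inputs is inferred from the layer outputs stated above ({TR, BR, TL, BL}, {Temp_3, Temp_4, Temp_1, Temp_2} and {Temp_6, Temp_5, Temp_8, Temp_7}) rather than read from fig. 16.

```python
from itertools import product

def mux(s, a, b):
    """2-input multiplexer: outputs a when s is 0, otherwise b."""
    return b if s else a

def transform_unit(rot90, flip_ud, flip_lr, tl, tr, bl, br):
    """Software model of the three MUX layers; returns (TL_out, TR_out, BL_out, BR_out)."""
    # Layer 1: when the rotation flag is set the outputs are {TR, BR, TL, BL}.
    t1, t2, t3, t4 = (mux(rot90, tl, tr), mux(rot90, tr, br),
                      mux(rot90, bl, tl), mux(rot90, br, bl))
    # Layer 2: when the up/down flag is set the outputs are {Temp_3, Temp_4, Temp_1, Temp_2}.
    t5, t6, t7, t8 = (mux(flip_ud, t1, t3), mux(flip_ud, t2, t4),
                      mux(flip_ud, t3, t1), mux(flip_ud, t4, t2))
    # Layer 3: when the left/right flag is set the outputs are {Temp_6, Temp_5, Temp_8, Temp_7}.
    return (mux(flip_lr, t5, t6), mux(flip_lr, t6, t5),
            mux(flip_lr, t7, t8), mux(flip_lr, t8, t7))

# Enumerate all combinations of (rotation, up/down, left/right) flags on labelled inputs.
table = {flags: transform_unit(*flags, "TL", "TR", "BL", "BR")
         for flags in product((0, 1), repeat=3)}
for flags, outputs in table.items():
    print(flags, outputs)
```

In this model the eight flag combinations yield eight distinct permutations of the four indications: with no rotation, the two reflection flags give the identity, the two axis reflections and (when both are set) a 180 degree rotation; with the rotation flag set, they give the two 90 degree rotations and the two diagonal reflections.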

In examples with triangular sub-blocks, the same transform processing unit may still be employed. The individual transformations may not be 'geometrically' correct, but the process of shuffling the indications still provides a way to create alternative combinations of vectors.

As an alternative to the "transformations" described above, the values of the vectors themselves may be modified. For example, a flag may indicate that the palette setting is "inverted", i.e., such that, for a given region, a stored candidate of fully present and/or partially present indications is interpreted as a sub-block of fully absent and/or partially present indications. In another example, a 'transform' flag may indicate that the indication bits themselves should be inverted. These "inversion" and "transform" flags may be additional flags or may be stored in place of one or more of the reflection and rotation flags 1602, 1604, 1606.

As an example in which a pair of triangles can be represented with 64 bytes, there may be an additional budget of 32 bytes (256 bits) for a 16 x 16 presence indication block (i.e., 1 bit on average per presence indication, compared with 2 bits for each uncompressed presence indication). To meet this budget, the presence indications need to be compressed to at most 50% of the size of the original uncompressed data. In a 16×16 block there are 64 2×2 sub-blocks, each of which is encoded with a T-bit index, so in the case of T=3 the indices 502 use 192 bits (64×3=192) in the compressed data block 500. In this example there are three transform indications 506 (one for each of the upper right, lower left, and lower right quadrants of the presence indication block). Each of the transform indications may be represented with 3 bits, such that 9 bits are used for the transform indications. Thus, if a budget of 256 bits is to be met, 55 bits remain for the candidate data 504 in the compressed data block 500. As described above, each entry in the codebook uses 5 bits, so there is room for 11 entries in the codebook, i.e., candidate data for 11 candidates in the compressed data block in addition to the three candidates (fully present, fully absent, and fully partially present) that are so common that their candidate data need not be explicitly stored in the compressed data block 500. In this example, candidate data 508 is stored in the compressed data block for three global candidates, and, for each of the four quadrants of the presence indication block, candidate data (510₁, 510₂, 510₃ and 510₄) is stored in the compressed data block for two region-specific candidates.
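The bit budget of this example can be checked with a few lines of arithmetic. The variable names below are purely illustrative; the numbers are those stated in the paragraph above.

```python
# Bit budget for the 16 x 16 example with 2 x 2 sub-blocks.
BUDGET_BITS = 32 * 8                      # 32 bytes
UNCOMPRESSED_BITS = 16 * 16 * 2           # 2 bits per presence indication
NUM_SUB_BLOCKS = (16 // 2) * (16 // 2)    # 64 sub-blocks of 2 x 2 indications
INDEX_BITS = 3                            # T = 3
NUM_TRANSFORM_INDICATIONS = 3             # upper right, lower left, lower right quadrants
TRANSFORM_BITS = 3
ENTRY_BITS = 5                            # bits per codebook entry

index_bits = NUM_SUB_BLOCKS * INDEX_BITS
transform_bits = NUM_TRANSFORM_INDICATIONS * TRANSFORM_BITS
candidate_bits = BUDGET_BITS - index_bits - transform_bits
stored_entries = candidate_bits // ENTRY_BITS

print(index_bits, transform_bits, candidate_bits, stored_entries)
print("compressed / uncompressed =", BUDGET_BITS / UNCOMPRESSED_BITS)
```

The 11 stored entries correspond to three global candidates plus two region-specific candidates for each of the four quadrants.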

In this example, there are 3 bits per index, so an index may indicate one candidate in a set of eight candidates. The quadrant-specific candidates that an index may indicate depend on which of the quadrants contains the presence indication sub-block associated with the index. Table 1 shows an example of what the different indices can represent. This is just one example, and in different examples the indices may be interpreted differently.

Table 1: exemplary index

In some examples, if only 2 bits are used for each of the transform indications, then there may be 3 bits available in the compressed data block 500 such that the compressed data block meets a budget of 256 bits. These three bits may be used to shift the boundaries between different quadrants so that these regions do not have to be of equal size or shape.

An advantage of this compression scheme is that it is a fixed length coding scheme, i.e. the compressed data block 500 has a fixed length and the data fields within the block are located at predetermined positions. This is in contrast to other compression schemes, which may be variable length coding schemes, in which the compressed data has a variable length and thus the data field position itself depends on the encoded data. The use of a fixed length coding scheme means that random access to the data within the compressed data block is possible (i.e. it is possible to decompress some of the data within the compressed data block without having to decompress the data entirely), and that the implementation of decompression unit 318 is simplified because the boundaries between the different fields within the compressed data block are known, i.e. the boundaries do not depend on the data stored in those fields.

Random access is particularly useful when an intersection has been found between a ray and a primitive and the presence of the primitive at the intersection point is to be determined, because it allows the presence at that point to be determined from the compressed data block without having to decompress the presence indications of all of the other sub-primitives (which are not needed).

With reference to the flowchart in FIG. 8, a method performed by decompression unit 318 to decompress compressed data to determine one or more sub-primitive presence indications for use in intersection testing in a ray tracing system is described.

Fig. 9 shows some of the logic in the decompression unit 318. Specifically, decompression unit 318 includes index lookup logic 904, candidate data logic 906, transformation logic 908, and decoding logic 910.

In step S802, the decompression unit receives a compressed data block for the sub-primitive presence indication block. Fig. 9 shows a compressed data block 500 that has been created as described in the example given above. As described above, in this example, for each of the sub-blocks in the sub-primitive presence indication block, the compressed data block 500 includes an index indicating one of the plurality of candidates for the combination of presence indications. In the example described above, there are 64 2 x 2 sub-blocks within the presence indication block, and the indices 502 each have 3 bits. In this example, the compressed data block 500 includes candidate data for three global candidates 508 and candidate data (510₁, 510₂, 510₃ and 510₄) for two quadrant-specific candidates for each of the quadrants of the presence indication block. The candidate data for each candidate has 5 bits. There are three transform indications 506 in the compressed data block, where in this example each of the transform indications has 3 bits.

In step S804, the decompression unit 318 receives an indication of one of the sub-blocks in the sub-primitive presence indication block. Specifically, decompression unit 318 receives sub-block indication 902 at index lookup logic 904. In an example in which the presence indication block has 64 sub-blocks arranged in an 8 x 8 arrangement, the sub-block indication may have 6 bits: 3 bits indicating the x-coordinate and 3 bits indicating the y-coordinate of the sub-block within the presence indication block.

In step S806, the decompression unit 318 reads an index for one of the sub-blocks in the sub-primitive presence indication block from the compressed data block. Specifically, index lookup logic 904 reads, from the indices 502, the index for the sub-block indicated by sub-block indication 902.
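Because the indices have a fixed width and sit at predetermined positions, the index for a given sub-block can be read directly. The sketch below assumes, purely for illustration, that the compressed data block is held as a Python integer, that the 64 indices occupy the lowest 192 bits, and that they are packed in raster order (least significant first); the actual field layout is not specified in the text. The helper names are hypothetical.

```python
INDEX_BITS = 3       # T = 3 in this example
BLOCKS_PER_ROW = 8   # 8 x 8 arrangement of 2 x 2 sub-blocks

def read_bits(block: int, offset: int, width: int) -> int:
    """Extract `width` bits starting `offset` bits from the least significant end."""
    return (block >> offset) & ((1 << width) - 1)

def read_index(block: int, sub_x: int, sub_y: int) -> int:
    """Read the index for the sub-block at (sub_x, sub_y) without touching other fields."""
    sub_block_number = sub_y * BLOCKS_PER_ROW + sub_x   # assumed raster order
    return read_bits(block, sub_block_number * INDEX_BITS, INDEX_BITS)

def pack_indices(indices) -> int:
    """Build a toy block holding only the index field (for demonstration)."""
    block = 0
    for n, idx in enumerate(indices):
        block |= (idx & 0b111) << (n * INDEX_BITS)
    return block

indices = [(n * 5) % 8 for n in range(64)]   # arbitrary toy indices
block = pack_indices(indices)
print(read_index(block, 3, 2))               # index of sub-block x=3, y=2
```

The read touches only three bits of the block, which is the random-access property that a fixed length coding scheme provides.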

In step S808, the decompression unit 318 obtains candidate data representing at least a part of the candidates indicated by the index read in step S806. For example, in step S808, the decompression unit 318 may obtain candidate data representing candidates indicated by the index read in step S806. Candidate data logic 906 is configured to receive an index from index lookup logic 904 and to receive an indication of a region (e.g., quadrant) of a sub-block. The quadrant of the sub-block may be represented by two bits of sub-block indication 902: the Most Significant Bit (MSB) of the x-coordinate of the sub-block and the MSB of the y-coordinate of the sub-block, e.g., such that (0, 0) will represent the upper left quadrant. Candidate data logic 906 may receive the complete sub-block indication 902 or only the MSBs of the x and y coordinates of the sub-block from sub-block indication 902. In other examples, quadrants of a sub-block may be represented with different bits than the examples given above.

As described above, some of the candidates (e.g., candidates representing sub-blocks whose sub-element presence indications are all the same) are so common that it is not necessary to explicitly store candidate data in compressed data block 500 for those candidates. For these candidates, step S808 may include obtaining predetermined candidate data indicated by the read index. For example, the decompression unit 318 may be configured (e.g., in fixed function circuitry) to interpret some of the indices as indicating a particular candidate without reading candidate data from the compressed data block 500. For example, the index may be interpreted according to table 1 shown above, in which case candidate data logic 906 would interpret index 0 as representing a fully-present candidate, index 1 as representing a fully-partially-present candidate, and index 2 as representing a completely-absent candidate. As described above, candidate data representing candidates may have 5 bits.

If the index indicates a candidate for which the candidate data is stored in the compressed data block 500, step S808 includes reading the candidate data from the compressed data block. As described above, the candidate data may be for global candidates (candidate data 508) or for region-specific candidates (candidate data 510₁, 510₂, 510₃ or 510₄). If the index indicates a region-specific candidate, candidate data logic 906 uses the indication of the region from sub-block indication 902 to determine for which of the region-specific candidates the candidate data is to be read from the compressed data block.
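A minimal sketch of how the candidate data might be selected is given below. It rests on several assumptions that are not stated in the text: indices 0 to 2 are taken to be the predetermined candidates, indices 3 to 5 the global candidates, and indices 6 and 7 the two quadrant-specific candidates; the 5-bit encodings (palette bit in bit 4, then one bit per indication with 1 meaning partially present) and the quadrant numbering are likewise hypothetical.

```python
def quadrant_of(sub_x: int, sub_y: int, coord_bits: int = 3) -> int:
    """Quadrant from the MSBs of the coordinates; (0, 0) is the upper left quadrant."""
    msb_x = (sub_x >> (coord_bits - 1)) & 1
    msb_y = (sub_y >> (coord_bits - 1)) & 1
    # Assumed numbering: 0 upper left, 1 upper right, 2 lower left, 3 lower right.
    return msb_y * 2 + msb_x

# Assumed encodings of the predetermined candidates (never read from the block).
PREDETERMINED = {
    0: 0b00000,  # fully present: present/partial palette, no partial bits set
    1: 0b01111,  # fully partially present: all four partial bits set
    2: 0b10000,  # fully absent: absent/partial palette, no partial bits set
}

def get_candidate_data(index, sub_x, sub_y, global_candidates, quadrant_candidates):
    """Return 5 bits of candidate data for the given index (assumed index mapping)."""
    if index in PREDETERMINED:
        return PREDETERMINED[index]
    if index <= 5:
        return global_candidates[index - 3]
    return quadrant_candidates[quadrant_of(sub_x, sub_y)][index - 6]

# Toy candidate data: three global entries and two entries per quadrant.
global_candidates = [0b01010, 0b10101, 0b11100]
quadrant_candidates = [[0b00001, 0b00010], [0b00100, 0b01000],
                       [0b10001, 0b10010], [0b10100, 0b11000]]
print(bin(get_candidate_data(7, 6, 1, global_candidates, quadrant_candidates)))
```

Only the region-specific branch needs the quadrant, which is why the candidate data logic can work from just the MSBs of the x and y coordinates.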

In step S810, the decompression unit 318 uses the obtained candidate data to determine one or more of the presence indications in the sub-block. For example, the transformation logic 908 and decoding logic 910 may be used to determine an indication of presence in a sub-block, i.e., to determine a candidate 912 indicated by the obtained candidate data.

For example, fig. 10 shows some steps that may be performed in step S810 in order to determine the presence indications in a sub-block. The transformation logic 908 is configured to receive the candidate data obtained by the candidate data logic 906 in step S808. The transformation logic 908 is also configured to receive an indication of the region (e.g., quadrant) of the sub-block. As described above with respect to candidate data logic 906, the transformation logic 908 may receive the complete sub-block indication 902 or only the MSBs of the x and y coordinates of the sub-block from sub-block indication 902. In step S1002, the transformation logic 908 reads, from the compressed data block, one of the indications 506 of a transformation to be applied to the obtained candidate data. Specifically, the transformation logic 908 reads the transform indication for the region in which the sub-block is located.

In step S1004, the transformation logic 908 applies a transformation to the obtained candidate data to determine transformed candidate data. As described above, the transformation may include one or both of rotation and reflection. Note that for at least one of the regions (e.g., a reference quadrant), there may be no transform indication in the compressed data block. Further, for a region that does have a transform indication stored in the compressed data block, the transform indication may indicate that an identity transform is to be applied to the candidate data. In these cases, the transformation logic 908 does not alter the candidate data it receives from the candidate data logic 906. The transformed candidate data is passed from transformation logic 908 to decoding logic 910. Note that the "transformed candidate data" output from transformation logic 908 may therefore not actually be the result of any change to the candidate data (i.e., it may be equivalent to applying an identity transformation to the candidate data). The "transformed candidate data" may be referred to as "output candidate data" from transformation logic 908.

In step S1006, the decoding logic 910 uses the transformed candidate data to determine one or more of the presence indications in the sub-block. In the example described above, the candidate data has 5 bits for one candidate, and the decode logic 910 determines 8 bits that explicitly represent each of the individual presence indications in the sub-block (i.e., 2 bits for each of the presence indications in the sub-block). Specifically, as described in the example above, the candidate data has five bits:

-a first bit indicating one of: (i) a first palette of fully present and partially present, and (ii) a second palette of fully absent and partially present; and

-four further bits, including one bit for each of the four presence indications of the candidate to indicate a partial presence or another state of the indicated palette.

The decoding logic 910 uses the first bit to determine whether the first palette or the second palette is being used. Then, for each of the four presence indications of candidate 912, decoding logic 910 uses a respective one of the further bits to determine whether the presence indication is a partial presence or another state of the indicated palette.
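The decoding step can be sketched as follows. The bit positions (palette bit in bit 4, then one bit each for the top left, top right, bottom left and bottom right indications) and the convention that palette value 1 selects the absent/partial palette are assumptions made for this illustration; the text only specifies that there is one palette bit and four further bits.

```python
# Hypothetical labels for the three presence states.
PRESENT, ABSENT, PARTIAL = "present", "absent", "partial"

def decode_candidate(data: int):
    """Expand 5 bits of candidate data into four presence states (TL, TR, BL, BR)."""
    palette = (data >> 4) & 1
    # The palette bit chooses what a cleared indication bit means.
    other_state = ABSENT if palette else PRESENT
    return tuple(PARTIAL if (data >> shift) & 1 else other_state
                 for shift in (3, 2, 1, 0))

print(decode_candidate(0b11010))
```

Each of the four decoded states would then be written out as a 2-bit presence indication, giving the 8 bits mentioned above.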

In step S812, the decompression unit 318 outputs the determined presence indication. The determined sub-primitive presence indication may be used to determine the presence of a primitive at an intersection with a ray as part of performing an intersection test on the ray in a ray tracing system.

As mentioned above, the use of the fixed length compression scheme described herein means that the decompression process is easy to implement and that some presence indications can be decompressed from the compressed data block without decompressing the entire block (i.e. random access is possible from the compressed data block). The compression unit 316 and decompression unit 318 may be implemented in software or hardware or a combination thereof, and when the compression and/or decompression units are implemented in hardware (e.g., in fixed function circuitry), the simplicity of the compression and/or decompression process means that the physical size of the hardware for these units is small. Furthermore, the simplicity of the compression and/or decompression process means that the latency and power consumption of these units are low, whether they are implemented in hardware or software.

In the example described above, the 16×16 presence indication block is compressed using 64 2×2 sub-blocks, and a 50% compression ratio of the presence indications is achieved such that the compressed data block has 256 bits. One way to achieve higher compression (i.e. a lower compression ratio) is to use larger sub-blocks, at the risk of increasing the number of sub-primitives that are sub-optimally encoded as partially present sub-primitives. One example, with square vectors and a square map, would be to represent a 3N x 3N (e.g., 48 x 48) presence indication block as N² 3 x 3 sub-blocks, or to represent a 4N x 4N (or even a non-square 4N x 2M) presence indication block using non-square rectangular sub-blocks (e.g., 4 x 2 sub-blocks). It will be appreciated that the use of higher resolution sub-primitives will generally result in a lower proportion of the more expensive partially present sub-primitives.

However, we now continue to describe a more 'aggressive' method (i.e., a method that achieves greater compression), using 4 x 4 sub-blocks to compress higher resolution presence indication blocks, with the goal of achieving an average of 0.5 bits of compressed data (which corresponds to a compression ratio of 25%) per presence indication. In this example, an uncompressed presence indication block is shown in fig. 11, and is a 64×64 presence indication block divided into 256 4×4 presence indication sub-blocks (i.e., 4096 presence indications in the block). In another example not shown in the figure, the uncompressed presence indication block is a 96×96 presence indication block (i.e., 9216 presence indications in the block) divided into 576 4×4 presence indication sub-blocks. In these examples, each sub-block has sixteen presence indications. In these examples, the indices stored in the compressed data blocks may each have five bits, i.e., t=5.

Fig. 12 shows an example of a compressed data block 1200 for the example of compressing the 4096 presence indications shown in fig. 11. The compressed data block includes indices 1202, candidate data 1204, and transform indications 1206. In the case of T=5, an index may indicate one candidate out of 32 possible candidates for a sub-block. In this example, there are 256 sub-blocks, and a 5-bit index is stored in compressed data block 1200 for each of the sub-blocks, so the indices 1202 occupy 1280 bits. As in the examples described above, three candidates (i.e., fully present, fully absent, and fully partially present) are so common that candidate data for these candidates need not be explicitly stored in the compressed data block. In this example, there are only global candidates, i.e. no region-specific candidates. The candidate data of 29 global candidates is thus stored in the candidate data 1204 in the compressed data block 1200.

In this example, each of the sub-blocks has sixteen presence indications arranged in a 4×4 arrangement, and the candidate data in the compressed data block for a candidate has, for each of four 2×2 presence indication groups, five bits of candidate data including:

-one bit indicating one of: (i) a first palette of fully present and partially present, and (ii) a second palette of fully absent and partially present; and

-one bit for each of the four presence indications of the 2 x 2 group to indicate partial presence or the other state of the indicated palette.

In this way, the 4×4 sub-blocks are encoded into four 2×2 groups, each of which is encoded in the same manner as the 2×2 sub-blocks in the example described above with reference to fig. 9.

The compression of a 64 x 64 presence indication block as shown in fig. 11 may be given a compression budget of 0.5 bits per presence indication, i.e. 2048 bits for compressed data block 1200. As described above, 256 indexes use 1280 bits. The candidate data for 29 global candidates uses 580 bits (i.e., 29×20=580). To meet the budget, the transform indication 1206 may use up to 188 bits. An enhanced quadrant transformation scheme can be used. Instead of each quadrant having a single transformation, a set of transformations may be stored and indicated by the transformation indications for three of the quadrants (e.g., each of the upper right, lower left, and lower right quadrants). For example, for each of the three quadrants, a unique 3-bit transform indication may be stored for each of the first 19 of the 29 entries in the codebook, while the remaining 10 entries may share one transform indication. Thus, 60 bits are used for the transform indication for each of the three quadrants, and thus 180 bits in total are used for the transform indication, as shown in FIG. 12. Alternatively, the transforms may be assigned to pairs of adjacent entries in the codebook.
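The budget for the 64 x 64 example can be checked in the same way as for the 16 x 16 example. The variable names are illustrative; the figures are those given in the paragraph above.

```python
# Bit budget for the 64 x 64 example with 4 x 4 sub-blocks.
NUM_INDICATIONS = 64 * 64
budget_bits = NUM_INDICATIONS // 2                # 0.5 bits per presence indication
num_sub_blocks = (64 // 4) * (64 // 4)            # 256 sub-blocks
index_bits = num_sub_blocks * 5                   # T = 5
candidate_bits = 29 * 4 * 5                       # 29 entries, four 2 x 2 groups of 5 bits
available_for_transforms = budget_bits - index_bits - candidate_bits

# Per quadrant: 19 individual 3-bit indications plus one shared by the last 10 entries.
per_quadrant_bits = 19 * 3 + 1 * 3
transform_bits = 3 * per_quadrant_bits            # three non-reference quadrants
total_bits = index_bits + candidate_bits + transform_bits

print(available_for_transforms, transform_bits, total_bits)
```

With these numbers the compressed data block uses 2040 of the 2048 available bits.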

In this example, decompression unit 318 will operate in a similar manner as described above with reference to fig. 9. Specifically, the decompression unit includes index lookup logic 1210, candidate data logic 1212, transform logic 1214, and decode logic 1216. The decompression unit receives the compressed data block 1200 and the sub-block indication 1208. The index lookup logic 1210 uses the sub-block indication 1208 to read an index from the index 1202 for the sub-block indicated by the sub-block indication 1208. Using the index, candidate data is obtained by candidate data logic 1212 from candidate data 1204 in compressed data block 1200 or by simply obtaining candidate data for one of the common candidates that is not explicitly stored in compressed data block 1200. In this example, candidate data logic 1212 need not receive an indication of the region in which the sub-block is located, since there are no region-specific candidates.

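As mentioned above, each of the sub-blocks in this example has sixteen presence indications including four 2×2 presence indication groups, and the obtained candidate data represents a candidate with, for each of the four 2×2 presence indication groups of the sub-block, five bits of candidate data including:

-one bit indicating one of: (i) a first palette of fully present and partially present, and (ii) a second palette of fully absent and partially present; and

-one bit for each of the four presence indications of the 2 x 2 group to indicate partial presence or the other state of the indicated palette.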

The transformation logic 1214 receives the obtained candidate data from the candidate data logic 1212, the index from the index lookup logic 1210, and the indication of the quadrant in which the sub-block is located from the sub-block indication. The transform logic 1214 reads the appropriate transform indication from the compressed data block using the index and the indication of the quadrant in which the sub-block is located. As described above, the transformation may be, for example, rotation and/or reflection. The transform logic 1214 then applies the transform to the candidate data and outputs the transformed candidate data to the decode logic 1216.

The decoding logic 1216 uses the transformed candidate data to determine an indication of presence in the sub-block. In this example, the candidate data has 20 bits for one candidate, and the decode logic 1216 determines 32 bits that explicitly represent each of the individual presence indications in the sub-block (i.e., 2 bits for each of sixteen presence indications in the sub-block). These presence indications are output from the decode logic 1216 as candidates 1218.
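The expansion of 20 bits of candidate data into a 4 x 4 arrangement of presence indications might look like the sketch below. The ordering of the four 5-bit groups within the 20 bits (top left, top right, bottom left, bottom right, most significant first) and the bit positions within each group are assumptions for illustration, and the transformation step is omitted.

```python
# Hypothetical labels for the three presence states.
PRESENT, ABSENT, PARTIAL = "present", "absent", "partial"

def decode_group(data5: int):
    """Decode one 2 x 2 group: palette bit in bit 4, then TL, TR, BL, BR bits."""
    other_state = ABSENT if (data5 >> 4) & 1 else PRESENT
    return [PARTIAL if (data5 >> shift) & 1 else other_state
            for shift in (3, 2, 1, 0)]

def decode_4x4(data20: int):
    """Decode 20 bits of candidate data into a 4 x 4 grid of presence states."""
    grid = [[None] * 4 for _ in range(4)]
    # (row, column) of the top-left indication of each 2 x 2 group, in assumed order.
    origins = [(0, 0), (0, 2), (2, 0), (2, 2)]
    for g, (r0, c0) in enumerate(origins):
        data5 = (data20 >> (5 * (3 - g))) & 0b11111
        tl, tr, bl, br = decode_group(data5)
        grid[r0][c0], grid[r0][c0 + 1] = tl, tr
        grid[r0 + 1][c0], grid[r0 + 1][c0 + 1] = bl, br
    return grid

example = (0b00000 << 15) | (0b01111 << 10) | (0b10000 << 5) | 0b11001
grid = decode_4x4(example)
for row in grid:
    print(row)
```

Writing each of the sixteen states as a 2-bit presence indication gives the 32 bits output as candidate 1218.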

Fig. 13 illustrates another scheme in which a two-level coding technique is used for the candidate data of candidates for 4×4 presence indication sub-blocks. In this example, a 5-bit index 1302 is used to indicate an entry in a first ("4 x 4") codebook 1304. Each entry in the first codebook 1304 includes four references to a second ("2 x 2") codebook 1306. The second codebook 1306 may be referred to as a "sub-codebook". For example, index 1302 indicates an entry 1308 in first codebook 1304 that includes four references (or "sub-indices"): (i) reference 1310_BR, which indicates a first entry 1312₁ in the second codebook 1306 for representing the lower right 2 x 2 presence indication group in the 4 x 4 sub-block, (ii) reference 1310_BL, which indicates a second entry 1312₂ in the second codebook 1306 for representing the lower left 2 x 2 presence indication group in the 4 x 4 sub-block, (iii) reference 1310_TR, which indicates a third entry 1312₃ in the second codebook 1306 for representing the upper right 2 x 2 presence indication group in the 4 x 4 sub-block, and (iv) reference 1310_TL, which also indicates the third entry 1312₃ in the second codebook 1306, for representing the upper left 2 x 2 presence indication group in the 4 x 4 sub-block.

To reiterate, in the example shown in Fig. 13, each sub-block has sixteen presence indications comprising four 2×2 presence indication groups, and each of the candidates represented by candidate data in the compressed data block is represented by:

- for each of the four 2×2 presence indication groups of the sub-block, a sub-index indicating an entry in a sub-codebook for that 2×2 presence indication group; and

- the sub-codebook, which comprises a plurality of entries for 2×2 presence indication groups, wherein each entry in the sub-codebook comprises five bits of candidate data comprising:

o one bit to indicate one of: (i) a first palette of fully present and partially present, and (ii) a second palette of fully absent and partially present; and

o one bit for each of the four presence indications in the 2×2 group, to indicate either partially present or the other state of the indicated palette.

In the decompression unit, once the candidate data for a candidate has been obtained from the first codebook 1304 and the second codebook 1306, the candidate data may be processed as described above with reference to Fig. 12 in order to determine the 4×4 presence indications of the candidate.
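The two-level lookup of Fig. 13 can be sketched as below. This is a minimal Python sketch, assuming that each first-codebook entry holds four 4-bit sub-indices in the order (BR, BL, TR, TL) used in the text and that each second-codebook entry holds five bits. The function name lookup_candidate and the packing order of the four 5-bit groups into a 20-bit value are hypothetical.

```python
from typing import Sequence, Tuple


def lookup_candidate(
    index5: int,
    first_codebook: Sequence[Tuple[int, int, int, int]],
    second_codebook: Sequence[int],
) -> int:
    """Return 20 bits of candidate data for a 5-bit index via two codebook levels."""
    sub_indices = first_codebook[index5]  # four references into the sub-codebook
    bits20 = 0
    for g, sub in enumerate(sub_indices):
        # Each reference selects one 5-bit entry describing a 2x2 group.
        bits20 |= (second_codebook[sub] & 0x1F) << (5 * g)
    return bits20
```

Because two references in the same entry may point at the same sub-codebook entry (as the upper right and upper left references do in the Fig. 13 example), repeated 2×2 patterns are stored only once.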

Where, in some examples, only one region of values is needed, the second codebook 1306 need not be multi-ported. The quadrant transform may be partially applied to the output of the first codebook 1304 in order to select just one of the four 2×2 indices, and that 2×2 index is then used to access the second codebook 1306. The output of the second codebook 1306 is in turn transformed/selected as described in the examples above.

There are 32 entries in the first codebook 1304, and each entry has four references of four bits each, so the first codebook 1304 has 512 bits. There are 16 entries in the second codebook 1306, each of which has five bits, so the second codebook 1306 has 80 bits. The total number of bits used to represent the first and second codebooks is therefore 592 bits. Note that this is very similar to the number of bits (580) of candidate data 1204 used in the example shown in Fig. 12, but the two-level scheme shown in Fig. 13 provides more flexibility in terms of the number of different candidates that can be represented (at the cost of increased complexity of compression and decompression).
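The codebook sizes quoted for the Fig. 13 example follow from the entry counts and entry widths. A minimal Python sketch of that arithmetic is given below; the constant names are chosen purely for illustration.

```python
FIRST_ENTRIES = 32         # entries in the first ("4x4") codebook
REFS_PER_ENTRY = 4         # one reference per 2x2 group
BITS_PER_REF = 4           # width of each reference (sub-index)
SECOND_ENTRIES = 16        # entries in the second ("2x2") codebook
BITS_PER_SUB_ENTRY = 5     # one palette bit plus four per-indication bits
FIG12_CANDIDATE_BITS = 580 # size quoted for the single-level scheme of Fig. 12

# A 4-bit reference can address exactly the 16 sub-codebook entries.
assert 2 ** BITS_PER_REF == SECOND_ENTRIES

first_bits = FIRST_ENTRIES * REFS_PER_ENTRY * BITS_PER_REF  # 32 * 4 * 4
second_bits = SECOND_ENTRIES * BITS_PER_SUB_ENTRY           # 16 * 5
total_bits = first_bits + second_bits
extra_bits = total_bits - FIG12_CANDIDATE_BITS               # overhead versus Fig. 12
```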

In the examples described herein, the sub-primitives are squares, but in other examples they may be other shapes, such as triangles.

It should be understood that the specific numbers in the examples described herein (e.g., the number of presence indications in the block and in the sub-block, and the number of bits used to represent the index, candidate data, and transform indications) are given by way of example, and that in other embodiments these numbers may be different.

Furthermore, the examples provided herein use triangles and barycentric coordinates, but the solutions presented herein are also applicable to surfaces that can be parametrically represented, for example tensor product patches such as bicubic patches, spheres, or (part of) a surface that is rotated or extruded. These parameters may be used to index into the presence indication.

The main examples described herein use presence indications to accelerate ray tracing, but the same methods are also applicable to other rendering techniques, such as rasterization. As described in the background section above, GB patents 2538856 and 2522868 describe the use of opacity state maps in rasterization systems to accelerate the processing of punch-through primitives. In particular, an opacity state map is used to indicate whether a block of texels of a texture is fully opaque, fully transparent, partially transparent, or a mixture of these states. The indications in the opacity state map may be used to accelerate the processing of punch-through polygons in a rasterization system. Similar to the presence indications described above with reference to the ray tracing system, each opacity state in the rasterization systems of GB2538856B and GB2522868B is represented by two bits. The methods of compressing/decompressing presence indications described herein may also be applied to compress/decompress indications of opacity states in rasterization systems such as those described in GB2538856B and GB2522868B. The 'partially transparent' state and the 'mixed' state may be combined into a single state such that there are only three states, which may then be compressed/decompressed in the same manner as the fully present, partially present, and fully absent states in the ray tracing system described above.
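The folding of four opacity states into three states can be sketched as below. This is a minimal Python sketch; the enum names and values are hypothetical, and the correspondence of fully opaque to fully present and of fully transparent to fully absent is an assumption made for the example rather than something stated in the description.

```python
from enum import Enum


class Opacity(Enum):
    FULLY_OPAQUE = 0
    FULLY_TRANSPARENT = 1
    PARTIALLY_TRANSPARENT = 2
    MIXED = 3


class Presence(Enum):
    FULLY_PRESENT = 0
    FULLY_ABSENT = 1
    PARTIALLY_PRESENT = 2


def to_presence(state: Opacity) -> Presence:
    """Fold the four opacity states into three presence-like states."""
    if state is Opacity.FULLY_OPAQUE:
        return Presence.FULLY_PRESENT   # assumed correspondence
    if state is Opacity.FULLY_TRANSPARENT:
        return Presence.FULLY_ABSENT    # assumed correspondence
    # 'Partially transparent' and 'mixed' are combined into a single state.
    return Presence.PARTIALLY_PRESENT
```

Once the opacity states are expressed in this three-state form, the same palette-based candidate data could in principle be used for them.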

FIG. 14 illustrates a computer system in which the compression and decompression units described herein may be implemented. The computer system includes a CPU 1402, a GPU 1404, a memory 1406, and other devices 1414, such as a display 1416, speakers 1418, and a camera 1422. Processing block 1410 (corresponding to ray tracing unit 302) is implemented on GPU 1404 and Neural Network Accelerator (NNA) 1411. In other examples, processing block 1410 may be implemented on CPU 1402 or within NNA 1411. The components of the computer system may communicate with each other via a communication bus 1420. Storage 1412 (corresponding to memory 304) is implemented as part of memory 1406.

While Fig. 14 illustrates one embodiment of a graphics processing system, it should be appreciated that a similar block diagram may be drawn for an artificial intelligence accelerator system, for example, by replacing the CPU 1402 or the GPU 1404 with a Neural Network Accelerator (NNA) 1411, or by adding the NNA as a separate unit. In such cases, the processing block 1410 may also be implemented in the NNA.

The ray tracing unit of Fig. 3a is shown as comprising a number of functional blocks. This is schematic only and is not intended to define a strict division between different logic elements of such entities. Each functional block may be provided in any suitable manner. It should be understood that intermediate values described herein as being formed by the compression and/or decompression unit need not be physically generated by the compression and/or decompression unit at any point, and may merely represent logical values that conveniently describe the processing performed by the compression and/or decompression unit between its input and output.

The compression and/or decompression units described herein may be embodied in hardware on an integrated circuit. The compression and/or decompression unit described herein may be configured to perform any of the methods described herein. Generally, any of the functions, methods, techniques, or components described above may be implemented in software, firmware, hardware (e.g., fixed logic circuitry) or any combination thereof. The terms "module," "functionality," "component," "element," "unit," "block," and "logic" may be used herein to generally represent software, firmware, hardware, or any combination thereof. In the case of a software implementation, the module, functionality, component, element, unit, block or logic represents program code that performs specified tasks when executed on a processor. The algorithms and methods described herein may be executed by one or more processors executing code that causes the processors to perform the algorithms/methods. Examples of a computer-readable storage medium include Random Access Memory (RAM), read-only memory (ROM), optical disks, flash memory, hard disk memory, and other memory devices that can store instructions or other data using magnetic, optical, and other techniques and that can be accessed by a machine.

The terms computer program code and computer readable instructions as used herein refer to any kind of executable code for a processor, including code expressed in a machine language, an interpreted language, or a scripting language. Executable code includes binary code, machine code, bytecode, code defining an integrated circuit (such as a hardware description language or netlist), and code expressed in a programming language such as C, Java or OpenCL. The executable code may be, for example, any kind of software, firmware, script, module, or library that, when suitably executed, processed, interpreted, compiled, or run in a virtual machine or other software environment, causes a processor of the computer system at which the executable code is supported to perform the tasks specified by the code.

A processor, computer, or computer system may be any kind of device, machine, or dedicated circuit, or a collection or portion thereof, that has processing capability such that it can execute instructions. The processor may be or include any kind of general purpose or special purpose processor, such as a CPU, a GPU, an NNA, a system on a chip, a state machine, a media processor, an Application Specific Integrated Circuit (ASIC), a programmable logic array, a Field Programmable Gate Array (FPGA), or the like. The computer or computer system may include one or more processors.

The present invention is also intended to cover software defining a configuration of hardware as described herein, such as Hardware Description Language (HDL) software, for designing integrated circuits or for configuring programmable chips to perform desired functions. That is, a computer readable storage medium may be provided having encoded thereon computer readable program code in the form of an integrated circuit definition data set that, when processed (i.e., run) in an integrated circuit manufacturing system, configures the system to manufacture a compression and/or decompression unit configured to perform any of the methods described herein, or to manufacture a compression and/or decompression unit comprising any of the apparatus described herein. The integrated circuit definition data set may be, for example, an integrated circuit description.

Accordingly, a method of manufacturing a compression and/or decompression unit as described herein in an integrated circuit manufacturing system may be provided. Furthermore, an integrated circuit definition data set may be provided that, when processed in an integrated circuit manufacturing system, causes a method of manufacturing a compression and/or decompression unit to be performed.

The integrated circuit definition data set may be in the form of computer code, for example as a netlist, as code for configuring a programmable chip, or as a hardware description language defining hardware suitable for fabrication in an integrated circuit at any level, including as Register Transfer Level (RTL) code, as a high-level circuit representation (such as Verilog or VHDL), and as a low-level circuit representation (such as OASIS (RTM) and GDSII). A higher-level representation that logically defines hardware suitable for fabrication in an integrated circuit (such as RTL) may be processed at a computer system configured to generate a manufacturing definition of an integrated circuit in the context of a software environment that includes definitions of circuit elements and rules for combining those elements in order to generate the manufacturing definition of the integrated circuit so defined by the representation. As is typically the case when software is executed at a computer system to define a machine, one or more intermediate user steps (e.g., providing commands, variables, etc.) may be required in order for a computer system configured to generate a manufacturing definition of an integrated circuit to execute code defining the integrated circuit so as to generate that manufacturing definition.

An example of processing an integrated circuit definition data set at an integrated circuit manufacturing system to configure the system to manufacture compression and/or decompression units will now be described with respect to fig. 15.

Fig. 15 illustrates an example of an Integrated Circuit (IC) fabrication system 1502 configured to fabricate a compression and/or decompression unit as described in any of the examples herein. Specifically, the IC fabrication system 1502 includes a layout processing system 1504 and an integrated circuit generation system 1506. The IC fabrication system 1502 is configured to receive an IC definition data set (e.g., defining a compression and/or decompression unit as described in any of the examples herein), process the IC definition data set, and generate an IC (e.g., embodying a compression and/or decompression unit as described in any of the examples herein) from the IC definition data set. Processing of the IC definition data set configures the IC fabrication system 1502 to fabricate an integrated circuit embodying a compression and/or decompression unit as described in any of the examples herein.

The layout processing system 1504 is configured to receive and process the IC definition data set to determine a circuit layout. Methods of determining a circuit layout from an IC definition data set are known in the art and may involve, for example, synthesizing RTL code to determine a gate-level representation of the circuit to be generated, for example in terms of logic components (e.g., NAND, NOR, AND, OR, MUX and FLIP-FLOP components). The circuit layout may be determined from the gate-level representation of the circuit by determining location information for the logic components. This may be done automatically or with user involvement in order to optimize the circuit layout. When the layout processing system 1504 has determined a circuit layout, it may output a circuit layout definition to the IC generation system 1506. The circuit layout definition may be, for example, a circuit layout description.

As is known in the art, the IC generation system 1506 generates ICs from circuit layout definitions. For example, the IC generation system 1506 may implement a semiconductor device fabrication process that generates ICs, which may involve a multi-step sequence of photolithography and chemical processing steps during which electronic circuits are built up on wafers made of semiconductor material. The circuit layout definition may be in the form of a mask that may be used in a lithographic process to generate an IC from the circuit definition. Alternatively, the circuit layout definition provided to the IC generation system 1506 may be in the form of computer readable code that the IC generation system 1506 may use to form an appropriate mask for generating the IC.

The different processes performed by the IC fabrication system 1502 may all be implemented in one location, e.g., by one party. Alternatively, the IC fabrication system 1502 may be a distributed system such that some of the processes may be performed at different locations and by different parties. For example, some of the following stages may be performed at different locations and/or by different parties: (i) synthesizing RTL code representing the IC definition data set to form a gate-level representation of the circuit to be generated; (ii) generating a circuit layout based on the gate-level representation; (iii) forming a mask according to the circuit layout; and (iv) using the mask to fabricate the integrated circuit.

In other examples, processing of the integrated circuit definition data set at the integrated circuit manufacturing system may configure the system to manufacture the compression and/or decompression unit without processing the integrated circuit definition data set to determine the circuit layout. For example, an integrated circuit definition dataset may define a configuration of a reconfigurable processor, such as an FPGA, and processing of the dataset may configure the IC manufacturing system to generate (e.g., by loading configuration data into the FPGA) the reconfigurable processor having the defined configuration.

In some embodiments, the integrated circuit manufacturing definition data set, when processed in the integrated circuit manufacturing system, may cause the integrated circuit manufacturing system to generate an apparatus as described herein. For example, by configuring an integrated circuit manufacturing system in the manner described above with reference to fig. 15 through an integrated circuit manufacturing definition dataset, an apparatus as described herein may be manufactured.

In some examples, the integrated circuit definition data set may include software that runs on hardware defined by the data set, or in combination with hardware defined by the data set. In the example shown in Fig. 15, the IC generation system may further be configured by the integrated circuit definition data set, on manufacturing the integrated circuit, to load firmware onto the integrated circuit in accordance with program code defined in the integrated circuit definition data set, or to otherwise provide the integrated circuit with program code for use with the integrated circuit.

Embodiments of the concepts set forth in the present application in devices, apparatuses, modules, and/or systems (and in methods implemented herein) may provide improved performance over known embodiments. Performance improvements may include one or more of increased computational performance, reduced latency, increased throughput, and/or reduced power consumption. During the manufacture of such devices, apparatuses, modules and systems (e.g., in integrated circuits), a tradeoff may be made between performance improvements and physical implementation, thereby improving the manufacturing method. For example, a tradeoff may be made between performance improvement and layout area, matching the performance of known implementations but using less silicon. This may be accomplished, for example, by reusing functional blocks in a serial fashion or by sharing functional blocks among elements of a device, apparatus, module, and/or system. Conversely, concepts described herein that lead to improvements in the physical implementation of devices, apparatuses, modules, and systems (e.g., reduced silicon area) may be traded for improved performance. This may be accomplished, for example, by fabricating multiple instances of a module within a predefined area budget.

The applicant hereby discloses in isolation each individual feature described herein and any combination of two or more such features, to the extent that such features or combinations are capable of being carried out based on the present specification as a whole in light of the common general knowledge of a person skilled in the art, irrespective of whether such features or combinations of features solve any problems disclosed herein. In view of the foregoing description it will be evident to a person skilled in the art that various modifications may be made within the scope of the application.

Claims

1. A method of compressing a sub-primitive presence indication block for use in a rendering system into a compressed data block, wherein the sub-primitive presence indication block comprises a plurality of sub-primitive presence indication sub-blocks, the method comprising:

for each of the sub-blocks in the sub-primitive presence indication block:

2. The method of claim 1, further comprising storing candidate data representing one or more of the plurality of candidates in the compressed data block.

3. The method of claim 2, wherein the candidate data representing one or more of the plurality of candidates is stored in a respective one or more entries in a codebook in the compressed data block.

4. A method as claimed in claim 2 or 3, wherein each of the sub-blocks has four presence indications arranged in a 2 x 2 arrangement, and wherein each of the one or more of the plurality of candidates represented by candidate data in the compressed data block is represented by five bits of candidate data, the five bits of candidate data comprising:

5. A method as claimed in claim 2 or 3, wherein each of the sub-blocks has sixteen presence indications including four groups of 2 x 2 presence indications, and wherein for each group of 2 x 2 presence indications of the four groups of 2 x 2 presence indications of a sub-block, each of the one or more candidates represented by candidate data in the compressed data block is represented by five bits of candidate data, the five bits of candidate data comprising:

6. A method as claimed in claim 2 or 3, wherein each of the sub-blocks has sixteen presence indications comprising four groups of 2 x 2 presence indications, and wherein each of the one or more of the plurality of candidates represented by candidate data in the compressed data block is represented by:

The subcodebook includes a plurality of entries of a 2 x 2 presence indication group, wherein each entry in the subcodebook includes five bits of candidate data including:

7. A method as claimed in any preceding claim, wherein at least one of said plurality of candidates is not represented by candidate data stored in said compressed data block.

8. The method of claim 7, wherein each of the at least one candidate of the plurality of candidates that is not represented by candidate data stored in the compressed data block represents a sub-block, the sub-primitive presence indications of the sub-blocks being all the same.

9. A method as claimed in any preceding claim, wherein the sub-primitive presence indication indicates the presence status of their respective sub-primitives as one of: (i) complete presence, (ii) complete absence, or (iii) partial presence.

10. The method of any preceding claim, wherein the index for a sub-block is a T-bit index, wherein T is less than the number of sub-primitive presence indications in one of the sub-blocks.

11. A method as claimed in any preceding claim, wherein the sub-primitive presence indication block has a plurality of regions, each of the regions comprising a plurality of the sub-blocks, wherein one or more of the identified candidates are region specific candidates such that the region specific candidates are candidates for sub-blocks within only one of the regions of the block.

12. The method of claim 11, wherein one or more of the identified candidates are global candidates such that the global candidates are candidates for sub-blocks within all of the regions of the block.

13. The method of claim 12, wherein the candidate data stored in the compressed data block has: (i) Candidate data representing three global candidates, and (ii) candidate data representing two region-specific candidates for each of the regions.

14. The method of any of claims 11 to 13, further comprising storing in the compressed data block an indication of a transform for each of one or more of the regions, the transform to be applied to candidate data representing candidates for sub-blocks within the region.

15. The method of claim 14, wherein the transforming comprises one or both of rotating and reflecting.

16. The method of claim 14 or 15, wherein each of the indications of transformation comprises three 1-bit flags: (i) a 90 degree rotation index, (ii) a vertical reflection index, and (iii) a horizontal reflection index.

17. The method of any preceding claim, wherein the selecting one of the candidates to be used for representing the sub-block comprises selecting a candidate compatible with the presence indication in the sub-block.

18. A method as claimed in any preceding claim, wherein the rendering system is a ray tracing system or a rasterisation system.

19. A compression unit configured to compress a sub-primitive presence indication block for use in a rendering system into a compressed data block, wherein the sub-primitive presence indication block comprises a plurality of sub-primitive presence indication sub-blocks, the compression unit being configured to:

for each of the sub-blocks in the sub-primitive presence indication block:

20. A computer readable storage medium having computer readable code stored thereon, the computer readable code being configured to cause the method of any of claims 1 to 18 to be performed when the code is run.

21. A computer readable storage medium having stored thereon an integrated circuit definition dataset that, when processed in an integrated circuit manufacturing system, configures the integrated circuit manufacturing system to manufacture the compression unit of claim 19.