CN111127299A

CN111127299A - Method and device for accelerating rasterization traversal and computer storage medium

Info

Publication number: CN111127299A
Application number: CN202010222529.3A
Authority: CN
Inventors: 张竞丹; 樊良辉
Original assignee: Nanjing Xintong Semiconductor Technology Co Ltd
Current assignee: Nanjing Xintong Semiconductor Technology Co Ltd
Priority date: 2020-03-26
Filing date: 2020-03-26
Publication date: 2020-05-08

Abstract

The embodiments of the invention disclose a method and a device for accelerating rasterization traversal, and a computer storage medium. The method may comprise the following steps: selecting the target tiles covered by a primitive from the candidate tiles (image blocks) covered by a primary bounding box of the primitive, wherein the primary bounding box is the minimum bounding box of the primitive; for each target tile covered by the primitive, acquiring within that tile a secondary bounding box region that can enclose the intersection of the primitive and the tile, wherein the secondary bounding box region in each target tile is the minimum rectangular region enclosing the intersection of that tile and the primitive; and transmitting the secondary bounding box regions of all the target tiles to a rasterization module to perform the rasterization operation.

Description

Method and device for accelerating rasterization traversal and computer storage medium

Technical Field

The embodiment of the invention relates to the technical field of Graphic Processing Units (GPUs), in particular to a method and a device for accelerating rasterization traversal and a computer storage medium.

Background

In a unified rendering architecture based on Tile-Based Rendering (TBR), rasterization is an essential step in the GPU: it is the process of converting basic primitives, such as points, lines, and triangles, into pixels. As an important step in the graphics rendering pipeline, the efficiency of rasterization directly affects the graphics rendering performance of the GPU.

Currently, in a TBR architecture, a conventional rasterization operation traverses the image blocks (tiles, which may also be referred to as blocks) covered by the minimum bounding box of a primitive one by one, so as to find the pixels inside the primitive in each tile. However, the tiles covered by the minimum bounding box usually include tiles that are not covered by the primitive. As shown in fig. 1, taking a triangle primitive as an example, the minimum bounding box of the primitive covers 12 tiles, numbered tile0 to tile11; of these 12 tiles, tile0 and tile3 are not covered by the primitive, while tile4 and tile7 lie on the boundary of the primitive, so that only a small portion of their pixels is covered by the primitive. In summary, if the tiles covered by the minimum bounding box of the primitive shown in fig. 1 are traversed one by one using the conventional rasterization operation, the rasterization module in the graphics rendering pipeline traverses tiles not covered by the primitive as well as uncovered pixels within covered tiles, which increases invalid traversal, reduces rasterization efficiency, and degrades the rendering performance of the GPU.
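To make the candidate set concrete, the following minimal Python sketch enumerates the tiles touched by the minimum bounding box of a primitive. The function name candidate_tiles, the tile size of 32 pixels, and the example triangle are illustrative assumptions and are not taken from the patent.

```python
import math


def candidate_tiles(vertices, tile_size):
    """Return (tx, ty) indices of every tile touched by the minimum bounding box.

    Assumption: tiles are axis-aligned squares of `tile_size` pixels whose
    grid starts at the origin. A bounding box edge that falls exactly on a
    tile boundary conservatively includes the neighbouring tile.
    """
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    tx0, tx1 = math.floor(min(xs) / tile_size), math.floor(max(xs) / tile_size)
    ty0, ty1 = math.floor(min(ys) / tile_size), math.floor(max(ys) / tile_size)
    return [(tx, ty) for ty in range(ty0, ty1 + 1) for tx in range(tx0, tx1 + 1)]


# Hypothetical triangle whose bounding box spans 4 x 3 = 12 tiles of 32 x 32 pixels.
triangle = [(10.0, 5.0), (120.0, 40.0), (30.0, 90.0)]
print(len(candidate_tiles(triangle, 32)))  # prints 12
```

A conventional rasterizer would visit every tile in this list, including those the triangle never touches.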

Disclosure of Invention

In view of the above, embodiments of the present invention are directed to a method, an apparatus, and a computer storage medium for accelerating rasterization traversal; the traversal efficiency and speed of rasterization operation can be improved, and therefore the rendering performance of the GPU is improved.

The technical scheme of the embodiment of the invention is realized as follows:

in a first aspect, an embodiment of the present invention provides a method for accelerating rasterization traversal, where the method includes:

selecting a target tile covered by a primitive from the candidate tiles (image blocks) covered by a primary bounding box of the primitive; wherein the primary bounding box is the minimum bounding box of the primitive;

for each target tile covered by the primitive, acquiring, within that target tile, a secondary bounding box region that can enclose the intersection of the primitive and the target tile, wherein the secondary bounding box region in each target tile is the minimum rectangular region enclosing the intersection of that target tile and the primitive;

and transmitting the secondary bounding box areas of all the target tiles to a rasterization module to perform rasterization operation.

In some examples, the selecting a target tile covered by a primitive from the candidate tiles covered by a primary bounding box of the primitive includes:

if all the vertexes of the candidate tile are outside at least one edge of the primitive, the candidate tile is not the target tile; otherwise, the candidate tile is a target tile covered by the primitive.
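The rejection rule above can be sketched with a signed-area (edge function) test. This is a minimal sketch under stated assumptions: the triangle vertices are ordered so that the interior lies on the side where the edge function is non-negative, and the names edge_value and is_target_tile are hypothetical.

```python
def edge_value(a, b, p):
    """Signed-area test: negative when p lies on the outer side of edge a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def is_target_tile(tri, tile_rect):
    """Apply the rule: reject a tile whose four vertices are all outside one edge.

    tri: three vertices, assumed ordered so the interior has edge_value >= 0.
    tile_rect: (x0, y0, x1, y1) of the candidate tile.
    """
    x0, y0, x1, y1 = tile_rect
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    for i in range(3):
        a, b = tri[i], tri[(i + 1) % 3]
        if all(edge_value(a, b, c) < 0 for c in corners):
            return False  # every tile vertex is outside this edge
    return True


tri = [(0, 0), (100, 0), (0, 100)]
print(is_target_tile(tri, (0, 0, 32, 32)))    # prints True
print(is_target_tile(tri, (64, 64, 96, 96)))  # prints False
```

Because only candidate tiles inside the primary bounding box are tested, tiles lying beyond a triangle vertex have already been excluded before this check runs.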

In some examples, when the number of target tiles covered by the primitive is greater than 1, the acquiring, for each target tile covered by the primitive, a secondary bounding box region that can enclose the intersection of the primitive and the target tile includes:

for each target tile, performing the following steps:

acquiring an intersection area of the target tile and the primary bounding box of the primitive;

in the intersection region, acquiring a minimum rectangular region capable of covering the intersection part of the graphic primitive and the target tile; the minimum rectangular area is a secondary bounding box of the target tile.
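The first of these steps, intersecting a target tile with the primary bounding box, is a plain rectangle intersection. The sketch below assumes rectangles are stored as (x0, y0, x1, y1) tuples; the helper names bounding_box and intersect_rect and the sample coordinates are illustrative only.

```python
def bounding_box(vertices):
    """Minimum axis-aligned bounding box (the primary bounding box)."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs), min(ys), max(xs), max(ys))


def intersect_rect(a, b):
    """Intersection of two rectangles, or None when they do not overlap."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    if x0 > x1 or y0 > y1:
        return None
    return (x0, y0, x1, y1)


tri = [(10, 5), (120, 40), (30, 90)]
primary = bounding_box(tri)
# Intersection region of the tile at the origin with the primary bounding box.
print(intersect_rect((0, 0, 32, 32), primary))  # prints (10, 5, 32, 32)
```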

In some examples, the obtaining, within the intersection region, a minimum rectangular region capable of covering an intersection portion of the primitive and the target tile includes:

aiming at each primitive edge of the primitive, acquiring a partial minimum rectangular area corresponding to each primitive edge in the intersection area;

and performing intersection operation on partial minimum rectangular areas corresponding to all the primitive edges of the primitive to obtain the minimum rectangular area.

In some examples, the obtaining, for each primitive edge of the primitive, a partial minimum rectangular region corresponding to each primitive edge within the intersection region includes:

aiming at each primitive edge of the primitive, executing the following steps:

acquiring the intersection points of the primitive edge with the intersection region, and the vertices of the intersection region located on the inner side of the primitive edge;

and determining the partial minimum rectangular region corresponding to the primitive edge within the intersection region according to the maximum and minimum values, along each coordinate axis, of the coordinates of the intersection points and the vertices.
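The per-edge procedure can be sketched as follows. For each edge, the sketch collects the corners of the intersection region on the inner side of the edge together with the points where the edge's line crosses the region boundary, takes their minimum and maximum coordinates, and then intersects the three partial rectangles. It rests on several assumptions not stated in the patent: each edge is treated as the infinite line through it, the triangle is ordered so that the interior has a non-negative edge function, and all function names are hypothetical.

```python
def edge_value(a, b, p):
    """Signed-area test: >= 0 when p lies on the inner side of edge a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def partial_rect(a, b, region):
    """Bounding box of the part of `region` on the inner side of edge a->b."""
    x0, y0, x1, y1 = region
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    points = []
    for i in range(4):
        p, q = corners[i], corners[(i + 1) % 4]
        ep, eq = edge_value(a, b, p), edge_value(a, b, q)
        if ep >= 0:
            points.append(p)  # region corner on the inner side of the edge
        if (ep < 0 < eq) or (eq < 0 < ep):
            t = ep / (ep - eq)  # where the edge's line crosses this region side
            points.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    if not points:
        return None  # the whole region is on the outer side of this edge
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def secondary_box(tri, region):
    """Intersect the partial rectangles of all three primitive edges."""
    box = region
    for i in range(3):
        part = partial_rect(tri[i], tri[(i + 1) % 3], region)
        if part is None:
            return None
        box = (max(box[0], part[0]), max(box[1], part[1]),
               min(box[2], part[2]), min(box[3], part[3]))
    if box[0] > box[2] or box[1] > box[3]:
        return None
    return box


tri = [(0, 0), (100, 0), (0, 100)]
# Only a 4 x 4 corner of this 32 x 32 region is touched by the triangle.
print(secondary_box(tri, (64, 32, 96, 64)))
```

The result always encloses the intersection of the primitive and the region; whether it is the tightest possible rectangle in every configuration is not verified by this sketch.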

In some examples, if the number of target tiles covered by the primitive is 1, the secondary bounding box region is equal to the primary bounding box of the primitive.

In a second aspect, an embodiment of the present invention provides an apparatus for accelerating rasterization traversal, where the apparatus is applied in a GPU, and the apparatus includes: a selection section, an acquisition section, and a transmission section, wherein,

the selecting part is configured to select a target tile covered by a primitive from the candidate tiles (image blocks) covered by a primary bounding box of the primitive; wherein the primary bounding box is the minimum bounding box of the primitive;

the acquiring part is configured to acquire, for each target tile covered by the primitive and within that target tile, a secondary bounding box region that can enclose the intersection of the primitive and the target tile, wherein the secondary bounding box region in each target tile is the minimum rectangular region enclosing the intersection of that target tile and the primitive;

the transmission part is configured to transmit the secondary bounding box regions of all the target tiles to a rasterization module to perform the rasterization operation.

In some examples, when the number of target tiles covered by the primitive is greater than 1, the acquiring part is configured to:

for each target tile, the following operations are performed:

In some examples, the acquisition portion is configured to:

and aiming at each primitive edge of the primitive, executing the following operations:

In a third aspect, an embodiment of the present invention provides a computer storage medium, where the computer storage medium stores a program for accelerating a rasterization traversal, and the program for accelerating the rasterization traversal is executed by at least one processor to implement the steps of the method for accelerating the rasterization traversal according to any one of the first aspect.

The embodiment of the invention provides a method and a device for accelerating rasterization traversal and a computer storage medium; before the rasterization module performs rasterization operation, the secondary bounding box area is transmitted to the rasterization module to perform rasterization operation, so that traversal of all candidate tiles covered by the primary bounding box of the primitive is not needed, the traversal range of the rasterization operation can be reduced, invalid traversal of the uncovered area of the primitive is reduced, the rasterization operation speed and efficiency are improved, and the processing performance of the GPU is improved.

Drawings

FIG. 1 is a schematic diagram of the tiles covered by a primitive according to an embodiment of the present invention;

FIG. 2 is a block diagram of a computing device according to an embodiment of the present invention;

FIG. 3 is a block diagram of a GPU according to an embodiment of the present invention;

FIG. 4 is a block diagram of a graphics processing pipeline formed based on a GPU architecture, according to an embodiment of the present invention;

FIG. 5 is a flowchart illustrating a method for accelerating rasterization traversal according to an embodiment of the present invention;

FIG. 6 is a diagram illustrating a primitive of a two-dimensional image according to an embodiment of the present invention;

FIG. 7(a) is a schematic diagram of the range of a secondary bounding box according to an embodiment of the present invention;

FIG. 7(b) is a schematic diagram of the range of another secondary bounding box according to an embodiment of the present invention;

FIG. 8 is a diagram illustrating another primitive of a two-dimensional image according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of an intersection region according to an embodiment of the present invention;

FIG. 10 is a schematic diagram of the range of yet another secondary bounding box according to an embodiment of the present invention;

FIG. 11 is a schematic composition diagram of an apparatus for accelerating rasterization traversal according to an embodiment of the present invention.

Detailed Description

The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.

Generally, when the rasterization module in a graphics rendering pipeline performs a rasterization operation, each tile covered by the bounding box of a primitive is traversed from its first pixel to its last pixel. If the size of one tile is set to 32 × 32 pixels, the rasterization module needs to traverse 1024 pixels for each tile, determining for every traversed pixel whether it is located inside the primitive and performing the primitive-to-pixel conversion. The conventional scheme therefore causes the rasterization module to perform a large number of invalid traversals. The technical solution of the embodiments of the invention aims to reduce the invalid traversal of the rasterization module during the rasterization operation, thereby improving the efficiency and speed of the rasterization operation and further improving the rendering performance of the GPU. For example, the area traversed within each tile during the rasterization operation is reduced, so that the rasterization module can complete the conversion from primitive to pixels by traversing only part of the tile rather than the whole tile.

Referring to FIG. 2, there is shown a computing device 100 to which embodiments of the invention can be applied, the computing device 100 may include, but is not limited to, the following: wireless devices, mobile or cellular telephones, including so-called smart phones, Personal Digital Assistants (PDAs), video game consoles, including video displays, mobile video gaming devices, mobile video conferencing units, laptop computers, desktop computers, television set-top boxes, tablet computing devices, electronic book readers, fixed or mobile media players, and the like. In the example of fig. 2, computing device 100 may include a Central Processing Unit (CPU) 102 and a system memory 104 that communicate via an interconnection path that may include a memory bridge 105. The memory bridge 105, which may be, for example, a north bridge chip, is connected to an I/O (input/output) bridge 107 via a bus or other communication path 106, such as a HyperTransport (HyperTransport) link. I/O bridge 107, which may be, for example, a south bridge chip, receives user input from one or more user input devices 108 (e.g., a keyboard, mouse, trackball, touch screen that can be incorporated as part of display device 110, or other type of input device) and forwards the input to CPU102 via path 106 and memory bridge 105. A Graphics Processor (GPU) 112 is coupled to the memory bridge 105 via a bus or other communication path 113 (e.g., PCI Express, accelerated graphics port, or hypertransport link); in one embodiment, GPU112 may be a graphics subsystem that delivers pixels to display device 110 (e.g., a conventional CRT or LCD based monitor). System disk 114 is also connected to I/O bridge 107. Switch 116 provides a connection between I/O bridge 107 and other components, such as network adapter 118 and various add-in cards 120 and 121. 
Other components (not explicitly shown), including USB or other port connections, CD drives, DVD drives, film recording devices, and the like, may also be connected to I/O bridge 107. Communication paths interconnecting the various components in fig. 2 may be implemented using any suitable protocols, such as PCI (peripheral component interconnect), PCI-Express, AGP (accelerated graphics port), hypertransport, or any other bus or point-to-point communication protocol, and connections between different devices may use different protocols as is known in the art.

In one embodiment, GPU112 includes circuitry optimized for graphics and video processing, including, for example, video output circuitry. In another embodiment, GPU112 includes circuitry optimized for general purpose processing while preserving the underlying computing architecture. In yet another embodiment, GPU112 may be integrated with one or more other system elements, such as memory bridge 105, CPU102, and I/O bridge 107, to form a system on a chip (SoC).

It will be appreciated that the system shown herein is exemplary and that variations and modifications are possible. The connection topology, including the number and arrangement of bridges, the number of CPUs 102, and the number of GPUs 112, may be modified as desired. For example, in some embodiments, system memory 104 is directly connected to CPU102 rather than through a bridge, and other devices communicate with system memory 104 via memory bridge 105 and CPU 102. In other alternative topologies, GPU112 is connected to I/O bridge 107 or directly to CPU102, rather than to memory bridge 105. While in other embodiments, I/O bridge 107 and memory bridge 105 may be integrated onto a single chip. Numerous embodiments may include two or more CPUs 102 and two or more GPUs 112. The particular components shown herein are optional; for example, any number of add-in cards or peripherals may be supported. In some embodiments, switch 116 is eliminated and network adapter 118 and add-in cards 120, 121 are directly connected to I/O bridge 107.

Fig. 3 is a schematic block diagram of a GPU112 capable of implementing the technical solution of the embodiment of the present invention, in which the graphics memory 204 may be a part of the GPU112. Thus, GPU112 may read data from graphics memory 204 and write data to graphics memory 204 without using a bus. In other words, GPU112 may process data locally using local storage instead of off-chip memory. Such graphics memory 204 may be referred to as on-chip memory. This allows GPU112 to operate more efficiently by eliminating the need to read and write data via a bus, which may experience heavy traffic. In some cases, however, GPU112 may not include a separate memory, but rather utilize system memory 104 via a bus. Graphics memory 204 may include one or more volatile or non-volatile memories or storage devices, such as Random Access Memory (RAM), static RAM (SRAM), dynamic RAM (DRAM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, magnetic data media, or optical storage media.

Based on this, GPU112 may be configured to perform various operations related to: generate pixel data from graphics data provided by CPU102 and/or system memory 104 via memory bridge 105 and communication path 113, interact with local graphics memory 204 (e.g., a general frame buffer) to store and update pixel data, transfer pixel data to display device 110, and so on.

In operation, CPU102 is the main processor of computing device 100, controlling and coordinating the operation of other system components. Specifically, CPU102 issues commands that control the operation of GPU 112. In some embodiments, CPU102 writes command streams for GPU112 into data structures (not explicitly shown in fig. 2 or 3) that may be located in system memory 104, graphics memory 204, or other storage locations accessible to both CPU102 and GPU 112. A pointer to each data structure is written to a pushbuffer to initiate processing of the command stream in the data structure. GPU112 reads the command stream from one or more pushbuffers and then executes the commands asynchronously with respect to the operation of CPU 102. Execution priority may be specified for each pushbuffer to control scheduling of different pushbuffers.

As shown in detail in FIG. 3, GPU112 may include an I/O (input/output) unit 205 that communicates with the rest of computing device 100 via communication path 113, which is connected to memory bridge 105 (or, in an alternative embodiment, directly to CPU102). The connection of GPU112 to the rest of computing device 100 may also vary.

In one embodiment, communication path 113 can be a PCI-EXPRESS link in which a dedicated channel is allocated to GPU112 as is known in the art. The I/O unit 205 generates data packets (or other signals) for transmission over the communication path 113 and also receives all incoming data packets (or other signals) from the communication path 113, directing the incoming data packets to the appropriate components of the GPU 112. For example, commands related to processing tasks may be directed to scheduler 207, while commands related to memory operations (e.g., reads or writes to graphics memory 204) may be directed to graphics memory 204.

In GPU112, an array 230 of rendering cores may be included, where array 230 may include C general purpose rendering cores 208, where C > 1. Based on the generic rendering cores 208 in the array 230, the GPU112 is able to concurrently perform a large number of program tasks or computational tasks. For example, each rendering core may be programmed to be able to perform processing tasks related to a wide variety of programs, including, but not limited to, linear and non-linear data transformations, video and/or audio data filtering, modeling operations (e.g., applying laws of physics to determine the position, velocity, and other attributes of objects), graphics rendering operations (e.g., tessellation shader, vertex shader, geometry shader, and/or fragment shader programs), and so forth.

Further, GPU112 may also include a fixed-function processing unit 231, which may include hardware that is hardwired to perform certain functions. Although fixed-function hardware may be configured to perform different functions via, for example, one or more control signals, it typically does not include program memory capable of receiving user-compiled programs. In some examples, fixed-function processing unit 231 may include, for example, a processing unit that performs primitive assembly, a processing unit that performs rasterization, and a processing unit that performs fragment operations. The primitive assembly unit reassembles the vertices shaded by the vertex shader unit into the mesh structure of the graphic, namely the primitives, according to their original connectivity, so that the subsequent fragment shader unit can process them; the rasterization operation includes converting the new primitives and outputting fragments to the fragment shader; the fragment operations include, for example, a depth test, a scissor test, and alpha (transparency) blending, and the pixel data output by the above operations can be displayed as graphics data by display device 110. Combining rendering core array 230 and fixed-function processing unit 231, a complete logic model of the graphics rendering pipeline can be implemented.

In addition, rendering core array 230 may receive processing tasks to be performed from scheduler 207. Scheduler 207 may independently schedule the tasks for execution by resources of GPU112, such as one or more rendering cores 208 in rendering core array 230. In one example, scheduler 207 may be a hardware processor. In the example shown in fig. 3, scheduler 207 may be included in GPU 112. In other examples, scheduler 207 may also be a separate unit from CPU102 and GPU 112. Scheduler 207 may also be configured as any processor that receives a stream of commands and/or operations.

Scheduler 207 may process one or more command streams, which contain the operations to be executed by GPU112, and schedule the operations in those command streams for execution by rendering core array 230. In operation, CPU102, through GPU driver 103 included in system memory 104 in fig. 2, may send to scheduler 207 a command stream that includes a series of operations to be performed by GPU112. Scheduler 207 may receive the command stream through I/O unit 205, process its operations sequentially in the order in which they appear in the command stream, and schedule them for execution by one or more processing units in rendering core array 230.

Based on the above descriptions of fig. 2 and fig. 3, fig. 4 shows an example of the graphics rendering pipeline 80 formed by the structure of GPU112 shown in fig. 3. It should be noted that the core of graphics rendering pipeline 80 is a logical structure formed by cascading the general-purpose rendering cores 208 included in rendering core array 230 with fixed-function processing unit 231; the scheduler 207, graphics memory 204, and I/O unit 205 included in GPU112 are peripheral circuits or devices that support the function of this logical structure. Accordingly, graphics rendering pipeline 80 usually includes programmable execution units (indicated by the rounded boxes in fig. 4) and fixed-function units (indicated by the square boxes in fig. 4). For example, the functions of the programmable execution units can be performed by the general-purpose rendering cores 208 included in rendering core array 230, and the functions of the fixed-function units can be implemented by fixed-function processing unit 231. As shown in FIG. 4, graphics rendering pipeline 80 includes the following stages:

Vertex fetch module 82, shown in the example of FIG. 4 as a fixed-function unit, is generally responsible for supplying graphics data (triangles, lines, and points) to graphics rendering pipeline 80. For example, vertex fetch module 82 may collect vertex data for high-order surfaces, primitives, and the like, and output vertex data and attributes to vertex shader 84.

Vertex shader 84 is a programmable execution unit configured to execute a vertex shader program to light and transform vertex data as specified by the vertex shader program. For example, vertex shader 84 may be programmed to transform vertex data from an object-based coordinate representation (object space) to an alternative coordinate system such as world space or Normalized Device Coordinate (NDC) space. Vertex shader 84 may read the data stored by vertex fetch module 82 for use in processing vertex data.

Primitive assembly module 86, shown in FIG. 4 as a fixed-function unit, is responsible for collecting the vertices output by vertex shader module 84 and assembling the vertices into geometric primitives. For example, primitive assembly module 86 may be configured to group every three consecutive vertices into a geometric primitive (i.e., a triangle). In some embodiments, a particular vertex may be repeated for consecutive geometric primitives (e.g., two consecutive triangles in a triangle strip may share two vertices).
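The two grouping modes mentioned above, independent triangles and a triangle strip, can be illustrated with a short sketch. The function names are hypothetical, and the alternating winding order that real triangle strips require is deliberately ignored here.

```python
def assemble_triangle_list(vertices):
    """Group every three consecutive vertices into one triangle.

    Trailing vertices that do not complete a triangle are dropped.
    """
    return [tuple(vertices[i:i + 3]) for i in range(0, len(vertices) - 2, 3)]


def assemble_triangle_strip(vertices):
    """Each new vertex forms a triangle with the previous two.

    Consecutive triangles therefore share two vertices.
    """
    return [tuple(vertices[i:i + 3]) for i in range(len(vertices) - 2)]


print(assemble_triangle_list(["v0", "v1", "v2", "v3", "v4", "v5"]))
# prints [('v0', 'v1', 'v2'), ('v3', 'v4', 'v5')]
print(assemble_triangle_strip(["v0", "v1", "v2", "v3"]))
# prints [('v0', 'v1', 'v2'), ('v1', 'v2', 'v3')]
```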

Geometry shader 88 is a programmable execution unit configured to execute a geometry shader program that transforms graphics primitives received from primitive assembly module 86 as specified by the geometry shader program. For example, geometry shader 88 may be programmed to subdivide a graphics primitive into one or more new graphics primitives and calculate parameters, such as plane equation coefficients, used to rasterize the new graphics primitives. In some examples, geometry shader 88 is not a necessary shader of graphics rendering pipeline 80, and thus, geometry shader 88 is optional. In some embodiments, geometry shader 88 may also add or delete elements in the geometry stream. Geometry shader 88 outputs parameters and vertices specifying new graphics primitives to clipping and partitioning module 90.

The clipping and partitioning module 90, shown as a fixed-function unit in fig. 4, is responsible for clipping and culling the assembled primitives, and then partitioning the primitives according to the tile size.

Rasterization module 92 is typically a fixed-function unit that is responsible for preparing primitives for fragment shader 94. For example, rasterization module 92 may generate fragments for shading by fragment shader 94. In some examples, rasterization module 92 may scan-convert new primitives and output fragments and coverage data to fragment shader 94; in addition, rasterization module 92 may be configured to implement z-culling and other z-based optimizations.

Fragment shader 94 is a programmable execution unit configured to execute a fragment shader program to transform fragments received from rasterization module 92 as specified by the fragment shader program. For example, fragment shader 94 may be programmed to implement operations such as perspective correction, texture mapping, shading, blending, and the like, to produce shaded fragments that are output to output merger module 96.

Output merger module 96, shown in fig. 4 as a fixed-function unit, is generally responsible for performing raster operations such as the stencil test, z-test, blending, etc., and outputting pixel data as processed graphics data for storage in graphics memory 204. The processed graphics data may be stored in graphics memory 204 for display on display device 110 or for further processing by CPU102 or GPU112.

For the graphics rendering pipeline 80 shown in FIG. 4, the clipping and partitioning module 90 will also typically clip vertices that are out of view, compute the bounding box of each primitive, and record the tiles covered by each bounding box. Therefore, in some examples, the technical solution of the embodiment of the present invention may be implemented by the clipping and partitioning module 90 in the graphics rendering pipeline 80. Referring to FIG. 5, a method for accelerating rasterization traversal provided by embodiments of the present invention is illustrated, which in some examples may be applied to the clipping and partitioning module 90 in graphics rendering pipeline 80, and which may include:

S501: selecting a target tile covered by a primitive from the candidate tiles (image blocks) covered by a primary bounding box of the primitive; wherein the primary bounding box is the minimum bounding box of the primitive;

S502: for each target tile covered by the primitive, acquiring, within that target tile, a secondary bounding box region that can enclose the intersection of the primitive and the target tile, wherein the secondary bounding box region in each target tile is the minimum rectangular region enclosing the intersection of that target tile and the primitive;

S503: transmitting the secondary bounding box regions of all the target tiles to a rasterization module to perform the rasterization operation.

Through the technical scheme shown in fig. 5, before the rasterization operation is performed by the rasterization module, the secondary bounding box region is transmitted to the rasterization module to perform the rasterization operation, and traversal of all candidate tiles covered by the primary bounding box of the primitive is not required, so that the traversal range of the rasterization operation can be reduced, the invalid traversal of the uncovered region of the primitive is reduced, the speed and the efficiency of the rasterization operation are improved, and the processing performance of the GPU is improved.
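As a cross-check on the overall flow of S501 to S503, the sketch below computes one enclosing rectangle per covered tile using a substitute technique, Sutherland-Hodgman polygon clipping, rather than the per-edge procedure the patent describes. A tile counts as a target when the clipped polygon has non-zero width and height, and the bounding box of the clipped polygon plays the role of the secondary bounding box. All names, the tile size, and the example triangle are assumptions made for illustration.

```python
import math


def clip_to_rect(poly, rect):
    """Sutherland-Hodgman: clip a convex polygon to an axis-aligned rectangle."""
    x0, y0, x1, y1 = rect
    # (axis index, bound, True if points with coordinate >= bound are kept)
    bounds = [(0, x0, True), (0, x1, False), (1, y0, True), (1, y1, False)]
    for axis, bound, keep_ge in bounds:
        def inside(p):
            return p[axis] >= bound if keep_ge else p[axis] <= bound
        out = []
        for i in range(len(poly)):
            p, q = poly[i], poly[(i + 1) % len(poly)]
            if inside(p):
                out.append(p)
            if inside(p) != inside(q):
                t = (bound - p[axis]) / (q[axis] - p[axis])
                hit = [p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])]
                hit[axis] = bound  # snap onto the clip boundary
                out.append(tuple(hit))
        poly = out
        if not poly:
            return []
    return poly


def secondary_boxes(tri, tile_size):
    """Map each covered tile (tx, ty) to the rectangle enclosing tri within it."""
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    result = {}
    for ty in range(math.floor(min(ys) / tile_size), math.floor(max(ys) / tile_size) + 1):
        for tx in range(math.floor(min(xs) / tile_size), math.floor(max(xs) / tile_size) + 1):
            rect = (tx * tile_size, ty * tile_size,
                    (tx + 1) * tile_size, (ty + 1) * tile_size)
            poly = clip_to_rect(list(tri), rect)
            if not poly:
                continue  # candidate tile not covered by the primitive
            px = [p[0] for p in poly]
            py = [p[1] for p in poly]
            box = (min(px), min(py), max(px), max(py))
            if box[2] > box[0] and box[3] > box[1]:
                result[(tx, ty)] = box  # would be sent on to the rasterizer
    return result


boxes = secondary_boxes([(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)], 32)
print(len(boxes))  # 10 of the 16 candidate tiles are covered
```

Clipping is exact but costs more than the corner and edge tests in the method above, so it is better suited to validating results than to a hardware fast path.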

For the solution shown in fig. 5, in some examples, if the number of target tiles covered by the primitive is 1, the secondary bounding box area is equal to the primary bounding box of the primitive.

For the technical solution shown in fig. 5, in some examples, the selecting a target tile covered by a primitive from the candidate tiles covered by a primary bounding box of the primitive includes:

It should be noted that, in the above example, whether a tile is a target tile is determined from how the tile relates to the three edges of the primitive, and the "outer side" of an edge refers to the side that is not covered by the primitive. Taking fig. 1 as an example, tile0 does not intersect any of the three edges of the primitive, and its four vertices are all on the outer side of edge a of the primitive; therefore the primitive does not cover tile0, and tile0 is not a target tile.

For the technical solution shown in fig. 5, in some examples, when the number of target tiles covered by the primitive is greater than 1, the acquiring, for each target tile covered by the primitive, a secondary bounding box region that can enclose the intersection of the primitive and the target tile includes:

for each target tile, performing the following steps:

It should be noted that the intersection region in the above example is illustrated in fig. 6, in which the primitive is a triangle, the solid-line box is a target tile covered by the primitive, the dashed-line box is the primary bounding box of the primitive, and the part where the solid-line box and the dashed-line box overlap is the intersection region. After the minimum rectangular region capable of covering the intersection of the primitive and the target tile (shown as the gray filled box in fig. 6) has been obtained within the intersection region, the rasterization module performs the rasterization operation on that minimum rectangular region, so the traversal range shrinks from the whole target tile to the minimum rectangular region. This greatly reduces the traversal range of the rasterization operation, reduces invalid traversal of regions not covered by the primitive, and improves the speed and efficiency of the rasterization operation.

For the above example, preferably, in the intersection region, acquiring a minimum rectangular region capable of covering an intersection portion of the primitive and the target tile includes:

For the above preferred example, in more detail, the obtaining, for each primitive edge of the primitive, a partial minimum rectangular region corresponding to each primitive edge in the intersection region includes:

for each primitive edge of the primitive, performing the following steps:

For the above detailed example, the primitive of the two-dimensional image shown in fig. 6 is taken as an illustration. The coordinates of an intersection point of a primitive edge with the current intersection region are denoted (Xᵢⱼ, Yᵢⱼ), where i = 0, 1, 2 indicates the i-th edge of the primitive and j = 0, 1 indicates the j-th intersection point of that edge with the current intersection region. Which vertices of the intersection region lie inside a primitive edge is determined by the direction of that edge. As shown in fig. 7, with the direction of the primitive set to counterclockwise, in fig. 7(a) the lower-right vertex of the intersection region lies inside the solid-line primitive edge; in fig. 7(b), the upper-left, upper-right, and lower-left vertices of the intersection region all lie inside the solid-line primitive edge.

It should be noted that the size of the secondary bounding box differs with the direction of the primitive edge, as shown by the dashed boxes in fig. 7(a) and fig. 7(b). In fig. 7(b), the secondary bounding box is the same size as the current intersection region; it is drawn slightly larger than the current intersection region only so that it can be seen clearly, which neither limits nor defines the size of the secondary bounding box.

Further to the above detailed example, still taking the primitive of a two-dimensional image as an example: among the intersection points of each primitive edge with the current intersection region and the vertices of the intersection region located inside that primitive edge, the X coordinate value and the Y coordinate value of each point are obtained. The four vertices of the partial minimum rectangular region corresponding to the primitive edge within the intersection region are then formed from the maximum value Xmax and the minimum value Xmin of the X coordinate values and the maximum value Ymax and the minimum value Ymin of the Y coordinate values, and are, in order, (Xmin, Ymin), (Xmax, Ymin), (Xmax, Ymax), and (Xmin, Ymax). An intersection operation is then performed on the partial minimum rectangular regions corresponding to all primitive edges within the intersection region to obtain the minimum rectangular region within the intersection region, namely the secondary bounding box.
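The per-edge computation and the final intersection operation can be sketched as below. This is a sketch under stated assumptions rather than the patented implementation: the function names are hypothetical, each primitive edge is treated as an infinite directed line (so the result is a conservative rectangle when a primitive vertex lies strictly inside the intersection region), the triangle is counterclockwise in a Y-up coordinate system, and rectangles are written as (xmin, ymin, xmax, ymax).

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def edge_value(a: Point, b: Point, p: Point) -> float:
    """> 0 when p lies to the left (assumed inner side) of the directed edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def partial_min_rect(a: Point, b: Point, region: Rect) -> Optional[Rect]:
    """Bounding box of the edge/region intersection points and the inner region vertices."""
    xmin, ymin, xmax, ymax = region
    corners = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    values = [edge_value(a, b, c) for c in corners]
    # Vertices of the intersection region on the inner side of the edge.
    points: List[Point] = [c for c, v in zip(corners, values) if v >= 0]
    # Intersection points of the edge line with the four sides of the region.
    for i in range(4):
        c0, c1 = corners[i], corners[(i + 1) % 4]
        v0, v1 = values[i], values[(i + 1) % 4]
        if (v0 < 0 < v1) or (v1 < 0 < v0):
            t = v0 / (v0 - v1)
            points.append((c0[0] + t * (c1[0] - c0[0]), c0[1] + t * (c1[1] - c0[1])))
    if not points:
        return None  # the whole region is outside this edge
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def secondary_bounding_box(triangle: Sequence[Point], region: Rect) -> Optional[Rect]:
    """Intersect the partial minimum rectangles of the three primitive edges."""
    box = region
    for i in range(3):
        part = partial_min_rect(triangle[i], triangle[(i + 1) % 3], region)
        if part is None:
            return None
        box = (max(box[0], part[0]), max(box[1], part[1]),
               min(box[2], part[2]), min(box[3], part[3]))
        if box[0] > box[2] or box[1] > box[3]:
            return None
    return box


# Hypothetical data: a right triangle and one intersection region.
box = secondary_bounding_box([(0.0, 0.0), (8.0, 0.0), (0.0, 8.0)], (4.0, 2.0, 8.0, 6.0))
```

For this hypothetical input, only the slanted edge cuts the region, at (6, 2) and (4, 4), and only the vertex (4, 2) lies on its inner side, so the resulting rectangle is (4, 2, 6, 4).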

To illustrate the implementation of the above technical solution clearly, the embodiment of the present invention takes the two-dimensional image primitive shown in fig. 8 as an example. Fig. 8 shows the primitive and the tiles covered by its primary bounding box: the primary bounding box is shown as a solid-line box, the tiles are shown as dashed-line boxes, and the primitive is a triangle. As can be seen from fig. 8, the primary bounding box of the primitive covers sixteen tiles, numbered in sequence tile0, tile1, tile2, ..., tile15, of which tile3, tile5, tile6, tile7, tile8, tile9, tile10, and tile13 are covered by the primitive; the three primitive edges of the primitive are marked in the figure.

Based on the example shown in fig. 8, in the implementation of the above technical solution, the target tiles covered by the primitive, namely tile3, tile5, tile6, tile7, tile8, tile9, tile10, and tile13, are first selected from the 16 candidate tiles tile0 to tile15 covered by the primary bounding box.

Next, for each target tile, the intersection region of the target tile and the primary bounding box of the primitive is obtained. Taking tile8 as an example, as shown in fig. 9, after tile8 has been determined through the above step to be covered by the primitive, the intersection region of tile8 and the primary bounding box of the primitive is obtained. In fig. 9, the dashed-line frame is the border of tile8 and the solid-line frame is the intersection region of tile8 and the primary bounding box of the primitive; the four vertices of the intersection region may be denoted (tile_Xmin, tile_Ymin), (tile_Xmax, tile_Ymin), (tile_Xmin, tile_Ymax), and (tile_Xmax, tile_Ymax), respectively.
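As a minimal sketch of this step, the intersection region of a tile and the primary bounding box is simply the overlap of two axis-aligned rectangles. The function name `intersection_region` and the numeric coordinates below are hypothetical and are not taken from fig. 9; rectangles are written as (xmin, ymin, xmax, ymax).

```python
from typing import Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersection_region(tile: Rect, primary_bbox: Rect) -> Optional[Rect]:
    """Overlap of a tile with the primary bounding box, or None if they do not overlap."""
    tile_xmin = max(tile[0], primary_bbox[0])
    tile_ymin = max(tile[1], primary_bbox[1])
    tile_xmax = min(tile[2], primary_bbox[2])
    tile_ymax = min(tile[3], primary_bbox[3])
    if tile_xmin > tile_xmax or tile_ymin > tile_ymax:
        return None
    return (tile_xmin, tile_ymin, tile_xmax, tile_ymax)


# Hypothetical tile on the left border of a primary bounding box.
region = intersection_region((0.0, 8.0, 4.0, 12.0), (1.0, 0.5, 15.0, 14.0))
```

Here the tile is trimmed on its left side by the primary bounding box, giving the region (1, 8, 4, 12); a tile lying fully inside the primary bounding box is returned unchanged.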

Then, within the intersection region, for each primitive edge of the primitive, the intersection points of the primitive edge with the intersection region and the vertices of the intersection region located inside the primitive edge are obtained. Referring to fig. 8 and fig. 9, of the three primitive edges of the primitive, the first primitive edge and the second primitive edge intersect the intersection region. The first primitive edge has two intersection points with the intersection region, with coordinates (X₀₀, Y₀₀) and (X₀₁, Y₀₁) respectively, and the vertices (tile_Xmax, tile_Ymax) and (tile_Xmin, tile_Ymax) of the intersection region are located inside the first primitive edge. The second primitive edge likewise has two intersection points with the intersection region, with coordinates (X₁₀, Y₁₀) and (X₁₁, Y₁₁) respectively, and the vertices (tile_Xmin, tile_Ymin) and (tile_Xmax, tile_Ymin) of the intersection region are located inside the second primitive edge. In addition, as can be seen from fig. 9, (X₀₁, Y₀₁) and (X₁₁, Y₁₁) are the same point.

Then, the partial minimum rectangular region corresponding to each primitive edge within the intersection region is determined from the maximum and minimum values, along each coordinate axis, of the intersection points and the vertices. As shown in fig. 10, coordinate comparison shows that the four vertices of the partial minimum rectangular region corresponding to the first primitive edge are (X₀₁, Y₀₀), (X₀₀, Y₀₀), (tile_Xmin, tile_Ymax), and (tile_Xmax, tile_Ymax), as indicated by the diagonal-line filled box in fig. 10; the four vertices of the partial minimum rectangular region corresponding to the second primitive edge are (tile_Xmin, tile_Ymin), (tile_Xmax, tile_Ymin), (X₁₁, Y₁₀), and (X₁₀, Y₁₀), as indicated by the cross-line filled box in fig. 10.

Subsequently, an intersection operation is performed on the region indicated by the hatched (diagonal-line) box and the region indicated by the cross-line box in fig. 10 to obtain the secondary bounding box within tile8, that is, the region in fig. 10 filled with both hatching and cross lines. The vertices of the secondary bounding box are (X₀₁, Y₀₀), (X₀₀, Y₀₀), (X₁₁, Y₁₀), and (X₁₀, Y₁₀), respectively.

And finally, sequentially acquiring the secondary bounding boxes of the target tile covered by the graphics primitive according to the steps, and transmitting the secondary bounding boxes to a rasterization module of a graphics rendering pipeline, so that the rasterization module traverses according to the secondary bounding boxes in the rasterization operation process, thereby reducing invalid traversal scanning areas and improving the speed and efficiency of rasterization operation.
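The effect on traversal can be illustrated with a small sketch that counts how many pixels are visited when the traversal is limited to a given rectangle. This is not the rasterization module of the embodiment. It assumes, purely for illustration, that the rectangle has already been snapped to integer pixel bounds treated as half-open ranges, that coverage is sampled at pixel centers, and that the triangle is counterclockwise in a Y-up coordinate system; the name `traverse_box` and all numbers are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def edge_value(a: Point, b: Point, p: Point) -> float:
    """> 0 when p lies to the left (assumed inner side) of the directed edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def traverse_box(triangle: Sequence[Point],
                 box: Tuple[int, int, int, int]) -> Tuple[int, List[Tuple[int, int]]]:
    """Visit every pixel in box and return (pixels visited, pixels covered)."""
    xmin, ymin, xmax, ymax = box
    visited = 0
    covered: List[Tuple[int, int]] = []
    for y in range(ymin, ymax):
        for x in range(xmin, xmax):
            visited += 1
            center = (x + 0.5, y + 0.5)  # assumed sampling position
            if all(edge_value(triangle[i], triangle[(i + 1) % 3], center) >= 0
                   for i in range(3)):
                covered.append((x, y))
    return visited, covered


triangle = [(0.0, 0.0), (8.0, 0.0), (0.0, 8.0)]
tile_visited, tile_covered = traverse_box(triangle, (4, 2, 8, 6))  # whole target tile
box_visited, box_covered = traverse_box(triangle, (4, 2, 6, 4))    # secondary bounding box
```

In this hypothetical case both traversals find the same three covered pixels, but the whole tile requires 16 visits while the secondary bounding box requires only 4.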

Based on the same inventive concept as the foregoing technical solution, fig. 11 shows an apparatus 11 for accelerating rasterization traversal. The apparatus 11 is applied to the GPU 112 described in the foregoing technical solution and may preferably be implemented by the clipping and dividing module 90 in the graphics rendering pipeline 80 shown in fig. 4. The apparatus 11 includes: a selection section 1101, an acquisition section 1102, and a transmission section 1103, wherein,

the selection section 1101 is configured to select a target tile covered by a primitive from the candidate tiles covered by a primary bounding box of the primitive; wherein the primary bounding box is the smallest bounding box of the primitive;

the acquisition section 1102 is configured to obtain, for each target tile covered by the primitive, a secondary bounding box region capable of bounding the intersection of the primitive and that target tile within the target tile, where the secondary bounding box region within each target tile is the minimum rectangular region in which that target tile intersects the primitive;

the transmission section 1103 is configured to transmit the secondary bounding box regions of all the target tiles to the rasterization module for the rasterization operation.

In some examples, when the number of target tiles covered by the primitive is greater than 1, the acquisition section 1102 is configured to:

for each target tile, the following operations are performed:

In some examples, the acquisition section 1102 is configured to:

In some examples, the selection section 1101 is configured to: if all the vertices of a candidate tile are outside at least one edge of the primitive, determine that the candidate tile is not a target tile; otherwise, determine that the candidate tile is a target tile covered by the primitive.

In one or more of the above examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on, or transmitted over, a computer-readable medium as one or more instructions or code. Computer-readable media may include computer data storage media or communication media including any medium that facilitates transfer of a computer program from one place to another. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described in this disclosure. By way of example, and not limitation, such computer-readable media can comprise a USB flash disk, a removable hard disk, RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

The code may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, Application Specific Integrated Circuits (ASICs), field programmable gate arrays (FPGAs), or other equivalent programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components. Accordingly, the terms "processor" and "processing unit" as used herein may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques may be fully implemented in one or more circuits or logic elements.

The techniques of embodiments of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an Integrated Circuit (IC), or a set of ICs (i.e., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Indeed, as described above, the various units may be combined in a codec hardware unit, in conjunction with suitable software and/or firmware, or provided by a collection of interoperative hardware units, including one or more processors as described above.

Various aspects of the present invention have been described. These and other embodiments are within the scope of the following claims. It should be noted that: the technical schemes described in the embodiments of the present invention can be combined arbitrarily without conflict.

The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims

1. A method for accelerating a rasterized traversal, the method comprising:

2. The method according to claim 1, wherein the selecting of the target tile covered by the primitive from the candidate tiles covered by the primary bounding box of the primitive comprises:

3. The method according to claim 1, wherein, when the number of target tiles covered by the primitive is greater than 1, the obtaining, for each target tile covered by the primitive, of a secondary bounding box region capable of bounding the intersection of the primitive and that target tile comprises:

for each target tile, performing the following steps:

4. The method according to claim 3, wherein the obtaining, within the intersection area, a minimum rectangular area capable of covering an intersection portion of the primitive and the target tile comprises:

5. The method according to claim 4, wherein the obtaining, for each primitive edge of the primitive, a corresponding partial minimum rectangular area of each primitive edge within the intersection area comprises:

for each primitive edge of the primitive, performing the following steps:

6. The method of claim 1, wherein the number of target tiles covered by the primitive is 1, and the secondary bounding box area is equal to the primary bounding box of the primitive.

7. An apparatus for accelerating rasterization traversal, the apparatus being applied in a GPU, the apparatus comprising: a selection section, an acquisition section, and a transmission section, wherein,

8. The apparatus according to claim 7, wherein, when the number of target tiles covered by the primitive is greater than 1, the acquisition section is configured to:

for each target tile, the following operations are performed:

9. The apparatus of claim 8, wherein the acquisition section is configured to:

10. The apparatus of claim 9, wherein the acquisition section is configured to:

11. A computer storage medium storing a program for accelerating a rasterization traversal, the program for accelerating a rasterization traversal when executed by at least one processor implementing the steps of the method for accelerating a rasterization traversal of any one of claims 1 to 6.