CN112233008A

CN112233008A - Device and method for realizing triangle rasterization in GPU

Info

Publication number: CN112233008A
Application number: CN202011010966.5A
Authority: CN
Inventors: 阮成肖; 李姝仪; 张航; 苑豪杰; 李二磊; 刘彤; 纪录; 张琦; 冯蕾; 李红星; 周吉
Original assignee: 716th Research Institute of CSIC
Current assignee: 716th Research Institute of CSIC
Priority date: 2020-09-23
Filing date: 2020-09-23
Publication date: 2021-01-15

Abstract

The invention discloses a device and a method for realizing triangle rasterization in a GPU (Graphics Processing Unit). The device comprises a vertex data buffer, an edge function normalization module, an endpoint generation module, a span generation module and a span buffer. The method comprises the following steps: the vertex coordinates of the triangle and the edge equation parameters processed by the vertex shader are sent to the vertex data buffer; the three edges of the triangle undergo boundary normalization; the endpoints of each edge are generated in parallel by several groups of endpoint generators; endpoints with the same Y coordinate are sent to a span generator; the spans of the triangle are obtained by classifying and comparing the endpoints; and the spans are sent to the span buffer. The method can generate the internal scan lines of the triangle in parallel, saves hardware logic resources and improves the efficiency of triangle rasterization.

Description

Device and method for realizing triangle rasterization in GPU

Technical Field

The invention relates to the field of GPU design, in particular to a device and a method for realizing triangle rasterization in a GPU.

Background

Driven by diversified application requirements, semiconductor manufacturing processes have developed rapidly, and the functions and performance of computer systems have been greatly enriched and improved. Users' pursuit of three-dimensional visual effects means that raising the processing speed of the CPU alone can no longer meet the demands of image processing, and the Graphics Processing Unit (GPU) emerged in response. As the core of a computer display system, the GPU has strong data computation capability. It implements functions such as 2D/3D graphics, image processing and display control by hardware acceleration, frees the general-purpose CPU from complex graphics algorithms and drawing, and has become a standard component of almost all types of computer systems.

Since the GPU first appeared, its hardware architecture has been reworked several times, but primitive rasterization has remained an essential stage of the graphics pipeline, and the quality and efficiency of primitive rasterization directly affect the performance of the whole GPU pipeline. Of all GPU primitives, the triangle is the most basic and most important, because it is the building block of any more complex two-dimensional or three-dimensional object. The most important indicator of triangle rasterization is efficiency, i.e. how many triangle primitives a GPU can process per unit time.

To meet increasing demands on GPU processing performance, a GPU usually integrates dozens to hundreds of parallel rasterization modules. Improving rasterization efficiency simply by adding modules, however, increases the scale and complexity of the chip and raises the design cost. It is therefore necessary to improve the efficiency of the individual rasterization module in addition to expanding the number of modules.

Disclosure of Invention

The invention aims to provide a device and a method for realizing triangle rasterization in a GPU (Graphics Processing Unit) that generate the triangle span endpoints by parallel stepping, so that rasterization efficiency is greatly improved without relying only on an increase in the number of modules, while the complexity of the hardware design is reduced.

The technical solution for realizing the purpose of the invention is as follows: an apparatus for implementing triangle rasterization in a GPU, comprising:

a vertex data buffer, which reads the vertex coordinate attributes of the triangle and the parameter information of its three edges;

an edge function normalization module, which converts the edge equations of the three edges of the triangle into edge equations of the same form through normalization and places them in the coordinate system of the triangle bounding box;

an endpoint generation module, which traverses the normalized edge equation along the Y coordinate to generate the corresponding normalized endpoint coordinate X, and restores the calculated normalized coordinate to the coordinate of the edge equation before normalization;

a span generation module, which obtains the two endpoints of the triangle at the same Y coordinate by comparing the endpoint coordinates of the left and right boundaries of the triangle;

a span buffer, which stores the endpoint coordinates of the triangle spans, packs them and sends them to the back-end block generator for processing.

A method for implementing triangle rasterization in a GPU comprises the following steps:

the vertex data buffer reads the vertex coordinate attributes of the triangle and the parameter information of its three edges;

the edge function normalization module converts the edge equations of the three edges of the triangle into edge equations of the same form through normalization and places them in the coordinate system of the triangle bounding box;

the endpoint generation module traverses the normalized edge equation along the Y coordinate to generate the corresponding normalized endpoint coordinate X, and restores the calculated normalized coordinate to the coordinate of the 6 classes of edge equations before normalization;

the span generation module obtains the two endpoints of the triangle at the same Y coordinate by comparing the endpoint coordinates of the left and right boundaries of the triangle;

the span buffer stores the endpoint coordinates of the triangle spans, packs them and sends them to the back-end block generator for processing.

Compared with the prior art, the invention has the following advantages: 1) span generation is processed in parallel, which improves the scalability of the hardware; 2) division is replaced by addition iteration, which simplifies the hardware and saves logic resources; 3) the device greatly improves triangle rasterization efficiency without increasing the number of rasterization modules.

Drawings

FIG. 1 is a structural diagram of the device for implementing triangle rasterization in a GPU according to the present invention.

FIG. 2 is a schematic diagram of the bounding box coordinate system of a triangle primitive according to the present invention.

FIG. 3 is a schematic diagram of the division of triangle boundaries and valid half-planes into types according to the present invention.

FIG. 4 is a schematic diagram of the boundary traversal that finds the first valid endpoint according to the present invention.

FIG. 5 is a schematic diagram of finding valid endpoints by step traversal according to the present invention.

FIG. 6 is a schematic diagram of the selection of the left and right span endpoints according to the present invention.

Detailed Description

As shown in fig. 1, an apparatus for implementing triangle rasterization in a GPU of the present invention is composed of the following components:

(1) Vertex data buffer: reads the vertex coordinate attributes of the triangle and the parameter information of its three edges.

(2) Edge function normalization module: converts the edge equations of the three edges of the triangle into edge equations of the same form through normalization and places them in the triangle bounding box (Bounding Box) coordinate system.

(3) Endpoint generation module: traverses the normalized edge equation along the Y coordinate to generate the corresponding normalized endpoint coordinate X, and restores the calculated normalized coordinate to the coordinate of the 6 classes of edge equations before normalization.

(4) Span generation module: obtains the two endpoints of the triangle at the same Y coordinate by comparing the endpoint coordinates of the left and right boundaries of the triangle.

(5) Span buffer: stores the endpoint coordinates of the triangle spans, packs them and sends them to the back-end block generator for processing.

The invention also discloses a method for realizing triangle rasterization in the GPU, which comprises the following steps:

(1) the vertex data cache Buffer reads the vertex coordinate attribute of the triangle and the parameter information of the three edges;

The vertex coordinate attributes of the triangle comprise its three vertex coordinates (x₁, y₁), (x₂, y₂), (x₃, y₃). The parameter information of the three edges comprises the coefficients (A₁, B₁, C₁), (A₂, B₂, C₂), (A₃, B₃, C₃) of the corresponding edge equations f(x, y) = Ax + By + C = 0.

(2) The edge function normalization module converts the edge equations of the three edges of the triangle into edge equations of the same form through normalization and places them in the triangle bounding box coordinate system:

Step 1: Determine the coordinates of the triangle bounding box, i.e. of the smallest rectangle that encloses the triangle. The upper left corner of the bounding box is (x_min, y_min) and the lower right corner is (x_max, y_max). The corresponding bounding box height H is rounded up, and the bounding box width is W = 2^m; the defining expressions for H and m are not reproduced in this text.

As shown in FIG. 2, the upper left corner of the bounding box is defined as (0, 0), so the bounding box coordinates satisfy 0 ≤ x ≤ W and 0 ≤ y ≤ H.
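A minimal Python sketch of the bounding box step. Because the expressions for H and m are missing from this text, the sketch assumes H = ceil(y_max − y_min) and takes m as the smallest integer with 2^m ≥ ceil(x_max − x_min). Both choices, and the names `BoundingBox` and `bounding_box`, are illustrative assumptions rather than the patented definitions.

```python
import math
from typing import NamedTuple


class BoundingBox(NamedTuple):
    x_min: float
    y_min: float
    H: int  # height, rounded up (assumed formula)
    m: int  # width exponent (assumed formula)
    W: int  # width, W = 2**m


def bounding_box(vertices):
    """Build a power-of-two-wide bounding box around three (x, y) vertices."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    H = math.ceil(y_max - y_min)
    extent = max(1, math.ceil(x_max - x_min))
    m = (extent - 1).bit_length()  # smallest m with 2**m >= extent
    return BoundingBox(x_min, y_min, H, m, 1 << m)
```

A power-of-two width keeps x within m bits of range, which is what the later shift-based arithmetic relies on.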

Step 2: Determine the types of the three edges. Because the interior of the triangle is the valid data, the three edges are evaluated in the clockwise direction, so that the right half-plane of each edge, i.e. the set of coordinates with f(x, y) ≥ 0, is valid. To avoid computing a boundary shared by different triangles twice in the GPU, the left boundary of a triangle is by default a solid line and the right boundary a dashed line, i.e. the relative coordinate range is 0 ≤ x < W and 0 ≤ y ≤ H. According to the edge equation f(x, y) = Ax + By + C, the half-plane that represents the valid data inside the triangle is therefore either a half-closed plane f(x, y) = Ax + By + C ≥ 0 for a left boundary or a half-open plane f(x, y) = Ax + By + C > 0 for a right boundary. Accordingly, the edges fall into the 6 classes shown in FIG. 3:

Class 1: when A < 0 and B ≥ 0, the half-open plane f(x, y) = Ax + By + C > 0;

Class 2: when A < 0 and B < 0, the half-open plane f(x, y) = Ax + By + C > 0;

Class 3: when A = 0 and B < 0, the half-open plane f(x, y) = Ax + By + C > 0;

Class 4: when A > 0 and B ≤ 0, the half-closed plane f(x, y) = Ax + By + C ≥ 0;

Class 5: when A > 0 and B > 0, the half-closed plane f(x, y) = Ax + By + C ≥ 0;

Class 6: when A = 0 and B ≥ 0, the half-closed plane f(x, y) = Ax + By + C ≥ 0.

In FIG. 3, the shaded side of each of classes 1 to 6 is the half-plane where f(x, y) ≥ 0; a dashed line denotes a half-open plane and a solid line a half-closed plane. Treating the left boundary as solid and the right boundary as dashed, i.e. 0 ≤ x < W and 0 ≤ y ≤ H, means that a boundary shared by different triangles is not computed twice, which saves hardware resources.
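The six classes can be written as a small decision function. The following Python sketch follows the sign conditions listed above; the function names `classify_edge` and `inside` are hypothetical and chosen for illustration only.

```python
def classify_edge(A, B):
    """Return (class_number, closed) for edge coefficients A and B.

    closed is True for a half-closed plane (f >= 0, solid line) and
    False for a half-open plane (f > 0, dashed line).
    """
    if A < 0:
        return (1, False) if B >= 0 else (2, False)
    if A > 0:
        return (4, True) if B <= 0 else (5, True)
    # A == 0: the edge is horizontal
    return (3, False) if B < 0 else (6, True)


def inside(A, B, C, x, y):
    """Test whether (x, y) lies in the valid half-plane of the edge."""
    _, closed = classify_edge(A, B)
    f = A * x + B * y + C
    return f >= 0 if closed else f > 0
```

A point exactly on an edge is accepted by the half-closed classes 4 to 6 and rejected by the half-open classes 1 to 3, so two triangles sharing that edge do not both claim the point.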

And step 3: according to step 2In the method, after the boundary types of the three sides of the triangle are judged, in order to simplify the boundary traversal mode and save hardware resources, the boundaries of different types are uniformly converted into f (x, y) ═ Ax + By + C ≥ 0 (A)<0, B is more than or equal to 0). Setting the original edge equation of any boundary as f (x, y) ═ Ax + By + C, and replacing x, y, C, that is, x ═ x-x_min，y＝y-y_min，C＝C+Ax_min+By_minThe edge equation f (x, y) of the bounding box coordinate system is obtained as Ax + By + C. Then converting the effective edge equation into f (x, y) ═ Ax + By + C ≧ 0 (A)<0, B is more than or equal to 0), the conversion relation is as follows:

A＝-|A|,B＝|B|,

where W is the bounding box width.
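The translation into bounding box coordinates can be checked numerically. This Python sketch covers only that translation, because the expression for C in the subsequent sign normalization is not available in this text; the function names are hypothetical.

```python
def to_bounding_box_coords(A, B, C, x_min, y_min):
    """Re-express A*x + B*y + C in coordinates relative to (x_min, y_min).

    With x = x_rel + x_min and y = y_rel + y_min, A and B are unchanged
    and the constant term absorbs the offset.
    """
    return A, B, C + A * x_min + B * y_min


def edge_value(A, B, C, x, y):
    """Evaluate the edge function f(x, y) = A*x + B*y + C."""
    return A * x + B * y + C
```

For example, with (A, B, C) = (2, −3, 5) and a bounding box origin of (4, 7), evaluating the original equation at (10, 9) and the translated equation at (6, 2) gives the same value.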

(3) The endpoint generation module traverses the normalized edge equation along the Y coordinate to generate the corresponding normalized endpoint coordinate X, and restores the calculated normalized coordinate to the coordinate of the 6 classes of edge equations before normalization:

Step 1: Boundary traversal, to find the position where the valid boundary endpoints start. Since the bounding box already confines the x coordinate of the boundary to [0, W), values of x outside this interval are meaningless. From the boundary equation f(x, y) = Ax + By + C = 0 (A < 0, B ≥ 0), solving for x gives:

x = −(By + C)/A = f(0, y)/|A|

Therefore, as shown in FIG. 4, when traversing along x ≥ 0, a valid endpoint (X₀, Y₀), where X₀ and Y₀ are both integers, can be obtained only if f(0, y) ≥ 0.
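The search for the first valid row can be sketched as a scan over y. The Python sketch below assumes the edge is already in the normalized form (A < 0, B ≥ 0), so that f(0, y) = B·y + C does not decrease with y. The linear scan and the name `first_valid_row` are illustrative assumptions, not the patented circuit.

```python
def first_valid_row(B, C, H):
    """Return the smallest integer y in [0, H] with f(0, y) = B*y + C >= 0.

    Returns None when no row of the bounding box satisfies the condition.
    Assumes B >= 0, so once the condition holds it holds for all later rows.
    """
    for y in range(H + 1):
        if B * y + C >= 0:
            return y
    return None
```

With B = 2 and C = −5, row 2 gives f(0, 2) = −1 and row 3 gives f(0, 3) = 1, so the first valid row is Y₀ = 3.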

Step 2: Calculate the stepping parameters. From the valid endpoint position found, calculate the X₀ value of the valid endpoint, the slope K of the edge, and the related parameters. To save hardware resources, the GPU uses integer arithmetic for these calculations, so all values are integers:

X₀ = ⌊f(0, Y₀)/|A|⌋, K = ⌊|B|/|A|⌋ (rounded down),

with remainder E₀ = f(0, Y₀) mod |A| for X₀ and remainder R₀ = |B| mod |A| for K. To avoid division occupying a large amount of hardware resources, and because x is limited to [0, W), the division is performed by a method based on binary addition iteration. The principle is as follows:

Let b = a·q + r, where q is the quotient, which takes a value in (0, 2^m) and can be expressed as an (m+1)-bit binary number k_m k_(m−1) … k₂ k₁ k₀,

i.e. q = k₀ + k₁·2 + k₂·2² + … + k_m·2^m = k₀ + 2·(k₁ + 2·(k₂ + 2·(… + 2·k_m)…)), so that the binary value of q and the remainder r are obtained in m + 1 iterations.
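One plausible reading of the binary addition iteration is a shift-and-add scheme that decides the quotient bits k_m down to k₀ one per iteration. The Python sketch below is written under that assumption; the name `divide_by_addition` and the exact ordering of the iterations are not taken from the source.

```python
def divide_by_addition(b, a, m):
    """Return (q, r) with b == a*q + r, using m + 1 shift/add/compare steps.

    Assumes a > 0, b >= 0 and that the true quotient fits in m + 1 bits;
    under that assumption 0 <= r < a on return.
    """
    if a <= 0 or b < 0:
        raise ValueError("expected a > 0 and b >= 0")
    q = 0
    acc = 0  # running value of a * q
    for i in range(m, -1, -1):  # one iteration per bit, k_m first
        trial = acc + (a << i)  # what a * q would be with bit i set
        if trial <= b:
            acc = trial
            q |= 1 << i
    return q, b - acc
```

Each iteration needs only a shift, an addition and a comparison, which is why a scheme of this kind is cheaper in logic than a general divider. For b = 29, a = 4 and m = 3 the loop sets bits 2, 1 and 0, giving q = 7 and r = 1.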

Step 3: Step traversal. Starting from the initial value X₀, slope K, remainder E₀ and remainder R₀ obtained in step 2, traverse along the Y axis to obtain all valid x values, as shown in FIG. 5, until y = H or x > W. The invention performs the traversal by step iteration over n = 1, 2, …, H − Y₀; the iteration formulas themselves are not reproduced in this text.

Unlike the traditional Bresenham algorithm, this iteration does not require the slope of the edge to be reduced below 1; it handles all positive slopes.
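Since the iteration formulas are not available in this text, the following Python sketch shows one way such a stepping scheme can work, assuming each row adds K to X and R₀ to the running remainder, with a carry into X whenever the remainder reaches |A|. Python's `divmod` stands in for the hardware divider of step 2, and the name `step_endpoints` is hypothetical.

```python
def step_endpoints(A, B, C, Y0, H, W):
    """Return [(y, X)] for the normalized edge A*x + B*y + C = 0, A < 0 <= B.

    X is floor(f(0, y) / |A|), updated per row with additions only.
    Assumes f(0, Y0) = B*Y0 + C >= 0.
    """
    abs_a = -A
    X, E = divmod(B * Y0 + C, abs_a)  # X0 and its remainder E0
    K, R = divmod(B, abs_a)           # integer slope K and its remainder R0
    out = []
    y = Y0
    while y <= H and X <= W:
        out.append((y, X))
        y += 1
        X += K
        E += R
        if E >= abs_a:  # accumulated remainder overflows: carry one into X
            E -= abs_a
            X += 1
    return out
```

Because both remainders are smaller than |A|, their sum is below 2·|A|, so a single conditional carry per row is enough whatever the size of the slope K.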

Step 4: Restore the normalized coordinates. The normalization is reversed to recover the coordinate value before normalization. The x value is restored as follows: x = X (class 1), x = W − 1 − X (classes 2 and 3), x = X + 1 (class 4), and x = W − X (classes 5 and 6), where X is the endpoint coordinate before restoration.
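The restoration rule is a per-class lookup. A small Python sketch, with the hypothetical function name `restore_x`:

```python
def restore_x(edge_class, X, W):
    """Map a normalized endpoint X back to bounding box x for classes 1-6."""
    if edge_class == 1:
        return X
    if edge_class in (2, 3):
        return W - 1 - X
    if edge_class == 4:
        return X + 1
    if edge_class in (5, 6):
        return W - X
    raise ValueError(f"unknown edge class: {edge_class}")
```

For W = 16 and X = 3 this gives 3, 12, 12, 4, 13 and 13 for classes 1 to 6 respectively.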

(4) The span generation module obtains the two endpoints of the triangle at the same Y coordinate by comparing the endpoint coordinates of the left and right boundaries of the triangle, as follows:

Step 1: Determine the endpoint values. The left and right boundaries are determined from the parameters of the edge equations. For each Y coordinate, the endpoint values of boundaries of the same type are compared with one another and with the bounding box coordinates: the maximum value on the left boundary is selected as the left endpoint of the span (P₂ in FIG. 6) and the minimum value on the right boundary as the right endpoint (P₃ in FIG. 6). The selected values are also constrained to [0, W): a left endpoint less than 0 is set to 0, and a right endpoint greater than W is set to W − 1.

Step 2: Generate the complete span. The coordinate values of the two endpoints selected in step 1 are restored to the original coordinates, which yields a complete span.
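The endpoint selection for one row can be sketched in Python as below. The name `make_span` and the `x_min` parameter (the bounding box origin used to return to original coordinates) are illustrative. Returning None for an empty row and clamping a right endpoint equal to W are assumptions added here.

```python
def make_span(left_xs, right_xs, W, x_min=0):
    """Select the span (left, right) for one row in original coordinates.

    left_xs / right_xs hold the endpoint x values, in bounding box
    coordinates, of the left and right boundary edges on this row.
    """
    left = max(max(left_xs), 0)         # largest left endpoint, clamped to 0
    right = min(min(right_xs), W - 1)   # smallest right endpoint, clamped to W - 1
    if left > right:
        return None                     # assumption: no covered pixel on this row
    return left + x_min, right + x_min  # restore the original x coordinates
```

Taking the maximum over left edges and the minimum over right edges intersects the valid half-planes, so the span contains only positions that are inside every edge.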

(5) The span buffer stores the endpoint coordinates of the triangle spans, packs them and sends them to the back-end block generator for processing, specifically as follows:

In each clock cycle, a group of consecutive spans is sent to the block generator for processing. When transmission of the last span is finished, the triangle end flag bit is set valid and the primitive end flag is sent to the block generator.
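The hand-off to the block generator can be modelled in software as grouping spans into per-cycle packets and flagging the last one. In this Python sketch the group size, the dictionary layout and the name `pack_spans` are hypothetical; the source only states that a group of consecutive spans is sent per clock cycle and that an end flag follows the last span.

```python
def pack_spans(spans, group_size):
    """Split spans into per-cycle packets; mark the final packet as triangle end."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    packets = []
    for start in range(0, len(spans), group_size):
        group = spans[start:start + group_size]
        is_last = start + group_size >= len(spans)
        packets.append({"spans": group, "triangle_end": is_last})
    return packets
```

With five spans and a group size of two, three packets are produced and only the third carries the end flag.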

The present invention will be described in detail with reference to examples.

Examples

As shown in fig. 1, an apparatus for implementing triangle rasterization in a GPU is composed of the following parts.

(1) Vertex data buffer: reads the vertex coordinate attributes of the triangle and the parameter information of its three edges. The vertex coordinate attributes comprise the three vertex coordinates (x₁, y₁), (x₂, y₂), (x₃, y₃); the parameter information of the three edges comprises the coefficients (A₁, B₁, C₁), (A₂, B₂, C₂), (A₃, B₃, C₃) of the corresponding edge equations f(x, y) = Ax + By + C = 0.

(2) Edge function normalization module: converts the edge equations of the three edges of the triangle into edge equations of the same form through normalization and places them in the coordinate system of the triangle bounding box.

(5) Span buffer: stores the endpoint coordinates of the triangle spans, packs them and sends them to the back-end block generator for processing. In each clock cycle, a group of consecutive spans is sent to the block generator. When transmission of the last span is finished, the triangle end flag bit is set valid and the primitive end flag is sent to the block generator.

FIG. 2 is a schematic diagram of the bounding box coordinate system of a triangle primitive. The coordinates of the triangle bounding box, i.e. of the smallest rectangle that encloses the triangle, are determined. The upper left corner of the bounding box is (x_min, y_min) and the lower right corner is (x_max, y_max). The corresponding bounding box height H is rounded up, and the bounding box width is W = 2^m; the defining expressions for H and m are not reproduced in this text.

The upper left corner of the bounding box is defined as (0, 0), so that 0 ≤ x ≤ W and 0 ≤ y ≤ H.

FIG. 3 is a schematic diagram of the division of triangle boundaries and valid half-planes into types. The types of the three edges are determined. Because the interior of the triangle is the valid data, the three edges are evaluated in the clockwise direction, so that the right half-plane of each boundary, i.e. the set of coordinates with f(x, y) ≥ 0, is valid. According to the edge equation f(x, y) = Ax + By + C, the edges fall into the 6 classes shown in the figure. The shaded side of each of classes 1 to 6 is the half-plane where f(x, y) ≥ 0; a dashed line denotes a half-open plane and a solid line a half-closed plane. To avoid computing a boundary shared by different triangles twice in the GPU, the left boundary of a triangle is by default solid and the right boundary dashed, i.e. 0 ≤ x < W and 0 ≤ y ≤ H, which saves hardware resources.

After the boundary types of the three edges have been determined, boundaries of different types are uniformly converted into the form f(x, y) = Ax + By + C ≥ 0 (A < 0, B ≥ 0) in order to simplify boundary traversal and save hardware resources. Let the original edge equation of any boundary be f(x, y) = Ax + By + C. Replacing x, y and C by x − x_min, y − y_min and C + A·x_min + B·y_min respectively gives the edge equation f(x, y) = Ax + By + C in the bounding box coordinate system. The valid edge equation is then converted into the form f(x, y) = Ax + By + C ≥ 0 (A < 0, B ≥ 0) by the following conversion relation:

A = −|A|, B = |B|,

where W is the bounding box width. The accompanying expression for C is not reproduced in this text.

FIG. 4 is a schematic diagram of the boundary traversal that finds the first valid endpoint. Since the bounding box already confines the x coordinate of the boundary to [0, W), values of x outside this interval are meaningless. From the boundary equation f(x, y) = Ax + By + C = 0 (A < 0, B ≥ 0), solving for x gives x = −(By + C)/A = f(0, y)/|A|.

When traversing along x ≥ 0, a valid endpoint (X₀, Y₀), where X₀ and Y₀ are both integers, can therefore be obtained only if f(0, y) ≥ 0.

FIG. 5 is a schematic diagram of finding valid endpoints by step traversal. From the valid endpoint position found, the X₀ value of the valid endpoint, the slope K of the edge and the related parameters are calculated. To save hardware resources, the GPU uses integer arithmetic for these calculations, so all values are integers:

X₀ = ⌊f(0, Y₀)/|A|⌋, K = ⌊|B|/|A|⌋ (rounded down).

Starting from the obtained initial value X₀, slope K, remainder E₀ and remainder R₀, the traversal proceeds along the Y axis to obtain all valid x values until y = H or x > W. The invention performs the traversal by step iteration over n = 1, 2, …, H − Y₀; the iteration formulas themselves are not reproduced in this text.

FIG. 6 is a schematic diagram of the selection of the left and right span endpoints. The left and right boundaries are determined from the parameters of the edge equations. For each Y coordinate, the endpoint values of boundaries of the same type are compared with one another and with the bounding box coordinates: the maximum value on the left boundary is selected as the left endpoint of the span (P₂ in the figure) and the minimum value on the right boundary as the right endpoint (P₃ in the figure). The selected values are also constrained to [0, W): a left endpoint less than 0 is set to 0, and a right endpoint greater than W is set to W − 1. Restoring the coordinate values of the two endpoints to the original coordinates yields a complete span.

Claims

1. An apparatus for implementing triangle rasterization in a GPU, comprising:

2. The apparatus for implementing triangle rasterization in a GPU according to claim 1, wherein the vertex coordinate attributes of the triangle comprise the three vertex coordinates (x₁, y₁), (x₂, y₂), (x₃, y₃) of the triangle, and the parameter information of the three edges comprises the coefficients (A₁, B₁, C₁), (A₂, B₂, C₂), (A₃, B₃, C₃) of the corresponding edge equations f(x, y) = Ax + By + C = 0.

3. A method for implementing triangle rasterization in a GPU is characterized by comprising the following steps:

4. The method according to claim 3, wherein the vertex coordinate attributes of the triangle comprise the three vertex coordinates (x₁, y₁), (x₂, y₂), (x₃, y₃) of the triangle, and the parameter information of the three edges comprises the coefficients (A₁, B₁, C₁), (A₂, B₂, C₂), (A₃, B₃, C₃) of the corresponding edge equations f(x, y) = Ax + By + C = 0.

5. The method according to claim 3, wherein the edge function normalization module converts the edge equations of the three edges of the triangle into edge equations of the same form through normalization and places them in the triangle bounding box coordinate system, specifically comprising the following steps:

step 1: determining the coordinates of the triangle bounding box, i.e. of the smallest rectangle that encloses the triangle; the upper left corner of the bounding box is (x_min, y_min) and the lower right corner is (x_max, y_max); the corresponding bounding box height is H and the bounding box width is W = 2^m, the defining expressions for H and m not being reproduced in this text;

the upper left corner of the bounding box is defined as (0, 0), so that 0 ≤ x ≤ W and 0 ≤ y ≤ H;

step 2: determining the types of the three edges; because the interior of the triangle is the valid data, the three edges are evaluated in the clockwise direction, so that the right half-plane of each edge, i.e. the set of coordinates with f(x, y) ≥ 0, is valid; to avoid computing a boundary shared by different triangles twice in the GPU, the left boundary of a triangle is by default a solid line and the right boundary a dashed line, i.e. the relative coordinate range is 0 ≤ x < W and 0 ≤ y ≤ H; according to the edge equation f(x, y) = Ax + By + C, the half-plane that represents the valid data inside the triangle is therefore either a half-closed plane f(x, y) = Ax + By + C ≥ 0 for a left boundary or a half-open plane f(x, y) = Ax + By + C > 0 for a right boundary; accordingly, the edges are divided into the following 6 classes:

class 2: when A < 0 and B < 0, the half-open plane f(x, y) = Ax + By + C > 0;

step 3: after the boundary types of the three edges of the triangle have been determined as in step 2, uniformly converting boundaries of different types into the form f(x, y) = Ax + By + C ≥ 0, where A < 0 and B ≥ 0; letting the original edge equation of any boundary be f(x, y) = Ax + By + C, and replacing x, y and C by x − x_min, y − y_min and C + A·x_min + B·y_min respectively, to obtain the edge equation f(x, y) = Ax + By + C in the bounding box coordinate system; then converting the valid edge equation into the form f(x, y) = Ax + By + C ≥ 0, where A < 0 and B ≥ 0, by a conversion relation that is not reproduced in this text,

where W is the bounding box width.

6. The method according to claim 3, wherein the endpoint generation module traverses the normalized edge equation along the Y coordinate to generate the corresponding normalized endpoint coordinate X, and restores the calculated normalized coordinate to the coordinate of the 6 classes of edge equations before normalization, specifically comprising the following steps:

step 1: boundary traversal, to find the position where the valid boundary endpoints start; since the bounding box already confines the x coordinate of the boundary to [0, W), values of x outside this interval are meaningless; from the boundary equation f(x, y) = Ax + By + C = 0, where A < 0 and B ≥ 0, solving for x gives:

x = −(By + C)/A = f(0, y)/|A|, A < 0; when traversing along x ≥ 0, a valid endpoint (X₀, Y₀), where X₀ and Y₀ are both integers, can be obtained only if f(0, y) ≥ 0;

step 2: calculating the stepping parameters; from the valid endpoint position found, calculating the X₀ value of the valid endpoint, the slope K of the edge and the related parameters; the GPU uses integer arithmetic for these calculations, so all values are integers: X₀ = ⌊f(0, Y₀)/|A|⌋ and K = ⌊|B|/|A|⌋,

with remainder E₀ = f(0, Y₀) mod |A| for X₀ and remainder R₀ = |B| mod |A| for K; the division is performed by a method based on binary addition iteration, the principle of which is as follows: let b = a·q + r, where q is the quotient, which takes a value in (0, 2^m) and can be expressed as an (m+1)-bit binary number k_m k_(m−1) … k₂ k₁ k₀, i.e. q = k₀ + k₁·2 + k₂·2² + … + k_m·2^m = k₀ + 2·(k₁ + 2·(k₂ + 2·(… + 2·k_m)…)), so that the binary value of q and the remainder r are obtained in m + 1 iterations;

step 3: step traversal; starting from the initial value X₀, slope K, remainder E₀ and remainder R₀ obtained in step 2, traversing along the Y axis to obtain all valid x values until y = H or x > W; the traversal is performed by step iteration over n = 1, 2, …, H − Y₀, the iteration formulas themselves not being reproduced in this text;

step 4: restoring the normalized coordinates; the normalization is reversed to recover the coordinate value before normalization, the x value being restored as follows: class 1: x = X; classes 2 and 3: x = W − 1 − X; class 4: x = X + 1; classes 5 and 6: x = W − X, where X is the endpoint coordinate before restoration.

7. The method according to claim 3, wherein the span generation module obtains the two endpoints of the triangle at the same Y coordinate by comparing the endpoint coordinates of the left and right boundaries of the triangle, specifically comprising the following steps:

step 1: determining the endpoint values; the left and right boundaries are determined from the parameters of the edge equations; for each Y coordinate, the endpoint values of boundaries of the same type are compared with one another and with the bounding box coordinates, the maximum value on the left boundary being selected as the left endpoint of the span and the minimum value on the right boundary as the right endpoint; the selected values are also constrained to [0, W), a left endpoint less than 0 being set to 0 and a right endpoint greater than W being set to W − 1.

8. The method according to claim 3, wherein the span buffer sends a group of consecutive spans to the block generator for processing in each clock cycle and, when transmission of the last span is finished, sets the triangle end flag bit valid and sends the primitive end flag to the block generator.