CN111047498A

CN111047498A - GPU hardware copy buffer algorithm-oriented TLM microstructure

Info

Publication number: CN111047498A
Application number: CN201911125649.5A
Authority: CN
Inventors: 陈佳; 姜丽云; 张少锋; 吴晓成; 任向隆; 赵彬
Original assignee: Xian Aeronautics Computing Technique Research Institute of AVIC
Current assignee: Xian Aeronautics Computing Technique Research Institute of AVIC
Priority date: 2019-11-18
Filing date: 2019-11-18
Publication date: 2020-04-21
Anticipated expiration: 2039-11-18
Also published as: CN111047498B

Abstract

The invention relates to the technical field of computer hardware modeling, and in particular to a TLM (transaction-level modeling) microstructure for a GPU hardware copy buffer algorithm. The microstructure comprises a copy parameter calculation module 1, a buffer allocation module 2, a buffer lower boundary processing module 3, a height direction buffer processing module 4, a buffer upper boundary processing module 5 and a tile row pixel copy module 6. The invention implements the function and structure of the copy buffer algorithm as a TLM model, enables functional verification of the GPU hardware copy buffer algorithm, handles cases such as copy coordinates lying outside the buffer or a copy width larger than the buffer, improves GPU hardware performance, reduces copy errors, and effectively accelerates RTL design and development.

Description

GPU hardware copy buffer algorithm-oriented TLM microstructure

Technical Field

The invention relates to the technical field of computer hardware modeling, and in particular to a TLM (transaction-level modeling) microstructure for a GPU hardware copy buffer algorithm.

Background

In the design and development of a graphics processing unit (GPU) chip, the correctness and efficiency of an algorithm are important factors that determine the function and performance of the GPU. The OpenGL API supports copying pixels from a buffer, but does not define how the copied pixels should be handled when the copy coordinates lie outside the buffer. When the copy coordinates lie outside the buffer, or the copy width is larger than the buffer, out-of-bounds reads, misaligned copies or a large number of invalid copy operations can easily occur and degrade the hardware performance of the GPU; this is the technical problem to be solved. When the details of the algorithm are debugged on GPU chip hardware, verification and debugging at the RTL stage are difficult. The algorithm therefore needs to be verified as early as possible, before the RTL design, so that it can serve as a reference for the RTL design.

Disclosure of Invention

In view of the problems described in the background art, the invention provides a TLM microstructure for a GPU hardware copy buffer algorithm. It addresses the correctness and efficiency of the copy buffer algorithm ahead of RTL simulation, and allows the hardware microstructure of the copy buffer algorithm to be functionally verified on the TLM model in advance, in support of the RTL design.

The technical solution of the invention is as follows:

A TLM microstructure for a GPU hardware copy buffer algorithm comprises a copy parameter calculation module 1, a buffer allocation module 2, a buffer lower boundary processing module 3, a height direction buffer processing module 4, a buffer upper boundary processing module 5 and a tile row pixel copy module 6;

the copy parameter calculation module 1, the buffer allocation module 2, the buffer lower boundary processing module 3 and the tile row pixel copy module 6 are connected in sequence;

the copy parameter calculation module 1, the buffer allocation module 2, the height direction buffer processing module 4 and the tile row pixel copy module 6 are connected in sequence;

the copy parameter calculation module 1, the buffer allocation module 2, the buffer upper boundary processing module 5 and the tile row pixel copy module 6 are connected in sequence;

the copy parameter calculation module 1 is used for calculating the distance by which the copy exceeds the upper bound in the y direction, the positive and negative copy distances in the x and y directions, the copy start coordinates in the x and y directions, the copy start tile coordinate, the number of copy tiles in the x direction, and the numbers of positive and negative copy tiles in the y direction;

the buffer allocation module 2 is used for dispatching tiles in the negative y direction to the buffer lower boundary processing module 3, tiles in the positive y direction to the height direction buffer processing module 4, and tile rows beyond the upper bound to the buffer upper boundary processing module 5;

the buffer lower boundary processing module 3 is used for processing the tile rows of copy pixels in the negative y direction;

the height direction buffer processing module 4 is used for processing the tile rows of copy pixels in the positive y direction;

the buffer upper boundary processing module 5 is used for processing the tile rows of copy pixels that exceed the upper boundary of the video memory;

the tile row pixel copy module 6 is used for copying the pixels of tile rows;

the height direction buffer processing module 4 comprises a read pixel submodule 41, an x direction copy pixel submodule 42 and a tile row position calculation submodule 43;

wherein a tile denotes a 4x4 pixel block whose lower-left pixel has x and y coordinates that are both integer multiples of 4; a tile row denotes 4 pixel rows whose starting pixel row has a y coordinate that is an integer multiple of 4; and the lower-left corner coordinate (x, y) of the buffer is taken as the origin.
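The tile and tile row definitions can be illustrated with a minimal Python sketch. The helper names `tile_origin` and `tile_row_start` are hypothetical and do not appear in the patent; the sketch only assumes the 4x4 alignment rule stated above and uses floor division so that coordinates below or left of the origin also snap to a multiple of 4.

```python
TILE = 4  # tile edge length in pixels (4x4 tiles, per the definition above)


def tile_origin(x, y):
    """Lower-left pixel (x, y) of the tile that contains pixel (x, y)."""
    # Floor division keeps the result a multiple of 4 for negative inputs too.
    return (x // TILE) * TILE, (y // TILE) * TILE


def tile_row_start(y):
    """y coordinate of the first pixel row of the tile row that contains y."""
    return (y // TILE) * TILE
```

For instance, pixel (5, 9) falls in the tile whose lower-left pixel is (4, 8), and pixel (-1, -5), which lies outside the buffer, falls in the tile at (-4, -8).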

Further, the copy parameter calculation module 1 receives the copy coordinates and the copy width and height;

calculates the distance exceeding the upper bound in the y direction, the positive and negative copy distances in the x and y directions, the copy start coordinates in the x and y directions, the copy start tile coordinate, the number of copy tiles in the x direction, and the numbers of positive and negative copy tiles in the y direction;

and then sends the distance exceeding the upper bound in the y direction, the positive and negative copy distances in the x and y directions, the copy start coordinates in the x and y directions, the copy start tile coordinate, the number of copy tiles in the x direction, and the numbers of positive and negative copy tiles in the y direction to the buffer allocation module 2 through the TLM interface.
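The patent lists the quantities produced by the copy parameter calculation module but gives no formulas. The following Python sketch shows one plausible way to derive them; every formula, the `CopyParams` record, and the `buf_w`/`buf_h` inputs are assumptions made for illustration, not the patented method. Columns to the right of the buffer are simply clipped here, since the text names no parameter for them.

```python
from dataclasses import dataclass

TILE = 4  # tile edge length in pixels


@dataclass(frozen=True)
class CopyParams:
    y_over: int        # rows beyond the upper bound in y
    x_neg: int         # columns left of the buffer (negative x copy distance)
    x_pos: int         # columns inside the buffer (positive x copy distance)
    y_neg: int         # rows below the buffer (negative y copy distance)
    y_pos: int         # rows inside the buffer (positive y copy distance)
    x_start: int       # copy start coordinate in x, clamped into the buffer
    y_start: int       # copy start coordinate in y, clamped into the buffer
    start_tile: tuple  # (tile column, tile row) index of the copy start tile
    x_tiles: int       # tiles spanned in x inside the buffer
    y_pos_tiles: int   # tile rows spanned inside the buffer
    y_neg_tiles: int   # tile rows spanned below the buffer


def tiles_spanned(first, count):
    """Number of 4-pixel tiles touched by pixels first .. first + count - 1."""
    if count <= 0:
        return 0
    return (first + count - 1) // TILE - first // TILE + 1


def compute_copy_params(x, y, width, height, buf_w, buf_h):
    # Assumed split of the requested rectangle relative to the buffer.
    x_neg = max(0, min(width, -x))
    y_neg = max(0, min(height, -y))
    y_over = max(0, min(height, y + height - buf_h))
    x_start, y_start = max(x, 0), max(y, 0)
    x_pos = max(0, min(x + width, buf_w) - x_start)
    y_pos = max(0, min(y + height, buf_h) - y_start)
    return CopyParams(
        y_over, x_neg, x_pos, y_neg, y_pos, x_start, y_start,
        (x_start // TILE, y_start // TILE),
        tiles_spanned(x_start, x_pos),
        tiles_spanned(y_start, y_pos),
        tiles_spanned(y, y_neg),
    )
```

As a usage example under these assumptions, a 10x20 copy starting at (-3, -5) from a 16x12 buffer splits into 3 columns left of the buffer, 5 rows below it, 12 rows inside it and 3 rows above its upper bound.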

Further, the buffer allocation module 2 receives the positive and negative copy distances in the x and y directions, the copy start coordinates in the x and y directions, the copy start tile coordinate, the number of copy tiles in the x direction, and the numbers of positive and negative copy tiles in the y direction sent by the copy parameter calculation module 1;

sends the negative copy distance in the y direction to the buffer lower boundary processing module 3 through the TLM interface;

sends the copy start coordinate in the y direction, the positive copy distance in the y direction, the copy start tile coordinate, the number of copy tiles in the x direction and the negative copy distance in the x direction to the height direction buffer processing module 4 through the TLM interface;

sends the distance exceeding the upper bound in the y direction to the buffer upper boundary processing module 5 through the TLM interface;

and sends the copy start coordinate in the x direction to the tile row pixel copy module 6 through the TLM interface.
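The routing performed by the buffer allocation module can be sketched as a plain function that splits one parameter set into per-module payloads. Plain dictionaries stand in for TLM transactions here, and all key names are hypothetical; only the assignment of parameters to modules follows the description above.

```python
def route_parameters(params):
    """Split the calculated copy parameters into one payload per downstream module."""
    return {
        # Module 3 only needs the negative copy distance in y.
        "lower_boundary_module_3": {"y_neg": params["y_neg"]},
        # Module 4 needs everything required to read and place in-buffer pixels.
        "height_direction_module_4": {
            key: params[key]
            for key in ("y_start", "y_pos", "start_tile", "x_tiles", "x_neg")
        },
        # Module 5 only needs the distance beyond the upper bound in y.
        "upper_boundary_module_5": {"y_over": params["y_over"]},
        # Module 6 needs the x start coordinate to locate the first pixel.
        "tile_row_copy_module_6": {"x_start": params["x_start"]},
    }
```

In an actual TLM model each payload would presumably travel through its own transaction-level port; the dictionary form is used only to make the data flow explicit.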

Further, the buffer lower boundary processing module 3 receives the negative copy distance in the y direction sent by the buffer allocation module 2;

calculates the start and end positions of each tile row in the y direction and sets all pixels of these tile rows to 0;

and then sends the tile rows of copy pixels, together with their start and end positions, to the tile row pixel copy module 6 through the TLM interface.
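A minimal Python sketch of the lower boundary processing follows. It assumes that the region below the buffer ends at y = 0, that only the first (lowest) tile row can be partial, and that `row_width` (a hypothetical parameter) gives the number of pixels per row; the patent states only that the first start position is derived from the negative copy height and that the remaining positions are 0 and 4.

```python
TILE = 4  # tile edge length in pixels


def lower_boundary_rows(y_neg, row_width):
    """Return (pixels, start, end) for each tile row below the buffer, bottom first."""
    tile_rows = -(-y_neg // TILE)  # ceiling division
    result = []
    for i in range(tile_rows):
        # Only the first tile row may begin partway through its 4 pixel rows.
        start = (-y_neg) % TILE if i == 0 else 0
        pixels = [[0] * row_width for _ in range(TILE)]  # outside the buffer: all 0
        result.append((pixels, start, TILE))
    return result
```

With a negative copy distance of 5, this yields two tile rows: the first contributes pixel rows 3 to 4 (one row) and the second contributes rows 0 to 4 (four rows).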

Further, the height direction buffer processing module 4 receives the copy start coordinate in the y direction, the positive copy distance in the y direction, the copy start tile coordinate, the number of copy tiles in the x direction and the negative copy distance in the x direction sent by the buffer allocation module 2;

calculates the tile coordinates in the buffer and reads the buffer pixels, then pads the pixels outside the buffer in the x direction with 0, and finally calculates the start and end positions of each tile row;

and then sends the tile rows of copy pixels, together with their start and end positions, to the tile row pixel copy module 6 through the TLM interface.

Further, the buffer upper boundary processing module 5 receives the distance exceeding the upper bound in the y direction sent by the buffer allocation module 2;

and then sends the tile rows of copy pixels, together with their start and end positions, to the tile row pixel copy module 6 through the TLM interface.
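The start and end positions for tile rows beyond the upper bound can be sketched as follows. This is an illustrative reading of the algorithm steps given later in the description (start always 0, only the last end position computed), and it assumes the upper bound of the buffer is tile-aligned; the function name is hypothetical.

```python
TILE = 4  # tile edge length in pixels


def upper_boundary_positions(y_over):
    """Return (start, end) for each tile row beyond the upper bound, bottom first."""
    tile_rows = -(-y_over // TILE)  # ceiling division
    positions = []
    for i in range(tile_rows):
        is_last = i == tile_rows - 1
        # Only the last tile row may stop before its fourth pixel row.
        end = (y_over - 1) % TILE + 1 if is_last else TILE
        positions.append((0, end))
    return positions
```

For a distance of 6 beyond the upper bound, the positions are (0, 4) for the first tile row and (0, 2) for the last, covering 6 pixel rows in total.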

Further, the tile row pixel copy module 6 receives the tile rows of copy pixels and the start and end positions of the tile rows sent by the buffer lower boundary processing module 3 and the height direction buffer processing module 4, and the start coordinate in the x direction sent by the buffer allocation module 2;

calculates the start pixel of the tile row from the copy start coordinate in the x direction;

and then performs the copy operation on the tile row pixels.
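One way to read the tile row pixel copy is sketched below. It assumes each tile row arrives as 4 pixel rows that begin on a tile boundary, so the first wanted pixel sits at offset `x_start % 4`; that offset rule, the `width` parameter and the function name are assumptions for illustration, since the patent does not give the formula.

```python
TILE = 4  # tile edge length in pixels


def copy_tile_row(tile_row, start, end, x_start, width):
    """Copy pixel rows start..end-1 of a tile row, beginning at the x start pixel."""
    first = x_start % TILE  # assumed offset of the first wanted pixel in the row
    return [row[first:first + width] for row in tile_row[start:end]]
```

For a tile row two tiles wide, a start coordinate of 6 selects offset 2, so a 5-pixel copy of pixel rows 1 and 2 takes columns 2 through 6 of those rows.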

Further, the read pixel submodule 41 receives the copy start tile coordinate and the number of copy tiles in the x direction sent by the buffer allocation module 2, and calculates the coordinate of each tile to be read;

reads the buffer pixels according to the tile coordinates;

and sends the read buffer pixels to the x-direction copy pixel submodule 42.
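The read step can be sketched in Python with the buffer modelled as a list of pixel rows indexed `buf[y][x]`. The buffer representation, the tile index convention (tile column, tile row) and both function names are assumptions; the sketch also assumes the buffer dimensions are multiples of 4 so that every tile is complete.

```python
TILE = 4  # tile edge length in pixels


def tile_coords(start_tile, x_tiles):
    """Tile indices of one tile row, starting at start_tile and moving in +x."""
    tx, ty = start_tile
    return [(tx + i, ty) for i in range(x_tiles)]


def read_tile_row(buf, start_tile, x_tiles):
    """Read one tile row and return its 4 pixel rows, lowest pixel row first."""
    rows = [[] for _ in range(TILE)]
    for tx, ty in tile_coords(start_tile, x_tiles):
        for r in range(TILE):
            # Append the 4 pixels of this tile that belong to pixel row r.
            rows[r].extend(buf[ty * TILE + r][tx * TILE:(tx + 1) * TILE])
    return rows
```

Reading tile by tile rather than pixel by pixel mirrors the tile-aligned access pattern that the description attributes to the hardware.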

Further, the x-direction copy pixel submodule 42 receives the negative copy distance in the x direction sent by the buffer allocation module 2 and the buffer pixels sent by the read pixel submodule 41;

reserves, in front of each read pixel row, a number of pixels equal to the negative copy distance in the x direction and fills them with 0 to represent the pixels outside the buffer;

and then sends the processed copy pixel rows to the tile row pixel copy module 6.
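The zero padding in the x direction amounts to prepending zeros to every pixel row. A minimal Python sketch, with a hypothetical function name and pixel rows modelled as lists:

```python
def pad_negative_x(pixel_rows, x_neg):
    """Prepend x_neg zero pixels to each row to stand for pixels left of the buffer."""
    return [[0] * x_neg + list(row) for row in pixel_rows]
```

With a negative copy distance of 3, the row `[1, 2]` becomes `[0, 0, 0, 1, 2]`; with a distance of 0 the rows pass through unchanged.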

Further, the tile row position calculation submodule 43 receives the y start coordinate and the positive copy distance in the y direction sent by the buffer allocation module 2 and calculates the start and end positions of each tile row;

and sends the start and end positions of the tile rows to the tile row pixel copy module 6.
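The position calculation for tile rows inside the buffer can be sketched as below. The modulo formulas and the function name are assumptions; they follow the rule given in the algorithm steps that only the first start position and the last end position are computed, while every other tile row uses 0 and 4.

```python
TILE = 4  # tile edge length in pixels


def tile_row_positions(y_start, y_pos):
    """Return (start, end) for each tile row covering rows y_start .. y_start+y_pos-1."""
    if y_pos <= 0:
        return []
    first = y_start // TILE
    last = (y_start + y_pos - 1) // TILE
    positions = []
    for t in range(first, last + 1):
        start = y_start % TILE if t == first else 0
        end = (y_start + y_pos - 1) % TILE + 1 if t == last else TILE
        positions.append((start, end))
    return positions
```

For a y start coordinate of 2 and a positive copy distance of 9, the three tile rows get positions (2, 4), (0, 4) and (0, 3), which together cover the 9 requested pixel rows.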

The invention has the beneficial effects that:

The invention implements the function and structure of the copy buffer algorithm as a TLM model, enables functional verification of the GPU hardware copy buffer algorithm, handles cases such as copy coordinates lying outside the buffer or a copy width larger than the buffer, improves GPU hardware performance, reduces copy errors, and effectively accelerates RTL design and development.

Drawings

FIG. 1 is a block diagram of the hardware TLM microstructure of the copy buffer algorithm according to the present invention.

Detailed Description

The technical solution of the present invention will be clearly and completely described below with reference to the accompanying drawings and the specific embodiments. It is obvious that the described embodiments are only a part of the embodiments of the present invention, rather than the whole embodiments, and that all other embodiments, which can be derived by a person skilled in the art without inventive step based on the embodiments of the present invention, belong to the scope of protection of the present invention.

The invention provides a TLM microstructure for a GPU hardware copy buffer algorithm, which comprises a copy parameter calculation module 1, a buffer allocation module 2, a buffer lower boundary processing module 3, a height direction buffer processing module 4, a buffer upper boundary processing module 5 and a tile row pixel copy module 6.

The tile row pixel copy module 6 is used for copying the pixels of tile rows.

The buffer allocation module 2 receives the positive and negative copy distances in the x and y directions, the copy start coordinates in the x and y directions, the copy start tile coordinate, the number of copy tiles in the x direction and the numbers of positive and negative copy tiles in the y direction sent by the copy parameter calculation module 1.

The buffer lower boundary processing module 3 receives the negative copy distance in the y direction sent by the buffer allocation module 2.

The height direction buffer processing module 4 receives the copy start coordinate in the y direction, the positive copy distance in the y direction, the copy start tile coordinate, the number of copy tiles in the x direction and the negative copy distance in the x direction sent by the buffer allocation module 2.

The buffer upper boundary processing module 5 receives the distance exceeding the upper bound in the y direction sent by the buffer allocation module 2.

The tile row pixel copy module 6 receives the tile rows of copy pixels and the start and end positions of the tile rows sent by the buffer lower boundary processing module 3 and the height direction buffer processing module 4, and the start coordinate in the x direction sent by the buffer allocation module 2,

and then performs the copy operation on the tile row pixels.

The read pixel submodule 41 receives the copy start tile coordinate and the number of copy tiles in the x direction sent by the buffer allocation module 2 and calculates the coordinate of each tile to be read,

reads the buffer pixels according to the tile coordinates,

and sends the read buffer pixels to the x-direction copy pixel submodule 42.

The x-direction copy pixel submodule 42 receives the negative copy distance in the x direction sent by the buffer allocation module 2 and the buffer pixels sent by the read pixel submodule 41,

and then sends the processed copy pixel rows to the tile row pixel copy module 6.

The tile row position calculation submodule 43 receives the y start coordinate and the positive copy distance in the y direction sent by the buffer allocation module 2 and calculates the start and end positions of each tile row.

The GPU hardware copy buffer algorithm based on this structure comprises the following steps:

1) Calculating the copy range parameters:

The positive and negative copy distances, the start coordinates and the copy start tile coordinate in the x and y directions are calculated from the input copy coordinates and the copy width and height. The numbers of copy tiles in the positive x and y directions are calculated from the copy start tile coordinate. The number of tiles in the negative y direction is calculated from the negative length in y and the total number of copy tiles in the y direction.

2) Buffer allocation in the height direction:

The buffer is divided into a positive part and a negative part in the height direction. According to the numbers of positive and negative copy tiles in the y direction, the tile rows in the negative part are assigned to step 3) and the tile rows in the positive part are assigned to step 4).

3) Processing tile rows of copy pixels in the negative y direction:

The number of copy tiles in the negative y direction equals the number of tile rows to be copied from outside the buffer. These tiles skip the copy process and are set directly to 0. The start position of the first tile row is calculated from the copy height in the negative direction; for all other tile rows the start position is 0. The end position of every tile row is 4.

4) Processing tile rows of copy pixels in the positive y direction:

4.1) Reading buffer pixels:

The coordinate of each tile to be read is calculated from the copy start tile coordinate and the number of copy tiles in the x direction, and the buffer pixels are then read according to the tile coordinates.

4.2) Copy pixel processing in the x direction:

A number of pixels equal to the negative copy distance in the x direction is reserved in front of each read pixel row, and these pixels are all filled with 0 to represent the pixels outside the buffer.

4.3) Calculating tile row start and end positions:

The start position of the first tile row and the end position of the last tile row are calculated from the y start coordinate and the positive copy distance in the y direction. In all other cases the start position of a tile row is 0 and the end position is 4.

5) Processing tile rows of copy pixels beyond the upper bound in y:

These tiles likewise skip the copy process and are set directly to 0, and the start positions of their tile rows are all 0. The end position of the last tile row is calculated from the distance exceeding the upper bound; the end positions of all other tile rows are 4.

6) Tile row pixel copy:

First the start pixel of the tile row is calculated from the copy coordinate in the x direction, and then the copy operation is performed on the tile row pixels.
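Because the TLM model is meant to serve as a reference for verification, it can help to have an even simpler per-pixel model against which the tile-based steps above are checked. The following Python sketch is such a model and is not part of the patent. It assumes the buffer is a list of pixel rows indexed `buf[y][x]` with the origin at the lower-left corner, and that every pixel outside the buffer reads as 0; the patent states this zero fill for negative x, negative y and rows beyond the upper bound, while zero fill to the right of the buffer is an added assumption.

```python
def reference_copy(buf, x, y, width, height):
    """Per-pixel copy of a width x height region starting at (x, y), zero-filled outside."""
    buf_h = len(buf)
    buf_w = len(buf[0]) if buf else 0
    out = []
    for dy in range(height):
        sy = y + dy
        row = []
        for dx in range(width):
            sx = x + dx
            inside = 0 <= sx < buf_w and 0 <= sy < buf_h
            row.append(buf[sy][sx] if inside else 0)
        out.append(row)
    return out
```

Copying a 4x4 region starting at (-1, -1) from the 2x2 buffer `[[1, 2], [3, 4]]` returns the buffer contents surrounded by a one-pixel border of zeros. A tile-based implementation of steps 1) to 6) would be expected to produce the same pixels for the same inputs.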

Example:

The invention is described in further detail below with reference to FIG. 1.

Step 1: calculate the copy range parameters. The positive and negative copy distances, the start coordinates and the copy start tile coordinate in the x and y directions are calculated from the input copy coordinates and the copy width and height. The numbers of copy tiles in the positive x and y directions are calculated from the copy start tile coordinate. The number of tiles in the negative y direction is calculated from the negative length in y and the total number of copy tiles in the y direction.

Step 2: allocate the buffer in the height direction. The buffer is divided into a positive part and a negative part in the height direction. According to the numbers of positive and negative copy tiles in the y direction, the tile rows in the negative part are assigned to step 3 and the tile rows in the positive part are assigned to step 4.

Step 3: process the tile rows of copy pixels in the negative y direction. The number of copy tiles in the negative y direction equals the number of tile rows to be copied from outside the buffer. These tiles skip the copy process and are set directly to 0. The start position of the first tile row is calculated from the copy height in the negative direction; in all other cases the start position of a tile row is 0, and the end positions are all 4.

Step 4: process the tile rows of copy pixels in the positive y direction. First the buffer pixels are read: the coordinate of each tile to be read is calculated from the copy start tile coordinate and the number of copy tiles in the positive x direction, and the buffer pixels are read according to the tile coordinates. Next the copy pixels are processed in the x direction: a number of pixels equal to the negative copy distance in the x direction is reserved in front of each read pixel row, and these pixels are all filled with 0 to represent the pixels outside the buffer. Finally the start and end positions of the tile rows are calculated: the start position of the first tile row and the end position of the last tile row are calculated from the y start coordinate and the positive copy distance in the y direction. In all other cases the start position of a tile row is 0 and the end position is 4.

Step 5: process the tile rows of copy pixels beyond the upper bound in y. These tiles skip the copy process and are set directly to 0, and the start positions of their tile rows are all 0. The end position of the last tile row is calculated from the distance exceeding the upper bound; the end positions of all other tile rows are 4.

Step 6: copy the tile row pixels. First the start pixel of the tile row is calculated from the copy coordinate in the x direction, and then the copy operation is performed on the tile row pixels.

Claims

1. A TLM microstructure for a GPU hardware copy buffer algorithm, characterized in that: the microstructure comprises a copy parameter calculation module 1, a buffer allocation module 2, a buffer lower boundary processing module 3, a height direction buffer processing module 4, a buffer upper boundary processing module 5 and a tile row pixel copy module 6;

the tile row pixel copy module 6 is used for copying the pixels of tile rows;

wherein a tile denotes a 4x4 pixel block whose lower-left pixel has x and y coordinates that are both integer multiples of 4,

a tile row denotes 4 pixel rows whose starting pixel row has a y coordinate that is an integer multiple of 4,

and the lower-left corner coordinate (x, y) of the buffer is taken as the origin.

2. The TLM microstructure for a GPU hardware copy buffer algorithm as claimed in claim 1, wherein:

3. The TLM microstructure for a GPU hardware copy buffer algorithm as claimed in claim 1, wherein:

4. The TLM microstructure for a GPU hardware copy buffer algorithm as claimed in claim 1, wherein:

5. The TLM microstructure for a GPU hardware copy buffer algorithm as claimed in claim 1, wherein:

6. The TLM microstructure for a GPU hardware copy buffer algorithm as claimed in claim 1, wherein:

7. The TLM microstructure for a GPU hardware copy buffer algorithm as claimed in claim 1, wherein:

the copy operation on the tile row pixels is then carried out.

8. The TLM microstructure for a GPU hardware copy buffer algorithm as claimed in claim 1, wherein:

the buffer pixels are read according to the tile coordinates,

and the read buffer pixels are sent to the x-direction copy pixel submodule 42.

9. The TLM microstructure for a GPU hardware copy buffer algorithm as claimed in claim 1, wherein:

the processed copy pixel rows are then sent to the tile row pixel copy module 6.

10. The TLM microstructure for a GPU hardware copy buffer algorithm as claimed in claim 1, wherein: