CN116308999A - Data processing method of graphic processor, graphic processor and storage medium

Info

Publication number: CN116308999A
Application number: CN202310562344.0A
Authority: CN
Inventors: 阙恒; 孙超; 朱康挺; 姚小威
Original assignee: Li Computing Technology Shanghai Co ltd; Nanjing Lisuan Technology Co ltd
Current assignee: Li Computing Technology Shanghai Co ltd; Nanjing Lisuan Technology Co ltd
Priority date: 2023-05-18
Filing date: 2023-05-18
Publication date: 2023-06-23
Anticipated expiration: 2043-05-18
Also published as: CN116308999B

Abstract

A data processing method of a graphics processor, a graphics processor, and a storage medium. The data processing method comprises: receiving a data processing instruction; in response to the data processing instruction indicating that a data move operation is to be performed, acquiring data movement information corresponding to the data processing instruction, the data movement information comprising a source vertex cache location, a destination vertex cache location, and data information of the data to be moved; and moving the data to be moved from the source vertex cache location to the destination vertex cache location. This scheme can improve the execution efficiency of the graphics processor and reduce its power consumption.

Description

Data processing method of graphic processor, graphic processor and storage medium

Technical Field

The present invention relates to the field of graphics processing technologies, and in particular to a data processing method of a graphics processor, a graphics processor, and a storage medium.

Background

Modern programmable graphics processing units (GPUs) process graphics in multiple stages, and the stages are executed in a pipelined fashion.

In some application scenarios, after vertex data has been processed at a certain stage, it does not need to be processed at the next stage, but it must still be passed on to the subsequent stages.

When the vertex data is passed on, the corresponding vertex data must first be read out of the source vertex cache and stored in a register. The vertex data is then read from the register and written into the destination vertex cache. If the vertex data has many attributes, many instructions are needed to complete the transfer, so the power consumption of the graphics processor is higher and its execution efficiency is lower.

Disclosure of Invention

The technical problem addressed by the invention is the high power consumption and low execution efficiency of the graphics processor.

In order to solve the above technical problem, the present invention provides a data processing method of a graphics processor, including: receiving a data processing instruction; in response to the data processing instruction indicating that a data move operation is to be performed, acquiring data movement information corresponding to the data processing instruction, the data movement information comprising a source vertex cache location, a destination vertex cache location, and data information of the data to be moved; and moving the data to be moved from the source vertex cache location to the destination vertex cache location.

Optionally, if it is detected that the data processing instruction includes a move flag bit, it is determined that the data processing instruction indicates to perform a data move operation.

Optionally, the data information of the data to be moved includes: the number of the attributes corresponding to the data to be moved and the number of the data elements corresponding to each attribute.

Optionally, the data information of the data to be moved includes: the number of vertexes corresponding to the data to be moved, the number of attributes corresponding to each vertex and the number of data elements corresponding to each attribute.

Optionally, the moving the data to be moved from the source vertex cache location to the destination vertex cache location includes: reading the data elements of each attribute from the source vertex cache location in turn; and writing the read data element into the destination vertex cache position.

Optionally, the data processing method of the graphics processor further includes: after the data elements of all the attributes corresponding to one vertex are moved, carrying out data movement operation on the attribute data corresponding to the next vertex.

Optionally, obtaining the number of attributes includes: determining a register indicated by the data processing instruction; and determining the number of attributes based on the value stored in the register.

Optionally, obtaining the number of attributes includes: acquiring an immediate in the data processing instruction; and determining the number of attributes based on the immediate.

An embodiment of the invention also provides a graphics processor, comprising: an instruction receiving unit configured to receive a data processing instruction; an information acquisition unit configured to determine that the data processing instruction indicates that a data move operation is to be performed, and to acquire data movement information corresponding to the data processing instruction, the data movement information comprising a source vertex cache location, a destination vertex cache location, and data information of the data to be moved; and an execution unit configured to move the data to be moved from the source vertex cache location to the destination vertex cache location.

The present invention also provides a computer readable storage medium, which is a non-volatile storage medium or a non-transitory storage medium, on which a computer program is stored, which when executed by a processor performs the steps of the data processing method of a graphics processor as described in any of the above.

Compared with the prior art, the technical scheme of the invention has the following beneficial effects:

In response to the data processing instruction indicating that a data move operation is to be performed, data movement information corresponding to the data processing instruction is acquired, the data movement information comprising a source vertex cache location, a destination vertex cache location, and data information of the data to be moved. As indicated by the data processing instruction, the data to be moved is moved from the source vertex cache to the destination vertex cache. A single data processing instruction thus completes the move, which improves the execution efficiency of the graphics processor and reduces its power consumption.

Drawings

FIG. 1 is a flow chart of a data processing method of a graphics processor in an embodiment of the invention;

FIG. 2 is a flow chart of another data processing method of a graphics processor in an embodiment of the invention;

FIG. 3 is a schematic diagram of a graphics processor in accordance with an embodiment of the present invention.

Detailed Description

In the prior art, a typical graphics processor pipeline generally consists of the following processing stages: Input Assembler (IA), Vertex Shader (VS), Hull Shader (HS), Tessellator, Domain Shader (DS), Geometry Shader (GS), Rasterizer, Pixel Shader (PS), and Output Merger (OM).

In the geometry shader and the preceding processing stages, the geometry information of the graphics (also called vertex data) is processed. The vertex data includes various attributes of the vertices, such as color attributes and position attributes. After processing in a shader is completed, the data corresponding to the vertex is written into the vertex cache (vertex buffer) of that shader. When the shader of the next stage needs the vertex data processed by the shader of the previous stage, it reads the vertex data from the vertex cache of the previous-stage shader into a register and, after processing, writes the vertex data from the register into the vertex cache of the current-stage shader.

In the prior art, the above-mentioned vertex data moving process can refer to the following instructions:

ILD r0, icp[0][0].x // read data element 0 of attribute 0 of input control vertex 0 into register r0

EMIT r0, ocp[0][0].x // write the data in r0 into the destination vertex cache corresponding to the shader of the current stage

ILD r0, icp[0][0].y // read data element 1 of attribute 0 of input control vertex 0 into register r0

EMIT r0, ocp[0][0].y // write the data in r0 into the destination vertex cache corresponding to the shader of the current stage

ILD r0, icp[0][n].x // read data element 0 of attribute n of input control vertex 0 into register r0

EMIT r0, ocp[0][n].x // write the data in r0 into the destination vertex cache corresponding to the shader of the current stage

As the above instructions show, one read instruction and one write instruction are required for each data element of each attribute in the vertex data. When the vertex data contains many attributes, many instructions are needed to complete the transfer of the vertex data, so the power consumption of the graphics processor is higher and its execution efficiency is lower.
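As a rough illustration of the instruction count just described, the following Python sketch generates the prior-art ILD/EMIT sequence for one vertex as text. The function name `prior_art_sequence` and the component names x, y, z, w for up to four data elements are assumptions made for illustration; only the ILD and EMIT operand forms are taken from the listing above.

```python
# Illustrative sketch only: emits the prior-art ILD/EMIT pairs as text.
COMPONENTS = "xyzw"  # assumed component names for up to four data elements


def prior_art_sequence(vertex, attr_num, elems_per_attr):
    """Return the instructions needed to pass one vertex through a stage."""
    if not 1 <= elems_per_attr <= len(COMPONENTS):
        raise ValueError("elems_per_attr must be between 1 and 4")
    program = []
    for attr in range(attr_num):
        for comp in COMPONENTS[:elems_per_attr]:
            # One read into register r0, then one write from r0.
            program.append(f"ILD r0, icp[{vertex}][{attr}].{comp}")
            program.append(f"EMIT r0, ocp[{vertex}][{attr}].{comp}")
    return program


program = prior_art_sequence(vertex=0, attr_num=5, elems_per_attr=2)
print(len(program))  # 5 attributes * 2 elements * 2 instructions = 20
```

Under these assumptions the instruction count grows as 2 × (number of attributes) × (data elements per attribute) for every vertex that is passed through unchanged.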

In the embodiment of the invention, the data to be moved is moved from the source vertex cache to the destination vertex cache as indicated by the data processing instruction, so that a single data processing instruction completes the move. This improves the execution efficiency of the graphics processor and reduces its power consumption.

In order to make the above objects, features and advantages of the present invention more comprehensible, embodiments accompanied with figures are described in detail below.

The embodiment of the invention provides a data processing method of a graphics processor, and the method is described in detail by specific steps with reference to fig. 1.

In particular implementations, the data processing methods described below may be performed by a data loading (ILD) module in a graphics processor or by other modules in the graphics processor. In the following embodiments, a data loading module is described as an example.

Step 101, a data processing instruction is received.

In particular implementations, the data loading module receives a data processing instruction and may determine whether the data processing instruction indicates a data move (bypass) operation. If it is determined that the data processing instruction indicates to perform a data move operation, step 102 may be performed.

The data moving operation in the embodiment of the present invention may refer to that the shader in the current stage does not process vertex data and directly transfers the vertex data to the next stage.

Step 102, in response to the data processing instruction indicating that a data move operation is to be performed, data movement information corresponding to the data processing instruction is acquired.

In an embodiment of the present invention, the data movement information corresponding to the data processing instruction may include: source vertex cache location, destination vertex cache location, and data information for the data to be moved. The data information of the data to be moved may be used to characterize the data amount of the data to be moved.

In a specific implementation, the data to be moved may be vertex data. Vertex data may be stored in a vertex cache (vertex buffer). The source vertex cache may refer to a vertex cache corresponding to a shader in a previous stage; the destination vertex cache may refer to a vertex cache corresponding to a shader in the current stage.

For example, the source vertex cache is the vertex cache corresponding to the Vertex Shader (VS), and the destination vertex cache is the vertex cache corresponding to the Hull Shader (HS).

In an implementation, if the data to be moved is vertex data of one vertex, the data information of the data to be moved may include: the number of attributes corresponding to the data to be moved and the number of data elements corresponding to each attribute.

Alternatively, if the data to be moved corresponds to one vertex, the data information of the data to be moved may include: the number of vertices corresponding to the data to be moved, the number of attributes corresponding to each vertex, and the number of data elements corresponding to each attribute. In this scenario, the number of vertices corresponding to the data to be moved is 1.

If the data to be moved corresponds to at least two vertices, the data information of the data to be moved may include: the number of vertexes corresponding to the data to be moved, the number of attributes corresponding to each vertex and the number of data elements corresponding to each attribute.

In practice, each vertex may include a number of attributes, for example color, position, texture coordinates, and normals. One attribute may include one or more data elements. For example, a color attribute includes four data elements: R (red), G (green), B (blue), and A (transparency). As another example, a position attribute includes an x-coordinate, a y-coordinate, a z-coordinate, and a w-coordinate.
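The data movement information described above can be pictured as a small record. The following Python sketch is one possible layout; the class names `CacheLocation` and `MoveInfo`, the field names, and the sample values are hypothetical and are not defined by the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheLocation:
    """A position in a vertex cache (hypothetical representation)."""
    vertex: int     # vertex identifier
    attribute: int  # index of the first attribute to move


@dataclass(frozen=True)
class MoveInfo:
    """Data movement information carried by a move instruction."""
    source: CacheLocation
    destination: CacheLocation
    attr_num: int        # number of attributes per vertex
    elems_per_attr: int  # number of data elements per attribute
    vertex_num: int = 1  # single-vertex case by default

    def element_count(self):
        """Total number of data elements covered by the move."""
        return self.vertex_num * self.attr_num * self.elems_per_attr


# Arbitrary sample values: one vertex, 5 attributes, 2 data elements each.
info = MoveInfo(CacheLocation(9, 0), CacheLocation(0, 0), attr_num=5, elems_per_attr=2)
```

Here `element_count()` reflects the statement that the data information characterizes the amount of data to be moved; with the sample values it covers 10 data elements.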

In the embodiment of the invention, the data loading module can acquire the instruction type corresponding to the data processing instruction after receiving the data processing instruction. If the instruction type corresponding to the data processing instruction is a data movement instruction, determining that the data processing instruction indicates to execute the data movement operation.

In implementations, a newly defined data processing instruction may be provided that has a move flag bit present. After receiving the data processing instruction, the data loading module can determine that the data processing instruction indicates to execute the data moving operation if the moving flag bit exists in the data processing instruction.

Specifically, a newly defined data processing instruction (hereinafter referred to as a first instruction) may be:

ILD_BP ocp[a][b].cd, [attr_num], icp[e][f];

In the first instruction, ocp[a][b] indicates the destination vertex cache location, where a is the vertex identifier of the output three-dimensional graphic and b denotes the b-th attribute of vertex a; icp[e][f] indicates the source vertex cache location, where e is the vertex identifier of the input three-dimensional graphic and f denotes the f-th attribute of vertex e; attr_num is the number of attributes corresponding to the data to be moved; and cd denotes the data elements within an attribute. The mark "BP" in the first instruction may represent the move flag bit.

The meaning of the first instruction is: starting from the source vertex cache location indicated by icp[e][f] (i.e., the location in the source vertex cache of the data of the f-th attribute of the vertex with identifier e), the data corresponding to attr_num attributes is moved to the destination vertex cache location indicated by ocp[a][b] (i.e., the location in the destination vertex cache of the b-th attribute of the vertex with identifier a).

Illustratively, in a specific application, the first instruction may be the following example:

ILD_BP ocp[0][0].xy, [5], icp[9][0];

The meaning of the instruction is: starting from the source vertex cache location (the location of the 0th attribute of the vertex with ID 9 in the source vertex cache), the data corresponding to 5 attributes is moved to the destination vertex cache location (the location of the 0th attribute of the vertex with ID 0 in the destination vertex cache). Here x is the 1st data element and y is the 2nd data element of the data corresponding to each attribute.
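The effect of this example instruction can be modeled in software. The following Python sketch is a behavioral model only, assuming each vertex cache is a dictionary keyed by (vertex, attribute) whose values map component names to data elements; the function name `ild_bp` and the sample cache contents are hypothetical.

```python
def ild_bp(src_cache, dst_cache, dst, mask, attr_num, src):
    """Move `attr_num` attributes starting at src=(vertex, attr) to dst=(vertex, attr).

    Only the components named in `mask` (for example "xy") are copied.
    """
    src_vertex, src_attr = src
    dst_vertex, dst_attr = dst
    for i in range(attr_num):
        source = src_cache[(src_vertex, src_attr + i)]
        target = dst_cache.setdefault((dst_vertex, dst_attr + i), {})
        for comp in mask:
            # Copy one data element from the source cache to the destination cache.
            target[comp] = source[comp]


# Sample source cache: vertex 9 has six attributes with four components each.
icp = {(9, a): {c: 10 * a + k for k, c in enumerate("xyzw")} for a in range(6)}
ocp = {}

# Models: ILD_BP ocp[0][0].xy, [5], icp[9][0];
ild_bp(icp, ocp, dst=(0, 0), mask="xy", attr_num=5, src=(9, 0))
```

After the call, `ocp` holds the x and y data elements of attributes 0 to 4 under vertex 0, while the sixth source attribute and the z and w elements are left untouched.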

In the above embodiment, the number of attributes may be stored in a predetermined register. In the first instruction, a predetermined register may be further indicated, and the number of attributes to be moved may be determined by reading the stored value from the predetermined register.

In an implementation, a newly defined data processing instruction (hereinafter referred to as a second instruction) may also be:

ILD_BP attr_num, ocp[a][b].cd, icp[e][f];

The meaning of the second instruction is the same as that of the first instruction. The main difference between them is that in the second instruction the number of attributes is given by an immediate in the instruction, whereas in the first instruction the number of attributes is stored in a predetermined register.
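The two ways of supplying the attribute count can be contrasted in a few lines. The following Python sketch assumes a purely hypothetical textual operand syntax in which `[r3]` names a register holding the count and a bare number is an immediate; the function name `resolve_count` and the register name `r3` are illustrative only.

```python
def resolve_count(operand, registers):
    """Return the count denoted by `operand`.

    "[r3]" -> value stored in register r3 (register form, as in the first instruction)
    "5"    -> the immediate 5 (immediate form, as in the second instruction)
    """
    operand = operand.strip()
    if operand.startswith("[") and operand.endswith("]"):
        return registers[operand[1:-1].strip()]
    return int(operand)


registers = {"r3": 5}  # hypothetical register file contents
from_register = resolve_count("[r3]", registers)
from_immediate = resolve_count("5", registers)
```

Both calls yield the same count; the register form lets the count be computed at run time, while the immediate form fixes it in the instruction encoding.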

In an implementation, a newly defined data processing instruction (hereinafter referred to as a third instruction) may also be:

ILD_BP ocp[a][b].cd, [vertex_num], [attr_num], icp[e][f];

In this instruction, the attribute number attr_num and the vertex number vertex_num may be stored in different predetermined registers. When the value of vertex_num is 1, the instruction has the same meaning as the first instruction or the second instruction.

Alternatively, the third instruction may be:

ILD_BP.vertex_num.attr_num ocp[a][b].cd, icp[e][f];

In this instruction, the attribute number attr_num and the vertex number vertex_num are two different immediates in the third instruction. When the value of vertex_num is 1, the instruction has the same meaning as the first instruction or the second instruction.

Alternatively, the third instruction may be:

ILD_BP.attr_num ocp[a][b].cd, [vertex_num] icp[e][f];

In this instruction, the attribute number attr_num is an immediate in the third instruction, and the vertex number vertex_num is stored in a predetermined register. When the value of vertex_num is 1, the instruction has the same meaning as the first instruction or the second instruction.

Alternatively, the third instruction may be:

ILD_BP.vertex_num ocp[a][b].cd, [attr_num], icp[e][f];

In this instruction, the attribute number attr_num is stored in a predetermined register, and the vertex number vertex_num is an immediate in the third instruction. When the value of vertex_num is 1, the instruction has the same meaning as the first instruction or the second instruction.

The meaning of the third instruction is: for one vertex, starting from the source vertex cache location indicated by icp[e][f] (i.e., the location in the source vertex cache of the data of the f-th attribute of the vertex with identifier e), the data corresponding to attr_num attributes is moved to the destination vertex cache location indicated by ocp[a][b] (i.e., the location in the destination vertex cache of the b-th attribute of the vertex with identifier a). This step is repeated vertex_num times, so that the vertex data of vertex_num vertices is moved from the source vertex cache location to the destination vertex cache location.

For the expression forms of the third instructions, when the third instructions correspond to the vertex data of the plurality of vertices, after the vertex data of one vertex is moved, the moving operation is continuously performed on the vertex data of the next vertex until all the data to be moved are moved.
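The order in which a multi-vertex move visits cache locations can be sketched as follows. This Python generator is an illustration only: the name `move_order` is hypothetical, and it assumes that the vertex identifier advances by 1 per vertex and that the attribute index restarts from the starting attribute for each vertex.

```python
def move_order(dst, src, vertex_num, attr_num):
    """Yield (source, destination) location pairs in the order they are moved."""
    (dst_vertex, dst_attr), (src_vertex, src_attr) = dst, src
    for v in range(vertex_num):      # one vertex at a time
        for a in range(attr_num):    # all attributes of that vertex first
            yield (src_vertex + v, src_attr + a), (dst_vertex + v, dst_attr + a)


# Two vertices with three attributes each, from icp[9][0] to ocp[0][0].
order = list(move_order(dst=(0, 0), src=(9, 0), vertex_num=2, attr_num=3))
```

The resulting list contains all three attributes of source vertex 9 before any attribute of source vertex 10, matching the vertex-by-vertex order described above.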

In the embodiment of the invention, after receiving the data processing instruction, the data loading module can also detect whether the move flag bit exists in the data processing instruction. If the data processing instruction is detected to include the move flag bit, it is determined that the data processing instruction indicates that a data move operation is to be performed.

In implementations, existing data load (ILD) instructions may be modified to add a move flag in the control field of the ILD instruction. By adding a move flag bit, the ILD instruction is characterized to indicate that a data move operation is performed.

For example, a 1-bit move flag bit is added to the control field of the existing ILD instruction. When the control field of the ILD instruction is detected to include the move flag bit, it can be determined that the ILD instruction is used to perform the data move operation.

In a specific implementation, in the existing ILD instruction, a source vertex cache location, a destination vertex cache location, and data information of data to be moved are further added.

Reference is made to the following instructions:

An improved ILD instruction may be: ILD.BP ocp[a][b].cd, [attr_num], icp[e][f];

or may be: ILD.BP.attr_num ocp[a][b].cd, icp[e][f];

or may be: ILD.BP ocp[a][b].cd, [vertex_num], [attr_num], icp[e][f];

or may be: ILD.BP.vertex_num.attr_num ocp[a][b].cd, icp[e][f];

or may be: ILD.BP.attr_num ocp[a][b].cd, [vertex_num] icp[e][f];

or may be: ILD.BP.vertex_num ocp[a][b].cd, [attr_num], icp[e][f].

The meaning of the modified instruction may be correspondingly referred to the first instruction, the second instruction, and the third instruction.

For example, one modified ILD instruction is: ILD.BP ocp[0][0].xy, [attr_num], icp[9][0]. In this instruction, "BP" is the move flag bit. With attr_num equal to 5, the instruction means: starting from the source vertex cache location (the location of the 0th attribute of the vertex with ID 9 in the source vertex cache), the data corresponding to 5 attributes is moved to the destination vertex cache location (the location of the 0th attribute of the vertex with ID 0 in the destination vertex cache). Here x is the 1st data element and y is the 2nd data element of the data corresponding to each attribute.

As another implementation, whether the ILD instruction indicates to perform a data move operation may be characterized by setting different values of the move flag bit. When the value of the moving flag bit is a first value, the ILD instruction is characterized to indicate to execute data moving operation; when the value of the move flag bit is the second value, the ILD instruction is characterized by not indicating to perform the data move operation.

For example, the move flag bit is 1 bit long. When the value of the move flag bit in the control field of the ILD instruction is detected to be 1, it is determined that the ILD instruction indicates a data move operation; when the value of the move flag bit in the control field of the ILD instruction is detected to be 0, it is determined that the ILD instruction is used for a normal data loading operation.
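The 1-bit flag check can be sketched as a simple bit test. In the following Python sketch the bit position `MOVE_FLAG_BIT` and both function names are hypothetical; the embodiment does not specify where in the control field the flag is placed.

```python
MOVE_FLAG_BIT = 0  # hypothetical bit position within the control field


def indicates_move(control_field):
    """Return True when the 1-bit move flag is 1 (data move), False when 0 (normal load)."""
    return ((control_field >> MOVE_FLAG_BIT) & 1) == 1


def with_move_flag(control_field):
    """Return the control field with the move flag set to 1."""
    return control_field | (1 << MOVE_FLAG_BIT)


normal_load = 0b0110                     # sample control field with the flag cleared
move_load = with_move_flag(normal_load)  # same control field with the flag set
```

With this encoding a decoder can route an ILD instruction to the move path or the normal load path by testing a single bit.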

Step 103, the data to be moved is moved from the source vertex cache location to the destination vertex cache location.

In a specific implementation, the data loading module may sequentially read each attribute from the source vertex cache based on the source vertex cache location, and write the read attribute data into the destination cache indicated by the destination vertex cache location.

In particular, one attribute may comprise one or at least two data elements. When each attribute is read in turn, one data element in the attribute can be read each time, and the data element is written into a target vertex cache; or, each time all the data elements corresponding to the attributes are read, writing all the data elements corresponding to the attributes into the destination vertex cache.
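The two read granularities just described can be compared with a small model. The following Python sketch is illustrative only: the function names are hypothetical, and the returned access count is simply a tally of read/write pairs in this model, not a measured hardware figure.

```python
def move_per_element(src_attr):
    """Copy one data element per read/write pair; return (copy, pair_count)."""
    dst_attr, pairs = [], 0
    for element in src_attr:
        dst_attr.append(element)  # read one element, write one element
        pairs += 1
    return dst_attr, pairs


def move_per_attribute(src_attr):
    """Copy all data elements of the attribute in a single read/write pair."""
    return list(src_attr), 1


color = [0.25, 0.5, 0.75, 1.0]  # sample R, G, B, A data elements
per_element = move_per_element(color)
per_attribute = move_per_attribute(color)
```

Both variants produce the same destination contents; they differ only in how many read/write pairs the model counts for one attribute.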

In a specific implementation, if the number of vertices corresponding to the data to be moved is 1, after the movement of the attribute data corresponding to the vertices is completed, the above operation flow is ended.

If the number of the vertexes corresponding to the data to be moved is two or more, after the movement of the attribute data corresponding to one vertex is completed, updating the identification of the vertex, and continuing to move the attribute data corresponding to the next vertex. The specific process of moving the attribute data corresponding to the next vertex may correspond to the step 103, and will not be described herein. If the moving operation of the attribute data corresponding to all the vertexes is completed, the above operation flow can be ended.

In a specific implementation, the vertex identifier may be updated by incrementing it by 1, so that the move operation continues with the attribute data corresponding to the next vertex.

In summary, in the embodiment of the present invention, the data to be moved is moved from the source vertex cache to the destination vertex cache as indicated by the data processing instruction, so that a single data processing instruction completes the move. This improves the execution efficiency of the graphics processor and reduces its power consumption.

Referring to fig. 2, another data processing method of a graphics processor according to an embodiment of the present invention is shown, and detailed description is given below through specific steps.

Step 201, a data processing instruction is received.

Step 202, determining whether the data processing instruction indicates a data move operation.

In a specific implementation, if the data processing instruction indicates a data move operation, step 203 is executed; if the data processing instruction does not indicate a data move operation, then step 208 is performed.

Step 203, obtaining data movement information corresponding to the data processing instruction.

In a specific implementation, the data movement information may include: a source vertex cache location, a destination vertex cache location, and data information of the data to be moved. The data information of the data to be moved may include: the number of vertices corresponding to the data to be moved, the number of attributes corresponding to each vertex, and the number of data elements corresponding to each attribute.

Step 204, the data elements of the attributes are read from the source vertex cache.

In implementations, data elements of an attribute are read from a source vertex cache based on a source vertex cache location.

Step 205, the read data elements are written to the destination vertex cache.

In implementations, the read data elements are written to the destination vertex cache based on the destination vertex cache location.

Through steps 204 to 205, the read and write operations of all data elements of one attribute are completed.

After the read and write operations of one attribute are completed (i.e., after steps 204 to 205), the attribute ID corresponding to the source vertex cache is incremented by 1, and the attribute ID corresponding to the destination vertex cache is incremented by 1.

Step 206, steps 204 to 205 are executed attr_num times.

In an implementation, the vertex data of one vertex comprises attr_num attributes, so attr_num attribute read and write operations are needed. Executing steps 204 to 205 attr_num times therefore moves the vertex data of one vertex.

Step 207, steps 204 to 206 are executed vertex_num times.

In a specific implementation, if the data to be moved includes vertex_num vertices, the vertex data of each vertex may be moved in the manner of steps 204 to 206. Before steps 204 to 206 are repeated for the next vertex, the vertex identifiers corresponding to the source vertex cache location and the destination vertex cache location are each incremented by 1.

Step 208, the operations indicated by the data processing instructions are performed.

In particular implementations, the operations indicated by the data processing instructions may be performed if the data processing instructions do not indicate data movement operations.
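The flow of steps 202 to 208 can be summarized as a short behavioral model. The following Python sketch represents a decoded instruction as a plain dictionary and each vertex cache as a dictionary keyed by (vertex, attribute); these representations, the function name `execute`, and the choice to restart the attribute ID for each vertex are assumptions made for illustration.

```python
def execute(instr, src_cache, dst_cache):
    """Walk steps 202-208 for one decoded instruction (a plain dict in this model)."""
    if not instr.get("move"):                  # step 202: is it a data move?
        return "normal"                        # step 208: placeholder for the normal operation
    src_vertex, src_attr0 = instr["src"]       # step 203: data movement information
    dst_vertex, dst_attr0 = instr["dst"]
    for _vertex in range(instr["vertex_num"]):        # step 207
        src_attr, dst_attr = src_attr0, dst_attr0     # assumed: restart attribute IDs per vertex
        for _attr in range(instr["attr_num"]):        # step 206
            elements = src_cache[(src_vertex, src_attr)]          # step 204: read
            dst_cache[(dst_vertex, dst_attr)] = list(elements)    # step 205: write
            src_attr += 1                      # attribute IDs are incremented by 1
            dst_attr += 1
        src_vertex += 1                        # vertex IDs are incremented by 1
        dst_vertex += 1
    return "moved"


# Sample source cache: 2 vertices, 3 attributes each, 3 data elements per attribute.
src = {(v, a): [v, a, v + a] for v in range(2) for a in range(3)}
dst = {}
result = execute(
    {"move": True, "src": (0, 0), "dst": (4, 0), "vertex_num": 2, "attr_num": 3},
    src,
    dst,
)
```

In this model the six source attributes end up under destination vertices 4 and 5, and an instruction without the move indication falls through to the step 208 branch.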

Referring to FIG. 3, a graphics processor 30 in an embodiment of the invention is shown, comprising: an instruction receiving unit 301, an information acquiring unit 302, and an executing unit 303, wherein:

an instruction receiving unit 301 for receiving a data processing instruction;

an information obtaining unit 302, configured to determine that the data processing instruction indicates to perform a data movement operation, and obtain data movement information corresponding to the data processing instruction, where the data movement information includes: source vertex cache location, destination vertex cache location, and data information of data to be moved;

and an execution unit 303, configured to move the data to be moved from the source vertex cache location to the destination vertex cache location.

In a specific implementation, the specific execution process of the instruction receiving unit 301, the information obtaining unit 302, and the executing unit 303 may refer to steps 101 to 103, which are not described herein.

Those of ordinary skill in the art will appreciate that all or a portion of the steps in the various methods of the above embodiments may be implemented by a program that instructs related hardware, the program may be stored on a computer readable storage medium, and the storage medium may include: ROM, RAM, magnetic or optical disks, etc.

Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be made by one skilled in the art without departing from the spirit and scope of the invention, and the scope of protection of the invention shall therefore be defined by the appended claims.

Claims

1. A data processing method of a graphic processor, comprising:

receiving a data processing instruction;

in response to the data processing instruction indicating that a data move operation is to be performed, acquiring data movement information corresponding to the data processing instruction, wherein the data movement information comprises: a source vertex cache location, a destination vertex cache location, and data information of data to be moved;

and moving the data to be moved from the source vertex cache location to the destination vertex cache location.

2. The data processing method of a graphic processor according to claim 1, wherein,

if the data processing instruction is detected to include a move flag bit, it is determined that the data processing instruction indicates that a data move operation is to be performed.

3. The data processing method of a graphic processor as claimed in claim 1, wherein the data information of the data to be moved includes: the number of the attributes corresponding to the data to be moved and the number of the data elements corresponding to each attribute.

4. The data processing method of a graphic processor as claimed in claim 1, wherein the data information of the data to be moved includes: the number of vertexes corresponding to the data to be moved, the number of attributes corresponding to each vertex and the number of data elements corresponding to each attribute.

5. The method for processing data of a graphics processor as claimed in claim 3 or 4, wherein said moving said data to be moved from said source vertex cache location to said destination vertex cache location comprises:

reading the data elements of each attribute from the source vertex cache location in turn;

and writing the read data element into the destination vertex cache position.

6. The data processing method of a graphic processor as claimed in claim 5, further comprising:

after the data elements of all the attributes corresponding to one vertex are moved, carrying out data movement operation on the attribute data corresponding to the next vertex.

7. The data processing method of a graphic processor according to claim 3 or 4, wherein the attribute number is obtained by:

determining a register indicated by the data processing instruction;

and determining the number of the attributes based on the numerical value stored in the register.

8. The data processing method of a graphic processor according to claim 3 or 4, wherein the attribute number is obtained by:

acquiring an immediate in the data processing instruction;

and determining the number of the attributes based on the immediate.

9. A graphics processor, comprising:

an instruction receiving unit configured to receive a data processing instruction;

an information acquisition unit configured to determine that the data processing instruction indicates that a data move operation is to be performed, and to acquire data movement information corresponding to the data processing instruction, wherein the data movement information comprises: a source vertex cache location, a destination vertex cache location, and data information of data to be moved;

and an execution unit configured to move the data to be moved from the source vertex cache location to the destination vertex cache location.

10. A computer readable storage medium, the computer readable storage medium being a non-volatile storage medium or a non-transitory storage medium, having a computer program stored thereon, wherein the computer program when executed by a processor performs the steps of the data processing method of a graphics processor according to any of claims 1 to 8.