Detailed Description
In the prior art, a typical graphics processor pipeline is generally composed of the following processing stages: Input Assembler (IA), Vertex Shader (VS), Hull Shader (HS), Tessellator, Domain Shader (DS), Geometry Shader (GS), Rasterizer, Pixel Shader (PS), and Output Merger (OM).
In the geometry shader and the preceding processing stages, the geometry information (also called vertex data) of the graphics is processed. The vertex data includes various attributes of the vertices, such as color attributes and position attributes. After processing in a shader is completed, the data corresponding to the vertex is written into the vertex cache (vertex buffer) corresponding to that shader. When the shader in the next stage needs the vertex data processed by the shader in the previous stage, it reads the vertex data from the vertex cache corresponding to the previous-stage shader into registers and, after processing is finished, writes the vertex data from the registers into the vertex cache corresponding to the current-stage shader.
In the prior art, the vertex data moving process described above can be illustrated by the following instructions:
ILD r0, icp[0][0].x // read data element 0 of attribute 0 of input control vertex 0 into register r0
EMIT r0, ocp[0][0].x // write the data in r0 into the destination vertex cache corresponding to the shader in the current stage
ILD r0, icp[0][0].y // read data element 1 of attribute 0 of input control vertex 0 into register r0
EMIT r0, ocp[0][0].y // write the data in r0 into the destination vertex cache corresponding to the shader in the current stage
ILD r0, icp[0][n].x // read data element 0 of attribute n of input control vertex 0 into register r0
EMIT r0, ocp[0][n].x // write the data in r0 into the destination vertex cache corresponding to the shader in the current stage
As the above instructions show, one read instruction and one write instruction are required for every data element of every attribute in the vertex data. The more attributes the vertex data contains, the more instructions are needed to transfer it, so the power consumption of the graphics processor is higher and its execution efficiency is lower.
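The growth in instruction count can be made concrete with a small sketch. It assumes, purely for illustration, that every attribute has the same number of data elements; the function name and parameters are hypothetical and do not come from the original text.

```python
def prior_art_instruction_count(attr_num: int, elems_per_attr: int) -> int:
    """Count the ILD/EMIT instructions needed to pass one vertex through unchanged.

    Assumption: every attribute holds the same number of data elements.
    """
    count = 0
    for _attr in range(attr_num):
        for _elem in range(elems_per_attr):
            count += 1  # ILD  r0, icp[0][attr].elem
            count += 1  # EMIT r0, ocp[0][attr].elem
    return count


# A vertex with 5 attributes of 4 data elements each needs 2 * 5 * 4 instructions.
print(prior_art_instruction_count(attr_num=5, elems_per_attr=4))  # prints 40
```

Under these assumptions the count grows linearly with both the number of attributes and the number of data elements per attribute.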
In the embodiment of the invention, the data to be moved is moved from the source vertex cache to the destination vertex cache as indicated by a data processing instruction, so that a single data processing instruction completes the movement of the data to be moved. This improves the execution efficiency of the graphics processor and reduces its power consumption.
In order to make the above objects, features and advantages of the present invention more comprehensible, embodiments accompanied with figures are described in detail below.
The embodiment of the invention provides a data processing method for a graphics processor, which is described in detail below through specific steps with reference to FIG. 1.
In a specific implementation, the data processing method described below may be performed by a data loading (ILD) module in the graphics processor or by another module in the graphics processor. The following embodiments take the data loading module as an example.
Step 101, a data processing instruction is received.
In particular implementations, the data loading module receives a data processing instruction and may determine whether the data processing instruction indicates a data move (bypass) operation. If it is determined that the data processing instruction indicates to perform a data move operation, step 102 may be performed.
In the embodiment of the invention, the data move operation means that the shader in the current stage does not process the vertex data and passes it directly to the next stage.
Step 102, in response to the data processing instruction indicating that a data move operation is to be executed, data movement information corresponding to the data processing instruction is obtained.
In an embodiment of the present invention, the data movement information corresponding to the data processing instruction may include: source vertex cache location, destination vertex cache location, and data information for the data to be moved. The data information of the data to be moved may be used to characterize the data amount of the data to be moved.
In a specific implementation, the data to be moved may be vertex data. Vertex data may be stored in a vertex cache (vertex buffer). The source vertex cache may refer to a vertex cache corresponding to a shader in a previous stage; the destination vertex cache may refer to a vertex cache corresponding to a shader in the current stage.
For example, the source vertex cache is the vertex cache corresponding to the Vertex Shader (VS), and the destination vertex cache is the vertex cache corresponding to the Hull Shader (HS).
In an implementation, if the data to be moved is vertex data of one vertex, the data information of the data to be moved may include: the number of attributes corresponding to the data to be moved and the number of data elements corresponding to each attribute.
Alternatively, if the data to be moved corresponds to one vertex, the data information of the data to be moved may include: the number of vertices corresponding to the data to be moved, the number of attributes corresponding to each vertex, and the number of data elements corresponding to each attribute. In this scenario, the number of vertices corresponding to the data to be moved is 1.
If the data to be moved corresponds to at least two vertices, the data information of the data to be moved may include: the number of vertices corresponding to the data to be moved, the number of attributes corresponding to each vertex, and the number of data elements corresponding to each attribute.
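The data movement information described above can be pictured as a small record. The following is a minimal sketch only: the class and field names are hypothetical, and it assumes that all attributes have the same number of data elements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheLocation:
    """A position in a vertex cache (hypothetical representation)."""
    vertex_id: int  # identifier of the vertex
    attr_id: int    # index of the first attribute at this location


@dataclass(frozen=True)
class MoveInfo:
    """Data movement information carried by a data move instruction."""
    src: CacheLocation      # source vertex cache location
    dst: CacheLocation      # destination vertex cache location
    attr_num: int           # number of attributes per vertex
    elems_per_attr: int     # data elements per attribute (assumed uniform)
    vertex_num: int = 1     # defaults to the single-vertex case

    def total_elements(self) -> int:
        """Amount of data to be moved, counted in data elements."""
        return self.vertex_num * self.attr_num * self.elems_per_attr


info = MoveInfo(CacheLocation(9, 0), CacheLocation(0, 0), attr_num=5, elems_per_attr=2)
print(info.total_elements())  # prints 10
```

Defaulting vertex_num to 1 mirrors the observation that the single-vertex case is the multi-vertex case with a vertex count of 1.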
In practice, each vertex may include several attributes, for example color, position, texture coordinates, and normals. One attribute may include one or more data elements. For example, a color attribute includes four data elements: R (red), G (green), B (blue), and A (alpha, i.e. transparency). As another example, a position attribute includes an x-coordinate, a y-coordinate, a z-coordinate, and a w-coordinate.
In the embodiment of the invention, after receiving the data processing instruction, the data loading module can obtain the instruction type corresponding to the data processing instruction. If the instruction type is a data move instruction, it is determined that the data processing instruction indicates that a data move operation is to be executed.
In a specific implementation, a newly defined data processing instruction containing a move flag bit may be provided. After receiving the data processing instruction, the data loading module can determine that the instruction indicates a data move operation if the move flag bit is present in the instruction.
Specifically, a newly defined data processing instruction (hereinafter referred to as a first instruction) may be:
ILD_BP ocp[a][b].cd, [attr_num], icp[e][f];
In the first instruction, ocp[a][b] indicates the destination vertex cache location, where a is the vertex identifier of the output three-dimensional graphic and b denotes the b-th attribute of vertex a; icp[e][f] indicates the source vertex cache location, where e is the vertex identifier of the input three-dimensional graphic and f denotes the f-th attribute of vertex e; attr_num is the number of attributes corresponding to the data to be moved; and cd denotes the data elements in the attribute. The flag "BP" in the first instruction may represent the move flag bit.
The meaning of the first instruction is: starting from the source vertex cache location indicated by icp[e][f] (i.e., the location in the source vertex cache of the data of the f-th attribute of the vertex with identifier e), the data corresponding to attr_num attributes is moved to the destination vertex cache location indicated by ocp[a][b] (i.e., the location in the destination vertex cache of the b-th attribute of the vertex with identifier a).
Illustratively, in a specific application, the first instruction may be the following example:
ILD_BP ocp[0][0].xy, [5], icp[9][0];
The meaning of this instruction is: starting from the source vertex cache location (the location of attribute 0 of the vertex with ID 9 in the source vertex cache), the data corresponding to 5 attributes is moved to the destination vertex cache location (the location of attribute 0 of the vertex with ID 0 in the destination vertex cache), where x is the 1st data element and y is the 2nd data element in the data corresponding to each attribute.
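To illustrate how the fields of the example instruction map to data movement information, the following sketch decodes its textual form. It is an illustration only: real hardware decodes a binary encoding that the text does not specify, the regular expression and dictionary keys are hypothetical, and the bracketed count is read as a literal number, as in the example above.

```python
import re

# Matches only the textual form "ILD_BP ocp[a][b].cd, [n], icp[e][f];"
# with numeric indices and data elements drawn from x, y, z, w (an assumption).
_FIRST_FORM = re.compile(
    r"ILD_BP\s+ocp\[(\d+)\]\[(\d+)\]\.([xyzw]+)\s*,\s*"
    r"\[(\d+)\]\s*,\s*icp\[(\d+)\]\[(\d+)\]\s*;"
)


def parse_first_instruction(text: str) -> dict:
    """Split the textual instruction into its fields (hypothetical helper)."""
    match = _FIRST_FORM.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a recognised ILD_BP form: {text!r}")
    a, b, elems, attr_num, e, f = match.groups()
    return {
        "dst_vertex": int(a),
        "dst_attr": int(b),
        "elements": elems,
        "attr_num": int(attr_num),
        "src_vertex": int(e),
        "src_attr": int(f),
    }


fields = parse_first_instruction("ILD_BP ocp[0][0].xy, [5], icp[9][0];")
print(fields["src_vertex"], fields["attr_num"], fields["elements"])  # prints 9 5 xy
```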
In the above embodiment, the number of attributes may be stored in a predetermined register. In the first instruction, a predetermined register may be further indicated, and the number of attributes to be moved may be determined by reading the stored value from the predetermined register.
In an implementation, a newly defined data processing instruction (hereinafter referred to as a second instruction) may also be:
ILD_BP attr_num, ocp[a][b].cd, icp[e][f];
The meaning of the second instruction is the same as that of the first instruction. The main difference is that in the second instruction the number of attributes is given as an immediate value in the instruction, whereas in the first instruction the number of attributes is stored in a predetermined register.
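The distinction between the register form and the immediate form can be sketched as follows. The operand representation (an int for an immediate, a register name for a register operand) and the register name "r7" are assumptions made for illustration.

```python
from typing import Mapping, Union


def resolve_count(operand: Union[int, str], registers: Mapping[str, int]) -> int:
    """Return the attribute count named by an operand.

    An int models an immediate encoded in the instruction itself;
    a str models the name of a predetermined register holding the count.
    """
    if isinstance(operand, int):
        return operand          # second instruction: immediate value
    return registers[operand]   # first instruction: read the stored value


register_file = {"r7": 5}  # hypothetical register contents
print(resolve_count("r7", register_file))  # register form, prints 5
print(resolve_count(5, register_file))     # immediate form, prints 5
```

Both forms yield the same attribute count; they differ only in where the count is stored.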
In an implementation, a newly defined data processing instruction (hereinafter referred to as a third instruction) may also be:
ILD_BP ocp[a][b].cd,[vertex_num],[attr_num], icp[e][f];
In this instruction, the attribute number attr_num and the vertex number vertex_num may be stored in different predetermined registers. When the value of vertex_num is 1, the instruction has the same meaning as the first instruction or the second instruction.
Alternatively, the third instruction may be:
ILD_BP.vertex_num.attr_num ocp[a][b].cd, icp[e][f];
In this instruction, the attribute number attr_num and the vertex number vertex_num are two separate immediate values in the third instruction. When the value of vertex_num is 1, the instruction has the same meaning as the first instruction or the second instruction.
Alternatively, the third instruction may be:
ILD_BP.attr_num ocp[a][b].cd, [vertex_num], icp[e][f];
In this instruction, the attribute number attr_num is an immediate value in the third instruction, and the vertex number vertex_num is stored in a predetermined register. When the value of vertex_num is 1, the instruction has the same meaning as the first instruction or the second instruction.
Alternatively, the third instruction may be:
ILD_BP.vertex_num ocp[a][b].cd, [attr_num], icp[e][f];
In this instruction, the attribute number attr_num is stored in a predetermined register, and the vertex number vertex_num is an immediate value in the third instruction. When the value of vertex_num is 1, the instruction has the same meaning as the first instruction or the second instruction.
The meaning of the third instruction is: for one vertex, starting from the source vertex cache location indicated by icp[e][f] (i.e., the location in the source vertex cache of the data of the f-th attribute of the vertex with identifier e), the data corresponding to attr_num attributes is moved to the destination vertex cache location indicated by ocp[a][b] (i.e., the location in the destination vertex cache of the b-th attribute of the vertex with identifier a). This step is repeated vertex_num times, so that the vertex data of vertex_num vertices is moved from the source vertex cache location to the destination vertex cache location.
For every form of the third instruction, when the instruction covers the vertex data of a plurality of vertices, the move operation continues with the vertex data of the next vertex after the vertex data of one vertex has been moved, until all the data to be moved has been moved.
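The relationship between the single-vertex forms and the third instruction can be sketched as below. The caches are modeled as nested Python lists indexed as cache[vertex][attribute], which is an assumption made for illustration, as are the function names. The sketch also assumes that each vertex restarts at attribute f in the source and at attribute b in the destination.

```python
def move_attrs(src, dst, e, f, a, b, attr_num):
    """First/second instruction: move attr_num attributes of a single vertex."""
    for k in range(attr_num):
        dst[a][b + k] = list(src[e][f + k])


def move_vertices(src, dst, e, f, a, b, attr_num, vertex_num):
    """Third instruction: repeat the single-vertex move for vertex_num vertices."""
    for v in range(vertex_num):
        # The vertex identifier advances by 1 for each further vertex.
        move_attrs(src, dst, e + v, f, a + v, b, attr_num)


# Four source vertices with three one-element attributes each.
src = [[[10 * v + k] for k in range(3)] for v in range(4)]
dst = [[None] * 3 for _ in range(4)]
move_vertices(src, dst, e=1, f=0, a=0, b=0, attr_num=3, vertex_num=2)
print(dst[0], dst[1])  # prints [[10], [11], [12]] [[20], [21], [22]]
```

With vertex_num equal to 1, move_vertices performs exactly one call to move_attrs, which corresponds to the statement that the third instruction then has the same meaning as the first or second instruction.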
In the embodiment of the invention, after receiving the data processing instruction, the data loading module can also detect whether a move flag bit is present in the data processing instruction. If the data processing instruction is detected to include the move flag bit, it is determined that the data processing instruction indicates that a data move operation is to be executed.
In a specific implementation, the existing data load (ILD) instruction may be modified by adding a move flag bit to the control field of the ILD instruction. The added move flag bit marks the ILD instruction as indicating a data move operation.
For example, a 1-bit move flag bit is added to the control field of the existing ILD instruction. When the control field of an ILD instruction is detected to include the move flag bit, it can be determined that the ILD instruction is used to execute a data move operation.
In a specific implementation, the source vertex cache location, the destination vertex cache location, and the data information of the data to be moved are also added to the existing ILD instruction.
Reference is made to the following instructions:
an improved ILD instruction may be: ILD.BP ocp[a][b].cd, [attr_num], icp[e][f];
or may be: ILD.BP.attr_num ocp[a][b].cd, icp[e][f];
or may be: ILD.BP ocp[a][b].cd, [vertex_num], [attr_num], icp[e][f];
or may be: ILD.BP.vertex_num.attr_num ocp[a][b].cd, icp[e][f];
or may be: ILD.BP.attr_num ocp[a][b].cd, [vertex_num], icp[e][f];
or may be: ILD.BP.vertex_num ocp[a][b].cd, [attr_num], icp[e][f].
The meaning of the modified instruction may be correspondingly referred to the first instruction, the second instruction, and the third instruction.
For example, one modified ILD instruction is: ILD.BP ocp[0][0].xy, [attr_num], icp[9][0], where the value of attr_num is 5. In this instruction, "BP" is the move flag bit. The meaning of the instruction is: starting from the source vertex cache location (the location of attribute 0 of the vertex with ID 9 in the source vertex cache), the data corresponding to 5 attributes is moved to the destination vertex cache location (the location of attribute 0 of the vertex with ID 0 in the destination vertex cache), where x is the 1st data element and y is the 2nd data element in the data corresponding to each attribute.
As another implementation, whether the ILD instruction indicates a data move operation may be expressed by the value of the move flag bit. When the value of the move flag bit is a first value, the ILD instruction indicates that a data move operation is to be executed; when the value of the move flag bit is a second value, the ILD instruction does not indicate a data move operation.
For example, the length of the move flag bit is 1 bit. When the value of the move flag bit in the control field of the ILD instruction is detected to be 1, it is determined that the ILD instruction indicates a data move operation; when the value of the move flag bit in the control field of the ILD instruction is detected to be 0, it is determined that the ILD instruction is used for a normal data load operation.
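The 1-bit check described above can be sketched with a simple mask. The position of the move flag bit inside the control field is not given in the text, so bit 0 is used here purely as an assumption, and the names are hypothetical.

```python
MOVE_FLAG_BIT = 0  # hypothetical position of the move flag bit in the control field


def indicates_move(control_field: int) -> bool:
    """Return True when the move flag bit of the control field is 1."""
    return (control_field >> MOVE_FLAG_BIT) & 1 == 1


print(indicates_move(0b0001))  # flag set: data move operation, prints True
print(indicates_move(0b0110))  # flag clear: normal data load, prints False
```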
Step 103, the data to be moved is moved from the source vertex cache location to the destination vertex cache location.
In a specific implementation, the data loading module may read each attribute in turn from the source vertex cache based on the source vertex cache location, and write the read attribute data into the destination vertex cache indicated by the destination vertex cache location.
In particular, one attribute may comprise one data element or at least two data elements. When the attributes are read in turn, one data element of an attribute may be read at a time and written into the destination vertex cache; alternatively, all the data elements of an attribute may be read at once and then all written into the destination vertex cache.
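The two granularities described above produce the same destination contents and differ only in how many transfers are issued. The sketch below models an attribute as a Python list; the returned transfer count is a hypothetical metric used only to show the difference.

```python
def copy_attr_per_element(src_attr, dst_attr):
    """Read one data element at a time and write it to the destination."""
    transfers = 0
    for i, value in enumerate(src_attr):
        dst_attr[i] = value
        transfers += 1
    return transfers


def copy_attr_whole(src_attr, dst_attr):
    """Read all data elements of the attribute, then write them together."""
    dst_attr[:] = src_attr
    return 1


color = [0.25, 0.5, 0.75, 1.0]  # R, G, B, A
out_a = [None] * 4
out_b = [None] * 4
print(copy_attr_per_element(color, out_a), copy_attr_whole(color, out_b))  # prints 4 1
print(out_a == out_b)  # prints True
```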
In a specific implementation, if the number of vertices corresponding to the data to be moved is 1, the above operation flow ends after the attribute data corresponding to that vertex has been moved.
If the number of vertices corresponding to the data to be moved is two or more, the vertex identifier is updated after the attribute data corresponding to one vertex has been moved, and the attribute data corresponding to the next vertex is then moved. The process of moving the attribute data of the next vertex corresponds to step 103 and is not repeated here. Once the attribute data corresponding to all the vertices has been moved, the above operation flow ends.
In a specific implementation, the vertex identifier may be updated by incrementing it by 1, so that the move operation continues with the attribute data of the next vertex.
In summary, in the embodiment of the invention, the data to be moved is moved from the source vertex cache to the destination vertex cache as indicated by the data processing instruction, so that a single data processing instruction completes the movement of the data to be moved. This improves the execution efficiency of the graphics processor and reduces its power consumption.
Referring to FIG. 2, another data processing method of a graphics processor according to an embodiment of the invention is shown and is described in detail below through specific steps.
Step 201, a data processing instruction is received.
Step 202, determining whether the data processing instruction indicates a data move operation.
In a specific implementation, if the data processing instruction indicates a data move operation, step 203 is executed; if the data processing instruction does not indicate a data move operation, then step 208 is performed.
Step 203, obtaining data movement information corresponding to the data processing instruction.
In a specific implementation, the data movement information may include: the source vertex cache location, the destination vertex cache location, and the data information of the data to be moved. The data information of the data to be moved may include: the number of vertices corresponding to the data to be moved, the number of attributes corresponding to each vertex, and the number of data elements corresponding to each attribute.
Step 204, the data elements of the attributes are read from the source vertex cache.
In implementations, data elements of an attribute are read from a source vertex cache based on a source vertex cache location.
Step 205, the read data elements are written to the destination vertex cache.
In implementations, the read data elements are written to the destination vertex cache based on the destination vertex cache location.
Through steps 204 to 205, the read and write operations of all data elements of one attribute are completed.
After the read and write operations of one attribute are completed (i.e., after steps 204 to 205), the attribute ID corresponding to the source vertex cache is incremented by 1, and the attribute ID corresponding to the destination vertex cache is incremented by 1.
Step 206, steps 204 to 205 are executed attr_num times.
In a specific implementation, the vertex data of one vertex comprises attr_num attributes, so attr_num attribute read and write operations need to be performed. Executing steps 204 to 205 attr_num times therefore moves the vertex data of one vertex.
Step 207, steps 204 to 206 are executed vertex_num times.
In a specific implementation, if the data to be moved includes vertex_num vertices, the vertex data of each vertex may be moved in the manner of steps 204 to 206. Before the vertex data of the next vertex is moved, the vertex identifiers corresponding to the source vertex cache location and the destination vertex cache location are each incremented by 1.
Step 208, the operations indicated by the data processing instructions are performed.
In particular implementations, the operations indicated by the data processing instructions may be performed if the data processing instructions do not indicate data movement operations.
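The flow of steps 201 to 208 can be sketched end to end as follows. This is an illustrative model only: the instruction is represented as a dictionary with hypothetical keys, the caches are nested lists indexed as cache[vertex][attribute], and the fallback callable stands in for whatever operation a non-move instruction indicates. The attribute IDs are assumed to restart from their initial values for each vertex.

```python
def execute(instr, src, dst, fallback):
    """Follow the flow of steps 201 to 208 for one received instruction."""
    # Step 202: does the instruction indicate a data move operation?
    if not instr.get("move", False):
        return fallback(instr)                   # Step 208
    # Step 203: obtain the data movement information.
    src_vertex, src_attr0 = instr["src"]
    dst_vertex, dst_attr0 = instr["dst"]
    for _ in range(instr["vertex_num"]):         # Step 207
        src_attr, dst_attr = src_attr0, dst_attr0
        for _ in range(instr["attr_num"]):       # Step 206
            elements = src[src_vertex][src_attr]         # Step 204: read
            dst[dst_vertex][dst_attr] = list(elements)   # Step 205: write
            src_attr += 1                        # attribute IDs incremented by 1
            dst_attr += 1
        src_vertex += 1                          # vertex identifiers incremented by 1
        dst_vertex += 1
    return "moved"


src = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
dst = [[None, None], [None, None]]
move = {"move": True, "src": (0, 0), "dst": (0, 0), "attr_num": 2, "vertex_num": 2}
print(execute(move, src, dst, fallback=lambda i: "other"))  # prints moved
print(dst == src)  # prints True
```

A single call handles every attribute of every vertex, which is the effect the one-instruction move is intended to achieve.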
Referring to FIG. 3, a graphics processor 30 in an embodiment of the invention is shown, comprising: an instruction receiving unit 301, an information acquiring unit 302, and an executing unit 303, wherein:
an instruction receiving unit 301 for receiving a data processing instruction;
an information obtaining unit 302, configured to determine that the data processing instruction indicates to perform a data movement operation, and obtain data movement information corresponding to the data processing instruction, where the data movement information includes: source vertex cache location, destination vertex cache location, and data information of data to be moved;
and an execution unit 303, configured to move the data to be moved from the source vertex cache location to the destination vertex cache location.
In a specific implementation, the specific execution process of the instruction receiving unit 301, the information obtaining unit 302, and the executing unit 303 may refer to steps 101 to 103, which are not described herein.
Those of ordinary skill in the art will appreciate that all or a portion of the steps in the various methods of the above embodiments may be implemented by a program that instructs related hardware, the program may be stored on a computer readable storage medium, and the storage medium may include: ROM, RAM, magnetic or optical disks, etc.
Although the present invention is disclosed above, it is not limited thereto. Various changes and modifications may be made by one skilled in the art without departing from the spirit and scope of the invention, and the scope of protection of the invention shall therefore be defined by the appended claims.