CN115880134A

CN115880134A - Constant data processing method using vector register, graphic processor and medium

Info

Publication number: CN115880134A
Application number: CN202310072159.3A
Authority: CN
Inventors: 阙恒; 朱康挺; 周义满; 孙鹏
Original assignee: Li Computing Technology Shanghai Co ltd; Nanjing Lisuan Technology Co ltd
Current assignee: Li Computing Technology Shanghai Co ltd; Nanjing Lisuan Technology Co ltd
Priority date: 2023-01-31
Filing date: 2023-01-31
Publication date: 2023-03-31
Anticipated expiration: 2043-01-31

Abstract

A constant data processing method using a vector register, a graphics processor, and a medium are provided. The constant data processing method using the vector register includes: receiving a processing instruction; determining a destination general register corresponding to the processing instruction and a destination channel pointed to by the data processing operation corresponding to the processing instruction, wherein the destination channel is a part of all channels corresponding to the destination general register; and executing the data processing operation corresponding to the processing instruction on the destination channel. This scheme effectively avoids wasting general register resources.

Description

Constant data processing method using vector register, graphic processor and medium

Technical Field

The invention relates to the technical field of graphics processors, and in particular to a constant data processing method using a vector register, a graphics processor, and a medium.

Background

The Graphics Processing Unit (GPU) is widely used in 3D graphics rendering, general-purpose computing, AI acceleration, and similar fields, and supports parallel computing. Inside the GPU, a large number of thread groups (warps) are scheduled for parallel execution. Each warp is a set of data elements bound together; for example, a warp that binds 32 data elements is called warp32 (i.e., the warp includes 32 lanes, also referred to herein as channels). The General Purpose Registers (GPRs) that store the data of all channels of a warp and are read and written by the scheduling execution core may be referred to as vector registers (vector GPRs).

In GPU applications, certain resources may be constant for a given rendered object. That is, a shader resource is a constant (uniform) for all vertices or pixels of the rendered object, such as the background light intensity or texture resource class of the rendered scene. In this case, when a warp executes, the constant-resource data corresponding to the 32 channels is the same. The constant data is therefore stored 32 times in the vector register, which seriously wastes GPR resources.

Disclosure of Invention

The embodiment of the invention solves the technical problem of waste of GPR resources.

To solve the foregoing technical problem, an embodiment of the present invention provides a method for processing constant data using a vector register, including: receiving a processing instruction; determining a destination general register corresponding to the processing instruction and a destination channel pointed to by the data processing operation corresponding to the processing instruction, wherein the destination channel is a part of all channels corresponding to the destination general register; and executing the data processing operation corresponding to the processing instruction on the destination channel.

Optionally, determining the destination channel pointed to by the data processing operation includes: acquiring identification information of the destination channel and an indication bit from the processing instruction; and when the indication bit indicates that the data processing operation is performed only on the destination channel, determining the destination channel according to the identification information of the destination channel.

Optionally, determining the destination channel further includes: when the indication bit does not indicate that the data processing operation is performed only on the destination channel, determining that the destination channel is all channels corresponding to the destination general register.

Optionally, the processing instruction includes a read operation instruction, and executing the data processing operation corresponding to the processing instruction on the destination channel includes: receiving target data in the destination channel returned by the destination general register.

Optionally, receiving the target data in the destination channel returned by the destination general register includes: receiving the target data returned by the destination general register on the interfaces corresponding to all the channels.

Optionally, the processing instruction includes a write operation instruction, and executing the data processing operation corresponding to the processing instruction on the destination channel includes: writing the data to be written corresponding to the processing instruction into the destination channel.

Optionally, the number of the destination channels is 1.

Optionally, the processing instruction includes any one of: a constant data load instruction, an arithmetic logic unit (ALU) instruction.

An embodiment of the present invention further provides a graphics processor, including: a receiving unit, configured to receive a processing instruction; a parsing unit, configured to determine a destination general register corresponding to the processing instruction and a destination channel pointed to by the data processing operation corresponding to the processing instruction, wherein the destination channel is a part of all channels corresponding to the destination general register; and an execution unit, configured to execute the data processing operation corresponding to the processing instruction on the destination channel.

Optionally, the parsing unit is configured to obtain the identification information of the destination channel and the indication bit from the processing instruction; and when the indication bit indicates that the data processing operation is performed only on the destination channel, to determine the destination channel according to the identification information of the destination channel.

Optionally, the parsing unit is further configured to determine that the destination channel is all channels corresponding to the destination general register when the indication bit does not indicate that only the destination channel is subjected to data processing operation.

Optionally, the processing instruction includes a read operation instruction, and the execution unit is configured to receive target data in the destination channel returned by the destination general register.

Optionally, the execution unit is configured to receive the target data returned by the destination general register on the interfaces corresponding to all channels.

Optionally, the processing instruction includes a write operation instruction; and the execution unit is used for writing the data to be written corresponding to the processing instruction into the destination channel.

Optionally, the number of the destination channels is 1.

Optionally, the processing instruction includes any one of: constant data load instruction, arithmetic logic unit instruction.

An embodiment of the present invention further provides a computer-readable storage medium, which is a non-volatile storage medium or a non-transitory storage medium, and on which a computer program is stored, where the computer program, when executed by a processor, performs the steps of any one of the above-mentioned constant data processing methods using vector registers.

Compared with the prior art, the technical scheme of the embodiment of the invention has the following beneficial effects:

According to the received processing instruction, a destination general register corresponding to the processing instruction and a destination channel pointed to by the processing instruction are determined, and the data processing operation corresponding to the processing instruction is executed on the destination channel. Because the destination channel is only a part of all channels corresponding to the destination general register, the data processing operation needs to be performed on only some of the channels rather than on all of them, so that the waste of general register resources can be effectively avoided.

Drawings

FIG. 1 is a flow chart of a data processing method in an embodiment of the invention;

FIG. 2 is a diagram illustrating the storage of constant data in a vector register according to an embodiment of the present invention;

FIG. 3 is a diagram illustrating an operation of a graphics processor according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a graphics processor in an embodiment of the present invention.

Detailed Description

In GPU applications, certain resources may be constant for a given rendered object. That is, a shader resource is a constant (uniform) for all vertices or pixels of the rendered object, such as the background light intensity or texture resource class of the rendered scene. In this case, when a warp executes, the constant-resource data corresponding to the 32 channels is the same.

As another example, a set of constant data (36.0, -50.0, 0.00875, …) is defined internally to the shader to participate in instruction operations as required by the application algorithm. Typically, the compiler uses an immediate move instruction to store the constant data into the vector register, and the constant data is copied to 32 lanes.

FIG. 2 is a schematic diagram of storing constant data in a vector register according to an embodiment of the present invention. In FIG. 2, each of the 32 lanes of vector register r2 stores the constant "7". The constant data is thus stored repeatedly in the vector register, which seriously wastes general register resources.

In the embodiment of the invention, because the destination channel is only a part of all channels corresponding to the destination general register, the data processing operation needs to be performed on only some of the channels according to the processing instruction rather than on all of them, so that the waste of general register resources can be effectively avoided.

In the embodiment of the present invention, the following general-purpose register is a vector register unless otherwise specified.

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.

An embodiment of the present invention provides a data processing method, which is described in detail below with reference to fig. 1 through specific steps.

Step 101, receiving a processing instruction.

In one implementation, the processing instruction may be a constant-related instruction, such as a constant data load instruction (CLD instruction). The processing instruction may also be an Arithmetic Logic Unit (ALU) instruction, such as FADD or MOVIMM.

Those skilled in the art will appreciate that the type of processing instructions may be the same or different for different application scenarios and application requirements.

Step 102, determining a destination general register corresponding to the processing instruction and a destination channel pointed by the data processing operation corresponding to the processing instruction.

In the embodiment of the present invention, the destination channel may be a part of all channels corresponding to the destination general register. For example, the number of all lanes corresponding to the destination general register is 32, and the destination lane is 2 lanes out of the 32 lanes. For another example, the number of all lanes corresponding to the destination general register is 32, and the destination lane is 1 lane of the 32 lanes.

In the embodiment of the present invention, a bit field may be set in the data processing instruction, where the bit field includes an indication bit and identification information of a destination channel. When the indication bit is enabled, it indicates that the data processing operation is performed only on the destination channel. The specific destination channel is determined by the identification information of the destination channel.

For example, in a data processing instruction, the length of the bit field is set to 6 bits with the value 100010, and the destination general register is r4. The 1st bit corresponds to the indicator bit, and the 2nd to 6th bits represent the number of the destination channel. Since the value of the indicator bit is 1 and the number corresponding to 00010 is 2, the destination channel pointed to by the data processing instruction is the 3rd channel in r4 (because the number corresponding to the 1st channel in r4 is 00000).
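The decoding of the 6-bit field in this example can be sketched as follows. This is a minimal illustration, not the disclosed hardware: the function name `decode_lane_field` is hypothetical, and the sketch assumes that the "1st bit" is the most significant bit of the field, as suggested by the way the value 100010 is written.

```python
from typing import Tuple

LANE_ID_BITS = 5  # 5 bits are enough to number 32 lanes (0..31)

def decode_lane_field(field: int) -> Tuple[bool, int]:
    """Split a 6-bit field into (indicator bit, destination lane number).

    Assumption: the indicator is the most significant bit and the lane
    number occupies the remaining five bits.
    """
    if not 0 <= field < (1 << (LANE_ID_BITS + 1)):
        raise ValueError("field must fit in 6 bits")
    indicator = bool(field >> LANE_ID_BITS)          # 1st bit
    lane = field & ((1 << LANE_ID_BITS) - 1)         # 2nd to 6th bits
    return indicator, lane

indicator, lane = decode_lane_field(0b100010)
print(indicator, lane)  # True 2
```

With lanes numbered from 0, lane number 2 is the 3rd channel of r4, which matches the example.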

In the embodiment of the present invention, two bit fields may also be set in the data processing instruction, where one bit field includes an indication bit, and the other bit field includes identification information of the destination channel. When the indication bit is enabled, the data processing operation is only performed on the destination channel. The specific destination channel is determined by the identification information of the destination channel.

For example, in a data processing instruction, the length of a first bit field is set to 1 bit with the value 1; the length of a second bit field is set to 5 bits with the value 00010; and the destination general register is r4. The first bit field corresponds to the indicator bit, and the value of the second bit field represents the number of the destination channel. Since the value of the indicator bit is 1 and the number corresponding to 00010 is 2, the destination channel pointed to by the data processing instruction is the 3rd channel in r4 (because the number corresponding to the 1st channel in r4 is 00000).

In the embodiment of the present invention, if there is no bit field containing the indication bit in the data processing instruction, or the indication bit does not indicate that only the destination channel is to be subjected to the data processing operation, all channels corresponding to the destination general register may be used as the destination channel.

For example, in a data processing instruction, two bit fields are provided, wherein the first bit field includes an indication bit, the second bit field includes identification information of a destination channel, and the destination general register is r4. If the value of the first bit field is 0, this indicates that all 32 channels corresponding to r4 are used as destination channels.

In the embodiment of the invention, when the value of the indicating bit is a first value, the destination channel is only a part of the channels in the destination general register; when the value of the indicating bit is a second value, the destination channel is all channels in the destination general register; the first value is not equal to the second value.

If the length of the indicator bit is 1 bit, the first value may be 1 and the second value 0; alternatively, the first value may be 0 and the second value 1.
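The selection rule described above, including the case where the instruction carries no indication bit and the case where the polarity of the indication bit is reversed, might be modeled as below. The function `resolve_destination_lanes` and its `first_value` parameter are hypothetical names introduced only for this sketch.

```python
from typing import List, Optional

NUM_LANES = 32

def resolve_destination_lanes(indicator: Optional[int], lane_id: int,
                              first_value: int = 1) -> List[int]:
    """Return the lane numbers that the data processing operation targets.

    indicator   -- value of the indication bit, or None if the instruction
                   has no bit field containing an indication bit
    lane_id     -- identification information of the destination lane
    first_value -- indicator value meaning "only the identified lane"
    """
    if indicator is not None and indicator == first_value:
        if not 0 <= lane_id < NUM_LANES:
            raise ValueError("lane_id out of range")
        return [lane_id]
    # No indication bit, or the bit holds the second value: all lanes.
    return list(range(NUM_LANES))

print(resolve_destination_lanes(1, 2))        # [2]
print(len(resolve_destination_lanes(0, 2)))   # 32
```

Passing `first_value=0` models the alternative encoding in which 0 selects a single lane and 1 selects all lanes.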

Step 103, executing the data processing operation corresponding to the processing instruction on the destination channel.

In specific implementation, if the data processing operation corresponding to the processing instruction is a write operation, performing a data write operation on the determined destination channel; correspondingly, if the data processing operation corresponding to the processing instruction is a read operation, the data read operation is performed on the determined destination channel.

Referring to fig. 3, a schematic diagram of an operation process of a graphics processor in an embodiment of the present invention is shown. This is explained below with reference to fig. 3.

In the embodiment of the invention, if the processing instruction is a constant data load instruction, the scheduling execution core parses the instruction when executing it and sends the corresponding information obtained by parsing to the constant data loader. The constant data loader checks the value of the indicator bit in that information. When the value of the indicator bit is the first value, the loader determines the destination channel according to the identification information of the destination channel and writes the constant data only into the destination channel. When the value of the indicator bit is the second value, the loader takes all channels corresponding to the destination general register as destination channels and writes the constant data into all of them, so that every channel receives the same data.

For example, the destination general register r2 corresponds to 32 lanes. When the constant data loader detects that the value of the indicator bit is 1, it determines that the destination channel is the 2nd channel corresponding to r2 and writes the constant data 123 only into that channel, leaving the other 31 channels unwritten.
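The behavior of the constant data loader in this example can be illustrated with a small software model. This is only a sketch under stated assumptions: the vector register is modeled as a Python list, `None` marks a lane that has not been written, the function `load_constant` is a hypothetical name, the first value of the indicator bit is taken to be 1, and lanes are numbered from 0 so that the 2nd channel is lane number 1.

```python
from typing import List, Optional

NUM_LANES = 32

def load_constant(register: List[Optional[int]], value: int,
                  indicator: int, lane: int) -> int:
    """Write `value` into `register` and return the number of lanes written."""
    if indicator == 1:                 # first value: single destination lane
        register[lane] = value
        return 1
    for i in range(len(register)):     # second value: same data in every lane
        register[i] = value
    return len(register)

r2: List[Optional[int]] = [None] * NUM_LANES   # None marks an unwritten lane
written = load_constant(r2, 123, indicator=1, lane=1)
print(written, r2[1], r2.count(None))  # 1 123 31
```

In this model, the 31 unwritten lanes remain free to hold other data, which is the resource saving the embodiment aims at.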

In the embodiment of the present invention, if the processing instruction is an ALU instruction that performs a write operation on a specific channel of the destination general register, the scheduling execution core may parse the ALU instruction and send the value of the indicator bit carried in the ALU instruction and the identification information of the destination channel to the destination general register, so that data is written only into the destination channel.

For example, the destination general register r2 corresponds to 32 lanes. When the destination general register detects that the value of the indicator bit is the first value 1, it determines that the destination channel is the 2nd channel corresponding to r2 and writes the data corresponding to the ALU instruction only into that channel.

In the embodiment of the present invention, if the ALU instruction corresponds to a read operation, the scheduling execution core may parse the ALU instruction, and send a value of an indicator bit carried in the ALU instruction and identification information of a destination channel to the source general register, where the source general register only reads data in the destination channel.

In a specific implementation, if the number of destination channels is 1, after the source general register reads the data from the destination channel, the read data is replicated onto the interfaces of all channels of the source general register and returned to the scheduling execution core so that execution of the warp can continue.

For example, the source general register r2 corresponds to 32 lanes. When the source general register detects that the value of the indicator bit is the first value 1, it determines that the destination channel is the 2nd channel corresponding to r2, reads the data of the 2nd channel, replicates the data onto the interfaces corresponding to the 32 channels, and returns it to the scheduling execution core.
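The read path in this example can be sketched in the same list-based model. The function `read_lanes` is a hypothetical name, the first value of the indicator bit is assumed to be 1, and lanes are assumed to be numbered from 0 (so the 2nd channel is lane number 1).

```python
from typing import List

NUM_LANES = 32

def read_lanes(register: List[int], indicator: int, lane: int) -> List[int]:
    """Return the values presented on the per-lane read interfaces."""
    if indicator == 1:
        # Read one lane and replicate it onto the interfaces of all lanes.
        return [register[lane]] * len(register)
    # Ordinary vector read: each interface carries its own lane's data.
    return list(register)

r2 = [0] * NUM_LANES
r2[1] = 123                       # only the 2nd channel holds the constant
interfaces = read_lanes(r2, indicator=1, lane=1)
print(interfaces[0], interfaces[31])  # 123 123
```

Because every interface carries the same value, the consumer of the read sees the constant exactly as if it had been stored in all 32 lanes.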

The data processing method provided in the above-described embodiment of the present invention is explained below by specific examples.

See the following instructions:

CLD r4.ln7, [0x20], CBV[2]

FADD r12, r4.ln7, r10

In the above instructions, the first instruction, CLD, is a constant data load instruction. Its operation is: from constant buffer description No. 2 (CBV[2]), load the 32 bits of data at 0x20 ([0x20]) into the destination general register r4. The destination general register r4 has 32 storage locations, one per lane, and since the CLD instruction specifies ".ln7", the constant data is written only into lane 7 of r4.

The floating-point addition instruction FADD references r4.ln7 directly as a source operand. That is, all 32 channels of the thread group (warp) executing the FADD instruction use lane 7 of r4 when reading r4, whereas when reading r10, all 32 channels of r10 are read.
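The effect of this CLD and FADD pair can be traced with a small software model. Everything below is an illustrative assumption rather than the disclosed implementation: the helper names `cld` and `fadd`, the contents of the constant buffer (1.5 at 0x20), the per-lane contents of r10, and the reading of ".ln7" as lane number 7 with lanes numbered from 0.

```python
NUM_LANES = 32

# Hypothetical constant buffers: CBV index -> {location: 32-bit float value}.
constant_buffers = {2: {0x20: 1.5}}

# Register file model: each vector register holds one value per lane.
regs = {name: [0.0] * NUM_LANES for name in ("r4", "r10", "r12")}
regs["r10"] = [float(i) for i in range(NUM_LANES)]  # distinct per-lane data

def cld(dst, lane, location, cbv_index):
    """CLD dst.ln<lane>, [location], CBV[cbv_index]: write one lane only."""
    regs[dst][lane] = constant_buffers[cbv_index][location]

def fadd(dst, src_a, src_a_lane, src_b):
    """FADD dst, src_a.ln<lane>, src_b: one lane of src_a feeds every lane."""
    a = regs[src_a][src_a_lane]
    regs[dst] = [a + b for b in regs[src_b]]

cld("r4", 7, 0x20, 2)          # CLD r4.ln7, [0x20], CBV[2]
fadd("r12", "r4", 7, "r10")    # FADD r12, r4.ln7, r10
print(regs["r12"][:3])         # [1.5, 2.5, 3.5]
```

Only lane 7 of r4 is occupied by the constant, yet every lane of r12 receives the sum of that constant and its own element of r10.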

See the following instructions:

MOVIMM r4, 0x1234

MOVIMM r4.ln2, 0x1234

Of the above instructions, the first assigns the constant 0x1234 to all lanes of r4. The second assigns the constant 0x1234 only to lane 2 of r4.

As can be seen from the above, the technical solution in the embodiment of the present invention can write and read constant data on a specific channel of the general register in a targeted manner, which effectively avoids wasting general register resources and improves the performance of the GPU.
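The two operand forms used in these instructions, with and without the lane suffix, could be distinguished by a parser along the following lines. This is a hypothetical sketch of assembler-side handling; the function `parse_operand` and the regular expression are assumptions based only on the operand spellings shown in the examples.

```python
import re
from typing import Optional, Tuple

# Matches "r<number>" with an optional ".ln<number>" lane suffix.
_OPERAND = re.compile(r"r(\d+)(?:\.ln(\d+))?")

def parse_operand(text: str) -> Tuple[int, Optional[int]]:
    """Return (register number, lane number or None for all lanes)."""
    match = _OPERAND.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a register operand: {text!r}")
    reg = int(match.group(1))
    lane = int(match.group(2)) if match.group(2) is not None else None
    return reg, lane

print(parse_operand("r4"))      # (4, None)
print(parse_operand("r4.ln2"))  # (4, 2)
```

A lane of `None` corresponds to the indicator bit holding the second value (all lanes), while an integer lane corresponds to the first value together with the lane's identification information.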

Referring to FIG. 4, an embodiment of the present invention further provides a graphics processor, including: a receiving unit 401, a parsing unit 402, and an execution unit 403, wherein:

a receiving unit 401, configured to receive a processing instruction;

a parsing unit 402, configured to determine a destination general register corresponding to the processing instruction and a destination channel pointed to by the data processing operation corresponding to the processing instruction, where the destination channel is a part of all channels corresponding to the destination general register;

an execution unit 403, configured to execute the data processing operation corresponding to the processing instruction on the destination channel.

In a specific implementation, for the specific execution processes of the receiving unit 401, the parsing unit 402, and the execution unit 403, reference may be made to steps 101 to 103, which are not described herein again.
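The division of work among the three units, following steps 101 to 103, can be outlined in software form. The class `GraphicsProcessorModel`, the `Instruction` record, and the register-file size are hypothetical and serve only to show how the units hand results to one another; the sketch does not describe the actual hardware.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

NUM_LANES = 32

@dataclass
class Instruction:
    op: str                 # "write" or "read"
    reg: int                # general register number
    lane: Optional[int]     # lane number, or None if the indicator bit is unset
    value: int = 0          # data to write (ignored for reads)

class GraphicsProcessorModel:
    def __init__(self, num_regs: int = 16) -> None:
        self.regs = [[0] * NUM_LANES for _ in range(num_regs)]

    def receive(self, instr: Instruction) -> Instruction:
        """Receiving unit 401 (step 101)."""
        return instr

    def parse(self, instr: Instruction) -> Tuple[int, List[int]]:
        """Parsing unit 402 (step 102): register and destination lanes."""
        if instr.lane is not None:
            return instr.reg, [instr.lane]
        return instr.reg, list(range(NUM_LANES))

    def execute(self, instr: Instruction, reg: int,
                lanes: List[int]) -> Optional[List[int]]:
        """Execution unit 403 (step 103)."""
        if instr.op == "write":
            for lane in lanes:
                self.regs[reg][lane] = instr.value
            return None
        if instr.op == "read":
            if len(lanes) == 1:   # replicate one lane onto all interfaces
                return [self.regs[reg][lanes[0]]] * NUM_LANES
            return list(self.regs[reg])
        raise ValueError(f"unknown operation: {instr.op}")

    def run(self, instr: Instruction) -> Optional[List[int]]:
        instr = self.receive(instr)
        reg, lanes = self.parse(instr)
        return self.execute(instr, reg, lanes)

gpu = GraphicsProcessorModel()
gpu.run(Instruction("write", reg=4, lane=2, value=0x1234))
result = gpu.run(Instruction("read", reg=4, lane=2))
```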

An embodiment of the present invention further provides a computer-readable storage medium, which is a non-volatile storage medium or a non-transitory storage medium, and on which a computer program is stored, where the computer program, when executed by a processor, performs the steps of the constant data processing method using the vector register provided in any of the above embodiments.

Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by instructing the relevant hardware by a program, and the program may be stored in a computer-readable storage medium, and the storage medium may include: ROM, RAM, magnetic or optical disks, and the like.

Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A method for processing constant data using vector registers, comprising:

receiving a processing instruction;

determining a destination general register corresponding to the processing instruction and a destination channel pointed to by the data processing operation corresponding to the processing instruction, wherein the destination channel is a part of all channels corresponding to the destination general register;

and executing the data processing operation corresponding to the processing instruction on the destination channel.

2. The method as claimed in claim 1, wherein said determining a destination channel to which a data processing operation corresponding to said processing instruction is directed comprises:

acquiring identification information and an indication bit of the destination channel from the processing instruction;

and when the indication bit indicates that the data processing operation is performed only on the destination channel, determining the destination channel according to the identification information of the destination channel.

3. The constant data processing method using vector registers according to claim 2, wherein said determining a destination channel to which a data processing operation corresponding to said processing instruction is directed further comprises: when the indication bit does not indicate that the data processing operation is performed only on the destination channel, determining that the destination channel is all channels corresponding to the destination general register.

4. The constant data processing method using vector registers according to claim 1, wherein the processing instruction comprises a read operation instruction; the executing the data processing operation corresponding to the processing instruction on the destination channel includes:

receiving target data in the destination channel returned by the destination general register.

5. The constant data processing method using vector registers according to claim 4, wherein said receiving target data in the destination channel returned by the destination general register comprises:

receiving the target data returned by the destination general register on the interfaces corresponding to all the channels.

6. The constant data processing method using vector registers according to claim 1, wherein the processing instruction comprises a write operation instruction; the executing the data processing operation corresponding to the processing instruction on the destination channel includes:

writing the data to be written corresponding to the processing instruction into the destination channel.

7. The constant data processing method using vector registers according to claim 1, wherein the number of destination channels is 1.

8. The constant data processing method using vector registers according to claim 1, wherein the processing instruction includes any one of: constant data load instruction, arithmetic logic unit instruction.

9. A graphics processor, comprising:

a receiving unit, configured to receive a processing instruction;

a parsing unit, configured to determine a destination general register corresponding to the processing instruction and a destination channel pointed to by the data processing operation corresponding to the processing instruction, wherein the destination channel is a part of all channels corresponding to the destination general register;

and an execution unit, configured to execute the data processing operation corresponding to the processing instruction on the destination channel.

10. A computer-readable storage medium, which is a non-volatile storage medium or a non-transitory storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, performs the steps of the constant data processing method using vector registers according to any one of claims 1 to 8.