CN116578343B

CN116578343B - Instruction compiling method and device, graphic processing device, storage medium and terminal equipment

Info

Publication number: CN116578343B
Application number: CN202310840716.1A
Authority: CN
Inventors: 陈林锋
Original assignee: Li Computing Technology Shanghai Co ltd; Nanjing Lisuan Technology Co ltd
Current assignee: Li Computing Technology Shanghai Co ltd
Priority date: 2023-07-10
Filing date: 2023-07-10
Publication date: 2023-11-21
Anticipated expiration: 2043-07-10
Also published as: CN116578343A

Abstract

The application provides an instruction compiling method and device, a graphic processing device, a storage medium and a terminal device, wherein the instruction compiling method comprises the following steps: generating a first instruction, wherein the first instruction comprises a plurality of copying instructions and a plurality of first operation instructions, the copying instructions indicate a first source virtual register and a first destination virtual register, the first destination virtual register indicated in the copying instructions is a plurality of virtual sub-registers in the same virtual vector register respectively, and addresses of the plurality of virtual sub-registers are continuous; and generating a second instruction according to the first instruction, wherein the second instruction comprises a plurality of first operation instructions, destination registers indicated in the plurality of first operation instructions are a plurality of physical scalar registers in the same physical vector register, and the plurality of physical scalar registers correspond to the plurality of virtual sub-registers. The application can optimize the execution instruction in the graphic processing device, reduce the instruction quantity, save the physical register resource and improve the operation efficiency.

Description

Instruction compiling method and device, graphic processing device, storage medium and terminal equipment

Technical Field

The present application relates to the field of graphics processing technologies, and in particular, to a method and apparatus for compiling instructions, a graphics processing apparatus, a storage medium, and a terminal device.

Background

As three-dimensional games build scene pictures more realistic and gorgeous, the design of the shader becomes more complex. A large number of texture samples or other arithmetic operations are typically used in shaders to enhance picture effects.

In the prior art, graphic data is generally loaded and stored using coordinates: one-dimensional x, two-dimensional xy, or three-dimensional xyz, depending on whether the image is one-, two-, or three-dimensional. These coordinates are placed in a vector register made up of a plurality of consecutive scalar registers, each of which is a component (x-component, y-component, z-component), or sub-register, of the vector register. Two-dimensional and three-dimensional dot-product operations also require vector registers to store operands. For example, when generating texture sampling code, the shader typically generates different arithmetic logic unit (ALU) instructions to produce the component coordinates of the texture coordinates, which are then copied into a vector register by copy instructions (MOV). Instructions that use vector registers are typically texture sample instructions and image load instructions; because of encoding space constraints, they identify the vector register by the address of its first scalar register plus the number of registers. The instruction sequence is as follows:

FMUL R0, R1, R2;

FADD R8, R7, R6;

FADD R10, R11, R12;

MOV R16, R0;

MOV R17, R8;

MOV R18, R10;

SMP R10, R16.xyz, t[0], s[0];

wherein FMUL, FADD, MOV, and SMP respectively denote multiplication, addition, copying, and texture sampling, and R0, R1, ..., R18 denote registers. t[0] represents a texture unit description register used in texture sampling, and s[0] represents a sample description register.
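The "first register plus count" encoding of a vector operand can be illustrated with a minimal Python sketch. The function name `expand_vector_operand` is hypothetical, and the sketch assumes that each swizzle letter in an operand such as R16.xyz stands for one consecutive scalar register starting at the base register; the `w` component is likewise an assumption, since the text above only mentions x, y, and z.

```python
import re


def expand_vector_operand(operand: str) -> list[str]:
    """Expand a vector operand such as 'R16.xyz' into its scalar registers.

    Assumption: the operand encodes a base register number, and each
    component letter adds one consecutive scalar register.
    """
    match = re.fullmatch(r"R(\d+)\.([xyzw]+)", operand)
    if match is None:
        raise ValueError(f"not a vector operand: {operand!r}")
    base = int(match.group(1))       # number of the first scalar register
    count = len(match.group(2))      # number of components
    return [f"R{base + i}" for i in range(count)]


print(expand_vector_operand("R16.xyz"))  # ['R16', 'R17', 'R18']
```

Under this reading, the SMP instruction above reads its coordinates from R16, R17, and R18, which is why the three MOV instructions target exactly those registers.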

However, in the prior art, since the component registers of the texture coordinates are defined by different instructions, register allocation assigns them to different, non-contiguous scalar registers, such as R0, R8 and R10, and additional copy instructions are required to form the vector register. Copy propagation, a conventional compiler optimization, cannot eliminate these redundant copy instructions because the scalar registers of a vector register must be contiguous. When a shader contains many texture samples, a large number of copy instructions are introduced, which seriously affects the running efficiency of the shader: the extra copy instructions lengthen the code and occupy additional register resources.
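The contiguity problem can be checked with a short Python sketch. It is only an illustration: the helper `is_contiguous` is a hypothetical name, and the register numbers are taken from the prior-art sequence above. Substituting each MOV source for its destination, as copy propagation would, yields registers that cannot be addressed as one vector operand.

```python
def is_contiguous(register_numbers):
    """Return True if every register number is exactly one above the previous."""
    return all(b == a + 1 for a, b in zip(register_numbers, register_numbers[1:]))


# Destinations of the three MOV instructions in the prior-art sequence.
copy_destinations = [16, 17, 18]
# Sources of those MOVs, i.e. what copy propagation would substitute.
copy_sources = [0, 8, 10]

print(is_contiguous(copy_destinations))  # True
print(is_contiguous(copy_sources))       # False
```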

Disclosure of Invention

The application can optimize the execution instruction in the graphic processing device, reduce the number of instructions and improve the operation efficiency.

In order to achieve the above purpose, the present application provides the following technical solutions:

in a first aspect, the present application provides an instruction compiling method, the instruction compiling method including: generating a first instruction, wherein the first instruction comprises a plurality of copy instructions and a plurality of first operation instructions, the copy instructions indicate a first source virtual register and a first destination virtual register, the first destination virtual registers indicated in the copy instructions are respectively a plurality of virtual sub-registers in the same virtual vector register, and addresses of the plurality of virtual sub-registers are continuous; generating a second instruction according to the first instruction, wherein the second instruction comprises the plurality of first operation instructions, destination registers indicated in the plurality of first operation instructions are a plurality of physical scalar registers in the same physical vector register, and the plurality of physical scalar registers correspond to the plurality of virtual sub-registers.

Optionally, the generating the second instruction according to the first instruction includes: updating the first source virtual register into the first destination virtual register in a predefined instruction corresponding to the first source virtual register, and deleting the copy instruction to obtain a third instruction;

and allocating a physical register for the virtual register indicated in the third instruction to obtain the second instruction.

Optionally, the allocating a physical register for the virtual register indicated in the third instruction includes: and allocating a corresponding physical vector register for the virtual vector register indicated in the third instruction, and allocating a corresponding physical scalar register for the virtual sub-register indicated in the third instruction.

Optionally, the updating the first source virtual register to the first destination virtual register includes: traversing the first instruction to determine an instruction pair comprising the copy instruction and the predefined instruction, the predefined instruction of the instruction pair corresponding to a first source virtual register indicated in the copy instruction of the instruction pair; updating a first source virtual register indicated in a predefined instruction in the instruction pair to a first destination virtual register in a copy instruction in the instruction pair.

Optionally, the predefined instruction is a second operation instruction, where the second operation instruction is used to instruct to perform an operation, and store an operation result in the first source virtual register.

Optionally, the second operation instruction includes a second source virtual register and the first source virtual register, and the second operation instruction is used for indicating to perform an operation on data in the second source virtual register.

Optionally, the first operation instruction is configured to instruct to perform an operation, and store an operation result in the destination register.

Optionally, the second instruction includes a third operation instruction, where the third operation instruction is configured to perform an operation on data in the physical vector register.

In a second aspect, the present application also discloses a graphics processing apparatus, the graphics processing apparatus including: a scheduling executor for receiving the second instruction; and the operation unit is used for executing the second instruction.

In a third aspect, the present application also discloses an instruction compiling apparatus, including: the first instruction generation module is used for generating a first instruction, the first instruction comprises a plurality of copy instructions and a plurality of first operation instructions, the copy instructions indicate a first source virtual register and a first destination virtual register, the first destination virtual registers indicated in the copy instructions are respectively a plurality of virtual sub-registers in the same virtual vector register, and the addresses of the plurality of virtual sub-registers are continuous; and the second instruction generating module is used for generating a second instruction according to the first instruction, the second instruction comprises the plurality of first operation instructions, destination registers indicated in the plurality of first operation instructions are a plurality of physical scalar registers in the same physical vector register, and the plurality of physical scalar registers correspond to the plurality of virtual sub-registers.

In a fourth aspect, the present application provides a terminal device, which is characterized by comprising the graphics processing apparatus according to the second aspect.

In a fifth aspect, the present application provides a computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor performs the steps of the instruction compilation method.

Compared with the prior art, the technical scheme of the application has the following beneficial effects:

To optimize instructions, in the technical scheme of the application, a first instruction is generated, the first instruction comprises a plurality of copy instructions and a plurality of first operation instructions, the copy instructions indicate a first source virtual register and a first destination virtual register, the first destination virtual registers indicated in the copy instructions are respectively a plurality of virtual sub-registers in the same virtual vector register, and addresses of the plurality of virtual sub-registers are continuous; and a second instruction is generated according to the first instruction, wherein the second instruction comprises the plurality of first operation instructions, destination registers indicated in the plurality of first operation instructions are a plurality of physical scalar registers in the same physical vector register, and the plurality of physical scalar registers correspond to the plurality of virtual sub-registers. Because the destination registers in the first operation instructions are a plurality of continuous physical scalar registers, they can form a vector register directly, so no additional copy instructions are needed to copy data between registers; this reduces the number of instructions in the second instruction and realizes the optimization of the instructions. In addition, reducing the number of instructions reduces the number of instructions the graphics processing device subsequently has to execute and improves its computing efficiency.

Further, in a predefined instruction corresponding to a first source virtual register, the application updates the first source virtual register to a first destination virtual register and deletes the copy instruction to obtain a third instruction, and allocates physical registers for the virtual registers indicated in the third instruction to obtain the second instruction. In the application, the first destination virtual registers indicated in the plurality of copy instructions are respectively a plurality of virtual sub-registers in the same virtual vector register, and the addresses of the plurality of virtual sub-registers in the virtual vector register are continuous. Therefore, by updating the first source virtual register to the first destination virtual register, the vector register can be formed without copy instructions in the third instruction and the second instruction, and the number of instructions in the second instruction is reduced while still meeting the subsequent task's requirement for contiguous data storage addresses.

Further, by introducing the virtual vector register and the virtual sub-register, the instruction optimization is realized before the physical register is allocated, the physical register is prevented from being occupied, the waste of the physical register is avoided, and the physical register resource is saved.

Drawings

FIG. 1 is a flow chart of an instruction compiling method according to an embodiment of the present application;

FIG. 2 is a flow chart of another method for compiling instructions according to an embodiment of the present application;

FIG. 3 is a block diagram of a graphics processing apparatus according to an embodiment of the present application;

FIG. 4 is a block diagram of an instruction compiling apparatus according to an embodiment of the present application;

FIG. 5 is a schematic structural diagram of another instruction compiling apparatus according to an embodiment of the present application.

Detailed Description

As described in the background, in the prior art, since the component registers of the texture coordinates are defined by different instructions, register allocation assigns them to different, non-contiguous scalar registers, such as R0, R8 and R10, and additional copy instructions are necessary to form the vector register. Copy propagation, a conventional compiler optimization, cannot eliminate these redundant copy instructions because the scalar registers of a vector register must be contiguous, and when a shader contains many texture samples, a large number of copy instructions are introduced, so that the running efficiency of the shader is seriously affected.

In the technical scheme of the application, the destination registers in the plurality of first operation instructions are a plurality of continuous physical scalar registers, so they can form a vector register directly, and no additional copy instructions are needed to copy data between registers. This reduces the number of instructions in the second instruction and realizes the optimization of the instructions. In addition, reducing the number of instructions reduces the number of instructions the graphics processing device subsequently has to execute and improves its computing efficiency.

Further, in the present application, the first destination virtual register indicated in the plurality of copy instructions is set to be a plurality of virtual sub-registers in the same virtual vector register, and addresses of the plurality of virtual sub-registers in the virtual vector register are continuous, so that the vector registers can be formed without copying instructions in the third instruction and the second instruction by updating the first source virtual register to the first destination virtual register, and the number of instructions in the second instruction is reduced on the basis of meeting the continuous requirement of the subsequent task on the data storage address.

Before describing embodiments of the present application, some terms related to the embodiments of the present application will be explained.

The physical register in the embodiment of the application refers to a register corresponding to an actual hardware entity.

The virtual register refers to an abstract register which is irrelevant to a hardware platform and is used for storing basic data types without quantity limitation, and has a virtual storage address. In particular, the virtual register may be a number of operands in the source file during compilation. The concept of "variable" will be mapped into a "virtual register" in the process of converting from a higher level intermediate representation (IR: intermediate representation) to a lower level IR. The number of virtual registers is unlimited, and when registers are allocated, virtual registers in an instruction need to be mapped to actual physical registers, and virtual registers can be mapped to a limited number of physical registers.

The scalar register according to the embodiment of the application refers to a register for completing scalar calculation.

The vector register according to the embodiment of the present application refers to a register for special purpose, which is wider than a scalar register, and which is composed of a plurality of scalar registers having consecutive memory addresses.

In order that the above objects, features and advantages of the application will be readily understood, a more particular description of the application will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings.

In embodiment 1, referring to fig. 1, the instruction compiling method may be executed by a compiler, that is, the compiler executes each step of the instruction compiling method to generate and send the second instruction. It will of course be appreciated that the instruction compilation method may be performed by any other suitable entity.

Specifically, the instruction compiling method specifically may include the following steps:

Step 101: generating a first instruction, wherein the first instruction comprises a plurality of copy instructions and a plurality of first operation instructions, the copy instructions indicate a first source virtual register and a first destination virtual register, the first destination virtual registers indicated in the copy instructions are respectively a plurality of virtual sub-registers in the same virtual vector register, and addresses of the plurality of virtual sub-registers are continuous.

Step 102: generating a second instruction according to the first instruction, wherein the second instruction comprises the plurality of first operation instructions, the destination registers indicated in the plurality of first operation instructions are a plurality of physical scalar registers in the same physical vector register, and the plurality of physical scalar registers correspond to the plurality of virtual sub-registers.

It will be appreciated that in particular implementations, each of the steps of the method described above may be implemented in a software program running on a processor integrated within a chip or chip module. The method may also be implemented by combining software with hardware, and the application is not limited.

The embodiment of the application can be used for scenes requiring logic operation on data and requiring continuous data storage addresses, such as texture sampling scenes, image loading scenes, image storage scenes and three-dimensional dot product scenes.

In an embodiment of the present application, the first operation instruction may be an arithmetic logic unit (ALU) instruction, such as a multiply operation instruction, an add operation instruction, or the like. Specifically, in the texture sampling scene, the first operation instruction further includes a sampling instruction.

Taking a texture sampling scenario as an example, the second instruction is as follows:

FMUL R16, R1, R2;

FADD R17, R7, R6;

FADD R18, R11, R12;

SMP R10, R16.xyz, t[0], s[0];

wherein the first operation instruction FMUL indicates that the data in the physical scalar registers R1 and R2 are multiplied and the result is stored in the physical scalar register R16; accordingly, the first operation instruction FADD indicates that the data in the physical scalar registers R6 and R7 are added and the result is stored in the physical scalar register R17; the other first operation instruction FADD indicates that the data in the physical scalar registers R11 and R12 are added and the result is stored in the physical scalar register R18; SMP represents a third operation instruction, namely a sampling instruction, which uses the 3 consecutive scalar registers R16-R18 to form the vector register V3R16, samples according to the data in registers R16-R18, t[0], and s[0], and stores the result in the physical scalar register R10. t[0] represents a texture unit description register used in texture sampling, and s[0] represents a sample description register.

Taking the image loading scenario as an example, the second instruction is as follows:

FMUL R16, R1, R2;

FADD R17, R7, R6;

FADD R18, R11, R12;

LD R10, R16.xyz, u[0];

wherein LD represents a third operation instruction, namely an image loading instruction, which uses the 3 consecutive scalar registers R16-R18 to form the vector register V3R16, loads according to the data in registers R16-R18 and u[0], and stores the result in the physical scalar register R10. u[0] represents an image description register used at the time of image loading.

Taking the image storage scenario as an example, the second instruction is as follows:

FMUL R16, R1, R2;

FADD R17, R7, R6;

FADD R18, R11, R12;

ST R10, R16.xyz, u[0];

wherein ST represents a third operation instruction, i.e., an image store instruction, which uses the 3 consecutive scalar registers R16-R18 to form the vector register V3R16, representing the assembly of the data in registers R16-R18 and u[0], and storing the result to the physical scalar register R10.

Taking a three-dimensional dot product scene as an example, the second instruction is as follows:

FMUL R16, R1, R2;

FADD R17, R7, R6;

FADD R18, R11, R12;

DOT.rpt2 R10, R16.xyz;

wherein DOT.rpt2 represents a third operation instruction, i.e., a three-dimensional dot product instruction, which uses the 3 consecutive scalar registers R16 to R18 to form the vector register V3R16, performs a three-dimensional dot product operation on the data in registers R16 to R18, and stores the result in the physical scalar register R10, e.g., R10' = R16' × R16' + R17' × R17' + R18' × R18', where R10' represents the data stored in register R10, and R16', R17', and R18' represent the data stored in registers R16, R17, and R18, respectively.
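As a worked illustration of the example formula above, the following Python sketch models the register file as a dictionary and evaluates R10' = R16' × R16' + R17' × R17' + R18' × R18'. The function name `dot3_self` and the sample register values are assumptions made for the example; the sketch does not model the actual DOT.rpt2 encoding.

```python
def dot3_self(registers, base):
    """Dot product of the 3-component vector at base..base+2 with itself."""
    x, y, z = (registers[base + i] for i in range(3))
    return x * x + y * y + z * z


# Hypothetical contents of R16, R17, R18 (the vector register V3R16).
registers = {16: 1.0, 17: 2.0, 18: 2.0}

# R10' = R16'*R16' + R17'*R17' + R18'*R18'
registers[10] = dot3_self(registers, 16)
print(registers[10])  # 9.0
```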

In this embodiment, the destination registers of the first operation instructions FMUL, FADD, and FADD are the physical scalar registers R16, R17 and R18, respectively, and the addresses of the physical scalar registers R16, R17 and R18 are consecutive, so that they form one physical vector register. No additional copy instructions are needed in the second instruction to copy data between registers, so the number of instructions in the second instruction is reduced and the optimization of the instructions is realized.

Embodiment 2: referring to FIG. 2, FIG. 2 shows a flow of another instruction compiling method. Specifically, the instruction compiling method may include the steps of:

Step 201: generating a first instruction.

Step 202: updating the first source virtual register to a first destination virtual register in a predefined instruction corresponding to the first source virtual register, and deleting the copy instruction to obtain a third instruction.

Step 203: allocating a physical register for the virtual register indicated in the third instruction to obtain a second instruction.

In order to make the destination registers indicated in the plurality of first operation instructions in the second instruction be a plurality of physical scalar registers in the same physical vector register, the compiler may be caused to perform steps 201 to 203 described above.

The physical vector registers, physical scalar registers, virtual vector registers, and virtual sub-registers (which may also be referred to as virtual scalar registers) may be defined prior to generating the second instruction. Specifically, the physical registers are defined as physical vector registers and physical scalar registers, where the physical scalar registers are sub-registers of the physical vector registers. A physical vector register consists of consecutive physical scalar registers; for example, the physical vector register V4R4 consists of the 4 physical scalar registers R4, R5, R6, and R7. Similarly, virtual registers are defined as virtual vector registers and virtual sub-registers. A virtual vector register is composed of virtual sub-registers, which are sub-registers of the virtual vector register. Each sub-register of a virtual vector register may be represented by the identification of the virtual vector register together with that of the sub-register, e.g., %0:f32v4crf:sub0, where %0 represents the virtual vector register, f32 represents 32-bit floating point, v4 represents that the virtual vector register comprises 4 virtual sub-registers, crf represents the general purpose registers, and sub0 represents a sub-register of the virtual vector register.
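The naming relationships described above can be sketched in Python. Both helper names are hypothetical. The first assumes that a physical vector register name has the form V<width>R<base>, as in V4R4; the second assumes the `.subN` spelling of sub-registers that appears in the instruction listings of this embodiment.

```python
import re


def physical_vector_components(name):
    """List the scalar registers of a physical vector register such as 'V4R4'."""
    match = re.fullmatch(r"V(\d+)R(\d+)", name)
    if match is None:
        raise ValueError(f"not a physical vector register: {name!r}")
    width, base = int(match.group(1)), int(match.group(2))
    return [f"R{base + i}" for i in range(width)]


def virtual_sub_registers(vector_name, width):
    """List the sub-registers of a virtual vector register such as '%11'."""
    return [f"{vector_name}.sub{i}" for i in range(width)]


print(physical_vector_components("V4R4"))  # ['R4', 'R5', 'R6', 'R7']
print(virtual_sub_registers("%11", 3))     # ['%11.sub0', '%11.sub1', '%11.sub2']
```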

The embodiment of the application can generate the first instruction according to the existing algorithm, but the difference between the first instruction and the instruction generated in the prior art is that the register identifications carried in the first instruction are all identifications of virtual registers. Specifically, the source registers and destination registers indicated in the plurality of copy instructions in the first instruction are virtual registers. Accordingly, the register indicated by the predefined instruction in the first instruction is also a virtual register. The predefined instruction corresponding to the first source virtual register refers to a first operation instruction including the first source register, and the first source register is a destination register in the first operation instruction.

Taking a texture sampling scenario as an example, the first instruction is as follows:

%8:f32crf = FMUL %0:f32crf, %1:f32crf;

%9:f32crf = FADD %2:f32crf, %3:f32crf;

%10:f32crf = FADD %4:f32crf, %5:f32crf;

%11.sub0:f32v3crf = COPY %8:f32crf;

%11.sub1:f32v3crf = COPY %9:f32crf;

%11.sub2:f32v3crf = COPY %10:f32crf;

%12:f32crf = Smp_F32_V3 %11:f32v3crf, 1, 0;

wherein %0, %1, ..., %5, %8, %9, %10, and %12 are virtual sub-registers, and %11 is a virtual vector register comprising three virtual sub-registers: %11.sub0, %11.sub1, and %11.sub2. %8, %9, and %10 are the first source virtual registers, and %11.sub0, %11.sub1, and %11.sub2 are the first destination virtual registers. FMUL is the predefined instruction for the first source virtual register %8, and FADD is the predefined instruction for the first source virtual registers %9 and %10. Smp represents a sample instruction.

The first source virtual registers in the predefined instructions FMUL and FADD are updated to the first destination virtual registers, and the copy instructions are deleted. That is, the first source virtual register %8 is updated to the first destination virtual register %11.sub0; the first source virtual register %9 is updated to the first destination virtual register %11.sub1; and the first source virtual register %10 is updated to the first destination virtual register %11.sub2.

The third instruction obtained is as follows:

%11.sub0:f32crf = FMUL %0:f32crf, %1:f32crf;

%11.sub1:f32crf = FADD %2:f32crf, %3:f32crf;

%11.sub2:f32crf = FADD %4:f32crf, %5:f32crf;

%12:f32crf = Smp_F32_V3 %11:f32v3crf, 0, 0;

A register allocation algorithm is then run on the third instruction to allocate physical registers for the virtual registers and obtain the second instruction, which is as follows:

FMUL R16, R1, R2;

FADD R17, R7, R6;

FADD R18, R11, R12;

SMP R10, R16.xyz, t[0], s[0];

wherein virtual register %0 maps to physical register R1, virtual register %1 maps to physical register R2, virtual register %2 maps to physical register R7, virtual register %3 maps to physical register R6, virtual register %4 maps to physical register R11, virtual register %5 maps to physical register R12, and virtual registers %11.sub0, %11.sub1, and %11.sub2 map to physical registers R16, R17, and R18, respectively.

It should be noted that the above embodiment is described taking 32-bit floating point virtual registers as an example; in practical application, the virtual register may be a register of any type with any number of bits, which is not limited in the present application.

In particular, the register allocation algorithm can map virtual registers to actual physical registers, and may specifically be a graph coloring algorithm, a linear scanning algorithm based on a register life cycle, a variant algorithm thereof, and the like, which is not limited in this application.

Further, when allocating physical registers for the virtual registers, a corresponding physical vector register is allocated for each virtual vector register indicated in the third instruction, and a corresponding physical scalar register is allocated for each virtual sub-register indicated in the third instruction.

For example, physical scalar registers are allocated for virtual sub-registers %0, %1, ..., %5, respectively, and the physical vector register V3R16, i.e., physical scalar registers R16-R18, is allocated for virtual vector register %11.
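The allocation rule above can be sketched with a deliberately naive allocator in Python. This is not graph coloring or linear scan, it ignores register lifetimes, and it does not reproduce the specific register numbers of the example (R1, R2, R16, and so on); the function name `allocate` and its parameters are hypothetical. It only demonstrates the one property that matters here: the sub-registers of a virtual vector register receive one block of consecutive physical scalar registers.

```python
def allocate(virtual_scalars, virtual_vectors, first_free=0):
    """Map virtual registers to physical register names.

    virtual_scalars: list of virtual scalar register names.
    virtual_vectors: dict from virtual vector register name to its width.
    Assumption: registers are never reused (no liveness analysis).
    """
    mapping = {}
    next_free = first_free
    for name in virtual_scalars:
        mapping[name] = f"R{next_free}"
        next_free += 1
    for name, width in virtual_vectors.items():
        # Reserve one contiguous block so the sub-registers form a vector.
        for i in range(width):
            mapping[f"{name}.sub{i}"] = f"R{next_free + i}"
        next_free += width
    return mapping


mapping = allocate(["%0", "%1", "%2", "%3", "%4", "%5", "%12"], {"%11": 3})
print([mapping[f"%11.sub{i}"] for i in range(3)])  # ['R7', 'R8', 'R9']
```

A production allocator would additionally track lifetimes so that physical registers can be reused, while still treating each virtual vector register as a single unit that needs a contiguous range.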

Compared with the instructions in the prior art, the second instruction in the embodiment of the application has no additional copy instruction MOV, and the number of instructions is greatly reduced.

In one particular embodiment, upon updating from a first instruction to a third instruction, the first instruction may be traversed to determine an instruction pair, the instruction pair including a copy instruction and a predefined instruction, the predefined instruction of the instruction pair corresponding to a first source virtual register indicated in the copy instruction of the instruction pair. The first source virtual register indicated in the predefined instruction in the instruction pair is updated to the first destination virtual register in the copy instruction in the instruction pair.

Taking the first instruction above as an example, there are three instruction pairs: the predefined instruction %8:f32crf = FMUL %0:f32crf, %1:f32crf and the copy instruction %11.sub0:f32v3crf = COPY %8:f32crf; the predefined instruction %9:f32crf = FADD %2:f32crf, %3:f32crf and the copy instruction %11.sub1:f32v3crf = COPY %9:f32crf; and the predefined instruction %10:f32crf = FADD %4:f32crf, %5:f32crf and the copy instruction %11.sub2:f32v3crf = COPY %10:f32crf.

The first source virtual register %8 in the first instruction pair is updated to the first destination virtual register %11.sub0; the first source virtual register %9 in the second instruction pair is updated to the first destination virtual register %11.sub1; and the first source virtual register %10 in the third instruction pair is updated to the first destination virtual register %11.sub2.
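The traversal and update described in this embodiment can be sketched in Python as follows. The `Instr` tuple and the function `coalesce_copies` are hypothetical names, type suffixes such as :f32crf and the immediate operands of the sample instruction are omitted, and the sketch assumes that each first source virtual register is defined exactly once and is used only by its copy instruction.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    dest: str
    opcode: str
    sources: tuple


def is_sub_register_copy(instr):
    """True for a COPY whose destination is a virtual sub-register."""
    return instr.opcode == "COPY" and ".sub" in instr.dest


def coalesce_copies(instructions):
    # First traversal: pair each copy with its source register.
    rename = {i.sources[0]: i.dest for i in instructions if is_sub_register_copy(i)}
    result = []
    for instr in instructions:
        if is_sub_register_copy(instr):
            continue  # the copy instruction is deleted
        # The defining (predefined) instruction now writes the sub-register.
        result.append(instr._replace(dest=rename.get(instr.dest, instr.dest)))
    return result


first = [
    Instr("%8", "FMUL", ("%0", "%1")),
    Instr("%9", "FADD", ("%2", "%3")),
    Instr("%10", "FADD", ("%4", "%5")),
    Instr("%11.sub0", "COPY", ("%8",)),
    Instr("%11.sub1", "COPY", ("%9",)),
    Instr("%11.sub2", "COPY", ("%10",)),
    Instr("%12", "Smp_F32_V3", ("%11",)),
]
third = coalesce_copies(first)
print([i.dest for i in third])  # ['%11.sub0', '%11.sub1', '%11.sub2', '%12']
```

If a first source virtual register had other uses besides the copy, those uses would also need to be renamed, which this sketch does not attempt.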

It should be noted that the first operation instruction, the second operation instruction, and the third operation instruction may be any other executable instructions, which is not limited in this aspect of the present application.

Embodiment 3: referring to FIG. 3, the present application also discloses a graphics processing apparatus 30. The graphics processing apparatus 30 includes a scheduling executor 301 and an operation unit 302. The scheduling executor 301 is configured to receive a second instruction, and the operation unit 302 is configured to execute the second instruction.

In this embodiment, the compiler 40 may execute the steps of the foregoing instruction compiling method to generate the second instruction. The compiler 40 sends the second instruction to the scheduling executor 301 in the graphics processing apparatus 30. The scheduling executor 301 receives and parses the second instruction, and forwards the parsed second instruction to the operation unit 302. The operation unit 302 executes the second instruction to complete the corresponding operation.

Specifically, the operation unit 302 may be a shader, such as a hull shader or a domain shader.

Taking the texture sampling scenario as an example, the operation unit 302 is a shader that executes the first operation instructions FMUL, FADD, and FADD and the third operation instruction SMP in the second instruction. After the operation unit 302 executes the second instruction, the resulting texture coordinates are stored in the register R10.

In the prior art, the operation unit 302 needs to execute 7 instructions, whereas the operation unit in the embodiment of the application only needs to execute 4. When the shader has a relatively large number of coordinates requiring texture sampling, a large number of copy instructions can be eliminated, which improves the operation efficiency of the operation unit 302.
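The counts of 7 and 4 can be checked with a small Python sketch. It assumes, based on the example above, that the 7 instructions are three operation instructions, three copy instructions, and one sampling instruction; the function `instruction_counts` and its parameters are hypothetical names used only for illustration.

```python
def instruction_counts(num_components, num_consumers=1):
    """Return (before, after) instruction counts for one vector operand.

    num_components: operation instructions that each produce one component
    num_consumers:  instructions that consume the assembled vector (e.g. SMP)
    Assumption: without the optimization, each component needs one copy.
    """
    before = num_components + num_components + num_consumers  # ops + copies + consumers
    after = num_components + num_consumers                    # copies removed
    return before, after


before, after = instruction_counts(num_components=3)
print(before, after)
```

Under this assumption the saving equals the number of components, so it grows with the number of coordinates that must be assembled for sampling.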

For more specific implementation manners of the embodiments of the present application, please refer to the foregoing embodiments, and the details are not repeated here.

Referring to fig. 4, an instruction compiling apparatus is shown. The instruction compiling apparatus includes:

The first instruction generating module 401 is configured to generate a first instruction.

The second instruction generating module 402 is configured to generate a second instruction, where the second instruction includes a plurality of first operation instructions, and destination registers indicated in the plurality of first operation instructions are a plurality of physical scalar registers in a same physical vector register.

According to the embodiment of the application, no additional copy instructions are needed to copy data between different registers, so the number of instructions in the second instruction is reduced and the instructions are optimized. In addition, reducing the number of instructions reduces the number of instructions that the graphics processing apparatus subsequently has to execute, which improves the computing efficiency of the graphics processing apparatus.

Further, referring to fig. 5, the second instruction generating module 402 may include:

an updating unit 501, configured to update the first source virtual register to the first destination virtual register in the predefined instruction corresponding to the first source virtual register, and delete the copy instruction to obtain a third instruction;

a register allocation unit 502, configured to allocate a physical register to the virtual register indicated in the third instruction, so as to obtain the second instruction.
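The work of the register allocation unit 502 can be sketched in Python as follows. This is an illustrative sketch only: the function `allocate_registers`, the lane naming such as "R10.0", and the simple first-free policy for ordinary scalar registers are assumptions, not the allocation algorithm of the application. The mapping of %11 to R10 follows the texture sampling example.

```python
import re

# Matches virtual sub-registers such as "%11.sub2".
SUBREG = re.compile(r"^(%\d+)\.sub(\d+)$")


def allocate_registers(virtual_regs, vector_map):
    """Map virtual registers to physical register names.

    virtual_regs: virtual registers in order of first appearance
    vector_map:   virtual vector register -> physical vector register,
                  e.g. {"%11": "R10"} (assumed to be chosen beforehand)
    """
    physical = {}
    used = set(vector_map.values())
    next_scalar = 0
    for vreg in virtual_regs:
        if vreg in physical:
            continue
        match = SUBREG.match(vreg)
        if match:
            # A virtual sub-register gets the scalar lane with the same
            # index inside the physical vector register, so contiguous
            # sub-registers stay contiguous.
            base, lane = match.group(1), int(match.group(2))
            physical[vreg] = f"{vector_map[base]}.{lane}"
        else:
            # Ordinary scalar: take the first free physical register.
            while f"R{next_scalar}" in used:
                next_scalar += 1
            physical[vreg] = f"R{next_scalar}"
            used.add(f"R{next_scalar}")
    return physical


mapping = allocate_registers(
    ["%0", "%1", "%11.sub0", "%2", "%3", "%11.sub1", "%4", "%5", "%11.sub2"],
    {"%11": "R10"},
)
print(mapping)
```

Because the sub-register index selects the lane, the three operation results land in adjacent scalar registers of the same physical vector register without any copy.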

According to the embodiment of the application, the first destination virtual registers indicated in the plurality of copy instructions are respectively a plurality of virtual sub-registers in the same virtual vector register, and the addresses of these virtual sub-registers within the virtual vector register are contiguous. Updating the first source virtual register to the first destination virtual register therefore allows the vector register to be formed in the third instruction and the second instruction without copy instructions, which reduces the number of instructions in the second instruction while still meeting the subsequent task's requirement for contiguous data storage addresses.

Each apparatus and each module/unit included in the products described in the above embodiments may be a software module/unit, a hardware module/unit, or partly a software module/unit and partly a hardware module/unit. For example, for a device or product applied to or integrated on a chip, each module/unit it includes may be implemented in hardware such as a circuit, or at least some of the modules/units may be implemented as a software program that runs on a processor integrated inside the chip, with the remaining modules/units (if any) implemented in hardware such as a circuit. For a device or product applied to or integrated in a chip module, each module/unit it includes may be implemented in hardware such as a circuit, and different modules/units may be located in the same component (such as a chip or a circuit module) or in different components of the chip module; or at least some of the modules/units may be implemented as a software program that runs on a processor integrated inside the chip module, with the remaining modules/units (if any) implemented in hardware such as a circuit. For a device or product applied to or integrated in a terminal device, each module/unit it includes may be implemented in hardware such as a circuit, and different modules/units may be located in the same component (e.g., a chip or a circuit module) or in different components of the terminal device; or at least some of the modules/units may be implemented as a software program that runs on a processor integrated inside the terminal device, with the remaining modules/units (if any) implemented in hardware such as a circuit.

The embodiment of the application also discloses a storage medium, which is a computer readable storage medium on which a computer program is stored; when run, the computer program can execute the steps of the method shown in fig. 1. The storage medium may include Read-Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like. The storage medium may also include non-volatile memory, non-transitory memory, or the like.

The embodiment of the application also discloses a terminal device, which comprises the graphics processing apparatus; alternatively, the terminal device comprises a memory and a processor, the memory storing a computer program executable on the processor, and the processor performing the steps of the instruction compiling method described above when executing the computer program.

The term "plurality" as used in the embodiments of the present application means two or more.

The terms "first", "second", and the like in the embodiments of the present application are used only to illustrate and distinguish the described objects; they imply no order, do not limit the number of devices in the embodiments of the present application, and should not be construed as limiting the embodiments of the present application.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, the processes or functions described in accordance with embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired or wireless means.

It should be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application.

In the several embodiments provided in the present application, it should be understood that the disclosed method, apparatus and system may be implemented in other manners. For example, the device embodiments described above are merely illustrative; for example, the division of the units is only one logic function division, and other division modes can be adopted in actual implementation; for example, multiple units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.

The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may be physically included separately, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in hardware plus software functional units.

The integrated units implemented in the form of software functional units described above may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium, and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform part of the steps of the method according to the embodiments of the present application.

Although the present application is disclosed above, the present application is not limited thereto. Various changes and modifications may be made by one skilled in the art without departing from the spirit and scope of the application, and the scope of protection of the application shall therefore be defined by the appended claims.

Claims

1. A method of compiling instructions, comprising:

generating a first instruction, wherein the first instruction comprises a plurality of copy instructions and a plurality of first operation instructions, the copy instructions indicate a first source virtual register and a first destination virtual register, the first destination virtual registers indicated in the copy instructions are respectively a plurality of virtual sub-registers in the same virtual vector register, and addresses of the plurality of virtual sub-registers are continuous;

generating a second instruction according to the first instruction, wherein the second instruction comprises the plurality of first operation instructions, destination registers indicated in the plurality of first operation instructions are a plurality of physical scalar registers in the same physical vector register, and the plurality of physical scalar registers correspond to the plurality of virtual sub-registers;

wherein generating a second instruction according to the first instruction comprises:

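updating the first source virtual register into the first destination virtual register in a predefined instruction corresponding to the first source virtual register, and deleting the copy instruction to obtain a third instruction;

and allocating physical registers for virtual registers indicated in the third instruction to obtain the second instruction.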

2. The method of claim 1, wherein said allocating physical registers for virtual registers indicated in said third instruction comprises:

and allocating a corresponding physical vector register for the virtual vector register indicated in the third instruction, and allocating a corresponding physical scalar register for the virtual sub-register indicated in the third instruction.

3. The method of claim 1, wherein the updating the first source virtual register to the first destination virtual register comprises:

traversing the first instruction to determine an instruction pair comprising the copy instruction and the predefined instruction, the predefined instruction of the instruction pair corresponding to a first source virtual register indicated in the copy instruction of the instruction pair;

updating a first source virtual register indicated in a predefined instruction in the instruction pair to a first destination virtual register in a copy instruction in the instruction pair.

4. The instruction compilation method according to claim 1, wherein the predefined instruction is a second operation instruction, the second operation instruction being configured to instruct execution of an operation and store an operation result in the first source virtual register.

5. The instruction compilation method of claim 4 wherein the second operation instruction includes a second source virtual register and the first source virtual register, the second operation instruction to instruct performing an operation on data in the second source virtual register.

6. The instruction compiling method according to claim 1, wherein the first operation instruction is for instructing execution of an operation and storing an operation result in the destination register.

7. The instruction compilation method of claim 1 wherein the second instruction includes a third operation instruction for performing operations on data in the physical vector registers.

8. A graphics processing apparatus based on the instruction compiling method according to any one of claims 1 to 7, comprising:

a scheduling executor for receiving the second instruction;

and the operation unit is used for executing the second instruction.

9. An instruction compiling apparatus, comprising:

the first instruction generation module is used for generating a first instruction, the first instruction comprises a plurality of copy instructions and a plurality of first operation instructions, the copy instructions indicate a first source virtual register and a first destination virtual register, the first destination virtual registers indicated in the copy instructions are respectively a plurality of virtual sub-registers in the same virtual vector register, and the addresses of the plurality of virtual sub-registers are continuous;

a second instruction generating module, configured to generate a second instruction according to the first instruction, where the second instruction includes the plurality of first operation instructions, destination registers indicated in the plurality of first operation instructions are a plurality of physical scalar registers in a same physical vector register, and the plurality of physical scalar registers correspond to the plurality of virtual sub-registers;

the second instruction generating module updates the first source virtual register into the first destination virtual register in a predefined instruction corresponding to the first source virtual register, and deletes the copy instruction to obtain a third instruction; and the second instruction generating module allocates a physical register for the virtual register indicated in the third instruction to obtain the second instruction.

10. A computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor performs the steps of the instruction compiling method of any one of claims 1 to 7.

11. A terminal device comprising a memory and a processor, said memory having stored thereon a computer program, characterized in that said processor performs the steps of the instruction compiling method according to any one of claims 1 to 7 or comprises the graphics processing apparatus according to claim 8.