CN114391155A

CN114391155A - GPU shader program iteration calling method, GPU, compiler and GPU driver

Info

Publication number: CN114391155A
Application number: CN202080006290.4A
Authority: CN
Inventors: 朱韵鹏
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2020-08-20
Filing date: 2020-08-20
Publication date: 2022-04-22
Also published as: WO2022036628A1

Abstract

A graphics processing unit (GPU) shader program iterative call method, a GPU (400), a compiler (500), and a GPU driver (600). The GPU (400) comprises: an obtaining module (410) configured to obtain, from the GPU driver (600), a program table storing N program counters (PCs) corresponding to N shader programs, where the PC corresponding to each of the N shader programs points to that shader program, and N is a positive integer; and a calling module (420) configured to determine, according to the program table, a first PC among the N PCs that corresponds to a first shader program, and to call the first shader program according to the first PC. In this way, iterative calling of shader programs can be implemented without greatly changing the GPU hardware architecture.

Description

GPU shader program iteration calling method, GPU, compiler and GPU driver

Technical Field

The present application relates to the field of computer technologies, and in particular, to a graphics processing unit (GPU) shader program iterative call method, a GPU, a compiler, and a GPU driver.

Background

Ray tracing technology uses algorithms to simulate the physical behavior of light rays in the real world, and can achieve physically accurate shadows, reflection, refraction, and global illumination, so that a 3D picture looks more realistic. Ray tracing places high demands on graphics processing units (GPUs). Existing ray tracing standards, such as DXR ray tracing and Vulkan NV/KHR ray tracing, require multiple shader programs on a pipeline to be called iteratively. There are various types of iterative calls. For example, shader A may call multiple other shader programs (e.g., shader B0 and shader B1); as another example, shader A may call shader B, shader B may in turn call shader C, and so on.

Most existing GPU hardware structures cannot support dynamic iterative calling of shader programs well. Achieving such support would require a large number of changes to the hardware architecture of the GPU. In order to fully utilize the existing GPU hardware structure, the prior art proposes a scheme for compiling a plurality of shader programs into one large program. However, since iterative invocation of shader programs is mostly decided at program run time, all programs that might be called have to be compiled together, which increases program complexity. Thus, the prior art does not provide a good iterative invocation technique for shader programs.

Disclosure of Invention

The application provides a GPU shader program iterative call method, a GPU, a compiler and a GPU driver, which can implement iterative calling of shader programs without greatly changing the GPU hardware architecture.

In a first aspect, a GPU is provided, comprising: an obtaining module configured to obtain a program table from a GPU driver, wherein the program table records N program counters (PCs) corresponding to N shader programs, the PC corresponding to each of the N shader programs points to that shader program, and N is a positive integer; and a calling module configured to determine, according to the program table, a first PC among the N PCs that corresponds to a first shader program of the N shader programs, and to call the first shader program according to the first PC.

In this technical solution, the program table records the PCs corresponding to the N shader programs. When the first shader program needs to be called, only its PC needs to be obtained from the program table, and the instruction code of the first shader program is then fetched according to that PC. Iterative calling of shader programs can therefore be implemented without greatly changing the GPU architecture, there is no need to compile all possible programs into one complex program, and calling of shader programs is more flexible.

In one possible implementation, before determining the first PC, the obtaining module is further configured to: acquiring program table parameters corresponding to a first PC from a GPU driver, wherein the program table parameters comprise a table base address and an offset; the calling module is specifically used for determining the address of the first PC in the program table according to the program table parameters, querying the program table by the address and determining the first PC.

In another possible implementation, the first shader program is a parent program, and the offset includes a start offset, where the start offset indicates the offset of the parent program relative to the table base address in the program table, and the parent program is used for iteratively calling other shader programs of the N shader programs; the calling module is specifically configured to determine the address of the first PC in the program table based on the table base address and the start offset.

In another possible implementation manner, the first shader program is a first subprogram, the offset includes a subprogram offset, the program table parameter further includes a step size, the subprogram offset is used to indicate an offset of a second subprogram from a table base address in the program table, the first subprogram is a program called by a parent program in the N shader programs, and the second subprogram is a base subprogram used to determine an address of a first PC corresponding to the first subprogram in the program table; the obtaining module is further configured to obtain variable information of the first subprogram from the GPU driver, where the variable information includes a variable of the first subprogram and a storage resource of the variable, and the storage resource includes a register resource occupied by the variable or a global storage resource, where the variable includes an index variable, and the index variable is used to indicate an offset of the first subprogram with respect to the second subprogram in the program table; the calling module is specifically configured to determine the address of the first PC in the program table according to the table base address, the subprogram offset, the index variable and the step size.

Through the base address of the program table, the offset and the step length which are dynamically changed in the program running process and the index variable parameter, the GPU can conveniently determine the PC of the parent program or the subprogram from the program table, and the first shader program is called according to the PC, so that the flexibility of program calling is improved.
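The address determination described above can be illustrated with a minimal sketch. The text only states which parameters are used, not how they are combined, so the linear formula below (table base address plus offset, plus index variable times step size for a subprogram) is an assumption, as are all names (`ProgramTableParams`, `parent_entry_address`, `subprogram_entry_address`, `lookup_pc`) and the example addresses.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ProgramTableParams:
    """Hypothetical container for the program table parameters."""
    table_base: int         # address of the first entry of the program table
    start_offset: int       # offset of the parent program's entry
    subprogram_offset: int  # offset of the base (second) subprogram's entry
    step: int               # step size between consecutive subprogram entries


def parent_entry_address(p: ProgramTableParams) -> int:
    # Parent program: table base address plus start offset.
    return p.table_base + p.start_offset


def subprogram_entry_address(p: ProgramTableParams, index: int) -> int:
    # ASSUMPTION: the index variable selects an entry relative to the
    # base subprogram, scaled by the step size.
    return p.table_base + p.subprogram_offset + index * p.step


def lookup_pc(table_memory: Dict[int, int], entry_address: int) -> int:
    # Query the program table by address to obtain the stored PC.
    return table_memory[entry_address]


# Made-up example: one parent entry followed by two subprogram entries.
params = ProgramTableParams(table_base=0x1000, start_offset=0,
                            subprogram_offset=8, step=8)
table_memory = {0x1000: 0xA000, 0x1008: 0xB000, 0x1010: 0xB100}

pc_parent = lookup_pc(table_memory, parent_entry_address(params))
pc_b1 = lookup_pc(table_memory, subprogram_entry_address(params, index=1))
```

Under these assumptions, changing only the index variable at run time is enough to redirect a call to a different subprogram, without touching the table itself.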

In another possible implementation, the calling module is further configured to update a value of a variable stored in the storage resource when the first shader program is called.

In a second aspect, there is provided a compiler, comprising: the compiling module is used for compiling the N shader programs into N binary executable files, wherein N is a positive integer; the distribution module is used for distributing storage resources for variables of the N shader programs, the values of the variables of a first shader program in the N shader programs are obtained or updated when the GPU calls the first shader program, and the storage resources comprise register resources or global storage resources; the sending module is used for sending control information of the N shader programs to the GPU driver, the control information comprises a program counter PC of each shader program in the N shader programs and variable information of each shader program in the N shader programs, and the variable information comprises variables in each shader program and storage resources occupied by the variables.

Before any program is called, the compiler compiles the N shader programs directly into N binary GPU executable files, so that when the GPU calls a given shader program, the corresponding instruction code can be obtained directly without recompilation.

In one possible implementation, the compiler further includes a determining module, configured to determine that the variable exists in at least two shader programs of the N shader programs, where the at least two shader programs include the first shader program; the allocation module is specifically configured to allocate a storage resource for a first variable existing in at least two shader programs.

For a variable that is the same in different shader programs, the compiler makes the variable reuse the same storage resource, which saves register resources and relieves register pressure in the GPU.

In another possible implementation manner, the determining module is specifically configured to: when the first field of the variable has the same value in at least two shader programs, determining that the variable is present in at least two shader programs.

In another possible implementation manner, the determining module is further configured to determine that a storage resource occupied by the variable is greater than a threshold; the allocation module is specifically configured to allocate global storage resources to the variables in the global storage.

When the storage resource needed by the variable is large, the compiler allocates the resource in the global storage for the variable, and the pressure on the register resource in the GPU is further relieved.

In a third aspect, a GPU driver is provided, comprising: a receiving module configured to receive control information of N shader programs sent by a compiler, wherein the control information comprises a program counter (PC) of each of the N shader programs and variable information of each shader program, the variable information comprises the variables of each of the N shader programs and the storage resources occupied by the variables, the storage resources comprise register resources or global storage resources, and N is a positive integer; a management module configured to establish a program table according to the PC of each shader program, wherein the program table records N PCs corresponding to the N shader programs; and a sending module configured to send the program table to the GPU.

The GPU driver establishes a program table of the shader program according to the control information obtained by the compiler in the program compiling process, so that the GPU can directly obtain the PC of the first shader program according to the program table when the program is called, the first shader program is called, and the program calling efficiency is improved.
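As a rough illustration of how a driver might build such a program table from the PCs in the control information, consider the sketch below. The fixed entry size, the sequential layout, and the names `ENTRY_SIZE` and `build_program_table` are assumptions made for the example; the text does not specify the table layout.

```python
from typing import Dict, List, Tuple

ENTRY_SIZE = 8  # ASSUMPTION: each table entry holds one 8-byte PC


def build_program_table(
    control_info: List[Tuple[str, int]], table_base: int
) -> Tuple[Dict[int, int], Dict[str, int]]:
    """Lay out one PC per shader program, in the order received.

    Returns the table contents (entry address -> PC) and the offset of
    each shader program's entry relative to the table base address.
    """
    table_memory: Dict[int, int] = {}
    entry_offsets: Dict[str, int] = {}
    for slot, (name, pc) in enumerate(control_info):
        offset = slot * ENTRY_SIZE
        table_memory[table_base + offset] = pc
        entry_offsets[name] = offset
    return table_memory, entry_offsets


# Made-up control information: (shader program name, PC) pairs.
control_info = [("A", 0xA000), ("B0", 0xB000), ("B1", 0xB100), ("C", 0xC000)]
table_memory, entry_offsets = build_program_table(control_info, table_base=0x1000)
```

The returned offsets correspond to what the driver would later pass to the GPU as program table parameters alongside the table base address.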

In one possible implementation manner, the sending module is further configured to send, to the GPU, a program table parameter of a first PC corresponding to a first shader program of the N shader programs, where the program table parameter includes: table base address and offset.

In another possible implementation, the first shader program is a parent program, and the offset includes a start offset that indicates an offset of the parent program from a base address of the table in the program table, and the parent program can be used to call other shader programs of the N shader programs.

In another possible implementation manner, the first shader program is a first subprogram, the offset includes a subprogram offset, the program table parameter further includes a step size, the subprogram offset is used to indicate an offset of a second subprogram from a table base address in the program table, the first subprogram is a program called by a parent program in the N shader programs, and the second subprogram is a base subprogram used to determine an address of a first PC corresponding to the first subprogram in the program table; the sending module is further configured to send variable information of the first subprogram to the GPU, where the variable information includes variables in the first subprogram and storage resources occupied by the variables, where the variables include an index variable, and the index variable is used to indicate an offset of the first subprogram with respect to the second subprogram in the program table.

In a fourth aspect, a GPU shader program iterative call method is provided, including: the GPU obtains a program table from a GPU driver, wherein the program table stores N PCs corresponding to N shader programs of a first pipeline, the PC corresponding to each shader program in the N shader programs is used for pointing to each shader program, and N is a positive integer; determining a first PC corresponding to a first shader program in the N shader programs in the N PCs according to the program table; calling a first shader program according to the first PC.

In this technical solution, the program table records the PCs corresponding to the N shader programs. When a first shader program needs to be called, only the first PC of the first shader program needs to be obtained from the program table, and the instruction code of the first shader program is then fetched according to the first PC. Iterative calling of shader programs can therefore be implemented without greatly changing the GPU architecture, there is no need to compile all possible programs into one complex program, and calling of shader programs is more flexible.

In one possible implementation, before determining the first PC to which the first shader program corresponds, the method further includes: acquiring program table parameters corresponding to the first PC from a GPU driver, wherein the program table parameters comprise: table base address and offset; and determining the address of the first PC in the program table according to the program table parameters, and querying the program table by the address to determine the first PC.

In another possible implementation, the first shader program is a parent program, and the offset includes a start offset, the start offset is used for indicating an offset of the parent program relative to a base address of a table in a program table, and the parent program can be used for iteratively calling other shader programs in the N shader programs; determining the address of the first PC in the table according to the program table parameters comprises: the address of the first PC in the table is determined based on the table base address and the start offset.

In another possible implementation manner, the first shader program is a first subprogram, the offset includes a subprogram offset, the program table parameter further includes a step size, the subprogram offset is used to indicate an offset of a second subprogram from a table base address in the program table, the first subprogram is a shader program called by a parent program in the N shader programs, and the second subprogram is a base subprogram used to determine an address of a first PC corresponding to the first subprogram in the program table; determining the address of the first PC in the program table according to the program table parameters comprises: acquiring variable information of the first subprogram from the GPU driver, wherein the variable information comprises variables in the first subprogram and storage resources occupied by the variables, and the storage resources comprise register resources or global storage resources, wherein the variables comprise an index variable which is used for indicating the offset of the first subprogram relative to the second subprogram in the program table; and determining the address of the first PC in the program table according to the table base address, the subprogram offset, the index variable and the step size.

Through the table base address of the program table, and the dynamically changed offset, index variable and step length parameter in the program running process, the GPU can conveniently determine the PC of the parent program or the subprogram from the program table, and the first shader program is called according to the PC, so that the flexibility of program calling is improved.

In another possible implementation manner, the method further includes: updating the value of the variable in the storage resource when the first subprogram is called.

In a fifth aspect, a GPU shader program compiling method is provided, including: compiling the N shader programs of the first pipeline into N binary executable files by a compiler, wherein N is a positive integer; allocating storage resources to variables in the N shader programs, wherein values of variables of a first shader program in the N shader programs are acquired or updated when the GPU calls the first shader program, and the storage resources include register resources or global storage resources; and sending control information of the N shader programs to a GPU driver, wherein the control information comprises a program counter PC of each shader program in the N shader programs and variable information of each shader program, and the variable information comprises variables in each shader program and storage resources occupied by the variables.

In one possible implementation, allocating storage resources for variables of the N shader programs includes: determining that a variable exists in at least two shader programs of the N shader programs, the at least two shader programs including a first shader program; storage resources are allocated for variables present in at least two shader programs.

For a variable that is the same in different shader programs, the compiler makes the variable reuse the same storage resource, which saves register resources in the GPU and relieves register pressure.

In another possible implementation, the method includes: when the first field of the variable has the same value in at least two shader programs, determining that the variable exists in at least two shader programs of the N shader programs.

In another possible implementation, allocating storage resources for the variables of the N shader programs includes: determining that the storage resources occupied by the variables are greater than a threshold; global storage resources are allocated for the variable in global storage.

When the register resource needed by the variable is large, the compiler allocates the resource in the global storage to the variable, and the stress on the register resource is further relieved.

In a sixth aspect, a method for creating a program table of a GPU shader program is provided, including: the GPU driver receives control information of N shader programs sent by a compiler, the control information comprises a program counter PC of each shader program in the N shader programs and variable information of each shader program, the variable information comprises variables of each shader program in the N shader programs and storage resources occupied by the variables, and N is a positive integer; establishing a program table according to the PC of each shader program, wherein the program table records N PCs corresponding to the N shader programs; the program table is sent to the graphics processor GPU.

In one possible implementation, the method further includes: sending program table parameters of a first PC corresponding to a first shader program in the N shader programs to the GPU, wherein the program table parameters comprise: table base address and offset.

In another possible implementation, the first shader program is a parent program, and the offset includes a start offset that indicates an offset of the parent program from a table base address in a table of programs, the parent program being capable of iteratively calling other shader programs of the N shader programs.

In another possible implementation manner, the first shader program is a first subprogram, the offset includes a subprogram offset, the program table parameter further includes a step size, the subprogram offset is used to indicate an offset of a second subprogram from a table base address in the program table, the first subprogram is a shader program called by a parent program in the N shader programs, and the second subprogram is a base subprogram used to determine an address of a first PC corresponding to the first subprogram in the program table; the method further comprises: sending variable information of the first subprogram to the GPU, wherein the variable information comprises variables in the first subprogram and storage resources occupied by the variables, and the storage resources comprise register resources or global storage resources, wherein the variables comprise an index variable which is used for indicating the offset of the first subprogram relative to the second subprogram in the program table.

In a seventh aspect, a computer-readable medium is provided, which stores program code for execution by a device, where the program code includes instructions for performing the GPU shader program iteration calling method of the fourth aspect or any one of the implementations of the fourth aspect.

In an eighth aspect, a computer-readable medium is provided that stores program code for execution by a device, the program code comprising instructions for performing the GPU shader program compilation method of the fifth aspect or any one of the implementations of the fifth aspect.

In a ninth aspect, a computer-readable medium is provided, which stores program code for execution by a device, the program code comprising instructions for performing the program table creation method of the GPU shader program of the sixth aspect or any one of the implementations of the sixth aspect.

In a tenth aspect, there is provided a computer program product comprising: computer program code for causing a GPU to perform the GPU shader program iteration call method of the fourth aspect or any one of the implementations of the fourth aspect when the computer program code is run on the GPU.

In an eleventh aspect, there is provided a computer program product comprising: computer program code for causing a computer device to perform the GPU shader program compilation method of the fifth aspect or any one of the implementations of the fifth aspect when the computer program code is run on the computer device.

In a twelfth aspect, there is provided a computer program product comprising: computer program code for causing a computer device to perform the program table creation method of the GPU shader program of the sixth aspect or any one of the implementations of the sixth aspect when the computer program code runs on the computer device.

In a thirteenth aspect, a chip is provided, where the chip includes a processor and a data interface, the processor includes a GPU, and the processor reads instructions stored in a memory through the data interface and executes the GPU shader program iterative call method of the fourth aspect or any one of the implementations of the fourth aspect.

Optionally, as an implementation manner, the chip may further include a memory in which instructions are stored, and the processor is configured to execute the instructions stored in the memory; when the instructions are executed, the processor performs the GPU shader program iteration calling method of the fourth aspect or any one of the implementations of the fourth aspect.

In a fourteenth aspect, a graphics processing system is provided, comprising the GPU of the first aspect or any one of the implementations of the first aspect, the compiler of the second aspect or any one of the implementations of the second aspect, and the GPU driver of the third aspect or any one of the implementations of the third aspect.

Drawings

FIG. 1 is a schematic diagram of shader program iterative invocation;

FIG. 2 is a flowchart illustrating an example of a method for iterative invocation of a GPU shader program;

FIG. 3 is a block diagram of a framework for implementing an iterative GPU shader program invocation method according to an embodiment of the present application;

FIG. 4 is a schematic structural diagram of a GPU according to an embodiment of the present disclosure;

FIG. 5 is a software framework diagram of a compiler according to an embodiment of the present application;

FIG. 6 is a software framework diagram of a driver according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application. It is to be understood that the described embodiments are some, rather than all, of the embodiments of this application.

Ray tracing technology uses algorithms to simulate the physical behavior of light rays in the real world, and can achieve physically accurate shadows, reflection, refraction, and global illumination, so that a 3D picture looks more realistic. Existing ray tracing standards, such as DXR ray tracing and Vulkan NV/KHR ray tracing, require multiple shader programs on a pipeline to be called iteratively. FIG. 1 is a schematic diagram of shader program iterative invocation. As shown in FIG. 1, shader A runs on three threads of a graphics processing unit (GPU), each of which can dynamically call a different shader program. For example, in thread 1, shader A may call shader B0, and thread 1 may also call shader C0 while executing the code of shader B0. When thread 1 finishes executing shader C0, it continues to execute the remaining code of shader B0, followed by the remaining code of shader A. In thread 2, shader A may call shader B1, which may in turn call shader C1 during its execution. After thread 2 finishes executing the code of shader C1, it continues to execute the remaining code of shader B1, and then the remaining code of shader A. In thread 3, shader A may call shader B2, which may in turn call shader C1 during its execution. After thread 3 finishes executing the code of shader C1, it continues to execute the remaining code of shader B2, and then the remaining code of shader A.
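The call-and-return order described for each thread can be traced with a small sketch. It is a plain software simulation for illustration only and says nothing about how a GPU implements the calls; the function `run_shader` and the per-thread mapping `callee_of` are hypothetical names introduced here.

```python
from typing import Dict, List


def run_shader(name: str, callee_of: Dict[str, str], trace: List[str]) -> None:
    """Record the order in which shader code segments execute on one thread."""
    trace.append(f"{name}: begin")
    callee = callee_of.get(name)
    if callee is not None:
        # The caller is suspended while the callee runs to completion.
        run_shader(callee, callee_of, trace)
        # Control returns to the remaining code of the caller.
        trace.append(f"{name}: resume")
    trace.append(f"{name}: end")


# Thread 1 in FIG. 1: shader A calls shader B0, which calls shader C0.
thread1_trace: List[str] = []
run_shader("A", {"A": "B0", "B0": "C0"}, thread1_trace)

# Thread 3 in FIG. 1: shader A calls shader B2, which calls shader C1.
thread3_trace: List[str] = []
run_shader("A", {"A": "B2", "B2": "C1"}, thread3_trace)
```

Each thread supplies its own mapping, which mirrors the point that the callee is chosen dynamically per thread rather than fixed at compile time.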

When the GPU is running (e.g., invoking shader programs as described above), it requires the cooperation of the compiler and the GPU driver. The compiler is tool software that compiles the source code of shader programs into binary GPU executable files; it is independent of the GPU, runs in the development environment of a computer device, and provides code compiling services for the GPU. The GPU driver is driver software that contains hardware information of the GPU; it runs in the computer device in which the GPU is installed and controls communication between the GPU and other hardware or software of the computer device. Illustratively, a GPU is installed in a computer device; compiler software runs on a CPU of the computer device and compiles source code for the GPU; GPU driver software runs on the CPU of the computer device and configures communication between the CPU and the GPU; and the GPU obtains the code compiled by the compiler and the control information output by the GPU driver, and calls shader programs to implement image processing functions. The CPU and the GPU may be separate components or may be integrated. For example, in the following embodiments the CPU may be included in the GPU, and the GPU in the following embodiments may be regarded in a broad sense as a processor architecture including various processing functions, rather than only a GPU core in the narrow sense.

The OptiX scheme can implement iterative invocation of shader programs without changing the GPU hardware. However, the scheme compiles all programs that might need to be called, which requires more resources such as registers and increases program complexity. In view of this, embodiments of the present application provide a GPU shader program iterative call method, which can implement dynamic iterative calling of shader programs without greatly changing the GPU architecture, and reduces program complexity.

Fig. 2 is a flowchart illustrating a method for dynamic iterative invocation of a GPU shader program according to an embodiment of the present application. Fig. 3 is a framework diagram for implementing an iterative invocation method of a GPU shader program according to an embodiment of the present application.

As shown in FIG. 2, the method includes steps S210 to S230. The dynamic iterative invocation method for a GPU shader program according to the embodiment of the present application is described in detail below with reference to FIG. 2 and FIG. 3.

S210, the compiler compiles the N shader programs into N binary executable files.

The application program may include a plurality of shader programs in a rendering pipeline (e.g., a first pipeline), where each shader program corresponds to a different stage of graphics processing and is executed at that stage to implement a certain function. As shown in FIG. 3, the GPU driver may invoke the GPU compiler to compile the relevant shader programs when creating the rendering pipeline. Illustratively, the first pipeline includes 4 shader programs: shader A, shader B0, shader B1, and shader C. The GPU compiler compiles the 4 shader programs into 4 binary GPU executable files, and outputs control information of each shader program to the GPU driver. The control information of a shader program includes the program counter (PC) of the program; the PC is a pointer that can point to the code of the next instruction of the program.

In other embodiments, the control information may also include variable information for the shader program.

All shader programs in the first pipeline are compiled into GPU executable files in advance before program calling, and instruction codes of corresponding programs can be directly obtained according to PCs of the programs when the corresponding programs are called, so that the compiling complexity during program calling is reduced.

In some embodiments, each shader program may include a program identifier, and the GPU compiler may determine that the shader programs belong to the same rendering pipeline according to the program identifier. For example, the first pipeline includes the above 4 shader programs, program identifiers of the 4 shader programs are "first", "second", "third", and "last", respectively, and according to the program identifiers, the compiler may determine that the 4 shader programs belong to the first pipeline, and may perform iterative call on the 4 shader programs in the first pipeline.

It should be understood that the program identifier may be in other forms, and the embodiment of the present application is not limited thereto.

In some embodiments, when the compiler compiles the N shader programs into the binary executable file, a storage resource may be further allocated for a variable in the N shader programs, the variable is stored in a register or a global storage of the GPU, and the compiler may output the above-mentioned variable information including the variable and the storage resource occupied by the variable to the GPU driver.

In some embodiments, when compiling the N shader programs of the first pipeline, the GPU compiler may allocate the same storage resource (e.g., a first register resource) to the same variable (e.g., a first variable) in different shader programs.

For example, shader B0 includes a variable var1, and the GPU compiler allocates a register resource r1 for var1 when compiling shader B0. Shader B1 includes a variable var1 and a variable var2. When the GPU compiler compiles shader B1, since shader B1 and shader B0 include the same variable var1, the compiler may also allocate the register resource r1 to var1 of shader B1. For the variable var2 of shader B1, the compiler allocates a register resource r2. After the compilation is completed, the GPU compiler can output the variable information to the GPU driver, and the GPU driver further outputs the control information to the GPU, so that each shader program can be correctly called when the GPU runs.

In some embodiments, the GPU compiler may determine the same variable in the N shader programs according to the value of a first field. For example, rayPayload is a storage type for variables, and variables of this type are distinguished by the value of a first field, such as a location field. Illustratively, the value of the location field of var1 of shader B0 is 1, and the value of the location field of var1 of shader B1 is also 1, so the compiler determines that the two are the same variable and can reuse the same register resource r1. As another example, the value of the location field of var2 of shader B1 is 2, so var2 is a different variable from var1, and the compiler allocates a separate register resource r2 to var2.

In other embodiments, the GPU compiler may determine the occupied resources of the variable according to the variable size. For example, when the number of registers a variable (e.g., a first variable) needs to occupy exceeds a threshold, the compiler places the variable into global storage and outputs information of the variable to the GPU driver.

By multiplexing the same resources for the same variables and storing the variables occupying more registers in the global storage, the stress of the register resources in the GPU can be relieved.
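The allocation policy described above can be sketched as follows. This is only an illustrative model, not the compiler's actual implementation: the `Variable` class, the `allocate_storage` function, the resource names such as `global0`, and the threshold value of 4 registers are all assumptions introduced for the example.

```python
from dataclasses import dataclass

# Hypothetical threshold: a variable needing more registers than this
# is placed in global storage instead of registers.
REGISTER_THRESHOLD = 4


@dataclass(frozen=True)
class Variable:
    name: str
    location: int       # value of the "first field" (e.g. the location field)
    num_registers: int  # registers the variable would need


def allocate_storage(shaders):
    """Assign one storage resource per location value, shared by all shaders.

    `shaders` maps a shader name to a list of Variable objects.
    Returns {shader name: {variable name: resource name}}.
    """
    by_location = {}  # location value -> resource, reused across shaders
    next_register = 1
    next_global = 0
    result = {}
    for shader_name, variables in shaders.items():
        result[shader_name] = {}
        for var in variables:
            if var.location not in by_location:
                if var.num_registers > REGISTER_THRESHOLD:
                    # Large variable: keep it out of the register file.
                    by_location[var.location] = f"global{next_global}"
                    next_global += 1
                else:
                    by_location[var.location] = f"r{next_register}"
                    next_register += var.num_registers
            result[shader_name][var.name] = by_location[var.location]
    return result


allocation = allocate_storage({
    "shader B0": [Variable("var1", location=1, num_registers=1)],
    "shader B1": [
        Variable("var1", location=1, num_registers=1),
        Variable("var2", location=2, num_registers=1),
    ],
})
print(allocation)
```

In this sketch var1 of shader B0 and var1 of shader B1 share r1 because their location values match, while var2 receives r2, mirroring the example in the text.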

S220, the GPU driver establishes a program table, and the GPU determines a first PC corresponding to the first shader program according to the program table.

In some embodiments, the first PC corresponding to the first shader program may be determined from a program table. After the GPU compiler compiles the N shader programs into a binary executable file, the GPU driver may create or update a program table according to the control information output by the GPU compiler. The program table includes information of the PC corresponding to each shader program in the first pipeline. Table 1 is an example of a program table of the embodiment of the present application.

TABLE 1

Shader program	PC
shader A	pc0
shader C	pc1
shader B0	pc2
shader B1	pc3

As shown in Table 1, the PC corresponding to shader A is pc0, the PC corresponding to shader C is pc1, the PC corresponding to shader B0 is pc2, and the PC corresponding to shader B1 is pc3. Once the PC of the first shader program is determined, the instructions of that program can be read. For example, if the first PC is pc2, the instructions of shader B0 can be read according to the pointer pc2.

As described in step S210, after the GPU compiler finishes compiling the plurality of shader programs, the GPU compiler may output control information to the GPU driver, for example, a PC corresponding to each shader program may be output. The GPU driver establishes or updates the program table shown in table 1 according to the control information of the shader program output by the GPU compiler, so that the GPU can perform dynamic iterative invocation of the shader program according to the program table when running. For example, the compiler may also output, to the GPU driver, variable information of the N shader programs, the variable information including variables in the N shader programs and occupied storage resources of the variables.

In some embodiments, the GPU driver outputs the parameters of the program table to the GPU, and the GPU may obtain the PCs corresponding to each shader program in the table according to the parameters of the program table. Illustratively, the table parameters include a table base address (table address), an offset (offset). In other embodiments, the program table parameters also include a step size (stride).

The above-mentioned program table parameters may be obtained from the application program by the GPU driver and stored in the memory of the device. In some embodiments, these parameters may be retrieved from device memory in advance and stored in on-chip memory of the GPU for optimization of multi-threaded accesses.

It should be understood that the column 1 in the above table is for more clearly showing the corresponding relationship between different shader programs and the PC of the shader program, and in practical applications, only the content in the column 2 of the above table 1 may be stored.

It should also be understood that other control information may also be stored in the program table, which is not limited in this embodiment of the application.
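As a minimal sketch of how a driver might build such a table from the compiler's control information, consider the following. The PC values, the `control_info` layout, and the `build_program_table` helper are hypothetical and chosen only for illustration; the table order follows Table 1, and only the PC column is stored.

```python
# Hypothetical control information, as a compiler might report it.
# The PC values are made-up instruction addresses.
control_info = {
    "shader A": {"pc": 0x1000},
    "shader B0": {"pc": 0x1400},
    "shader B1": {"pc": 0x1800},
    "shader C": {"pc": 0x1C00},
}

# Row order of Table 1.
TABLE_ORDER = ["shader A", "shader C", "shader B0", "shader B1"]


def build_program_table(control_info, order):
    """Return the PCs in table order; shader names are not stored."""
    return [control_info[name]["pc"] for name in order]


program_table = build_program_table(control_info, TABLE_ORDER)
print([hex(pc) for pc in program_table])
```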

The following describes the process of determining the first PC corresponding to the first shader program according to the program table.

In some embodiments, the first shader program is a parent program. The parent program is the initial program when the GPU performs some image processing task in the first pipeline. The GPU will call other shader programs in the first pipeline in this parent program. The PC address of the parent program may be determined from the table base address and a start offset indicating the amount of offset of the parent program in the table relative to the table base address.

Parent program PC address = table base address + start offset

As shown in Table 1, when the start offset is 0, the parent program (start program) of the first pipeline is shader A, and the other shader programs of the first pipeline are called within shader A. According to the parent program PC address, the PC of shader A, pc0, can be read from Table 1, and the instruction code of shader A is then read according to pc0.

As another example, when the start offset is 1, the parent program (start program) of the first pipeline is shader C, and the other shader programs of the first pipeline are called within shader C. According to the parent program PC address, the PC of shader C, pc1, can be read from the table, and the instruction code of shader C is then read according to pc1.
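The parent program lookup can be sketched as below. The sketch assumes that addresses and offsets are counted in table entries, that device memory can be modelled as a dictionary, and that the table base address is 100; these, together with the `parent_pc` function name, are illustrative assumptions rather than details from the embodiment.

```python
TABLE_BASE = 100  # hypothetical table base address, in units of table entries

# Device memory modelled as {address: stored PC}, laid out as in Table 1.
memory = {
    TABLE_BASE + 0: "pc0",  # shader A
    TABLE_BASE + 1: "pc1",  # shader C
    TABLE_BASE + 2: "pc2",  # shader B0
    TABLE_BASE + 3: "pc3",  # shader B1
}


def parent_pc(memory, table_base, start_offset):
    """Parent program PC address = table base address + start offset."""
    address = table_base + start_offset
    return memory[address]


print(parent_pc(memory, TABLE_BASE, 0))  # start offset 0 selects shader A
print(parent_pc(memory, TABLE_BASE, 1))  # start offset 1 selects shader C
```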

In some embodiments, the first shader program is a first subprogram, and the first subprogram is a shader program called by the parent program. While the parent program runs, the GPU acquires the PC address of the first subprogram called by the parent program, reads the PC value of the first subprogram from the table according to that address, and acquires the instruction code of the first subprogram.

Illustratively, the GPU driver issues an image rendering task to the GPU that uses shader A, shader B0, and shader B1. The task uses shader A as the parent program and iteratively calls shader B0 and shader B1. For example, shader A is used to implement the function of intersecting rays with different objects, where the shader programs that the objects need to call are shader B0 and shader B1. Shader B0 and shader B1 are subprograms of shader A.

When the GPU runs, the address of pc0 in the program table is first calculated according to the table base address and the start offset, pc0 is read according to that address, and the instruction code of shader A is then read according to pc0. When the first subprogram is called, a subprogram start address is determined. The subprogram start address is the address, in the program table, of the PC corresponding to a certain subprogram (for example, a second subprogram) among the subprograms called by the parent program, and the address in the program table of the PC of the first subprogram that currently needs to be called is calculated with the address of the PC of the second subprogram as a reference. For example, the subprograms called by the parent program shader A are shader B0 and shader B1. The second subprogram may be shader B0, in which case the subprogram start address is the address of the PC of shader B0 in the program table; alternatively, the second subprogram may be shader B1, in which case the subprogram start address is the address of the PC of shader B1 in the program table. The PC addresses of the other subprograms (e.g., the first subprogram currently being called) may be calculated with the PC address of the second subprogram as a reference.

Taking shader B0 as the second subprogram as an example, so that the subprogram start address is the PC address of shader B0: when each object calls a shader program, the GPU obtains an index variable (hitIndex) from the GPU driver. The value of the index variable may change dynamically as the program runs, and the index variable represents the offset of the first subprogram relative to the second subprogram in the program table. The PC address of the first subprogram is calculated as follows:

First subprogram PC address = subprogram start address + index × step size

where the subprogram start address = table base address + subprogram offset, and the subprogram offset is the offset of the second subprogram relative to the table base address.

It should be understood that the first subprogram and the second subprogram may be the same subprogram. Illustratively, the PC addresses of the subprograms are calculated with shader B0 as the reference. In this case, the subprogram start address is the table base address + the offset of shader B0. When the index hitIndex is 0, the PC address of the first subprogram is the PC address of shader B0, the PC of shader B0 read from that address is pc2, and the instruction code of shader B0 is obtained according to pc2. In this example, both the first subprogram and the second subprogram are shader B0.

When the index hitIndex is 1, the PC address of the first subprogram is the PC address of shader B1, the PC of shader B1 read from that address is pc3, and the instruction code of shader B1 is obtained according to pc3.
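The two hitIndex cases above can be reproduced with a short sketch. It assumes that addresses are counted in table entries, that the step size is 1 (the entries of Table 1 are adjacent), and that the table base address is 100; the `subprogram_pc` function and the dictionary model of memory are likewise hypothetical.

```python
TABLE_BASE = 100  # hypothetical table base address, in units of table entries

# Device memory modelled as {address: stored PC}, laid out as in Table 1.
memory = {
    TABLE_BASE + 0: "pc0",  # shader A
    TABLE_BASE + 1: "pc1",  # shader C
    TABLE_BASE + 2: "pc2",  # shader B0
    TABLE_BASE + 3: "pc3",  # shader B1
}

SHADER_B0_OFFSET = 2  # offset of the reference (second) subprogram in the table


def subprogram_pc(memory, table_base, subprogram_offset, hit_index, stride=1):
    """Look up the PC of the subprogram selected by hit_index.

    subprogram start address = table base address + subprogram offset
    first subprogram PC address = subprogram start address + index * stride
    """
    start_address = table_base + subprogram_offset
    return memory[start_address + hit_index * stride]


print(subprogram_pc(memory, TABLE_BASE, SHADER_B0_OFFSET, hit_index=0))
print(subprogram_pc(memory, TABLE_BASE, SHADER_B0_OFFSET, hit_index=1))
```

With shader B0 as the reference, hitIndex 0 resolves to pc2 and hitIndex 1 resolves to pc3, so the subprogram to call is chosen at run time without changing the table.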

When the GPU acquires the value of the index variable from the GPU driver, the storage resource corresponding to the index variable can be acquired, and when the value of the index variable changes in the program running process, the GPU can update the value of the index variable in the storage resource of the index variable. For example, in the above embodiment, when the value of the hitIndex changes from 0 to 1, the GPU may update the value of the hitIndex in the storage resource corresponding to the hitIndex.

It should be understood that the subprogram start address may be the PC address of any subprogram called by the parent program. The subprogram start address indicates that the PC addresses of the subprograms are calculated with the PC address of a certain subprogram as a reference; for each other subprogram, the value of the index variable is generated dynamically according to its offset from the reference subprogram, and the corresponding PC address is obtained.

S230, the GPU calls the first shader program according to the first PC.

After the PC of the first program (the first PC) is acquired, the instruction code of the first program may be acquired according to the PC. While the GPU runs the program, the variables involved in the program are updated.

When the GPU calls the first shader program, the GPU may update the variable of the first shader program according to the variable information of the first shader program output by the GPU driver.

For example, the GPU calls shader B0. In step S210, the GPU compiler allocated the register r1 for the variable var1, so the contents of r1 are updated when shader B0 runs.

For example, the GPU calls shader B1. In step S210, the GPU compiler allocated the registers r1 and r2 for the variables var1 and var2, respectively, so the contents of r1 and r2 are updated when shader B1 runs.
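The effect of sharing r1 between the two shader programs can be illustrated with a toy register file. The values written, the `write_variable` and `read_variable` helpers, and the dictionary layout of the variable information are assumptions made for this sketch; real shader code would access hardware registers directly.

```python
# Toy register file; r1 and r2 are the registers from the example in the text.
registers = {"r1": 0, "r2": 0}

# Variable information as the driver might forward it to the GPU.
variable_info = {
    "shader B0": {"var1": "r1"},
    "shader B1": {"var1": "r1", "var2": "r2"},
}


def write_variable(shader, name, value):
    """Store a value in the resource allocated to the shader's variable."""
    registers[variable_info[shader][name]] = value


def read_variable(shader, name):
    """Load the value from the resource allocated to the shader's variable."""
    return registers[variable_info[shader][name]]


# Running shader B0 updates r1 (the value 7 is arbitrary).
write_variable("shader B0", "var1", 7)

# Shader B1 sees the same var1 through r1 and updates r2 for var2.
seen = read_variable("shader B1", "var1")
write_variable("shader B1", "var2", seen + 1)
print(registers)
```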

Fig. 4 is a schematic diagram of a hardware structure of a GPU according to an embodiment of the present application. As shown in fig. 4, the GPU of the embodiment of the present application includes an obtaining module 410 and a calling module 420. The obtaining module 410 and the calling module 420 are circuit modules in the GPU. For example, the obtaining module 410 is an interface through which the GPU interacts with other software modules or hardware modules, and the calling module 420 is a specific execution unit. The GPU may include multiple cores. For example, the calling module 420 may be the execution body of the scheme and may include one or more image processing cores, and may optionally further include a central processing unit (CPU), a microprocessor, a microcontroller, a logic gate circuit, or the like. The CPU, microprocessor, microcontroller, or logic gate circuit can cooperate with the image processing cores under the control of the GPU driver. The GPU can therefore be understood in a broad sense as a circuit architecture for image processing that includes hardware with various functions, rather than in the narrow sense of an image processing core alone, and it may optionally be integrated in a system on chip (SoC).

The obtaining module 410 is configured to obtain a program table from the GPU driver, where N PCs corresponding to N shader programs are recorded in the program table, where a PC corresponding to each shader program in the N shader programs is used to point to each shader program.

The calling module 420 is configured to determine a first PC corresponding to the first shader program according to the program table, and call the first shader program according to the first PC. In some embodiments, after calling the shader program, the calling module 420 may also be used to update the variables of the first shader program.

It should be understood that the GPU 400 of the embodiment of the present application is equivalent to the GPU in the frame diagram described in fig. 3, and implements the corresponding functions of steps S220 and S230 in the method 200 shown in fig. 2.

Fig. 5 is a software framework diagram of a compiler according to an embodiment of the present application. As shown in fig. 5, the software framework of the compiler of the embodiment of the present application includes a compiling module 510, an allocating module 520, and a sending module 530. The compiler is tool software which is independent of the GPU and can run in a software development environment, and the compiler is mainly used for compiling source codes of shader programs into binary files which can be executed by the GPU.

A compiling module 510, configured to compile the N shader programs into N binary executable files.

An allocating module 520, configured to allocate a storage resource for a variable in the N shader programs, so that the GPU obtains or updates a value of the variable when calling a first shader program in the N shader programs. Specifically, in some embodiments, the allocating module 520 may allocate the same storage resource to the same variable in multiple shader programs. In other embodiments, for a variable that occupies a larger amount of storage resources, the allocating module 520 may allocate the variable to the global storage.

A sending module 530, configured to send control information of the N shader programs to the GPU driver, where the control information includes a PC of each shader program and variable information of each shader program, where the variable information includes variables in each shader program and storage resources occupied by the variables.

The compiler 500 of the embodiment of the present application is equivalent to a GPU compiler in the framework shown in fig. 3, and can be used to implement the corresponding functions of step S210 in the method 200 shown in fig. 2. For the sake of brevity, the detailed functions of the modules may be referred to in the description of the method 200, and are not described herein again.

Fig. 6 is a software framework diagram of a GPU driver according to an embodiment of the present application. As shown in fig. 6, the GPU driver of the embodiment of the present application includes a receiving module 610, a managing module 620, and a transmitting module 630.

The receiving module 610 is configured to receive control information of the N shader programs sent by the compiler, where the control information includes a PC corresponding to each shader program and variable information of each shader program, where the variable information includes a variable of each shader program in the N shader programs and a storage resource occupied by the variable.

The management module 620 is configured to establish a program table according to the PC of each shader program, where the program table records the PC corresponding to each shader program.

A sending module 630, configured to send the program table to the GPU. In some embodiments, the sending module 630 may also send the variable information of the first shader program to the GPU so that the GPU acquires or updates the value of the variable in the shader program when invoking the first shader program.

The driver 600 of the embodiment of the present application corresponds to the GPU driver in the framework shown in fig. 3, and each module in the driver 600 can implement the corresponding function of the GPU driver in each step of the method 200 shown in fig. 2. For brevity, no further description is provided.

Embodiments of the present application further provide a computer-readable medium, which stores a computer program (also referred to as code, or instructions), which when executed on the GPU, causes the GPU to execute the shader program calling method in steps S220 and S230 described above.

Embodiments of the present application further provide a computer-readable medium storing a computer program (also referred to as code, or instructions), which when executed on a computer device, causes the computer device to perform any of the GPU shader program compiling methods in step S210.

The present embodiment also provides a computer-readable medium, which stores a computer program (also referred to as code, or instructions), and when the computer program runs on a computer device, the computer device executes the program table establishment method in step S220.

The embodiment of the present application further provides a chip system, which includes a memory and a processor, where the memory is used to store a computer program, and the processor includes the GPU circuit architecture in the embodiment of the present application in a broad sense, and is used to call and run the computer program from the memory, so that a computer device in which the chip system is installed executes the shader program calling method described in any method 200 above.

The system-on-chip may include, among other things, input circuitry or interfaces for transmitting information or data, and output circuitry or interfaces for receiving information or data.

An embodiment of the present application further provides a graphics processing system, including: the GPU, compiler, and driver in the above embodiments.

In the above embodiments, the shader program compiling method implemented by the compiler in step S210 and the program table establishing method implemented by the GPU driver in step S220 may be implemented by software, and the shader program calling method implemented by the GPU in steps S220 and S230 may be implemented by software, hardware, firmware, or any combination thereof. When implemented in software, the methods may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the application are generated in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on a storage medium or transmitted from one storage medium to another storage medium, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wire (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a Digital Video Disk (DVD)), or a semiconductor medium (e.g., a Solid State Disk (SSD)), among others.

It should be appreciated that reference throughout this specification to "some embodiments" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in some embodiments" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.

Those of ordinary skill in the art will appreciate that the various illustrative steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the several embodiments provided in the present application, it is understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and in actual implementation, there may be other divisions, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not implemented.

The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.

In addition, functional modules in the embodiments of the present application may be integrated into one processing unit, or each module may exist alone physically, or two or more modules are integrated into one unit.

The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

A Graphics Processor (GPU), comprising:

an obtaining module, configured to obtain a program table from a GPU driver, where the program table records N program counters PC corresponding to N shader programs, where a PC corresponding to each shader program in the N shader programs is used to point to each shader program, and N is a positive integer;

and the calling module is used for determining a first PC corresponding to a first shader program in the N shader programs in the N PCs according to the program table and calling the first shader program according to the first PC.
The GPU of claim 1, wherein, prior to the determining of the first PC, the obtaining module is further configured to: acquire program table parameters corresponding to the first PC from the GPU driver, wherein the program table parameters comprise a table base address and an offset;

the calling module is specifically configured to: and determining the address of the first PC in the program table according to the program table parameters, querying the program table by the address, and determining the first PC.
A GPU as claimed in claim 2, wherein the first shader program is a parent program, the offset comprising a starting offset to indicate an offset of the parent program in the program table relative to the table base address, the parent program being usable to iteratively invoke other ones of the N shader programs;

the calling module is specifically configured to:

and determining the address of the first PC in the program table according to the table base address and the starting offset.
A GPU according to claim 2 or 3, wherein the first shader program is a first subroutine, the offset comprises a subroutine offset, the program table parameter further comprises a stride, the subroutine offset is used to indicate an offset of a second subroutine from the table base address in the program table, the first subroutine is a shader program called by a parent program in the N shader programs, the second subroutine is a base subroutine used to determine an address of the first PC in the program table corresponding to the first subroutine;

the obtaining module is further configured to obtain, from a GPU driver, variable information of the first subprogram, where the variable information includes a variable in the first subprogram and a storage resource of the variable, and the storage resource includes a register resource occupied by the variable or a global storage resource, where the variable includes an index variable, and the index variable is used to indicate an offset of the first subprogram with respect to the second subprogram in the program table;

the calling module is specifically configured to:

and determining the address of the first PC in the program table according to the table base address, the subroutine offset, the index variable and the stride.
The GPU of claim 4, wherein the calling module is further configured to update the value of the variable stored in the storage resource when the first subroutine is called.
A compiler, comprising:

the compiling module is used for compiling the N shader programs into N binary executable files, wherein N is a positive integer;

an allocation module, configured to allocate a storage resource to a variable in the N shader programs, where a value of the variable of a first shader program in the N shader programs is obtained or updated when a graphics processor GPU calls the first shader program, and the storage resource includes a register resource or a global storage resource;

a sending module, configured to send control information of the N shader programs to a GPU driver, where the control information includes a program counter PC of each shader program of the N shader programs and variable information, and the variable information includes the variable and the storage resource in each shader program.
The compiler of claim 6, further comprising:

a determination module to determine that the variable is present in at least two shader programs of the N shader programs, the at least two shader programs including the first shader program;

the allocation module is specifically configured to allocate the storage resource to the variable existing in the at least two shader programs.
The compiler of claim 7, wherein the determination module is specifically configured to:

determining that the variable is present in the at least two shader programs when a first field of the variable has a same value in the at least two shader programs.
The compiler of any of claims 6-8, further comprising:

a determining module, configured to determine that the storage resource occupied by the variable is greater than a threshold;

the allocation module is further configured to allocate the global storage resource for the variable in a global storage.
A GPU driver, comprising:

a receiving module, configured to receive control information of N shader programs sent by a compiler, where the control information includes a program counter PC of each shader program in the N shader programs and variable information of each shader program, the variable information includes a variable of each shader program and a storage resource occupied by the variable, the storage resource includes a register resource or a global storage resource, and N is a positive integer;

the management module is used for establishing a program table according to the PC of each shader program, and the program table records N PCs corresponding to the N shader programs;

and the sending module is used for sending the program table to the GPU.
A GPU driver according to claim 10, wherein the sending module is further configured to send, to the GPU, program table parameters of a first PC corresponding to a first shader program of the N shader programs, the program table parameters including: table base address and offset.
A GPU driver as defined in claim 11, wherein the first shader program is a parent program, the offset comprising a starting offset to indicate an offset of the parent program in the program table relative to the table base address, the parent program being usable to iteratively invoke other ones of the N shader programs.
A GPU driver according to claim 11 or 12, wherein the first shader program is a first subroutine, the offset comprises a subroutine offset, and the program table parameters further comprise a stride; the subroutine offset indicates an offset of a second subroutine in the program table relative to the table base address; the first subroutine is a shader program called by a parent program among the N shader programs; and the second subroutine is a reference subroutine used to determine the address, in the program table, of the first PC corresponding to the first subroutine;

the sending module is further configured to send variable information of the first subroutine to the GPU, where the variable information includes a variable in the first subroutine and a storage resource occupied by the variable, the storage resource includes a register resource or a global storage resource, the variable includes an index variable, and the index variable indicates an offset of the first subroutine relative to the second subroutine in the program table.
A method for iteratively calling graphics processor (GPU) shader programs, comprising:

obtaining, by a GPU, a program table from a GPU driver, wherein the program table records N program counters (PCs) corresponding to N shader programs, the PC corresponding to each of the N shader programs points to that shader program, and N is a positive integer;

determining, according to the program table, a first PC among the N PCs that corresponds to a first shader program of the N shader programs;

and calling the first shader program according to the first PC.
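The iterative call chain described in the background (shader A calls shader B, which in turn calls shader C) can be sketched on top of such a table. This is a toy model under stated assumptions: the `CODE` dictionary stands in for instruction memory, Python functions stand in for shader binaries, and the `MAX_DEPTH` guard is an addition of this sketch, not something the claims require.

```python
MAX_DEPTH = 8  # assumed recursion limit, purely illustrative


def shader_a(call):
    return ["A"] + call(1)   # parent program calls the shader in table slot 1


def shader_b(call):
    return ["B"] + call(2)   # which in turn calls the shader in table slot 2


def shader_c(call):
    return ["C"]             # leaf shader: calls nothing further


# Hypothetical instruction memory (PC -> entry point) and program table (slot -> PC).
CODE = {0x100: shader_a, 0x200: shader_b, 0x300: shader_c}
PROGRAM_TABLE = [0x100, 0x200, 0x300]


def call_shader(slot, depth=0):
    """Look up the PC for a table slot, then jump to the shader it points to."""
    if depth >= MAX_DEPTH:
        raise RecursionError("shader call depth exceeded")
    pc = PROGRAM_TABLE[slot]
    return CODE[pc](lambda next_slot: call_shader(next_slot, depth + 1))


trace = call_shader(0)
```

Every call goes through the table rather than through a hard-coded address, which is the property that lets one shader program reach another whose location it does not know at compile time.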
The method of claim 14, wherein before the determining of the first PC corresponding to the first shader program, the method further comprises: obtaining, from the GPU driver, program table parameters corresponding to the first PC, wherein the program table parameters comprise a table base address and an offset;

and determining an address of the first PC in the program table according to the program table parameters, and querying the program table by the address to determine the first PC.
The method of claim 15, wherein the first shader program is a parent program, the offset comprising a starting offset that indicates an offset of the parent program in the program table relative to the table base address, the parent program being usable to iteratively invoke other shader programs of the N shader programs;

the determining the address of the first PC in the program table according to the program table parameters comprises:

and determining the address of the first PC in the program table according to the table base address and the starting offset.
The method of claim 15 or 16, wherein the first shader program is a first subroutine, the offset comprises a subroutine offset, and the program table parameters further comprise a stride; the subroutine offset indicates an offset of a second subroutine in the program table relative to the table base address; the first subroutine is a shader program called by a parent program among the N shader programs; and the second subroutine is a reference subroutine used to determine the address, in the program table, of the first PC corresponding to the first subroutine;

the determining the address of the first PC in the program table according to the program table parameters comprises:

obtaining variable information of the first shader program from the GPU driver, wherein the variable information comprises variables in the first shader program and storage resources occupied by the variables, the storage resources comprise register resources or global storage resources, and the variables comprise an index variable indicating an offset of the first subroutine relative to the second subroutine in the program table;

and determining the address of the first PC in the program table according to the table base address, the subroutine offset, the index variable and the stride.
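A minimal sketch of this address computation follows. The claim states only that the address is determined "according to" the four quantities; the specific formula used here, base + subroutine offset + index × stride with all quantities in bytes, is an assumption of this sketch, as are the function names and the memory contents.

```python
def subroutine_pc_address(table_base, subroutine_offset, index, stride):
    """Address of the table entry holding the called subroutine's PC.

    Assumed formula (not stated verbatim in the claims):
        address = table_base + subroutine_offset + index * stride
    """
    return table_base + subroutine_offset + index * stride


def read_pc(memory, address):
    """Query the program table, modelled as a dict of address -> PC."""
    return memory[address]


# Hypothetical table contents: 8-byte entries starting at 0x8000.
memory = {
    0x8000: 0x1000,  # parent program
    0x8010: 0x2000,  # second (reference) subroutine
    0x8018: 0x2400,
    0x8020: 0x2800,  # first subroutine, two entries past the reference
}

address = subroutine_pc_address(table_base=0x8000, subroutine_offset=0x10,
                                index=2, stride=8)
first_pc = read_pc(memory, address)
```

With an index of 0 the same computation yields the entry of the reference subroutine itself, so the index variable acts as a run-time selector among the subroutines laid out after it.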
The method of claim 17, further comprising:

updating the value of the variable stored in the storage resource when the first subroutine is called.
A shader program compiling method, comprising:

compiling, by a compiler, N shader programs into N binary executable files, wherein N is a positive integer;

allocating storage resources to variables in the N shader programs, values of the variables of a first shader program of the N shader programs being obtained or updated when a graphics processor GPU calls the first shader program, the storage resources including register resources or global storage resources;

sending control information of the N shader programs to a GPU driver, wherein the control information comprises a program counter PC of each shader program in the N shader programs and variable information of each shader program, and the variable information comprises variables in each shader program and the storage resources occupied by the variables.
The method of claim 19, wherein the allocating storage resources for the variables of the N shader programs comprises:

determining that the variable is present in at least two shader programs of the N shader programs, the at least two shader programs including the first shader program;

allocating the storage resources for the variables present in at least two shader programs.
The method of claim 20, wherein the determining that the variable is present in at least two shader programs of the N shader programs comprises:

determining that the variable is present in the at least two shader programs when a first field of the variable has a same value in the at least two shader programs.
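One way to picture this check is sketched below. The claim does not say what the first field is; here it is modelled as an integer tag attached to each variable declaration, which is purely an assumption, as are the shader and variable names.

```python
from collections import defaultdict


def find_shared_variables(shaders):
    """Group variables by the value of their first field.

    `shaders` maps a shader name to a list of (variable name, first-field value)
    pairs. A field value seen in at least two shader programs marks a variable
    that is treated as present in all of those programs.
    """
    users = defaultdict(set)
    for shader_name, variables in shaders.items():
        for _variable_name, field_value in variables:
            users[field_value].add(shader_name)
    return {value: sorted(names) for value, names in users.items()
            if len(names) >= 2}


# Made-up declarations: three shaders name the variable differently,
# but the first field (0) is the same in all of them.
shaders = {
    "raygen": [("payload", 0), ("seed", 3)],
    "closest_hit": [("hit_payload", 0)],
    "miss": [("miss_payload", 0), ("sky", 5)],
}
shared = find_shared_variables(shaders)
```

Matching on a field value rather than on the variable name lets separately compiled shader programs agree on a shared storage resource even when each names the variable differently.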
The method of any of claims 19-21, wherein said allocating storage resources for variables in the N shader programs comprises:

determining that the storage resources occupied by the variable are greater than a threshold;

and allocating the global storage resource for the variable in global storage.
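The threshold rule can be sketched as a simple placement function. The threshold value, the byte-based size measure, and the fallback to a register resource when the threshold is not exceeded are assumptions of this sketch; the claim itself only states the global-storage branch.

```python
REGISTER_THRESHOLD_BYTES = 16  # assumed threshold, illustrative only


def allocate_storage(variable_sizes, threshold=REGISTER_THRESHOLD_BYTES):
    """Choose a storage resource for each variable.

    A variable whose storage need is greater than the threshold is placed in
    global storage; otherwise this sketch assumes it gets a register resource.
    """
    return {
        name: "global" if size > threshold else "register"
        for name, size in variable_sizes.items()
    }


# Made-up variable sizes in bytes.
allocation = allocate_storage({"index": 4, "payload": 64, "color": 16})
```

Note that the comparison is strict, so a variable exactly at the threshold stays in registers in this sketch.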
A method for building a program table for graphics processor (GPU) shader programs, comprising:

receiving, by a GPU driver, control information of N shader programs sent by a compiler, wherein the control information comprises a program counter (PC) of each shader program in the N shader programs and variable information of each shader program, the variable information comprises variables of each shader program and storage resources occupied by the variables, the storage resources comprise register resources or global storage resources, and N is a positive integer;

establishing a program table according to the PC of each shader program, wherein the program table records N PCs corresponding to the N shader programs;

and sending the program table to a GPU.
The method of claim 23, further comprising:

sending, to the GPU, program table parameters of a first PC corresponding to a first shader program of the N shader programs, the program table parameters including: table base address and offset.
The method of claim 24, wherein the first shader program is a parent program, and the offset comprises a starting offset that indicates an offset of the parent program in the program table relative to the table base address, the parent program being usable to iteratively invoke other shader programs of the N shader programs.
The method of claim 24 or 25, wherein the first shader program is a first subroutine, the offset comprises a subroutine offset, and the program table parameters further comprise a stride; the subroutine offset indicates an offset of a second subroutine in the program table relative to the table base address; the first subroutine is a shader program of the N shader programs that is called by a parent program; and the second subroutine is a reference subroutine used to determine the address, in the program table, of the first PC corresponding to the first subroutine;

the method further comprises: sending variable information of the first subroutine to the GPU, wherein the variable information comprises variables in the first subroutine and storage resources occupied by the variables, the storage resources comprise register resources or global storage resources, and the variables comprise an index variable indicating the offset of the first subroutine relative to the second subroutine in the program table.
A graphics processing system, comprising the graphics processor (GPU) according to any one of claims 1-5, the compiler according to any one of claims 6-9, and the GPU driver according to any one of claims 10-13.