CN116185501A - Command execution method, device, graphics processor, electronic device and storage medium

Publication number
CN116185501A
Authority
CN
China
Prior art keywords
command
queue
target
commands
execution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211654158.1A
Other languages
Chinese (zh)
Inventor
杨晓松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Tiantian Smart Core Semiconductor Co ltd
Original Assignee
Shanghai Tiantian Smart Core Semiconductor Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Tiantian Smart Core Semiconductor Co ltd filed Critical Shanghai Tiantian Smart Core Semiconductor Co ltd
Priority to CN202211654158.1A priority Critical patent/CN116185501A/en
Publication of CN116185501A publication Critical patent/CN116185501A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00: General purpose image data processing
    • G06T1/20: Processor architectures; Processor configuration, e.g. pipelining
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiment of the invention provides a command execution method, a device, a graphic processor, electronic equipment and a storage medium, and relates to the technical field of electronics. The command execution method is applied to the graphics processor and comprises the following steps: constructing a plurality of command execution queues, wherein each command execution queue comprises a plurality of commands; reading a command in any command execution queue as a target command, and judging whether the target command is a queue switching command; and if the target command is the queue switching command, reading and executing commands in other command execution queues. Compared with the prior art, the command execution method, the device, the graphics processor, the electronic equipment and the storage medium provided by the embodiment of the invention can effectively improve the utilization rate of the operation capability of the GPU chip.

Description

Command execution method, device, graphics processor, electronic device and storage medium
Technical Field
The present invention relates to the field of electronic technologies, and in particular, to a command execution method, a device, a graphics processor, an electronic apparatus, and a storage medium.
Background
As graphics processing units (GPUs) have evolved, the types of dedicated logic operation units in a GPU have multiplied: from the original general-purpose operation units to special function units, high-performance operation units, tensor operation units, and so on. The purpose of this specialization is to have dedicated units perform dedicated operations so as to achieve optimal efficiency.
In a system comprising a GPU chip and a central processing unit (CPU), the CPU typically transmits a command sequence to the GPU chip, and the GPU chip parses the received command sequence and distributes the commands to different functional modules in the GPU chip for execution. Conventionally, the CPU places the generated commands in a command sequence, and the GPU chip executes them in order.
However, the commands in the command sequence may have dependency relationships. For example, commands A and C in the sequence are two different computing tasks, and the execution of C depends on the execution result of A, so C must wait until the execution of A ends. A wait command is therefore inserted between commands A and C to wait for the execution of A to end. At this time, only command A is running in the GPU chip; if the computation amount of command A is not large enough, the computing power of the GPU chip cannot be fully utilized, part of the computing power of the whole GPU chip is idle, and the utilization rate of the computing capability of the GPU chip is reduced.
Disclosure of Invention
The invention aims to provide a command execution method, a device, a graphics processor, electronic equipment and a storage medium, which can effectively improve the utilization rate of the operation capability of a GPU chip.
In a first aspect, an embodiment of the present invention provides a command execution method, which is applied to a graphics processor, and includes: constructing a plurality of command execution queues, wherein each command execution queue comprises a plurality of commands; reading a command in any command execution queue as a target command, and judging whether the target command is a queue switching command; and if the target command is the queue switching command, reading and executing commands in other command execution queues.
In some embodiments, the determining whether the target command is a queue switch command comprises: judging whether a preset identifier exists in the target command; and if the preset identifier exists in the target command, judging the target command as the queue switching command. The judging process according to the preset identifier is simpler, and the judging efficiency can be effectively improved, so that the overall efficiency of task execution is improved.
In some embodiments, the determining whether the target command is a queue switch command comprises: judging whether a dependency relationship exists between the target command and the command being executed at the current moment; and if the dependency relationship exists between the target command and the command being executed at the current moment, judging the target command as the queue switching command. By judging the dependency relationship between the target command and the command being executed at the current moment, a target command whose depended-on command has already finished executing can be prevented from being judged as the queue switching command, thereby improving the accuracy of the judging result.
In some embodiments, the determining whether there is a dependency relationship between the target command and the command being executed at the current time includes: acquiring a preset dependency relationship value and a preset storage address of the target command; reading a target value stored in the preset storage address; judging whether the preset dependency relationship value is the same as the target value; if the preset dependency relationship value is different from the target value, judging that a dependency relationship exists between the target command and the command being executed at the current moment; and if the preset dependency relationship value is the same as the target value, judging that no dependency relationship exists between the target command and the command being executed at the current moment. By comparing the target value in the preset storage address with the preset dependency relationship value, whether another command with the dependency relationship with the target command is executed or not can be accurately judged, and whether the target command is a queue switching command or not can be accurately judged.
In some embodiments, the determining whether the target command is a queue switch command comprises: judging whether a preset identifier exists in the target command and whether a dependency relationship exists between the target command and a command being executed at the current moment; if the preset identifier exists in the target command or a dependency relationship exists between the target command and a command being executed at the current moment, judging the target command as the queue switching command; and if the preset identifier does not exist in the target command and the dependency relationship does not exist between the target command and the command being executed at the current moment, judging that the target command is not the queue switching command. Through the comprehensive use of a plurality of different judging methods, the accuracy of the judging result of whether the target command is the queue switching command can be effectively improved, and the missed judgment or the misjudgment is avoided.
In some embodiments, before the reading the command in any of the command execution queues as the target command, the method further comprises: setting a unique current command execution queue; the reading the command in any command execution queue as a target command includes: and reading the command in the current command execution queue as the target command.
In some embodiments, before the reading and executing the commands in the other command execution queue, the method further comprises: setting the other command execution queue as the new current command execution queue. By setting the unique current execution queue, only executing the command in the current execution queue at the current moment, when the target command in the current execution queue is the queue switching command, resetting the other command execution queue as a new current execution queue, and reading and executing the command in the new current execution queue, the reading and executing processes of the command can be more orderly carried out, the command execution confusion and execution errors caused in the command execution queue switching process are avoided, and the accuracy of the task execution result is improved as a whole.
In a second aspect, an embodiment of the present invention provides a command execution method, which is applied to a command execution device, including: constructing a plurality of commands, constructing a plurality of queue switching commands in the plurality of commands according to the dependency relationship among the commands, and sending the plurality of commands to a graphics processor; the queue switch command is for switching a command execution queue when read by the graphics processor.
In some embodiments, said constructing a number of queue switch commands among said number of commands according to dependencies between respective ones of said commands comprises: acquiring a target command from the commands, wherein the target command is a command with a dependency relationship with any other command; and inserting a preset identifier into the target command.
In some embodiments, said constructing a number of queue switch commands among said number of commands according to dependencies between respective ones of said commands comprises: acquiring a target command from the commands, wherein the target command is a command with a dependency relationship with any other command; inserting a preset dependency relationship value and a preset storage address into the target command; and inserting the preset storage address into the command on which the target command depends so as to control the command on which the target command depends to write a preset dependency relationship value into the preset storage address when the execution is completed.
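The construction of queue switching commands on the command execution device side can be sketched as follows. This is a minimal illustration, not the claimed implementation: the `Command` fields, the address allocation scheme and the signal value `1` are all hypothetical, and the two variants described above (preset identifier; preset dependency relationship value plus preset storage address) are applied together only for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Command:
    name: str
    depends_on: Optional[str] = None     # name of the command this one depends on
    allow_yield: bool = False            # preset identifier (hypothetical field)
    wait_addr: Optional[int] = None      # preset storage address to read
    wait_value: Optional[int] = None     # preset dependency relationship value
    signal_addr: Optional[int] = None    # address written when execution completes
    signal_value: Optional[int] = None   # value written when execution completes


def mark_switch_commands(commands: List[Command]) -> List[Command]:
    """Turn every command that depends on another into a queue switching command."""
    by_name = {c.name: c for c in commands}
    next_addr = 0  # hypothetical allocator for preset storage addresses
    for cmd in commands:
        if cmd.depends_on is None:
            continue
        producer = by_name[cmd.depends_on]
        if producer.signal_addr is None:
            # The depended-on command writes the value here on completion.
            producer.signal_addr = next_addr
            producer.signal_value = 1
            next_addr += 1
        cmd.allow_yield = True
        cmd.wait_addr = producer.signal_addr
        cmd.wait_value = producer.signal_value
    return commands


cmds = mark_switch_commands([
    Command("A"), Command("B"),
    Command("C", depends_on="A"), Command("D", depends_on="B"),
])
```

After this pass, C and D carry the identifier and the address/value pair, while A and B only carry the address they must write on completion.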
In a third aspect, an embodiment of the present invention provides a graphics processor, including: a queue construction module, used for constructing a plurality of command execution queues, wherein each command execution queue comprises a plurality of commands; and a command execution module, used for reading a command in any command execution queue as a target command and judging whether the target command is a queue switching command; the command execution module is further configured to read and execute commands in other command execution queues when the target command is the queue switching command.
In a fourth aspect, an embodiment of the present invention provides a command execution apparatus, including: the command construction module is used for constructing a plurality of commands; the switching command construction module is used for constructing a plurality of queue switching commands in the plurality of commands according to the dependency relationship among the commands; and the communication module is used for sending the commands to the graphics processor.
In a fifth aspect, an embodiment of the present invention provides an electronic device, including: at least one processor, a first memory communicatively coupled to the at least one processor, a graphics processor, and a second memory coupled to the graphics processor; wherein the first memory stores instructions executable by the at least one processor to enable the at least one processor to perform a command execution method as described above; the second memory stores instructions executable by the graphics processor to enable the graphics processor to perform a command execution method as described above.
In a sixth aspect, an embodiment of the present invention provides a computer readable storage medium storing a computer program, wherein the computer program is executed by a graphics processor to implement the foregoing command execution method.
Compared with the prior art, in the command execution method, the device, the graphics processor, the electronic equipment and the storage medium provided by the embodiment of the invention, a plurality of command execution queues are constructed in the graphics processor. When the graphics processor receives commands, the received commands are stored in the command execution queues to be executed sequentially. Before each command is executed, it is first judged whether the command is a queue switching command; if it is, execution switches to another command execution queue, and the commands in that other queue are read and executed. This avoids the situation in which subsequent commands without any dependency cannot be read and executed because of a dependency relationship between other commands, so the utilization rate of the operation capability of the GPU chip is improved and the work efficiency of the GPU chip is improved as a whole.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a prior art command execution method;
FIG. 2 is a flowchart of a command execution method according to an embodiment of the invention;
FIG. 3 is a schematic diagram of a command execution method according to an embodiment of the invention;
FIG. 4 is a flowchart illustrating a command execution method according to a second embodiment of the present invention;
FIG. 5 is a flowchart of a command execution method according to a third embodiment of the present invention;
FIG. 6 is a schematic diagram of a graphics processor according to a fourth embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a command execution device according to a fifth embodiment of the present invention;
FIG. 8 is a schematic structural diagram of an electronic device according to a sixth embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
In the description of the present invention, it should be noted that, if the terms "upper", "lower", "inner", "outer", and the like indicate an azimuth or a positional relationship based on the azimuth or the positional relationship shown in the drawings, or the azimuth or the positional relationship in which the inventive product is conventionally put in use, it is merely for convenience of describing the present invention and simplifying the description, and it is not indicated or implied that the apparatus or element referred to must have a specific azimuth, be configured and operated in a specific azimuth, and thus it should not be construed as limiting the present invention.
Furthermore, the terms "first," "second," and the like, if any, are used merely for distinguishing between descriptions and not for indicating or implying a relative importance.
It should be noted that the features of the embodiments of the present invention may be combined with each other without conflict.
Fig. 1 is a schematic process diagram of a command execution method in the prior art, where A, B, C and D are commands to be executed in a command execution queue. In the prior art, if there is a dependency relationship between the A command and the C command, for example, when the execution result data of the A command is needed to execute the C command, a waiting command is inserted between the A command and the C command, so that the C command is not read and executed by the GPU before the execution of the A command ends. However, the inserted waiting command also prevents the B command and the D command from being executed before the execution of the A command ends. At this time, only the A command is being executed in the GPU chip; if the computation amount of the A command is not large enough, part of the computing power of the whole GPU chip is idle, and the utilization rate of the computing capability of the GPU chip is reduced.
In order to solve the above technical problem, a first embodiment of the present invention provides a command execution method applied to a graphics processor, as shown in fig. 2, including the following steps:
Step S101: a plurality of command execution queues are constructed, each command execution queue including a plurality of commands.
In some embodiments of the present invention, the graphics processor includes queue description registers, and a number (2 or more) of command execution queues are built by configuring the queue description registers in the chip; for example, the command execution queue structure may be composed of a number of different queue description registers. Each queue description register holds information such as residence time of commands, row address, column address, bank address, command type, and data length.
In some embodiments of the present invention, when a graphics processor receives a task of another device in a system, for example, a processor, another graphics processor, etc., the task is split into a plurality of commands, and then the commands are respectively added into a plurality of command execution queues after completion of construction according to a dependency relationship between the commands; it is also possible that other devices in the system, such as a processor, another graphics processor, etc., have split the task into several commands and will. For example, in some embodiments of the invention, commands that have a dependency may be added to the same command execution queue and commands that do not have a dependency may be added to a different command execution queue. The commands with the dependency relationship are added into the same command execution queue, so that the influence of the mutual dependency relationship among the commands after the command execution queue is switched can be avoided, the switching times of the command execution queue are reduced, and the task processing efficiency is improved.
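One way to place dependent commands in the same queue and independent commands in different queues is to group them by dependency chain. The sketch below uses a small union-find for that grouping; this is a choice made for illustration and is not stated in the source, and the function name and input format are hypothetical.

```python
def assign_queues(commands, dependencies):
    """Group commands so that commands linked by a dependency share a queue.

    commands:     command names in submission order
    dependencies: (dependent, depended_on) pairs
    """
    parent = {c: c for c in commands}

    def find(x):
        # Follow parent links to the group representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for dependent, depended_on in dependencies:
        parent[find(dependent)] = find(depended_on)

    queues = {}
    for c in commands:  # submission order is preserved inside each queue
        queues.setdefault(find(c), []).append(c)
    return list(queues.values())


queues = assign_queues(["A", "B", "C", "D"], [("C", "A"), ("D", "B")])
print(queues)  # [['A', 'C'], ['B', 'D']]
```

With the A/C and B/D dependencies used in the description, this yields one queue holding A and C and another holding B and D.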
In some embodiments of the present invention, as shown in fig. 3, there is a dependency between the a command and the C command, and a dependency between the B command and the D command, after adding the a command and the C command to the same command execution queue, the system adds the queue switch command E between the a command and the C command, and similarly, may also add the queue switch command F between the B command and the D command. It should be understood that the foregoing adding of the queue switch command E between the command a and the command C and adding of the queue switch command F between the command B and the command D are merely illustrative in some embodiments of the present invention, and not limiting, and in some other embodiments of the present invention, the command C and the command D may be modified directly into the queue switch command, for example, adding a preset identifier to the command C and the command D, where the preset identifier is used to identify whether a command is a queue switch command, if a preset identifier exists in a command, the command is a queue switch command, otherwise, if a preset identifier does not exist in a command, the command is not a queue switch command. In some embodiments of the present invention, for example, an allow_yield command may be used as the preset identifier. By modifying the C command and the D command into the queue switching command, the queue switching command does not need to be additionally inserted into the command execution queue, so that more task commands can be added into the command execution queue, and the task command capacity of the command execution queue is improved.
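A preset identifier of this kind could be carried as a flag bit in a command header. The sketch below is only an assumed encoding: the header layout, the bit position of `ALLOW_YIELD` and the helper names are hypothetical and are not taken from the source.

```python
ALLOW_YIELD = 1 << 0  # hypothetical flag bit acting as the preset identifier


def make_header(opcode: int, allow_yield: bool = False) -> int:
    """Pack an opcode and the flag bits into one header word (assumed layout)."""
    flags = ALLOW_YIELD if allow_yield else 0
    return (opcode << 8) | flags


def has_preset_identifier(header: int) -> bool:
    """A command is a queue switch command if the identifier bit is set."""
    return bool(header & ALLOW_YIELD)


plain_c = make_header(0x12)
switch_c = make_header(0x12, allow_yield=True)
```

Because the identifier travels inside the task command itself, no extra slot in the command execution queue is consumed by a separate switch command.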
Step S102: read a command in any command execution queue as a target command, and judge whether the target command is a queue switching command; if so, execute step S103, and if not, execute step S104.
In some embodiments of the present invention, the commands stored in each command execution queue are executed sequentially in order. When a command is read for execution, it becomes the target command at the current time, and it is then determined whether the target command is a queue switching command.
The detection can be performed flexibly according to how the queue switching command is set. In some embodiments of the present invention, for example, when a dedicated queue switch command is inserted between different commands with a dependency relationship as disclosed in step S101, determining whether the target command is the queue switch command means determining whether the target command is that separately inserted dedicated queue switch command. In some other embodiments of the present invention, for example, when the preset identifier is inserted into some of the task commands as described in step S101, determining whether the target command is a queue switch command means detecting whether the preset identifier exists in the target command: if the preset identifier exists in the target command, the target command is a queue switch command, and if it does not, the target command is not a queue switch command. The judging process according to the preset identifier is simpler, and the judging efficiency can be effectively improved, so that the overall efficiency of task execution is improved.
In some embodiments of the present invention, a dependency relationship may exist between the target command and another command, but that other command may already have finished executing by the current time. In this case, whether the target command is a queue switching command may be determined according to whether a dependency relationship exists between the target command and the command being executed at the current time. Specifically, for each command that has a dependency relationship with another command, a dependency relationship value and a storage address are preset for the command. The preset dependency relationship value is a value agreed in advance between the command and the other command on which it depends, and the preset storage address is a storage address agreed in advance between the two. After the execution of the other command is completed, the preset dependency relationship value is written into the preset storage address. When the target command is about to be executed, the value in the preset storage address is read first. If the value stored at the preset storage address is the preset dependency relationship value, the other command has finished executing; no dependency relationship then exists between the target command and the currently executing command, and the target command is not a queue switching command. If the value stored at the preset storage address is not the preset dependency relationship value, the other command has not yet finished executing; a dependency relationship then exists between the target command and the currently executing command, and the target command is a queue switching command.
By judging the dependency relationship between the target command and the command being executed at the current moment, a target command whose depended-on command has already finished executing can be prevented from being judged as the queue switching command, thereby improving the accuracy of the judging result.
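The comparison of the stored value with the preset dependency relationship value can be sketched as follows. The dictionary standing in for device memory, the address `0x100` and the value `1` are illustrative assumptions only.

```python
def has_pending_dependency(memory: dict, wait_addr: int, wait_value: int) -> bool:
    """True while the depended-on command has not yet written the preset value."""
    return memory.get(wait_addr) != wait_value


memory = {}                    # stands in for memory visible to the GPU
WAIT_ADDR, WAIT_VALUE = 0x100, 1

# Before command A completes, C still depends on it: C is a queue switch command.
pending_before = has_pending_dependency(memory, WAIT_ADDR, WAIT_VALUE)

# On completion, A writes the preset value to the preset storage address.
memory[WAIT_ADDR] = WAIT_VALUE

# Now C is no longer treated as a queue switch command and can be executed.
pending_after = has_pending_dependency(memory, WAIT_ADDR, WAIT_VALUE)
```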
It should be understood that the foregoing are merely examples of determination methods used in some embodiments of the present invention for determining whether the target command is a queue switch command. In some other embodiments of the present invention, the above determination methods may be combined: it is determined both whether a preset identifier exists in the target command and whether a dependency relationship exists between the target command and the command being executed at the current time. The target command is determined to be a queue switch command when the preset identifier exists in the target command or the dependency relationship exists between the target command and the command being executed at the current time, and is determined not to be a queue switch command when the preset identifier does not exist in the target command and no dependency relationship exists between the target command and the command being executed at the current time. Through the combined use of several different judging methods, the accuracy of the judging result of whether the target command is the queue switching command can be effectively improved, and missed judgments or misjudgments are avoided.
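The combined judgment reduces to a logical OR of the two checks. In the sketch below, commands are plain dictionaries and the key names (`allow_yield`, `wait_addr`, `wait_value`) are hypothetical.

```python
def is_queue_switch_command(cmd: dict, memory: dict) -> bool:
    """Combined check: preset identifier present OR dependency still pending."""
    flagged = cmd.get("allow_yield", False)
    dependent = (
        "wait_addr" in cmd
        and memory.get(cmd["wait_addr"]) != cmd["wait_value"]
    )
    return flagged or dependent


memory = {0x100: 1}  # the command that writes address 0x100 has already finished
flag_only = is_queue_switch_command({"allow_yield": True}, memory)
dep_done = is_queue_switch_command({"wait_addr": 0x100, "wait_value": 1}, memory)
dep_pending = is_queue_switch_command({"wait_addr": 0x200, "wait_value": 1}, memory)
neither = is_queue_switch_command({}, memory)
```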
Step S103: the commands in the other command execution queues are read and executed.
In some embodiments of the present invention, when the target command is a queue switch command, the target command is temporarily not executed, and a command in another command execution queue is read and executed instead. It will be appreciated that in some embodiments of the present invention, before the command read from the other command execution queue is executed, it may likewise be determined whether that command is a queue switch command; according to the determination result, either that command is executed or a command in yet another command execution queue is read for execution.
Step S104: execute the target command.
If the target command is not the queue switching command, the target command is directly distributed to the corresponding execution unit for execution.
Compared with the prior art, in the command execution method provided by the embodiment of the invention, a plurality of command execution queues are constructed in the graphics processor. When the graphics processor receives commands, the received commands are stored in the command execution queues for sequential execution. Before each command is executed, it is first judged whether the command is a queue switching command; if it is, execution switches to another command execution queue, and the commands in that other queue are read and executed. This avoids the situation in which subsequent commands without any dependency cannot be read and executed because of a dependency relationship between other commands, so the utilization rate of the operation capability of the GPU chip is improved and the work efficiency of the GPU chip is improved as a whole.
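Steps S101 to S104 can be put together in a small tick-based simulation. This is a sketch under several assumptions that are not in the source: one command is read per tick, queues are visited round-robin, an empty queue also triggers a switch, each command runs for a fixed number of ticks, and a set of finished command names stands in for the values written to the preset storage addresses.

```python
from collections import deque


def run(queues, durations, deps):
    """Simulate S102-S104 and return the (tick, command) issue events.

    queues:    list of command-name lists (one per command execution queue)
    durations: command name -> execution time in ticks
    deps:      dependent command -> command it depends on
    """
    queues = [deque(q) for q in queues]
    done = set()     # stands in for values written to preset storage addresses
    running = {}     # command name -> remaining ticks
    issued = []
    tick, current = 0, 0
    while any(queues) or running:
        q = queues[current]
        blocked = bool(q) and deps.get(q[0]) is not None and deps[q[0]] not in done
        if q and not blocked:
            target = q.popleft()                 # S104: execute the target command
            running[target] = durations[target]
            issued.append((tick, target))
        else:
            current = (current + 1) % len(queues)  # S103: read another queue
        tick += 1
        for name in list(running):               # advance the execution units
            running[name] -= 1
            if running[name] == 0:
                del running[name]
                done.add(name)
    return issued


issued = run([["A", "C"], ["B", "D"]],
             {"A": 3, "B": 1, "C": 1, "D": 1},
             {"C": "A", "D": "B"})
print([name for _, name in issued])  # ['A', 'B', 'D', 'C']
```

In this run B is issued at tick 2 while A is still executing, which is the overlap the method aims for; with a single queue and a wait command, B and D would be held back until A finished.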
The second embodiment of the present invention provides a command execution method, which is applied to a graphics processor, as shown in fig. 4, and includes the following steps:
Step S201: a plurality of command execution queues are constructed, each command execution queue including a plurality of commands.
Step S202: a unique current command execution queue is set.
Specifically, in some embodiments of the present invention, only one command execution queue is executed at the current time, i.e., only the command is read from the unique command execution queue at the current time and executed, and the command execution queue is the unique current command execution queue.
In some embodiments of the present invention, the unique current command execution queue may be set according to an active_queue register in the GPU chip, and the GPU chip hardware obtains the unique current command execution queue according to the value of the active_queue register.
In addition, in some other embodiments of the present invention, an initial global atomic variable may also be set for the current execution queue to serve as the access switch of the current execution queue. The initial global atomic variable set in this embodiment is initialized to 0, and thread No. 0 of each work group accesses the initial global atomic variable. Multiple work groups may run in parallel in one GPU chip at the same time. When one work group obtains the access right to the initial global atomic variable, that work group locks the unique access right to the variable, so as to prevent other work groups from accessing the current command execution queue at the same time. Then, the target command to be executed by the work group is acquired from the current command execution queue according to the value of the initial global atomic variable. Specifically, when the value of the initial global atomic variable is smaller than the number of command groups in the current execution queue, the group of computing tasks corresponding to the value of the initial global atomic variable is fetched from the current execution queue for execution, and the access right to the initial global atomic variable is released; when the value of the initial global atomic variable is greater than or equal to the number of command groups in the current execution queue, all commands in the current execution queue have been executed and no executable command remains, so the work group exits execution. After all work groups have exited execution, the execution of the GPU program is finished.
Unlike conventional GPU threads, the GPU thread started in this embodiment does not exit immediately after processing one computing task, but continues to process computing tasks until all assigned tasks have been processed. In addition, by introducing the global atomic variable, this embodiment avoids conflicts caused by multiple work groups accessing the current command execution queue simultaneously.
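The work-group behaviour described above can be sketched on a CPU as follows. This is only an illustrative simulation, not the GPU implementation: Python threads stand in for work groups, a lock-protected counter stands in for the global atomic variable, and the names `AtomicCounter`, `work_group` and `run` are hypothetical. The source does not spell out how the variable advances, so the sketch assumes it is incremented by one each time a work group claims a group of tasks.

```python
import threading


class AtomicCounter:
    """Stand-in for the global atomic variable, initialized to 0."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def fetch_and_increment(self):
        # Acquire exclusive access, read the value, advance it, release.
        # Assumption: the variable advances by one per claimed task group.
        with self._lock:
            value = self._value
            self._value += 1
            return value


def work_group(group_id, counter, task_groups, results):
    # A persistent work group: it keeps claiming task groups instead of
    # exiting after the first one.
    while True:
        index = counter.fetch_and_increment()
        if index >= len(task_groups):
            return  # no executable commands remain, so the work group exits
        # "Executing" a task group is modeled as summing its numbers.
        results[index] = (group_id, sum(task_groups[index]))


def run(num_work_groups, task_groups):
    counter = AtomicCounter()
    results = [None] * len(task_groups)
    threads = [
        threading.Thread(target=work_group, args=(g, counter, task_groups, results))
        for g in range(num_work_groups)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # the program finishes once every work group has exited
    return results


task_groups = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
totals = [total for _, total in run(3, task_groups)]
print(totals)
```

Which work group handles which task group varies from run to run, but every task group is claimed exactly once because the counter is only read and advanced under the lock.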
Step S203: read a command in the current command execution queue as the target command and determine whether the target command is a queue switching command; if so, execute step S204; if not, execute step S206.
Step S204: set another command execution queue as the new current command execution queue.
In some embodiments of the present invention, when the target command read from the unique current command execution queue is a queue switching command, another command execution queue is first set as the new current command execution queue.
In some embodiments of the present invention, the hardware of the GPU chip may be caused to fetch another command execution queue as a new current command execution queue by modifying the value of the active_queue register.
Step S205: the commands in the new current command execution queue are read and executed.
Step S206: and executing the target command.
It should be understood that, in the present embodiment, the steps S201, S203, S205 and S206 are substantially the same as the steps S101 to S104 in the first embodiment, and specific reference may be made to the specific description in the first embodiment, which is not repeated herein.
Compared with the prior art, the command execution method provided by the second embodiment of the present invention includes all the technical features of the first embodiment and achieves the same technical effects. In addition, the method sets a unique current command execution queue and, at any given time, executes only commands in that queue. When the target command in the current command execution queue is a queue switching command, another command execution queue is set as the new current command execution queue, and commands are then read from and executed in the new current command execution queue. Command reading and execution therefore proceed in a more orderly manner, which avoids confusion and errors in command execution during queue switching and improves the accuracy of the task execution result as a whole.
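Steps S203 to S206 can be illustrated with a small host-side simulation. This is a sketch under stated assumptions rather than the hardware behaviour: the variable `active_queue` models the active_queue register, the `SWITCH` marker is a hypothetical standalone queue switching command, the next queue is chosen round-robin, and an empty current queue hands over to the next non-empty one. None of these policies are specified in the source.

```python
from collections import deque

SWITCH = "SWITCH"  # hypothetical marker for a standalone queue switching command


def run_queues(queues):
    """Execute commands from the single current queue, switching on request."""
    queues = [deque(q) for q in queues]
    active_queue = 0  # models the active_queue register
    executed = []
    while any(queues):
        if not queues[active_queue]:
            # Assumption: an empty current queue hands over to the next
            # non-empty queue.
            active_queue = next(i for i, q in enumerate(queues) if q)
            continue
        target = queues[active_queue].popleft()  # step S203: read the target command
        if target == SWITCH:
            # Step S204: set another queue as the new current queue
            # (round-robin choice is an assumption).
            active_queue = (active_queue + 1) % len(queues)
            continue  # step S205: the next iteration reads from the new queue
        executed.append(target)  # step S206: execute the target command
    return executed


order = run_queues([["A", SWITCH, "C"], ["B", "D"]])
print(order)  # ['A', 'B', 'D', 'C']
```

In this run, command C sits behind the switch command, so the commands of the second queue are executed before execution returns to C.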
The third embodiment of the present invention provides a command execution method, which is applied to a command execution device, as shown in fig. 5, and includes the following steps:
Step S301: construct a plurality of commands.
In some embodiments of the present invention, the command execution device is connected to the graphics processor and is used to generate or receive a task that the graphics processor needs to execute. The command execution device divides the task into a plurality of commands, each of which completes part of the task; the task is complete once all of the commands have been executed.
Step S302: construct a plurality of queue switching commands among the plurality of commands according to the dependency relationship among the commands, and send the plurality of commands to the graphics processor.
In some embodiments of the present invention, a queue switching command independent of the task commands in step S301 may be constructed directly, that is, a new standalone command is constructed as the queue switching command. In some other embodiments of the present invention, a task command may itself be modified into a queue switching command, for example by adding a preset identifier to the task command. The preset identifier indicates whether a command is a queue switching command: a command that carries the preset identifier is a queue switching command, and a command that does not carry it is not. In some embodiments of the present invention, for example, an allow_yield command may be used as the preset identifier. Because task commands are modified into queue switching commands, no additional queue switching command needs to be inserted into the command execution queue, so more task commands can be added to the command execution queue and its task command capacity is improved.
Further, in some embodiments of the present invention, the target command may be obtained according to the dependency relationship between the task commands in step S301, where the target command is a command having a dependency relationship with any other command, and the preset identifier is inserted into the target command. For example, for the command a and the command C, if the command C needs to use the running result of the command a when executing, the command C is the target command, and a preset identifier is inserted into the command C to form a queue switching command.
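As a minimal sketch of this marking step, the following example sets a preset identifier on every command that depends on another command. The `Command` class, its `depends_on` field and the function name are hypothetical; only the name `allow_yield` comes from the description, and it is modeled here as a simple boolean flag.

```python
from dataclasses import dataclass, field


@dataclass
class Command:
    name: str
    # Names of the commands whose running results this command needs.
    depends_on: list = field(default_factory=list)
    # The preset identifier, modeled as a boolean flag (assumption).
    allow_yield: bool = False


def mark_queue_switch_commands(commands):
    """Insert the preset identifier into every command that has a dependency."""
    for command in commands:
        if command.depends_on:
            command.allow_yield = True
    return commands


commands = [Command("A"), Command("B"), Command("C", depends_on=["A"])]
marked = mark_queue_switch_commands(commands)
print([(c.name, c.allow_yield) for c in marked])
# [('A', False), ('B', False), ('C', True)]
```

Only command C is marked, because it is the only command that uses the result of another command.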
In addition, in some other embodiments of the present invention, a preset dependency relationship value and a preset storage address may be inserted into the target command, and the preset storage address may be inserted into the command on which the target command depends, so that this command writes the preset dependency relationship value into the preset storage address when its execution is completed. For example, for command A and command C, if command C needs the running result of command A when executing, command C is the target command: the preset dependency relationship value and the preset storage address are inserted into command C to form a queue switching command, and the preset storage address is inserted into command A. After command A has been executed, the preset dependency relationship value is written into the preset storage address. In some embodiments of the present invention, the preset dependency relationship value may also be inserted into command A, in which case command A writes it into the preset storage address after execution is completed; alternatively, command A may write its running result directly into the preset storage address as the dependency relationship value. The choice can be made flexibly according to actual needs.
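The value-and-address mechanism can be sketched as follows, using the variant in which command A also carries the preset dependency relationship value. A dictionary stands in for device memory, and all names, the address 0x1000 and the value 1 are illustrative assumptions. The check in `is_blocked` treats a dependency as unresolved while the value stored at the address differs from the preset value.

```python
from dataclasses import dataclass
from typing import Optional

memory = {}  # models device memory: storage address -> stored value


@dataclass
class Command:
    name: str
    wait_address: Optional[int] = None    # address checked before the command runs
    wait_value: Optional[int] = None      # preset dependency relationship value
    signal_address: Optional[int] = None  # address written when the command completes
    signal_value: Optional[int] = None


def link_dependency(target, dependency, address, value):
    """Insert the value and address into the target, and into its dependency."""
    target.wait_address = address
    target.wait_value = value
    dependency.signal_address = address
    dependency.signal_value = value


def is_blocked(command):
    """A dependency still exists while the stored value differs from the preset one."""
    if command.wait_address is None:
        return False
    return memory.get(command.wait_address) != command.wait_value


def complete(command):
    """On completion, write the preset value to the preset storage address."""
    if command.signal_address is not None:
        memory[command.signal_address] = command.signal_value


a, c = Command("A"), Command("C")
link_dependency(target=c, dependency=a, address=0x1000, value=1)
blocked_before = is_blocked(c)  # A has not finished, so C would trigger a switch
complete(a)
blocked_after = is_blocked(c)   # A has written the value, so C can run
print(blocked_before, blocked_after)  # True False
```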
Compared with the prior art, in the command execution method provided by the third embodiment of the invention, the command is constructed in the command execution device, and a plurality of queue switching commands are constructed in a plurality of commands according to the dependency relationship among the commands, so that the graphics processor can directly realize the effects of executing the commands and switching the queues without constructing the commands and the queue switching commands, and the operation efficiency of the graphics processor is improved.
A fourth embodiment of the present invention provides a graphics processor, specifically as shown in FIG. 6, including:
a queue construction module 601, configured to construct a plurality of command execution queues, each command execution queue including a plurality of commands; and a command execution module 602, configured to read a command in any command execution queue as a target command and to determine whether the target command is a queue switching command. The command execution module 602 is further configured to read and execute commands in the other command execution queues when the target command is a queue switching command.
Compared with the prior art, in the graphics processor provided by the fourth embodiment of the present invention, the queue construction module 601 constructs a plurality of command execution queues in the graphics processor. When the graphics processor receives commands, it stores them in the command execution queues for sequential execution. Before executing each command, the command execution module 602 first determines whether the command is a queue switching command; if it is, the module switches to another command execution queue and reads and executes the commands in that queue. This prevents a dependency between commands from blocking subsequent commands that have no such dependency, which improves utilization of the computing capability of the GPU chip and the overall working efficiency of the GPU chip.
Further, in some embodiments of the present invention, the queue construction module 601 may be a queue description register in the graphics processor, and the graphics processor constructs a number (2 or more) of command execution queues by configuring the queue description register in the chip. For example, a command execution queue structure may be made up of several different queue description registers. Each queue description register holds information such as residence time of commands, row address, column address, bank address, command type, and data length.
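For illustration only, the information held by a queue description register might be modeled as the record below. The field names follow the list in the description, but the field types, the example values and the `build_queues` helper are assumptions; the source does not give a register layout or bit widths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueDescriptor:
    """One queue description register (field types are assumptions)."""
    residence_time: int
    row_address: int
    column_address: int
    bank_address: int
    command_type: str
    data_length: int


def build_queues(descriptor_groups):
    """Build command execution queues, each from several descriptors."""
    if len(descriptor_groups) < 2:
        raise ValueError("at least two command execution queues are required")
    return [list(group) for group in descriptor_groups]


queues = build_queues([
    [QueueDescriptor(0, 1, 2, 0, "compute", 64)],
    [QueueDescriptor(0, 3, 4, 1, "copy", 128)],
])
print(len(queues))  # 2
```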
A fifth embodiment of the present invention provides a command execution device, as shown in fig. 7, including:
a command construction module 701, configured to construct a plurality of commands; a switching command construction module 702, configured to construct a plurality of queue switching commands among the plurality of commands according to the dependency relationship between the commands; and a communication module 703, configured to send the plurality of commands to the graphics processor.
Compared with the prior art, in the command execution device provided by the fifth embodiment of the present invention, the command construction module 701 constructs commands in the command execution device, the switching command construction module 702 constructs a plurality of queue switching commands in a plurality of commands according to the dependency relationship between the commands, and finally the communication module 703 sends the plurality of commands to the graphics processor, so that the graphics processor can directly realize the effects of executing the commands and switching the queues without constructing the commands and the queue switching commands, and the operation efficiency of the graphics processor is improved.
A sixth embodiment of the present invention relates to an electronic device, as shown in fig. 8, including: at least one processor 801, a first memory 802 and a graphics processor 803 communicatively coupled to the at least one processor 801, and a second memory 804 coupled to the graphics processor 803; wherein the first memory 802 stores instructions executable by the at least one processor 801, the instructions being executable by the at least one processor 801 to enable the at least one processor 801 to perform the command execution method as provided by the third embodiment described above; the second memory 804 stores instructions executable by the graphics processor 803, the instructions being executable by the graphics processor 803 to enable the graphics processor 803 to perform the command execution method as provided in the foregoing first and second embodiments.
The first memory 802 and the processor 801, and the second memory 804 and the graphics processor 803, are connected by a bus. The bus may include any number of interconnected buses and bridges, which connect the various circuits of the one or more processors and the memories together. The bus may also connect various other circuits such as peripherals, voltage regulators, and power management circuits, which are well known in the art and therefore are not described further herein. A bus interface provides an interface between the bus and a transceiver. The transceiver may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. Data processed by the processor is transmitted over a wireless medium via an antenna, which also receives data and passes it to the processor.
The processor 801 is responsible for managing the bus and general processing and may provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. The memory may be used to store data used by the processor in performing operations.
The seventh embodiment of the invention relates to a computer-readable storage medium storing a computer program. The computer program implements the above-described method embodiments when executed by a processor.
That is, those skilled in the art will understand that all or part of the steps of the methods in the above embodiments may be implemented by a program. The program is stored in a storage medium and includes several commands for causing a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to perform all or part of the steps of the methods in the embodiments. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The present invention is not limited to the above embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present invention are intended to be included in the scope of the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (14)

1. A command execution method, applied to a graphics processor, comprising:
constructing a plurality of command execution queues, wherein each command execution queue comprises a plurality of commands;
reading a command in any command execution queue as a target command, and judging whether the target command is a queue switching command;
and if the target command is the queue switching command, reading and executing commands in other command execution queues.
2. The method of claim 1, wherein said determining whether said target command is a queue switch command comprises:
judging whether a preset identifier exists in the target command;
and if the preset identifier exists in the target command, judging the target command as the queue switching command.
3. The method of claim 1, wherein said determining whether said target command is a queue switch command comprises:
judging whether a dependency relationship exists between the target command and the command being executed at the current moment;
and if the dependency relationship exists between the target command and the command being executed at the current moment, judging the target command as the queue switching command.
4. A method according to claim 3, wherein said determining whether a dependency exists between the target command and a command being executed at the current time comprises:
acquiring a preset dependency relationship value and a preset storage address of the target command;
reading a target value stored in the preset storage address;
judging whether the preset dependency relationship value is the same as the target value;
if the preset dependency relationship value is different from the target value, judging that a dependency relationship exists between the target command and the command being executed at the current moment;
and if the preset dependency relationship value is the same as the target value, judging that no dependency relationship exists between the target command and the command being executed at the current moment.
5. The method of claim 1, wherein said determining whether said target command is a queue switch command comprises:
judging whether a preset identifier exists in the target command and whether a dependency relationship exists between the target command and a command being executed at the current moment;
if the preset identifier exists in the target command or a dependency relationship exists between the target command and a command being executed at the current moment, judging the target command as the queue switching command;
and if the preset identifier does not exist in the target command and the dependency relationship does not exist between the target command and the command being executed at the current moment, judging that the target command is not the queue switching command.
6. The method of claim 1, wherein before said reading a command in any of said command execution queues as a target command, the method further comprises:
setting a unique current command execution queue;
the reading the command in any command execution queue as a target command includes:
and reading the command in the current command execution queue as the target command.
7. The method of claim 6, wherein before the reading and executing commands in the other command execution queue, the method further comprises:
setting the other command execution queue as the new current command execution queue.
8. A command execution method, applied to a command execution device, comprising:
constructing a plurality of commands, constructing a plurality of queue switching commands in the plurality of commands according to the dependency relationship among the commands, and sending the plurality of commands to a graphics processor;
the queue switch command is for switching a command execution queue when read by the graphics processor.
9. The method of claim 8, wherein constructing a number of queue switch commands among the number of commands according to dependencies between the commands comprises:
acquiring a target command from the commands, wherein the target command is a command with a dependency relationship with any other command;
and inserting a preset identifier into the target command.
10. The method of claim 8, wherein constructing a number of queue switch commands among the number of commands according to dependencies between the commands comprises:
acquiring a target command from the commands, wherein the target command is a command with a dependency relationship with any other command;
inserting a preset dependency relationship value and a preset storage address into the target command;
and inserting the preset storage address into the command on which the target command depends so as to control the command on which the target command depends to write a preset dependency relationship value into the preset storage address when the execution is completed.
11. A graphics processor, comprising:
a queue construction module, configured to construct a plurality of command execution queues, wherein each command execution queue comprises a plurality of commands;
the command execution module is used for reading commands in any command execution queue as target commands and judging whether the target commands are queue switching commands or not;
the command execution module is further configured to read and execute commands in other command execution queues when the target command is the queue switch command.
12. A command execution device, comprising:
the command construction module is used for constructing a plurality of commands;
the switching command construction module is used for constructing a plurality of queue switching commands in the plurality of commands according to the dependency relationship among the commands;
and the communication module is used for sending the commands to the graphics processor.
13. An electronic device, comprising:
at least one processor, a first memory and a graphics processor communicatively coupled to the at least one processor, and a second memory coupled to the graphics processor;
wherein the first memory stores instructions executable by the at least one processor to enable the at least one processor to perform the command execution method of any one of claims 9 to 11;
the second memory stores instructions executable by the graphics processor to enable the graphics processor to perform the command execution method of any one of claims 1 to 7.
14. A computer readable storage medium storing a computer program, wherein the computer program is executed by a graphics processor to implement the command execution method of any one of claims 1 to 7 or claims 9 to 11.
CN202211654158.1A 2022-12-22 2022-12-22 Command execution method, device, graphics processor, electronic device and storage medium Pending CN116185501A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211654158.1A CN116185501A (en) 2022-12-22 2022-12-22 Command execution method, device, graphics processor, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211654158.1A CN116185501A (en) 2022-12-22 2022-12-22 Command execution method, device, graphics processor, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN116185501A true CN116185501A (en) 2023-05-30

Family

ID=86443318

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211654158.1A Pending CN116185501A (en) 2022-12-22 2022-12-22 Command execution method, device, graphics processor, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN116185501A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination