CN111708622A - Instruction group scheduling method, architecture, equipment and storage medium - Google Patents

Instruction group scheduling method, architecture, equipment and storage medium Download PDF

Info

Publication number
CN111708622A
CN111708622A (application CN202010482280.XA; granted publication CN111708622B)
Authority
CN
China
Prior art keywords
instruction, current, executed, write, groups
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010482280.XA
Other languages
Chinese (zh)
Other versions
CN111708622B (en)
Inventor
王凯
周玉龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Original Assignee
Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd filed Critical Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Priority to CN202010482280.XA priority Critical patent/CN111708622B/en
Publication of CN111708622A publication Critical patent/CN111708622A/en
Application granted granted Critical
Publication of CN111708622B publication Critical patent/CN111708622B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/48Indexing scheme relating to G06F9/48
    • G06F2209/484Precedence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5021Priority

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

The invention discloses an instruction group scheduling method, architecture, device, and storage medium. The method comprises: dividing the threads contained in an input thread group into different instruction groups, and determining each instruction group in turn as the current instruction group according to the arrangement order of the instruction groups; fetching the instruction currently to be executed from the current instruction group as the current instruction, and executing it using the computing and storage resources allocated to it; after the current instruction is fetched, predicting the instruction that will need to be executed once the current instruction completes, determining the predicted instruction as the target instruction, reading instruction information of the target instruction, and allocating corresponding computing and storage resources to it based on that information, the instruction information comprising operands and operators; and, after the current instruction completes, determining the target instruction to be the current instruction and returning to the step of executing the current instruction using the computing and storage resources allocated to it. Pre-allocating resources in this way significantly improves instruction execution efficiency.

Description

Instruction group scheduling method, architecture, equipment and storage medium
Technical Field
The present invention relates to the field of instruction processing technologies, and in particular, to an instruction group scheduling method, architecture, device, and storage medium.
Background
For GPU scheduling, the prior art usually adopts a GPGPU scheduling structure in which an instruction is executed directly as soon as it is obtained. The inventors found that this scheme suffers from low execution efficiency, since the resources an instruction needs are only allocated at the moment it must run.
Disclosure of Invention
The invention aims to provide an instruction group scheduling method, architecture, device, and storage medium that can effectively improve instruction execution efficiency.
To achieve the above purpose, the invention provides the following technical solution:
an instruction group scheduling method, comprising:
dividing the threads contained in an input thread group into different instruction groups, and determining each instruction group in turn as the current instruction group according to the arrangement order of the instruction groups;
fetching the instruction currently to be executed from the current instruction group as the current instruction, and executing the current instruction using the computing and storage resources allocated to it; after the current instruction is fetched, predicting the instruction that will need to be executed after the current instruction completes, determining the predicted instruction as the target instruction, reading instruction information of the target instruction, and allocating corresponding computing and storage resources to the target instruction based on the instruction information, the instruction information comprising operands and operators;
and after the current instruction completes, determining the target instruction to be the current instruction and returning to the step of executing the current instruction using the computing and storage resources allocated to it.
Preferably, after dividing the threads contained in the input thread group into different instruction groups, the method further includes:
analyzing the correlation among the instruction groups; if no correlation exists among the instruction groups, sorting them by priority from high to low; if any instruction groups are correlated, sorting those instruction groups according to the correlation and sorting the remaining instruction groups by priority from high to low.
Preferably, the number of current instructions is plural, and executing the current instructions includes:
judging whether read-after-write or write-after-write conflicts exist among the current instructions and, if so, controlling the conflicting current instructions to execute sequentially using the computing and storage resources allocated to them.
Preferably, the method further comprises:
monitoring each executing instruction in real time; if a read-after-write or write-after-write conflict is detected, suspending the instruction that started executing later among the conflicting instructions, and executing it only after the conflicting instruction that started first has finished.
An instruction group scheduling architecture, comprising:
a thread processing module configured to: divide the threads contained in an input thread group into different instruction groups, and determine each instruction group in turn as the current instruction group according to the arrangement order of the instruction groups;
an instruction flow module configured to: fetch the instruction currently to be executed from the current instruction group as the current instruction, predict the instruction to be executed after the current instruction completes, determine the predicted instruction as the target instruction, read instruction information of the target instruction, and allocate corresponding computing and storage resources to the target instruction based on the instruction information, the instruction information comprising operands and operators;
an instruction execution module configured to: after the current instruction is fetched, execute the current instruction using the computing and storage resources allocated to it; after the current instruction completes, determine the target instruction to be the current instruction and return to the step of executing the current instruction using the computing and storage resources allocated to it.
Preferably, the architecture further comprises:
an instruction group ordering module configured to: divide the threads contained in the input thread group into different instruction groups, analyze the correlation among the instruction groups, and, if no correlation exists, sort the instruction groups by priority from high to low; if any instruction groups are correlated, sort those instruction groups according to the correlation and sort the remaining instruction groups by priority from high to low.
Preferably, the instruction execution module comprises:
an instruction execution unit configured to: judge whether read-after-write or write-after-write conflicts exist among the current instructions and, if so, control the conflicting current instructions to execute sequentially using the computing and storage resources allocated to them; the number of current instructions is plural.
Preferably, the architecture further comprises:
a real-time monitoring module configured to: monitor each executing instruction in real time; if a read-after-write or write-after-write conflict is detected, suspend the instruction that started executing later among the conflicting instructions, and execute it only after the conflicting instruction that started first has finished.
An instruction group scheduling apparatus comprising:
a memory for storing a computer program;
a processor for implementing the steps of the instruction group scheduling method as described in any one of the above when executing the computer program.
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the instruction group scheduling method of any one of the above.
The invention provides an instruction group scheduling method, architecture, device, and storage medium, wherein the method comprises: dividing the threads contained in an input thread group into different instruction groups, and determining each instruction group in turn as the current instruction group according to the arrangement order of the instruction groups; fetching the instruction currently to be executed from the current instruction group as the current instruction, and executing it using the computing and storage resources allocated to it; after the current instruction is fetched, predicting the instruction that will need to be executed after the current instruction completes, determining the predicted instruction as the target instruction, reading instruction information of the target instruction, and allocating corresponding computing and storage resources to the target instruction based on that information, the instruction information comprising operands and operators; and, after the current instruction completes, determining the target instruction to be the current instruction and returning to the step of executing the current instruction using the computing and storage resources allocated to it.
After the current instruction to be executed is determined, the instruction to be executed after it finishes can be predicted, and the computing and storage resources that instruction will need can be allocated to it in advance based on its operands and operators. When that instruction is then executed, it runs directly on the pre-allocated resources. Compared with allocating the resources an instruction needs only at the moment it must execute, this clearly improves instruction execution efficiency greatly.
Drawings
To illustrate the embodiments of the present invention or the prior-art technical solutions more clearly, the drawings needed in their description are briefly introduced below. Obviously, the drawings described below show only embodiments of the present invention; a person skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of an instruction group scheduling method according to an embodiment of the present invention;
FIG. 2 is a block diagram of an instruction group scheduling method according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an instruction group scheduling architecture according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from the given embodiments without creative effort fall within the protection scope of the present invention.
Referring to fig. 1, a flowchart of an instruction group scheduling method according to an embodiment of the present invention is shown, where the method includes:
s11: dividing the threads contained in the input thread group into different instruction groups, and sequentially determining each instruction group as the current instruction group according to the arrangement sequence of the instruction groups.
It should be noted that the execution body of the instruction group scheduling method provided by the embodiment of the present invention may be a corresponding instruction group scheduling device; the present application may be implemented on an FPGA platform in the Verilog hardware description language, based on the RISC-V architecture.
Dividing the threads contained in the input thread group into different instruction groups follows the same principle as the corresponding technical solution in the prior art. Specifically, after the thread groups are input, a software editor and optimizer may be used to optimize their running order; the thread groups are arranged in that order, and at the hardware stage each thread group is given a corresponding label such as 00, 01, 02, and so on. Each thread group is then stored separately, the threads it contains are divided to obtain the corresponding instruction groups, the instruction groups are rearranged according to the required running order, and the instruction groups are executed in sequence. Storing by instruction group rather than by thread group refines the storage unit and thereby makes full use of fragmented cache space.
In addition, when the threads in a thread group are divided into instruction groups, the division may follow a preset instruction group size and constraint. For example, if 36 × 36 threads are set as one instruction group and a thread group contains 72 × 72 threads, that thread group is divided into 4 instruction groups, the division following the preset matrix layout. After the split is complete, each instruction group may also be tagged with a corresponding warp-id.
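The splitting example just described (a 72 × 72 thread group divided into four 36 × 36 instruction groups, each tagged with a warp-id) can be sketched as follows. The patent targets Verilog on an FPGA; this Python sketch is purely illustrative, and every function and field name in it is an assumption, not part of the patent.

```python
# Illustrative sketch: split a square thread group into square instruction
# groups (tiles) and assign each a sequential warp-id. The sizes 72 and 36
# come from the example in the text; the data shapes are assumptions.

def split_thread_group(group_dim: int, tile_dim: int):
    """Split a group_dim x group_dim thread group into tile_dim x tile_dim tiles."""
    assert group_dim % tile_dim == 0, "group must divide evenly into tiles"
    tiles_per_side = group_dim // tile_dim
    instruction_groups = []
    warp_id = 0
    for row in range(tiles_per_side):
        for col in range(tiles_per_side):
            # Each instruction group records its warp-id and the thread
            # coordinates it covers within the parent thread group.
            instruction_groups.append({
                "warp_id": warp_id,
                "rows": (row * tile_dim, (row + 1) * tile_dim),
                "cols": (col * tile_dim, (col + 1) * tile_dim),
            })
            warp_id += 1
    return instruction_groups

groups = split_thread_group(72, 36)
```

With these parameters the sketch yields four instruction groups with warp-ids 0 through 3, one per 36 × 36 tile.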
S12: taking out the current instruction to be executed from the current instruction group as the current instruction, and executing the current instruction by using the computing resource and the storage resource distributed for the current instruction; after the current instruction is taken out, predicting the instruction which needs to be executed after the current instruction is executed, determining the predicted instruction as a target instruction, reading instruction information of the target instruction, and distributing corresponding computing resources and storage resources for the target instruction based on the instruction information; the instruction information includes operands and operators.
The current instruction group is the instruction group that currently needs to run; the instruction that currently needs to be executed is fetched from it as the current instruction, which is then executed using the computing and storage resources allocated to it. In addition, to allocate computing and storage resources in advance, the method predicts, once the current instruction has been fetched, which instruction will need to be executed next, reads the computing and storage resources that instruction will require, and allocates them to it so that it can begin running quickly. The prediction may follow any preset prediction mode. For example, the method may count how many times each instruction has historically been executed after the current instruction, and take the instruction with the largest count as the predicted one. Alternatively, once the operation the current instruction implements is known, the method may examine the association between that operation and the operations implemented by the other instructions in the same instruction group, and take an associated instruction as the prediction (for example, an operation that locks a storage area is associated with an operation that writes to that area). Other settings chosen according to actual needs are, of course, also within the scope of the invention.
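The history-based prediction mode mentioned above (predict the instruction most often observed to follow the current one) can be sketched as below. This is an illustration only; the class and method names are assumptions, not the patent's implementation.

```python
from collections import Counter, defaultdict

# Illustrative sketch of a history-based successor predictor: for each
# instruction, count which instruction followed it in past executions and
# predict the most frequent successor as the target instruction.

class SuccessorPredictor:
    def __init__(self):
        self._history = defaultdict(Counter)  # instruction -> Counter of successors
        self._last = None

    def record(self, instr):
        """Observe one executed instruction, updating successor counts."""
        if self._last is not None:
            self._history[self._last][instr] += 1
        self._last = instr

    def predict(self, current):
        """Return the most frequently observed successor, or None if unseen."""
        followers = self._history.get(current)
        if not followers:
            return None
        return followers.most_common(1)[0][0]

predictor = SuccessorPredictor()
for instr in ["load", "add", "store", "load", "add", "mul"]:
    predictor.record(instr)
```

After this trace, "load" has been followed by "add" twice, so "add" becomes the target instruction whenever "load" is current.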
In addition, when corresponding computing and storage resources are allocated to the target instruction based on its instruction information, the operands and operators of the target instruction may be obtained so that the resources required for its execution can be estimated from them and then allocated. One way to make this estimate is to examine historical executions with operands of the same bit width and the same operator, and to reserve the maximum computing and storage resources those executions required, thereby ensuring that the target instruction executes smoothly; other settings may also be adopted according to actual needs.
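The worst-case estimate described above (reserve the maximum resources ever observed for the same operand width and operator) can be sketched as follows. The keying scheme, units, and names here are illustrative assumptions.

```python
from collections import defaultdict

# Illustrative sketch: record resource usage keyed by (operand bit width,
# operator) and estimate a new allocation as the maximum compute and storage
# ever observed for that key, so the target instruction cannot stall.

history = defaultdict(list)  # (bits, op) -> [(compute_units, storage_bytes), ...]

def record_usage(bits, op, compute_units, storage_bytes):
    """Record what one historical execution actually consumed."""
    history[(bits, op)].append((compute_units, storage_bytes))

def estimate(bits, op):
    """Return (compute, storage) to reserve, or None if there is no history."""
    samples = history.get((bits, op))
    if not samples:
        return None  # no history: fall back to allocation at execution time
    return (max(c for c, _ in samples), max(s for _, s in samples))

record_usage(32, "mul", 2, 128)
record_usage(32, "mul", 3, 96)
```

Taking the maximum of each resource independently is deliberately conservative: it trades a little over-reservation for the guarantee that the predicted instruction never waits on resources.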
S13: after the current instruction completes, determining the target instruction to be the current instruction and returning to the step of executing the current instruction using the computing and storage resources allocated to it.
After the current instruction finishes, the next instruction to be executed is fetched as the new current instruction. If it is the target instruction, it is executed with the computing and storage resources already allocated to it, achieving the aim of fast instruction execution; otherwise, the required computing and storage resources are first allocated to the current instruction, which is then executed using them.
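The overall S11–S13 loop, with its predict-and-preallocate overlap and on-demand fallback, can be sketched as below. All callables are placeholders supplied by the caller; none of these names come from the patent.

```python
# Illustrative sketch of the scheduling loop: while each instruction runs,
# resources for its predicted successor are reserved; a correct prediction
# means the next instruction starts on pre-allocated resources, a miss falls
# back to on-demand allocation.

def run_instruction_group(instructions, predict, preallocate, allocate, execute):
    reserved = {}  # predicted instruction -> resources reserved ahead of time
    for instr in instructions:
        # Use resources reserved during the previous step if the prediction
        # was correct; otherwise allocate on demand (the slow path).
        resources = reserved.pop(instr, None)
        if resources is None:
            resources = allocate(instr)
        # Reserve resources for the predicted successor before executing.
        target = predict(instr)
        if target is not None and target not in reserved:
            reserved[target] = preallocate(target)
        execute(instr, resources)

alloc_log, prealloc_log, run_log = [], [], []
plan = {"a": "b", "b": "c"}  # assumed prediction table: "a" is followed by "b", etc.
run_instruction_group(
    ["a", "b", "c"],
    predict=plan.get,
    preallocate=lambda i: (prealloc_log.append(i), f"res-{i}")[1],
    allocate=lambda i: (alloc_log.append(i), f"res-{i}")[1],
    execute=lambda i, r: run_log.append((i, r)),
)
```

In this trace only the first instruction pays for on-demand allocation; "b" and "c" both run on resources reserved one step ahead, which is the efficiency gain the patent claims.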
According to the technical features disclosed in the present application, after the current instruction is determined, the instruction that will need to be executed after it finishes can be predicted, and the computing and storage resources that instruction requires can be allocated to it based on its operands and operators. When that instruction is later executed, it runs directly on the pre-allocated resources. Compared with allocating resources to an instruction only at the moment it must execute, this clearly improves instruction execution efficiency greatly.
It should be noted that target-instruction prediction may be implemented by SM/SFU/LU priority arbitration. Specifically, the execution time of each instruction in the current instruction group, its priority (the more time-critical the execution, the higher the priority), and the correlations among instructions can be obtained; the principle for determining correlation between instructions is the same as for instruction groups and is not repeated here. When predicting the target instruction: if an instruction correlated with the current one exists (such as a jump or call), it is selected as the target; if no correlated instruction exists, the instruction with the highest priority is selected; and if several instructions share the highest priority, the one with the shortest execution time is selected. The prediction thereby meets the current requirements.
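The three-tier arbitration rule just described (correlation first, then highest priority, then shortest execution time as the tie-breaker) can be sketched as follows; the candidate tuple layout is an assumption for illustration.

```python
# Illustrative sketch of target selection: prefer an instruction correlated
# with the current one (e.g. a jump target); otherwise take the highest
# priority; break priority ties by shortest execution time.

def select_target(candidates, correlated_with_current):
    """candidates: list of (name, priority, exec_time); higher priority wins."""
    # Tier 1: a correlated instruction always wins.
    for name, _, _ in candidates:
        if name in correlated_with_current:
            return name
    if not candidates:
        return None
    # Tier 2: highest priority; Tier 3: shortest execution time among ties.
    best_priority = max(p for _, p, _ in candidates)
    tied = [(name, t) for name, p, t in candidates if p == best_priority]
    return min(tied, key=lambda x: x[1])[0]

target = select_target([("x", 1, 5), ("y", 3, 2), ("z", 3, 1)], set())
```

Here "y" and "z" tie at the highest priority, so "z" wins on its shorter execution time; if "x" were correlated with the current instruction it would win outright.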
In addition, the computing and storage resources allocated to the target instruction may comprise a plurality of nodes, so that resource allocation can be performed in a load-balanced manner, improving resource utilization.
The instruction group scheduling method provided by the embodiment of the present invention may further include, after dividing the threads contained in the input thread group into different instruction groups:
analyzing the correlation among the instruction groups; if no correlation exists, sorting the instruction groups by priority from high to low; if any instruction groups are correlated, sorting those instruction groups according to the correlation and sorting the remaining instruction groups by priority from high to low.
Considering that different instruction groups may have different timeliness requirements, corresponding priorities may be set for them (the more time-critical the execution, the higher the priority), so that higher-priority instruction groups execute first. However, since any number of instruction groups may be correlated, correlated instruction groups must execute in the order the correlation requires, while uncorrelated groups execute in priority order. Executing in the order the correlation requires means that the operations of some instruction groups may affect those of others, such as reading and writing the same data, or modifying and deleting the same file; in such cases the groups must run in the required order to guarantee that each group's operations take effect. For example, if two instruction groups respectively write and read the same data, the writing group is executed first and the reading group second, ensuring the validity of the data read.
In the instruction group scheduling method provided by the embodiment of the present invention, the number of current instructions may be plural; executing the current instructions may include:
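The ordering rule above, with correlated chains keeping their mandated order (writer before reader) and independent groups sorted by priority, can be sketched as follows. The data shapes, and the choice to run chained groups before independent ones, are illustrative assumptions.

```python
# Illustrative sketch of instruction group ordering: groups tied together by
# a correlation chain keep their required relative order, and all remaining
# groups are sorted by priority, highest first.

def order_groups(groups, chains):
    """groups: {name: priority}; chains: list of correlated-name sequences
    already in their required execution order (e.g. writer before reader)."""
    chained = [name for chain in chains for name in chain]
    independent = sorted(
        (n for n in groups if n not in chained),
        key=lambda n: -groups[n],  # higher priority executes earlier
    )
    # Correlated chains run in their mandated order, then the rest by priority.
    return chained + independent

order = order_groups({"a": 1, "b": 3, "c": 2, "w": 0, "r": 0}, [["w", "r"]])
```

In this example the write group "w" precedes the read group "r" regardless of priority, and the uncorrelated groups follow in priority order b, c, a.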
judging whether read-after-write or write-after-write conflicts exist among the current instructions and, if so, controlling the conflicting current instructions to execute sequentially using the computing and storage resources allocated to them.
If a plurality of instructions currently need to execute in parallel, it can be judged whether read-after-write or write-after-write conflicts exist among them. Specifically, if any of the current instructions respectively perform a read and a write on the same data, a read-after-write conflict exists; if several current instructions all write the same data, a write-after-write conflict exists. To avoid such conflicts and keep data operations valid, the conflicting current instructions are no longer executed in parallel but one at a time, which resolves the conflict. Moreover, when current instructions with a read-after-write conflict are serialized, the writing instruction is executed before the reading instruction, ensuring that valid data is read.
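The hazard check described above can be sketched as below: two instructions conflict if one writes a location the other reads (read-after-write) or both write the same location (write-after-write), and conflicting instructions are moved out of the parallel batch. The (reads, writes) tuple encoding is an assumption for illustration.

```python
# Illustrative sketch of read-after-write / write-after-write detection and
# of splitting the current instructions into a parallel batch and a
# serialized tail.

def conflicts(a, b):
    """Each instruction is (reads: set, writes: set) of data locations."""
    a_reads, a_writes = a
    b_reads, b_writes = b
    raw = bool(a_writes & b_reads) or bool(b_writes & a_reads)  # read-after-write
    waw = bool(a_writes & b_writes)                             # write-after-write
    return raw or waw

def partition_issue(current_instructions):
    """Keep non-conflicting instructions parallel; serialize the rest in order."""
    parallel, serial = [], []
    for instr in current_instructions:
        if any(conflicts(instr, other) for other in parallel):
            serial.append(instr)  # executed one at a time, after the batch
        else:
            parallel.append(instr)
    return parallel, serial

writer = (set(), {"x"})   # writes x
reader = ({"x"}, set())   # reads x -> read-after-write conflict with writer
parallel, serial = partition_issue([writer, ({"y"}, {"z"}), reader])
```

Because the writer is already in the parallel batch, the reader is serialized behind it, which matches the rule that the write executes before the read.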
The instruction group scheduling method provided by the embodiment of the invention can further comprise the following steps:
and monitoring each instruction which is being executed in real time, if the read-after-write conflict or the write-after-write conflict is monitored, suspending the instruction which is started to be executed later in the instructions which have the read-after-write conflict or the write-after-write conflict, and after the instruction which is started to be executed first in the instructions which have the read-after-write conflict or the write-after-write conflict is completely executed, executing the instruction which is started to be executed later.
To further ensure the validity of data operations, after judging whether read-after-write and write-after-write conflicts exist among the plurality of current instructions, the method also monitors in real time, during instruction execution, whether conflicts arise among the executing instructions. If so, the conflicting instructions are controlled to execute sequentially in order of their start times, from earliest to latest: the instruction that started executing later is suspended, and it resumes execution only after the conflicting instruction that started executing first has finished.
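The suspend-and-resume rule above amounts to picking, in each conflicting pair, the instruction with the later start time. A minimal sketch, in which the `(name, start_time)` tuple format and the helper name are assumptions:

```python
def instructions_to_suspend(executing, conflict_pairs):
    """executing: list of (name, start_time) for instructions in flight.
    conflict_pairs: name pairs found to have a read-after-write or
    write-after-write conflict.  Returns the names to suspend: in each
    conflicting pair, the instruction that started executing later is
    paused and may resume only after the earlier one finishes."""
    start = dict(executing)
    suspended = set()
    for a, b in conflict_pairs:
        # Suspend whichever of the pair began execution later.
        suspended.add(a if start[a] > start[b] else b)
    return suspended
```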
In a specific application scenario, an architecture flowchart of the above technical solution disclosed in the present application may be as shown in fig. 2, and each function is as follows:
1. Warp (instruction group) stack: controls the execution of the next instruction of a given thread in a warp, and the thread divergence and reconvergence points (thread divergence and reconvergence have the same meaning as the corresponding concepts in the prior art);
2. Warp scoreboard: checks for write-after-write and read-after-write conflicts;
3. To-be-issued buffer and DU interaction: buffers the warps awaiting issue and issues instructions via the issue module; and determines the computing resources allocated to an instruction based on its operands and operators;
4. Operand pre-read: by pre-reading the operands and operators, determines whether the LSU needs to be started, i.e., whether the LSU must be used to fetch operands into a cache; and, also based on the operands and operators, determines the storage resources allocated;
5. Instruction pipeline: comprises instruction fetch, prediction, SM/SFU/LU priority arbitration, PC decoding and write-back. Fetching takes out the instruction currently to be executed; prediction and SM/SFU/LU priority arbitration predict the instruction to be executed after the current instruction finishes; PC decoding translates the instruction into information that the computing resources can recognize; and write-back writes results back in the arrangement form of a single warp after that warp completes;
6. PC_CACHE: a cache for the instruction PC, used to fetch instructions;
7. Decoding: preliminarily decodes an instruction in a warp to obtain its key bits, determines simple correlations among instructions based on those bits, and executes instructions accordingly; if several instructions perform a large amount of repeated work, they can be scheduled together to speed up computation. The correlation bits in an instruction are the bits of its operands and operators, and the correlation is whether two warp instructions affect each other's operation results;
8. Thread buffering and warp queue ordering/insertion: obtains the correlation between warps and sets the arrangement order of the warps according to that correlation.
Accordingly, the functions of a thread in the architecture flowchart are thread buffering, warp queue ordering/insertion, decoding, PC caching, instruction pipelining, operand pre-reading, and resource allocation and issue based on the to-be-issued buffer and DU interaction.
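Putting the pipeline together, the scheduling loop fetches the current instruction, executes it on resources allocated in advance, and meanwhile predicts the next (target) instruction and allocates its resources. A minimal sketch under assumed callback names (`allocate`, `execute` and `predict` are placeholders for the example, not the patented modules):

```python
def run_group(instrs, allocate, execute, predict=None):
    """Run one instruction group: while the current instruction executes
    on pre-allocated computing/storage resources, the target (predicted
    next) instruction already has its resources allocated."""
    # Default predictor: simply the next instruction in program order.
    predict = predict or (lambda seq, i: seq[i + 1] if i + 1 < len(seq) else None)
    trace = []
    i = 0
    current = instrs[0] if instrs else None
    res = allocate(current) if current is not None else None
    while current is not None:
        target = predict(instrs, i)                                  # predict the next instruction
        next_res = allocate(target) if target is not None else None  # pre-allocate for it
        trace.append(execute(current, res))                          # run on resources allocated earlier
        current, res, i = target, next_res, i + 1
    return trace
```

The point of the overlap is that by the time the current instruction completes, the target instruction's resources are already in place, so it can be promoted to current and executed immediately.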
An embodiment of the present invention further provides an instruction group scheduling architecture, as shown in fig. 3, which may include:
a thread processing module 11, configured to: divide the threads contained in an input thread group into different instruction groups, and sequentially determine each instruction group as the current instruction group according to the arrangement sequence of the instruction groups;
an instruction pipeline module 12 for: taking out the current instruction to be executed from the current instruction group as the current instruction, predicting the instruction to be executed after the current instruction is executed, determining the predicted instruction as a target instruction, reading instruction information of the target instruction, and distributing corresponding computing resources and storage resources for the target instruction based on the instruction information; the instruction information comprises operands and operators;
an instruction execution module 13, configured to: and after the current instruction is taken out, executing the current instruction by using the computing resources and the storage resources distributed for the current instruction, determining that the target instruction is the current instruction after the current instruction is executed, and returning to the step of executing the current instruction by using the computing resources and the storage resources distributed for the current instruction.
The instruction group scheduling architecture provided in the embodiment of the present invention may further include:
an instruction group ordering module, configured to: after the threads contained in the input thread group are divided into different instruction groups, analyze the correlation among the instruction groups; if no correlation exists among the instruction groups, sort the instruction groups in order of priority from high to low; if any instruction groups are correlated, sort those instruction groups according to the correlation, and sort the remaining instruction groups in order of priority from high to low.
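The ordering rule of this module can be illustrated as follows. This is a sketch under assumptions: each group carries a numeric priority, correlated groups are given as a list already in their required dependency order, and, since the text does not specify where the correlated block sits relative to the rest, it is simply placed first here:

```python
def order_groups(groups, correlated=()):
    """groups: list of (name, priority).  correlated: names of groups
    that must keep a fixed dependency order, given in that order.
    Correlated groups keep their given order; all other groups are
    sorted by priority, from high to low."""
    by_corr = {name: i for i, name in enumerate(correlated)}
    # Groups with a correlation keep the dependency order.
    fixed = sorted((g for g in groups if g[0] in by_corr),
                   key=lambda g: by_corr[g[0]])
    # The remaining groups are ordered by priority, high to low.
    rest = sorted((g for g in groups if g[0] not in by_corr),
                  key=lambda g: g[1], reverse=True)
    return fixed + rest
```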
In an instruction group scheduling architecture provided in an embodiment of the present invention, an instruction execution module may include:
an instruction execution unit, configured to: judge whether a read-after-write conflict or a write-after-write conflict exists among the current instructions, and if so, control the current instructions having the read-after-write or write-after-write conflict to be executed sequentially using the computing resources and storage resources allocated for them; the number of current instructions is plural.
The instruction group scheduling architecture provided in the embodiment of the present invention may further include:
a real-time monitoring module, configured to: monitor each instruction being executed in real time; if a read-after-write or write-after-write conflict is detected, suspend the instruction that started executing later among the conflicting instructions, and execute it only after the conflicting instruction that started executing first has finished.
An embodiment of the present invention further provides an instruction group scheduling apparatus, which may include:
a memory for storing a computer program;
a processor for implementing the steps of the instruction group scheduling method described in any one of the above when executing the computer program.
The embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the instruction group scheduling method described above are implemented.
It should be noted that, for the description of the relevant parts of the instruction group scheduling architecture, device and storage medium provided in the embodiments of the present invention, reference is made to the detailed description of the corresponding parts of the instruction group scheduling method provided in the embodiments of the present invention, which is not repeated here. In addition, the parts of the above technical solutions provided in the embodiments of the present invention whose implementation principles are consistent with those of the corresponding technical solutions in the prior art are not described in detail, so as to avoid redundant description.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for instruction group scheduling, comprising:
dividing threads contained in an input thread group into different instruction groups, and sequentially determining each instruction group as a current instruction group according to the arrangement sequence of the instruction groups;
taking out the current instruction to be executed from the current instruction group as the current instruction, and executing the current instruction by using the computing resource and the storage resource distributed for the current instruction; after the current instruction is taken out, predicting the instruction which needs to be executed after the current instruction is executed, determining the predicted instruction as a target instruction, reading instruction information of the target instruction, and distributing corresponding computing resources and storage resources for the target instruction based on the instruction information; the instruction information comprises operands and operators;
and after the current instruction is executed, determining that the target instruction is the current instruction, and returning to the step of executing the current instruction by using the computing resources and the storage resources distributed for the current instruction.
2. The method of claim 1, wherein after dividing the threads included in the input thread group into different instruction groups, further comprising:
analyzing the correlation among the instruction groups, and if the correlation does not exist among the instruction groups, sequencing the instruction groups according to the sequence of the priorities of the instruction groups from high to low; if any instruction group has correlation, the instruction groups are sorted according to the correlation, and the instruction groups except the instruction groups are sorted according to the order of the priorities from high to low.
3. The method of claim 2, wherein the number of current instructions is plural; executing the current instruction, including:
and judging whether read-after-write conflicts or write-after-write conflicts exist among the current instructions, if so, controlling the current instructions with the read-after-write conflicts or write-after-write conflicts to be sequentially executed by utilizing the computing resources and the storage resources distributed for the current instructions.
4. The method of claim 3, further comprising:
monitoring each instruction being executed in real time, if the read-after-write conflict or the write-after-write conflict is monitored, suspending the instruction which is started to be executed later in the instructions which generate the read-after-write conflict or the write-after-write conflict, and after the instruction which is started to be executed first in the instructions which generate the read-after-write conflict or the write-after-write conflict is completely executed, executing the instruction which is started to be executed later.
5. An instruction set scheduling architecture, comprising:
a thread processing module to: dividing threads contained in an input thread group into different instruction groups, and sequentially determining each instruction group as a current instruction group according to the arrangement sequence of the instruction groups;
an instruction flow module to: taking out the current instruction to be executed from the current instruction group as the current instruction, predicting the instruction to be executed after the current instruction is executed, determining the predicted instruction as a target instruction, reading instruction information of the target instruction, and distributing corresponding computing resources and storage resources for the target instruction based on the instruction information; the instruction information comprises operands and operators;
an instruction execution module to: and after the current instruction is taken out, executing the current instruction by using the computing resources and the storage resources distributed for the current instruction, determining that the target instruction is the current instruction after the current instruction is executed, and returning to the step of executing the current instruction by using the computing resources and the storage resources distributed for the current instruction.
6. The architecture of claim 5, further comprising:
an instruction group ordering module to: after dividing the threads contained in the input thread group into different instruction groups, analyzing the correlation among the instruction groups, and if no correlation exists among the instruction groups, sorting the instruction groups in order of priority from high to low; if any instruction groups are correlated, sorting those instruction groups according to the correlation, and sorting the remaining instruction groups in order of priority from high to low.
7. The architecture of claim 6, wherein the instruction execution module comprises:
an instruction execution unit to: judging whether read-after-write conflicts or write-after-write conflicts exist among the current instructions, if so, controlling the current instructions with the read-after-write conflicts or write-after-write conflicts to be sequentially executed by utilizing computing resources and storage resources distributed for the current instructions; the number of current instructions is plural.
8. The architecture of claim 7, further comprising:
a real-time monitoring module for: monitoring each instruction being executed in real time, if the read-after-write conflict or the write-after-write conflict is monitored, suspending the instruction which is started to be executed later in the instructions which generate the read-after-write conflict or the write-after-write conflict, and after the instruction which is started to be executed first in the instructions which generate the read-after-write conflict or the write-after-write conflict is completely executed, executing the instruction which is started to be executed later.
9. An instruction group scheduling apparatus, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the instruction group scheduling method according to any one of claims 1 to 4 when executing the computer program.
10. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, carries out the steps of the instruction group scheduling method according to any one of claims 1 to 4.
CN202010482280.XA 2020-05-28 2020-05-28 Instruction group scheduling method, architecture, equipment and storage medium Active CN111708622B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010482280.XA CN111708622B (en) 2020-05-28 2020-05-28 Instruction group scheduling method, architecture, equipment and storage medium


Publications (2)

Publication Number Publication Date
CN111708622A true CN111708622A (en) 2020-09-25
CN111708622B CN111708622B (en) 2022-06-10

Family

ID=72537432

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010482280.XA Active CN111708622B (en) 2020-05-28 2020-05-28 Instruction group scheduling method, architecture, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111708622B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5872951A (en) * 1996-07-26 1999-02-16 Advanced Micro Design, Inc. Reorder buffer having a future file for storing speculative instruction execution results
CN1542607A (en) * 2003-04-21 2004-11-03 International Business Machines Corp Simultaneous multithread processor and method for improving performance
CN101606130A (en) * 2007-02-06 2009-12-16 International Business Machines Corp Method and apparatus for enabling resource allocation identification at the instruction level of a processor system
CN107277125A (en) * 2017-06-13 2017-10-20 Wangsu Science & Technology Co Ltd File prefetch instruction pushing method, apparatus and file prefetching system
CN108595258A (en) * 2018-05-02 2018-09-28 Beihang University Dynamic expansion method for GPGPU register files
CN108829457A (en) * 2018-05-29 2018-11-16 OPPO Guangdong Mobile Telecommunications Corp Ltd Application program prediction model updating method, apparatus, storage medium and terminal
US20190056950A1 (en) * 2017-08-18 2019-02-21 International Business Machines Corporation Determining and predicting affiliated registers based on dynamic runtime control flow analysis

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5872951A (en) * 1996-07-26 1999-02-16 Advanced Micro Design, Inc. Reorder buffer having a future file for storing speculative instruction execution results
CN1542607A (en) * 2003-04-21 2004-11-03 International Business Machines Corp Simultaneous multithread processor and method for improving performance
CN101606130A (en) * 2007-02-06 2009-12-16 International Business Machines Corp Method and apparatus for enabling resource allocation identification at the instruction level of a processor system
CN107277125A (en) * 2017-06-13 2017-10-20 Wangsu Science & Technology Co Ltd File prefetch instruction pushing method, apparatus and file prefetching system
US20190056950A1 (en) * 2017-08-18 2019-02-21 International Business Machines Corporation Determining and predicting affiliated registers based on dynamic runtime control flow analysis
US20190056945A1 (en) * 2017-08-18 2019-02-21 International Business Machines Corporation Determining and predicting affiliated registers based on dynamic runtime control flow analysis
CN108595258A (en) * 2018-05-02 2018-09-28 Beihang University Dynamic expansion method for GPGPU register files
CN108829457A (en) * 2018-05-29 2018-11-16 OPPO Guangdong Mobile Telecommunications Corp Ltd Application program prediction model updating method, apparatus, storage medium and terminal

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
H. HOMAYOUN ET AL: "Thread scheduling based on low-quality instruction prediction for simultaneous multithreaded processors", 《THE 3RD INTERNATIONAL IEEE-NEWCAS CONFERENCE, 2005.》 *
FANG JUAN ET AL: "An Improved Hardware Prefetching Technique for Multi-core Processors", 《COMPUTER SCIENCE》 *
CAI WEIGUANG ET AL: "A Method for Early Determination of Instruction-Data Dependency in RISC-DSP Processors", 《JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY》 *
GU QING ET AL: "Implementation and Optimization of a High-speed AES Algorithm Based on GPGPU and CUDA", 《JOURNAL OF THE GRADUATE SCHOOL OF THE CHINESE ACADEMY OF SCIENCES》 *

Also Published As

Publication number Publication date
CN111708622B (en) 2022-06-10

Similar Documents

Publication Publication Date Title
US8082420B2 (en) Method and apparatus for executing instructions
US7418576B1 (en) Prioritized issuing of operation dedicated execution unit tagged instructions from multiple different type threads performing different set of operations
KR101638225B1 (en) Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines
JP5177141B2 (en) Arithmetic processing device and arithmetic processing method
US9652243B2 (en) Predicting out-of-order instruction level parallelism of threads in a multi-threaded processor
EP1916601A2 (en) Multiprocessor system
KR101730282B1 (en) Select logic using delayed reconstructed program order
US9875139B2 (en) Graphics processing unit controller, host system, and methods
US8997071B2 (en) Optimized division of work among processors in a heterogeneous processing system
KR20140018946A (en) Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines
KR20140018945A (en) Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
US9268595B2 (en) Scheduling thread execution based on thread affinity
CN110308982B (en) Shared memory multiplexing method and device
CN108549574A (en) Threading scheduling management method, device, computer equipment and storage medium
CN107729267B (en) Distributed allocation of resources and interconnect structure for supporting execution of instruction sequences by multiple engines
Yu et al. Smguard: A flexible and fine-grained resource management framework for gpus
CN111708639A (en) Task scheduling system and method, storage medium and electronic device
US11669366B2 (en) Reduction of a number of stages of a graph streaming processor
US11875425B2 (en) Implementing heterogeneous wavefronts on a graphics processing unit (GPU)
US20150212859A1 (en) Graphics processing unit controller, host system, and methods
CN111708622B (en) Instruction group scheduling method, architecture, equipment and storage medium
CN116795503A (en) Task scheduling method, task scheduling device, graphic processor and electronic equipment
CN117093335A (en) Task scheduling method and device for distributed storage system
KR102210765B1 (en) A method and apparatus for long latency hiding based warp scheduling
US11734065B2 (en) Configurable scheduler with pre-fetch and invalidate threads in a graph stream processing system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant