CN113721987A - Instruction execution method and instruction execution device - Google Patents


Info

Publication number
CN113721987A
Authority
CN
China
Prior art keywords
memory
instruction
execution
group
barrier
Legal status
Granted
Application number
CN202111027116.0A
Other languages
Chinese (zh)
Other versions
CN113721987B (en)
Inventor
卢一帆
潘于
王成卉
陈丹丹
Current Assignee
Haiguang Information Technology Co Ltd
Original Assignee
Haiguang Information Technology Co Ltd
Application filed by Haiguang Information Technology Co Ltd
Priority to CN202111027116.0A
Publication of CN113721987A
Application granted
Publication of CN113721987B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
    • G06F9/3887 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098 Register arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Advance Control (AREA)

Abstract

An instruction execution method and an instruction execution apparatus are provided. The instruction execution method is applied to a thread group that comprises a plurality of execution groups; each execution group can execute a first memory execution instruction and a first memory barrier instruction, and the first memory barrier instruction is used for blocking on the first memory execution instruction. The instruction execution method comprises executing a first memory barrier instruction of a first execution group among the plurality of execution groups, which includes: obtaining, according to the first memory barrier instruction of the first execution group, first memory type information in that instruction and a corresponding first thread group wait value; acquiring a first thread group memory instruction count value; in response to the first thread group memory instruction count value being greater than the first thread group wait value, instructing the first execution group not to continue executing instructions; and in response to the first thread group memory instruction count value being less than or equal to the first thread group wait value, instructing the first execution group to continue executing instructions.

Description

Instruction execution method and instruction execution device
Technical Field
Embodiments of the present disclosure relate to the field of general-purpose graphics processors, and more particularly, to an instruction execution method and an instruction execution apparatus.
Background
A graphics processor (GPU) comprises a plurality of computing units; each computing unit comprises a plurality of single instruction multiple data (SIMD) structures, and some architectures further include a local data share memory. Each SIMD includes a set of vector general-purpose registers and arithmetic logic units (ALUs). The SIMD is the smallest unit that performs parallel computation in a GPU: by executing one instruction, it can control multiple threads to perform the same operation simultaneously. A general-purpose graphics processor (GPGPU) is a GPU used for general-purpose computing; it exploits the high concurrent computing capability of the graphics processor to perform general-purpose computing tasks that would otherwise be handled by a central processing unit (CPU). With the rapid development of the internet industry, the emerging field of artificial intelligence, and the ongoing innovation of traditional industries such as aerospace, weather forecasting, monitoring and security, using GPGPUs instead of CPUs to solve big-data processing problems has become one of the main trends.
Disclosure of Invention
At least one embodiment of the present disclosure provides an instruction execution method applied to a thread group. The thread group comprises a plurality of execution groups, including a first execution group; each of the execution groups executes at least one first memory execution instruction and a first memory barrier instruction, and the first memory barrier instruction is used for blocking on the at least one first memory execution instruction within the thread group. The instruction execution method comprises: supervising the execution of all first memory execution instructions in the plurality of execution groups to obtain a first supervision result; and executing a first memory barrier instruction of the first execution group. The first supervision result includes a first thread group memory instruction count value corresponding to the first memory barrier instruction of the first execution group. Executing the first memory barrier instruction of the first execution group includes: obtaining, according to the first memory barrier instruction of the first execution group, first memory type information in that instruction and a first thread group wait value corresponding to the first memory type information; acquiring the first thread group memory instruction count value; in response to the first thread group memory instruction count value being greater than the first thread group wait value, instructing the first execution group not to continue executing the instructions subsequent to its first memory barrier instruction; and in response to the first thread group memory instruction count value being less than or equal to the first thread group wait value, instructing the first execution group to continue executing the instructions subsequent to its first memory barrier instruction.
At least one embodiment of the present disclosure provides an instruction execution apparatus for use in a thread group. The thread group comprises a plurality of execution groups, including a first execution group; each of the execution groups executes at least one first memory execution instruction and a first memory barrier instruction, and the first memory barrier instruction is used for blocking on the at least one first memory execution instruction within the thread group. The instruction execution apparatus comprises an instruction monitoring unit and a barrier instruction execution unit. The instruction monitoring unit is configured to monitor the execution of all first memory execution instructions in the plurality of execution groups to obtain a first monitoring result, which includes a first thread group memory instruction count value corresponding to the first memory barrier instruction of the first execution group. The barrier instruction execution unit is configured to execute the first memory barrier instruction of the first execution group and comprises a fetch module, a decoding module, an obtaining module and an execution module. The fetch module is configured to fetch the first memory barrier instruction of the first execution group; the decoding module is configured to parse that instruction to obtain the first memory type information in it and a first thread group wait value corresponding to the first memory type information; the obtaining module is configured to obtain the first thread group memory instruction count value; and the execution module is configured to instruct the first execution group not to continue executing the instructions subsequent to its first memory barrier instruction in response to the first thread group memory instruction count value being greater than the first thread group wait value, and to instruct the first execution group to continue executing those instructions in response to the first thread group memory instruction count value being less than or equal to the first thread group wait value.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings of the embodiments will be briefly introduced below, and it is apparent that the drawings in the following description relate only to some embodiments of the present disclosure and are not limiting to the present disclosure.
FIG. 1A is a block diagram of a general-purpose graphics processor;
FIG. 1B is a schematic diagram of a scheduling module execution pipeline;
FIG. 1C is a schematic diagram of some of the registers in a scheduling module according to some embodiments of the present disclosure;
FIG. 2 is a schematic flowchart of an instruction execution method according to some embodiments of the present disclosure;
FIG. 3 is a schematic flowchart of step S20 of the instruction execution method shown in FIG. 2;
FIG. 4A is a schematic diagram of a barrier register set according to some embodiments of the present disclosure;
FIG. 4B is a schematic diagram of a first execution group and a second execution group according to some embodiments of the present disclosure;
FIG. 5 is a schematic diagram of information transfer between a scheduling module and a memory-related module according to some embodiments of the present disclosure;
FIG. 6A is a schematic diagram of another barrier register set according to some embodiments of the present disclosure;
FIG. 6B is a schematic diagram of another first execution group and second execution group according to some embodiments of the present disclosure; and
FIG. 7 is a schematic diagram of an instruction execution apparatus according to at least one embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings of the embodiments. The described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments that a person of ordinary skill in the art can derive from the described embodiments without inventive effort fall within the scope of protection of the present disclosure.
Unless otherwise defined, technical or scientific terms used herein shall have the ordinary meaning as understood by one of ordinary skill in the art to which this disclosure belongs. The use of "first," "second," and similar terms in this disclosure is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.
FIG. 1A is a block diagram of a general-purpose graphics processor.
In parallel computing, the task being executed typically includes many threads (work-items). As shown in FIG. 1A, before execution in a general-purpose graphics processor (also referred to as a parallel computing processor), the threads are divided into multiple thread groups (workgroups) in a command processor, and the thread groups are then distributed to the computing units by a thread group distribution unit; all threads of a thread group must be distributed to the same computing unit for execution. Each thread group is further split into minimum execution thread groups (hereinafter referred to as execution groups), each containing a fixed number of threads (for example, 32) or fewer. Multiple thread groups may execute in the same computing unit. Within a computing unit, the execution groups of one thread group may execute simultaneously or in a time-shared manner, depending on the number of arithmetic logic units and other modules in the computing unit. All threads in an execution group execute the same instruction: instruction fetching, decoding and issuing are performed in the scheduling module, compute instructions are executed in the arithmetic logic units, and memory execution instructions are sent to the cache to perform read and write operations.
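The partitioning described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a task of 300 threads, thread groups of 128 threads and execution groups of at most 32 threads; these sizes and the helper names `split` and `partition` are hypothetical and do not come from the disclosure.

```python
def split(count: int, size: int) -> list[int]:
    """Split `count` threads into chunks of at most `size` (the last may be smaller)."""
    full, rest = divmod(count, size)
    return [size] * full + ([rest] if rest else [])


def partition(num_threads: int, group_size: int, exec_group_size: int = 32) -> list[list[int]]:
    """Return, for each thread group, the thread count of each of its execution groups.

    Assumption: a thread group is kept whole (it goes to one computing unit) and
    is then cut into execution groups of at most `exec_group_size` threads.
    """
    return [split(tg, exec_group_size) for tg in split(num_threads, group_size)]


if __name__ == "__main__":
    for i, exec_groups in enumerate(partition(300, 128)):
        print(f"thread group {i}: execution groups of {exec_groups} threads")
```

With these assumed sizes the last thread group holds only 44 threads, so its second execution group is partially filled (12 threads), matching the "fixed number of threads or fewer" rule.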
When an execution group executes a compute instruction, the source data it needs (from general-purpose registers) may come from an earlier memory read instruction, so a wait instruction is needed to ensure that the data read back by that memory read instruction is ready. Execution groups in the same thread group also have synchronization relationships with one another. For example, suppose a thread group includes 2 execution groups, execution group 0 and execution group 1, and both need to read the data in memory area A for computation. To save read time and bandwidth, a common optimization is for execution group 0 to read half of the data in memory area A and for execution group 1 to read the other half. However, each execution group needs all of the data in memory area A when it executes its compute instructions, so before executing them it must wait until the memory read instructions of both execution group 0 and execution group 1 have read back their data. A barrier instruction is needed here to prevent the execution group from continuing until the memory read instructions of both execution groups have completed and all of the data in memory area A is available.
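The split-read-then-barrier pattern above has a close software analogue. The sketch below uses Python's standard `threading.Barrier` with two threads standing in for execution group 0 and execution group 1; it only illustrates the synchronization requirement and is not the hardware barrier mechanism. The contents of "memory area A" and the summation used as the compute step are hypothetical.

```python
import threading

AREA_A = list(range(8))         # hypothetical contents of memory area A
loaded = [None] * len(AREA_A)   # shared buffer standing in for the loaded data
barrier = threading.Barrier(2)  # one party per execution group
results = {}


def execution_group(gid: int) -> None:
    half = len(AREA_A) // 2
    lo, hi = gid * half, (gid + 1) * half
    loaded[lo:hi] = AREA_A[lo:hi]  # "memory read" of this group's half only
    barrier.wait()                 # block until both halves have been read back
    results[gid] = sum(loaded)     # the "compute" step needs all of area A


threads = [threading.Thread(target=execution_group, args=(g,)) for g in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```

Without `barrier.wait()`, a fast group could sum a buffer that still contains `None` entries from the other half, which is exactly the hazard the barrier instruction prevents.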
FIG. 1B is a schematic diagram of a scheduling module execution pipeline, and FIG. 1C is a schematic diagram of some of the registers in a scheduling module according to some embodiments of the present disclosure.
The existing scheduling module execution pipeline is roughly as shown in FIG. 1B. Scheduling modules of different architectures may be implemented differently, so FIG. 1B is only schematic.
Multiple execution groups may fetch and decode instructions simultaneously in the scheduling module. In the decode stage, if a barrier instruction or a memory wait instruction is found, whether to advance the program counter (PC) to the next instruction is determined by whether the corresponding condition is satisfied, as indicated by the line labeled A in FIG. 1B. In the issue stage, according to the types and number of execution modules, an appropriate execution group is selected from the multiple execution groups by certain arbitration logic, its instruction is issued to an execution module for execution, and the fetch PC is advanced, as indicated by the line labeled B in FIG. 1B.
An existing barrier instruction forces all execution groups of the same thread group to execute the barrier instruction before any of them can continue with subsequent instructions. The scheduling module reserves a certain number of registers for this barrier mechanism. As shown in FIG. 1C, 4 register sets may be reserved, corresponding to barrier 0, barrier 1, barrier 2 and barrier 3, so the computing unit in which the scheduling module resides can support barrier instructions within 4 different thread groups. If a thread group is divided into 4 execution groups, then when the execution groups are created, an available barrier register set (for example, barrier 0) is selected, its total execution group count is set to 4 and its current execution group count is set to 0, and the barrier ID of every execution group in the thread group is set to barrier 0.
When an execution group's barrier instruction is executed in the decode stage, the current execution group count stored in the barrier register set identified by that execution group's barrier ID is incremented by 1, and the execution group is blocked from executing further instructions (i.e., its PC does not advance). When the current execution group count equals the total execution group count, all execution groups of the thread group have executed the barrier instruction. The scheduling module then advances the PCs of all execution groups of the thread group and clears the current execution group count in the corresponding barrier register set in preparation for the next barrier.
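The conventional barrier bookkeeping described in the two paragraphs above can be modeled in software as follows. This is a minimal sketch, assuming the register set holds only the two counts mentioned in the text; the class and method names (`BarrierRegisterSet`, `ConventionalBarrier`, `allocate`, `execute_barrier`) and the execution group labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BarrierRegisterSet:
    total_groups: int = 0    # total number of execution groups in the thread group
    current_groups: int = 0  # execution groups that have executed the barrier so far


class ConventionalBarrier:
    def __init__(self, num_sets: int = 4) -> None:
        self.sets = [BarrierRegisterSet() for _ in range(num_sets)]  # barrier 0..3
        self.barrier_id = {}  # execution group -> barrier register set index
        self.blocked = set()  # execution groups whose PC is currently held

    def allocate(self, set_idx: int, groups: list[str]) -> None:
        """At division time: total = number of groups, current = 0, bind barrier IDs."""
        self.sets[set_idx] = BarrierRegisterSet(total_groups=len(groups))
        for g in groups:
            self.barrier_id[g] = set_idx

    def execute_barrier(self, group: str) -> list[str]:
        """Decode-stage handling of a barrier; returns the groups whose PC now advances."""
        idx = self.barrier_id[group]
        regs = self.sets[idx]
        regs.current_groups += 1
        self.blocked.add(group)  # this group's PC does not advance yet
        if regs.current_groups < regs.total_groups:
            return []
        # Every execution group has executed the barrier: release them all and
        # clear the current count for the next barrier.
        released = sorted(g for g in self.blocked if self.barrier_id[g] == idx)
        self.blocked.difference_update(released)
        regs.current_groups = 0
        return released


barrier = ConventionalBarrier()
barrier.allocate(0, ["eg0", "eg1", "eg2", "eg3"])
for eg in ["eg2", "eg0", "eg3"]:
    print(eg, barrier.execute_barrier(eg))      # still blocked: empty list
print("eg1", barrier.execute_barrier("eg1"))    # the last arrival releases all four
```

Note that the release condition depends only on how many groups have reached the barrier, not on what those groups were doing beforehand; this is the property that the later discussion identifies as imprecise.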
The memory wait instruction is used to wait for the memory execution instructions within an execution group to complete. As shown in FIG. 1C, the scheduling module allocates a memory counter to each execution group to record a memory count value. When a memory execution instruction is issued, the memory count value of the corresponding execution group is incremented by 1, and when the memory execution instruction completes, it is decremented by 1. When a memory wait instruction is executed, a wait value is set from the immediate in the memory wait instruction, and as long as the memory count value is greater than the wait value, the execution group is blocked from executing subsequent instructions. For example, if an execution group issues 2 memory execution instructions, its memory count value is 2; if it then executes a memory wait instruction whose immediate is 0, its wait value is 0. Because the memory count value (2) is greater than the wait value (0), the execution group cannot execute the instructions after the memory wait instruction; only when the memory count value drops to 0 is the PC advanced, allowing the execution group to execute the instructions after the memory wait instruction.
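The per-execution-group memory counter and the memory wait check can be sketched as below, replaying the example from the paragraph above (2 memory execution instructions followed by a memory wait instruction with immediate 0). The class and method names are hypothetical; only the increment-on-issue, decrement-on-completion and "blocked while count > wait value" rules come from the text.

```python
class ExecutionGroupMemoryCounter:
    """Memory counter allocated to one execution group by the scheduling module."""

    def __init__(self) -> None:
        self.count = 0

    def issue(self) -> None:
        """A memory execution instruction of this group is issued."""
        self.count += 1

    def complete(self) -> None:
        """A memory execution instruction of this group completes."""
        self.count -= 1

    def wait_satisfied(self, immediate: int) -> bool:
        """Memory wait instruction: the PC may advance only if count <= wait value."""
        return self.count <= immediate


c = ExecutionGroupMemoryCounter()
c.issue()
c.issue()
print(c.wait_satisfied(0))  # False: count 2 > wait value 0, the group is blocked
c.complete()
c.complete()
print(c.wait_satisfied(0))  # True: count 0 <= 0, the PC advances
```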
Barrier instructions are typically combined with memory wait instructions to implement a memory barrier function, but the combination is not a precise memory barrier: instruction execution cannot be controlled as precisely as desired, which may cause some loss of performance. For example, assume that a thread group includes 2 execution groups and consider the instruction sequence shown in Table 1 below (execution groups in the same computing task execute the same instructions, and in the absence of branches they execute them in the same order):
Table 1:
Execution group 0                  | Execution group 1
Memory execution instruction block | Memory execution instruction block
Other instruction block 0          | Other instruction block 0
Memory wait instruction (0)        | Memory wait instruction (0)
Barrier instruction                | Barrier instruction
Other instruction block 1          | Other instruction block 1
In the table above, "memory wait instruction (0)" indicates that the immediate in the memory wait instruction is 0. A "memory execution instruction block" includes at least one memory execution instruction, while "other instruction block 0" and "other instruction block 1" are non-memory instruction blocks, each including at least one other instruction.
For example, when executing the memory wait instruction, execution group 0 waits until all memory execution instructions in its memory execution instruction block have completed; the memory wait instruction then finishes, and execution group 0 executes the barrier instruction. If execution group 0 runs faster than execution group 1, i.e., it begins executing the barrier instruction earlier, then execution group 0 must wait for execution group 1 to execute the barrier instruction before it can start executing the instructions in its other instruction block 1. Before execution group 1 reaches the barrier instruction, it must execute its other instruction block 0 and its memory wait instruction, and all of its memory execution instructions must complete; only then does execution group 1 begin executing the barrier instruction. Consequently, when execution group 0 leaves the barrier, all memory execution instructions in the memory execution instruction blocks of both execution group 0 and execution group 1 have necessarily completed. However, execution group 0 only needs to wait for the memory execution instructions in the memory execution instruction blocks of execution group 0 and execution group 1 to complete; it does not need to wait for the instructions in other instruction block 0 of execution group 1. With this barrier-based approach, execution group 0 works less efficiently, and the execution times of execution groups in the same thread group affect each other, resulting in low computing efficiency.
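To make the efficiency argument above concrete, the following Python model compares when each execution group may resume under the wait-plus-barrier sequence of Table 1 and under a barrier that waits only for memory execution instructions. All completion times are invented for illustration, and the model assumes that, in the memory-barrier variant, the memory wait instruction and barrier instruction are replaced by a single memory barrier placed right after other instruction block 0.

```python
# Hypothetical completion times (arbitrary units) for the Table 1 scenario.
groups = {
    "eg0": {"mem_done": 10, "other0_done": 5},
    "eg1": {"mem_done": 12, "other0_done": 40},  # eg1 has a long other block 0
}


def arrival(g: dict) -> int:
    """Time a group reaches the barrier: memory wait (0) holds it until both its
    own memory block and its other instruction block 0 are finished."""
    return max(g["mem_done"], g["other0_done"])


def conventional_release() -> int:
    """Wait + barrier: every group resumes when the last group reaches the barrier."""
    return max(arrival(g) for g in groups.values())


def memory_barrier_release(name: str) -> int:
    """Memory barrier: resume once this group reaches the barrier and all groups'
    memory execution instructions are done; other groups' non-memory work is ignored."""
    own_ready = groups[name]["other0_done"]
    all_mem_done = max(g["mem_done"] for g in groups.values())
    return max(own_ready, all_mem_done)


print(conventional_release())         # 40: eg0 is held back by eg1's other block 0
print(memory_barrier_release("eg0"))  # 12: eg0 only waits for the memory instructions
```

In this made-up timeline, execution group 0 is released at 12 instead of 40, because the long other instruction block 0 of execution group 1 no longer delays it.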
In view of the above drawbacks, embodiments of the present disclosure provide an instruction execution method and an instruction execution apparatus that implement a memory barrier mechanism: when a memory barrier is executed, only the relevant memory execution instructions of the execution groups in the same thread group need to be waited for, and no other types of instructions, so that instructions after the barrier can be executed earlier, improving synchronization accuracy and instruction execution performance to a certain extent.
At least one embodiment of the present disclosure provides an instruction execution method and an instruction execution apparatus. The instruction execution method can be applied to a thread group that comprises a plurality of execution groups, including a first execution group; each of the execution groups can execute at least one first memory execution instruction and a first memory barrier instruction, and the first memory barrier instruction is used for blocking on the at least one first memory execution instruction within the thread group. The instruction execution method comprises: supervising the execution of all first memory execution instructions in the plurality of execution groups to obtain a first supervision result, and executing the first memory barrier instruction of the first execution group. The first supervision result includes a first thread group memory instruction count value corresponding to the first memory barrier instruction of the first execution group.
Executing the first memory barrier instruction of the first execution group includes: obtaining, according to the first memory barrier instruction of the first execution group, first memory type information in that instruction and a first thread group wait value corresponding to the first memory type information; acquiring the first thread group memory instruction count value; in response to the first thread group memory instruction count value being greater than the first thread group wait value, instructing the first execution group not to continue executing the instructions subsequent to its first memory barrier instruction; and in response to the first thread group memory instruction count value being less than or equal to the first thread group wait value, instructing the first execution group to continue executing the instructions subsequent to its first memory barrier instruction.
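The decision made when executing the first memory barrier instruction reduces to a single comparison between the count value and the wait value. The sketch below expresses that rule in Python; the function name, the `BarrierDecision` enum and its labels are hypothetical, while the comparison itself follows the method described above.

```python
from enum import Enum


class BarrierDecision(Enum):
    BLOCK = "do not continue"  # hold the execution group's PC at the barrier
    CONTINUE = "continue"      # execute the instructions after the barrier


def execute_memory_barrier(tg_count: int, tg_wait: int) -> BarrierDecision:
    """Decision rule of the first memory barrier instruction.

    tg_count: first thread group memory instruction count value
    tg_wait:  first thread group wait value decoded for the memory type
    """
    if tg_count > tg_wait:
        return BarrierDecision.BLOCK
    return BarrierDecision.CONTINUE


print(execute_memory_barrier(3, 0).value)  # do not continue
print(execute_memory_barrier(0, 0).value)  # continue
```

The boundary case matters: a count equal to the wait value already allows the execution group to continue.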
With this instruction execution method, an execution group executing a memory barrier instruction does not need to wait for non-memory instructions to finish, so the instructions after the barrier can be executed earlier, instruction execution and waiting can be controlled more precisely, and instruction execution efficiency can be improved to a certain extent.
Embodiments of the present disclosure are described in detail below with reference to the accompanying drawings, but the present disclosure is not limited to these specific embodiments. To keep the following description clear and concise, detailed descriptions of some known functions and components are omitted.
It should be noted that, in the embodiments of the present disclosure, a "memory execution instruction" refers to a memory instruction, i.e., an instruction that is sent to the cache to perform a read or write operation.
Fig. 2 is a schematic flowchart of an instruction execution method according to some embodiments of the present disclosure, and fig. 3 is a schematic flowchart of step S20 of the instruction execution method shown in fig. 2.
For example, the instruction execution method is applied to a thread group, and the thread group includes a plurality of execution groups, that is, the plurality of execution groups belong to the same thread group. The plurality of execution groups may each execute at least one first memory execution instruction and a first memory barrier instruction for blocking the at least one first memory execution instruction within the thread group. It should be noted that the number of execution groups included in a thread group may be set according to an actual hardware circuit condition, and embodiments of the present disclosure do not limit this.
For example, within each execution group, the at least one first memory execution instruction and the first memory barrier instruction are executed in program order. For example, in some examples, the first memory execution instructions complete in the same order in which they are issued; that is, a first memory execution instruction issued earlier also completes earlier than one issued later.
It should be noted that, in the present disclosure, the phrase "a first memory barrier instruction is used for blocking at least one first memory execution instruction within a thread group" means that the first memory barrier instruction controls whether to continue executing the instructions after it based on the execution status of the first memory execution instructions before it. For example, in some examples, if the wait value of the first memory barrier instruction is 0, the first memory barrier instruction requires all first memory execution instructions preceding it to complete before the execution group begins executing the instructions following the first memory barrier instruction. If there is no memory execution instruction before the memory barrier instruction, the memory barrier instruction still works normally, i.e., the following instructions are executed directly.
For example, in one example, an execution group of a thread group executes 4 first memory execution instructions and a first memory barrier instruction, and the 4 first memory execution instructions precede the first memory barrier instruction, i.e., they are executed before it. The wait value corresponding to the first memory barrier instruction is 2. When the first memory barrier instruction is executed, once the first two of the 4 first memory execution instructions have completed, the instructions following the first memory barrier instruction may be executed. For example, if at a first time only the first of the 4 first memory execution instructions has completed, the PC of the execution group must be blocked from advancing, i.e., the instructions after the first memory barrier instruction cannot be executed; at a second time, when both the first and the second of the 4 first memory execution instructions have completed, the PC of the execution group may advance and the instructions following the first memory barrier instruction begin to execute.
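The worked example above (4 memory execution instructions, wait value 2) relies on in-order completion: because instructions finish in the order they were issued, "at most 2 still outstanding" means "the first 2 have completed". The sketch below replays the first and second times from the example; the tracker class and the instruction names `mem1` to `mem4` are hypothetical.

```python
from collections import deque


class InOrderMemoryTracker:
    """Outstanding memory execution instructions, completing in issue order."""

    def __init__(self) -> None:
        self.outstanding = deque()

    def issue(self, name: str) -> None:
        self.outstanding.append(name)

    def complete_oldest(self) -> str:
        return self.outstanding.popleft()  # in-order completion: oldest finishes first

    @property
    def count(self) -> int:
        return len(self.outstanding)


def barrier_can_proceed(tracker: InOrderMemoryTracker, wait_value: int) -> bool:
    """Continue past the memory barrier once at most `wait_value` remain outstanding."""
    return tracker.count <= wait_value


t = InOrderMemoryTracker()
for i in range(1, 5):
    t.issue(f"mem{i}")              # the 4 first memory execution instructions
t.complete_oldest()                 # first time: only mem1 has completed
print(barrier_can_proceed(t, 2))    # False: 3 outstanding > 2, PC held
t.complete_oldest()                 # second time: mem1 and mem2 have completed
print(barrier_can_proceed(t, 2))    # True: 2 outstanding <= 2, PC advances
print(list(t.outstanding))          # ['mem3', 'mem4'] may still be in flight
```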
For example, each execution group includes multiple threads, e.g., 32 threads, 64 threads, etc. The present disclosure does not limit the number of threads in an execution group.
For example, the plurality of execution groups includes a first execution group.
For example, as shown in FIG. 2, the instruction execution method includes the following steps S10-S20.
Step S10: the execution of all first memory execution instructions in the plurality of execution groups is supervised to obtain a first supervision result.
Step S20: a first memory barrier instruction of a first execution group is executed.
For example, in step S10, the first supervision result includes a first thread group memory instruction count value corresponding to the first memory barrier instruction of the first execution group. In step S20, the first memory barrier instruction of the first execution group may be executed based on the first supervision result obtained in step S10.
For example, in an embodiment of the present disclosure, each memory barrier instruction (e.g., the first memory barrier instruction described above) may include, but is not limited to, the fields shown in table 2 below (OP is Operation, which is used to distinguish different instructions).
Table 2:
OP | Memory type information | Wait count
For example, referring to Table 3, one memory barrier instruction may wait for multiple types of memory execution instructions at the same time. Each memory barrier instruction may therefore carry a separate wait count for each memory type (3 memory types are assumed here).
Table 3:
OP | Wait count 0 | Wait count 1 | Wait count 2
Wait count 0, wait count 1, and wait count 2 correspond to 3 different types of memory execution instructions, respectively.
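As an illustration of the per-type wait counts in Table 3, the sketch below represents a decoded barrier as an opcode plus three wait counts. It releases the barrier only when every memory type is within its wait count. The following are assumptions, not encodings given in the disclosure:

- the class and function names;
- the `"MEMBAR"` opcode string;
- the rule that all types must be satisfied.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class MemoryBarrierInstruction:
    """Hypothetical decoded form of the Table 3 layout:
    one opcode plus one wait count per memory type (three types assumed)."""
    op: str
    wait_counts: Tuple[int, int, int]


def barrier_released(instr: MemoryBarrierInstruction,
                     outstanding_per_type: Sequence[int]) -> bool:
    """Assumed rule: the barrier releases only when, for every memory type,
    the outstanding count of that type is at most its wait count."""
    return all(outstanding <= wait
               for outstanding, wait in zip(outstanding_per_type, instr.wait_counts))


barrier = MemoryBarrierInstruction("MEMBAR", (0, 2, 1))
print(barrier_released(barrier, [0, 2, 1]))  # every type within its wait count
print(barrier_released(barrier, [0, 3, 0]))  # type 1 exceeds its wait count
```

A memory type that a given barrier does not care about could be given a wait count at least as large as its maximum possible outstanding count; this too is an assumption.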
For example, as shown in fig. 3, step S20 may include the following steps S201 to S204.
Step S201: from the first memory barrier instruction of the first execution group, obtain the first memory type information carried in that instruction and the first thread group wait value corresponding to the first memory type information.
For example, in step S201, the first thread group wait value corresponds to the first memory type information, i.e., different memory type information corresponds to different thread group wait values. The thread group wait value for each memory type may be set and stored in advance according to the actual situation. For example, a field of each memory barrier instruction may hold the thread group wait value corresponding to the memory type information in that instruction. The wait value may be encoded as an immediate or stored in another suitable manner.
Step S202: a first thread group memory instruction count value is obtained.
For example, in step S202, the first thread group memory instruction count value is the maximum of the first outstanding instruction numbers of the plurality of execution groups in the current cycle. The first outstanding instruction number of an execution group is the number of first memory execution instructions that the execution group has issued but that have not yet finished executing in the current cycle. The first thread group memory instruction count value therefore reflects the execution status of the first memory execution instructions across the plurality of execution groups in the thread group.
For example, the current cycle may represent a time period in which all instructions in the first execution group are executed once.
For example, in step S202, the first thread group memory instruction count value is the value stored in the barrier register set at the time the first memory barrier instruction is executed. It should be noted that the value stored in the barrier register set may differ from time to time. In addition, the first thread group memory instruction count value does not depend on memory barrier instructions: it must be maintained normally even when no memory barrier instruction is present.
For example, in some embodiments, the thread group includes a first execution group and a second execution group. The first execution group executes 4 first memory execution instructions that precede its first memory barrier instruction. The second execution group likewise executes 4 first memory execution instructions that precede its own first memory barrier instruction. Suppose that, when the first memory barrier instruction of the first execution group is executed, all 4 first memory execution instructions of each execution group have been issued.

If only the first of the 4 first memory execution instructions has completed in each execution group, the first outstanding instruction number is 3 for both execution groups. The first thread group memory instruction count value is therefore 3.

If the first and second first memory execution instructions of the first execution group have completed, but only the first one of the second execution group has completed, the first outstanding instruction number is 2 for the first execution group and 3 for the second execution group. The first thread group memory instruction count value is therefore max(2, 3) = 3.
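The definition in step S202 and the two cases above amount to taking a maximum, as the following Python sketch shows. The sketch is illustrative only, and the function name is hypothetical. Per the description, hardware maintains this value incrementally through the memory barrier count pairs rather than by recomputing a maximum.

```python
from typing import Sequence


def thread_group_count_value(outstanding_per_group: Sequence[int]) -> int:
    """Hypothetical helper: the thread group memory instruction count value
    for one memory type is the maximum, over the execution groups of the
    thread group, of their issued-but-not-completed instruction counts."""
    return max(outstanding_per_group, default=0)


# The two cases above, with 4 instructions issued per execution group:
print(thread_group_count_value([4 - 1, 4 - 1]))  # 3
print(thread_group_count_value([4 - 2, 4 - 1]))  # 3
```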
FIG. 4A is a diagram illustrating a barrier register set according to some embodiments of the present disclosure; fig. 4B is a schematic diagram of a first execution group and a second execution group according to some embodiments of the disclosure.
For example, step S202 includes:

- acquiring a first barrier identification code corresponding to the first execution group;
- determining, based on the first barrier identification code, the barrier register set corresponding to the first execution group, where the barrier register set includes a first barrier storage area for storing the first thread group memory instruction count value;
- reading the first thread group memory instruction count value from the first barrier storage area.
For example, registers for the memory barrier are added to the barrier register set. These registers may store:

- at least one thread group memory instruction count value (e.g., the first thread group memory instruction count value shown in FIG. 4A);
- at least one memory barrier count pair group (e.g., the first memory barrier count pair group shown in FIG. 4A, which includes four memory barrier count pairs a0-a3).

In the dispatch module, the following are added for each execution group:

- a current memory barrier index;
- a thread group wait value;
- an execution group memory instruction count value (e.g., the first execution group memory instruction count value shown in FIG. 4B).

If the processor architecture supports multiple memory types, a separate execution group memory instruction count value, thread group memory instruction count value, memory barrier count pair group, and thread group wait value may be added for each memory type.
For example, as shown in FIG. 4A, the barrier register set includes a first barrier storage area 100, which is used for storing the first thread group memory instruction count value.
For example, in the present disclosure, memory type information corresponds one-to-one with thread group memory instruction count values. Suppose the first execution group includes 3 types of memory execution instructions. The barrier register set corresponding to the first execution group must then store three thread group memory instruction count values, one for each type. The count value for a given memory type can be located and read in the barrier register set based on the memory type information in the instruction.
For example, the barrier register set corresponds to a thread group, that is, all the execution groups in the thread group correspond to one barrier register set, for example, as shown in fig. 1C, thread group 0 corresponds to barrier register set 0 (i.e., barrier 0), thread group 1 corresponds to barrier register set 1 (i.e., barrier 1), thread group 2 corresponds to barrier register set 2 (i.e., barrier 2), and so on.
For example, the correspondence between a barrier register set and a thread group may be established when the execution groups of the thread group are partitioned. The barrier identification code of each execution group is set to the identification code of the barrier register set corresponding to its thread group. As shown in FIG. 1C, the four execution groups 0-3 of thread group 0 all have barrier ID (i.e., barrier identification code) 0, which is the identification code of barrier register set 0. Thus, if the thread group is thread group 0 shown in FIG. 1C, the first barrier identification code of the first execution group is 0. If the thread group is thread group 1 shown in FIG. 1C, the first barrier identification code of the first execution group is 1 (i.e., the identification code of barrier register set 1).
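The barrier identification code assignment described above can be sketched in Python as a simple mapping. The sketch is illustrative only and makes the following assumptions:

- Every thread group has the same number of execution groups.
- Execution groups are numbered globally, consecutively by thread group.

FIG. 1C only shows that execution groups 0-3 of thread group 0 map to barrier 0. The function name is hypothetical.

```python
from typing import Dict


def assign_barrier_ids(num_thread_groups: int,
                       groups_per_thread_group: int) -> Dict[int, int]:
    """Hypothetical partitioning step: every execution group receives the
    identification code of its thread group's barrier register set
    (thread group k -> barrier register set k)."""
    barrier_id_of_group: Dict[int, int] = {}
    for thread_group in range(num_thread_groups):
        for local in range(groups_per_thread_group):
            # Assumed global numbering of execution groups.
            exec_group = thread_group * groups_per_thread_group + local
            barrier_id_of_group[exec_group] = thread_group
    return barrier_id_of_group


ids = assign_barrier_ids(num_thread_groups=3, groups_per_thread_group=4)
print([ids[g] for g in range(4)])  # execution groups 0-3 -> barrier 0
```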
For example, as shown in FIG. 4A, the barrier register set further includes a second barrier storage area 110 for storing a first memory barrier count pair group corresponding to the first memory type information. The first memory barrier count pair group includes at least one memory barrier count pair, and each memory barrier count pair includes an instruction issue number and an instruction complete number. The first memory barrier count pair group shown in FIG. 4A includes four memory barrier count pairs a0-a3:

- a0 includes instruction issue number a0 and instruction complete number a0;
- a1 includes instruction issue number a1 and instruction complete number a1;
- a2 includes instruction issue number a2 and instruction complete number a2;
- a3 includes instruction issue number a3 and instruction complete number a3.

For example, the second barrier storage area 110 may include four sub-barrier storage areas that store the four memory barrier count pairs a0-a3, respectively.
For example, as shown in FIG. 3:

Step S203: in response to the first thread group memory instruction count value being greater than the first thread group wait value, instruct the first execution group not to continue executing the instructions after its first memory barrier instruction.

Step S204: in response to the first thread group memory instruction count value being less than or equal to the first thread group wait value, instruct the first execution group to continue executing the instructions after its first memory barrier instruction.

For example, the first thread group wait value is the thread group wait value held in the first execution group, i.e., the first thread group wait value may be added to the state of the first execution group.
For example, in steps S203 and S204, the first thread group memory instruction count value is compared with the first thread group wait value held in the first execution group. This comparison decides whether to continue executing the instructions after the first memory barrier instruction of the first execution group. The first thread group memory instruction count value reflects the execution status of the first memory execution instructions across the plurality of execution groups in the thread group. The decision made when executing the first memory barrier instruction of the first execution group is therefore based on the execution status of the first memory execution instructions in all of those execution groups.
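Steps S201-S204 can be summarized in a short Python sketch, which is illustrative only. It assumes that the thread group memory instruction count values are kept in a dictionary keyed by (barrier identification code, memory type). The memory type and wait value are taken as already decoded from the barrier instruction. The names `CountValues` and `memory_barrier_allows_continue` are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical storage: (barrier ID, memory type) -> thread group memory
# instruction count value held in that barrier register set.
CountValues = Dict[Tuple[int, int], int]


def memory_barrier_allows_continue(barrier_id: int, mem_type: int,
                                   wait_value: int,
                                   count_values: CountValues) -> bool:
    """S201: mem_type and wait_value come from the barrier instruction.
    S202: read the count value from the barrier register set selected by
          the execution group's barrier ID and the memory type.
    S203/S204: block if count value > wait value, otherwise continue."""
    count_value = count_values[(barrier_id, mem_type)]
    return count_value <= wait_value


count_values = {(0, 0): 3, (1, 0): 1}
print(memory_barrier_allows_continue(0, 0, 2, count_values))  # blocked
print(memory_barrier_allows_continue(1, 0, 2, count_values))  # continues
```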
For example, as shown in fig. 2, step S20 includes: a first memory execution instruction of the first execution group is executed.
For example, the memory type information in the first memory execution instruction of the first execution group is the first memory type information.
For example, in step S20, executing the first memory execution instruction of the first execution group includes:

- acquiring a first memory barrier index corresponding to the first memory execution instruction of the first execution group;
- issuing the first memory execution instruction of the first execution group together with the first memory barrier index;
- after they are issued, setting the first thread group memory instruction count value and the instruction issue number and instruction complete number in the memory barrier count pair indicated by the first memory barrier index in the first memory barrier count pair group.
For example, as shown in FIG. 4B, the first execution group includes a first execution storage area 200 for storing the first execution group memory instruction count value. This value indicates the number of first memory execution instructions of the first execution group that have been issued but have not finished executing. For example, the dispatch module may assign a first memory execution instruction counter to each execution group. The counter is incremented by 1 when a first memory execution instruction of that execution group is issued, and decremented by 1 when that instruction completes.
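The per-execution-group counter described above can be modelled in Python as follows. The model is illustrative only, and the class and method names are hypothetical. The underflow check is an addition for the sketch; the disclosure does not describe it.

```python
class ExecutionGroupCounter:
    """Hypothetical model of the first memory execution instruction counter
    that the dispatch module keeps for one execution group and one memory
    type (the first execution group memory instruction count value)."""

    def __init__(self) -> None:
        self.count = 0  # issued but not yet completed

    def on_issue(self) -> None:
        self.count += 1  # +1 when a memory execution instruction is issued

    def on_complete(self) -> None:
        # Guard added for the sketch only: a completion must match an issue.
        if self.count == 0:
            raise RuntimeError("completion reported with nothing outstanding")
        self.count -= 1  # -1 when that instruction finishes executing


counter = ExecutionGroupCounter()
counter.on_issue()
counter.on_issue()
counter.on_complete()
print(counter.count)  # one instruction still outstanding
```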
For example, in step S20, obtaining the first memory barrier index corresponding to the first memory execution instruction of the first execution group includes:

- acquiring the first barrier identification code corresponding to the first execution group;
- determining the barrier register set corresponding to the first execution group based on the first barrier identification code;
- parsing the first memory execution instruction of the first execution group to obtain the first memory type information it contains;
- reading the first execution group memory instruction count value from the first execution storage area;
- reading the first thread group memory instruction count value from the first barrier storage area, based on the first barrier identification code and the first memory type information;
- obtaining the first memory barrier index, in response to the first execution group memory instruction count value being less than or equal to the first thread group memory instruction count value and the first thread group memory instruction count value being less than the first count threshold.
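The issue condition stated above can be written as a one-line predicate. The Python sketch below implements the condition literally as stated, and the function name is hypothetical. The first count threshold is taken to be the number of memory barrier count pairs, as in one example in the description. The usage lines use the FIG. 4A/4B example values (1, 3, 4) and the exhausted case in which both counts equal the threshold.

```python
def may_issue(exec_count: int, thread_count: int, count_threshold: int) -> bool:
    """Hypothetical predicate for obtaining a memory barrier index:
    the execution group memory instruction count value must not exceed the
    thread group memory instruction count value, and the thread group value
    must be below the first count threshold."""
    return exec_count <= thread_count and thread_count < count_threshold


print(may_issue(exec_count=1, thread_count=3, count_threshold=4))  # may issue
print(may_issue(exec_count=4, thread_count=4, count_threshold=4))  # stalls
```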
For example, as shown in fig. 4B, the first execution group further includes an execution memory area 210, and the execution memory area 210 is used to store the first barrier identification code. Acquiring the first barrier identification code corresponding to the first execution group may include: the first barrier identification code is read from the execution memory region 210 of the first execution group.
For example, in the present disclosure, each memory execution instruction may also include a field for storing the memory type information, so that when the memory execution instruction is parsed, the memory type information of the memory execution instruction may be obtained. The memory type information of the memory execution instruction is used to indicate the type of the memory execution instruction.
For example, in the present disclosure, the memory type information and the execution group memory instruction count value correspond to each other, if the first execution group includes 3 types of memory execution instructions, three execution group memory instruction count values corresponding to the 3 types of memory execution instructions one-to-one need to be stored in the first execution group, and the execution group memory instruction count value corresponding to the memory type information can be read in the first execution group based on the memory type information in the instructions. For example, the first memory type information corresponds to a first execution group memory instruction count value, and the first execution storage area 200 may be determined based on the first memory type information.
For example, the first barrier storage area 100 may be determined based on the first barrier identification code and the first memory type information. Reading a first thread group memory instruction count value from a first barrier storage area based on a first barrier identification code and first memory type information, comprising: determining a barrier register group corresponding to the first execution group based on the first barrier identification code; then, based on the first memory type information, determining a first barrier storage area used for storing a first thread group memory instruction count value in the barrier register group; then, the first thread group memory instruction count value is read from the first barrier storage area.
For example, the first count threshold may be determined based on a number of memory barrier count pairs in the first memory barrier count pair group, e.g., the first count threshold is equal to the number of memory barrier count pairs in the first memory barrier count pair group.
For example, suppose the first execution group memory instruction count value is less than or equal to the first thread group memory instruction count value, and the first thread group memory instruction count value is less than the first count threshold. In that case, the memory barrier count pairs in the barrier register set that record issued but unfinished first memory execution instructions are not exhausted. The first memory execution instruction that is about to be issued may therefore be issued.
For example, as shown in FIGS. 4A and 4B, in some examples the first count threshold is 4, and the thread group includes a first execution group and a second execution group. The first execution group has issued only one first memory execution instruction, and the second execution group has issued three. At this point, the first execution group memory instruction count value is 1 and the first thread group memory instruction count value is 3. Instruction issue number a0 is 2, instruction issue number a1 is 1, instruction issue number a2 is 1, and instruction issue number a3 is 0.

For the second execution group, instruction issue numbers a0, a1, and a2 already record its issued first memory execution instructions. Only instruction issue number a3 in the barrier register set therefore remains available for recording further first memory execution instructions issued by the second execution group. That is, only 1 barrier storage area is available to the second execution group.

For the first execution group, only instruction issue number a0 records its issued first memory execution instruction. Instruction issue numbers a1, a2, and a3 can therefore still be used to record further first memory execution instructions issued by the first execution group. That is, 3 barrier storage areas are available to the first execution group.
At this point, in step S20, the first memory execution instruction to be issued is the second first memory execution instruction of the first execution group. Two conditions hold:

- the first execution group memory instruction count value (1) is less than the first thread group memory instruction count value (3);
- the first thread group memory instruction count value (3) is less than the first count threshold (4).

The second first memory execution instruction of the first execution group can therefore be issued. If the first memory barrier index of this instruction indicates instruction issue number a1, then instruction issue number a1 changes from 1 to 2 after the instruction is issued.
For example, as shown in FIG. 4B, the first execution group includes an execution store 220 for storing the current memory barrier index. Obtaining the first memory barrier index may include: the current memory barrier index is read from the execution store 220 as the first memory barrier index and then updated.
For example, updating the current memory barrier index may be accomplished using the following equation:
current memory barrier index = (current memory barrier index + 1) % N,
where % denotes the modulo operator and N denotes the number of memory barrier count pairs in the first memory barrier count pair group. For example, in some embodiments, the current memory barrier index is 1 and N is 4. The current memory barrier index is read from the execution storage area 220 as the first memory barrier index, so the first memory barrier index is 1. The current memory barrier index is then updated with the above formula to (1 + 1) % 4 = 2.
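The read-then-advance behaviour of the current memory barrier index can be sketched in Python as a small round-robin allocator. The sketch is illustrative only, and the class and method names are hypothetical. The usage lines reproduce the example above, where the index is 1 and N is 4, and also show the wrap-around from N - 1 back to 0.

```python
class MemoryBarrierIndex:
    """Hypothetical model of the current memory barrier index kept in the
    execution storage area of one execution group."""

    def __init__(self, num_pairs: int, start: int = 0) -> None:
        self.num_pairs = num_pairs  # N: count pairs in the count pair group
        self.current = start

    def take(self) -> int:
        """Return the current index as the memory barrier index of the
        instruction being issued, then apply current = (current + 1) % N."""
        index = self.current
        self.current = (self.current + 1) % self.num_pairs
        return index


idx = MemoryBarrierIndex(num_pairs=4, start=1)
print(idx.take(), idx.current)  # first memory barrier index 1, updated index 2
wrap = MemoryBarrierIndex(num_pairs=4, start=3)
print(wrap.take(), wrap.current)  # 3, then wraps to 0
```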
Fig. 5 is a schematic diagram illustrating information transfer between a scheduling module and a memory-related module according to some embodiments of the disclosure.
For example, as shown in FIG. 5, the instruction execution method provided by the embodiments of the present disclosure requires a field indicating a memory barrier index (e.g., the first memory barrier index) to be added to the interface between the scheduling module and the memory-related module. That is, when the scheduling module sends a memory execution instruction (e.g., the first memory execution instruction), it also sends the memory barrier index to the memory-related module. In addition, the scheduling module sends the instruction information of the memory execution instruction and the execution group information of its execution group to the memory-related module. When the memory execution instruction finishes executing, the memory-related module sends the memory barrier index, the instruction information, and the execution group information back to the scheduling module. Based on the returned memory barrier index, the corresponding instruction issue number and instruction complete number in the barrier register set can then be set.
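The interface described above can be sketched in Python as a message carrying the three kinds of information. The scheduling side uses the echoed fields on completion. The sketch is illustrative only and makes the following assumptions:

- The message type and handler names are hypothetical.
- The field types are hypothetical.
- The execution-group-to-barrier-ID mapping is hypothetical.
- The per-barrier lists of instruction complete numbers are hypothetical.

The description only states which information crosses the interface.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class MemoryMessage:
    """Hypothetical issue/completion message between the scheduling module
    and the memory-related module."""
    barrier_index: int     # memory barrier index field added to the interface
    instruction_info: str  # instruction information of the memory instruction
    execution_group: int   # execution group information


def handle_completion(msg: MemoryMessage,
                      barrier_id_of_group: Dict[int, int],
                      completed: Dict[int, List[int]]) -> None:
    """Scheduler side: the echoed execution group selects the barrier
    register set, and the echoed barrier index selects the count pair whose
    instruction complete number is incremented."""
    barrier_id = barrier_id_of_group[msg.execution_group]
    completed[barrier_id][msg.barrier_index] += 1


groups = {0: 0, 1: 0, 2: 0, 3: 0}  # execution groups 0-3 -> barrier 0
completed = {0: [0, 0, 0, 0]}      # complete numbers a0-a3 of barrier 0
handle_completion(MemoryMessage(1, "load", 2), groups, completed)
print(completed[0])  # complete number a1 incremented
```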
For example, the memory-related module includes a memory module and a higher-level module of the memory module, such as a cache, an address translator, and the like.
For example, as shown in FIG. 4A, the barrier register set further includes a third barrier storage area 120, the third barrier storage area 120 being used to store a total number of execution groups, the total number of execution groups representing the number of the plurality of execution groups.
For example, in step S20, the setting of the first thread group memory instruction count value, the instruction issue number and the instruction complete number in the memory barrier count pair indicated by the first memory barrier index in the first memory barrier count pair group includes: in response to the instruction issue number in the memory barrier count pair indicated by the first memory barrier index in the first memory barrier count pair group being a predetermined issue number, increasing the first thread group memory instruction count value by a first value to update the first thread group memory instruction count value, and increasing the instruction issue number in the memory barrier count pair indicated by the first memory barrier index by a second value to update the instruction issue number in the memory barrier count pair indicated by the first memory barrier index; in response to completion of execution of a first memory execute instruction of the first execution group, increasing an instruction completion number in a memory barrier count pair indicated by the first memory barrier index by a second value to update the instruction completion number in the memory barrier count pair indicated by the first memory barrier index; in response to the instruction issue number and the instruction complete number in the memory barrier count pair indicated by the first memory barrier index being equal and equal to the total number of execution groups, the instruction issue number and the instruction complete number in the memory barrier count pair indicated by the first memory barrier index are cleared, and the first thread group memory instruction count value is decremented by a first value to update the first thread group memory instruction count value.
For example, in step S20, executing the first memory execution instruction of the first execution group further includes: in response to the first thread group memory instruction count value being equal to the first count threshold and the first execution group memory instruction count value being equal to the first count threshold, determining that a first memory execution instruction of the first execution group is not to be issued.
For example, suppose the first thread group memory instruction count value and the first execution group memory instruction count value are both equal to the first count threshold. Then every memory barrier count pair in the first memory barrier count pair group already holds recorded data. In other words, the memory barrier resources that the barrier register set can allocate to the first memory execution instructions of the first execution group are exhausted. The first execution group therefore cannot issue the first memory execution instruction it is about to issue.
For example, in step S20, setting the first thread group memory instruction count value and the instruction issue number and instruction complete number in the memory barrier count pair indicated by the first memory barrier index further includes the following. In response to the instruction issue number in the memory barrier count pair indicated by the first memory barrier index not being the predetermined issue number:

- the first thread group memory instruction count value is kept unchanged;
- that instruction issue number is increased by the second value to update it.
For example, the first and second values are the same and are both 1.
For example, the predetermined issue number may be 0, that is, when the instruction issue number in the memory barrier count pair indicated by the first memory barrier index in the first memory barrier count pair group is 0, the first thread group memory instruction count value may be increased by a first value to update the first thread group memory instruction count value; when the instruction issue number in the memory barrier count pair indicated by the first memory barrier index in the first memory barrier count pair group is not 0, the first thread group memory instruction count value is kept unchanged.
For example, in some examples, the thread group includes only a first execution group and a second execution group. At a first time, the first memory execution instruction of the first execution group is executed, and its memory barrier index is 0, i.e., it corresponds to memory barrier count pair a0 in the first memory barrier count pair group. Since instruction issue number a0 in memory barrier count pair a0 is 0, the first thread group memory instruction count value is increased by the first value, and instruction issue number a0 is increased by the second value. Before the update at the first time, the first thread group memory instruction count value is 0 and instruction issue number a0 is 0. After the update, the first thread group memory instruction count value is 1 and instruction issue number a0 is 1.
At a second time after the first time, the first memory execution instruction of the second execution group is executed, and its memory barrier index is also 0, i.e., it also corresponds to memory barrier count pair a0. Since instruction issue number a0 is now 1, the first thread group memory instruction count value is kept unchanged, and instruction issue number a0 is increased by the second value. Before the update at the second time, the first thread group memory instruction count value is 1 and instruction issue number a0 is 1. After the update, the first thread group memory instruction count value is still 1 and instruction issue number a0 is 2. An instruction issue number a0 of 2 indicates that both execution groups have issued the first of their first memory execution instructions.
At a third time after the second time, the second first memory execution instruction of the first execution group (i.e., the one following its first memory execution instruction) is executed. Its memory barrier index is 1, i.e., it corresponds to memory barrier count pair a1 in the first memory barrier count pair group. Since instruction issue number a1 in memory barrier count pair a1 is 0, the first thread group memory instruction count value is increased by the first value, and instruction issue number a1 is increased by the second value. Before the update at the third time, the first thread group memory instruction count value is 1 and instruction issue number a1 is 0. After the update, the first thread group memory instruction count value is 2 and instruction issue number a1 is 1. A first thread group memory instruction count value of 2 indicates that at least one execution group in the thread group has issued two first memory execution instructions.
For example, when the first memory execution instruction of the first execution group is executed completely, the completion count of the instruction in the memory barrier count pair indicated by the first memory barrier index may be updated by increasing the completion count of the instruction in the memory barrier count pair indicated by the first memory barrier index by a second value according to the first memory barrier index and the first barrier identifier code.
For example, at a fourth time after the third time, if the first memory execute instruction of the first execute group has been executed completely, the memory barrier index corresponding to the first memory execute instruction of the first execute group is 0, that is, the memory barrier count pair a0, at this time, the instruction completion number a0 in the memory barrier count pair a0 is 0, at this time, the instruction completion number a0 may be incremented by a second value to update the instruction completion number a0, at the fourth time, before the data is updated, the instruction completion number a0 is 0, and after the data is updated, the instruction completion number a0 is 1. An instruction completion number a0 of 1 indicates that at least one execution group in the thread group has completed executing the first memory execution instruction.
For example, when the instruction issue number and the instruction completion number in the memory barrier count pair indicated by the first memory barrier index are equal to and equal to the total number of the execution groups, which indicates that all the execution groups in the thread group have completed a certain memory execution instruction, the instruction issue number and the instruction completion number in the memory barrier count pair indicated by the first memory barrier index need to be cleared, and the first thread group memory instruction count value needs to be decreased by the first value to update the first thread group memory instruction count value.
For example, at a fifth time after the fourth time, if the first ram execution instruction of the second execution group has been executed completely, the memory barrier index corresponding to the first ram execution instruction of the second execution group is 0, that is, the memory barrier count pair a0, at which time the completion number a0 of the memory barrier count pair a0 is 1, at which time the completion number a0 may be increased by a second value to update the completion number a 0. At the fifth time, the command completion number a0 is 1 before updating the data, and the command completion number a0 is 2 after updating the data. Since the thread group only includes the first execution group and the second execution group, that is, the total number of the execution groups is 2, at this time, since the instruction issue number a0 and the instruction completion number a0 in the memory barrier count pair a0 are both 2 and equal to the total number of the execution groups, the instruction issue number a0 and the instruction completion number a0 in the memory barrier count pair a0 may be cleared, and the first thread group memory instruction count value may be reduced by a first value to update the first thread group memory instruction count value, for example, at a fifth time, before data is updated, the first thread group memory instruction count value is 2, and after data is updated, the first thread group memory instruction count value is 1.
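The first through fifth times above can be replayed with a small Python model of one barrier register group. This is a hedged sketch, not the disclosed circuit: the first value and the second value are both assumed to be 1, the predetermined issue number is assumed to be 0 (as in the Table 4 example), the first memory barrier count pair group is assumed to hold four pairs a0-a3, and the names `BarrierRegisterGroup`, `on_issue` and `on_complete` are hypothetical. The event at the first time is inferred from the values given before the update at the second time.

```python
from dataclasses import dataclass, field

# Assumed parameters; the disclosure leaves all three to the implementer.
FIRST_VALUE = 1                 # step for the thread group memory instruction count value
SECOND_VALUE = 1                # step for instruction issue/completion numbers
PREDETERMINED_ISSUE_NUMBER = 0  # issue number of a count pair no execution group has used yet


@dataclass
class CountPair:
    issue: int = 0     # instruction issue number
    complete: int = 0  # instruction completion number


@dataclass
class BarrierRegisterGroup:
    total_groups: int        # total number of execution groups
    num_pairs: int = 4       # assumed: count pairs a0-a3
    thread_count: int = 0    # first thread group memory instruction count value
    pairs: list = field(init=False, default_factory=list)

    def __post_init__(self) -> None:
        self.pairs = [CountPair() for _ in range(self.num_pairs)]

    def on_issue(self, index: int) -> None:
        pair = self.pairs[index]
        # Only the first execution group to use this pair raises the thread group count.
        if pair.issue == PREDETERMINED_ISSUE_NUMBER:
            self.thread_count += FIRST_VALUE
        pair.issue += SECOND_VALUE

    def on_complete(self, index: int) -> None:
        pair = self.pairs[index]
        pair.complete += SECOND_VALUE
        # Every execution group has issued and completed this slot (valid since SECOND_VALUE is 1).
        if pair.issue == pair.complete == self.total_groups:
            pair.issue = pair.complete = 0
            self.thread_count -= FIRST_VALUE


rg = BarrierRegisterGroup(total_groups=2)
rg.on_issue(0)     # first time:  thread_count=1, a0.issue=1
rg.on_issue(0)     # second time: thread_count=1, a0.issue=2
rg.on_issue(1)     # third time:  thread_count=2, a1.issue=1
rg.on_complete(0)  # fourth time: a0.complete=1
rg.on_complete(0)  # fifth time:  a0 cleared, thread_count=1
```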
For example, the instruction issue number may be updated after the memory execution instruction is issued, and the instruction completion number may be updated after execution of the memory execution instruction is completed.
It should be noted that the predetermined issue number, the first value, and the second value may be set according to practical requirements, and the present disclosure does not limit them. In the present disclosure, the instruction issue number and the instruction completion number may, for example, be counted by counters, but the present disclosure does not limit how they are recorded.
For example, in step S20, executing the first memory execution instruction of the first execution group further includes: determining whether the first memory execution instruction of the first execution group is an instruction that needs to participate in the memory barrier; if it is, performing the operation of setting the first thread group memory instruction count value and the instruction issue number and instruction completion number in the memory barrier count pair indicated by the first memory barrier index; if it is not, skipping that operation.
For example, in some embodiments, the instruction execution method further includes: setting a barrier indication code for each first memory execution instruction in the plurality of execution groups. If the value of the barrier indication code of a first memory execution instruction is the first indication value, that instruction needs to participate in the memory barrier; if the value is the second indication value, it does not.
For example, in some embodiments, one bit may be added to the encoding of each memory execution instruction to indicate whether it needs to participate in the memory barrier, so that some or all memory execution instructions can be made to participate in the memory barrier as needed.
For example, determining whether the first memory execution instruction of the first execution group is an instruction that needs to participate in the memory barrier includes: parsing the first memory execution instruction of the first execution group to obtain its barrier indication code; and determining, based on the value of that barrier indication code, whether the instruction needs to participate in the memory barrier. When the value of the barrier indication code is the first indication value, the instruction needs to participate in the memory barrier; when the value is the second indication value, it does not.
For example, the first indication value may be 1 and the second indication value may be 0. The first indication value and the second indication value may be set according to actual conditions, and the disclosure does not limit this.
It should be noted that the barrier mechanism established by the instruction execution method provided by the embodiments of the present disclosure applies only to memory execution instructions whose barrier indication code marks them as participating in the memory barrier; memory execution instructions without such a marking are not affected by this barrier mechanism.
It should be noted that, as shown in FIG. 4A, the barrier register group further includes a barrier storage area for storing the number of current execution groups, and, as shown in FIG. 4B, each of the first execution group and the second execution group further includes a memory count value, a wait value, a program counter (PC), and the like.
FIG. 6A is a diagram illustrating another barrier register set according to some embodiments of the present disclosure; fig. 6B is a schematic diagram of another first execution group and a second execution group provided in some embodiments of the present disclosure.
For example, in some embodiments, each of the plurality of execution groups may further execute at least one second memory execution instruction, where the memory type information in the second memory execution instruction is second memory type information, and the first memory type information and the second memory type information indicate different memory types. The first memory barrier instruction is further configured to block the at least one second memory execution instruction within the thread group; that is, the same memory barrier instruction can block memory execution instructions of two different memory types within the thread group.
It should be noted that, apart from the difference in memory type, the scheduling module schedules and issues first memory execution instructions and second memory execution instructions by similar mechanisms, so the above description of the first memory execution instruction also applies to the second memory execution instruction where no contradiction arises. For example, the instruction execution method further includes: setting a barrier indication code for each second memory execution instruction in the plurality of execution groups. Those skilled in the art will appreciate that the specific functions of the first memory execution instruction and the second memory execution instruction will generally differ.
For example, the instruction execution method further includes: supervising the execution of all second memory execution instructions in the plurality of execution groups to obtain a second supervision result. For example, the second supervision result includes a second thread group memory instruction count value corresponding to the first memory barrier instruction of the first execution group. For example, in step S20, the first memory barrier instruction of the first execution group may be executed based on the first supervision result and the second supervision result.
For example, in step S20, executing the first memory barrier instruction of the first execution group further includes: obtaining, from the first memory barrier instruction of the first execution group, the second memory type information and the second thread group wait value corresponding to it; acquiring the second thread group memory instruction count value, which represents the maximum of a plurality of second uncompleted instruction numbers respectively corresponding to the plurality of execution groups in the current cycle, where the second uncompleted instruction number of each execution group is the number of second memory execution instructions that the execution group has issued but whose execution has not been completed in the current cycle; in response to the second thread group memory instruction count value being greater than the second thread group wait value, instructing the first execution group not to continue executing the instructions after its first memory barrier instruction; and in response to the second thread group memory instruction count value being less than or equal to the second thread group wait value, instructing the first execution group to continue executing the instructions after its first memory barrier instruction.
It should be noted that, in the present disclosure, a memory execution instruction (a first memory execution instruction, a second memory execution instruction, or the like) whose execution has not been completed is one whose memory request (whether a read request or a write request) has not yet returned from the memory module.
For example, as shown in FIG. 6A, the barrier register group further includes a fourth barrier storage area 130 and a fifth barrier storage area 140. The fourth barrier storage area 130 stores the second thread group memory instruction count value, and the fifth barrier storage area 140 stores a second memory barrier count pair group corresponding to the second memory type information. The second memory barrier count pair group includes at least one memory barrier count pair, and each memory barrier count pair includes an instruction issue number and an instruction completion number. The second memory barrier count pair group shown in FIG. 6A includes four memory barrier count pairs b0-b3: the memory barrier count pair b0 includes an instruction issue number b0 and an instruction completion number b0, the pair b1 includes an instruction issue number b1 and an instruction completion number b1, the pair b2 includes an instruction issue number b2 and an instruction completion number b2, and the pair b3 includes an instruction issue number b3 and an instruction completion number b3. For example, the fifth barrier storage area 140 may include four sub-barrier storage areas that store the four memory barrier count pairs b0-b3, respectively.
For example, in some embodiments, supervising the execution of all second memory execution instructions in the plurality of execution groups to obtain the second supervision result includes: executing the second memory execution instruction of the first execution group.
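The stall decision of a memory barrier instruction that names both memory types can be sketched as below. The dictionary keys `"first"` and `"second"` are hypothetical labels for the first and second memory type information; the rule itself (block while any named thread group memory instruction count value exceeds its thread group wait value) follows the description above.

```python
def may_continue(thread_counts: dict, wait_values: dict) -> bool:
    """Decide whether an execution group may run past a memory barrier instruction.

    thread_counts: thread group memory instruction count value per memory type
    wait_values:   thread group wait value per memory type named by the barrier instruction
    The group proceeds only if, for every named memory type, count <= wait value.
    """
    return all(thread_counts[mem_type] <= wait for mem_type, wait in wait_values.items())


counts = {"first": 2, "second": 1}
print(may_continue(counts, {"first": 1, "second": 1}))  # False: first type has 2 > 1 outstanding
print(may_continue(counts, {"first": 2, "second": 1}))  # True
```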
For example, the memory type information in the second memory execution instruction of the first execution group is the second memory type information.
For example, as shown in FIG. 6B, the first execution group further includes a third execution storage area 230, which stores a third execution group memory instruction count value representing the number of second memory execution instructions that the first execution group has issued but whose execution has not been completed. For example, the scheduling module may allocate a second memory execution instruction counter to each execution group: when a second memory execution instruction of the execution group is issued, the counter is incremented by 1, and when execution of that second memory execution instruction is completed, the counter is decremented by 1.
For example, executing the second memory execution instruction of the first execution group includes: acquiring a second memory barrier index corresponding to the second memory execution instruction of the first execution group, and issuing that instruction together with the second memory barrier index; after they are issued, setting the second thread group memory instruction count value and the instruction issue number and instruction completion number in the memory barrier count pair indicated by the second memory barrier index in the second memory barrier count pair group.
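A hypothetical model of one such per-execution-group counter, incremented when an instruction of the tracked memory type is issued and decremented when its execution completes (its memory request returns). The class name and the error on underflow are illustrative choices, not part of the disclosure.

```python
class ExecutionGroupMemoryCounter:
    """Hypothetical model of one execution storage area, e.g. the third execution
    group memory instruction count value: issued-but-not-completed instructions
    of one memory type in one execution group."""

    def __init__(self) -> None:
        self.value = 0

    def on_issue(self) -> None:
        self.value += 1

    def on_complete(self) -> None:
        if self.value == 0:
            raise RuntimeError("completion without a matching issue")
        self.value -= 1


counter = ExecutionGroupMemoryCounter()
counter.on_issue()
counter.on_issue()
counter.on_complete()  # one memory request has returned from the memory module
# counter.value is now 1
```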
For example, in some embodiments, obtaining a second memory barrier index corresponding to a second memory execute instruction of the first execution group includes: acquiring a first barrier identification code corresponding to the first execution group; determining a barrier register group corresponding to the first execution group based on the first barrier identification code; analyzing the second memory execution instruction to obtain second memory type information in the second memory execution instruction; reading a third execution group memory instruction count value from a third execution storage area; reading a second thread group memory instruction count value from a fourth barrier storage area based on the first barrier identification code and the second memory type information; in response to the third execution group memory instruction count value being less than or equal to the second thread group memory instruction count value and the second thread group memory instruction count value being less than the second count threshold, a second memory barrier index is obtained.
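The final check in the index-acquisition steps can be written as a one-line predicate. This sketch only evaluates the condition; how the index value itself is derived is not modeled, and the function name is hypothetical.

```python
def can_obtain_barrier_index(group_count: int, thread_count: int, count_threshold: int) -> bool:
    """Condition for obtaining a memory barrier index.

    group_count:     execution group memory instruction count value of the issuing group
    thread_count:    thread group memory instruction count value for the same memory type
    count_threshold: count threshold for that memory type (e.g. 4 in the Table 4 walkthrough)
    """
    return group_count <= thread_count and thread_count < count_threshold


# Values from the Table 4 walkthrough (count threshold 4):
print(can_obtain_barrier_index(0, 0, 4))  # execution group 0 issuing (b)0 -> True
print(can_obtain_barrier_index(0, 1, 4))  # execution group 1 issuing (b)0 -> True
print(can_obtain_barrier_index(0, 4, 4))  # thread group count has reached the threshold -> False
```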
It should be noted that, for a specific description of the step "obtaining the second memory barrier index corresponding to the second memory execution instruction of the first execution group", reference may be made to the above description of "obtaining the first memory barrier index corresponding to the first memory execution instruction of the first execution group" in step S20; repeated parts are omitted here.
For example, as shown in FIG. 6A, the barrier register set further includes a third barrier storage area 120, the third barrier storage area 120 being used to store a total number of execution groups, the total number of execution groups representing the number of the plurality of execution groups.
Setting the second thread group memory instruction count value and the instruction issue number and instruction completion number in the memory barrier count pair indicated by the second memory barrier index in the second memory barrier count pair group includes: in response to the instruction issue number in the memory barrier count pair indicated by the second memory barrier index being the predetermined issue number, increasing the second thread group memory instruction count value by the first value to update it, and in either case increasing that instruction issue number by the second value to update it; in response to completion of execution of the second memory execution instruction of the first execution group, increasing the instruction completion number in the memory barrier count pair indicated by the second memory barrier index by the second value to update it; and in response to the instruction issue number and the instruction completion number in the memory barrier count pair indicated by the second memory barrier index being equal to each other and to the total number of execution groups, clearing that instruction issue number and instruction completion number, and decreasing the second thread group memory instruction count value by the first value to update it.
It should be noted that, for a specific description of the step "setting the second thread group memory instruction count value and the instruction issue number and instruction completion number in the memory barrier count pair indicated by the second memory barrier index in the second memory barrier count pair group", reference may be made, where no contradiction arises, to the above description of "setting the first thread group memory instruction count value and the instruction issue number and instruction completion number in the memory barrier count pair indicated by the first memory barrier index in the first memory barrier count pair group" in step S20; repeated parts are not described again.
For example, as shown in FIG. 6B, the plurality of execution groups further includes the second execution group, which includes a second execution storage area 240 storing a second execution group memory instruction count value; this value indicates the number of first memory execution instructions that the second execution group has issued but whose execution has not been completed.
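Because the second memory type uses the same rules with its own count value and count pair group, a sketch can key the per-type state by memory type. Assumptions: the first value and second value are 1, the predetermined issue number is 0, each type has four count pairs (as with b0-b3 in FIG. 6A), and all names are hypothetical. The usage shows that second-type traffic does not disturb the first-type count value.

```python
from dataclasses import dataclass, field


@dataclass
class CountPair:
    issue: int = 0     # instruction issue number
    complete: int = 0  # instruction completion number


@dataclass
class MemoryTypeState:
    thread_count: int = 0  # thread group memory instruction count value for this type
    pairs: list = field(default_factory=lambda: [CountPair() for _ in range(4)])  # e.g. b0-b3


class BarrierRegisterGroup:
    """Hypothetical model: one count value and one count pair group per memory type.
    First and second values are assumed to be 1, predetermined issue number 0."""

    def __init__(self, total_groups: int, memory_types=("first", "second")) -> None:
        self.total_groups = total_groups  # total number of execution groups
        self.state = {t: MemoryTypeState() for t in memory_types}

    def on_issue(self, mem_type: str, index: int) -> None:
        s = self.state[mem_type]
        pair = s.pairs[index]
        if pair.issue == 0:  # predetermined issue number
            s.thread_count += 1
        pair.issue += 1

    def on_complete(self, mem_type: str, index: int) -> None:
        s = self.state[mem_type]
        pair = s.pairs[index]
        pair.complete += 1
        if pair.issue == pair.complete == self.total_groups:
            pair.issue = pair.complete = 0
            s.thread_count -= 1


rg = BarrierRegisterGroup(total_groups=2)
rg.on_issue("second", 0)     # first execution group issues a second-type instruction
rg.on_issue("second", 0)     # second execution group does the same
rg.on_issue("first", 0)      # an unrelated first-type instruction
rg.on_complete("second", 0)
rg.on_complete("second", 0)  # pair b0 released
# second-type count is back to 0; first-type count is still 1
```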
For example, supervising execution of all first memory execution instructions in the plurality of execution groups to obtain a first supervised result further comprises: the first memory execution instruction of the second execution group is executed. For example, the memory type information in the first memory execution instruction of the second execution group is the first memory type information.
For example, executing the first memory execution instruction of the second execution group includes: acquiring a third memory barrier index corresponding to the first memory execution instruction of the second execution group, and issuing that instruction together with the third memory barrier index; after they are issued, setting the first thread group memory instruction count value and the instruction issue number and instruction completion number in the memory barrier count pair indicated by the third memory barrier index in the first memory barrier count pair group.
For example, as shown in fig. 6B, the second execution group further includes an execution storage area for storing the second barrier identification code.
For example, obtaining a third memory barrier index corresponding to a first memory execute instruction of the second execution group includes: acquiring a second barrier identification code corresponding to the second execution group; determining a barrier register group corresponding to the second execution group based on the second barrier identification code; analyzing the first memory execution instruction of the second execution group to obtain first memory type information in the first memory execution instruction of the second execution group; reading a second execution group memory instruction count value from a second execution storage area; reading a first thread group memory instruction count value from a first barrier storage area based on a second barrier identification code and first memory type information; in response to the second execution group memory instruction count value being less than or equal to the first thread group memory instruction count value and the first thread group memory instruction count value being less than the first count threshold, a third memory barrier index is obtained.
For example, setting the first thread group memory instruction count value and the instruction issue number and instruction completion number in the memory barrier count pair indicated by the third memory barrier index in the first memory barrier count pair group includes: in response to the instruction issue number in the memory barrier count pair indicated by the third memory barrier index being the predetermined issue number, increasing the first thread group memory instruction count value by the first value to update it, and in either case increasing that instruction issue number by the second value to update it; in response to completion of execution of the first memory execution instruction of the second execution group, increasing the instruction completion number in the memory barrier count pair indicated by the third memory barrier index by the second value to update it; and in response to the instruction issue number and the instruction completion number in the memory barrier count pair indicated by the third memory barrier index being equal to each other and to the total number of execution groups, clearing that instruction issue number and instruction completion number, and decreasing the first thread group memory instruction count value by the first value to update it.
It should be noted that, for a specific description of the step "execute the first memory execution instruction of the second execution group", reference may be made to the above description of "execute the first memory execution instruction of the first execution group" in step S20, and repeated descriptions are omitted here.
For example, as shown in FIG. 6B, the second execution group further includes a fourth execution storage area 250, which stores a fourth execution group memory instruction count value representing the number of second memory execution instructions that the second execution group has issued but whose execution has not been completed. For the specific process of executing the second memory execution instruction of the second execution group, reference may be made to the above description of executing the second memory execution instruction of the first execution group; repeated descriptions are omitted here.
It should be noted that, in the embodiments of the present disclosure, each barrier storage area in a barrier register group (for example, the first, second, third, fourth, and fifth barrier storage areas described above) may be a whole register or part of a register, for example g1 data bits, where each data bit stores one bit of data and g1 is a positive integer. Similarly, each execution storage area in an execution group (for example, the first, second, third, and fourth execution storage areas described above) may be a whole register or part of a register, for example g2 data bits, where each data bit stores one bit of data and g2 is a positive integer.
It should be noted that the instruction execution process of each execution group in a thread group may follow the instruction execution processes of the first execution group and the second execution group provided in the present disclosure; repeated details are omitted.
For example, in some embodiments, each execution group may execute at least one second memory execution instruction and a second memory barrier instruction, where the second memory barrier instruction blocks the second memory execution instructions of the execution groups within the thread group. That is, in embodiments of the present disclosure, different memory barrier instructions may be used to block memory execution instructions of different types within a thread group.
The memory barrier mechanism implemented by the instruction execution method provided by the embodiments of the present disclosure is also applicable to processors of other multi-core, multithreaded technologies and is not limited to GPGPUs. Following the same mechanism, corresponding registers may be added to a global module (for example, a command processor) to implement a memory barrier of global scope.
The above encodings of the memory execution instruction and the memory barrier instruction are merely illustrative; these instructions may be expressed differently in different instruction sets.
It should be noted that in the present disclosure, a memory execution instruction refers to an instruction related to memory, where the memory includes, but is not limited to, video memory (for example, HBM (High Bandwidth Memory) and GDDR (Graphics Double Data Rate) memory), as well as local shared memory, global shared memory, caches, and the like.
For example, the memory barrier mechanism provided by the embodiments of the present disclosure can support memory execution instructions of several different memory types simultaneously. If the processor supports memories of additional memory types and the corresponding memory execution instructions, only the thread group memory instruction count values, the memory barrier count pair groups, the thread group wait values, and the execution group memory instruction count values need to be added correspondingly, one set per memory type.
The instruction execution method provided by the embodiments of the present disclosure is described in detail below with reference to the example in Table 4. In Table 4, a memory execution instruction marked (b) needs to participate in the memory barrier, while a memory execution instruction without (b) (e.g., memory execution instruction 3) does not.
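As an illustration of a storage area occupying only part of a register, the four count pairs b0-b3 can be packed into one 32-bit value. The field width g1 = 4 and the field order are hypothetical choices for this sketch; the disclosure leaves g1 open.

```python
G1 = 4                      # hypothetical field width (bits) of each issue/completion number
FIELD_MASK = (1 << G1) - 1


def pack_count_pairs(pairs):
    """Pack [(issue, complete), ...] into one register value, lowest pair in the low bits."""
    reg = 0
    for i, (issue, complete) in enumerate(pairs):
        if not (0 <= issue <= FIELD_MASK and 0 <= complete <= FIELD_MASK):
            raise ValueError("count does not fit in g1 bits")
        reg |= issue << (2 * i * G1)
        reg |= complete << ((2 * i + 1) * G1)
    return reg


def unpack_count_pairs(reg, n):
    """Inverse of pack_count_pairs for n count pairs."""
    return [((reg >> (2 * i * G1)) & FIELD_MASK,
             (reg >> ((2 * i + 1) * G1)) & FIELD_MASK) for i in range(n)]


# Pairs b0-b3 after two groups issued at index 0 and one completed (example values):
reg = pack_count_pairs([(2, 1), (0, 0), (0, 0), (0, 0)])
print(hex(reg))  # 0x12
```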
Table 4:
Execution group 0                  | Execution group 1
Memory execution instruction (b)0  | Memory execution instruction (b)0
Memory execution instruction (b)1  | Memory execution instruction (b)1
Memory execution instruction (b)2  | Memory execution instruction (b)2
Other instruction block 0          | Other instruction block 0
Memory barrier instruction         | Memory barrier instruction
Memory execution instruction 3     | Memory execution instruction 3
Other instruction block 1          | Other instruction block 1
For example, when a thread group includes only an execution group 0 (e.g., the first execution group) and an execution group 1 (e.g., the second execution group), the total number of execution groups stored in the barrier register group corresponding to the thread group is 2. For example, in table 4 above, the memory execution instructions (b)0-3 of execution group 0 and execution group 1 are both instructions of the same memory type.
When the memory execution instruction (b)0 of the execution group 0 is executed, the memory execution instruction (b)0 of the execution group 0 may be decoded to obtain the memory type information in the memory execution instruction (b)0 of the execution group 0, the corresponding thread group memory instruction count value may be obtained by executing the barrier identification code corresponding to the group 0 and the memory type information in the memory execution instruction (b)0 of the execution group 0, if the thread group memory instruction count value (e.g., 0) is smaller than the count threshold value (e.g., 4), the instruction issue number may be obtained by executing the memory barrier index 0 corresponding to the memory execution instruction (b)0 of the group 0, at this time, the instruction issue number is 0 (i.e., the predetermined issue number), and thus, the thread group memory instruction count value (e.g., 0) may be increased by 1 to update the thread group memory instruction count value, the updated thread group memory instruction count value is 1. Then, the instruction issue number is also increased by 1 so that the instruction issue number becomes 1.
When the execution group 1 executes the memory execution instruction (b)0, the memory execution instruction (b)0 of the execution group 1 may be decoded to obtain the memory type information in the memory execution instruction (b)0 of the execution group 1, and a corresponding thread group memory instruction count value may be obtained by executing the barrier identification code corresponding to the group 1 and the memory type information in the memory execution instruction (b)0 of the execution group 1, and if the thread group memory instruction count value (e.g., 0) is smaller than the count threshold value (e.g., 4), an instruction issue number may be obtained by executing the memory barrier index 0 corresponding to the memory execution instruction (b)0 of the group 1, where the instruction issue number is 1, that is, not a predetermined issue number, and at this time, the thread group memory instruction count value is kept unchanged, that is, where the thread group memory instruction count value is still 1. Then, the instruction issue number is increased by 1 so that the instruction issue number becomes 2.
After execution group 0 and execution group 1 have each issued the 3 memory execution instructions (b)0 to (b)2, the thread group memory instruction count value is 3, and the instruction issue number in each of the 3 memory barrier count pairs is 2. Once memory execution instruction (b)0 of execution group 0 and memory execution instruction (b)0 of execution group 1 have both completed, the instruction completion number corresponding to memory barrier index 0 is 2. The instruction completion number then equals the instruction issue number and also equals the total number of execution groups, so the thread group memory instruction count value is decremented by 1 to 2, and the instruction issue number and instruction completion number corresponding to memory barrier index 0 are both cleared to 0. Likewise, once memory execution instructions (b)1 of both execution groups have completed, the thread group memory instruction count value is decremented to 1, and once memory execution instructions (b)2 of both execution groups have completed, it is decremented to 0.
The overall effect of this instruction sequence is as follows. Execution group 0 and execution group 1 each issue 3 memory execution instructions (b) whose memory barrier bits are set and then execute other instruction block 0. Suppose execution group 0 then executes a memory barrier instruction whose thread group wait value for the corresponding memory type information is 1. Once memory execution instructions (b)0 and (b)1 have completed in both execution group 0 and execution group 1, the thread group memory instruction count value drops to 1, which does not exceed the wait value, so execution group 0 can go on to execute the instructions after the memory barrier instruction (such as memory execution instruction 3 and other instruction block 1). It does not need to wait for memory execution instruction (b)2 in execution group 1 to complete, nor for other instruction block 0 in execution group 1 to finish.
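The worked example above can be traced with a small software model. This is a sketch under stated assumptions, not the hardware design: it models one barrier register set for a single memory type, takes the predetermined issue number to be 0, uses the example's values (2 execution groups, count threshold 4, thread group wait value 1), and assumes memory execution instruction (b)k uses memory barrier index k, as in the example. The names `BarrierState` and `barrier_may_proceed` are hypothetical.

```python
from dataclasses import dataclass, field

NUM_GROUPS = 2           # execution group total in the example
COUNT_THRESHOLD = 4      # count threshold in the example
WAIT_VALUE = 1           # thread group wait value of the memory barrier instruction
PREDETERMINED_ISSUE = 0  # assumed value of the "predetermined issue number"

@dataclass
class BarrierState:
    tg_count: int = 0  # thread group memory instruction count value
    issued: list = field(default_factory=lambda: [0] * COUNT_THRESHOLD)
    completed: list = field(default_factory=lambda: [0] * COUNT_THRESHOLD)

    def issue(self, index):
        if self.issued[index] == PREDETERMINED_ISSUE:
            self.tg_count += 1  # first execution group to use this count pair
        self.issued[index] += 1

    def complete(self, index):
        self.completed[index] += 1
        if self.issued[index] == self.completed[index] == NUM_GROUPS:
            self.issued[index] = self.completed[index] = 0  # clear the pair
            self.tg_count -= 1

def barrier_may_proceed(tg_count, wait_value):
    # The group continues past the barrier only if count <= wait value.
    return tg_count <= wait_value

state = BarrierState()
trace = []
# Both execution groups issue (b)0..(b)2; (b)k uses memory barrier index k.
for k in range(3):
    for _group in range(NUM_GROUPS):
        state.issue(k)
        trace.append(state.tg_count)
blocked_before = not barrier_may_proceed(state.tg_count, WAIT_VALUE)
# (b)0, then (b)1, then (b)2 complete in both execution groups.
proceed = []
for k in range(3):
    for _group in range(NUM_GROUPS):
        state.complete(k)
    trace.append(state.tg_count)
    proceed.append(barrier_may_proceed(state.tg_count, WAIT_VALUE))
print(trace)    # [1, 1, 2, 2, 3, 3, 2, 1, 0]
print(proceed)  # [False, True, True]
```

The `proceed` list shows the barrier releasing execution group 0 once (b)0 and (b)1 have completed in both groups, while (b)2 may still be outstanding.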
Fig. 7 is a schematic diagram of an instruction execution apparatus according to at least one embodiment of the present disclosure.
At least one embodiment of the present disclosure also provides an instruction execution apparatus that may be applied to a thread group. For example, the thread group includes a plurality of execution groups, including a first execution group; each execution group may execute at least one first memory execution instruction and a first memory barrier instruction, and the first memory barrier instruction acts as a barrier for the at least one first memory execution instruction within the thread group.
For example, as shown in fig. 7, the instruction execution apparatus 700 includes an instruction supervision unit 701 and a barrier instruction execution unit 702.
For example, the instruction supervision unit 701 is configured to supervise the execution of all first memory execution instructions in the plurality of execution groups to obtain a first supervision result. For example, the first supervision result includes a first thread group memory instruction count value corresponding to the first memory barrier instruction of the first execution group.
For example, the barrier instruction execution unit 702 is configured to execute a first memory barrier instruction of a first execution group.
For example, the barrier instruction execution unit 702 includes an instruction fetching module 7021, a decoding module 7022, an obtaining module 7023, and an execution module 7024.
For example, instruction fetch module 7021 is configured to fetch a first memory barrier instruction of a first execution group.
For example, the decoding module 7022 is configured to parse the first memory barrier instruction of the first execution group to obtain the first memory type information in the first memory barrier instruction of the first execution group and the first thread group wait value corresponding to the first memory type information.
For example, the obtaining module 7023 is configured to obtain the first thread group memory instruction count value. For example, the first thread group memory instruction count value represents the maximum, in the current cycle, of the first outstanding instruction numbers of the plurality of execution groups, where the first outstanding instruction number of an execution group is the number of first memory execution instructions that the execution group has issued but that have not yet completed in the current cycle.
For example, the execution module 7024 is configured to: in response to the first thread group memory instruction count value being greater than the first thread group wait value, instruct the first execution group not to continue executing the instructions after the first memory barrier instruction of the first execution group; and in response to the first thread group memory instruction count value being less than or equal to the first thread group wait value, instruct the first execution group to continue executing the instructions after the first memory barrier instruction of the first execution group.
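The decision made by the execution module can be stated as a short functional model. This sketch recomputes the count as a maximum only to illustrate what the value means; the embodiments maintain the value incrementally through memory barrier count pairs rather than computing a maximum. The function names are hypothetical.

```python
def thread_group_count(outstanding_per_group):
    """Thread group memory instruction count value: the maximum, over all
    execution groups, of memory execution instructions issued but not yet
    completed in the current cycle."""
    return max(outstanding_per_group, default=0)

def may_continue(outstanding_per_group, wait_value):
    """Execution-module decision for one memory type: True means the execution
    group may execute the instructions after the memory barrier instruction."""
    return thread_group_count(outstanding_per_group) <= wait_value

# Two execution groups with 2 and 1 outstanding instructions, wait value 1.
print(may_continue([2, 1], wait_value=1))  # False: max is 2 > 1
print(may_continue([1, 0], wait_value=1))  # True: max is 1 <= 1
```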
For example, when obtaining the first thread group memory instruction count value, the obtaining module 7023 performs the following operations: obtaining a first barrier identification code corresponding to the first execution group; determining, based on the first barrier identification code, a barrier register set corresponding to the first execution group, where the barrier register set includes a first barrier storage area used to store the first thread group memory instruction count value; and reading the first thread group memory instruction count value from the first barrier storage area.
For example, the barrier register set further includes a second barrier storage area, where the second barrier storage area is used to store a first memory barrier count pair group corresponding to the first memory type information, the first memory barrier count pair group includes at least one memory barrier count pair, and each memory barrier count pair includes an instruction issue number and an instruction complete number.
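The storage areas described above can be pictured as a lookup from barrier identification code to a barrier register set. In this sketch the per-memory-type storage areas are modeled as dictionaries keyed by memory type information; the class and function names, the table `BARRIER_SETS`, the memory type label `"type_a"`, and the pair count of 4 are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class CountPair:
    issue: int = 0     # instruction issue number
    complete: int = 0  # instruction completion number

@dataclass
class BarrierRegisterSet:
    group_total: int    # third barrier storage area: execution group total
    num_pairs: int = 4  # assumed number of count pairs per memory type
    tg_count: dict = field(default_factory=dict)  # first area, keyed by memory type
    pairs: dict = field(default_factory=dict)     # second area, keyed by memory type

    def count_for(self, mem_type):
        """Thread group memory instruction count value for one memory type."""
        return self.tg_count.get(mem_type, 0)

    def pair_group(self, mem_type):
        """Memory barrier count-pair group for one memory type (created on first use)."""
        if mem_type not in self.pairs:
            self.pairs[mem_type] = [CountPair() for _ in range(self.num_pairs)]
        return self.pairs[mem_type]

# Hypothetical table: barrier identification code -> barrier register set.
BARRIER_SETS = {0: BarrierRegisterSet(group_total=2)}

def read_tg_count(barrier_id, mem_type):
    """Barrier ID -> barrier register set -> stored count value."""
    return BARRIER_SETS[barrier_id].count_for(mem_type)

print(read_tg_count(0, "type_a"))  # 0 before any instruction is issued
```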
For example, when supervising the execution of all first memory execution instructions in the plurality of execution groups to obtain the first supervision result, the instruction supervision unit 701 executes a first memory execution instruction of the first execution group. For example, the memory type information in the first memory execution instruction of the first execution group is the first memory type information.
For example, when executing the first memory execution instruction of the first execution group, the instruction supervision unit 701 performs the following operations: obtaining a first memory barrier index corresponding to the first memory execution instruction of the first execution group, and issuing that instruction together with the first memory barrier index; and, after they have been issued, setting the first thread group memory instruction count value and the instruction issue number and instruction completion number in the memory barrier count pair indicated by the first memory barrier index in the first memory barrier count pair group.
For example, the barrier register set further includes a third barrier storage area for storing an execution group total, which indicates the number of execution groups. When setting the first thread group memory instruction count value and the instruction issue number and instruction completion number in the memory barrier count pair indicated by the first memory barrier index in the first memory barrier count pair group, the instruction supervision unit 701 performs the following operations: in response to the instruction issue number in that memory barrier count pair being the predetermined issue number, increasing the first thread group memory instruction count value by a first value and increasing the instruction issue number in that pair by a second value; in response to completion of the first memory execution instruction of the first execution group, increasing the instruction completion number in that pair by the second value; and in response to the instruction issue number and the instruction completion number in that pair being equal to each other and to the execution group total, clearing both numbers and decreasing the first thread group memory instruction count value by the first value.
For example, when setting these values, the instruction supervision unit 701 further performs the following operation: in response to the instruction issue number in the memory barrier count pair indicated by the first memory barrier index not being the predetermined issue number, keeping the first thread group memory instruction count value unchanged and increasing that instruction issue number by the second value.
For example, the first and second values are the same and are both 1.
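The update rules above (issue with and without the predetermined issue number, completion, and clearing) can be modeled as two event handlers on one count-pair group. This is a sketch, not the circuit: the predetermined issue number is assumed to be 0, the group holds an assumed 4 pairs, and the class name `CountPairGroup` is hypothetical.

```python
from dataclasses import dataclass, field

FIRST_VALUE = 1          # step for the thread group count value (both values are 1)
SECOND_VALUE = 1         # step for issue/completion numbers
PREDETERMINED_ISSUE = 0  # assumed value of the "predetermined issue number"

@dataclass
class CountPairGroup:
    group_total: int   # execution group total
    size: int = 4      # assumed number of memory barrier count pairs
    tg_count: int = 0  # thread group memory instruction count value
    issue: list = field(default_factory=list)
    complete: list = field(default_factory=list)

    def __post_init__(self):
        self.issue = [0] * self.size
        self.complete = [0] * self.size

    def on_issue(self, idx):
        if self.issue[idx] == PREDETERMINED_ISSUE:
            self.tg_count += FIRST_VALUE  # first group to use this pair
        # otherwise the thread group count value is kept unchanged
        self.issue[idx] += SECOND_VALUE

    def on_complete(self, idx):
        self.complete[idx] += SECOND_VALUE
        if self.issue[idx] == self.complete[idx] == self.group_total:
            self.issue[idx] = self.complete[idx] = 0  # clear the pair
            self.tg_count -= FIRST_VALUE

pairs = CountPairGroup(group_total=2)
pairs.on_issue(0)     # execution group 0 issues with index 0
pairs.on_issue(0)     # execution group 1 issues with index 0
after_issue = pairs.tg_count
pairs.on_complete(0)  # execution group 1 finishes first
after_one = pairs.tg_count
pairs.on_complete(0)  # execution group 0 finishes
after_both = pairs.tg_count
print(after_issue, after_one, after_both)  # 1 1 0
```

The count value drops only when every execution group has completed its instruction for that pair, regardless of completion order.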
For example, the first execution group includes a first execution storage area for storing a first execution group memory instruction count value, which indicates the number of first memory execution instructions of the first execution group that have been issued but have not yet completed.
For example, when obtaining the first memory barrier index corresponding to the first memory execution instruction of the first execution group, the instruction supervision unit 701 performs the following operations: obtaining the first barrier identification code corresponding to the first execution group; determining the barrier register set corresponding to the first execution group based on the first barrier identification code; parsing the first memory execution instruction of the first execution group to obtain the first memory type information in it; reading the first execution group memory instruction count value from the first execution storage area; reading the first thread group memory instruction count value from the first barrier storage area based on the first barrier identification code and the first memory type information; and, in response to the first execution group memory instruction count value being less than or equal to the first thread group memory instruction count value and the first thread group memory instruction count value being less than a first count threshold, obtaining the first memory barrier index.
For example, when executing the first memory execution instruction of the first execution group, the instruction supervision unit 701 further determines that the first memory execution instruction of the first execution group is not to be issued in response to both the first thread group memory instruction count value and the first execution group memory instruction count value being equal to the first count threshold.
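The two issue conditions above can be expressed as one gating check. This section does not say how the memory barrier index itself is derived, so the sketch only decides whether an index may be obtained; combinations of values that the text does not describe are treated as "do not issue", which is an assumption. The function name `may_issue` is hypothetical.

```python
def may_issue(group_count, tg_count, threshold):
    """Issue gating for one barrier-participating memory execution instruction.

    group_count: execution group memory instruction count value
    tg_count:    thread group memory instruction count value
    threshold:   first count threshold
    """
    if group_count == threshold and tg_count == threshold:
        return False  # the text's stall case: do not issue
    # The text's condition for obtaining a memory barrier index; any other
    # combination is treated as "do not issue" in this sketch (assumption).
    return group_count <= tg_count < threshold

print(may_issue(0, 0, 4))  # True: like execution group 0 issuing (b)0
print(may_issue(0, 1, 4))  # True: like execution group 1 issuing (b)0
print(may_issue(4, 4, 4))  # False: stalled at the threshold
```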
For example, when executing the first memory execution instruction of the first execution group, the instruction supervision unit 701 further performs the following operations: determining whether the first memory execution instruction of the first execution group needs to participate in a memory barrier; if it does, setting the first thread group memory instruction count value and the instruction issue number and instruction completion number in the memory barrier count pair indicated by the first memory barrier index; and if it does not, skipping these settings.
For example, the instruction execution apparatus 700 further includes a barrier indicator setting unit configured to set a barrier indicator for each first memory execution instruction in the plurality of execution groups. For example, a barrier indicator with a first indicator value indicates that the first memory execution instruction needs to participate in the memory barrier, and a barrier indicator with a second indicator value indicates that it does not.
For example, when determining whether the first memory execution instruction of the first execution group needs to participate in the memory barrier, the instruction supervision unit 701 performs the following operations: parsing the first memory execution instruction of the first execution group to obtain its barrier indicator; and determining, based on the value of that barrier indicator, whether the instruction needs to participate in the memory barrier.
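The barrier-indicator check can be sketched as a bit test on the instruction word that gates the barrier bookkeeping. The bit position, the 32-bit instruction width, the indicator values 1 and 0, and the function names are hypothetical; the source does not specify the encoding. The `counters` dictionary merely stands in for the count-pair updates.

```python
BARRIER_BIT = 1 << 31       # hypothetical bit position of the barrier indicator
FIRST_INDICATOR_VALUE = 1   # participates in the memory barrier
SECOND_INDICATOR_VALUE = 0  # does not participate

def barrier_indicator(instr_word):
    """Decode the barrier indicator from a (hypothetical) 32-bit instruction word."""
    return FIRST_INDICATOR_VALUE if instr_word & BARRIER_BIT else SECOND_INDICATOR_VALUE

def on_issue(instr_word, counters):
    """Update barrier bookkeeping only for instructions that participate."""
    if barrier_indicator(instr_word) == FIRST_INDICATOR_VALUE:
        counters["tracked"] += 1    # stands in for the count-pair updates
    else:
        counters["untracked"] += 1  # counted only for this demonstration

counters = {"tracked": 0, "untracked": 0}
on_issue(0x8000_0001, counters)  # indicator set
on_issue(0x0000_0001, counters)  # indicator clear
print(counters)  # {'tracked': 1, 'untracked': 1}
```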
For example, each execution group may also execute at least one second memory execution instruction, and the first memory barrier instruction may also act as a barrier for the at least one second memory execution instruction within the thread group.
For example, the instruction supervision unit 701 is further configured to supervise the execution of all second memory execution instructions in the plurality of execution groups to obtain a second supervision result. For example, the second supervision result includes a second thread group memory instruction count value corresponding to the first memory barrier instruction of the first execution group.
For example, when executing the first memory barrier instruction of the first execution group, the barrier instruction execution unit 702 further performs the following operations: obtaining, from the first memory barrier instruction of the first execution group, second memory type information and a second thread group wait value corresponding to the second memory type information; obtaining a second thread group memory instruction count value, which represents the maximum, in the current cycle, of the second outstanding instruction numbers of the plurality of execution groups, where the second outstanding instruction number of an execution group is the number of second memory execution instructions that the execution group has issued but that have not yet completed in the current cycle; in response to the second thread group memory instruction count value being greater than the second thread group wait value, instructing the first execution group not to continue executing the instructions after its first memory barrier instruction; and in response to the second thread group memory instruction count value being less than or equal to the second thread group wait value, instructing the first execution group to continue executing those instructions.
For example, the first memory type information and the second memory type information are different.
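When one memory barrier instruction names several memory types, each type has its own thread group memory instruction count value and wait value. This sketch assumes the execution group continues only when the condition holds for every memory type named by the barrier instruction; the dictionary representation, the labels `"type_a"` and `"type_b"`, and the function name are hypothetical.

```python
def barrier_may_proceed(tg_counts, wait_values):
    """tg_counts:   memory type -> thread group memory instruction count value
    wait_values: memory type -> thread group wait value in the barrier instruction
    Returns True only if every memory type named by the barrier is satisfied."""
    return all(tg_counts.get(mem_type, 0) <= wait
               for mem_type, wait in wait_values.items())

waits = {"type_a": 1, "type_b": 2}
print(barrier_may_proceed({"type_a": 1, "type_b": 3}, waits))  # False: type_b 3 > 2
print(barrier_may_proceed({"type_a": 1, "type_b": 2}, waits))  # True
```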
For example, the barrier register set further includes a fourth barrier storage area and a fifth barrier storage area. The fourth barrier storage area is configured to store the second thread group memory instruction count value, and the fifth barrier storage area is configured to store a second memory barrier count pair group corresponding to the second memory type information. The second memory barrier count pair group includes at least one memory barrier count pair, and each memory barrier count pair includes an instruction issue number and an instruction completion number.
For example, when supervising the execution of all second memory execution instructions in the plurality of execution groups to obtain the second supervision result, the instruction supervision unit 701 executes a second memory execution instruction of the first execution group. For example, the memory type information in the second memory execution instruction of the first execution group is the second memory type information.
For example, when executing the second memory execution instruction of the first execution group, the instruction supervision unit 701 performs the following operations: obtaining a second memory barrier index corresponding to the second memory execution instruction of the first execution group, and issuing that instruction together with the second memory barrier index; and, after they have been issued, setting the second thread group memory instruction count value and the instruction issue number and instruction completion number in the memory barrier count pair indicated by the second memory barrier index in the second memory barrier count pair group.
For example, when setting the second thread group memory instruction count value and the instruction issue number and instruction completion number in the memory barrier count pair indicated by the second memory barrier index in the second memory barrier count pair group, the instruction supervision unit 701 performs the following operations: in response to the instruction issue number in that pair being the predetermined issue number, increasing the second thread group memory instruction count value by the first value and increasing the instruction issue number in that pair by the second value; in response to completion of the second memory execution instruction of the first execution group, increasing the instruction completion number in that pair by the second value; and in response to the instruction issue number and the instruction completion number in that pair being equal to each other and to the execution group total, clearing both numbers and decreasing the second thread group memory instruction count value by the first value.
For example, the plurality of execution groups further includes a second execution group. For example, when supervising the execution of all first memory execution instructions in the plurality of execution groups to obtain the first supervision result, the instruction supervision unit 701 further executes a first memory execution instruction of the second execution group. For example, the memory type information in the first memory execution instruction of the second execution group is the first memory type information.
For example, when executing the first memory execution instruction of the second execution group, the instruction supervision unit 701 performs the following operations: obtaining a third memory barrier index corresponding to the first memory execution instruction of the second execution group, and issuing that instruction together with the third memory barrier index; and, after they have been issued, setting the first thread group memory instruction count value and the instruction issue number and instruction completion number in the memory barrier count pair indicated by the third memory barrier index in the first memory barrier count pair group.
For the present disclosure, there are also the following points to be explained:
(1) The drawings of the embodiments of the present disclosure show only the structures related to those embodiments; other structures may follow conventional designs.
(2) Without conflict, embodiments of the present disclosure and features of the embodiments may be combined with each other to arrive at new embodiments.
The above description is only for the specific embodiments of the present disclosure, but the scope of the present disclosure is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present disclosure, and all the changes or substitutions should be covered within the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (19)

1. An instruction execution method, applied to a thread group, wherein the thread group comprises a plurality of execution groups, each of the plurality of execution groups executes at least one first memory execution instruction and a first memory barrier instruction, the first memory barrier instruction acts as a barrier for the at least one first memory execution instruction within the thread group, and the plurality of execution groups comprises a first execution group,
the instruction execution method comprises the following steps:
supervising execution of all first memory execution instructions in the plurality of execution groups to obtain a first supervision result; and
executing a first memory barrier instruction of the first execution group,
wherein the first supervision result comprises a first thread group memory instruction count value corresponding to a first memory barrier instruction of the first execution group,
executing a first memory barrier instruction of the first execution group, comprising:
obtaining first memory type information in the first memory barrier instruction of the first execution group and a first thread group waiting value corresponding to the first memory type information according to the first memory barrier instruction of the first execution group;
acquiring the first thread group memory instruction count value;
in response to the first thread group memory instruction count value being greater than the first thread group wait value, instructing the first execution group not to continue executing instructions subsequent to the first memory barrier instruction of the first execution group; and
in response to the first thread group memory instruction count value being less than or equal to the first thread group wait value, instructing the first execution group to continue executing instructions subsequent to a first memory barrier instruction of the first execution group.
2. The instruction execution method of claim 1, wherein the first thread group memory instruction count value represents the maximum, in a current cycle, of the first numbers of outstanding instructions corresponding to the plurality of execution groups, and the first number of outstanding instructions corresponding to each execution group represents the number of first memory execution instructions that the execution group has issued but that have not yet completed in the current cycle.
3. The instruction execution method of claim 1 or 2, wherein obtaining the first thread group memory instruction count value comprises:
acquiring a first barrier identification code corresponding to the first execution group;
determining a barrier register set corresponding to the first execution group based on the first barrier identification code, wherein the barrier register set comprises a first barrier storage area, and the first barrier storage area is used for storing the first thread group memory instruction count value;
reading the first thread group memory instruction count value from the first barrier storage area.
4. The instruction execution method of claim 3, wherein the barrier register set further comprises a second barrier storage area for storing a first memory barrier count-pair group corresponding to the first memory type information, the first memory barrier count-pair group comprising at least one memory barrier count pair, each memory barrier count pair comprising an instruction issue number and an instruction completion number;
supervising execution of all first memory execution instructions in the plurality of execution groups to obtain a first supervision result comprises: executing a first memory execution instruction of the first execution group,
wherein the memory type information in the first memory execution instruction of the first execution group is the first memory type information,
executing a first memory execution instruction of the first execution group, comprising:
acquiring a first memory barrier index corresponding to a first memory execution instruction of the first execution group, and issuing the first memory execution instruction of the first execution group and the first memory barrier index;
after the first memory execution instruction and the first memory barrier index of the first execution group are issued, setting the first thread group memory instruction count value, and the instruction issue number and the instruction completion number in the memory barrier count pair indicated by the first memory barrier index in the first memory barrier count pair group.
5. The instruction execution method of claim 4, wherein the barrier register set further comprises a third barrier storage area to store an execution group total, the execution group total representing a number of the plurality of execution groups,
setting the first thread group memory instruction count value, the instruction issue number and the instruction completion number in the memory barrier count pair indicated by the first memory barrier index in the first memory barrier count pair group, including:
in response to the instruction issue number in the memory barrier count pair indicated by the first memory barrier index in the first memory barrier count pair group being a predetermined issue number, increasing the first thread group memory instruction count value by a first value to update the first thread group memory instruction count value, and increasing the instruction issue number in the memory barrier count pair indicated by the first memory barrier index by a second value to update the instruction issue number in the memory barrier count pair indicated by the first memory barrier index;
in response to completion of execution of a first memory execute instruction of the first execution group, increasing an instruction completion number in a memory barrier count pair indicated by the first memory barrier index by the second value to update the instruction completion number in the memory barrier count pair indicated by the first memory barrier index;
in response to the instruction issue number and the instruction completion number in the memory barrier count pair indicated by the first memory barrier index being equal to each other and to the execution group total, clearing the instruction issue number and the instruction completion number in the memory barrier count pair indicated by the first memory barrier index, and decreasing the first thread group memory instruction count value by the first value to update the first thread group memory instruction count value.
6. The instruction execution method of claim 5, wherein setting the first thread group memory instruction count value, an instruction issue number and an instruction complete number in a memory barrier count pair indicated by the first memory barrier index in the first memory barrier count pair group, further comprises:
in response to the instruction issue number in the memory barrier count pair indicated by the first memory barrier index not being the predetermined issue number, keeping the first thread group memory instruction count value unchanged, and increasing the instruction issue number in the memory barrier count pair indicated by the first memory barrier index by the second value to update the instruction issue number in the memory barrier count pair indicated by the first memory barrier index.
7. The instruction execution method of claim 5, wherein the first and second numerical values are the same and are both 1.
8. The instruction execution method of claim 4, wherein the first execution group comprises a first execution storage area configured to store a first execution group memory instruction count value, the first execution group memory instruction count value representing the number of first memory execution instructions of the first execution group that have been issued but not yet executed;
wherein obtaining the first memory barrier index corresponding to the first memory execution instruction of the first execution group comprises:
acquiring the first barrier identification code corresponding to the first execution group;
determining the barrier register group corresponding to the first execution group based on the first barrier identification code;
analyzing the first memory execution instruction of the first execution group to obtain the first memory type information in the first memory execution instruction of the first execution group;
reading the first execution group memory instruction count value from the first execution storage area;
reading the first thread group memory instruction count value from the first barrier storage area based on the first barrier identification code and the first memory type information;
in response to the first execution group memory instruction count value being less than or equal to the first thread group memory instruction count value and the first thread group memory instruction count value being less than a first count threshold, obtaining the first memory barrier index.
9. The instruction execution method of claim 8, wherein executing the first memory execution instruction of the first execution group further comprises: determining that the first memory execution instruction of the first execution group is not to be issued in response to both the first thread group memory instruction count value and the first execution group memory instruction count value being equal to the first count threshold.
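The issue-gating conditions of claims 8 and 9 reduce to two comparisons between the execution group count, the thread group count and the first count threshold. The sketch below restates them as Python predicates; how the index value itself is chosen is not specified in these claims, so it is deliberately left out, and the function names are hypothetical.

```python
def may_obtain_index(eg_count: int, tg_count: int, count_threshold: int) -> bool:
    """Claim 8: a first memory barrier index may be obtained when the group's
    backlog does not exceed the thread group count and free pairs remain."""
    return eg_count <= tg_count and tg_count < count_threshold


def must_not_issue(eg_count: int, tg_count: int, count_threshold: int) -> bool:
    """Claim 9: both counts have reached the threshold, so the instruction is held."""
    return tg_count == count_threshold and eg_count == count_threshold


# Hypothetical threshold of 4 count pairs per memory type.
THRESHOLD = 4
print(may_obtain_index(0, 0, THRESHOLD))  # empty thread group: index can be obtained
print(must_not_issue(4, 4, THRESHOLD))    # every pair in use by this group: hold
```

Cases outside these two conditions (for example a group whose count is below a saturated thread group count) are not decided by the claims reproduced here, so the sketch does not decide them either.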
10. The instruction execution method of claim 4, wherein executing the first memory execution instruction of the first execution group further comprises:
determining whether a first memory execution instruction of the first execution group is an instruction that needs to participate in a memory barrier;
in response to the first memory execution instruction of the first execution group being an instruction that needs to participate in a memory barrier, performing the operation of setting the first thread group memory instruction count value and the instruction issue number and instruction completion number in the memory barrier count pair indicated by the first memory barrier index;
in response to the first memory execution instruction of the first execution group not being an instruction that needs to participate in a memory barrier, not performing the operation of setting the first thread group memory instruction count value and the instruction issue number and instruction completion number in the memory barrier count pair indicated by the first memory barrier index.
11. The instruction execution method of claim 10, further comprising: setting a barrier indication code for each first memory execution instruction in the plurality of execution groups, wherein a barrier indication code having a first indication value indicates that the first memory execution instruction needs to participate in a memory barrier, and a barrier indication code having a second indication value indicates that the first memory execution instruction does not need to participate in the memory barrier;
wherein determining whether the first memory execution instruction of the first execution group is an instruction that needs to participate in a memory barrier comprises:
analyzing the first memory execution instruction of the first execution group to obtain a barrier indication code of the first memory execution instruction of the first execution group;
determining whether the first memory execution instruction of the first execution group is an instruction that needs to participate in a memory barrier based on the value of the barrier indication code of the first memory execution instruction of the first execution group.
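Claims 10 and 11 make the counter update conditional on a per-instruction barrier indication code. The sketch below is a hedged Python illustration of that filter; the numeric encodings of the first and second indication values, the `MemoryInstruction` fields and the `update_counters` callback are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable

FIRST_INDICATION = 1   # assumed encoding: the instruction participates in the barrier
SECOND_INDICATION = 0  # assumed encoding: the instruction is ignored by the barrier


@dataclass
class MemoryInstruction:
    opcode: str          # hypothetical field, for readability only
    barrier_code: int    # barrier indication code carried by the instruction
    barrier_index: int   # memory barrier index issued with the instruction


def needs_barrier(instr: MemoryInstruction) -> bool:
    """Claim 11: participation is decided from the barrier indication code alone."""
    return instr.barrier_code == FIRST_INDICATION


def issue(instr: MemoryInstruction, update_counters: Callable[[int], None]) -> bool:
    """Claim 10: run the count-pair update only for participating instructions."""
    if needs_barrier(instr):
        update_counters(instr.barrier_index)
        return True
    return False


touched = []
issue(MemoryInstruction("load", FIRST_INDICATION, 0), touched.append)
issue(MemoryInstruction("store", SECOND_INDICATION, 1), touched.append)
# touched == [0]: only the participating load reached the counters
```

Passing the update as a callback keeps the filter separate from the count-pair logic of claim 5, which is one possible way to structure it rather than the arrangement the patent prescribes.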
12. The instruction execution method of claim 4, wherein each of the plurality of execution groups further executes at least one second memory execution instruction, and the first memory barrier instruction is further used to block the at least one second memory execution instruction within the thread group,
the instruction execution method further comprises: supervising execution of all second memory execution instructions in the plurality of execution groups to obtain a second supervision result,
wherein the second supervision result comprises a second thread group memory instruction count value corresponding to a first memory barrier instruction of the first execution group,
executing a first memory barrier instruction of the first execution group, further comprising:
obtaining second memory type information and a second thread group waiting value corresponding to the second memory type information according to a first memory barrier instruction of the first execution group;
acquiring the second thread group memory instruction count value;
in response to the second thread group memory instruction count value being greater than the second thread group wait value, instructing the first execution group not to continue executing instructions subsequent to the first memory barrier instruction of the first execution group;
in response to the second thread group memory instruction count value being less than or equal to the second thread group wait value, instructing the first execution group to continue executing instructions subsequent to the first memory barrier instruction of the first execution group.
13. The instruction execution method of claim 12, wherein the second thread group memory instruction count value indicates, in a current cycle, the maximum of the second numbers of outstanding instructions respectively corresponding to the plurality of execution groups, and the second number of outstanding instructions corresponding to each execution group indicates the number of second memory execution instructions of that execution group that have been issued but not yet executed in the current cycle.
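The barrier decision in claims 12 and 13 compares a per-type thread group count, defined as the largest per-group backlog, against the wait value carried by the barrier instruction. The Python sketch below illustrates that comparison under assumptions: the memory type labels `first_type` and `second_type` and the backlog numbers are hypothetical.

```python
from typing import Dict, Sequence


def thread_group_count(outstanding_per_group: Sequence[int]) -> int:
    """Claim 13: the largest issued-but-unfinished count among execution groups."""
    return max(outstanding_per_group, default=0)


def may_continue(tg_count: int, wait_value: int) -> bool:
    """Claim 12: pass the barrier only when tg_count <= wait value."""
    return tg_count <= wait_value


# Hypothetical per-group backlogs for two memory types in the current cycle.
counts: Dict[str, int] = {
    "first_type": thread_group_count([2, 0, 1]),   # 2
    "second_type": thread_group_count([0, 0, 0]),  # 0
}

# A barrier waiting on the second type with wait value 0 lets the group go on,
# while one waiting on the first type with wait value 0 holds it.
print(may_continue(counts["second_type"], 0))
print(may_continue(counts["first_type"], 0))
```

A non-zero wait value lets an execution group proceed while a bounded number of memory instructions of that type are still in flight, which is why the comparison is `<=` rather than a test for zero.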
14. The instruction execution method of claim 12, wherein the first memory type information and the second memory type information are different.
15. The instruction execution method of claim 12, wherein the barrier register set further comprises a fourth barrier storage area and a fifth barrier storage area, the fourth barrier storage area is configured to store the second thread group memory instruction count value, the fifth barrier storage area is configured to store a second memory barrier count pair group corresponding to the second memory type information, the second memory barrier count pair group comprises at least one memory barrier count pair, and each memory barrier count pair comprises an instruction issue number and an instruction completion number;
supervising execution of all second memory execution instructions in the plurality of execution groups to obtain a second supervision result comprises: executing a second memory execution instruction of the first execution group,
wherein the memory type information in the second memory execution instruction of the first execution group is the second memory type information,
executing a second memory execution instruction of the first execution group, comprising:
acquiring a second memory barrier index corresponding to the second memory execution instruction of the first execution group, and issuing the second memory execution instruction of the first execution group together with the second memory barrier index;
after the second memory execution instruction of the first execution group and the second memory barrier index are issued, setting the second thread group memory instruction count value and the instruction issue number and instruction completion number in the memory barrier count pair indicated by the second memory barrier index in the second memory barrier count pair group.
16. The instruction execution method of claim 15, wherein the barrier register set further comprises a third barrier storage area to store an execution group total, the execution group total representing a number of the plurality of execution groups,
setting the second thread group memory instruction count value, the instruction issue number and the instruction completion number in the memory barrier count pair indicated by the second memory barrier index in the second memory barrier count pair group, including:
in response to the instruction issue number in the memory barrier count pair indicated by the second memory barrier index in the second memory barrier count pair group being a predetermined issue number, increasing the second thread group memory instruction count value by a first value to update the second thread group memory instruction count value, and increasing the instruction issue number in the memory barrier count pair indicated by the second memory barrier index by a second value to update the instruction issue number in the memory barrier count pair indicated by the second memory barrier index;
in response to completion of execution of the second memory execution instruction of the first execution group, increasing the instruction completion number in the memory barrier count pair indicated by the second memory barrier index by the second value to update the instruction completion number in the memory barrier count pair indicated by the second memory barrier index;
and in response to the instruction issue number and the instruction completion number in the memory barrier count pair indicated by the second memory barrier index being equal to each other and equal to the total number of execution groups, clearing the instruction issue number and the instruction completion number in the memory barrier count pair indicated by the second memory barrier index, and decreasing the second thread group memory instruction count value by the second value to update the second thread group memory instruction count value.
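Claims 15 and 16 give the second memory type its own thread group count (fourth barrier storage area) and its own count pair group (fifth barrier storage area), while the execution group total (third barrier storage area) is shared. The Python sketch below is a hypothetical model of that layout; it assumes the predetermined issue number is 0 and that the first and second values are both 1, so the increment and decrement sizes coincide.

```python
from typing import Dict, Iterable, List


class BarrierRegisterSet:
    """Hypothetical model: separate counters per memory type, shared group total."""

    def __init__(self, total_groups: int, num_pairs: int, memory_types: Iterable[str]):
        types = list(memory_types)
        self.total_groups = total_groups  # third barrier storage area, shared
        # One thread group count per type (first / fourth storage areas).
        self.tg_count: Dict[str, int] = {t: 0 for t in types}
        # One count pair group per type (second / fifth storage areas);
        # each pair is [issue number, completion number].
        self.pairs: Dict[str, List[List[int]]] = {
            t: [[0, 0] for _ in range(num_pairs)] for t in types
        }

    def on_issue(self, mem_type: str, index: int) -> None:
        pair = self.pairs[mem_type][index]
        if pair[0] == 0:  # assumed predetermined issue number
            self.tg_count[mem_type] += 1
        pair[0] += 1

    def on_complete(self, mem_type: str, index: int) -> None:
        pair = self.pairs[mem_type][index]
        pair[1] += 1
        if pair[0] == pair[1] == self.total_groups:
            # All execution groups issued and completed: release the pair.
            pair[0] = pair[1] = 0
            self.tg_count[mem_type] -= 1


regs = BarrierRegisterSet(total_groups=2, num_pairs=4, memory_types=("first", "second"))
regs.on_issue("second", 0)
# regs.tg_count == {"first": 0, "second": 1}: the first type is unaffected
```

Keeping one count per memory type lets a barrier that waits on one type (claim 14 requires the two types to differ) ignore traffic of the other type.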
17. The instruction execution method of claim 4, wherein the plurality of execution groups further comprises a second execution group,
supervising execution of all first memory execution instructions in the plurality of execution groups to obtain a first supervision result further comprises: executing the first memory execution instruction of the second execution group,
wherein the memory type information in the first memory execution instruction of the second execution group is the first memory type information,
executing the first memory execution instructions of the second execution group, comprising:
acquiring a third memory barrier index corresponding to the first memory execution instruction of the second execution group, and issuing the first memory execution instruction of the second execution group together with the third memory barrier index;
after the first memory execution instruction of the second execution group and the third memory barrier index are issued, setting the first thread group memory instruction count value and the instruction issue number and instruction completion number in the memory barrier count pair indicated by the third memory barrier index in the first memory barrier count pair group.
18. An instruction execution device applied to a thread group, wherein the thread group comprises a plurality of execution groups, the plurality of execution groups comprises a first execution group, each of the plurality of execution groups executes at least one first memory execution instruction and a first memory barrier instruction, the first memory barrier instruction is used for blocking the at least one first memory execution instruction inside the thread group,
the instruction execution device comprises an instruction supervision unit and a barrier instruction execution unit,
the instruction supervision unit is configured to supervise execution of all first memory execution instructions in the plurality of execution groups to obtain a first supervision result, wherein the first supervision result comprises a first thread group memory instruction count value corresponding to the first memory barrier instruction of the first execution group,
the barrier instruction execution unit is configured to execute a first memory barrier instruction of the first execution group,
the barrier instruction execution unit comprises a fetch module, a decoding module, an obtaining module and an execution module,
the fetch module is configured to fetch a first memory barrier instruction of the first execution group;
the decoding module is configured to analyze the first memory barrier instruction of the first execution group to obtain first memory type information in the first memory barrier instruction of the first execution group and a first thread group waiting value corresponding to the first memory type information;
the obtaining module is configured to obtain the first thread group memory instruction count value;
the execution module is configured to: in response to the first thread group memory instruction count value being greater than the first thread group wait value, instruct the first execution group not to continue executing instructions subsequent to the first memory barrier instruction of the first execution group; and in response to the first thread group memory instruction count value being less than or equal to the first thread group wait value, instruct the first execution group to continue executing instructions subsequent to the first memory barrier instruction of the first execution group.
19. The instruction execution device of claim 18, wherein the first thread group memory instruction count value indicates, in a current cycle, the maximum of the first numbers of outstanding instructions of the plurality of execution groups, and the first number of outstanding instructions of each execution group indicates the number of first memory execution instructions of that execution group that have been issued but not yet executed in the current cycle.
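The four modules of the barrier instruction execution unit in claim 18 form a small fetch, decode, obtain, execute pipeline. The Python sketch below is a hypothetical software model of that pipeline, not the device itself; the `BarrierInstruction` encoding, the program list indexed by `pc`, and the dictionary standing in for the instruction supervision unit's output are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class BarrierInstruction:
    mem_type: str      # first memory type information (hypothetical encoding)
    wait_value: int    # first thread group wait value


class BarrierExecutionUnit:
    """Hypothetical model of the fetch, decoding, obtaining and execution modules."""

    def __init__(self, program: List[BarrierInstruction], tg_counts: Dict[str, int]):
        self.program = program
        # Stand-in for the counts produced by the instruction supervision unit.
        self.tg_counts = tg_counts

    def fetch(self, pc: int) -> BarrierInstruction:
        return self.program[pc]

    def decode(self, instr: BarrierInstruction) -> Tuple[str, int]:
        return instr.mem_type, instr.wait_value

    def obtain(self, mem_type: str) -> int:
        return self.tg_counts[mem_type]

    def execute(self, pc: int) -> bool:
        """Return True if the execution group may continue past the barrier."""
        mem_type, wait_value = self.decode(self.fetch(pc))
        return self.obtain(mem_type) <= wait_value


unit = BarrierExecutionUnit([BarrierInstruction("first", 0)], {"first": 1})
print(unit.execute(0))      # one instruction still outstanding: the group waits
unit.tg_counts["first"] = 0
print(unit.execute(0))      # backlog drained: the group continues
```

Re-running `execute` after the supervision unit lowers the count mirrors how a stalled execution group would re-check the barrier; how often the hardware re-evaluates the condition is not stated in the claims.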
CN202111027116.0A 2021-09-02 2021-09-02 Instruction execution method and instruction execution device Active CN113721987B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111027116.0A CN113721987B (en) 2021-09-02 2021-09-02 Instruction execution method and instruction execution device

Publications (2)

Publication Number Publication Date
CN113721987A true CN113721987A (en) 2021-11-30
CN113721987B CN113721987B (en) 2022-07-05

Family

ID=78681184

Country Status (1)

Country Link
CN (1) CN113721987B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104050033A (en) * 2013-03-15 2014-09-17 辉达公司 System and method for hardware scheduling of indexed barriers
US20140282564A1 (en) * 2013-03-15 2014-09-18 Eli Almog Thread-suspending execution barrier
CN106095552A (en) * 2016-06-07 2016-11-09 华中科技大学 A kind of Multi-Task Graph processing method based on I/O duplicate removal and system
CN113298691A (en) * 2020-02-24 2021-08-24 英特尔公司 Barrier synchronization mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Chi Lihua et al.: "FitenBLAS: A High-Performance Linear Algebra Library for the FT1000 Microprocessor", Journal of Hunan University (Natural Sciences) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114896079A (en) * 2022-05-26 2022-08-12 上海壁仞智能科技有限公司 Instruction execution method, processor and electronic device
CN114896079B (en) * 2022-05-26 2023-11-24 上海壁仞智能科技有限公司 Instruction execution method, processor and electronic device
CN115016953A (en) * 2022-06-02 2022-09-06 上海壁仞智能科技有限公司 Machine readable medium storing program, computer system, and method of operation
CN115016953B (en) * 2022-06-02 2024-03-22 上海壁仞智能科技有限公司 Machine-readable medium, computer system, and method of operation having stored thereon a program

Similar Documents

Publication Publication Date Title
CN109522254B (en) Arithmetic device and method
US9477465B2 (en) Arithmetic processing apparatus, control method of arithmetic processing apparatus, and a computer-readable storage medium storing a control program for controlling an arithmetic processing apparatus
EP0243892B1 (en) System for guaranteeing the logical integrity of data
CN113721987B (en) Instruction execution method and instruction execution device
US10346507B2 (en) Symmetric block sparse matrix-vector multiplication
US10007527B2 (en) Uniform load processing for parallel thread sub-sets
EP3832499A1 (en) Matrix computing device
US9268595B2 (en) Scheduling thread execution based on thread affinity
CN104050033A (en) System and method for hardware scheduling of indexed barriers
US10268519B2 (en) Scheduling method and processing device for thread groups execution in a computing system
CN104050032A (en) System and method for hardware scheduling of conditional barriers and impatient barriers
US9317456B2 (en) Method and system for performing event-matching with a graphical processing unit
US20140143524A1 (en) Information processing apparatus, information processing apparatus control method, and a computer-readable storage medium storing a control program for controlling an information processing apparatus
US20220121444A1 (en) Apparatus and method for configuring cooperative warps in vector computing system
JP6493088B2 (en) Arithmetic processing device and control method of arithmetic processing device
CN114153500A (en) Instruction scheduling method, instruction scheduling device, processor and storage medium
TW201337829A (en) Shaped register file reads
US10152328B2 (en) Systems and methods for voting among parallel threads
US20220220644A1 (en) Warp scheduling method and stream multiprocessor using the same
CN108021563B (en) Method and device for detecting data dependence between instructions
CN116069480B (en) Processor and computing device
CN112463218B (en) Instruction emission control method and circuit, data processing method and circuit
Schmalstieg et al. Augmented reality–principles and practice tutorial
US20130166887A1 (en) Data processing apparatus and data processing method
TWI428833B (en) Multi-thread processors and methods for instruction execution and synchronization therein and computer program products thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant