CN116069394A - Instruction fetching method, instruction fetching device and storage medium

Publication number
CN116069394A
Authority
CN
China
Prior art keywords
instructions
instruction
sub
region
branch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211500583.5A
Other languages
Chinese (zh)
Inventor
喻琛
左航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Haiguang Information Technology Co Ltd
Original Assignee
Haiguang Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Haiguang Information Technology Co Ltd
Priority to CN202211500583.5A
Publication of CN116069394A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3802 Instruction prefetching
    • G06F9/3804 Instruction prefetching for branches, e.g. hedging, branch folding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3802 Instruction prefetching
    • G06F9/3814 Implementation provisions of instruction buffers, e.g. prefetch buffer; banks

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

The present disclosure provides an instruction fetching method, an instruction fetching device and a computer-readable storage medium. The instruction fetching method is used for fetching instructions for a thread bundle, the thread bundle corresponds to an instruction temporary storage area, the instruction temporary storage area comprises a first sub-region, and the instruction fetching method comprises the following steps: acquiring a first set of instructions; pre-parsing the first set of instructions to determine a first branch instruction, wherein the first set of instructions includes the first branch instruction; in response to the first sub-region being empty or storing invalid instructions, fetching from a target address of the first branch instruction to acquire a second set of instructions; and storing the second set of instructions to the first sub-region.

Description

Instruction fetching method, instruction fetching device and storage medium
Technical Field
Embodiments of the present disclosure relate to an instruction fetching method, an instruction fetching device, and a computer-readable storage medium.
Background
Branch instructions are a common class of instructions in both CPU (central processing unit) and GPU (graphics processing unit) instruction sets. They are instructions that alter program flow and may cause the PC (program counter) of a GPGPU (general-purpose graphics processing unit) to jump. Depending on whether a jump condition exists, branch instructions may be divided into immediate jump branch instructions and conditional jump branch instructions; depending on how the jump address is obtained, they may be divided into direct jump branch instructions and indirect jump branch instructions. An immediate jump branch instruction indicates that, when the branch instruction is executed, the PC pointer points to the target address specified by the branch instruction rather than to the next address it would otherwise point to. A conditional jump branch instruction indicates that, when the branch instruction is executed, the PC pointer points to the target address specified by the branch instruction if the jump condition specified by the branch instruction is satisfied; if the jump condition is not satisfied, the PC pointer does not jump, i.e., it points to the next address it would otherwise point to. A direct jump branch instruction indicates that the target address of the branch instruction is obtained by adding an immediate address offset in the instruction to the current PC. An indirect jump branch instruction indicates that the target address of the branch instruction is provided by a register, so the register needs to be read to obtain the target address.
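As a minimal illustration of the four categories above, the following Python sketch computes the next PC for a branch. The `Branch` record, its field names, the register file modelled as a dictionary, and the fall-through `step` of one address are assumptions made for illustration; they are not taken from any particular instruction set.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Branch:
    conditional: bool              # True: conditional jump; False: immediate jump
    offset: Optional[int] = None   # direct jump: immediate address offset
    reg: Optional[str] = None      # indirect jump: register holding the target


def target_address(pc: int, br: Branch, regs: Dict[str, int]) -> int:
    """Direct: current PC plus the immediate offset. Indirect: read a register."""
    if br.reg is not None:
        return regs[br.reg]
    return pc + br.offset


def next_pc(pc: int, br: Branch, regs: Dict[str, int],
            condition_met: bool, step: int = 1) -> int:
    """Return the PC after executing the branch located at `pc`."""
    taken = (not br.conditional) or condition_met
    return target_address(pc, br, regs) if taken else pc + step
```

For instance, an immediate direct jump at address 100 with offset 20 yields 120 regardless of `condition_met`, whereas a conditional jump whose condition is not met falls through to 101.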
Disclosure of Invention
At least one embodiment of the present disclosure provides an instruction fetching method for fetching instructions for a thread bundle, where the thread bundle corresponds to an instruction temporary storage area, the instruction temporary storage area includes a first sub-region, and the instruction fetching method includes: acquiring a first set of instructions; pre-parsing the first set of instructions to determine a first branch instruction, wherein the first set of instructions includes the first branch instruction; in response to the first sub-region being empty or storing invalid instructions, fetching from a target address of the first branch instruction to acquire a second set of instructions; and storing the second set of instructions to the first sub-region.
For example, in the instruction fetching method provided in at least one embodiment of the present disclosure, the instruction temporary storage area further includes a second sub-region and a third sub-region, the first set of instructions is stored in the second sub-region or the third sub-region, and the instruction fetching method further includes: determining an instruction execution result obtained by executing the first branch instruction; and in response to the instruction execution result indicating that a jump is required, jumping to execute the second set of instructions stored in the first sub-region.
For example, the instruction fetching method provided in at least one embodiment of the present disclosure further includes: after jumping to execute the second set of instructions stored in the first sub-region, fetching a third set of instructions, wherein addresses of instructions in the second set of instructions and addresses of instructions in the third set of instructions are consecutive; and storing the third set of instructions to the second sub-region or the third sub-region.
For example, the instruction fetching method provided in at least one embodiment of the present disclosure further includes: after storing the third set of instructions to the second sub-region or the third sub-region, in response to the second set of instructions not including a branch instruction and the first sub-region storing at least one valid unexecuted instruction, suspending the instruction fetch operation.
For example, the instruction fetching method provided in at least one embodiment of the present disclosure further includes: after jumping to execute the second set of instructions stored in the first sub-region, pre-parsing the second set of instructions to determine a second branch instruction, wherein the second set of instructions includes the second branch instruction; fetching from a target address of the second branch instruction to obtain a fifth set of instructions; and storing the fifth set of instructions to the second sub-region or the third sub-region.
For example, the instruction fetching method provided in at least one embodiment of the present disclosure further includes: after jumping to execute the second set of instructions stored in the first sub-region, setting all unexecuted instructions in the second and third sub-regions as invalid instructions.
For example, the instruction fetching method provided in at least one embodiment of the present disclosure further includes: in response to the instruction execution result indicating that no jump is required, executing the instructions after the first branch instruction, and setting all unexecuted instructions in the first sub-region as invalid instructions.
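The two outcomes of executing the first branch instruction can be sketched as follows. This is a simplified model under stated assumptions: each sub-region is represented as a list of per-slot state strings, the region names `"first"`, `"second"` and `"third"` and the function `resolve_branch` are hypothetical, and the first set of instructions is assumed by default to reside in the second sub-region.

```python
from typing import Dict, List

VALID, INVALID = "valid", "invalid"


def resolve_branch(regions: Dict[str, List[str]], taken: bool,
                   current: str = "second") -> str:
    """Update slot states once the first branch instruction has executed and
    return the name of the sub-region in which execution continues."""
    if taken:
        # Jump to the prefetched target instructions in the first sub-region;
        # unexecuted instructions on the fall-through path become invalid.
        for name in ("second", "third"):
            regions[name] = [INVALID] * len(regions[name])
        return "first"
    # No jump: the prefetched target instructions are no longer needed.
    regions["first"] = [INVALID] * len(regions["first"])
    return current
```

In either case the invalidated sub-regions become available again for later fetch operations, since a sub-region that is empty or stores invalid instructions may be refilled.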
For example, in the instruction fetching method provided in at least one embodiment of the present disclosure, the instruction temporary storage area further includes a second sub-region, and acquiring the first set of instructions includes: in response to the second sub-region being empty or storing invalid instructions, acquiring the first set of instructions; the instruction fetching method further includes: storing the first set of instructions to the second sub-region.
For example, in the instruction fetching method provided in at least one embodiment of the present disclosure, the instruction temporary storage area further includes a third sub-region, and the instruction fetching method further includes: in response to the third sub-region being empty or storing invalid instructions, acquiring a fourth set of instructions, wherein addresses of instructions in the fourth set of instructions and addresses of instructions in the first set of instructions are consecutive; and storing the fourth set of instructions to the third sub-region.
For example, in an instruction fetching method provided by at least one embodiment of the present disclosure, the number of instructions in the first set of instructions is the same as the number of instructions in the fourth set of instructions.
For example, in the instruction fetching method provided by at least one embodiment of the present disclosure, pre-parsing the first set of instructions to determine a first branch instruction includes: pre-parsing all instructions in the first set of instructions; and, upon determining that the first set of instructions includes at least one branch instruction, taking the branch instruction that is executed first in execution order among the at least one branch instruction as the first branch instruction.
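The selection rule above can be sketched as below. The `(opcode, operand)` tuple encoding and the opcode names `"br"` and `"cbr"` are assumptions for illustration only; the sketch also assumes the fetched group is held in execution order, so the lowest index corresponds to the branch executed first.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding: each instruction is (opcode, operand).
Instruction = Tuple[str, int]
BRANCH_OPCODES = {"br", "cbr"}  # assumed opcode names, for illustration only


def first_branch(group: List[Instruction]) -> Optional[int]:
    """Pre-parse every instruction in the group and return the index of the
    branch that would execute first, or None if the group has no branch."""
    branch_indices = [i for i, (op, _) in enumerate(group)
                      if op in BRANCH_OPCODES]
    # The group is assumed to be in execution order: lowest index runs first.
    return branch_indices[0] if branch_indices else None
```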
For example, in the instruction fetching method provided in at least one embodiment of the present disclosure, the instruction temporary storage area further includes a fourth sub-region, and the instruction fetching method further includes: pre-parsing the first set of instructions to determine a third branch instruction, wherein the first set of instructions further includes the third branch instruction; in response to the fourth sub-region being empty or storing invalid instructions, fetching from a target address of the third branch instruction to acquire a sixth set of instructions; and storing the sixth set of instructions to the fourth sub-region.
At least one embodiment of the present disclosure provides an instruction fetching device configured to fetch instructions for a thread bundle, where the thread bundle corresponds to an instruction temporary storage area, the instruction temporary storage area includes a first sub-region, and the instruction fetching device includes: an instruction fetch unit configured to fetch a first set of instructions; and a branch instruction determination unit configured to pre-parse the first set of instructions to determine a first branch instruction, wherein the first set of instructions includes the first branch instruction; the instruction fetch unit is further configured to, in response to the first sub-region being empty or storing invalid instructions, fetch from a target address of the first branch instruction to obtain a second set of instructions, and to store the second set of instructions to the first sub-region.
At least one embodiment of the present disclosure provides an instruction fetching device, comprising a memory and a processor, wherein the memory stores computer-executable instructions adapted to be executed by the processor, and the computer-executable instructions, when executed by the processor, perform one or more steps of the instruction fetching method according to any embodiment of the present disclosure.
At least one embodiment of the present disclosure provides a computer-readable storage medium having non-transitory computer-executable instructions stored thereon, wherein the computer-executable instructions, when executed by a computer, perform one or more steps of the instruction fetching method according to any embodiment of the present disclosure.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings of the embodiments will be briefly described below, and it is apparent that the drawings in the following description relate only to some embodiments of the present disclosure, not to limit the present disclosure.
FIG. 1 is a schematic diagram of an instruction fetching process;
FIG. 2 is a schematic flow chart of an instruction fetching method provided by at least one embodiment of the present disclosure;
FIG. 3 is a schematic diagram of an instruction temporary storage area provided in accordance with at least one embodiment of the present disclosure;
FIG. 4 is a schematic flow chart of storing instructions in an instruction temporary storage area, provided in at least one embodiment of the present disclosure;
FIG. 5 is a schematic diagram illustrating a jump of an instruction read pointer in an instruction temporary storage area when executing an instruction according to at least one embodiment of the present disclosure;
FIG. 6 is a schematic flow chart of storing instructions in an instruction temporary storage area, provided in accordance with at least one embodiment of the present disclosure;
FIG. 7 is a schematic diagram of an instruction fetching device according to at least one embodiment of the present disclosure;
FIG. 8 is a schematic block diagram of an instruction fetching device provided in at least one embodiment of the present disclosure; and
FIG. 9 is a schematic diagram of a computer-readable storage medium provided in at least one embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present disclosure. It will be apparent that the described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments, which can be made by one of ordinary skill in the art without the need for inventive faculty, are within the scope of the present disclosure, based on the described embodiments of the present disclosure.
Unless defined otherwise, technical or scientific terms used in this disclosure should be given the ordinary meaning as understood by one of ordinary skill in the art to which this disclosure belongs. The terms "first," "second," and the like, as used in this disclosure, do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item preceding the word covers the elements or items listed after the word and their equivalents, but does not exclude other elements or items. The terms "connected" or "coupled," and the like, are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Upper", "lower", "left", "right", etc. are used merely to indicate relative positional relationships, which may also change when the absolute position of the described object changes.
In the description of the present disclosure, an instruction execution result of a branch instruction indicating that a jump is required is called "taken", and an instruction execution result of a branch instruction indicating that no jump is required is called "not-taken".
Instruction fetching in a GPGPU is performed per thread bundle (warp); before the GPGPU executes a piece of code, the corresponding instructions are stored in advance in an allocated instruction segment. A single thread bundle needs to fetch instructions one by one and then decode and execute them. However, given the bandwidth of the GPGPU cache system design, a fetch for a single thread bundle can typically retrieve multiple instructions at consecutive addresses at once and place them locally to satisfy subsequent execution. Each thread bundle has a buffer space for temporarily storing fetched instructions, i.e., an instruction temporary storage area. Typically, the size of the instruction temporary storage area can cover the number of instructions retrieved by multiple fetch operations. Therefore, when the instruction temporary storage area is not full, the instruction fetch unit corresponding to the thread bundle can continue to fetch multiple instructions for the thread bundle in advance, so as to satisfy the execution bandwidth of the subsequent execution unit as far as possible.
Fig. 1 shows a schematic diagram of an instruction fetching process. In each fetch operation, the instruction fetch unit corresponding to the thread bundle can fetch multiple instructions at one time. The data fetched at one time is assumed to be 8 dwords (double words), where each word is 2 bytes long; assuming that the width of each instruction is 1 dword, 8 instructions can be fetched at a time. As shown in fig. 1, the instruction temporary storage area is a storage array capable of accommodating 16 dwords; the storage array includes 4 rows L00-L03, and each row can accommodate 4 dwords (i.e., each rectangular block in fig. 1 can accommodate 1 dword). In the first fetch operation, the 8 instructions fetched by the instruction fetch unit may be stored in rows L00 and L01; next, in a second fetch operation, the 8 instructions fetched by the instruction fetch unit are stored in rows L02 and L03. While the fetched instructions are being stored in the instruction temporary storage area, the instruction pre-parsing unit pre-parses the temporarily stored instructions to determine whether they include a branch instruction. If a branch instruction is included, instruction prefetching is no longer performed; at this point, the instruction pre-parsing unit may send a suspend-prefetch signal to the instruction fetch unit to instruct the instruction fetch unit to suspend instruction prefetching. After instruction prefetching is suspended, all branch instructions stored in the instruction temporary storage area are executed. If the execution results of all branch instructions stored in the instruction temporary storage area are "not-taken", the instruction prefetching operation for the thread bundle is resumed; at this point, the instruction pre-parsing unit may send a resume-prefetch signal to the instruction fetch unit to instruct the instruction fetch unit to resume instruction prefetching.
As shown in fig. 1, the instruction execution unit may fetch a branch instruction from the instruction temporary storage area and execute it. If the execution result of a branch instruction stored in the instruction temporary storage area is "taken", the instruction execution unit may set all remaining unexecuted instructions in the instruction temporary storage area to invalid, and at the same time send an instruction prefetch request to the instruction fetch unit to notify the instruction fetch unit to fetch from the target address of the branch instruction.
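The prefetch behaviour of this baseline process can be modelled with a short Python sketch. It uses the sizes given in the description of fig. 1 (8 dwords per fetch, a buffer of 4 rows of 4 dwords); the function `baseline_prefetch`, the list-of-opcodes program representation, and the opcode name `"br"` are assumptions for illustration.

```python
from typing import List

FETCH_WIDTH = 8    # dwords (instructions) returned by one fetch operation
BUFFER_ROWS = 4    # rows L00-L03
ROW_WIDTH = 4      # dwords per row


def baseline_prefetch(program: List[str], start: int = 0) -> List[int]:
    """Return the start address of each fetch issued before prefetching stops.

    Prefetching stops when the 16-dword buffer is full or when pre-parsing
    finds a branch (assumed opcode "br") in the instructions just fetched.
    """
    capacity = BUFFER_ROWS * ROW_WIDTH
    buffered, fetches, pc = 0, [], start
    while buffered + FETCH_WIDTH <= capacity and pc < len(program):
        group = program[pc:pc + FETCH_WIDTH]
        fetches.append(pc)
        buffered += len(group)
        pc += FETCH_WIDTH
        if "br" in group:   # pre-parse result: suspend prefetching
            break
    return fetches
```

With a branch-free program the model issues two fetches and fills the buffer; with a branch in the first group it stops after a single fetch, leaving half of the buffer unused until the branch is resolved.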
The above instruction fetching process has the following problems. First, when a branch instruction is present among the fetched instructions, the instruction fetch unit suspends instruction prefetching for the thread bundle; if the execution results of these branch instructions all turn out to be "not-taken", meaning that the PC does not need to jump, the time during which prefetching was suspended is effectively wasted, which affects the IPC of the whole GPGPU (IPC, instructions per clock, measures the instruction execution speed of a processor and represents how many instructions the processor can execute on average in a single machine cycle). Second, when a branch instruction is present among the fetched instructions, the instruction fetch unit suspends instruction prefetching for the thread bundle; if one of these branch instructions has the execution result "taken" but is not currently in its execution cycle, the thread bundle only discovers that the PC needs to jump once that branch instruction is executed. It then invalidates the unexecuted instructions in the instruction temporary storage area and notifies the instruction fetch unit to fetch from the target address of the branch instruction. The thread bundle is idle until the instructions at the target address of the branch instruction have been fetched, which also affects the IPC of the whole GPGPU.
To address the deficiencies of the above approach, at least one embodiment of the present disclosure provides an instruction fetching method for fetching instructions for a thread bundle. The thread bundle corresponds to an instruction temporary storage area, the instruction temporary storage area comprises a first sub-region, and the instruction fetching method comprises the following steps: acquiring a first set of instructions; pre-parsing the first set of instructions to determine a first branch instruction, wherein the first set of instructions includes the first branch instruction; in response to the first sub-region being empty or storing invalid instructions, fetching from a target address of the first branch instruction to acquire a second set of instructions; and storing the second set of instructions to the first sub-region.
In the instruction fetching method provided by the embodiments of the present disclosure, when instructions are pre-parsed and a branch instruction is found, a group of instructions is fetched in advance from the target address of the not-yet-executed branch instruction and stored in the instruction temporary storage area. When the execution result of the branch instruction is "taken" and the PC jumps, the instructions at the target address of the branch instruction can be read from the instruction temporary storage area and executed immediately, without stalling the thread bundle. This reduces, to a certain extent, the time overhead of PC jumps caused by branch instructions, improves the IPC of the GPGPU, reduces the idle time of the thread bundle, and improves instruction execution efficiency.
At least one embodiment of the present disclosure also provides an instruction fetching device and a computer-readable storage medium.
The instruction fetching method provided by the embodiments of the present disclosure can be applied to the instruction fetching device provided by the embodiments of the present disclosure, and the instruction fetching device can be configured on an electronic device. The electronic device may be a personal computer, a mobile terminal, etc., and the mobile terminal may be a hardware device such as a smart phone or a tablet computer.
In the present disclosure, "branch instruction fetch operation" means an operation in which an instruction fetch unit performs instruction fetch from a target address of a branch instruction, and "normal instruction fetch operation" means an operation in which an instruction fetch unit performs instruction fetch from an address corresponding to a normal instruction flow, both of which are executed by the instruction fetch unit.
In this disclosure, "valid unexecuted instructions" means instructions that belong to a current instruction sequence executed by a processor and that have not yet been executed by the processor and that subsequently need to be executed by the processor, and "invalid instructions" can be divided into two types: the first is that an instruction has been executed by the processor and then the instruction will be deemed invalid, and the second is that an instruction has not been executed by the processor, but that part of the instruction does not need to be executed by the processor afterwards due to the occurrence of some specific event.
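The instruction states defined above can be summarised in a small state model. The enum `SlotState` and the helper names are hypothetical; the `EMPTY` state is added here to model a storage location that has never been written, and a taken branch is used in the comment as one example of a cancelling event.

```python
from enum import Enum, auto


class SlotState(Enum):
    EMPTY = auto()              # nothing stored yet
    VALID_UNEXECUTED = auto()   # on the current instruction sequence, still to run
    INVALID = auto()            # already executed, or cancelled by an event


def after_execute(state: SlotState) -> SlotState:
    """First kind of invalid instruction: it has been executed."""
    if state is not SlotState.VALID_UNEXECUTED:
        raise ValueError("only a valid unexecuted instruction can be executed")
    return SlotState.INVALID


def after_cancel(state: SlotState) -> SlotState:
    """Second kind: not executed, but an event (for example a taken branch)
    means it no longer needs to be executed."""
    return SlotState.INVALID if state is SlotState.VALID_UNEXECUTED else state


def is_free(state: SlotState) -> bool:
    """A slot can accept newly fetched instructions when empty or invalid."""
    return state in (SlotState.EMPTY, SlotState.INVALID)
```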
Embodiments of the present disclosure will be described in detail below with reference to the attached drawings, but the present disclosure is not limited to these specific embodiments. In order to keep the following description of the embodiments of the present disclosure clear and concise, the present disclosure omits a detailed description of some known functions and known components.
Fig. 2 is a schematic flow chart of a method for fetching instructions according to at least one embodiment of the present disclosure, and fig. 3 is a schematic diagram of a temporary storage area for instructions according to at least one embodiment of the present disclosure.
For example, the instruction fetching method is applied to a thread bundle, i.e., it is used to fetch instructions for a thread bundle. A thread bundle is a set of threads executing on a single PC (the set includes multiple threads) and is also the smallest unit of instruction scheduling in a GPGPU. Multiple threads in each thread bundle may execute in parallel. In some examples, each thread bundle may include 32 threads, 64 threads, and so on. It should be noted that the number of threads included in the thread bundle may be set according to the actual hardware circuit, which is not particularly limited by the embodiments of the present disclosure.
For example, each thread bundle corresponds to an instruction temporary storage area for storing instructions fetched for the thread bundle in advance, and the instruction temporary storage area may include a plurality of rows and columns of memory cells, as shown in fig. 3, and the instruction temporary storage area may include 6 rows and 4 columns of memory cells, and in fig. 3, each rectangular block Su represents a memory cell. In some examples, each storage unit may store at least one instruction. In the description of the present disclosure, for clarity and brevity, an instruction is stored in each storage unit as an example.
It is to be noted that the instruction temporary storage area shown in fig. 3 is merely illustrative, and the positions of the respective memory cell rows in the instruction temporary storage area shown in fig. 3 are not arranged in the order of their addresses.
For example, the instruction temporary storage area may include a plurality of sub-regions, each of which may store all instructions fetched by one instruction fetch operation (a normal instruction fetch operation or a branch instruction fetch operation). It should be noted that, for clarity and brevity, the embodiments of the present disclosure are described taking as an example the case in which each sub-region stores all instructions retrieved by a single fetch operation. However, the present disclosure is not limited thereto, and the division of the sub-regions may be set according to practical situations; for example, each sub-region may also be set to store all instructions retrieved by at least two fetch operations. In the embodiments of the present disclosure, the sub-regions are introduced only for convenience of description; in an actual scenario, the instruction temporary storage area is not physically divided into sub-regions.
For example, as shown in fig. 3, the instruction temporary storage area includes a first sub-region R1. As shown in fig. 2, the instruction fetching method includes the following steps S10 to S40.
Step S10: a first set of instructions is obtained.
Step S20: the first set of instructions is pre-parsed to determine a first branch instruction, wherein the first set of instructions includes the first branch instruction.
Step S30: in response to the first sub-region being empty or storing invalid instructions, fetching from a target address of the first branch instruction to obtain a second set of instructions.
Step S40: a second set of instructions is stored to the first sub-region.
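Steps S10 to S40 can be sketched in Python as follows. This is a simplified software model, not a hardware description: the `(opcode, target)` instruction encoding, the opcode name `"br"`, the `read_memory` callback, the fetch width of 8, and the use of `None` to represent a slot that is empty or holds an invalid instruction are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Tuple

Instruction = Tuple[str, int]   # (opcode, branch target or 0); assumed encoding
FETCH_WIDTH = 8                 # instructions per fetch operation (example value)


def fetch_for_warp(pc: int,
                   read_memory: Callable[[int, int], List[Instruction]],
                   first_sub_region: List[Optional[Instruction]]) -> Optional[int]:
    """Run steps S10-S40 once; return the prefetched target address, if any."""
    # S10: acquire the first set of instructions from the normal flow.
    first_group = read_memory(pc, FETCH_WIDTH)
    # S20: pre-parse to find the first branch instruction in the group.
    branch = next((ins for ins in first_group if ins[0] == "br"), None)
    if branch is None:
        return None
    # S30 precondition: the first sub-region is empty or holds only invalid
    # instructions (both modelled as None).
    if any(slot is not None for slot in first_sub_region):
        return None
    # S30: fetch from the target address of the first branch instruction.
    target = branch[1]
    second_group = read_memory(target, FETCH_WIDTH)
    # S40: store the second set of instructions to the first sub-region.
    first_sub_region[:len(second_group)] = second_group
    return target
```

Calling the function a second time while the first sub-region still holds valid instructions returns `None`, which reflects the condition in step S30.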
For example, the first sub-region R1 may include at least one memory cell row, and each memory cell row includes at least one memory cell. In the example shown in fig. 3, the first sub-region R1 may include a memory cell row L0 and a memory cell row L1. The first sub-region R1 can store at least all instructions fetched by one branch instruction fetch operation for a branch instruction, and may be used to store the second set of instructions prefetched from the target address of the first branch instruction.
In the present disclosure, for the prefetching performed for a branch instruction (for example, the first unexecuted branch instruction), a separate space may be set aside in the instruction temporary storage area of the thread bundle to store the instructions prefetched from the target address of the branch instruction. This avoids the problem that, when the execution result of the branch instruction is "taken", the remaining unexecuted instructions in the instruction temporary storage area are set to invalid and the instruction fetch unit only then starts fetching from the target address of the branch instruction. The idle time of the thread bundle when the execution result of the branch instruction is "taken" is thereby reduced, which improves the IPC of the GPGPU and the efficiency of instruction execution and fetching.
For example, as shown in FIG. 3, in one embodiment, the instruction temporary storage area further includes a second sub-region R2, and the second sub-region R2 may include a memory cell row L2 and a memory cell row L3.
For example, in step S10, the first set of instructions may be retrieved from external storage based on the PC of the main flow of instruction execution. Step S10 may include: acquiring the first set of instructions in response to the second sub-region being empty or storing invalid instructions. In embodiments of the present disclosure, when acquiring instructions, it is first necessary to determine whether the instruction temporary storage area has free storage space for storing the retrieved instructions; when the instruction temporary storage area is determined to have free storage space, a fetch operation is performed and the fetched instructions are stored into the corresponding storage space.
For example, in some embodiments, the instruction fetching method further comprises: storing the first set of instructions to the second sub-region.
For example, the first set of instructions is fetched by a normal instruction fetch operation, and in some embodiments, the first set of instructions may include 8 instructions. If the memory cells in the second sub-region are empty or store invalid instructions, a normal instruction fetch operation may be performed to fetch the first set of instructions, and the first set of instructions may then be stored in the memory cell row L2 and the memory cell row L3 in the second sub-region.
For example, as shown in FIG. 3, in one embodiment, the instruction temporary holding section further includes a third sub-region R3, and the third sub-region R3 may include a memory cell row L4 and a memory cell row L5.
For example, in some embodiments, the fingering method further comprises: responsive to the third sub-region being empty or depositing an invalid instruction, a fourth set of instructions is obtained, wherein the addresses of instructions in the fourth set of instructions and the addresses of instructions in the first set of instructions are consecutive; a fourth set of instructions is stored to the third sub-region.
In embodiments of the present disclosure, the number of instructions fetched by the fetch unit in each fetch operation (normal fetch operation or branch fetch operation) may be a fixed value, e.g., 8 instructions, 16 instructions, etc. may be fetched per fetch operation. For example, the number of instructions in the first set of instructions, the number of instructions in the second set of instructions, and the number of instructions in the fourth set of instructions may be the same, e.g., in one example, 8 instructions each.
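The fixed fetch width and the consecutive-address relationship between the first and fourth sets of instructions can be illustrated with a minimal sketch. The names `group_addresses`, `next_group_start`, and `INSTR_BYTES`, as well as the fixed 4-byte instruction size, are hypothetical and are not taken from the disclosure.

```python
FETCH_WIDTH = 8  # instructions per fetch operation (one example value from the text)
INSTR_BYTES = 4  # hypothetical fixed instruction size; not specified in the disclosure


def group_addresses(start_pc: int, width: int = FETCH_WIDTH) -> list:
    """Return the addresses of the instructions retrieved by one fetch operation."""
    return [start_pc + i * INSTR_BYTES for i in range(width)]


def next_group_start(start_pc: int, width: int = FETCH_WIDTH) -> int:
    """Start address of the next normal fetch, continuing where the previous one ended."""
    return start_pc + width * INSTR_BYTES
```

Under these assumptions, a first set fetched from address `0x100` is followed by a fourth set that starts immediately after the last instruction of the first set.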
It should be noted that, in some embodiments, the first set of instructions may be stored to the third sub-region R3, and the fourth set of instructions may be stored to the second sub-region R2, and the storage locations of the first set of instructions and the fourth set of instructions are not particularly limited in the embodiments of the present disclosure.
For example, in some embodiments, the first set of instructions is acquired earlier than the fourth set of instructions. In this case, after the first set of instructions is stored in the memory cell row L2 and the memory cell row L3 of the second sub-region R2, it may be determined whether the memory cell row L4 and the memory cell row L5 of the third sub-region R3 are both empty or storing invalid instructions. When they are, the normal instruction fetch operation may be performed again regardless of whether the first set of instructions includes a branch instruction; that is, the fourth set of instructions is acquired and stored in the memory cell row L4 and the memory cell row L5.
For example, in other embodiments, the first set of instructions is acquired later than the fourth set of instructions. In this case, after the fourth set of instructions is acquired and stored in the memory cell row L4 and the memory cell row L5 of the third sub-region R3, it may be determined whether the memory cell row L2 and the memory cell row L3 of the second sub-region R2 are both empty or storing invalid instructions. When they are, the normal instruction fetch operation may be performed again regardless of whether the fourth set of instructions includes a branch instruction; that is, the first set of instructions is acquired and stored in the memory cell row L2 and the memory cell row L3.
The above description takes as an example an instruction temporary storage area that includes the first sub-region R1, the second sub-region R2, and the third sub-region R3, but the disclosure is not limited thereto. In the case where the first set of instructions is acquired earlier than the fourth set of instructions, after the first set of instructions is acquired and stored in the memory cell row L2 and the memory cell row L3 of the second sub-region, the other sub-regions of the instruction temporary storage area that are used to store instructions fetched by normal instruction fetch operations may be checked to determine whether each of them stores at least one valid unexecuted instruction. When each such sub-region stores at least one valid unexecuted instruction, the normal instruction fetch operation is suspended; when the instruction temporary storage area still includes at least enough free storage space to hold all instructions fetched by one normal instruction fetch operation, the normal instruction fetch operation continues. For example, the free storage space refers to space in the instruction temporary storage area that is available for instructions fetched by a normal instruction fetch operation, and may include space that stores no instructions and/or space occupied by invalid instructions.
In the embodiments of the present disclosure, even if the instructions stored in the second sub-region R2 or the third sub-region R3 include a branch instruction, the normal instruction fetch operation is not suspended. Instead, it continues until each sub-region of the instruction temporary storage area used for instructions fetched by normal instruction fetch operations stores at least one valid unexecuted instruction, or until the free storage space available for such instructions is insufficient to hold the instructions fetched by one normal instruction fetch operation, so that the execution bandwidth of the downstream execution unit is satisfied as far as possible.
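The suspend/continue condition for the normal instruction fetch operation can be sketched as follows. This is only an illustration under stated assumptions: each memory cell is modeled as a hypothetical `(valid, executed)` pair, one fetch fills exactly one sub-region, and a sub-region whose instructions have all been executed is treated as free. None of these modeling choices is specified by the disclosure.

```python
from typing import List, Tuple

Entry = Tuple[bool, bool]  # (valid, executed); hypothetical per-cell status flags


def region_is_free(region: List[Entry]) -> bool:
    """True if the sub-region holds no valid, unexecuted instruction."""
    return not any(valid and not executed for valid, executed in region)


def normal_fetch_allowed(normal_regions: List[List[Entry]]) -> bool:
    """The normal fetch continues while at least one normal-fetch sub-region is free.

    Whether a stored group contains a branch instruction is deliberately
    not an input: a pending branch does not suspend the normal fetch.
    """
    return any(region_is_free(region) for region in normal_regions)
```

For instance, with R2 full of valid unexecuted instructions and R3 empty, `normal_fetch_allowed` returns `True`; once R3 also holds at least one valid unexecuted instruction, it returns `False`.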
For example, in some embodiments, step S20 may include: pre-parsing all instructions in the first set of instructions; and, in response to determining that the first set of instructions includes at least one branch instruction, taking the branch instruction that comes first in execution order among the at least one branch instruction as the first branch instruction.
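The pre-parsing step can be sketched as a scan over the fetched group. The `Instr` type and the convention that a branch is recognized by a non-empty `target` field are hypothetical, and the sketch assumes that the stored order of the group equals its execution order.

```python
from typing import List, NamedTuple, Optional


class Instr(NamedTuple):
    opcode: str
    target: Optional[int] = None  # branch target address; None for non-branch instructions


def first_branch(group: List[Instr]) -> Optional[Instr]:
    """Scan the whole group and return the branch that comes first in execution order."""
    for instr in group:  # assumption: list order is execution order
        if instr.target is not None:
            return instr
    return None  # the group contains no branch instruction
```

If a group contains branches at positions 2 and 4, the one at position 2 is selected as the first branch instruction; the later one is only considered after the first has been resolved.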
For example, in step S20, when the second set of instructions is acquired, the first branch instruction has not yet been executed. The instruction fetching method provided by the embodiments of the disclosure can therefore prefetch instructions for a branch instruction, so that when the execution result of the branch instruction is "taken" and the PC jumps, the instruction at the target address of the branch instruction can be read immediately from the instruction temporary storage area and executed without stalling the thread bundle. This reduces, to a certain extent, the time overhead of the PC jump caused by the branch instruction, thereby improving the IPC of the GPGPU, reducing the idle time of the thread bundle, and improving the efficiency of instruction execution.
For example, in step S30 and step S40, when the first set of instructions includes at least one branch instruction and the memory cell row L0 and the memory cell row L1 of the first sub-region R1 are both empty or both store invalid instructions, a branch instruction fetch operation may be performed for the branch instruction that comes first in execution order among the at least one branch instruction (i.e., the first branch instruction); that is, instructions are fetched starting from the target address of the first branch instruction to obtain the second set of instructions, and the pre-fetched second set of instructions is then stored in the first sub-region R1. When the memory cell row L0 and the memory cell row L1 of the first sub-region R1 store at least one valid unexecuted instruction, and the second sub-region R2 and the third sub-region R3 also each store at least one valid unexecuted instruction, no instruction prefetch is performed for the first branch instruction until the instruction temporary storage area includes free storage space capable of holding the instructions fetched by one branch instruction fetch operation.
The following description takes as an example the case in which the instruction temporary storage area includes only the first sub-region R1, the second sub-region R2, and the third sub-region R3. If the first set of instructions is acquired earlier than the fourth set of instructions, in the embodiments of the present disclosure, after the first set of instructions is acquired and stored, the first set of instructions may be pre-parsed to determine whether it includes at least one branch instruction. Whether the first set of instructions includes no branch instruction or includes at least one branch instruction (not yet executed), when the memory cell row L4 and the memory cell row L5 are both empty or store invalid instructions, the next normal instruction fetch operation is performed; that is, the fourth set of instructions is acquired and stored in the memory cell row L4 and the memory cell row L5. When the memory cell row L4 and the memory cell row L5 are not both empty and do not both store only invalid instructions, each sub-region of the instruction temporary storage area used for instructions fetched by normal instruction fetch operations stores at least one valid unexecuted instruction, or the free storage space available for such instructions is insufficient to hold the instructions fetched by one normal instruction fetch operation, and the normal instruction fetch operation may be suspended.
In addition, if the first set of instructions includes at least one branch instruction (not yet executed), then when the memory cell row L0 and the memory cell row L1 are both empty or both store invalid instructions, a branch instruction fetch operation is performed for the branch instruction that comes first in execution order in the first set of instructions (i.e., the first branch instruction); that is, instructions are fetched from the target address of the first branch instruction to obtain the second set of instructions, and the pre-fetched second set of instructions is then stored in the memory cell row L0 and the memory cell row L1.
In embodiments of the present disclosure, when an instruction is pre-parsed and a branch instruction is determined to be present, a branch instruction fetch operation may be performed without suspending instruction prefetching for the thread bundle. For example, when the first set of instructions is pre-parsed and the first branch instruction is determined to be present, the normal instruction fetch operation may continue to be performed to fetch the fourth set of instructions. Therefore, when the execution result of the branch instruction is "not-taken", the instructions following the branch instruction can continue to be executed, which reduces, to a certain extent, the idle time of the downstream execution unit caused by instructions not yet being in place in the instruction temporary storage area, and improves, to a certain extent, the IPC of the GPGPU.
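The independence of the two fetch paths described above can be summarized in a small decision function. This is a sketch, not the disclosed control logic: the function name, the boolean inputs, and the order in which the two operations are listed are all assumptions, and the text does not state any priority between them.

```python
from typing import List


def fetches_to_issue(has_unexecuted_branch: bool,
                     branch_region_free: bool,
                     normal_region_free: bool) -> List[str]:
    """Return the fetch operations that may be issued in the current state."""
    ops = []
    # The normal fetch depends only on free space, never on pending branches.
    if normal_region_free:
        ops.append("normal")
    # The branch fetch needs a pre-parsed, unexecuted branch and a free branch sub-region.
    if has_unexecuted_branch and branch_region_free:
        ops.append("branch")
    return ops
```

With a pending branch and both R1 and R3 free, both operations are permitted, which is what allows execution to proceed without a refill delay on either the "taken" or the "not-taken" outcome.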
If the first set of instructions is acquired later than the fourth set of instructions, the fourth set of instructions may be pre-parsed to determine whether it includes at least one branch instruction. The subsequent operations are similar to those described above for pre-parsing the first set of instructions and determining whether it includes at least one branch instruction, and are not repeated here.
In the following description, unless otherwise specified, the timing of acquiring the first group of instructions is described as being earlier than the timing of acquiring the fourth group of instructions.
For example, in some embodiments, the instruction fetching method further comprises: determining the instruction execution result obtained by executing the first branch instruction; and, in response to the instruction execution result indicating that a jump is required, jumping to execute the second set of instructions stored in the first sub-region.
When the first branch instruction is executed, an instruction execution result is obtained. If the instruction execution result indicates that a jump is required (i.e., "taken"), the PC of the thread bundle jumps, and execution moves to the second set of instructions stored in the first sub-region, so that the next instruction to be executed is the first of the instructions pre-fetched for the first branch instruction, i.e., the first instruction stored in the first sub-region.
For example, in some embodiments, the instruction fetching method further comprises: after jumping to execute the second set of instructions stored in the first sub-region, setting all unexecuted instructions in the second sub-region and the third sub-region as invalid instructions.
It should be noted that, in the present disclosure, when the instruction execution result obtained by executing the first branch instruction indicates that a jump is required, all remaining instructions in the instruction temporary storage area other than the instructions pre-fetched for the first branch instruction may be set as invalid.
For example, if the instruction execution result obtained by executing the first branch instruction indicates that a jump is required (i.e., "taken"), the remaining unexecuted instructions in the second sub-region are all set as invalid, and the previously pre-fetched instructions in the third sub-region are all set as invalid. It should be noted that when there are no unexecuted instructions in the second sub-region and/or no instructions stored in the third sub-region (i.e., all instructions in the instruction temporary storage area other than those pre-fetched for the first branch instruction have been executed, or the instruction temporary storage area holds no instructions other than those pre-fetched for the first branch instruction), the operation of setting instructions as invalid need not be performed.
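The invalidation performed on a jump can be sketched as below. The dictionary layout, the `valid` and `executed` flags, and the function name are hypothetical illustration choices rather than structures named in the disclosure.

```python
from typing import Dict, List


def on_branch_taken(buffer: Dict[str, List[dict]], target_region: str) -> str:
    """Invalidate every unexecuted instruction outside the branch-target sub-region.

    `buffer` maps a sub-region name to its cells; each cell is a dict with
    hypothetical boolean flags "valid" and "executed".
    Returns the name of the sub-region that execution continues from.
    """
    for name, cells in buffer.items():
        if name == target_region:
            continue  # instructions pre-fetched for the branch stay valid
        for cell in cells:
            if not cell["executed"]:
                cell["valid"] = False
    return target_region
```

If a sub-region is empty or fully executed, the inner loop changes nothing, matching the remark that the invalidation need not be performed in that case.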
For example, in some embodiments, the instruction fetching method further comprises: after jumping to execute the second set of instructions stored in the first sub-region, pre-parsing the second set of instructions to determine a second branch instruction, wherein the second set of instructions includes the second branch instruction; fetching instructions from the target address of the second branch instruction to obtain a fifth set of instructions; and storing the fifth set of instructions to the second sub-region or the third sub-region.
For example, pre-parsing the second set of instructions to determine the second branch instruction may include: pre-parsing all instructions in the second set of instructions; and, in response to determining that the second set of instructions includes at least one branch instruction, taking the branch instruction that comes first in execution order among the at least one branch instruction included in the second set of instructions as the second branch instruction.
For example, storing the fifth set of instructions to the second sub-region or the third sub-region may include: in response to the third set of instructions being stored to the second sub-region, storing the fifth set of instructions to the third sub-region; and, in response to the third set of instructions being stored to the third sub-region, storing the fifth set of instructions to the second sub-region.
In the embodiments of the disclosure, at least one sub-region needs to be reserved in the instruction temporary storage area for storing the instructions fetched by a branch instruction fetch operation, so that the branch instruction fetch operation can be performed once a branch instruction has been identified by pre-parsing. For example, in some embodiments, the instruction fetching method further comprises: after storing the third set of instructions to the second sub-region or the third sub-region, in response to the second set of instructions not including a branch instruction and the first sub-region storing at least one valid unexecuted instruction, suspending the instruction fetch operation.
For example, in some embodiments, the instruction fetching method further comprises: after jumping to execute the second set of instructions stored in the first sub-region, fetching a third set of instructions, wherein the addresses of the instructions in the third set of instructions are consecutive with the addresses of the instructions in the second set of instructions; and storing the third set of instructions to the second sub-region or the third sub-region.
For example, the second branch instruction has not been executed while the third set of instructions was fetched.
For example, the number of instructions in the second set of instructions and the number of instructions in the third set of instructions may be the same, e.g., in one example, 8 instructions each.
It should be noted that, in the embodiments of the present disclosure, "after jumping to execute the second set of instructions stored in the first sub-region" means that the jump has occurred and execution of the second set of instructions stored in the first sub-region has begun. It does not mean that all instructions in the second set of instructions have been executed, only that the PC has jumped. After the PC jumps, the second set of instructions may be pre-parsed to determine whether it includes a branch instruction.
For example, in the embodiments of the present disclosure, after jumping to execute the second set of instructions stored in the first sub-region, i.e., once the PC has jumped, all instructions in the second sub-region and the third sub-region have been set as invalid, which is equivalent to the second sub-region and the third sub-region each being empty or storing only invalid instructions. Both sub-regions can therefore receive and store instructions fetched by the instruction fetch unit, so instruction prefetching may be triggered at this time, and the subsequent process is similar to that described above. It should be noted that, after the PC jumps, the instruction execution flow corresponding to the second set of instructions becomes the main (non-branch) instruction execution flow. At this time, the instruction prefetch operation that acquires the third set of instructions is a normal instruction fetch operation, and if the second set of instructions includes a branch instruction, the instruction prefetch operation performed for that branch instruction is a branch instruction fetch operation. In that case, when the instructions stored in the first sub-region (i.e., the second set of instructions) include a branch instruction (e.g., the second branch instruction), the instructions fetched by the normal instruction fetch operation are stored in the second sub-region or the third sub-region, and, correspondingly, the instructions fetched by the branch instruction fetch operation for the second branch instruction are stored in the third sub-region or the second sub-region.
In the above description, the normal instruction fetch operation may be performed 2 times and the branch instruction fetch operation may be performed 1 time. It should be noted that the present disclosure is not limited thereto, and the branch instruction fetch operation may be performed multiple times depending on the size of the storage space of the instruction temporary storage area.
For example, in some embodiments, the instruction fetching method further comprises: in response to the instruction execution result indicating that no jump is required, executing the instructions following the first branch instruction, and setting all unexecuted instructions in the first sub-region as invalid instructions.
For example, when the instruction execution result of the first branch instruction indicates that no jump is required (i.e., "not-taken"), execution continues with the remaining unexecuted instructions in the second sub-region, and all instructions pre-fetched for the first branch instruction are set as invalid; for example, all instructions in the first sub-region that were pre-fetched for the first branch instruction are set as invalid. In this case, the PC of the thread bundle does not jump. If further branch instructions follow, prefetching continues for those branch instructions, the pre-fetched instructions may again be stored in the first sub-region, and so on. For example, if the first set of instructions further includes a next branch instruction different from the first branch instruction, then once it is determined that the instruction execution result of the first branch instruction indicates that no jump is required, a branch instruction fetch operation may be performed for the next branch instruction, and the fetched instructions may be stored in the first sub-region.
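The "not-taken" path can be sketched as follows: the prefetched instructions in the first sub-region are discarded, and the sub-region is reused for the next pending branch, if any. The list-of-flags representation and the `pending_targets` queue are hypothetical simplifications.

```python
from typing import List, Optional


def on_branch_not_taken(r1_valid: List[bool],
                        pending_targets: List[int]) -> Optional[int]:
    """Invalidate the instructions prefetched into R1 (in place).

    `pending_targets` holds the target addresses of the later, still
    unexecuted branch instructions in execution order (hypothetical queue).
    Returns the target address to prefetch from next, or None if no
    further branch instruction is pending.
    """
    for i in range(len(r1_valid)):
        r1_valid[i] = False  # the prefetched branch-target instructions are discarded
    return pending_targets[0] if pending_targets else None
```

The returned address would then drive a new branch instruction fetch operation whose result is again stored in the first sub-region.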
For example, in the embodiments of the present disclosure, corresponding instruction storage spaces may be provided in the instruction temporary storage area for multiple branch instructions, so as to meet different requirements, further improve the IPC of the GPGPU, and reduce the idle time of thread bundles. For example, in some embodiments, the instruction temporary storage area further includes a fourth sub-region, and the instruction fetching method further includes: pre-parsing the first set of instructions to determine a third branch instruction, wherein the first set of instructions further includes the third branch instruction; in response to the fourth sub-region being empty or storing invalid instructions, fetching instructions from the target address of the third branch instruction to acquire a sixth set of instructions; and storing the sixth set of instructions to the fourth sub-region.
The above description takes as an example an instruction temporary storage area that includes one sub-region for storing instructions fetched by a branch instruction fetch operation, in which case one branch instruction fetch operation is performed. The present disclosure is not limited thereto; in embodiments of the present disclosure, multiple branch instruction fetch operations may be performed for one branch instruction. For example, in some embodiments, the instruction temporary storage area may include two sub-regions for storing the instructions fetched by branch instruction fetch operations performed for one branch instruction, in which case two branch instruction fetch operations may be performed for each branch instruction.
Fig. 4 is a schematic flow chart of an instruction temporary storage area for storing instructions provided in at least one embodiment of the present disclosure.
As shown in (1) of fig. 4, a normal instruction fetch operation (i.e., the first normal instruction fetch operation of fig. 4) is performed; for example, a first group of instructions is fetched and stored in the memory cell rows L2 to L3. As shown in (2) of fig. 4, in response to the memory cell rows L4 to L5 being empty or storing invalid instructions, a next normal instruction fetch operation (i.e., the second normal instruction fetch operation of fig. 4) may be performed; for example, a fourth group of instructions is fetched and stored in the memory cell rows L4 to L5. For example, while the second normal instruction fetch operation is performed, pre-parsing and instruction execution may also be performed, that is, the first group of instructions is pre-parsed and the instructions in the first group are executed; if it is determined that the first group of instructions does not include a branch instruction, the fourth group of instructions may be executed after all instructions in the first group have been executed. For example, after the first group of instructions is stored in the memory cell rows L2 to L3 and the fourth group of instructions is stored in the memory cell rows L4 to L5, the instruction temporary storage area has no free storage space for the instructions fetched by the next normal instruction fetch operation, so the instruction fetch operation may be suspended.
For example, as shown in (3) of fig. 4, the first group of instructions and the fourth group of instructions may be pre-parsed in sequence. When pre-parsing determines that the first group and/or the fourth group includes a branch instruction, for example, that the first group of instructions includes the first branch instruction, a branch instruction fetch operation (i.e., the first branch instruction fetch operation in fig. 4) may be performed for the first branch instruction; for example, the second group of instructions is fetched and stored in the memory cell rows L0 to L1. As shown in (4) of fig. 4, if the execution result of the first branch instruction is "taken", execution of the second group of instructions in the memory cell rows L0 to L1 begins, and all instructions in the memory cell rows L2 to L5 are set as invalid instructions. As shown in (5) of fig. 4, after the jump to execute the second group of instructions in the memory cell rows L0 to L1, the next normal instruction fetch operation (the third normal instruction fetch operation shown in fig. 4) may be performed; for example, the third group of instructions is fetched and stored in the memory cell rows L2 to L3, and at this time the memory cell rows L4 to L5 may be used to store instructions fetched by a branch instruction fetch operation.
As shown in (6) of fig. 4, after the jump to execute the second group of instructions in the memory cell rows L0 to L1, when pre-parsing determines that the second group and/or the third group includes a branch instruction, for example, that the second group of instructions includes the second branch instruction, a branch instruction fetch operation (i.e., the second branch instruction fetch operation in fig. 4) may be performed for the second branch instruction; for example, a fifth group of instructions is fetched and stored in the memory cell rows L4 to L5. As shown in (7) of fig. 4, if the execution result of the second branch instruction is "taken", execution of the fifth group of instructions in the memory cell rows L4 to L5 begins, and all instructions in the memory cell rows L0 to L3 are set as invalid instructions. As shown in (8) of fig. 4, after the jump to execute the fifth group of instructions in the memory cell rows L4 to L5, the next normal instruction fetch operation (the fourth normal instruction fetch operation shown in fig. 4) may be performed, and the fetched instructions may be stored in the memory cell rows L2 to L3; at this time, the memory cell rows L0 to L1 may be used to store instructions fetched by a branch instruction fetch operation, and so on.
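The eight stages of fig. 4 can be replayed as a small state trace. This is an illustrative sketch only: each pair of memory cell rows is reduced to a single slot holding a group name, `None` stands for "empty or holding only invalid instructions", and the label "next group" is a placeholder for the unnamed group fetched in stage (8).

```python
def fig4_trace():
    """Replay stages (1) to (8) of fig. 4 and record the buffer state after each one."""
    buf = {"L0-L1": None, "L2-L3": None, "L4-L5": None}
    trace = []

    def step(label, stores=None, invalidate=()):
        for rows in invalidate:
            buf[rows] = None  # instructions set as invalid
        if stores:
            buf.update(stores)  # newly fetched group written into free rows
        trace.append((label, dict(buf)))

    step("(1) first normal fetch", {"L2-L3": "group 1"})
    step("(2) second normal fetch", {"L4-L5": "group 4"})
    step("(3) first branch fetch", {"L0-L1": "group 2"})
    step("(4) first branch taken", invalidate=("L2-L3", "L4-L5"))
    step("(5) third normal fetch", {"L2-L3": "group 3"})
    step("(6) second branch fetch", {"L4-L5": "group 5"})
    step("(7) second branch taken", invalidate=("L0-L1", "L2-L3"))
    step("(8) fourth normal fetch", {"L2-L3": "next group"})
    return trace
```

Printing the trace makes the role swap visible: after stage (4) rows L0 to L1 carry the main flow and L4 to L5 become the branch-prefetch space, and after stage (7) the roles of L4 to L5 and L0 to L1 are exchanged again.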
It should be noted that if, at the time instructions are acquired, the instruction temporary storage area stores no instructions or all stored instructions are invalid, the acquisition is an instruction fetch operation; if the instruction temporary storage area stores at least one valid unexecuted instruction at that time, the acquisition is an instruction prefetch operation. In addition, a branch instruction fetch operation is typically also an instruction prefetch operation. As shown in fig. 4, the first normal instruction fetch operation is an instruction fetch operation (instruction fetch); the second, third, and fourth normal instruction fetch operations are all instruction prefetch operations (instruction pre-fetch); and the first and second branch instruction fetch operations are both instruction prefetch operations (instruction pre-fetch for branch).
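The distinction between fetch and prefetch drawn above can be written as a short classifier. The function name and its boolean parameters are hypothetical, and the sketch simplifies "typically" to "always" for branch instruction fetch operations.

```python
def classify_fetch(has_valid_unexecuted: bool, is_branch_fetch: bool) -> str:
    """Label an acquisition using the terms from fig. 4.

    `has_valid_unexecuted` tells whether the instruction temporary storage
    area holds at least one valid unexecuted instruction when the
    acquisition is issued.
    """
    if is_branch_fetch:
        # Simplification: the text says a branch fetch is *typically* a prefetch.
        return "instruction pre-fetch for branch"
    if has_valid_unexecuted:
        return "instruction pre-fetch"
    return "instruction fetch"
```

Applied to fig. 4, the first normal instruction fetch operation starts from an empty storage area and is classified as "instruction fetch", while the later normal operations are classified as "instruction pre-fetch".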
FIG. 5 is a schematic diagram illustrating a jump of an instruction read pointer in an instruction temporary storage area when executing an instruction according to at least one embodiment of the present disclosure.
Typically, the value of the instruction read pointer rd_ptr is incremented sequentially and follows a wrap-around mechanism: when the value of rd_ptr reaches its maximum value, it returns to 0, i.e., it starts again from the beginning. If a branch instruction is "taken", the instruction read pointer rd_ptr jumps to the first memory cell of the memory cell row L0 or the first memory cell of the memory cell row L4.
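The pointer update rule can be sketched as a single function. The parameter names `max_value` and `jump_target` are hypothetical; the disclosure does not state how the maximum value is configured, so it is passed in explicitly here.

```python
from typing import Optional


def next_rd_ptr(rd_ptr: int, max_value: int,
                jump_target: Optional[int] = None) -> int:
    """Advance the instruction read pointer by one step.

    A taken branch overrides sequential advance and moves the pointer to
    the first cell of the branch-target rows (`jump_target`). Otherwise
    the pointer increments and wraps to 0 after `max_value`.
    """
    if jump_target is not None:
        return jump_target
    return 0 if rd_ptr == max_value else rd_ptr + 1
```

With a maximum value of 15, the pointer advances 14, 15, 0, 1, and a taken branch with `jump_target=16` moves it directly to cell 16 regardless of its current value.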
As shown in fig. 5, the memory cell row L2 includes memory cells S0 to S3, the memory cell row L3 includes memory cells S4 to S7, the memory cell row L4 includes memory cells S8 to S11, the memory cell row L5 includes memory cells S12 to S15, the memory cell row L0 includes memory cells S16 to S19, and the memory cell row L1 includes memory cells S20 to S23.
When the memory cell rows L2 to L5 each store instructions, execution for the thread bundle may start from the first instruction stored in the memory cell row L2. The first instruction is stored in the memory cell S0 of the memory cell row L2, and the instruction read pointer rd_ptr points to the memory cell S0 at this time. After the instruction stored in the memory cell S0 is executed, the instruction stored in the memory cell S1 is executed, at which time the instruction read pointer rd_ptr points to the memory cell S1, and so on until the instruction stored in the memory cell S15 is executed. If it is determined that the instructions stored in the memory cell rows L2 to L5 do not include a branch instruction, or that the instruction execution result of a branch instruction among the instructions stored in the memory cell rows L2 to L5 is "not-taken", the PC does not jump; in that case, after the instruction stored in the memory cell S15 is executed, the instruction stored in the memory cell S0 is executed, and the instruction read pointer rd_ptr points to the memory cell S0.
As shown in fig. 5, when the instructions stored in the memory cell rows L2 to L3 include a branch instruction, for example when the memory cell S5 stores the branch instruction, and the instruction execution result of the branch instruction stored in the memory cell S5 is "taken", the instruction read pointer rd_ptr jumps to the memory cell S16 and the instruction stored in the memory cell S16 starts to be executed. At this time, the normal instruction fetch operation may continue, and the fetched instructions are stored in the memory cell rows L2 to L3; that is, the instruction following the instruction stored in the memory cell S23 is stored in the memory cell S0.
As shown in fig. 5, when the instructions stored in the memory cell rows L4 to L5 include a branch instruction, for example when the memory cell S12 stores the branch instruction, and the instruction execution result of the branch instruction stored in the memory cell S12 is "taken", the instruction read pointer rd_ptr jumps to the memory cell S16 and the instruction stored in the memory cell S16 starts to be executed. At this time, the normal instruction fetch operation may continue, and the fetched instructions are stored in the memory cell rows L2 to L3; that is, the instruction following the instruction stored in the memory cell S23 is stored in the memory cell S0.
After the instruction read pointer rd_ptr jumps to the memory cell S16, if it is determined that the instructions stored in the memory cell rows L0 to L1 do not include a branch instruction, or that the instruction execution result of the branch instruction among the instructions stored in the memory cell rows L0 to L1 is "not-taken", the PC does not jump. In this case, after the instruction stored in the memory cell S23 is executed, the instruction stored in the memory cell S0 is executed; that is, the instruction read pointer rd_ptr jumps to the memory cell S0 and execution continues from the memory cell S0 until the instruction stored in the memory cell S7 is executed. The instruction stored in the memory cell S16 is then executed, at which point the instruction read pointer rd_ptr jumps to the memory cell S16, and the cycle repeats.
After the instruction read pointer rd_ptr jumps to the memory cell S16, if it is determined that the instructions stored in the memory cell rows L0 to L1 include a branch instruction, a branch instruction fetch operation is performed for that branch instruction, and the fetched instructions are stored in the memory cell rows L4 to L5. For example, if the memory cell S20 stores the branch instruction and the instruction execution result of the branch instruction stored in the memory cell S20 is "taken", the instruction read pointer rd_ptr jumps to the memory cell S8 and the instruction stored in the memory cell S8 starts to be executed. At this time, the normal instruction fetch operation may continue, and the fetched instructions may be stored in the memory cell rows L2 to L3.
As can be seen from this, in the examples shown in fig. 4 and 5, the instruction fetched in the branch instruction fetch operation can be stored only in the first sub-region (memory cell rows L0 to L1) and the third sub-region (memory cell rows L4 to L5), and the second sub-region (memory cell rows L2 to L3) is used only for storing the instruction fetched in the normal instruction fetch operation.
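The read-pointer behaviour described for fig. 5 can be sketched as a small Python model. This is only an illustration under stated assumptions: the names `ROWS`, `row_of`, `next_rd_ptr`, `linear` and `after_jump` are hypothetical and do not appear in the disclosure, and the model tracks cell indices only, not instruction contents.

```python
from typing import List, Optional

# Row order by ascending cell index, following the fig. 5 layout in the text:
# L2 holds S0..S3, L3 holds S4..S7, L4 holds S8..S11, L5 holds S12..S15,
# L0 holds S16..S19, L1 holds S20..S23.
ROWS = ["L2", "L3", "L4", "L5", "L0", "L1"]
CELLS_PER_ROW = 4


def row_of(cell: int) -> str:
    """Return the memory cell row that contains memory cell S<cell>."""
    return ROWS[cell // CELLS_PER_ROW]


def next_rd_ptr(rd_ptr: int, order: List[int],
                taken_target: Optional[int] = None) -> int:
    """Return the next value of rd_ptr.

    `order` lists the cells in execution order (hypothetical helper input).
    Without a taken branch, rd_ptr advances one cell and wraps at the end
    of `order`; with a taken branch it jumps to `taken_target`.
    """
    if taken_target is not None:
        return taken_target
    return order[(order.index(rd_ptr) + 1) % len(order)]


# Rows L2..L5 hold instructions and no branch is taken: S0..S15, then S0 again.
linear = list(range(0, 16))
# After a jump to S16 with no further taken branch: S16..S23, S0..S7, then S16.
after_jump = list(range(16, 24)) + list(range(0, 8))
```

For instance, `next_rd_ptr(5, linear, taken_target=16)` models the taken branch stored in the memory cell S5, and `next_rd_ptr(7, after_jump)` models the wrap from the memory cell S7 back to the memory cell S16.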
In the embodiments of the present disclosure, by controlling the read pointer of the instruction temporary storage area, the memory cell rows L2 to L3 can also store instructions fetched in advance by the branch instruction fetch operation, so that the resources of the whole instruction temporary storage area can be used more flexibly.
Fig. 6 is a schematic flow chart of an instruction temporary storage area for storing instructions provided in at least one embodiment of the present disclosure.
The operation procedures (1) to (7) in fig. 6 are the same as the operation procedures (1) to (7) in fig. 4 and are not repeated here. Fig. 6 differs from fig. 4 in that, in (8) of fig. 6, after jumping to execute the fifth group of instructions in the memory cell rows L4 to L5, the next normal instruction fetch operation (the fourth normal instruction fetch operation shown in fig. 4) may continue, and the fetched instructions are stored in the memory cell rows L0 to L1; the memory cell rows L2 to L3 may then store the instructions fetched by the branch instruction fetch operation, and so on.
In the example shown in fig. 6, the first sub-region (memory cell rows L0 to L1) to the third sub-region (memory cell rows L4 to L5) may be used to store the instruction fetched by the normal instruction fetch operation, or may be used to store the instruction fetched by the branch instruction fetch operation.
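The difference between the two storage policies can be sketched as follows. This is a minimal illustration, assuming a hypothetical helper `branch_prefetch_targets` and a boolean `flexible` flag (neither is part of the disclosure): with `flexible=False` the second sub-region is reserved for normal fetches as in figs. 4 and 5, and with `flexible=True` any free sub-region may receive a branch fetch as in fig. 6.

```python
# Sub-regions and the memory cell rows they cover, as named in the text.
SUB_REGIONS = {
    "first": ("L0", "L1"),
    "second": ("L2", "L3"),
    "third": ("L4", "L5"),
}


def branch_prefetch_targets(free, flexible):
    """Return the free sub-regions that may receive a branch fetch.

    `free` is a set of sub-region names that are empty or hold only
    invalid instructions. `flexible` selects the fig. 6 policy; when it
    is False, the second sub-region is excluded (figs. 4 and 5 policy).
    """
    return [
        name for name in SUB_REGIONS
        if name in free and (flexible or name != "second")
    ]
```

Under the fixed policy, `branch_prefetch_targets({"second"}, False)` yields an empty list, so a branch fetch would have to wait; under the flexible policy the same call with `True` yields `["second"]`.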
The instruction fetch method provided by the embodiments of the present disclosure adds a small amount of storage space to the instruction temporary storage area of the thread bundle, so that several instructions at the target address of a branch instruction that has not yet been executed can be fetched in advance and stored in the instruction temporary storage area. When the branch instruction is "taken", the PC jumps, and the instructions at the target address of the branch instruction can be read immediately from the instruction temporary storage area for execution. The thread bundle therefore does not need to stall, which can reduce, to a certain extent, the time overhead of PC jumps caused by branch instructions and can improve, to a certain extent, the IPC of the GPGPU.
Meanwhile, when the instructions stored in the instruction temporary storage area include a branch instruction, the normal instruction fetch operation of the thread bundle is not stopped, so that when the branch instruction is "not-taken", the instructions after the branch instruction can continue to be executed. This keeps enough instructions in the instruction temporary storage area for the downstream execution unit, which can reduce, to a certain extent, the time the downstream execution unit sits idle because no instructions are available in the instruction temporary storage area, and can improve, to a certain extent, the IPC of the GPGPU.
In addition, the instruction fetch method provided by the embodiments of the present disclosure may statically control, through register configuration, whether the current thread bundle supports instruction prefetching for branch instructions. That is, in a hardware implementation, a switch register may be added to control whether the thread bundle supports prefetching instructions from the target address of a branch instruction that has not yet been executed.
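The switch register can be pictured as a single enable bit that gates the branch prefetch. The sketch below is an assumption-laden illustration: the bit position `PREFETCH_EN_BIT` and the function names are hypothetical, since the disclosure does not specify the register layout.

```python
PREFETCH_EN_BIT = 0  # hypothetical bit position of the enable flag


def branch_prefetch_enabled(switch_reg: int) -> bool:
    """Return True when the (assumed) enable bit of the switch register is set."""
    return bool((switch_reg >> PREFETCH_EN_BIT) & 1)


def should_prefetch(switch_reg: int, has_unexecuted_branch: bool,
                    first_sub_region_free: bool) -> bool:
    """Decide whether to fetch from the branch target address.

    A branch fetch is issued only if prefetching is enabled for the
    thread bundle, an unexecuted branch instruction was found, and the
    first sub-region is empty or holds only invalid instructions.
    """
    return (branch_prefetch_enabled(switch_reg)
            and has_unexecuted_branch
            and first_sub_region_free)
```

With the bit cleared, `should_prefetch(0b0, True, True)` is false and the thread bundle falls back to normal fetching only.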
The instruction fetching method provided by the embodiment of the disclosure is mainly directed to direct jump branch instructions, that is, the first branch instruction to the third branch instruction are direct jump branch instructions.
Fig. 7 is a schematic diagram of an instruction fetch device according to at least one embodiment of the present disclosure.
At least one embodiment of the present disclosure further provides an instruction fetch device, which may be configured to fetch instructions for a thread bundle, where the thread bundle corresponds to an instruction temporary storage area, and the instruction temporary storage area includes a first sub-region. For example, as shown in fig. 7, the instruction fetch device 700 includes an instruction fetch unit 701 and a branch instruction determination unit 702.
Instruction fetch unit 701 is configured to fetch a first set of instructions.
The branch instruction determination unit 702 is configured to pre-parse the first set of instructions to determine a first branch instruction, where the first set of instructions includes the first branch instruction.
The instruction fetch unit 701 is further configured to, in response to the first sub-region being empty or storing an invalid instruction, fetch instructions from the target address of the first branch instruction to acquire a second set of instructions, and to store the second set of instructions to the first sub-region.
The instruction fetch unit 701 is configured to perform steps S10, S30, and S40 in the instruction fetch method shown in fig. 2, and the branch instruction determining unit 702 is configured to perform step S20 in the instruction fetch method shown in fig. 2.
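The division of work between the two units can be sketched in Python. This is a toy model, not the disclosed hardware: the instruction encoding `(opcode, target)`, the group size of 8 (two rows of four cells, as in the fig. 5 example), and the function names are all assumptions made for illustration. An empty list stands for a first sub-region that is empty or whose contents have been invalidated.

```python
from typing import Dict, List, Optional, Tuple

Instr = Tuple[str, Optional[int]]  # (opcode, branch target or None); toy encoding
GROUP = 8  # assumed instructions per group


def fetch_group(imem: Dict[int, Instr], start: int,
                n: int = GROUP) -> List[Tuple[int, Instr]]:
    """Fetch up to n consecutive instructions starting at address `start`."""
    return [(a, imem[a]) for a in range(start, start + n) if a in imem]


def find_first_branch(group: List[Tuple[int, Instr]]) -> Optional[Tuple[int, int]]:
    """Pre-parse a group; return (address, target) of the first branch, if any."""
    for addr, (op, target) in group:
        if op == "br":
            return addr, target
    return None


def fetch_with_branch_prefetch(imem, pc, first_sub_region):
    """Acquire the first group, pre-parse it, and prefetch the branch target."""
    first_group = fetch_group(imem, pc)          # acquire the first set
    branch = find_first_branch(first_group)      # pre-parse for a branch
    if branch is not None and not first_sub_region:
        _, target = branch
        # Fetch the second set from the target address and store it
        # to the first sub-region.
        first_sub_region.extend(fetch_group(imem, target))
    return first_group
```

If the first sub-region already holds valid instructions, the sketch leaves it untouched, which mirrors the "in response to the first sub-region being empty or storing an invalid instruction" condition.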
For example, in some embodiments, the instruction temporary storage area further includes a second sub-region, and the instruction fetch unit 701, when performing the operation of acquiring the first set of instructions, performs the following operation: acquiring the first set of instructions in response to the second sub-region being empty or storing an invalid instruction. The instruction fetch unit 701 is further configured to: store the first set of instructions to the second sub-region.
For example, in some embodiments, the instruction temporary storage area further includes a third sub-region, and the instruction fetch unit 701 is further configured to: in response to the third sub-region being empty or storing an invalid instruction, acquire a fourth set of instructions, where the addresses of the instructions in the fourth set of instructions and the addresses of the instructions in the first set of instructions are consecutive; and store the fourth set of instructions to the third sub-region.
For example, the number of instructions in the first set of instructions is the same as the number of instructions in the fourth set of instructions.
For example, in some embodiments, the instruction temporary storage area further includes a second sub-region and a third sub-region, and the first set of instructions is stored to the second sub-region or the third sub-region. The instruction fetch device 700 further includes an instruction execution unit configured to: execute the first branch instruction; determine an instruction execution result obtained by executing the first branch instruction; and, in response to the instruction execution result indicating that a jump is required, jump to execute the second set of instructions stored in the first sub-region.
For example, in some embodiments, the instruction fetch unit 701 is further configured to: after jumping to execute the second set of instructions stored in the first sub-region, fetch a third set of instructions, where the addresses of the instructions in the second set of instructions and the addresses of the instructions in the third set of instructions are consecutive; and store the third set of instructions to the second sub-region or the third sub-region.
For example, in some embodiments, the instruction fetch unit 701 is further configured to: after storing the third set of instructions to the second sub-region or the third sub-region, in response to the second set of instructions not including a branch instruction and the first sub-region storing at least one valid unexecuted instruction, abort the instruction fetch operation.
For example, in some embodiments, the branch instruction determination unit 702 is further configured to: after jumping to execute the second set of instructions stored in the first sub-region, pre-parse the second set of instructions to determine a second branch instruction, where the second set of instructions includes the second branch instruction. The instruction fetch unit 701 is further configured to: fetch instructions from the target address of the second branch instruction to acquire a fifth set of instructions; and store the fifth set of instructions to the second sub-region or the third sub-region.
For example, in some embodiments, the instruction execution unit is further configured to: after jumping to execute the second set of instructions stored in the first sub-region, all unexecuted instructions in the second and third sub-regions are set as invalid instructions.
For example, in some embodiments, the branch instruction determination unit 702 is further configured to: in response to the instruction execution result indicating that no jump is required, execute the instructions following the first branch instruction. The instruction execution unit is further configured to: in response to the instruction execution result indicating that no jump is required, set all unexecuted instructions in the first sub-region as invalid instructions.
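The two outcomes of the first branch instruction can be summarized in a short sketch. It assumes a hypothetical per-cell valid-bit list for each sub-region and a hypothetical function name `resolve_branch`; as a simplification, it clears every valid bit of the discarded sub-regions rather than tracking which cells were already executed.

```python
def resolve_branch(taken: bool, valid: dict) -> str:
    """Apply the invalidation rule for the resolved first branch instruction.

    `valid` maps a sub-region name to a list of per-cell valid bits
    (an assumed representation). Returns where execution continues.
    """
    if taken:
        # Jump to the second set in the first sub-region; the sequential
        # path held in the second and third sub-regions is discarded.
        for name in ("second", "third"):
            valid[name] = [False] * len(valid[name])
        return "first"
    # No jump: the prefetched branch-target instructions are discarded.
    valid["first"] = [False] * len(valid["first"])
    return "fall-through"
```

In both cases the invalidated sub-regions count as free again, so a later normal or branch instruction fetch operation can reuse them.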
For example, the branch instruction determination unit 702, when performing the operation of pre-parsing the first set of instructions to determine the first branch instruction, performs the following operations: pre-parsing all instructions in the first set of instructions; and determining that the first set of instructions includes at least one branch instruction, and taking the branch instruction that is executed first in execution order among the at least one branch instruction as the first branch instruction.
For example, in some embodiments, the instruction temporary storage area further includes a fourth sub-region, and the branch instruction determination unit 702 is further configured to: pre-parse the first set of instructions to determine a third branch instruction, where the first set of instructions further includes the third branch instruction. The instruction fetch unit 701 is further configured to: in response to the fourth sub-region being empty or storing an invalid instruction, fetch instructions from the target address of the third branch instruction to acquire a sixth set of instructions; and store the sixth set of instructions to the fourth sub-region.
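When a group contains more than one branch instruction, each branch target needs its own free sub-region. A minimal sketch of that pairing, assuming a hypothetical helper `plan_prefetches` and hypothetical target addresses, is:

```python
def plan_prefetches(branch_targets, free_sub_regions):
    """Pair branch targets (in execution order) with free sub-regions.

    Both arguments are ordered lists. Targets without a free sub-region
    are left unplanned, since a branch fetch is only issued when its
    sub-region is empty or holds invalid instructions.
    """
    return dict(zip(free_sub_regions, branch_targets))
```

For example, `plan_prefetches([40, 72], ["first", "fourth"])` assigns the target of the first branch instruction to the first sub-region and the target of the third branch instruction to the fourth sub-region.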
At least one embodiment of the present disclosure also provides an instruction fetch device for fetching instructions for a thread bundle. Fig. 8 is a schematic block diagram of an instruction fetch device provided in at least one embodiment of the present disclosure.
For example, as shown in fig. 8, the instruction fetch device 80 may include a memory 805 and a processor 810. The memory 805 is configured to store, in a non-transitory manner, computer-executable instructions adapted to be executed by the processor 810; the processor 810 is configured to execute the computer-executable instructions, which, when executed by the processor 810, may cause the processor 810 to perform one or more steps of the instruction fetch method according to any embodiment of the present disclosure. For the specific implementation of each step of the instruction fetch method and related explanations, reference may be made to the above embodiments of the instruction fetch method; details are not repeated here.
It should be noted that the components of the instruction fetch device 80 shown in fig. 8 are exemplary only and not limiting, and that the instruction fetch device 80 may have other components as required by the practical application.
For example, the processor 810 and the memory 805 may communicate with each other directly or indirectly.
For example, the processor 810 and the memory 805 may communicate over a network connection. The network may include a wireless network, a wired network, and/or any combination of wireless and wired networks; the present disclosure does not limit the type or function of the network. For another example, the processor 810 and the memory 805 may also communicate via a bus connection. The bus may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc.
For example, the processor 810 and the memory 805 may be disposed on a server side (or cloud side) or on a client side (e.g., a mobile device such as a mobile phone).
For example, the processor 810 may be a central processing unit (CPU), a tensor processing unit (TPU), a graphics processing unit (GPU), or another device having data processing capabilities and/or instruction execution capabilities, and may control other components in the instruction fetch device 80 to perform desired functions. The central processing unit (CPU) may be of an X86 or ARM architecture, etc.
For example, the memory 805 may comprise any combination of one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache) and the like. Non-volatile memory may include, for example, read-only memory (ROM), hard disk, erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), USB memory, flash memory, and the like. One or more computer-executable instructions may be stored on the computer-readable storage medium, and the processor 810 may execute the computer-executable instructions to implement the various functions of the instruction fetch device 80. Various applications and various data, as well as various data used and/or generated by the applications, etc., may also be stored in the memory 805.
It should be noted that the instruction fetch device 80 can achieve technical effects similar to those of the aforementioned instruction fetch method, which are not repeated here.
At least one embodiment of the present disclosure also provides a computer-readable storage medium. Fig. 9 is a schematic diagram of a computer-readable storage medium provided in at least one embodiment of the present disclosure.
For example, as shown in fig. 9, one or more computer-executable instructions 1001 may be stored in a non-transitory manner on the computer-readable storage medium 1000.
For example, the computer-executable instructions 1001, when executed by a computer, may perform one or more steps of the instruction fetch method according to any embodiment of the present disclosure. The computer-readable storage medium 1000 can achieve technical effects similar to those of the aforementioned instruction fetch method, which are not repeated here.
For example, the computer-readable storage medium 1000 may be applied to the above-mentioned instruction fetch device 80; for example, it may be the memory 805 in the instruction fetch device 80.
For example, the computer-readable storage medium 1000 may be a non-transitory computer-readable storage medium.
For example, for a description of the computer-readable storage medium 1000, reference may be made to the description of the memory 805 in the embodiment of the instruction fetch device 80, which is not repeated here.
For the purposes of this disclosure, the following points are also noted:
(1) The drawings of the embodiments of the present disclosure relate only to the structures related to the embodiments of the present disclosure, and other structures may refer to the general design.
(2) In the drawings for describing embodiments of the present invention, thicknesses and dimensions of layers or structures are exaggerated for clarity. It will be understood that when an element such as a layer, film, region or substrate is referred to as being "on" or "under" another element, it can be "directly on" or "under" the other element or intervening elements may be present.
(3) The embodiments of the present disclosure and features in the embodiments may be combined with each other to arrive at a new embodiment without conflict.
The foregoing is merely a specific embodiment of the disclosure, but the scope of the disclosure is not limited thereto and should be determined by the scope of the claims.

Claims (15)

1. An instruction fetch method for fetching instructions for a thread bundle, wherein the thread bundle corresponds to an instruction temporary storage area, and the instruction temporary storage area comprises a first sub-region,
the instruction fetch method comprising:
acquiring a first set of instructions;
pre-parsing the first set of instructions to determine a first branch instruction, wherein the first set of instructions includes the first branch instruction;
in response to the first sub-region being empty or storing an invalid instruction, fetching instructions from a target address of the first branch instruction to acquire a second set of instructions;
storing the second set of instructions to the first sub-region.
2. The instruction fetch method of claim 1, wherein the instruction temporary storage area further comprises a second sub-region and a third sub-region, the first set of instructions being stored to the second sub-region or the third sub-region,
the instruction fetch method further comprising:
determining an instruction execution result obtained by executing the first branch instruction;
in response to the instruction execution result indicating that a jump is required, jumping to execute the second set of instructions stored in the first sub-region.
3. The instruction fetch method of claim 2, further comprising:
after jumping to execute the second set of instructions stored in the first sub-region, fetching a third set of instructions, wherein addresses of instructions in the second set of instructions and addresses of instructions in the third set of instructions are consecutive;
storing the third set of instructions to the second sub-region or the third sub-region.
4. The instruction fetch method of claim 3, further comprising:
after storing the third set of instructions to the second sub-region or the third sub-region, in response to the second set of instructions not including a branch instruction and the first sub-region storing at least one valid unexecuted instruction, aborting the instruction fetch operation.
5. The instruction fetch method of claim 2, further comprising:
after jumping to execute the second set of instructions stored in the first sub-region, pre-parsing the second set of instructions to determine a second branch instruction, wherein the second set of instructions includes the second branch instruction;
fetching instructions from a target address of the second branch instruction to acquire a fifth set of instructions;
storing the fifth set of instructions to the second sub-region or the third sub-region.
6. The instruction fetch method of claim 2, further comprising:
after jumping to execute the second set of instructions stored in the first sub-region, setting all unexecuted instructions in the second sub-region and the third sub-region as invalid instructions.
7. The instruction fetch method of claim 2, further comprising:
in response to the instruction execution result indicating that no jump is required, executing the instructions after the first branch instruction, and setting all unexecuted instructions in the first sub-region as invalid instructions.
8. The instruction fetch method of claim 1, wherein the instruction temporary storage area further comprises a second sub-region,
acquiring a first set of instructions comprises: in response to the second sub-region being empty or storing an invalid instruction, acquiring the first set of instructions,
the instruction fetch method further comprising:
storing the first set of instructions to the second sub-region.
9. The instruction fetch method of claim 1, wherein the instruction temporary storage area further comprises a third sub-region,
the instruction fetch method further comprising:
in response to the third sub-region being empty or storing an invalid instruction, acquiring a fourth set of instructions, wherein addresses of instructions in the fourth set of instructions and addresses of instructions in the first set of instructions are consecutive;
storing the fourth set of instructions to the third sub-region.
10. The instruction fetch method of claim 9, wherein the number of instructions in the first set of instructions is the same as the number of instructions in the fourth set of instructions.
11. The instruction fetch method of any one of claims 1 to 10, wherein pre-parsing the first set of instructions to determine a first branch instruction comprises:
pre-parsing all instructions in the first set of instructions;
determining that the first group of instructions comprises at least one branch instruction, and taking the branch instruction which is executed first according to the execution sequence in the at least one branch instruction as the first branch instruction.
12. The instruction fetch method of any one of claims 1 to 10, wherein the instruction temporary storage area further comprises a fourth sub-region,
the instruction fetch method further comprising:
pre-parsing the first set of instructions to determine a third branch instruction, wherein the first set of instructions further includes the third branch instruction;
in response to the fourth sub-region being empty or storing an invalid instruction, fetching instructions from a target address of the third branch instruction to acquire a sixth set of instructions;
storing the sixth set of instructions to the fourth sub-region.
13. An instruction fetch device for fetching instructions for a thread bundle, wherein the thread bundle corresponds to an instruction temporary storage area, and the instruction temporary storage area comprises a first sub-region,
the instruction fetch device comprising:
an instruction fetch unit configured to acquire a first set of instructions;
a branch instruction determination unit configured to pre-parse the first set of instructions to determine a first branch instruction, wherein the first set of instructions includes the first branch instruction;
wherein the instruction fetch unit is further configured to, in response to the first sub-region being empty or storing an invalid instruction, fetch instructions from a target address of the first branch instruction to acquire a second set of instructions, and to store the second set of instructions to the first sub-region.
14. An instruction fetch device, comprising a memory and a processor,
wherein the memory stores computer-executable instructions adapted to be executed by the processor, and the computer-executable instructions, when executed by the processor, perform one or more steps of the instruction fetch method of any one of claims 1 to 12.
15. A computer-readable storage medium storing computer-executable instructions in a non-transitory manner,
wherein the computer-executable instructions, when executed by a computer, perform one or more steps of the instruction fetch method of any one of claims 1 to 12.
CN202211500583.5A 2022-11-28 2022-11-28 Finger picking method, finger picking device and storage medium Pending CN116069394A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211500583.5A CN116069394A (en) 2022-11-28 2022-11-28 Finger picking method, finger picking device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211500583.5A CN116069394A (en) 2022-11-28 2022-11-28 Finger picking method, finger picking device and storage medium

Publications (1)

Publication Number Publication Date
CN116069394A true CN116069394A (en) 2023-05-05

Family

ID=86177767

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211500583.5A Pending CN116069394A (en) 2022-11-28 2022-11-28 Finger picking method, finger picking device and storage medium

Country Status (1)

Country Link
CN (1) CN116069394A (en)

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination