WO2022100054A1 - Method and device for scheduling out-of-order queues and judging queue cancellation items - Google Patents

Method and device for scheduling out-of-order queues and judging queue cancellation items

Info

Publication number
WO2022100054A1
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
pointer
read pointer
queue
cache
Prior art date
Application number
PCT/CN2021/095138
Other languages
English (en)
French (fr)
Inventor
郇丹丹
Original Assignee
北京微核芯科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京微核芯科技有限公司
Priority to EP21810884.3A (EP4027236A4)
Priority to US17/530,192 (US11829768B2)
Publication of WO2022100054A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3854 Instruction completion, e.g. retiring, committing or graduating
    • G06F9/3856 Reordering of instructions, e.g. using queues or age tags
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30021 Compare instructions, e.g. Greater-Than, Equal-To, MINMAX
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30181 Instruction operation extension or modification

Definitions

  • The present application relates to the technical field of out-of-order processors, and in particular to a method and device for scheduling out-of-order queues and judging queue cancellation items.
  • While the instructions of an out-of-order processor are in an out-of-order queue, they do not flow through the processor in the order specified by the program. As long as its execution conditions are met, a later instruction can be executed ahead of earlier instructions, which improves instruction execution speed.
  • The earliest instruction in the program is generally selected for execution, that is, an oldest-first strategy is used for scheduling arbitration, so the relative order of the instructions must be judged. The reasoning is that the older an instruction is, the more instructions depend on it, so executing the oldest instruction first effectively improves the parallelism of instruction execution. In addition, the oldest instruction still occupies hardware resources in the processor, including other out-of-order queues, the reordering cache, the write buffer (Store Buffer) and other components; the earlier these old instructions are executed, the earlier those hardware resources are released for use by subsequent instructions.
  • The out-of-order queues in an out-of-order processor include the issue queue, the access queues of the caches at each level, the cache miss queue, and the coherence request queue.
  • On a branch misprediction cancellation, the processor must determine which uncommitted instructions in the pipeline follow the mispredicted branch and therefore need to be cancelled together with it. In addition to the position recorded in the ROB, the position information in the B-ROB can also be used to determine which instructions follow the mispredicted branch and need to be cancelled.
  • The present application aims to solve, at least to some extent, one of the technical problems in the related art.
  • The first object of the present application is to propose a method for scheduling out-of-order queues and judging queue cancellation items that effectively expresses the real age of an instruction and, because XOR gates are used in the judgment, reduces the number of comparators required. This reduces the complexity and delay of instruction age judgment, improves the performance of the out-of-order processor, reduces power consumption, and saves area.
  • The second object of the present application is to propose a device for scheduling out-of-order queues and judging queue cancellation items.
  • The third object of the present application is to propose an electronic device.
  • The fourth object of the present application is to propose a non-transitory computer-readable storage medium.
  • A first aspect of the present application provides a method for scheduling out-of-order queues and judging queue cancellation items, comprising the following steps: adding a highest bit in front of the address of the reordering cache (ROB) or the transfer reordering cache (B-ROB); and XORing the added highest bit of the read pointer of the reordering cache or transfer reordering cache with the added highest bit of each of the two reordering cache or transfer reordering cache addresses to be compared, then comparing the sizes of the addresses obtained after the XOR, as the age information of the two instructions, to determine which instruction is older.
  • The age information of an instruction is obtained by XORing the added highest bit of the read pointer of the reordering cache or transfer reordering cache with the added highest bit of the instruction's address, and instruction ages are compared using this information. Because XOR gates are used, fewer comparators are needed in the judgment, which reduces the complexity and delay of instruction age judgment, improves the performance of the out-of-order processor, reduces power consumption, and saves area. This solves the problem that, once the write pointer has flipped, judging age directly from the reordering cache or transfer reordering cache address confuses the instruction ages.
  • The method further includes: when scheduling the out-of-order queue, selecting the valid and oldest instruction in the queue for execution.
  • When judging the cancellation items of the queue, the instruction in the queue that causes the cancellation and the instructions newer than it are selected for cancellation.
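  • As a minimal sketch of oldest-first selection, the following Python model picks the valid entry with the smallest age value. It assumes pointers advance in the direction of increasing address, and the names `QueueEntry`, `age_key`, `select_oldest` and `index_bits` (the number of address bits below the added highest bit) are hypothetical, not taken from the application.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class QueueEntry:
    valid: bool   # entry holds an instruction that may be selected
    rob_id: int   # reordering cache address, including the added highest bit


def age_key(rob_id: int, rob_head: int, index_bits: int) -> int:
    """XOR the added highest bit with that of the read pointer; keep the low bits."""
    msb_mask = 1 << index_bits
    return ((rob_id ^ rob_head) & msb_mask) | (rob_id & (msb_mask - 1))


def select_oldest(entries: Sequence[QueueEntry], rob_head: int,
                  index_bits: int) -> Optional[int]:
    """Return the index of the valid entry with the smallest age key, or None."""
    best: Optional[int] = None
    best_key = 0
    for i, entry in enumerate(entries):
        if not entry.valid:
            continue
        key = age_key(entry.rob_id, rob_head, index_bits)
        if best is None or key < best_key:
            best, best_key = i, key
    return best
```

  • In hardware the same choice would typically be made by a comparison tree rather than a loop; the loop here only models the selection result.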
  • Reading of the reordering cache or the transfer reordering cache is controlled by a read pointer. When the queue is not empty, the read pointer points to the first item to be read next. The reordering cache or transfer reordering cache is an ordered first-in, first-out (FIFO) queue, so the item pointed to by the read pointer is the oldest item in it, that is, the item corresponding to the oldest instruction. When the queue is empty, the read pointer and the write pointer point to the same empty item, and their added highest bits have the same value.
  • Writing of the reordering cache or the transfer reordering cache is controlled by a write pointer, which allocates the position of each instruction in the reordering cache or the transfer reordering cache. When the queue is not full, the write pointer points to the first empty item to be written next; when the queue is full, the write pointer and the read pointer point to the same item, and their added highest bits have opposite values.
  • In the reset state, the write pointer and the read pointer of the reordering cache or the transfer reordering cache both point to the reset item. When a new instruction is written into the reordering cache or the transfer reordering cache, the write pointer moves to the next item to be written; when an instruction exits, the read pointer moves to the next item to be read.
  • The reset item is the item that both the write pointer and the read pointer point to at reset. It may be any item in the queue, and at reset the added highest bits of the read pointer and the write pointer are the same.
  • The next item to be read pointed to by the read pointer, or the next item to be written pointed to by the write pointer, lies either in the direction of increasing address or in the direction of decreasing address.
  • Pointing the read pointer and the write pointer to the next item in the direction of increasing address includes: when the low bits of the write pointer or the read pointer reach the first preset number of items of the queue, the low bits of the next item pointed to by the write pointer or the read pointer are encoded starting from 0, wherein the first preset number of items is determined by the maximum number of items; and when the low bits of the write pointer or the read pointer reach the first preset number of items of the queue, the added highest bit of the write pointer or the read pointer flips.
  • Pointing the read pointer and the write pointer to the next item in the direction of increasing address includes: when the low bits of the write pointer or the read pointer reach the first preset number of items of the queue, the next item pointed to by the write pointer or the read pointer continues to be encoded until the power-of-2 item is reached, after which the low bits are encoded again from 0; and when the low bits of the write pointer or the read pointer reach the second preset number of items, the added highest bit of the write pointer or the read pointer flips, wherein the second preset number of items is determined by the power of 2.
  • Pointing the read pointer and the write pointer to the next item in the direction of decreasing address includes: when the low bits of the write pointer or the read pointer decrease to 0, the low bits of the next item pointed to by the write pointer or the read pointer are encoded starting from the first preset number of items of the queue; and when the low bits of the write pointer or the read pointer decrease to 0, the added highest bit of the write pointer or the read pointer flips.
  • Pointing the read pointer and the write pointer to the next item in the direction of decreasing address includes: when the low bits of the write pointer or the read pointer decrease to 0, the low bits of the next item pointed to by the write pointer or the read pointer are encoded again from the second preset number of items, which is the value closest to and larger than the number of queue items; and when the low bits of the write pointer or the read pointer decrease to 0, the added highest bit of the write pointer or the read pointer flips.
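  • The first increasing-address variant can be sketched in Python as follows. This is an illustrative model only: it assumes the "first preset number of items" is the last index of the queue (`queue_size - 1`), and the names `advance`, `queue_size` and `index_bits` are hypothetical.

```python
def advance(ptr: int, queue_size: int, index_bits: int) -> int:
    """Move a read or write pointer to the next item in the increasing direction.

    ptr holds the added highest bit at position index_bits and the item
    index in the bits below it.
    """
    low_mask = (1 << index_bits) - 1
    msb = (ptr >> index_bits) & 1
    low = ptr & low_mask
    if low == queue_size - 1:
        # Last item reached (assumed meaning of the first preset number):
        # restart the low bits from 0 and flip the added highest bit.
        low = 0
        msb ^= 1
    else:
        low += 1
    return (msb << index_bits) | low
```

  • For a 6-item queue with 3 low bits, `advance(0b0101, 6, 3)` gives `0b1000`: the index wraps from 5 to 0 and the added highest bit flips from 0 to 1.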
  • When the pointers advance in the direction of increasing address, the smaller the age value, the older the instruction, and the larger the age value, the newer the instruction.
  • When the pointers advance in the direction of decreasing address, the larger the age value, the older the instruction, and the smaller the age value, the newer the instruction.
  • The highest bit added in front of the address of the reordering cache and the transfer reordering cache is value A or value B, wherein value A and value B are both 1-bit binary values and are opposite to each other.
  • When neither the write pointer nor the read pointer has flipped after instructions are written into and exit from the reordering cache or the transfer reordering cache: the instructions between the read pointer and the item preceding the write pointer are valid; while the write pointer has not flipped, the added highest bits of the addresses of the instructions allocated by the write pointer are all value A; and while the read pointer has not flipped, the added highest bit of the read pointer is value A.
  • When the write pointer has flipped and the read pointer has not flipped after instructions are written into and exit from the reordering cache or the transfer reordering cache: the added highest bit of the address of a written instruction is value B, and the added highest bit of the read pointer is value A.
  • When both the write pointer and the read pointer have flipped after instructions are written into and exit from the reordering cache or the transfer reordering cache: the added highest bit of the address of a written instruction is value B, and the added highest bit of the read pointer is value B.
  • When the write pointer has flipped again and the read pointer has not flipped again after instructions are written into and exit from the reordering cache or the transfer reordering cache: the added highest bit of the address of a written instruction is value A, and the added highest bit of the read pointer is value B.
  • When both the write pointer and the read pointer have flipped again after instructions are written into and exit from the reordering cache or the transfer reordering cache: the added highest bits of both the write pointer and the read pointer return to value A.
  • When an instruction exception causes cancellation, the age information of each instruction in the pipeline is compared with the age information of the instruction that raised the exception; if the instruction in the pipeline is newer, it is cancelled.
  • When a branch misprediction causes cancellation, each instruction in the pipeline carries a reordering cache address or a transfer reordering cache address. The age information obtained by XORing the added highest bit of the pipeline instruction's address with the added highest bit of the read pointer of the reordering cache or transfer reordering cache is compared with the age information obtained in the same way for the instruction cancelled due to the branch misprediction; if the instruction in the pipeline is newer, it is cancelled.
  • The method further includes: when re-execution caused by a memory access dependence occurs, rolling back the load instruction and all instructions newer than the load instruction, and re-executing all the rolled-back instructions.
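  • The cancellation decisions above reduce to comparing age values against the instruction that causes the cancellation. The sketch below is a hedged Python model that assumes increasing-address pointers; `cancel_mask`, `include_cause` and `index_bits` are hypothetical names. `include_cause=True` models the rollback case, where the causing instruction is also cancelled, and `include_cause=False` cancels only strictly newer instructions.

```python
from typing import List, Sequence


def age_key(rob_id: int, rob_head: int, index_bits: int) -> int:
    """XOR the added highest bit with that of the read pointer; keep the low bits."""
    msb_mask = 1 << index_bits
    return ((rob_id ^ rob_head) & msb_mask) | (rob_id & (msb_mask - 1))


def cancel_mask(rob_ids: Sequence[int], cause_id: int, rob_head: int,
                index_bits: int, include_cause: bool = True) -> List[bool]:
    """Return, per queue item, whether it must be cancelled."""
    cause_key = age_key(cause_id, rob_head, index_bits)
    mask = []
    for rob_id in rob_ids:
        key = age_key(rob_id, rob_head, index_bits)
        # A larger key means a newer instruction in the increasing direction.
        mask.append(key >= cause_key if include_cause else key > cause_key)
    return mask
```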
  • A second aspect of the present application provides a device for scheduling out-of-order queues and judging queue cancellation items, including: a bit-adding module for adding a highest bit in front of the address of the reordering cache and the transfer reordering cache; and a comparison module for XORing the added highest bit of the read pointer of the reordering cache or transfer reordering cache with the added highest bit of each of the two reordering cache or transfer reordering cache addresses to be compared, and comparing the sizes of the addresses obtained after the XOR, as the age information of the two instructions, to determine which instruction is older.
  • The age information of an instruction is obtained by XORing the added highest bit of the read pointer of the reordering cache or transfer reordering cache with the added highest bit of the instruction's address, and instruction ages are compared using this information. Because XOR gates are used, fewer comparators are needed in the judgment, which reduces the complexity and delay of instruction age judgment, improves the performance of the out-of-order processor, reduces power consumption, and saves area. This solves the problem that, once the write pointer has flipped, judging age directly from the reordering cache or transfer reordering cache address confuses the instruction ages.
  • The device further includes a scheduling module which, when scheduling the out-of-order queue, selects the valid and oldest instruction in the queue for execution.
  • The device further includes a cancellation module which, when judging the cancellation items in the queue, selects for cancellation the instruction in the queue that causes the cancellation and the instructions newer than it.
  • A third aspect of the present application provides an electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being configured to execute the method for scheduling out-of-order queues and judging queue cancellation items described in the above embodiments.
  • A fourth aspect of the present application provides a non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to execute the method described in the above embodiments.
  • FIG. 1 is a schematic diagram of the read pointer, write pointer and address changes of the reordering cache;
  • FIG. 2 is a schematic flowchart of a method for scheduling out-of-order queues and judging queue cancellation items according to an embodiment of the present application;
  • FIG. 3 is a schematic structural diagram of an apparatus for comparing instruction ages according to an embodiment of the present application;
  • FIG. 4 is a schematic diagram of a reordering cache at reset according to an embodiment of the present application;
  • FIG. 5 is a schematic diagram of a reordering cache in which neither the read pointer nor the write pointer has flipped, according to an embodiment of the present application;
  • FIG. 6 is a schematic diagram of a reordering cache in which the write pointer has flipped and the read pointer has not flipped, according to an embodiment of the present application;
  • FIG. 7 is a schematic diagram of a reordering cache after both the read pointer and the write pointer have flipped, according to an embodiment of the present application;
  • FIG. 8 is a schematic diagram of a reordering cache in which the write pointer has flipped again and the read pointer has not flipped again, according to an embodiment of the present application;
  • FIG. 9 is a schematic diagram of a reordering cache at reset according to an example of the present application;
  • FIG. 10 is a schematic diagram of a reordering cache in which neither the read pointer nor the write pointer has flipped, according to an example of the present application;
  • FIG. 11 is a schematic diagram of a reordering cache in which the write pointer has flipped and the read pointer has not flipped, according to an example of the present application;
  • FIG. 12 is a schematic diagram of a reordering cache after both the read pointer and the write pointer have flipped, according to an example of the present application;
  • FIG. 13 is a schematic diagram of a reordering cache in which the write pointer has flipped again and the read pointer has not flipped again, according to an example of the present application;
  • FIG. 14 is a schematic diagram of an arbitration circuit by which an out-of-order queue selects an instruction for execution, according to an embodiment of the present application;
  • FIG. 15 is a schematic diagram of generating age values from instruction reordering cache addresses for out-of-order queue arbitration, according to an embodiment of the present application;
  • FIG. 16 is a schematic diagram of an instruction exception cancellation judgment according to an embodiment of the present application;
  • FIG. 17 is a schematic diagram of an instruction transfer cancellation judgment according to an embodiment of the present application;
  • FIG. 18 is a schematic diagram of judging the items cancelled due to rollback in a fixed-point dispatch queue according to an embodiment of the present application;
  • FIG. 19 is a schematic block diagram of an apparatus for scheduling out-of-order queues and judging queue cancellation items according to an embodiment of the present application.
  • The ROB or B-ROB is essentially a FIFO queue, so directly using the address of an instruction in the ROB or B-ROB often cannot express the real age of the instruction.
  • In the reset state, which is the initial state, both the write pointer and the read pointer of the ROB point to address 0.
  • After an instruction is written into the ROB, the write pointer increases.
  • Figure 1(a) shows the ROB after seven instructions have been written. At this point the write pointer and the read pointer are on the same "face" and neither has flipped, so in this state an instruction with a smaller ROB address is older.
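  • To illustrate why the raw address stops reflecting age once the write pointer wraps, the following Python snippet models a hypothetical 8-item ROB without the added highest bit. The queue size, start address and names are assumptions chosen for illustration.

```python
QUEUE_SIZE = 8  # hypothetical ROB size


def allocate(start: int, count: int, size: int = QUEUE_SIZE) -> list:
    """Raw ROB addresses handed out to `count` instructions in program order."""
    return [(start + i) % size for i in range(count)]


# Five instructions written starting at address 5: the write pointer wraps.
program_order = allocate(5, 5)

# Ordering by raw address puts the two newest instructions (0 and 1) first.
sorted_by_address = sorted(program_order)
```

  • Here the program order of the addresses is 5, 6, 7, 0, 1, while sorting by raw address yields 0, 1, 5, 6, 7, so a plain address comparison would treat the newest instructions as the oldest.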
  • FIG. 2 is a schematic flowchart of a method for scheduling out-of-order queues and judging queue cancellation items according to an embodiment of the present application.
  • the method for scheduling out-of-order queues and judging queue cancellations includes the following steps:
  • In step S101, a highest bit is added in front of the address of the reorder buffer or the branch reorder buffer.
  • The highest bit added in front of the address of the reordering cache and the transfer reordering cache is value A or value B, wherein value A and value B are both 1-bit binary values and are opposite to each other.
  • Value A and value B can each be either 0 or 1; that is, if A is 0, then B is 1, and if A is 1, then B is 0.
  • In step S102, the added highest bit of the read pointer of the reordering cache or transfer reordering cache is XORed with the added highest bit of each of the two reordering cache or transfer reordering cache addresses to be compared, and the sizes of the addresses obtained after the XOR are compared, as the age information of the two instructions, to determine which instruction is older.
  • The comparison device shown in FIG. 3 can be used to compare instruction ages; that is, the comparison is realized by adding an XOR gate. Using the XOR gate reduces the number of comparators needed, which reduces the complexity and delay of instruction age judgment, improves the performance of the out-of-order processor, reduces power consumption, and saves area.
  • roqid0_cmp = {roqid0[highestbit] ^ roqhead[highestbit], roqid0[highestbit-1:0]};
  • roqid1_cmp = {roqid1[highestbit] ^ roqhead[highestbit], roqid1[highestbit-1:0]}.
  • The address sizes of roqid0_cmp and roqid1_cmp are compared to determine which of the two instructions is older.
  • Here, roqid0[highestbit] and roqid1[highestbit] are the added highest bits of the two reordering cache or transfer reordering cache addresses to be compared; roqhead[highestbit] is the added highest bit of the read pointer of the reordering cache or transfer reordering cache; and roqid0[highestbit-1:0] and roqid1[highestbit-1:0] are bits 0 through highestbit-1 of the two addresses to be compared.
  • roqid0_cmp and roqid1_cmp are the addresses obtained after the XOR: roqid0_cmp[highestbit] and roqid1_cmp[highestbit] are the highest bits obtained after the XOR, while roqid0_cmp[highestbit-1:0] and roqid1_cmp[highestbit-1:0] are the same as roqid0[highestbit-1:0] and roqid1[highestbit-1:0].
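  • The two formulas can be modelled in Python as below. This is a sketch rather than the hardware itself: `index_bits` plays the role of highestbit (the bit position of the added highest bit), and `is_older` assumes the increasing-address direction, where a smaller compared address means an older instruction. The function names are hypothetical.

```python
def age_key(roqid: int, roqhead: int, index_bits: int) -> int:
    """Model of {roqid[highestbit] ^ roqhead[highestbit], roqid[highestbit-1:0]}."""
    msb_mask = 1 << index_bits           # position of the added highest bit
    low_mask = msb_mask - 1              # bits highestbit-1 down to 0
    xored_msb = (roqid ^ roqhead) & msb_mask
    return xored_msb | (roqid & low_mask)


def is_older(roqid0: int, roqid1: int, roqhead: int, index_bits: int) -> bool:
    """True if instruction 0 is older than instruction 1 (increasing direction)."""
    return age_key(roqid0, roqhead, index_bits) < age_key(roqid1, roqhead, index_bits)
```

  • For an 8-item queue (`index_bits = 3`) with the read pointer at `0b0110`, the instruction at `0b0111` is older than the one at `0b1001`, which was allocated after the write pointer flipped, even though its low address bits are larger.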
  • The reordering cache addresses of the two instructions whose ages are to be compared are denoted roqid0 and roqid1, and the bit position of the added highest bit of the reordering cache address is denoted highestbit. The added highest bit of the addresses allocated by the write pointer is 0 at initialization, becomes 1 for addresses allocated after the write pointer flips, and becomes 0 again once the write pointer flips again.
  • The address obtained by XORing the added highest bit of an instruction's address in the reordering cache with the added highest bit of the read pointer of the reordering cache is used as the age information of the instruction for size comparison.
  • The addresses used for size comparison are denoted roqid0_cmp and roqid1_cmp, and are calculated as follows:
  • roqid0_cmp = {roqid0[highestbit] ^ roqhead[highestbit], roqid0[highestbit-1:0]};
  • roqid1_cmp = {roqid1[highestbit] ^ roqhead[highestbit], roqid1[highestbit-1:0]}.
  • Reading of the reorder buffer or the transfer reorder buffer is controlled by a read pointer. When the queue is not empty, the read pointer points to the first item to be read next. The reorder buffer or transfer reordering cache is an ordered first-in, first-out queue, so the item pointed to by the read pointer is the oldest item in it, that is, the item corresponding to the oldest instruction. When the queue is empty, the read pointer and the write pointer point to the same empty item, and their added highest bits have the same value.
  • The read pointer of the reordering cache or the transfer reordering cache is denoted roqhead.
  • The read pointer is also known as the head pointer.
  • Writing of the reordering cache or the transfer reordering cache is controlled by a write pointer, which allocates the position of each instruction in the reordering cache or the transfer reordering cache.
  • When the queue is not full, the write pointer points to the first empty item to be written next; when the queue is full, the write pointer and the read pointer point to the same item, and their added highest bits have opposite values.
  • The write pointer of the reordering cache is denoted roqtail, and the queue entries from roqhead to the entry preceding roqtail are the valid entries.
  • The write pointer is also known as the tail pointer.
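  • The empty and full conditions described above can be written as two small predicates. The following Python sketch uses hypothetical names (`is_empty`, `is_full`, `index_bits`) and assumes each pointer stores the added highest bit at position `index_bits` above the item index.

```python
def is_empty(roqhead: int, roqtail: int, index_bits: int) -> bool:
    """Empty: both pointers name the same item and their added highest bits match."""
    pointer_mask = (1 << (index_bits + 1)) - 1
    return (roqhead & pointer_mask) == (roqtail & pointer_mask)


def is_full(roqhead: int, roqtail: int, index_bits: int) -> bool:
    """Full: both pointers name the same item but their added highest bits differ."""
    low_mask = (1 << index_bits) - 1
    same_item = (roqhead & low_mask) == (roqtail & low_mask)
    msb_differs = (((roqhead ^ roqtail) >> index_bits) & 1) == 1
    return same_item and msb_differs
```

  • With 3 index bits, head `0b0011` and tail `0b0011` describe an empty queue, while head `0b0011` and tail `0b1011` describe a full one.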
  • In the reset state, the write pointer and the read pointer of the reordering cache or the transfer reordering cache both point to the reset item. When a new instruction is written into the reordering cache or the transfer reordering cache, the write pointer moves to the next item to be written; when an instruction exits, the read pointer moves to the next item to be read.
  • The reset item is the item that both the write pointer and the read pointer point to at reset. It may be any item in the queue, and at reset the added highest bits of the read pointer and the write pointer are the same.
  • The next item to be read pointed to by the read pointer, or the next item to be written pointed to by the write pointer, lies either in the direction of increasing address or in the direction of decreasing address.
  • When the pointers advance in the direction of increasing address, the smaller the age value, the older the instruction, and the larger the age value, the newer the instruction.
  • When the pointers advance in the direction of decreasing address, the larger the age value, the older the instruction, and the smaller the age value, the newer the instruction.
  • In the reset state, i.e. the initial state, both the write pointer and the read pointer of the reordering cache point to address 0.
  • After an instruction is written, the write pointer increases, and after an instruction exits, the read pointer increases.
  • When neither the write pointer nor the read pointer has flipped after instructions are written into and exit from the reordering cache or the transfer reordering cache: the instructions between the read pointer and the item preceding the write pointer are valid; while the write pointer has not flipped, the added highest bit of the address of each instruction allocated by the write pointer is value A; and while the read pointer has not flipped, the added highest bit of the read pointer is value A.
  • Figure 5 shows the situation in which neither the write pointer nor the read pointer has flipped after instructions have been written into and have exited from the ROB.
  • The instructions between roqhead and the item preceding roqtail are valid. Because the write pointer has not flipped, the added highest bit of the address of each instruction allocated by the write pointer is 0; because the read pointer has not flipped, the added highest bit of roqhead is 0. Therefore, after the added highest bit of roqhead is XORed with the added highest bit of an instruction's address in the ROB, the address is unchanged; that is, the instruction with the smaller address, including the added highest bit, is older.
  • When the write pointer has flipped and the read pointer has not flipped after instructions are written into and exit from the reordering cache or the transfer reordering cache: the added highest bit of the address of a written instruction is value B, and the added highest bit of the read pointer is value A.
  • Figure 6 shows the situation in which the write pointer has flipped and the read pointer has not flipped after instructions have been written into and have exited from the ROB.
  • Because the write pointer has flipped, the added highest bit of the address of each instruction written after the flip is 1; because the read pointer has not flipped, the added highest bit of roqhead is 0. Therefore, after the added highest bit of roqhead is XORed with the added highest bit of an instruction's address in the ROB, the address does not change.
  • The instructions in the dashed box in Figure 6, between roqhead and roqtail, are valid, and the instruction with the smaller address, including the added highest bit, is older.
  • When both the write pointer and the read pointer have flipped after instructions are written into and exit from the reordering cache or the transfer reordering cache: the added highest bit of the address of a written instruction is value B, and the added highest bit of the read pointer is value B.
  • Figure 7 shows the situation in which both the write pointer and the read pointer have flipped after instructions have been written into and have exited from the ROB.
  • The added highest bit of the address of each instruction written after the write pointer flipped is 1; because the read pointer has flipped, the added highest bit of roqhead is 1. Therefore, after the added highest bit of roqhead is XORed with the added highest bit of an instruction's address in the ROB, the highest bit of the address is the inverse of its original value.
  • When the write pointer has flipped again and the read pointer has not flipped again after instructions are written into and exit from the reordering cache or the transfer reordering cache: the added highest bit of the address of a written instruction is value A, and the added highest bit of the read pointer is value B.
  • Figure 8 shows the situation in which the write pointer has flipped again but the read pointer has not flipped again after instructions have been written into and have exited from the ROB.
  • Because the write pointer has flipped again, the added highest bit of the address of each instruction written after this flip is 0; because the read pointer has not flipped again, the added highest bit of roqhead is still 1. Therefore, after the added highest bit of roqhead is XORed with the added highest bit of an instruction's address in the ROB, the highest bit of the address is the inverse of its original value.
  • The instructions between roqhead and roqtail are valid. After the added highest bit of each address is XORed with the highest bit 1 of roqhead, the instruction with the smaller address is older.
  • When both the write pointer and the read pointer have flipped again after instructions are written into and exit from the reordering cache or the transfer reordering cache: the added highest bits of both the write pointer and the read pointer return to value A.
  • When the write pointer has flipped again and the read pointer has also flipped again, the situation is the same as when neither the read pointer nor the write pointer has flipped.
  • when the low-order bits of the write pointer or read pointer reach the first preset entry number of the queue, the low-order bits of the next entry the pointer points to restart from 0;
  • when the first preset entry number of the queue is reached, the added MSB of the write pointer or read pointer is inverted.
  • the first preset entry number is determined by the maximum number of entries; for example, it may be the maximum number of entries minus 1.
  • alternatively, when the low-order bits of the write pointer or read pointer reach the first preset entry number of the queue, the next entry the pointer points to keeps counting upward until a power-of-2 boundary is reached, and only then do the low-order bits restart from 0; when the low-order bits of the write pointer or read pointer reach the second preset entry number, the added MSB of the pointer is inverted.
  • the second preset entry number is determined by the power of 2; for example, it may be the power of 2 minus 1.
  • when the low-order bits of the write pointer or read pointer decrease to 0, the low-order bits of the next entry the pointer points to restart from the first preset entry number of the queue;
  • when the low-order bits decrease to 0, the added MSB of the write pointer or read pointer is inverted.
  • alternatively, when the low-order bits of the write pointer or read pointer decrease to 0, the low-order bits of the next entry the pointer points to restart from the second preset entry number, which is closest to and larger than the number of queue entries; when the low-order bits of the write pointer or read pointer decrease to 0, the added MSB of the pointer is inverted.
  • for the case in which the number of queue entries is not a power of 2,
  • when the write pointer reaches the maximum number of entries of the queue, the next instruction entered is numbered from 0 again; that is, the write pointer wraps.
  • when the read pointer reaches the maximum number of entries of the queue, it wraps as well. After a wrap, the added MSB changes from 0 to 1, or from 1 to 0.
  • for example, with a 6-entry queue, the read and write pointers each count from 0 to 5 and then wrap.
  • Figure 11 shows the case in which the write pointer has wrapped and the read pointer has not, after instructions have been written to and retired from the ROB.
  • the write pointer has wrapped, and the added MSB of addresses written after the wrap is 1.
  • the read pointer has not yet wrapped, and the added MSB of roqhead is 0. XORing the added MSB of an instruction's ROB address with the added MSB of roqhead therefore leaves the address unchanged. The instructions from roqhead to roqtail are valid, as shown in the dashed box in Figure 11, and the instruction with the smaller extended address is the older one.
  • Figure 12 shows the case in which both the write pointer and the read pointer have wrapped, after instructions have been written to and retired from the ROB.
  • the write pointer has wrapped, and the added MSB of addresses written after the wrap is 1.
  • the read pointer has wrapped, and the added MSB of roqhead is 1. XORing the added MSB of an instruction's ROB address with the added MSB of roqhead therefore inverts the top bit of the address.
  • the instructions from roqhead to roqtail are valid, as shown in the dashed box in Figure 12; after the added MSB of each address is XORed with the added MSB of roqhead (1), the instruction with the smaller address is the older one.
  • Figure 13 shows the case in which the write pointer has wrapped again and the read pointer has not, after instructions have been written to and retired from the ROB.
  • the write pointer has wrapped again, and the added MSB of addresses written after this wrap is 0.
  • the read pointer has not wrapped again, and the added MSB of roqhead is 1. XORing the added MSB of an instruction's ROB address with the added MSB of roqhead therefore inverts the top bit of the address. The instructions from roqhead to roqtail are valid, as shown in the dashed box in Figure 13; after the added MSB of each address is XORed with the added MSB of roqhead (1), the instruction with the smaller address is the older one.
  • for a queue whose number of entries is not a power of 2, the processing is the same as that described above for a queue with the closest power-of-2 number of entries.
  • when the write pointer reaches the maximum number of entries of the queue, the next instruction entered keeps counting upward until the power-of-2 boundary is reached and only then is the numbering restarted; that is, the write pointer wraps.
  • the read pointer is handled in the same way as the write pointer. For example, with a 6-entry queue, the read and write pointers count from 0 to 7 and then wrap, count from 8 to 15, and then wrap back to 0; a 6-entry queue is therefore processed in the same way as an 8-entry queue.
  • the valid instruction with the oldest age in the queue is selected for execution.
  • the instruction in the queue that causes the cancellation, and the instructions newer than it, are selected for cancellation.
  • an 8-entry out-of-order queue is taken as an example; each entry of the queue includes the information fields an instruction uses, such as the valid field, the rdy field, the roqid field, and the data field.
  • the valid field records whether the entry is valid (for example, valid = 1 means valid and valid = 0 means invalid); the rdy field records whether the instruction and its data are ready (for example, rdy = 1 means ready, that is, executable, and rdy = 0 means not ready); the roqid field records the reorder buffer address of the instruction, which is used to generate the instruction's age information; the data field records the command, data, and other information the instruction in this entry uses.
  • roqhead denotes the read pointer (head pointer) address of the reorder buffer.
  • when the out-of-order queue executes, the instruction in the valid, ready entry with the oldest age is selected for execution.
  • instruction age is determined by XORing the added MSB of each entry's roqid with the MSB of roqhead to obtain the corresponding age information; the instruction with the smallest age value is the oldest.
  • the first entry with the smallest age value is selected as the entry the out-of-order queue executes.
  • the instructions A, B, C, D, E, F, and G in the queue are all valid, that is, their valid bits are all 1.
  • an instruction whose rdy bit is 1 is ready; that is, instructions B, C, D, F, and H are ready and executable.
  • roqid is the address of the instruction in the reorder buffer.
  • the state of the instructions in the reorder buffer is shown in Figure 15.
  • the added MSB of roqhead is 1, and XORing it with the added MSB of an instruction's roqid yields the instruction's age information, age.
  • the roqid values in Figure 14 include the added MSB.
  • in Embodiment 2 of the present application, when the condition for a cancellation caused by an exception is met, each instruction in the pipeline obtains its age information and compares it with the age information of the instruction that raised the exception; if the instruction in the pipeline is newer, it is cancelled.
  • each instruction in the pipeline carries its reorder buffer address, that is, its roqid.
  • an instruction in the pipeline XORs the added MSB of its own roqid with the MSB of the reorder buffer read pointer (head pointer) roqhead to obtain its age value, and compares it with the age value obtained by XORing the MSB of the roqid of the instruction that raised the exception with the MSB of roqhead; if its own age is newer than that of the instruction that raised the exception, it is cancelled.
  • in Embodiment 3 of the present application, each instruction in the pipeline carries its reorder buffer address or branch reorder buffer address; the age information obtained by XORing the added MSB of a pipeline instruction's address with the added MSB of the read pointer of the reorder buffer or branch reorder buffer is compared with the age information obtained in the same way for the instruction whose branch misprediction caused the cancellation, and if the pipeline instruction is newer, it is cancelled.
  • each instruction in the pipeline carries its branch reorder buffer address, that is, its brqid. An instruction in the pipeline XORs the added MSB of its own brqid with the added MSB of the branch reorder buffer read pointer (head pointer) brqhead to obtain its age value, and compares it with the age value obtained by XORing the added MSB of the brqid of the mispredicted branch that caused the cancellation with the added MSB of brqhead; if its own age is newer than that of the instruction whose branch misprediction caused the cancellation, it is cancelled.
  • in Embodiment 4 of the present application, when re-execution caused by a memory-access dependency occurs, the load instruction and all instructions newer than it are rolled back, and all rolled-back instructions are re-executed.
  • the age information of an instruction is obtained by XORing the added MSB of the read pointer of the reorder buffer or branch reorder buffer with the added MSB of the instruction's buffer address, and the age information of instructions is compared to determine which is older, so that the true age of each instruction is expressed.
  • because XOR gates reduce the number of comparators needed, the complexity and delay of the instruction-age determination are reduced, which improves the performance of the out-of-order processor, lowers power consumption, saves area, and makes out-of-order queue scheduling and queue cancellation decisions more efficient. This solves the problem that, when age is judged directly from reorder buffer or branch reorder buffer addresses, write pointer wrap-around scrambles the apparent ordering of instruction ages.
  • FIG. 19 is a schematic block diagram of an apparatus for scheduling an out-of-order queue and determining cancelled queue entries according to an embodiment of the present application.
  • the apparatus 10 for scheduling an out-of-order queue and determining cancelled queue entries includes a bit-adding module 100 and a comparing module 200.
  • the bit-adding module 100 is used to add a most significant bit in front of the addresses of the reorder buffer and the branch reorder buffer;
  • the comparing module 200 is used to XOR the added MSB of the read pointer of the reorder buffer or branch reorder buffer with the added MSBs of the two reorder buffer or branch reorder buffer addresses to be compared, and to compare the magnitudes of the resulting addresses as the age information of the two instructions in order to determine which instruction is older.
  • the apparatus 10 of the embodiment of the present application further includes a scheduling module which, when scheduling the out-of-order queue, selects the valid instruction with the oldest age in the queue for execution.
  • the apparatus 10 of the embodiment of the present application further includes a cancellation module.
  • the cancellation module, when determining cancelled queue entries, selects for cancellation the instruction in the queue that causes the cancellation and the instructions newer than it.
  • the age information of an instruction is obtained by XORing the added MSB of the read pointer of the reorder buffer or branch reorder buffer with the added MSB of the instruction's buffer address, and the age information of instructions is compared to determine which is older, so that the true age of each instruction is expressed.
  • because XOR gates reduce the number of comparators needed, the complexity of the instruction-age determination is reduced.
  • embodiments of the present application further provide an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are configured to execute the method for scheduling an out-of-order queue and determining cancelled queue entries described above.
  • embodiments of the present application further provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for scheduling an out-of-order queue and determining cancelled queue entries of the above embodiments.
  • the terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or the number of the technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature.
  • "N" means at least two, such as two or three, unless otherwise expressly and specifically defined.
  • a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • examples of computer-readable media include: an electrical connection (electronic device) with one or N wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), fiber-optic devices, and portable compact disc read-only memory (CDROM).
  • the computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it as necessary, and then stored in computer memory.
  • N steps or methods may be implemented in software or firmware that is stored in memory and executed by a suitable instruction execution system.
  • for example, if implemented in hardware, as in another embodiment, they can be implemented by any one or a combination of the following techniques known in the art: discrete logic circuits with logic gates for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and so on.
  • the functional units in the embodiments of the present application may be integrated into one processing module, each unit may exist physically on its own, or two or more units may be integrated into one module.
  • the integrated modules may be implemented in hardware or as software functional modules. If an integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
  • the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

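A method and apparatus for scheduling an out-of-order queue and determining cancelled queue entries. The method includes the following steps: adding a most significant bit in front of the address of a reorder buffer or branch reorder buffer (S101); and XORing the added most significant bit of the read pointer of the reorder buffer or branch reorder buffer with the added most significant bits of the two reorder buffer or branch reorder buffer addresses to be compared, then comparing the magnitudes of the resulting addresses as the age information of the two instructions to determine which instruction is older (S102). The method expresses the true age of each instruction. Because XOR gates reduce the number of comparators needed for the determination, it lowers the complexity and delay of the instruction-age determination, improves the performance of the out-of-order processor, reduces power consumption, and saves area.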

Description

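Method and apparatus for scheduling an out-of-order queue and determining cancelled queue entries
Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. 202011243930.1, entitled "Method and apparatus for scheduling an out-of-order queue and determining cancelled queue entries", filed by 北京微核芯科技有限公司 on November 10, 2020.
Technical Field
This application relates to the technical field of out-of-order processors, and in particular to a method and apparatus for scheduling an out-of-order queue and determining cancelled queue entries.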
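Background
Once the instructions of an out-of-order processor enter an out-of-order queue, they no longer flow through the processor in the order specified by the program. As long as its execution conditions are met, a later instruction may overtake earlier ones and execute first, which speeds up instruction execution.
In scheduling the out-of-order queues of an out-of-order processor, when several instructions in a queue are ready, the instruction that is earliest in program order is generally selected first; that is, arbitration follows an oldest-first policy, which requires determining the order of instructions. The rationale is that the older an instruction is, the more instructions depend on it, so executing the oldest instruction first effectively increases the parallelism with which the processor executes instructions. The oldest instructions also occupy hardware resources in the processor, including other out-of-order queues, the reorder buffer, and the store buffer; the earlier these old instructions execute, the earlier those resources are released for later instructions. The out-of-order queues in an out-of-order processor include the issue queue, the access queues of each cache level, the cache miss queue, the coherence request queue, and so on.
When an out-of-order processor must cancel instructions, for example because of re-execution caused by a branch misprediction or a memory-access dependency, or because of an exception, it has to determine which uncommitted instructions in the pipeline come after the mispredicted branch, the instruction re-executed because of the memory-access dependency, or the instruction that raised the exception, and therefore must be cancelled together with the instruction that caused the cancellation. These instructions are cancelled in each pipeline stage and are either re-executed or rolled back to a specific pipeline stage from which execution restarts.
To identify which instructions in an out-of-order queue are the oldest, the age of each instruction must be known; the age information expresses the order in which instructions entered the pipeline. In an ordinary in-order processor the age information is easy to track, but once instructions are in the out-of-order queues of an out-of-order processor this order is scrambled. One structure in the processor still records all instructions in the order in which they entered the pipeline: the ROB (Reorder Buffer). After renaming, instructions are written into the ROB in program order, so the position of each instruction in the ROB (that is, the address used to index the ROB) can serve as its age information.
When the processor performs a cancellation, the uncommitted instructions in the pipeline that follow the instruction causing the cancellation, and must therefore be cancelled with it, can be identified by comparing position information in the ROB.
In addition, for a cancellation caused by a branch misprediction, the uncommitted instructions that follow the mispredicted branch and must be cancelled with it can be identified not only from the position information recorded in the ROB but also from the position information in the B-ROB (Branch Reorder Buffer).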
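Summary
This application aims to solve, at least to some extent, one of the technical problems in the related art.
To this end, the first objective of this application is to provide a method for scheduling an out-of-order queue and determining cancelled queue entries that expresses the true age of each instruction. Because XOR gates reduce the number of comparators needed for the determination, the method lowers the complexity and delay of the instruction-age determination, improves the performance of the out-of-order processor, reduces power consumption, and saves area.
The second objective of this application is to provide an apparatus for scheduling an out-of-order queue and determining cancelled queue entries.
The third objective of this application is to provide an electronic device.
The fourth objective of this application is to provide a non-transitory computer-readable storage medium.
To achieve the above objectives, a first aspect of this application provides a method for scheduling an out-of-order queue and determining cancelled queue entries, including the following steps: adding a most significant bit (MSB) in front of the address of a reorder buffer or branch reorder buffer; and XORing the added MSB of the read pointer of the reorder buffer or branch reorder buffer with the added MSBs of the two reorder buffer or branch reorder buffer addresses to be compared, then comparing the magnitudes of the resulting addresses as the age information of the two instructions to determine which instruction is older.
According to the method of this application, the age information of an instruction is obtained by XORing the added MSB of the read pointer of the reorder buffer or branch reorder buffer with the added MSB of the instruction's buffer address, and the age information of instructions is compared to determine which is older, so that the true age of each instruction is expressed. Because XOR gates reduce the number of comparators needed for the determination, the complexity and delay of the instruction-age determination are reduced, which improves the performance of the out-of-order processor, lowers power consumption, and saves area. This solves the problem that, when age is judged directly from reorder buffer or branch reorder buffer addresses, write pointer wrap-around scrambles the apparent ordering of instruction ages.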
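In addition, the method for scheduling an out-of-order queue and determining cancelled queue entries described above may have the following additional technical features:
Further, the method includes: when scheduling the out-of-order queue, selecting the valid instruction with the oldest age in the queue for execution.
Further, when determining cancelled queue entries, the instruction in the queue that causes the cancellation and the instructions newer than it are selected for cancellation.
Further, reads from the reorder buffer or branch reorder buffer are controlled by a read pointer. When the queue is not empty, the read pointer points to the first entry to be read next; the reorder buffer or branch reorder buffer is an ordered first-in-first-out queue, and the entry the read pointer points to is the oldest entry in it, that is, the entry of the oldest instruction. When the queue is empty, the read pointer and the write pointer point to the same empty entry, and their added MSBs have the same value.
Further, writes to the reorder buffer or branch reorder buffer are controlled by a write pointer, which assigns each instruction its position in the reorder buffer or branch reorder buffer. When the queue is not full, the write pointer points to the first empty entry to be written next; when the queue is full, the write pointer and the read pointer point to the same entry, and their added MSBs have opposite values.
Further, in the reset state the write pointer and the read pointer of the reorder buffer or branch reorder buffer both point to the reset entry. When a new instruction is written into the reorder buffer or branch reorder buffer, the write pointer moves to the next entry to be written; when an instruction retires, the read pointer moves to the next entry to be read.
Further, the reset entry is the single entry that both the write pointer and the read pointer point to at reset. The reset entry may be any entry of the queue, and at reset the added MSBs of the read pointer and write pointer addresses are the same.
Further, the next entry to be read that the read pointer points to, or the next entry to be written that the write pointer points to, is reached either in the direction of increasing address or in the direction of decreasing address.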
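Further, when the number of entries of the reorder buffer or branch reorder buffer is not a power of 2 and the read and write pointers advance in the direction of increasing address: when the low-order bits of the write pointer or read pointer reach the first preset entry number of the queue, the low-order bits of the next entry the pointer points to restart from 0, where the first preset entry number is determined by the maximum number of entries; and when the low-order bits of the write pointer or read pointer reach the first preset entry number of the queue, the added MSB of the pointer is inverted.
Further, when the number of entries of the reorder buffer or branch reorder buffer is not a power of 2 and the read and write pointers advance in the direction of increasing address: when the low-order bits of the write pointer or read pointer reach the first preset entry number of the queue, the next entry the pointer points to keeps counting upward until a power-of-2 boundary is reached, and only then do the low-order bits restart from 0; and when the low-order bits of the write pointer or read pointer reach the second preset entry number, the added MSB of the pointer is inverted, where the second preset entry number is determined by the power of 2.
Further, when the number of entries of the reorder buffer or branch reorder buffer is not a power of 2 and the read and write pointers advance in the direction of decreasing address: when the low-order bits of the write pointer or read pointer decrease to 0, the low-order bits of the next entry the pointer points to restart from the first preset entry number of the queue; and when the low-order bits of the write pointer or read pointer decrease to 0, the added MSB of the pointer is inverted.
Further, when the number of entries of the reorder buffer or branch reorder buffer is not a power of 2 and the read and write pointers advance in the direction of decreasing address: when the low-order bits of the write pointer or read pointer decrease to 0, the low-order bits of the next entry the pointer points to restart from the second preset entry number, which is closest to and larger than the number of queue entries; and when the low-order bits of the write pointer or read pointer decrease to 0, the added MSB of the pointer is inverted.
Further, when the read pointer and write pointer advance in the direction of increasing address, age values grow in that direction: the smaller the age value, the older the instruction, and the larger the age value, the newer the instruction.
Further, when the read pointer and write pointer advance in the direction of decreasing address, age values shrink in that direction: the larger the age value, the older the instruction, and the smaller the age value, the newer the instruction.
Further, the MSB added in front of the reorder buffer and branch reorder buffer addresses is value A or value B, where value A and value B are both 1-bit binary values and are complements of each other.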
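Further, when neither the write pointer nor the read pointer has wrapped after instructions have been written to and retired from the reorder buffer or branch reorder buffer: the instructions from the read pointer up to the entry just before the write pointer are valid; while the write pointer has not wrapped, the added MSB of every address it assigns is value A; and while the read pointer has not wrapped, the added MSB of the read pointer is value A.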
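Further, when the write pointer has wrapped and the read pointer has not after instructions have been written to and retired from the reorder buffer or branch reorder buffer: the added MSB of the written instruction's address is value B, and the added MSB of the read pointer is value A.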
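Further, when both the write pointer and the read pointer have wrapped after instructions have been written to and retired from the reorder buffer or branch reorder buffer: the added MSB of the written instruction's address is value B, and the added MSB of the read pointer is value B.
Further, when the write pointer has wrapped again and the read pointer has not wrapped again after instructions have been written to and retired from the reorder buffer or branch reorder buffer: the added MSB of the written instruction's address is value A, and the added MSB of the read pointer is value B.
Further, when both the write pointer and the read pointer have wrapped again after instructions have been written to and retired from the reorder buffer or branch reorder buffer: the added MSBs of the write pointer and the read pointer both return to value A.
Further, the method includes: when the condition for a cancellation caused by an exception is met, each instruction in the pipeline obtains its age information and compares it with the age information of the instruction that raised the exception; if the instruction in the pipeline is newer, it is cancelled.
Further, the method includes: when the condition for a cancellation caused by a branch misprediction is met, each instruction in the pipeline carries its reorder buffer address or branch reorder buffer address; the age information obtained by XORing the added MSB of a pipeline instruction's address with the added MSB of the read pointer of the reorder buffer or branch reorder buffer is compared with the age information obtained in the same way for the instruction whose branch misprediction caused the cancellation, and if the instruction in the pipeline is newer, it is cancelled.
Further, the method includes: when re-execution caused by a memory-access dependency occurs, the load instruction and all instructions newer than it are rolled back, and all rolled-back instructions are re-executed.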
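To achieve the above objectives, a second aspect of this application provides an apparatus for scheduling an out-of-order queue and determining cancelled queue entries, including: a bit-adding module, used to add an MSB in front of the addresses of the reorder buffer and the branch reorder buffer; and a comparing module, used to XOR the added MSB of the read pointer of the reorder buffer or branch reorder buffer with the added MSBs of the two reorder buffer or branch reorder buffer addresses to be compared, and to compare the magnitudes of the resulting addresses as the age information of the two instructions in order to determine which instruction is older.
According to the apparatus of this application, the age information of an instruction is obtained by XORing the added MSB of the read pointer of the reorder buffer or branch reorder buffer with the added MSB of the instruction's buffer address, and the age information of instructions is compared to determine which is older, so that the true age of each instruction is expressed. Because XOR gates reduce the number of comparators needed for the determination, the complexity and delay of the instruction-age determination are reduced, which improves the performance of the out-of-order processor, lowers power consumption, and saves area. This solves the problem that, when age is judged directly from reorder buffer or branch reorder buffer addresses, write pointer wrap-around scrambles the apparent ordering of instruction ages.
In addition, the apparatus for scheduling an out-of-order queue and determining cancelled queue entries described above may have the following additional technical features:
Further, the apparatus includes: a scheduling module which, when scheduling the out-of-order queue, selects the valid instruction with the oldest age in the queue for execution.
Further, the apparatus includes: a cancellation module which, when determining cancelled queue entries, selects for cancellation the instruction in the queue that causes the cancellation and the instructions newer than it.
To achieve the above objectives, a third aspect of this application provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are configured to execute the method for scheduling an out-of-order queue and determining cancelled queue entries described in the above embodiments.
To achieve the above objectives, a fourth aspect of this application provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to execute the method for scheduling an out-of-order queue and determining cancelled queue entries described in the above embodiments.
Additional aspects and advantages of this application will be given in part in the following description, and in part will become apparent from the following description or be learned through practice of this application.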
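Brief Description of the Drawings
The above and/or additional aspects and advantages of this application will become apparent and easy to understand from the following description of the embodiments with reference to the drawings, in which:
Figure 1 is a schematic diagram of how the read pointer, write pointer, and addresses of a reorder buffer change;
Figure 2 is a schematic flowchart of the method for scheduling an out-of-order queue and determining cancelled queue entries according to an embodiment of this application;
Figure 3 is a schematic structural diagram of a device for comparing instruction ages according to an embodiment of this application;
Figure 4 is a schematic diagram of the reorder buffer at reset according to an embodiment of this application;
Figure 5 is a schematic diagram of the reorder buffer when neither the read pointer nor the write pointer has wrapped, according to an embodiment of this application;
Figure 6 is a schematic diagram of the reorder buffer when the write pointer has wrapped and the read pointer has not, according to an embodiment of this application;
Figure 7 is a schematic diagram of the reorder buffer after both the read pointer and the write pointer have wrapped, according to an embodiment of this application;
Figure 8 is a schematic diagram of the reorder buffer when the write pointer has wrapped again and the read pointer has not yet wrapped again, according to an embodiment of this application;
Figure 9 is a schematic diagram of the reorder buffer at reset according to an embodiment of this application;
Figure 10 is a schematic diagram of the reorder buffer when neither the read pointer nor the write pointer has wrapped, according to an example of this application;
Figure 11 is a schematic diagram of the reorder buffer when the write pointer has wrapped and the read pointer has not, according to an example of this application;
Figure 12 is a schematic diagram of the reorder buffer after both the read pointer and the write pointer have wrapped, according to an example of this application;
Figure 13 is a schematic diagram of the reorder buffer when the write pointer has wrapped again and the read pointer has not yet wrapped again, according to an example of this application;
Figure 14 is a schematic diagram of the arbitration circuit with which an out-of-order queue selects an instruction for execution, according to an embodiment of this application;
Figure 15 is a schematic diagram of generating the age value of the instructions arbitrated in an out-of-order queue from their addresses in the reorder buffer, according to an embodiment of this application;
Figure 16 is a schematic diagram of determining instruction cancellation on an exception, according to an embodiment of this application;
Figure 17 is a schematic diagram of determining instruction cancellation on a branch, according to an embodiment of this application;
Figure 18 is a schematic diagram of determining the entries of the fixed-point dispatch queue cancelled because of a rollback, according to an embodiment of this application;
Figure 19 is a schematic block diagram of the apparatus for scheduling an out-of-order queue and determining cancelled queue entries according to an embodiment of this application.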
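Detailed Description
Embodiments of this application are described in detail below. Examples of the embodiments are shown in the drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain this application and should not be construed as limiting it.
Because the ROB and the B-ROB are both essentially FIFO queues, using a ROB or B-ROB address directly often fails to express the true age of an instruction. Take a ROB with eight entries as an example, as shown in Figure 1: the address of an instruction in the ROB, used directly, cannot express the instruction's true age.
In Figure 1, in the initial state the entry at the bottom of the ROB has the smallest address, 0, and the entry at the top has the largest, 7. Instructions are written into the ROB starting from address 0, so instructions in entries with smaller addresses are older. Writes to the ROB are controlled by the tail pointer, also called the write pointer, and reads by the head pointer, also called the read pointer, so the position of each instruction in the ROB is assigned by the write pointer. The write pointer cycles through 0, 1, 2, 3, 4, 5, 6, 7; each time it wraps (from 7 to 0), the new write pointer value is numerically smaller than the old one even though it is newer, so the relative age of instructions cannot be determined from the magnitude of ROB addresses alone.
Specifically, in the reset state the write pointer and the read pointer of the ROB both point to address 0; this is the initial state. When a new instruction is written into the ROB, the write pointer increases. Part (a) of Figure 1 shows the ROB after seven instructions have been written: the write pointer and the read pointer are on the same "side", neither has wrapped, and the instruction with the smaller address is the older one.
As time passes, two more instructions enter the ROB and two instructions leave it. The write pointer has now wrapped and become 1, while the read pointer has not wrapped and has become 2. As part (b) of Figure 1 shows, the instructions with the smaller addresses in the ROB are now no longer the older ones.
Later still, six instructions are written into the ROB and seven leave it. As part (c) of Figure 1 shows, the write pointer is now 7 and the read pointer, which has wrapped, is 1. The two pointers are on the same "side" again, and the instruction with the smaller address is once more the older one. The relative age of instructions therefore cannot be judged directly from the magnitude of ROB addresses, and a mechanism is needed to solve this problem.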
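The failure described for part (b) of Figure 1 can be reproduced with a few lines of Python. This is only an illustrative sketch: the helper name `allocate` and the list-based model are assumptions made for the example and do not appear in the source.

```python
# Hypothetical model of the 8-entry ROB of Figure 1, using raw addresses only.
ROB_SIZE = 8

def allocate(count, start=0):
    """Return the raw ROB addresses given to `count` instructions in program order."""
    return [(start + i) % ROB_SIZE for i in range(count)]

# Nine instructions have been written (the write pointer wrapped to 1)
# and the two oldest have left (the read pointer is now 2).
addresses = allocate(9)
in_flight = addresses[2:]           # program order: oldest first

# The oldest in-flight instruction sits at address 2, but the smallest raw
# address (0) belongs to the newest instruction, so min() picks the wrong one.
oldest_in_program_order = in_flight[0]
smallest_raw_address = min(in_flight)
```

Here `smallest_raw_address` differs from `oldest_in_program_order`, which is exactly the ordering confusion the added most significant bit is meant to remove.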
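The method and apparatus for scheduling an out-of-order queue and determining cancelled queue entries according to the embodiments of this application are described below with reference to the drawings, starting with the method.
Specifically, Figure 2 is a schematic flowchart of a method for scheduling an out-of-order queue and determining cancelled queue entries provided by an embodiment of this application.
As shown in Figure 2, the method includes the following steps:
In step S101, a most significant bit (MSB) is added in front of the address of the reorder buffer or branch reorder buffer.
In one embodiment of this application, the MSB added in front of the reorder buffer and branch reorder buffer addresses is value A or value B, where value A and value B are both 1-bit binary values and are complements of each other.
For example, value A and value B may each be 0 or 1: if A is 0, then B is 1; if A is 1, then B is 0.
In step S102, the added MSB of the read pointer of the reorder buffer or branch reorder buffer is XORed with the added MSBs of the two reorder buffer or branch reorder buffer addresses to be compared, and the magnitudes of the resulting addresses are compared as the age information of the two instructions to determine which instruction is older.
It should be noted that the embodiments of this application can compare instruction ages with the comparison device shown in Figure 3, that is, by adding XOR gates. Because XOR gates reduce the number of comparators needed, the complexity and delay of the instruction-age determination are reduced, which improves the performance of the out-of-order processor, lowers power consumption, and saves area.
As shown in Figure 3, one more bit is added as the MSB in front of the reorder buffer or branch reorder buffer address; the added MSB of the read pointer of the reorder buffer or branch reorder buffer is XORed with the added MSBs of the two addresses to be compared; and the resulting addresses are used as the age information of the instructions, that is,
roqid0_cmp={roqid0[highestbit]^roqhead[highestbit],roqid0[highestbit-1:0]};
roqid1_cmp={roqid1[highestbit]^roqhead[highestbit],roqid1[highestbit-1:0]}.
The magnitudes of roqid0_cmp and roqid1_cmp are then compared to determine which of the two instructions is older. Here roqid0[highestbit] and roqid1[highestbit] denote the added MSBs of the two reorder buffer or branch reorder buffer addresses to be compared; roqhead[highestbit] denotes the added MSB of the read pointer of the reorder buffer or branch reorder buffer; roqid0[highestbit-1:0] and roqid1[highestbit-1:0] denote bits 0 through highestbit-1 of the two addresses to be compared; and roqid0_cmp and roqid1_cmp denote the addresses obtained after the XOR. In Figure 3, roqid0_cmp[highestbit] and roqid1_cmp[highestbit] are the top bits obtained after the XOR, while roqid0_cmp[highestbit-1:0] and roqid1_cmp[highestbit-1:0] are identical to roqid0[highestbit-1:0] and roqid1[highestbit-1:0].
Taking the reorder buffer as an example, let roqid0 and roqid1 be the reorder buffer addresses (reorder buffer numbers) of the two instructions whose ages are to be compared, and let highestbit be the bit index of the MSB added to the reorder buffer address. At initialization, the added MSB of the addresses the write pointer assigns is 0; after the write pointer wraps, the added MSB of the addresses it assigns becomes 1; and when the write pointer wraps again, the added MSB becomes 0 again. The address obtained by XORing the added MSB of an instruction's reorder buffer address with the added MSB of the reorder buffer read pointer is used as the instruction's age information for magnitude comparison. The addresses used for the comparison are denoted roqid0_cmp and roqid1_cmp and are computed as follows:
roqid0_cmp={roqid0[highestbit]^roqhead[highestbit],roqid0[highestbit-1:0]};
roqid1_cmp={roqid1[highestbit]^roqhead[highestbit],roqid1[highestbit-1:0]}.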
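The two formulas can be modelled in software as a single XOR with a mask. The sketch below is illustrative only: the function names `age_key` and `is_older` are assumptions, and the example values assume an 8-entry ROB (three low-order bits, `highest_bit = 3`) with pointers that advance in the direction of increasing address.

```python
def age_key(roqid: int, roqhead: int, highest_bit: int) -> int:
    """Model of roqidN_cmp: XOR the added top bit of roqid with the added
    top bit of roqhead and keep the low-order bits unchanged."""
    top_mask = 1 << highest_bit
    return roqid ^ (roqhead & top_mask)

def is_older(roqid0: int, roqid1: int, roqhead: int, highest_bit: int) -> bool:
    """True if the instruction at roqid0 is older than the one at roqid1
    (a smaller compare value means an older instruction)."""
    return age_key(roqid0, roqhead, highest_bit) < age_key(roqid1, roqhead, highest_bit)

# Read pointer has wrapped once (top bit 1, low bits 6).
roqhead = 0b1110
before_wrap = 0b1111   # allocated in the same lap as roqhead, low bits 7
after_wrap = 0b0001    # allocated after the write pointer wrapped again, low bits 1
```

With these values the raw low-order bits would rank `after_wrap` first, while the compare values (7 and 9) rank `before_wrap` as the older instruction.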
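In one embodiment of this application, reads from the reorder buffer or branch reorder buffer are controlled by a read pointer. When the queue is not empty, the read pointer points to the first entry to be read next; the reorder buffer or branch reorder buffer is an ordered first-in-first-out queue, and the entry the read pointer points to is the oldest entry in it, that is, the entry of the oldest instruction. When the queue is empty, the read pointer and the write pointer point to the same empty entry, and their added MSBs have the same value. In this embodiment, the read pointer of the reorder buffer or branch reorder buffer is denoted roqhead. The read pointer is also called the head pointer.
In one embodiment of this application, writes to the reorder buffer or branch reorder buffer are controlled by a write pointer, which assigns each instruction its position in the reorder buffer or branch reorder buffer. When the queue is not full, the write pointer points to the first empty entry to be written next; when the queue is full, the write pointer and the read pointer point to the same entry, and their added MSBs have opposite values. In this embodiment, the write pointer of the reorder buffer is denoted roqtail, and the queue entries from roqhead up to the entry just before roqtail are the valid entries. The write pointer is also called the tail pointer.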
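The empty and full conditions above can be sketched as follows. The class name `RobPointers` and its methods are hypothetical, and the sketch assumes a power-of-two number of entries, reset at entry 0 with value A = 0, and pointers that advance in the direction of increasing address.

```python
class RobPointers:
    """Read and write pointers of a hypothetical ROB, each carrying one added top bit."""

    def __init__(self, entries: int = 8):
        self.entries = entries           # assumed to be a power of two
        self.mask = 2 * entries - 1      # low-order bits plus the added top bit
        self.head = 0                    # read pointer (roqhead)
        self.tail = 0                    # write pointer (roqtail)

    def is_empty(self) -> bool:
        # Same entry and same added top bit.
        return self.head == self.tail

    def is_full(self) -> bool:
        # Same entry but opposite added top bits.
        return (self.head ^ self.tail) == self.entries

    def allocate(self) -> int:
        """Assign the next roqid (including the added top bit) and advance the write pointer."""
        if self.is_full():
            raise RuntimeError("ROB is full")
        roqid = self.tail
        self.tail = (self.tail + 1) & self.mask
        return roqid

    def retire(self) -> None:
        """Release the oldest entry and advance the read pointer."""
        if self.is_empty():
            raise RuntimeError("ROB is empty")
        self.head = (self.head + 1) & self.mask
```

Because the added top bit toggles on every wrap, the two pointers can be compared directly to tell an empty queue from a full one without a separate occupancy counter.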
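Further, in one embodiment of this application, in the reset state the write pointer and the read pointer of the reorder buffer or branch reorder buffer both point to the reset entry. When a new instruction is written into the reorder buffer or branch reorder buffer, the write pointer moves to the next entry to be written; when an instruction retires, the read pointer moves to the next entry to be read.
The reset entry is the single entry that both the write pointer and the read pointer point to at reset. The reset entry may be any entry of the queue, and at reset the added MSBs of the read pointer and write pointer addresses are the same. The next entry to be read that the read pointer points to, or the next entry to be written that the write pointer points to, is reached either in the direction of increasing address or in the direction of decreasing address.
It should be noted that when the read pointer and write pointer advance in the direction of increasing address, age values grow in that direction: the smaller the age value, the older the instruction, and the larger the age value, the newer the instruction. When the read pointer and write pointer advance in the direction of decreasing address, age values shrink in that direction: the larger the age value, the older the instruction, and the smaller the age value, the newer the instruction.
For example, as shown in Figure 4, in the reset state the write pointer and the read pointer of the reorder buffer both point to address 0, which is the initial state. When a new instruction is written into the reorder buffer the write pointer increases, and when an instruction retires the read pointer increases.
Further, in one embodiment of this application, when neither the write pointer nor the read pointer has wrapped after instructions have been written to and retired from the reorder buffer or branch reorder buffer: the instructions from the read pointer up to the entry just before the write pointer are valid; while the write pointer has not wrapped, the added MSB of every address it assigns is value A; and while the read pointer has not wrapped, the added MSB of the read pointer is value A.
Taking the ROB as an example, Figure 5 shows the case in which neither pointer has wrapped after instructions have been written to and retired from the ROB. As the dashed box in Figure 5 shows, the instructions from roqhead up to the entry just before roqtail are valid. The write pointer has not wrapped, so the added MSB of every address it assigns is 0; the read pointer has not wrapped, so the added MSB of roqhead is 0. XORing the added MSB of an instruction's ROB address with the added MSB of roqhead therefore leaves the address unchanged, so the instruction with the smaller extended address is the older one.
Further, in one embodiment of this application, when the write pointer has wrapped and the read pointer has not after instructions have been written to and retired from the reorder buffer or branch reorder buffer: the added MSB of the written instruction's address is value B, and the added MSB of the read pointer is value A.
Taking the ROB as an example, Figure 6 shows the case in which the write pointer has wrapped and the read pointer has not. The added MSB of addresses written after the write pointer wrapped is 1; the read pointer has not yet wrapped, so the added MSB of roqhead is 0. XORing the added MSB of an instruction's ROB address with the added MSB of roqhead therefore leaves the address unchanged. As the dashed box in Figure 6 shows, the instructions between roqhead and roqtail are valid, and the instruction with the smaller extended address is the older one.
Further, in one embodiment of this application, when both the write pointer and the read pointer have wrapped after instructions have been written to and retired from the reorder buffer or branch reorder buffer: the added MSB of the written instruction's address is value B, and the added MSB of the read pointer is value B.
Taking the ROB as an example, Figure 7 shows the case in which both the write pointer and the read pointer have wrapped. The added MSB of addresses written after the write pointer wrapped is 1; the read pointer has wrapped, so the added MSB of roqhead is 1. XORing the added MSB of an instruction's ROB address with the added MSB of roqhead therefore inverts the top bit of the address. As the dashed box in Figure 7 shows, the instructions between roqhead and roqtail are valid; after the added MSB of each address is XORed with the added MSB of roqhead (1), the instruction with the smaller address is the older one.
Further, in one embodiment of this application, when the write pointer has wrapped again and the read pointer has not wrapped again after instructions have been written to and retired from the reorder buffer or branch reorder buffer: the added MSB of the written instruction's address is value A, and the added MSB of the read pointer is value B.
Taking the ROB as an example, Figure 8 shows the case in which the write pointer has wrapped again and the read pointer has not. The added MSB of addresses written after the write pointer wrapped again is 0; the read pointer has not wrapped again, so the added MSB of roqhead is 1. XORing the added MSB of an instruction's ROB address with the added MSB of roqhead therefore inverts the top bit of the address. As the dashed box in Figure 8 shows, the instructions between roqhead and roqtail are valid; after the added MSB of each address is XORed with the added MSB of roqhead (1), the instruction with the smaller address is the older one.
Further, in one embodiment of this application, when both the write pointer and the read pointer have wrapped again after instructions have been written to and retired from the reorder buffer or branch reorder buffer: the added MSBs of the write pointer and the read pointer both return to value A.
Taking the ROB as an example, once the write pointer has wrapped again and the read pointer has also wrapped again, the added MSBs of both pointers return to 0, which is the same state as in Figure 5, where neither pointer has wrapped.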
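The wrap cases of Figures 5 to 8 can be checked together with a small randomized model. This is a sanity-check sketch under stated assumptions (8 entries, value A = 0, pointers advancing in the direction of increasing address); the function `check_order` and its parameters are hypothetical and are not part of the source.

```python
import random

ENTRIES = 8
TOP = ENTRIES                 # mask of the added top bit
MASK = 2 * ENTRIES - 1        # low-order bits plus the added top bit

def age_key(roqid: int, roqhead: int) -> int:
    """Compare value: roqid with its added top bit XORed by roqhead's added top bit."""
    return roqid ^ (roqhead & TOP)

def check_order(steps: int = 10_000, seed: int = 0) -> bool:
    """Randomly write and retire instructions; after every step verify that the
    compare values of the in-flight instructions strictly increase in program order."""
    rng = random.Random(seed)
    head = tail = 0
    in_flight = []            # roqids in program order, oldest first
    for _ in range(steps):
        must_retire = len(in_flight) == ENTRIES
        if in_flight and (must_retire or rng.random() < 0.5):
            in_flight.pop(0)                      # oldest instruction retires
            head = (head + 1) & MASK
        else:
            in_flight.append(tail)                # new instruction is written
            tail = (tail + 1) & MASK
        keys = [age_key(r, head) for r in in_flight]
        if any(a >= b for a, b in zip(keys, keys[1:])):
            return False
    return True
```

The run passes through all four combinations of wrapped and unwrapped pointers many times, and in each of them the smaller compare value belongs to the older instruction.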
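In some embodiments, when the number of entries of the reorder buffer or branch reorder buffer is not a power of 2 and the read and write pointers advance in the direction of increasing address:
In one possible implementation, when the low-order bits of the write pointer or read pointer reach the first preset entry number of the queue, the low-order bits of the next entry the pointer points to restart from 0; and when the low-order bits of the write pointer or read pointer reach the first preset entry number of the queue, the added MSB of the pointer is inverted. The first preset entry number is determined by the maximum number of entries; for example, it may be the maximum number of entries minus 1.
In another possible implementation, when the low-order bits of the write pointer or read pointer reach the first preset entry number of the queue, the next entry the pointer points to keeps counting upward until a power-of-2 boundary is reached, and only then do the low-order bits restart from 0; and when the low-order bits of the write pointer or read pointer reach the second preset entry number, the added MSB of the pointer is inverted. The second preset entry number is determined by the power of 2; for example, it may be the power of 2 minus 1.
In some embodiments, when the number of entries of the reorder buffer or branch reorder buffer is not a power of 2 and the read and write pointers advance in the direction of decreasing address:
In one possible implementation, when the low-order bits of the write pointer or read pointer decrease to 0, the low-order bits of the next entry the pointer points to restart from the first preset entry number of the queue; and when the low-order bits of the write pointer or read pointer decrease to 0, the added MSB of the pointer is inverted.
In another possible implementation, when the low-order bits of the write pointer or read pointer decrease to 0, the low-order bits of the next entry the pointer points to restart from the second preset entry number, which is closest to and larger than the number of queue entries; and when the low-order bits of the write pointer or read pointer decrease to 0, the added MSB of the pointer is inverted.
For example, when the number of queue entries is not a power of 2, the write pointer of the reorder buffer can assign entries in the following two ways:
Method 1
When the number of queue entries is not a power of 2 and the write pointer reaches the maximum number of entries of the queue, the next instruction entered is numbered from 0 again; that is, the write pointer wraps. Likewise, the read pointer wraps when it reaches the maximum number of entries. After a wrap, the added MSB changes from 0 to 1, or from 1 to 0. For example, with a 6-entry queue, the read and write pointers each count from 0 to 5 and then wrap. The process is as follows:
(1) As shown in Figure 9, in the reset state the write pointer and the read pointer of the reorder buffer both point to address 0; this is the initial state.
(2) When a new instruction is written into the reorder buffer the write pointer increases, and when an instruction retires the read pointer increases. Figure 10 shows the case in which neither pointer has wrapped after instructions have been written to and retired from the ROB. The instructions from roqhead up to the entry just before roqtail are valid, as shown in the dashed box in Figure 10. The write pointer has not wrapped, so the added MSB of every address it assigns is 0. The read pointer has not wrapped, so the added MSB of roqhead is 0. XORing the added MSB of an instruction's ROB address with the added MSB of roqhead therefore leaves the address unchanged, so the instruction with the smaller extended address is the older one.
(3) Figure 11 shows the case in which the write pointer has wrapped and the read pointer has not. The added MSB of addresses written after the write pointer wrapped is 1. The read pointer has not yet wrapped, so the added MSB of roqhead is 0. XORing the added MSB of an instruction's ROB address with the added MSB of roqhead therefore leaves the address unchanged. The instructions between roqhead and roqtail are valid, as shown in the dashed box in Figure 11, and the instruction with the smaller extended address is the older one.
(4) Figure 12 shows the case in which both the write pointer and the read pointer have wrapped. The added MSB of addresses written after the write pointer wrapped is 1. The read pointer has wrapped, so the added MSB of roqhead is 1. XORing the added MSB of an instruction's ROB address with the added MSB of roqhead therefore inverts the top bit of the address. The instructions between roqhead and roqtail are valid, as shown in the dashed box in Figure 12; after the added MSB of each address is XORed with the added MSB of roqhead (1), the instruction with the smaller address is the older one.
(5) Figure 13 shows the case in which the write pointer has wrapped again and the read pointer has not. The added MSB of addresses written after the write pointer wrapped again is 0. The read pointer has not wrapped again, so the added MSB of roqhead is 1. XORing the added MSB of an instruction's ROB address with the added MSB of roqhead therefore inverts the top bit of the address. The instructions between roqhead and roqtail are valid, as shown in the dashed box in Figure 13; after the added MSB of each address is XORed with the added MSB of roqhead (1), the instruction with the smaller address is the older one.
(6) Once the write pointer has wrapped again and the read pointer has also wrapped again, the added MSBs of both pointers return to 0, which is the same state as in Figure 9, where neither pointer has wrapped.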
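Method 1 can be sketched as a pointer-advance function. The name `advance_method1` and the choice of three low-order bits plus one added top bit (`top_mask = 0b1000`) for the 6-entry example are assumptions made for illustration.

```python
def advance_method1(ptr: int, entries: int, top_mask: int) -> int:
    """Advance a read or write pointer under Method 1: after the last entry
    (low bits == entries - 1) the low bits restart at 0 and the added top bit flips."""
    low = ptr & (top_mask - 1)
    top = ptr & top_mask
    if low == entries - 1:
        return top ^ top_mask          # low bits become 0, top bit inverted
    return top | (low + 1)

def pointer_sequence(start: int, steps: int, entries: int = 6, top_mask: int = 0b1000):
    """List the pointer values visited over `steps` advances, starting at `start`."""
    values = [start]
    for _ in range(steps):
        values.append(advance_method1(values[-1], entries, top_mask))
    return values

def age_key(roqid: int, roqhead: int, top_mask: int = 0b1000) -> int:
    """Compare value: XOR the added top bit of roqid with that of roqhead."""
    return roqid ^ (roqhead & top_mask)
```

For the 6-entry queue the pointer visits 0 to 5 with the top bit clear, then 0 to 5 with the top bit set (values 8 to 13), and then returns to 0; the low-order values 6 and 7 are never used.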
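Method 2
When the number of queue entries is not a power of 2, the queue is processed in the same way as the queue with the closest power-of-2 number of entries described above. When the write pointer reaches the maximum number of entries of the queue, the next instruction entered keeps counting upward until the power-of-2 boundary is reached and only then is the numbering restarted; that is, the write pointer wraps. The read pointer is handled in the same way as the write pointer. For example, with a 6-entry queue, the read and write pointers count from 0 to 7 and then wrap, count from 8 to 15, and then wrap back to 0; a 6-entry queue is therefore processed in the same way as an 8-entry queue.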
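Under Method 2 the pointer behaves like a plain binary counter whose width is set by the next power of two. The sketch below only models the pointer numbering; the function name `advance_method2` is an assumption, and how pointer values map onto the six physical entries is not described in the text and is not modelled here.

```python
def advance_method2(ptr: int, entries: int) -> int:
    """Advance a pointer under Method 2: count as if the queue had the next
    power-of-two number of entries, with one added top bit above the low bits."""
    span = 1
    while span < entries:              # smallest power of two >= entries
        span *= 2
    return (ptr + 1) & (2 * span - 1)

def pointer_sequence(start: int, steps: int, entries: int = 6):
    """List the pointer values visited over `steps` advances, starting at `start`."""
    values = [start]
    for _ in range(steps):
        values.append(advance_method2(values[-1], entries))
    return values
```

For a 6-entry queue this gives the sequence 0 to 7, then 8 to 15, then 0 again, matching the example in the text.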
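After the relative age of the instructions has been determined, in some embodiments, when scheduling the out-of-order queue, the valid instruction with the oldest age in the queue is selected for execution. In other embodiments, when determining cancelled queue entries, the instruction in the queue that causes the cancellation and the instructions newer than it are selected for cancellation.
The method for scheduling an out-of-order queue and determining cancelled queue entries is further explained below through several specific embodiments:
In Embodiment 1 of this application, for instruction scheduling in an out-of-order queue, an 8-entry out-of-order queue is taken as an example. Each entry of the queue includes the information fields an instruction uses, such as the valid field, the rdy field, the roqid field, and the data field. The valid field records whether the entry is valid (for example, valid = 1 means valid and valid = 0 means invalid); the rdy field records whether the instruction and its data are ready (for example, rdy = 1 means ready, that is, executable, and rdy = 0 means not ready); the roqid field records the reorder buffer address of the instruction, which is used to generate the instruction's age information; the data field records the command, data, and other information the instruction in this entry uses. roqhead denotes the read pointer (head pointer) address of the reorder buffer.
When the out-of-order queue executes, the instruction in the valid, ready entry with the oldest age is selected for execution. Instruction age is determined by XORing the added MSB of each entry's roqid with the MSB of roqhead to obtain the corresponding age information; the instruction with the smallest age value is the oldest. The first entry with the smallest age value is selected as the entry the out-of-order queue executes.
As shown in Figure 14, for an 8-entry out-of-order queue, the instructions A, B, C, D, E, F, and G in the queue are all valid, that is, their valid bits are all 1. An instruction whose rdy bit is 1 is ready; that is, instructions B, C, D, F, and H are ready and executable. roqid is the address of the instruction in the reorder buffer; the state of the instructions in the reorder buffer is shown in Figure 15. The added MSB of roqhead is 1, and XORing it with the added MSB of an instruction's roqid yields the instruction's age information, age. The roqid values in Figure 14 include the added MSB. By comparing the age values of the instructions, arbitration selects instruction B, which is valid, ready, and has the smallest age value, 4, for execution.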
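The oldest-first arbitration of Embodiment 1 can be sketched as a filter followed by a minimum. The `Entry` class, the function `select_oldest_ready`, and all field values in the example queue are hypothetical; they are not the values of Figures 14 and 15, which are not reproduced in the text.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Entry:
    name: str
    valid: int    # 1 = entry holds a live instruction
    rdy: int      # 1 = instruction and data are ready to execute
    roqid: int    # ROB address including the added top bit

def select_oldest_ready(queue: Sequence[Entry], roqhead: int,
                        top_mask: int = 0b1000) -> Optional[Entry]:
    """Pick the first valid, ready entry with the smallest age value, or None."""
    flip = roqhead & top_mask
    candidates = [e for e in queue if e.valid and e.rdy]
    if not candidates:
        return None
    # min() returns the first entry among equal keys, i.e. the first smallest.
    return min(candidates, key=lambda e: e.roqid ^ flip)

# Hypothetical queue contents; roqhead has its added top bit set.
roqhead = 0b1011
queue = [
    Entry("A", valid=1, rdy=0, roqid=0b1011),   # age 3, oldest but not ready
    Entry("E", valid=1, rdy=0, roqid=0b0000),   # age 8, not ready
    Entry("C", valid=1, rdy=1, roqid=0b1101),   # age 5
    Entry("B", valid=1, rdy=1, roqid=0b1100),   # age 4
    Entry("H", valid=0, rdy=1, roqid=0b1010),   # stale entry, ignored
]
chosen = select_oldest_ready(queue, roqhead)
```

In hardware the same selection would typically be done by a comparator tree over the age values; the linear scan here only illustrates the selection rule.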
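In Embodiment 2 of this application, when the condition for a cancellation caused by an exception is met, each instruction in the pipeline obtains its age information and compares it with the age information of the instruction that raised the exception; if the instruction in the pipeline is newer, it is cancelled.
As shown in Figure 16, for a cancellation caused by an exception, each instruction in the pipeline carries its reorder buffer address, that is, its roqid. An instruction in the pipeline XORs the added MSB of its own roqid with the MSB of the reorder buffer read pointer (head pointer) roqhead to obtain its age value, and compares it with the age value obtained by XORing the MSB of the roqid of the instruction that raised the exception with the MSB of roqhead; if its own age is newer than that of the instruction that raised the exception, it is cancelled.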
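The exception case can be sketched as a comparison of each pipeline instruction against the faulting instruction. The function name `cancel_on_exception` and the example roqid values are assumptions; the sketch assumes an 8-entry ROB with pointers advancing in the direction of increasing address, so a larger age value means a newer instruction.

```python
def cancel_on_exception(pipeline_roqids, fault_roqid: int, roqhead: int,
                        top_mask: int = 0b1000):
    """Return the roqids of pipeline instructions that are newer than the
    instruction that raised the exception (strictly larger age value)."""
    flip = roqhead & top_mask
    fault_age = fault_roqid ^ flip
    return [r for r in pipeline_roqids if (r ^ flip) > fault_age]

# Hypothetical pipeline: roqhead has wrapped once, and two instructions were
# allocated after the write pointer wrapped again.
roqhead = 0b1110
pipeline = [0b0001, 0b1110, 0b0000, 0b1111]     # held out of program order
cancelled = cancel_on_exception(pipeline, fault_roqid=0b1111, roqhead=roqhead)
```

Following the wording of this embodiment, only instructions strictly newer than the faulting instruction are returned; how the faulting instruction itself is handled is left outside the sketch.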
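In Embodiment 3 of this application, when the condition for a cancellation caused by a branch misprediction is met, every instruction in the pipeline carries its reorder buffer address or branch reorder buffer address. The age information obtained by XORing the added MSB of the pipeline instruction's address with the added MSB of the read pointer of the reorder buffer or branch reorder buffer is compared with the age information obtained by XORing the added MSB of the address of the mispredicted instruction that caused the cancellation with the added MSB of the same read pointer. If the pipeline instruction is newer, it is cancelled.
As shown in Figure 17, for a cancellation caused by a branch misprediction, every instruction in the pipeline carries its branch reorder buffer address, i.e. its brqid. Each pipeline instruction XORs the added MSB of its own brqid with the added MSB of the branch reorder buffer read pointer (head pointer) brqhead to obtain its age value, and compares this with the age value obtained by XORing the added MSB of the brqid of the mispredicted instruction that caused the cancellation with the added MSB of brqhead. If the instruction is newer than the instruction whose misprediction caused the cancellation, it is cancelled.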
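In Embodiment 4 of this application, when re-execution is caused by a memory dependence, the load instruction and all instructions newer than the load instruction are rolled back, and all rolled-back instructions are re-executed.
Rollback occurs, for example, on re-execution caused by a memory dependence: if an address-dependent load instruction (Load) lying between a store instruction (Store) and the next related store instruction (Store) is found to have already written back, that load and all later instructions newer than it are rolled back, and the rolled-back load and all subsequent instructions are re-executed from the dispatch queue. Which instructions to roll back is likewise decided from the age value obtained by XORing the added MSB of an instruction's roqid with the added MSB of the reorder buffer read pointer (head pointer) roqhead: instructions newer than the load that caused the rollback must be re-executed. As shown in Figure 18, in the fixed-point dispatch queue, the entries to cancel are computed using the roqid with its added MSB XORed with the added MSB of roqhead as the age information for comparison; it can be seen that the fixed-point dispatch queue must be cancelled and re-executed starting from entry 3.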
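The rollback decision differs from the exception case in that the instruction causing the rollback is itself replayed. A minimal sketch, assuming an 8-entry ROB with increasing addresses and a hypothetical helper `entries_to_replay`; the roqid values are invented and are not those of Figure 18.

```python
from typing import List, Sequence


def entries_to_replay(dispatch_roqids: Sequence[int], load_roqid: int,
                      roqhead: int, index_bits: int = 3) -> List[int]:
    """Return dispatch-queue indices holding the load or anything newer."""
    head_msb = roqhead & (1 << index_bits)
    load_age = load_roqid ^ head_msb
    # '>=' includes the load itself, which is rolled back and re-executed.
    return [i for i, roqid in enumerate(dispatch_roqids)
            if (roqid ^ head_msb) >= load_age]


# roqhead = 0b0110; the dispatch queue holds roqids 6, 8, 7, 10, 9 in slot
# order. The load that must be replayed has roqid 8.
print(entries_to_replay([6, 8, 7, 10, 9], load_roqid=8, roqhead=0b0110))
```

Slots 1, 3 and 4 are selected here: they hold the load and the two instructions newer than it, while the older instructions in slots 0 and 2 are left alone.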
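According to the method for scheduling an out-of-order queue and determining queue entries to cancel proposed in the embodiments of this application, an instruction's age information is obtained by XORing the added MSB of the read pointer of the reorder buffer or branch reorder buffer with the added MSB of the instruction's buffer address, and the age information of instructions is compared to determine which is older, so that the true age of the instructions is expressed. Because XOR gates are used, fewer comparators are needed, which reduces the complexity and the delay of the age determination, improves the performance of the out-of-order processor, lowers power consumption, saves area, and makes scheduling of the out-of-order queue and determination of queue cancellation more efficient. The method also resolves the confusion in instruction age ordering that arises, because of write-pointer wrap-around, when age is judged from reorder buffer or branch reorder buffer addresses.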
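Next, the apparatus for scheduling an out-of-order queue and determining queue entries to cancel proposed in the embodiments of this application is described with reference to the drawings.
Figure 19 is a block diagram of the apparatus for scheduling an out-of-order queue and determining queue entries to cancel according to an embodiment of this application.
As shown in Figure 19, the apparatus 10 for scheduling an out-of-order queue and determining queue entries to cancel includes a bit-adding module 100 and a comparison module 200.
The bit-adding module 100 is configured to add a most significant bit in front of the reorder buffer and branch reorder buffer addresses. The comparison module 200 is configured to XOR the added MSB of the read pointer of the reorder buffer or branch reorder buffer with the added MSBs of the two reorder buffer or branch reorder buffer addresses to be compared, and to compare the resulting addresses as the age information of the two instructions in order to determine which instruction is older.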
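In some embodiments, the apparatus 10 further includes a scheduling module which, when the out-of-order queue is scheduled, selects the valid instruction with the oldest age in the queue for execution.
In some embodiments, the apparatus 10 further includes a cancellation module which, when the queue entries to cancel are determined, selects for cancellation the instruction in the queue that caused the cancellation and the instructions newer than it.
It should be noted that the foregoing explanation of the method embodiments also applies to the apparatus of this embodiment and is not repeated here.
According to the apparatus for scheduling an out-of-order queue and determining queue entries to cancel proposed in the embodiments of this application, an instruction's age information is obtained by XORing the added MSB of the read pointer of the reorder buffer or branch reorder buffer with the added MSB of the instruction's buffer address, and the age information of instructions is compared to determine which is older, so that the true age of the instructions is expressed. Because XOR gates are used, fewer comparators are needed, which reduces the complexity and the delay of the age determination, improves the performance of the out-of-order processor, lowers power consumption, saves area, and makes scheduling of the out-of-order queue and determination of queue cancellation more efficient. The apparatus also resolves the confusion in instruction age ordering that arises, because of write-pointer wrap-around, when age is judged from reorder buffer or branch reorder buffer addresses.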
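An embodiment of this application further provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions being configured to perform the method for scheduling an out-of-order queue and determining queue entries to cancel of the above embodiments.
An embodiment of this application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for scheduling an out-of-order queue and determining queue entries to cancel of the above embodiments.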
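In this specification, descriptions referring to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" mean that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of this application. In this specification, such references do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or N embodiments or examples. In addition, provided they do not contradict one another, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or as implicitly specifying the number of the technical features indicated. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In the description of this application, "N" means at least two, for example two or three, unless expressly and specifically defined otherwise.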
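Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment or portion of code that includes one or more executable instructions for implementing the steps of a custom logic function or process, and the scope of the preferred embodiments of this application includes additional implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functions involved, as will be understood by those skilled in the art to which the embodiments of this application belong.
The logic and/or steps represented in a flowchart or otherwise described herein may, for example, be regarded as an ordered list of executable instructions for implementing logic functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any apparatus that can contain, store, communicate, propagate or transport a program for use by, or in connection with, an instruction execution system, apparatus or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or N wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). The computer-readable medium may even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or otherwise suitably processing it if necessary, and then stored in a computer memory.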
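It should be understood that the parts of this application may be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, N steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques well known in the art may be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or some of the steps of the methods of the above embodiments may be carried out by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing module, may each exist physically on their own, or two or more units may be integrated into one module. The integrated module may be implemented in hardware or as a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc or the like. Although embodiments of this application have been shown and described above, it will be understood that they are exemplary and are not to be construed as limiting this application; those of ordinary skill in the art may make changes, modifications, substitutions and variations to the above embodiments within the scope of this application.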

Claims (28)

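  1. A method for scheduling an out-of-order queue and determining queue entries to cancel, comprising the following steps:
    adding a most significant bit (MSB) in front of a reorder buffer or branch reorder buffer address;
    XORing the added MSB of the read pointer of the reorder buffer or branch reorder buffer with the added MSBs of the two reorder buffer or branch reorder buffer addresses to be compared, and comparing the resulting addresses as the age information of the two instructions to determine which instruction is older.
  2. The method according to claim 1, further comprising:
    when scheduling the out-of-order queue, selecting the valid instruction with the oldest age in the queue for execution.
  3. The method according to any one of claims 1-2, further comprising:
    when determining the queue entries to cancel, selecting for cancellation the instruction in the queue that caused the cancellation and the instructions newer than the instruction that caused the cancellation.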
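  4. The method according to any one of claims 1-3, wherein reading of the reorder buffer or branch reorder buffer is controlled by a read pointer, wherein, when the queue is not empty, the read pointer points to the first entry to be read next; the reorder buffer or branch reorder buffer is an ordered first-in-first-out queue, and the entry pointed to by the read pointer is the oldest entry in the reorder buffer or branch reorder buffer, the oldest entry being the entry corresponding to the oldest instruction; and when the queue is empty, the read pointer and the write pointer point to the same empty entry and the added MSBs of the read pointer and the write pointer have the same value.
  5. The method according to any one of claims 1-4, wherein writing of the reorder buffer or branch reorder buffer is controlled by a write pointer, and the position of each instruction in the reorder buffer or branch reorder buffer is allocated by the write pointer, wherein, when the queue is not full, the write pointer points to the first empty entry to be written next; and when the queue is full, the write pointer and the read pointer point to the same entry and the added MSBs of the write pointer and the read pointer have opposite values.
  6. The method according to any one of claims 1-5, wherein the write pointer and the read pointer of the reorder buffer or branch reorder buffer both point to a reset entry in the reset state; when a new instruction is written into the reorder buffer or branch reorder buffer, the write pointer points to the next entry to be written; and after an instruction retires, the read pointer points to the next entry to be read.
  7. The method according to claim 6, wherein the reset entry is the single entry to which both the write pointer and the read pointer point at reset, the reset entry may be any entry of the queue, and at reset the added address MSBs of the read pointer and the write pointer are the same.
  8. The method according to claim 6 or 7, wherein the next entry to be read pointed to by the read pointer, or the next entry to be written pointed to by the write pointer, is reached in the direction of increasing address or in the direction of decreasing address.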
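  9. The method according to claim 8, wherein, when the number of queue entries of the reorder buffer or branch reorder buffer is not a power of two and the read pointer and the write pointer advance to the next entry in the direction of increasing address, the method comprises:
    when the low-order bits of the write pointer or read pointer reach a first preset entry count of the queue, numbering the low-order bits of the next entry pointed to by the write pointer or read pointer starting from 0, wherein the first preset entry count is determined by the maximum number of entries;
    when the low-order bits of the write pointer or read pointer reach the first preset entry count of the queue, flipping the value of the added MSB of the write pointer or read pointer.
  10. The method according to claim 8, wherein, when the number of queue entries of the reorder buffer or branch reorder buffer is not a power of two and the read pointer and the write pointer advance to the next entry in the direction of increasing address, the method comprises:
    when the low-order bits of the write pointer or read pointer reach the first preset entry count of the queue, continuing to number the next entry pointed to by the write pointer or read pointer until the power-of-two entry is reached, and then renumbering the low-order bits from 0;
    when the low-order bits of the write pointer or read pointer reach a second preset entry count, flipping the added MSB of the write pointer or read pointer, wherein the second preset entry count is determined by the power of two.
  11. The method according to claim 8, wherein, when the number of queue entries of the reorder buffer or branch reorder buffer is not a power of two and the read pointer and the write pointer advance to the next entry in the direction of decreasing address, the method comprises:
    when the low-order bits of the write pointer or read pointer decrease to 0, numbering the low-order bits of the next entry pointed to by the write pointer or read pointer starting from the first preset entry count of the queue;
    when the low-order bits of the write pointer or read pointer decrease to 0, flipping the value of the added MSB of the write pointer or read pointer.
  12. The method according to claim 8, wherein, when the number of queue entries of the reorder buffer or branch reorder buffer is not a power of two and the read pointer and the write pointer advance to the next entry in the direction of decreasing address, the method comprises:
    when the low-order bits of the write pointer or read pointer decrease to 0, renumbering the low-order bits of the next entry pointed to by the write pointer or read pointer starting from a second preset entry count that is the closest to, and larger than, the number of queue entries;
    when the low-order bits of the write pointer or read pointer decrease to 0, flipping the added MSB of the write pointer or read pointer.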
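  13. The method according to claim 8, wherein, when the read pointer and the write pointer advance to the next entry in the direction of increasing address, the age value is small in the direction of increasing address, wherein the smaller the age value, the older the instruction, and the larger the age value, the newer the instruction.
  14. The method according to claim 8, wherein, when the read pointer and the write pointer advance to the next entry in the direction of decreasing address, the age value is large in the direction of decreasing address, wherein the larger the age value, the older the instruction, and the smaller the age value, the newer the instruction.
  15. The method according to claim 1, wherein the MSB added in front of the reorder buffer and branch reorder buffer addresses is value A or value B, wherein value A and value B are both 1-bit binary values and are opposite to each other.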
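  16. The method according to claim 15, wherein, after instructions have been written to and retired from the reorder buffer or branch reorder buffer and neither the write pointer nor the read pointer has wrapped around:
    the instructions between the read pointer and the entry preceding the write pointer are valid;
    while the write pointer has not wrapped, the added MSB of the address of every instruction allocated by the write pointer is value A;
    while the read pointer has not wrapped, the added MSB of the read pointer is value A.
  17. The method according to claim 15, wherein, after instructions have been written to and retired from the reorder buffer or branch reorder buffer and the write pointer has wrapped around but the read pointer has not:
    the address MSB of a written instruction is value B, and the added MSB of the read pointer is value A.
  18. The method according to claim 15, wherein, after instructions have been written to and retired from the reorder buffer or branch reorder buffer and both the write pointer and the read pointer have wrapped around:
    the address MSB of a written instruction is value B, and the added MSB of the read pointer is value B.
  19. The method according to claim 15, wherein, after instructions have been written to and retired from the reorder buffer or branch reorder buffer and the write pointer has wrapped around again but the read pointer has not wrapped around again:
    the added address MSB of a written instruction is value A, and the added MSB of the read pointer is value B.
  20. The method according to claim 15, wherein, after instructions have been written to and retired from the reorder buffer or branch reorder buffer and both the write pointer and the read pointer have wrapped around again:
    the added MSBs of the write pointer and the read pointer both return to value A.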
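  21. The method according to claim 1, further comprising:
    when the condition for a cancellation triggered by an exception is met, obtaining, for each instruction in the pipeline, the corresponding age information and comparing it with the age information of the instruction that raised the exception, and cancelling the pipeline instruction if it is newer.
  22. The method according to claim 1, further comprising:
    when the condition for a cancellation caused by a branch misprediction is met, every instruction in the pipeline carrying its reorder buffer address or branch reorder buffer address, comparing the age information obtained by XORing the added MSB of the pipeline instruction's address with the added MSB of the read pointer of the reorder buffer or branch reorder buffer against the age information obtained by XORing the added MSB of the address of the mispredicted instruction that caused the cancellation with the added MSB of the read pointer of the reorder buffer or branch reorder buffer, and cancelling the pipeline instruction if it is newer.
  23. The method according to claim 1, further comprising:
    when re-execution is caused by a memory dependence, rolling back the load instruction and all instructions newer than the load instruction, and re-executing all rolled-back instructions.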
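  24. An apparatus for scheduling an out-of-order queue and determining queue entries to cancel, comprising:
    a bit-adding module, configured to add a most significant bit (MSB) in front of the reorder buffer and branch reorder buffer addresses;
    a comparison module, configured to XOR the added MSB of the read pointer of the reorder buffer or branch reorder buffer with the added MSBs of the two reorder buffer or branch reorder buffer addresses to be compared, and to compare the resulting addresses as the age information of the two instructions to determine which instruction is older.
  25. The apparatus according to claim 24, further comprising:
    a scheduling module which, when the out-of-order queue is scheduled, selects the valid instruction with the oldest age in the queue for execution.
  26. The apparatus according to any one of claims 24-25, further comprising:
    a cancellation module which, when the queue entries to cancel are determined, selects for cancellation the instruction in the queue that caused the cancellation and the instructions newer than the instruction that caused the cancellation.
  27. An electronic device, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the method for scheduling an out-of-order queue and determining queue entries to cancel according to any one of claims 1-23.
  28. A non-transitory computer-readable storage medium storing a computer program, wherein the program is executed by a processor to implement the method for scheduling an out-of-order queue and determining queue entries to cancel according to any one of claims 1-23.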
PCT/CN2021/095138 2020-11-10 2021-05-21 Method and apparatus for scheduling an out-of-order queue and determining queue entries to cancel WO2022100054A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21810884.3A EP4027236A4 (en) 2020-11-10 2021-05-21 METHOD AND APPARATUS FOR SCHEDULING OUT-OF-ORDER QUEUES AND DETERMINING QUEUE ERASING ELEMENTS
US17/530,192 US11829768B2 (en) 2020-11-10 2021-11-18 Method for scheduling out-of-order queue and electronic device items

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011243930.1 2020-11-10
CN202011243930.1A CN112099854B (zh) 2020-11-10 2020-11-10 Method and apparatus for scheduling an out-of-order queue and determining queue entries to cancel

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/530,192 Continuation US11829768B2 (en) 2020-11-10 2021-11-18 Method for scheduling out-of-order queue and electronic device items

Publications (1)

Publication Number Publication Date
WO2022100054A1 true WO2022100054A1 (zh) 2022-05-19

Family

ID=73785061

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/095138 WO2022100054A1 (zh) 2020-11-10 2021-05-21 Method and apparatus for scheduling an out-of-order queue and determining queue entries to cancel

Country Status (2)

Country Link
CN (1) CN112099854B (zh)
WO (1) WO2022100054A1 (zh)

Also Published As

Publication number Publication date
CN112099854A (zh) 2020-12-18
CN112099854B (zh) 2021-04-23

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2021810884

Country of ref document: EP

Effective date: 20211202

NENP Non-entry into the national phase

Ref country code: DE