CN113805944B - Method and device for distributing and executing out-of-order execution queue multiple instructions in out-of-order processor - Google Patents

Info

Publication number
CN113805944B
Authority
CN
China
Prior art keywords
queue
instruction
tail
item
instructions
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111369022.1A
Other languages
Chinese (zh)
Other versions
CN113805944A (en
Inventor
李祖松
郇丹丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Micro Core Technology Co ltd
Original Assignee
Beijing Micro Core Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Micro Core Technology Co ltd
Priority to CN202111369022.1A
Publication of CN113805944A
Application granted
Publication of CN113805944B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3851 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3854 Instruction completion, e.g. retiring, committing or graduating
    • G06F9/3856 Reordering of instructions, e.g. using queues or age tags

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Advance Control (AREA)

Abstract

The invention discloses a method and a device for allocating and executing multiple instructions from an out-of-order execution queue in an out-of-order processor. The method comprises the following steps: constructing an order maintenance queue to allocate empty entries for instructions and data entering the out-of-order execution queue, wherein the out-of-order execution queue comprises a func field, and the order maintenance queue comprises an identifier id field and a tail pointer tail; numbering the entries of the out-of-order execution queue, and recording the id numbers of the out-of-order execution queue entries in the id field of the order maintenance queue; entering the instructions into the out-of-order execution queue entries indicated by the id numbers stored at tail of the order maintenance queue; selecting ready entries from the out-of-order execution queue according to the id number information given by the order maintenance queue; and allocating the instructions to the FUs according to an allocation rule based on the func field and on the instruction names and numbers of instruction items of the FUs. The invention not only satisfies the oldest-first policy but also allows all FUs to be fully utilized when multiple instructions are executed simultaneously, which improves allocation and execution efficiency, thereby improving processor performance, reducing power consumption, and reducing cost.

Description

Method and device for distributing and executing out-of-order execution queue multiple instructions in out-of-order processor
Technical Field
The present invention relates to the field of microprocessor technology, and in particular, to a method and apparatus for distributing and executing out-of-order execution queue multiple instructions in an out-of-order processor.
Background
The out-of-order queue in an out-of-order processor buffers a certain number of instructions and their data (the instructions may be program instructions or internal operations produced by the processor's decoder; one instruction may be translated into one operation or several operations). The queue is responsible for allocating empty entries to the instructions and data that enter it and for selecting the instructions and data that meet certain conditions for execution. Instructions flow through the processor in program order until they enter the out-of-order queue; from there, a later instruction may execute before an earlier one as long as its execution conditions are met, which improves instruction execution speed.
When an instruction enters the queue and multiple entries are empty, one of the empty entries must be allocated to it. When instructions are selected from the queue for execution and multiple entries are ready, the instruction that entered the queue first is generally selected, i.e., the oldest-first policy. The reasoning is that the older an instruction is, the more instructions depend on it, so executing the oldest instruction first effectively improves the parallelism of the instructions executed by the processor. The oldest instruction also occupies hardware resources in the processor, such as the out-of-order queue, the reorder buffer, and the store buffer; the earlier it executes, the earlier these resources are released for use by later instructions. To identify which instructions in the out-of-order queue are oldest, the order in which the instructions entered the pipeline must be known. The prior art identifies the oldest instruction by using a counter, a reorder buffer, or pointers that maintain the order of the out-of-order queue; these methods cause drawbacks such as long delay, high power consumption, and large area in the out-of-order processor.
A prior-art processor generally shares one issue queue (execution queue) among several functional units (FUs). If the issue queue issues one instruction to an FU per cycle, a 1-of-M arbitration circuit is required (M is the capacity of the issue queue); if it issues multiple instructions to N FUs at the same time, an N-of-M arbitration circuit is required. Taking the selection of two instructions per cycle as an example, a two-stage arbitration circuit is required: after the first stage selects an instruction, that instruction is marked so that the second stage can exclude it. However, a 2-of-M arbitration circuit implemented this way has twice the delay of a 1-of-M arbitration circuit, and when the approach is extended to an N-of-M arbitration circuit the delay can become significant. In another method, each FU is given a dedicated 1-of-M arbitration circuit, and when an instruction is written into an entry of the issue queue it is assigned to a corresponding FU alternately or randomly according to its type, so the FU cannot subsequently be selected.
Disclosure of Invention
To solve the problems in the prior art, the invention provides a method and a device for allocating and executing multiple instructions from an out-of-order execution queue. When multiple instructions are allocated and executed simultaneously, the invention both satisfies the oldest-first policy and distributes the instructions reasonably among different functional units FU, so that all functional units FU can be fully utilized. This improves the efficiency of scheduling, allocation, and execution, thereby improving processor performance, raising the clock frequency, reducing power consumption, and reducing cost.
To achieve the above object, the present invention provides a method for distributing out-of-order execution queue multi-instruction execution in an out-of-order processor, comprising the steps of:
Constructing an order maintenance queue corresponding to the out-of-order execution queue, and allocating empty entries for instructions and data entering the out-of-order execution queue, wherein the out-of-order execution queue comprises an executable functional unit func field, and the order maintenance queue comprises an identifier id field and a tail pointer tail.
Numbering the entries of the multi-entry out-of-order execution queue in sequence, and recording the id numbers of the out-of-order execution queue entries in the id field of the order maintenance queue.
Entering the multiple instructions into the out-of-order execution queue entries indicated by the id numbers stored in the entry pointed to by the tail pointer tail of the order maintenance queue and in the entries below it.
Selecting ready entries from the out-of-order execution queue as execution instructions, according to the id number information given by the order maintenance queue.
Counting the names of the executable instructions and the number of instruction items of all functional units FU.
Allocating the multiple execution instructions to the functional units FU for execution according to a preset instruction allocation rule, based on the value of the executable functional unit func field of each instruction and on the counted instruction names and numbers of instruction items.
Preferably, the number of entries of the order maintenance queue is greater than or equal to the number of entries of the out-of-order execution queue.
The out-of-order execution queue comprises a valid field and a ready rdy field, wherein the valid field records whether an entry of the out-of-order execution queue is valid, and the rdy field records whether the instruction and data of the entry are ready. The order maintenance queue comprises a tail entry, which is the entry pointed to by the tail pointer tail of the order maintenance queue.
In the present invention, when the processor is in the initialized state, the method further comprises:
The id fields of the order maintenance queue are numbered in sequence from top to bottom, tail is set to 0, and the valid fields of all entries in the out-of-order execution queue are set to 0, wherein a valid field of 0 indicates that the entry is invalid.
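The initialized state can be sketched in a few lines of Python. This is only an illustrative model, not the patented hardware: the function name `init_queues`, the use of Python lists, and numbering from 0 are assumptions made for the example.

```python
def init_queues(n):
    """Return the reset state for an n-entry queue pair (n is a free parameter)."""
    order_ids = list(range(n))  # id fields numbered 0..n-1 from top to bottom
    tail = 0                    # tail points at the topmost entry
    valid = [0] * n             # every execution-queue entry starts invalid
    return order_ids, tail, valid


order_ids, tail, valid = init_queues(4)
```

After reset, every position at or below tail holds the id of a free execution-queue entry, so the order maintenance queue doubles as a free list.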
In the present invention, when the out-of-order execution queue in the processor is a case of multi-instruction entry only, the method further comprises:
When k instructions enter, the k instructions enter the out-of-order execution queue entries indicated by the id numbers stored in the entry pointed to by tail of the order maintenance queue and in the entries below it, and tail of the order maintenance queue moves down by k entries, i.e., tail of the next cycle = tail of the current cycle + k.
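A minimal Python sketch of the entry-only case follows. The names `enter`, `order_ids`, and `entries`, and the dictionary layout of an entry, are hypothetical; the bound check assumes that all free ids sit at or below tail (no cancelled holes).

```python
def enter(order_ids, tail, entries, new_instrs):
    """Place k new instructions in the slots named at tail, tail+1, ... and return tail + k."""
    k = len(new_instrs)
    if k > len(order_ids) - tail:  # assumption: free ids are exactly those at or below tail
        raise ValueError("not enough empty entries")
    for offset, instr in enumerate(new_instrs):
        slot = order_ids[tail + offset]  # id field read directly, no search needed
        entries[slot] = {"valid": 1, "rdy": 0, "data": instr}
    return tail + k


order_ids = [3, 1, 0, 2]               # slot 3 is occupied, slots 1, 0, 2 are free
entries = [None, None, None, {"valid": 1, "rdy": 0, "data": "old"}]
new_tail = enter(order_ids, 1, entries, ["a", "b"])
```

The two new instructions land in slots 1 and 0, which is the order the id fields list them in, and the order maintenance queue itself is not rewritten.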
In the present invention, when the out-of-order execution queue in the processor is a case of multi-instruction execution only, the method further comprises:
When q instructions are executed, the order maintenance queue is scanned from the top down to tail, and the q executable entries whose valid field is 1 and whose rdy field is 1 among the out-of-order execution queue entries indicated by the id numbers are selected. An entry becomes empty after its instruction is executed. The contents of the movable entries above the entry pointed to by the tail pointer tail of the order maintenance queue move upward, tail of the next cycle = tail of the current cycle - q, and the id numbers of the emptied entries are stored in the id fields of the entry pointed to by the next-cycle tail and the entries below it. A valid field of 1 indicates that the entry is valid, and a rdy field of 1 indicates that the entry's instruction and data are ready.
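The execution-only case can be modeled as follows. This is a software sketch under assumptions: the function name `execute` and the list-based state are invented for illustration, and the relative order in which freed ids are written back is not specified in the text (oldest-first is used here).

```python
def execute(order_ids, tail, valid, rdy, q):
    """Select up to q oldest ready entries, free them, and compact the order queue."""
    picked = []
    for pos in range(tail):                 # top of the queue is the oldest
        slot = order_ids[pos]
        if valid[slot] == 1 and rdy[slot] == 1:
            picked.append(slot)
            if len(picked) == q:
                break
    for slot in picked:
        valid[slot] = 0                     # executed entries become empty
    kept = [s for s in order_ids[:tail] if s not in picked]
    order_ids[:tail] = kept + picked        # survivors move up, freed ids follow
    return picked, tail - len(picked)


order_ids = [2, 5, 0, 4, 1, 3]              # slots 2, 5, 0, 4 occupied; 1, 3 free
valid = [1, 0, 1, 0, 1, 1]
rdy = [0, 0, 0, 0, 1, 1]                    # only slots 4 and 5 are ready
picked, new_tail = execute(order_ids, 4, valid, rdy, 2)
```

Because the freed ids are written starting at the new tail, they are the first slots handed to the next instructions that enter.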
In the present invention, when the queue is full, the method further comprises:
After the q ready executable instructions are executed, the contents of the movable entries above the entry pointed to by the tail pointer tail move upward, and the next-cycle tail pointer points to entry tail-q of the order maintenance queue, i.e., tail of the next cycle = tail of the current cycle - q. The id numbers of the out-of-order execution queue entries emptied by instruction execution are stored in the id fields of the entry pointed to by the next-cycle tail and the entries below it.
In the present invention, when the out-of-order execution queue in the processor is the situation of simultaneous multiple instruction entry and execution, the method further comprises:
When k new instructions enter and q instructions are executed, the order maintenance queue is scanned from the top down to tail, and the q executable entries whose valid field is 1 and whose rdy field is 1 among the out-of-order execution queue entries indicated by the id numbers are selected for execution. The contents of the movable entries above entry tail+k move upward, and the k new instructions enter the out-of-order execution queue entries indicated by the id numbers stored in entry tail-q of the order maintenance queue and the entries below it. tail of the next cycle = tail of the current cycle - q + k, and the id numbers of the entries emptied by execution are stored in the id fields of the entry pointed to by the next-cycle tail and the entries below it.
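The combined case can be sketched as a single update of the order maintenance queue. The function name `step` and the list-based state are assumptions for illustration; the sketch also assumes the k free ids sit at positions tail through tail+k-1.

```python
def step(order_ids, tail, valid, rdy, k, q):
    """One update with k entering and up to q executing instructions."""
    if k > len(order_ids) - tail:
        raise ValueError("not enough empty entries")
    occupied = order_ids[:tail]
    picked = [s for s in occupied if valid[s] == 1 and rdy[s] == 1][:q]
    kept = [s for s in occupied if s not in picked]
    incoming = order_ids[tail:tail + k]     # free ids handed to the new instructions
    rest = order_ids[tail + k:]
    for s in picked:
        valid[s] = 0                        # executed entries become empty
    for s in incoming:
        valid[s], rdy[s] = 1, 0             # new entries are valid but not yet ready
    # survivors, then new entries (from position tail-q), then freed ids (from new tail)
    order_ids[:] = kept + incoming + picked + rest
    return picked, incoming, tail - len(picked) + k


order_ids = [2, 5, 0, 4, 1, 3]
valid = [1, 0, 1, 0, 1, 1]
rdy = [0, 0, 0, 0, 1, 1]
picked, incoming, new_tail = step(order_ids, 4, valid, rdy, k=1, q=2)
```

In this run the new instruction takes slot 1, whose id ends up at position tail-q = 2, and the freed ids 5 and 4 start at the new tail, position 3.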
Preferably, k is less than or equal to the number of empty entries of the out-of-order execution queue.
Preferably, q is less than or equal to the number of idle functional units FU.
The executable functional unit func field is a bit vector whose width equals the number of functional units FU; it records in which functional units FU an instruction in the out-of-order execution queue can be executed. For example, in an out-of-order processor with 6 functional units FU, an executable functional unit func field of 110000 means that the instruction can be executed on functional unit FU1 and functional unit FU2.
A bit value of 0 in the func bit vector means that the corresponding functional unit FU cannot execute the instruction; a bit value of 1 means that the corresponding functional unit FU can execute the instruction.
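Decoding the func bit vector is a one-line operation. The sketch below assumes the vector is given as a string whose leftmost bit corresponds to FU1, which matches the 110000 example; the function name `executable_fus` is hypothetical.

```python
def executable_fus(func_bits):
    """Return the 1-based FU numbers whose bit is set in the func bit vector."""
    return [i + 1 for i, bit in enumerate(func_bits) if bit == "1"]


fus = executable_fus("110000")  # the six-FU example from the text
```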
Allocating the multiple execution instructions to the functional units FU for execution according to a preset instruction allocation rule, based on the value of the executable functional unit func field of each instruction and on the counted instruction names and numbers of instruction items, includes:
according to the counted instruction names and numbers of instruction items, allocating each execution instruction to an idle functional unit FU for which the corresponding bit of the instruction's func bit vector is 1, which includes the instruction's name, and which has the smallest number of instruction items.
When an instruction is being allocated and none of the functional units FU whose bits in its func bit vector are 1 can execute it, the instruction stalls, and the next instruction whose valid field is 1, whose rdy field is 1, and whose func bit vector has a 1 bit corresponding to an idle functional unit FU is selected for execution.
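One way to read the allocation rule is sketched below. It rests on an interpretation: "number of instruction items" is taken to mean how many distinct instruction names an FU can execute, so the least versatile capable FU is preferred. The function name `allocate`, the data layout, and the tie-break on the FU number are assumptions for the example.

```python
def allocate(ready_instrs, fu_supported, idle):
    """Map positions in ready_instrs (oldest first) to FU numbers.

    ready_instrs: list of (instruction name, func bit string) pairs
    fu_supported: dict of FU number -> set of instruction names it can execute
    idle:         FU numbers that are free this cycle
    """
    idle = set(idle)
    assignment = {}
    for pos, (name, func_bits) in enumerate(ready_instrs):
        candidates = [fu for fu in idle
                      if func_bits[fu - 1] == "1" and name in fu_supported[fu]]
        if not candidates:
            continue                        # this instruction stalls; try the next one
        fu = min(candidates, key=lambda f: (len(fu_supported[f]), f))
        assignment[pos] = fu
        idle.remove(fu)
    return assignment


fu_supported = {1: {"add", "sub", "mul"}, 2: {"add", "sub"}, 3: {"load"}}
ready = [("add", "110"), ("mul", "100"), ("load", "001")]
assignment = allocate(ready, fu_supported, idle={1, 2, 3})
```

In this example the add goes to FU2 rather than FU1, which leaves FU1 free for the mul that only FU1 can execute, so all three instructions issue together.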
In the invention, when the processor cancels instructions because of a branch misprediction, or because of a replay or exception caused by a memory access, the valid field of each cancelled entry in the out-of-order execution queue is set to 0. A cancelled entry whose valid field is 0 is an empty entry, no new instruction enters it, and the value of tail of the order maintenance queue is not changed.
When an out-of-order execution queue entry indicated by an id of the order maintenance queue above tail is empty because of cancellation, that is, the out-of-order execution queue has an entry with valid == 0, the following processing is performed according to the case:
If the current cycle has no instruction execution and no instruction entry, and multiple empty entries are processed in one cycle, the order maintenance queue is handled in the same way as for multi-instruction execution only.
If the current cycle has no instruction execution but has instruction entry, and multiple empty entries are processed in one cycle, the order maintenance queue is handled in the same way as for simultaneous multi-instruction entry and execution.
If the queue is full, i.e., tail = the index of the last entry of the order maintenance queue + 1, and the current cycle processes multiple empty entries, the order maintenance queue is handled in the same way as the queue-full case of multi-instruction execution only.
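Cancellation and the later clean-up of the resulting holes can be sketched as two separate operations. This models only the first case (no execution and no entry in the clean-up step); the function names `cancel` and `reclaim` and the list-based state are assumptions for illustration.

```python
def cancel(valid, slots):
    """Mark cancelled execution-queue entries invalid; tail is deliberately untouched."""
    for s in slots:
        valid[s] = 0


def reclaim(order_ids, tail, valid):
    """Treat the holes above tail like executed entries and return the new tail."""
    occupied = order_ids[:tail]
    kept = [s for s in occupied if valid[s] == 1]
    holes = [s for s in occupied if valid[s] == 0]
    order_ids[:tail] = kept + holes         # survivors move up, hole ids follow
    return tail - len(holes)


order_ids = [2, 5, 0, 4, 1, 3]              # slots 2, 5, 0, 4 occupied
valid = [1, 0, 1, 0, 1, 1]
cancel(valid, [5, 0])
new_tail = reclaim(order_ids, 4, valid)
```

Keeping tail fixed at cancel time means the cancel path only writes valid bits; the ids are recycled by the same compaction used for executed entries.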
To achieve the above object, the present invention further provides an out-of-order execution queue multi-instruction allocation executing apparatus in an out-of-order processor, comprising:
A construction module, configured to construct an order maintenance queue corresponding to the out-of-order execution queue and to allocate empty entries for instructions and data entering the out-of-order execution queue, wherein the out-of-order execution queue comprises an executable functional unit func field, and the order maintenance queue comprises an identifier id field and a tail pointer tail.
A numbering module, configured to number the entries of the out-of-order execution queue in sequence, wherein the id field records the id numbers of the out-of-order execution queue entries.
An entry module, configured to enter the multiple instructions into the out-of-order execution queue entries indicated by the id numbers stored in the entry pointed to by the tail pointer tail of the order maintenance queue and in the entries below it.
A selection module, configured to select ready entries from the out-of-order execution queue as execution instructions, according to the id number information given by the order maintenance queue.
An allocation module, configured to count the names and the number of the executable instructions of all functional units FU, and to allocate the multiple execution instructions to the functional units FU for execution according to a preset instruction allocation rule, based on the value of the executable functional unit func field of each instruction and on the counted instruction names and numbers of instruction items.
The device of the invention for allocating and executing multiple instructions from the out-of-order execution queue in an out-of-order processor maintains the order of the out-of-order execution queue with an order maintenance queue and distributes multiple instructions reasonably according to a preset instruction allocation rule. The out-of-order execution queue thus both satisfies the oldest-first policy and allows all functional units FU to be fully utilized when multiple instructions are issued simultaneously, which improves the efficiency of scheduling, allocation, and execution, improves processor performance, raises the clock frequency, reduces power consumption, and lowers cost.
To achieve the above object, the present invention also provides an electronic device, comprising at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being arranged to perform a method of distributed execution of out-of-order execution queue multiple instructions in an out-of-order processor as described above.
To achieve the above object, the present invention also provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the method for allocating execution of out-of-order execution queue multiple instructions in an out-of-order processor as described above.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow chart illustrating a method for out-of-order execution queue allocation according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating initialization of an out-of-order execution queue and an order maintenance queue according to an embodiment of the present invention;
FIG. 3 is a diagram of a process in which two instructions enter entries 2 and 5 of the out-of-order execution queue, as indicated by tail of the order maintenance queue, according to an embodiment of the present invention;
FIG. 4 is a diagram of a process in which the out-of-order execution queue finds two executable entries (the entries with id 5 and id n-1) for execution according to an embodiment of the present invention;
FIG. 5 is a diagram of the processing of the out-of-order execution queue after the entries with id = 0 and id = 2 are executed when the out-of-order execution queue is full, according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating a process of multiple instructions entering and executing simultaneously according to an embodiment of the present invention;
FIG. 7 is a diagram of the execution process when an instruction is being allocated and none of the functional units FU whose bits in its func bit vector are 1 can execute it, according to an embodiment of the present invention;
FIG. 8 is a diagram illustrating an apparatus for out-of-order queue allocation according to an embodiment of the present invention;
FIG. 9 is a block diagram illustrating an apparatus for allocating execution queues in an out-of-order processor according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer throughout to the same or similar elements or to elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the invention and are not to be construed as limiting it.
The following describes an out-of-order execution queue allocation execution method and apparatus in an out-of-order processor according to an embodiment of the present invention with reference to the accompanying drawings, and first, an out-of-order execution queue allocation execution method in an out-of-order processor according to an embodiment of the present invention will be described with reference to the accompanying drawings.
Specifically, fig. 1 is a flowchart illustrating an out-of-order execution queue allocation execution method according to an embodiment of the present invention.
As shown in fig. 1, the method for allocating and executing out-of-order execution queue includes the following steps:
In step S101, an order maintenance queue corresponding to the out-of-order execution queue is constructed, and empty entries are allocated for instructions and data entering the out-of-order execution queue, wherein the out-of-order execution queue comprises an executable functional unit func field, and the order maintenance queue comprises an identifier id field and a tail pointer tail.
The out-of-order execution queue further comprises one or more of a valid field, a rdy field, and a data field, which hold information used by the instructions, and the order maintenance queue comprises an id field.
The valid field records whether the entry is valid (for example, valid = 1 indicates valid and valid = 0 indicates invalid); the rdy field records whether the entry's instruction and data are ready (for example, rdy = 1 indicates ready, i.e., executable, and rdy = 0 indicates not ready); and the data field records information such as the command and data used by the entry's instruction.
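The fields of one out-of-order execution queue entry can be pictured as a small record. The sketch below is only an illustration: the class name `ExecQueueEntry`, the Python types, and the helper `is_selectable` are assumptions, and a six-bit func string is used merely as a placeholder width.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ExecQueueEntry:
    valid: int = 0               # 1 = entry holds a live instruction, 0 = empty
    rdy: int = 0                 # 1 = instruction and data are ready to execute
    func: str = "000000"         # bit vector, one bit per functional unit FU
    data: Optional[Any] = None   # command and operand information


def is_selectable(entry):
    """An entry may be picked for execution only when it is both valid and ready."""
    return entry.valid == 1 and entry.rdy == 1


entry = ExecQueueEntry(valid=1, rdy=0, func="110000", data="add r1, r2, r3")
waiting = is_selectable(entry)
entry.rdy = 1
ready = is_selectable(entry)
```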
In step S102, the entries of the multi-entry out-of-order execution queue are numbered in sequence, and the id numbers of the out-of-order execution queue entries are recorded in the id field of the order maintenance queue.
It is to be understood that, in the embodiment of the present invention, various items of the out-of-order execution queue may be numbered in various ways, for example, each item of the out-of-order execution queue of n items may be numbered sequentially from 0 to n-1, and further, for example, each item of the out-of-order execution queue of n items may also be numbered sequentially from 1 to n, which is not limited herein.
Preferably, the order maintenance queue has the same number of entries as the out-of-order execution queue. Taking numbering from 0 to n-1 as an example, the id numbers of the out-of-order execution queue entries are recorded in the id field of the order maintenance queue and are used to index the out-of-order execution queue; that is, id numbers 0 to n-1 correspond to entries 0 to n-1 of the out-of-order execution queue, respectively.
In step S103, the multiple instructions enter the out-of-order execution queue entries indicated by the id numbers stored in the entry pointed to by the tail pointer tail of the order maintenance queue and in the entries below it, and the number of entering instructions is less than or equal to the number of empty entries of the out-of-order execution queue.
It can be understood that the multiple instructions directly enter the out-of-order execution queue entries indicated by the id numbers stored in the entry pointed to by tail of the order maintenance queue and in the entries below it, with no search required, so the method is simple and efficient to implement. Selecting instructions for execution requires no multi-stage comparison and causes no large data movement, so the delay is low and power consumption and area are saved.
In step S104, ready entries are selected from the out-of-order execution queue as execution instructions, according to the id number information given by the order maintenance queue.
It can be understood that the embodiment of the present invention uses the id number information given by the order maintenance queue to find the ready entries in the out-of-order execution queue and prepares them for execution as execution instructions. In addition, when multiple instructions in the out-of-order execution queue are ready to execute, the embodiment of the invention finds the oldest instructions in the out-of-order execution queue according to the id number information given by the order maintenance queue and executes them; the number of executed instructions is less than or equal to the number of idle functional units FU.
In step S105, the executable instruction names and the number of instruction items of all the functional units FU are counted.
It will be appreciated that the embodiment of the present invention will count all instruction names and instruction item numbers executable by each functional unit FU in all functional units FU, respectively.
In step S106, according to the value of the executable functional unit func field of the instruction and the counted instruction name and instruction item number, a plurality of executed instructions are respectively allocated to the functional unit FU for execution according to a preset instruction allocation rule.
Specifically, the method comprises the following steps: the out-of-order execution queue includes an executable functional unit func field, which is a bit vector field equal to the number of functional units FU, for recording whether instructions of the out-of-order execution queue can be executed in a functional unit FU.
A bit value of 0 in the func bit vector means that the corresponding functional unit FU cannot execute the instruction; a bit value of 1 means that the corresponding functional unit FU can execute the instruction. For example, in an out-of-order processor with 6 functional units FU, an executable functional unit func field of 110000 means that the instruction can be executed on functional unit FU1 and functional unit FU2.
It can be understood that, according to the values of the executable functional unit func fields of the multiple instructions and the counted names and numbers of instruction items of all instructions executable by each functional unit FU, the embodiment of the present invention allocates each execution instruction to an idle functional unit FU for which the corresponding bit of the instruction's func bit vector is 1, which includes the instruction's name, and which has the smallest number of instruction items.
Further, in the embodiment of the present invention, the initialization of the processor, the entry of only multiple instructions in the out-of-order execution queue, the execution of only multiple instructions in the out-of-order execution queue, the simultaneous entry and execution of multiple instructions in the out-of-order execution queue, the cancellation of the occurrence of an exception or other abnormality caused by the re-execution or exception of the processor due to a transfer prediction error or a memory access, and the like are respectively performed, and the specific method is as follows:
in one embodiment of the invention, when the processor is in the initialized state, the method comprises the following steps: the id fields of the sequence maintenance queue are numbered from top to bottom in sequence, tail is set to be 0, valid fields of all items in the out-of-sequence execution queue are set to be 0, wherein the valid fields are 0 and indicate that the recorded items are invalid.
It can be understood that, during initialization, the id fields may be numbered in various ways in the embodiment of the present invention, for example, the id fields of the sequential maintenance queue may be numbered sequentially from top to bottom as 0 to n-1, and for example, the id fields may also be numbered sequentially from top to bottom as 1 to n, where n is a positive integer, and this is not particularly limited herein, and the number of entries of the sequential maintenance queue may be greater than or equal to the number of entries of the out-of-order execution queue.
In the following embodiment, taking the id fields numbered from 0 to n-1 in sequence from top to bottom as an example, the id fields of the sequential maintenance queue are set from 0 to n-1 in sequence from top to bottom, the tail is set to 0, and the valid fields of the out-of-order execution queues are set to 0 at the time of initialization, as shown in fig. 2.
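As a non-limiting illustration, the initialized state can be modeled in software as follows. This is a behavioral sketch, not the circuit of the embodiment; the name init_queues, the dictionary layout of an item, and the choice of giving both queues the same number of items n are assumptions of the sketch.

```python
def init_queues(n: int):
    """Reset state: ids numbered 0..n-1 from top to bottom, tail = 0, every item invalid."""
    order_ids = list(range(n))  # id fields of the sequential maintenance queue
    tail = 0                    # tail pointer of the sequential maintenance queue
    exec_queue = [
        {"valid": 0, "rdy": 0, "data": None, "func": ""}  # valid = 0: item is invalid
        for _ in range(n)
    ]
    return order_ids, tail, exec_queue


order_ids, tail, exec_queue = init_queues(8)
print(order_ids, tail)
```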
In one embodiment of the invention, when multiple instructions only enter the out-of-order execution queue: k instructions enter, where k is less than or equal to the number of empty items of the out-of-order execution queue. The k instructions enter the out-of-order execution queue items pointed to by the id numbers held in the item pointed to by tail of the sequential maintenance queue and in the items below it, and tail of the sequential maintenance queue moves down by k items, that is, next-beat tail = current-beat tail + k.
In an embodiment of the present invention, fig. 3 shows two instructions entering item 2 and item 5 of the out-of-order execution queue, which are the items pointed to by the id numbers held at tail and below it in the sequential maintenance queue. The valid fields of item 2 and item 5 are set to 1, rdy is set according to whether the operands of the entering instruction are ready (1 if ready), the instruction and data information is written into the data field, and the bit vector information is written into the func field. The tail of the sequential maintenance queue moves down in the next beat, i.e., next-beat tail = current-beat tail + 2, so that the next-beat tail points to the item with id 3.
Whenever multiple instructions enter and no instruction is selected for execution in the same beat, the tail of the sequential maintenance queue moves downward. When the out-of-order execution queue is full, tail of the sequential maintenance queue equals n.
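As a non-limiting illustration, the entry-only case can be sketched as follows. The function name enter, the item layout, and the six-item example state are assumptions of the sketch; the example state is chosen only so that, as in the description of fig. 3, the two instructions land in items 2 and 5 and the next-beat tail points to the item holding id 3.

```python
def enter(order_ids, tail, exec_queue, instrs):
    """Write k instructions into the items named at order_ids[tail], order_ids[tail+1], ...

    Returns the next-beat tail (current-beat tail + k).
    """
    free_items = len(order_ids) - tail  # sketch assumes both queues have n items
    if len(instrs) > free_items:
        raise ValueError("k exceeds the number of empty items")
    for i, ins in enumerate(instrs):
        slot = order_ids[tail + i]
        exec_queue[slot] = {"valid": 1, "rdy": ins["rdy"],
                            "data": ins["data"], "func": ins["func"]}
    return tail + len(instrs)


# Hypothetical state: ids 0 and 1 are in use, ids 2, 5, 3, 4 are free.
order_ids = [0, 1, 2, 5, 3, 4]
exec_queue = [{"valid": 0, "rdy": 0, "data": None, "func": ""} for _ in range(6)]
tail = enter(order_ids, 2, exec_queue,
             [{"rdy": 1, "data": "add", "func": "10110"},
              {"rdy": 0, "data": "mul", "func": "01001"}])
print(tail, order_ids[tail])
```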
In one embodiment of the invention, when instructions are only executed: q instructions are executed, where q is less than or equal to the number of idle functional units FU. Scanning the sequential maintenance queue from top to bottom, q executable items whose valid field is 1 and whose rdy field is 1 are selected from the out-of-order execution queue items corresponding to the id numbers. An item becomes an empty item after execution; the contents of the remaining items before the item pointed to by the tail pointer tail of the sequential maintenance queue are moved upward, next-beat tail = current-beat tail - q, and the id numbers of the empty items are stored in the id fields of the item pointed to by the next-beat tail of the sequential maintenance queue and of the items below it. Here a valid field of 1 indicates that the recorded item is valid, and a rdy field of 1 indicates that the instruction and data of the recorded item are ready. This embodiment assumes that all q instructions can be executed on the functional units FU. If an instruction cannot be executed, the instructions are first allocated as in the case in which none of the functional units FU whose bit in the func field bit vector is 1 can execute the instruction, and all items that are empty after execution are then processed according to the method of this embodiment.
It is understood that, when instructions are only executed, the sequential maintenance queue is scanned from top to bottom up to tail, the out-of-order execution queue items corresponding to the id numbers are looked up, and q executable items satisfying valid == 1 and rdy == 1 are executed. The contents of the remaining items before tail (not including tail) of the sequential maintenance queue are moved upward, next-beat tail = current-beat tail - q, and the id numbers of the out-of-order execution queue items selected for execution are stored in the id fields of the item pointed to by the next-beat tail of the sequential maintenance queue and of the items below it.
In the embodiment of the present invention, as shown in fig. 4, scanning the sequential maintenance queue from top to bottom up to tail finds two executable instructions (i.e., q = 2) in the out-of-order execution queue that satisfy valid == 1 and rdy == 1 at the same time, namely the items with id 5 and id n-1, and these are executed. After execution, the valid fields of item 5 and item n-1 of the out-of-order execution queue are set to 0, the item of the sequential maintenance queue whose id is 0 is moved upward, next-beat tail = current-beat tail - 2, and the id numbers 5 and n-1 of the items that became empty after execution are stored in the id fields of the items pointed to by the next-beat tail and by the next-beat tail + 1 of the sequential maintenance queue.
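As a non-limiting illustration, the execute-only update of the sequential maintenance queue can be sketched as follows. The function name execute_only and the example state (n = 8, so that id n-1 is 7) are assumptions of the sketch, as is storing the freed id numbers in selection order; functional unit allocation is left out here.

```python
def execute_only(order_ids, tail, exec_queue, q):
    """Select up to q oldest ready items, free them, and compact the order queue.

    Returns (ids selected for execution, next-beat tail).
    """
    picked = [eid for eid in order_ids[:tail]
              if exec_queue[eid]["valid"] == 1 and exec_queue[eid]["rdy"] == 1][:q]
    for eid in picked:
        exec_queue[eid]["valid"] = 0          # the item is empty after execution
    chosen = set(picked)
    survivors = [eid for eid in order_ids[:tail] if eid not in chosen]
    # Survivors move up; freed ids are stored from the next-beat tail downward.
    order_ids[:tail] = survivors + picked
    return picked, tail - len(picked)


# Hypothetical state: ids 5, 7 (= n-1) and 0 are in use; id 0 is not ready yet.
order_ids = [5, 7, 0, 1, 2, 3, 4, 6]
exec_queue = [{"valid": 0, "rdy": 0} for _ in range(8)]
for eid, rdy in ((5, 1), (7, 1), (0, 0)):
    exec_queue[eid] = {"valid": 1, "rdy": rdy}
picked, tail = execute_only(order_ids, 3, exec_queue, q=2)
print(picked, tail, order_ids)
```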
In this embodiment, 5 functional units FU are preset, and their executable operations are as follows: FU1 can perform three operations, add, sub and sll; FU2 can perform two operations, mul and srl; FU3 can perform two operations, add and sub; FU4 can perform one operation, add; and FU5 can perform two operations, mul and div. All 5 functional units FU are unoccupied.
The instruction of the item with id 5 is add, and the instruction of the item with id n-1 is sll. According to the counted names and numbers of instructions executable by each functional unit FU, each instruction is allocated to the idle functional unit FU that can execute the instruction and has the smallest number of executable instructions. For the item with id 5, the functional units FU whose func field bits are 1 are FU1, FU3 and FU4, i.e., the func field bit vector is 10110; the idle functional unit FU with the smallest number of executable instructions is selected, and the item with id 5 is allocated to FU4. For the item with id n-1, the only functional unit FU whose func field bit is 1 is FU1, i.e., the func field bit vector is 10000; FU1 is therefore the idle functional unit FU that can execute the instruction with the smallest number of executable instructions, and the item with id n-1 is allocated to FU1.
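As a non-limiting illustration, the allocation rule applied to the two items of fig. 4 can be sketched as follows. The names FU_OPS and pick_fu are assumptions of the sketch, FU4 is taken to execute add as stated for fig. 7, and the tie-break between equally capable functional units (lowest-numbered first) is an assumption, since the description does not specify one.

```python
FU_OPS = {
    "FU1": {"add", "sub", "sll"},
    "FU2": {"mul", "srl"},
    "FU3": {"add", "sub"},
    "FU4": {"add"},
    "FU5": {"mul", "div"},
}


def pick_fu(func_bits, idle):
    """Pick the idle FU with a 1 bit in func_bits that executes the fewest operations."""
    candidates = [fu for fu, bit in zip(FU_OPS, func_bits)
                  if bit == "1" and fu in idle]
    if not candidates:
        return None  # no capable idle unit: the instruction cannot issue this beat
    return min(candidates, key=lambda fu: len(FU_OPS[fu]))


idle = set(FU_OPS)                  # all five units unoccupied
fu_a = pick_fu("10110", idle)       # item with id 5 (add)
idle.discard(fu_a)
fu_b = pick_fu("10000", idle)       # item with id n-1 (sll)
print(fu_a, fu_b)
```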
In one embodiment of the invention, when tail of the sequential maintenance queue equals n, that is, when the out-of-order execution queue is full, after q executable instructions are executed, the contents of the remaining items before tail are moved upward and next-beat tail = n - q. The id numbers of the items that became empty after execution are stored in the id fields of the item at tail - q and of the items below it, that is, of the item at n - q and the items below it.
As shown in fig. 5, when tail == n, that is, when the out-of-order execution queue is full, the sequential maintenance queue is scanned from top to bottom and two executable items satisfying valid == 1 and rdy == 1 at the same time are found, namely the items with id 0 and id 2, whose instructions are executed. The items of the sequential maintenance queue whose id contents are 3, …, 4 are moved up. The id number 0 of the first item that became empty after execution is stored in the id field of the item at current-beat tail - 2, i.e., item n-2 of the sequential maintenance queue; the id number 2 of the second item that became empty is stored in the id field of the item at current-beat tail - 1, i.e., item n-1 of the sequential maintenance queue; and next-beat tail = n - 2.
In one embodiment of the invention, when instructions enter the out-of-order execution queue and are executed from it in the same beat: with k new instructions entering and q instructions being executed, the sequential maintenance queue is scanned from top to bottom up to tail, and q executable items whose valid field is 1 and whose rdy field is 1 are selected for execution from the out-of-order execution queue items corresponding to the id numbers. The contents of the remaining items before item tail + k are moved upward, so that the k new instructions enter the out-of-order execution queue items indicated by the id numbers held in item tail - q of the sequential maintenance queue and in the items below it. Next-beat tail = current-beat tail - q + k, and the id numbers of the items that became empty after execution are stored in the id fields of the item pointed to by the next-beat tail and of the items below it. This embodiment assumes that all q instructions can be executed on the functional units FU. If an instruction cannot be executed, the instructions are first allocated as in the case in which none of the functional units FU whose bit in the func field bit vector is 1 can execute the instruction, and all items that are empty after execution are then handled in the sequential maintenance queue according to the method of this embodiment.
Preferably, k is less than or equal to the number of empty items of the out-of-order execution queue.
Preferably, q is less than or equal to the number of idle functional units FU.
In the embodiment of the present invention, fig. 6 shows the case in which two new instructions enter and three instructions are executed (i.e., k = 2 and q = 3). Scanning the sequential maintenance queue from top to bottom up to tail, the executable items in the out-of-order execution queue that satisfy valid == 1 and rdy == 1 are selected for execution, namely item n-1, item 0 and item 2. The items of the sequential maintenance queue whose id contents are 3 and 5 move upward at the same time, so that the two new instructions enter the out-of-order execution queue items indicated by the id numbers held in item tail - 3 of the sequential maintenance queue and in the item below it, namely items 3 and 5 of the out-of-order execution queue. The valid fields of item 3 and item 5 of the out-of-order execution queue are set to 1, rdy is set according to whether the operands of the dispatched instruction are ready (0 if not ready), the instruction and data information is written into the data field, and the bit vector information is written into the func field. Next-beat tail = current-beat tail - 3 + 2, i.e., next-beat tail = current-beat tail - 1. The id number 0 of an item that became empty after execution is stored in the id field of the item pointed to by the next-beat tail, the id number 2 in the id field of the item at next-beat tail + 1, and the id number n-1 in the id field of the item at next-beat tail + 2.
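As a non-limiting illustration, the combined entry-and-execution update can be sketched as follows. The function name enter_and_execute and the example state (n = 8, so that id n-1 is 7, with the live items ordered 0, 2, 7, 1) are assumptions of the sketch; the state is chosen so that the result matches the description of fig. 6, with new instructions in items 3 and 5 and the freed ids 0, 2 and n-1 stored from the next-beat tail downward.

```python
def enter_and_execute(order_ids, tail, exec_queue, new_instrs, q):
    """Execute up to q oldest ready items and admit k new instructions in one beat.

    Returns (ids selected for execution, next-beat tail = tail - q + k).
    """
    k = len(new_instrs)
    new_ids = order_ids[tail:tail + k]        # free ids at tail and below
    if len(new_ids) < k:
        raise ValueError("k exceeds the number of empty items")
    picked = [eid for eid in order_ids[:tail]
              if exec_queue[eid]["valid"] == 1 and exec_queue[eid]["rdy"] == 1][:q]
    for eid in picked:
        exec_queue[eid]["valid"] = 0
    for eid, ins in zip(new_ids, new_instrs):
        exec_queue[eid] = {"valid": 1, "rdy": ins["rdy"]}
    chosen = set(picked)
    survivors = [eid for eid in order_ids[:tail] if eid not in chosen]
    # Layout after the beat: survivors, then new items, then freed ids.
    order_ids[:tail + k] = survivors + new_ids + picked
    return picked, tail - len(picked) + k


order_ids = [0, 2, 7, 1, 3, 5, 4, 6]
exec_queue = [{"valid": 0, "rdy": 0} for _ in range(8)]
for eid, rdy in ((0, 1), (2, 1), (7, 1), (1, 0)):
    exec_queue[eid] = {"valid": 1, "rdy": rdy}
picked, tail = enter_and_execute(order_ids, 4, exec_queue,
                                 [{"rdy": 1}, {"rdy": 0}], q=3)
print(picked, tail, order_ids)
```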
In this embodiment, 5 functional units FU are preset, and their executable operations are as follows: FU1 can perform three operations, add, sub and sll; FU2 can perform two operations, mul and srl; FU3 can perform two operations, add and sub; FU4 can perform one operation, add; and FU5 can perform two operations, mul and div. All 5 functional units FU are unoccupied.
According to the counted names and numbers of instructions executable by each functional unit FU, each instruction is allocated to the functional unit FU that can execute the instruction and has the smallest number of executable instructions. For the item with id 0, the functional units FU whose func field bits are 1 are FU1 and FU3, i.e., the func field bit vector is 10100, and the item with id 0 is allocated to FU3 according to the rule. For the item with id 2, the functional units FU whose func field bits are 1 are FU1, FU3 and FU4, i.e., the func field bit vector is 10110, and the item with id 2 is allocated to FU4 according to the rule. For the item with id n-1, the only functional unit FU whose func field bit is 1 is FU1, i.e., the func field bit vector is 10000, and the item with id n-1 is allocated to FU1 according to the rule.
If the multiple instructions were instead allocated according to the sequential allocation principle of the prior art, the item with id 0 would be allocated to FU1 and the item with id 2 to FU3. Since the item with id n-1 can only be executed on FU1 and FU1 would already be occupied, no functional unit FU would be available for the item with id n-1, which could only be executed after the functional unit FU able to perform the operation (i.e., FU1) became idle, so that the processing performance of the processor would be greatly reduced.
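As a non-limiting illustration, the difference between sequential allocation and the rule of the embodiment can be reproduced with the following sketch. The names allocate, first_fit and fewest_ops are assumptions of the sketch, "first fit" is used here as a stand-in for the sequential allocation of the prior art, and FU4 is taken to execute add as stated for fig. 7.

```python
FU_OPS = {
    "FU1": {"add", "sub", "sll"},
    "FU2": {"mul", "srl"},
    "FU3": {"add", "sub"},
    "FU4": {"add"},
    "FU5": {"mul", "div"},
}


def allocate(instrs, choose):
    """Allocate (name, func_bits) pairs in order; all units start idle."""
    idle = list(FU_OPS)
    result = {}
    for name, func_bits in instrs:
        candidates = [fu for fu, bit in zip(FU_OPS, func_bits)
                      if bit == "1" and fu in idle]
        fu = choose(candidates) if candidates else None
        result[name] = fu
        if fu is not None:
            idle.remove(fu)
    return result


def first_fit(candidates):
    return candidates[0]                                    # sequential allocation


def fewest_ops(candidates):
    return min(candidates, key=lambda fu: len(FU_OPS[fu]))  # rule of the embodiment


instrs = [("id0", "10100"), ("id2", "10110"), ("id n-1", "10000")]
print(allocate(instrs, first_fit))   # id n-1 is left without a functional unit
print(allocate(instrs, fewest_ops))  # all three items are issued
```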
In one embodiment of the present invention, when an instruction is being allocated and none of the functional units FU whose bit in the func field bit vector of the instruction is 1 can execute the instruction, the instruction stalls, and the next instruction in the sequential maintenance queue whose valid field is 1, whose rdy field is 1, and whose func field bit corresponding to an idle functional unit FU is 1 is selected for execution.
As shown in fig. 7, 5 functional units FU are preset, and their executable operations are as follows: FU1 can perform three operations, add, sub and sll; FU2 can perform two operations, mul and srl; FU3 can perform two operations, add and sub; FU4 can perform one operation, add; and FU5 can perform two operations, mul and div. Functional units FU1 and FU5 are occupied.
The instruction of the item with id 0 is sub, that of the item with id 2 is add, that of the item with id n-1 is sll, that of the item with id 3 is div, and that of the item with id 5 is mul. Because only three functional units FU are idle, only the first 3 items whose valid field is 1 and whose rdy field is 1 can be selected for execution, namely the items with id 0, 2 and n-1, and each instruction is allocated to the functional unit FU that can execute it and has the smallest number of executable instructions, according to the counted names and numbers of instructions executable by each functional unit FU. For the item with id 0, the functional units FU whose func field bits are 1 are FU1 and FU3, i.e., the func field bit vector is 10100, and the item with id 0 is allocated to FU3 according to the rule. For the item with id 2, the functional units FU whose func field bits are 1 are FU1, FU3 and FU4, i.e., the func field bit vector is 10110, and the item with id 2 is allocated to FU4 according to the rule. For the item with id n-1, the only functional unit FU whose func field bit is 1 is FU1, i.e., the func field bit vector is 10000; according to the rule the item can only be allocated to FU1, and because FU1 is already occupied, the instruction of the item with id n-1 stalls. The next item in the sequential maintenance queue whose valid field is 1, whose rdy field is 1, and whose func field bit corresponding to the idle functional unit FU2 is 1 is then selected for execution. Because the func field bit vector of the item with id 3 is 00001 and that of the item with id 5 is 01001, the item with id 5 can be executed on FU2 and is selected. In this way the idle functional units FU are used as fully as possible and processor performance is improved.
After execution, the valid fields of item 0, item 2 and item 5 of the out-of-order execution queue are set to 0, the item of the sequential maintenance queue whose id content is 3 is moved upward, and the id numbers 0, 2 and 5 of the items that became empty after execution are stored in the id fields of the item pointed to by the next-beat tail of the sequential maintenance queue and of the items below it.
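As a non-limiting illustration, the stall-and-skip selection described for fig. 7 can be sketched as follows. The name select_and_allocate, the tuple layout of an item, and the use of 7 for id n-1 (i.e., n = 8) are assumptions of the sketch, as is the relative order of the items with id 3 and id 5 in the sequential maintenance queue.

```python
FU_OPS = {
    "FU1": {"add", "sub", "sll"},
    "FU2": {"mul", "srl"},
    "FU3": {"add", "sub"},
    "FU4": {"add"},
    "FU5": {"mul", "div"},
}


def select_and_allocate(ordered_items, idle_fus):
    """ordered_items: (id, valid, rdy, func_bits) tuples, oldest first.

    An item with no capable idle unit stalls and the scan moves to the next item.
    """
    idle = set(idle_fus)
    issued = {}
    for eid, valid, rdy, func_bits in ordered_items:
        if not idle:
            break
        if not (valid == 1 and rdy == 1):
            continue
        candidates = [fu for fu, bit in zip(FU_OPS, func_bits)
                      if bit == "1" and fu in idle]
        if not candidates:
            continue                      # stall this item, try the next one
        fu = min(candidates, key=lambda name: len(FU_OPS[name]))
        issued[eid] = fu
        idle.remove(fu)
    return issued


items = [(0, 1, 1, "10100"),   # sub
         (2, 1, 1, "10110"),   # add
         (7, 1, 1, "10000"),   # sll, id n-1; needs FU1, which is occupied
         (3, 1, 1, "00001"),   # div; needs FU5, which is occupied
         (5, 1, 1, "01001")]   # mul
print(select_and_allocate(items, {"FU2", "FU3", "FU4"}))
```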
In the invention, when the processor performs a cancellation because of a branch misprediction, a replay caused by memory access, or an exception, the valid field of each cancelled item in the out-of-order execution queue is set to 0; a cancelled item whose valid field has been set to 0 is an empty item, and no new instruction enters the cancelled item in that beat.
Cancellation may arise in various situations, for example when the processor cancels instructions because of a branch prediction error, a replay caused by memory access, or an exception; this is not particularly limited herein.
It can be understood that, when some items in the out-of-order execution queue are cancelled because of a branch misprediction, a replay caused by memory access, or an exception, the valid fields of the corresponding cancelled items in the out-of-order execution queue are set to 0, no new instruction enters in the beat of the cancellation, and the value of tail of the sequential maintenance queue is unchanged.
When an out-of-order execution queue item corresponding to an id before tail in the sequential maintenance queue is empty because of cancellation, that is, when the out-of-order execution queue has an item with valid == 0, processing is performed as follows in each case:
(1) In an embodiment of the present invention, if no instruction is executed and no instruction enters in the current beat, and multiple empty items are processed in one beat, the sequential maintenance queue is processed as in the case of multiple-instruction execution only, which is not described herein again.
(2) In an embodiment of the present invention, if no instruction is executed but instructions enter in the current beat, and multiple empty items are processed in one beat, the sequential maintenance queue is processed as in the case of simultaneous multiple-instruction entry and execution, which is not described herein again.
(3) In an embodiment of the present invention, if the queue is full, that is, tail equals the sequence number of the last item of the sequential maintenance queue plus 1, and multiple empty items are processed in the current beat, the sequential maintenance queue is processed as in the full-queue case of multiple-instruction execution only, which is not described herein again.
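As a non-limiting illustration, cancellation followed by case (1) can be sketched as follows. The names cancel and reclaim_empty and the example state are assumptions of the sketch, and reclaiming every empty item in a single beat is a simplification; the description only states that multiple empty items are processed in one beat.

```python
def cancel(exec_queue, cancelled_ids):
    """Cancelling beat: clear valid on the cancelled items; tail is left unchanged."""
    for eid in cancelled_ids:
        exec_queue[eid]["valid"] = 0


def reclaim_empty(order_ids, tail, exec_queue):
    """Later beat with no entry and no execution (case (1)).

    Empty items before tail are treated like executed items: live ids move up
    and the empty ids are stored from the next-beat tail downward.
    """
    live = [eid for eid in order_ids[:tail] if exec_queue[eid]["valid"] == 1]
    empty = [eid for eid in order_ids[:tail] if exec_queue[eid]["valid"] == 0]
    order_ids[:tail] = live + empty
    return tail - len(empty)


# Hypothetical state: ids 4, 1, 6, 0 are in use, in that age order.
order_ids = [4, 1, 6, 0, 2, 3, 5, 7]
exec_queue = [{"valid": 0} for _ in range(8)]
for eid in (4, 1, 6, 0):
    exec_queue[eid]["valid"] = 1
tail = 4
cancel(exec_queue, [1, 0])                       # tail stays 4 in this beat
tail = reclaim_empty(order_ids, tail, exec_queue)
print(tail, order_ids)
```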
Further, an apparatus for allocating and executing multiple instructions of an out-of-order execution queue in the out-of-order processor is shown in fig. 8. The distribution circuit is used to find empty items of the out-of-order execution queue and to store entering instructions into the empty items. The arbitration circuit, also called the selection circuit, operates when multiple instructions in the out-of-order execution queue are ready to execute: according to the id number information given by the sequential maintenance circuit, it finds the oldest ready instructions in the out-of-order execution queue for execution, counts the names and numbers of instructions executable by each functional unit FU, and allocates the instructions to be executed to the functional units FU according to the preset instruction allocation rule, based on the value of the func field of each instruction and the counted instruction names and instruction numbers.
When an instruction enters, the distribution circuit selects an empty item for it; the item of the out-of-order execution queue is gated by the content of the id field pointed to by the tail pointer of the sequential maintenance queue. When instructions are allocated and executed, the arbitration circuit finds, according to the id number information given by the sequential maintenance queue, the instructions of the oldest items that are ready for execution in the out-of-order execution queue, updates the tail pointer and the id fields of the sequential maintenance queue accordingly, and allocates the multiple instructions to the functional units FU for execution according to the allocation and execution method described above.
It should be noted that the out-of-order execution queue may include an issue queue, an access queue of each level of cache, a cache access invalidation queue, a coherency request queue, and the like; any out-of-order execution queue in the processor may use the allocation and execution method according to the embodiment of the present invention.
According to the method for allocating and executing multiple instructions of an out-of-order execution queue in an out-of-order processor provided by the embodiment of the invention, the out-of-order execution queue can both follow the oldest-first policy during execution and execute multiple instructions simultaneously. Multiple instructions can be reasonably allocated to different functional units FU for execution so that all the functional units FU can be fully utilized, which improves scheduling and allocation efficiency, improves processor performance, raises the main frequency, and reduces power consumption and cost.
Next, an apparatus for allocating execution queues in an out-of-order processor according to an embodiment of the present invention will be described with reference to the drawings.
FIG. 9 is a block diagram illustrating an apparatus for allocating execution queues in an out-of-order processor according to an embodiment of the present invention.
As shown in fig. 9, the apparatus 10 for allocating and executing an out-of-order execution queue in an out-of-order processor includes: a construction module 101, a numbering module 102, an entry module 103, a selection module 104 and an allocation module 105.
The construction module 101 is configured to construct a sequential maintenance queue corresponding to the out-of-order execution queue and to allocate empty items to instructions and data entering the out-of-order execution queue, where the out-of-order execution queue includes an executable functional unit func field, and the sequential maintenance queue includes an identifier id field and a tail pointer tail. The numbering module 102 is configured to number the items of the out-of-order execution queue in sequence, where the id field is used to record the id numbers of the out-of-order execution queue. The entry module 103 is configured to enter the multiple instructions into the out-of-order execution queue items pointed to by the id numbers held in the item pointed to by the tail pointer tail of the sequential maintenance queue and in the items below it. The selection module 104 is configured to select, according to the id number information given by the sequential maintenance queue, prepared items from the out-of-order execution queue as instructions to be executed. The allocation module 105 is configured to count the names and numbers of instructions executable by all the functional units FU, and to allocate the multiple instructions to be executed to the functional units FU for execution according to a preset instruction allocation rule, based on the value of the func field of each instruction and the counted instruction names and instruction numbers.
It should be noted that the explanation of the foregoing embodiment of the method for allocating and executing multiple instructions of an out-of-order execution queue in an out-of-order processor is also applicable to the apparatus for allocating and executing an out-of-order execution queue in an out-of-order processor in this embodiment, and details thereof are not repeated here.
According to the apparatus for allocating and executing multiple instructions of an out-of-order execution queue in an out-of-order processor provided by the embodiment of the invention, the out-of-order execution queue can both follow the oldest-first policy during execution and execute multiple instructions simultaneously. Multiple instructions can be reasonably allocated to different functional units FU for execution so that all the functional units FU can be fully utilized, which improves scheduling and allocation efficiency, improves processor performance, raises the main frequency, and reduces power consumption and cost.
An embodiment of the present invention further provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by at least one processor, the instructions being arranged to perform a method of distributed execution of out-of-order execution queue multiple instructions in an out-of-order processor as in the above embodiments.
The embodiment of the invention also provides a computer readable storage medium, which stores a computer program, and when the program is executed by a processor, the method for distributing and executing the out-of-order execution queue multiple instructions in the out-of-order processor is realized.
In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or N embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "N" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or N executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or N wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the N steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (14)

1. A method for distributing and executing multiple instructions of an out-of-order execution queue in an out-of-order processor, comprising the steps of:
constructing a sequence maintenance queue corresponding to the out-of-order execution queue, and allocating empty entries for instructions and data entering the out-of-order execution queue, wherein the out-of-order execution queue comprises an executable functional unit func field, and the sequence maintenance queue comprises an identifier id field and a tail pointer tail;
numbering the multiple entries of the out-of-order execution queue in sequence, and recording the id numbers of the out-of-order execution queue entries in the id fields of the sequence maintenance queue;
entering a plurality of instructions, respectively, into the out-of-order execution queue entries indicated by the id numbers held in the entry pointed to by the tail pointer tail of the sequence maintenance queue and in the entries below it;
selecting, according to the id number information given by the sequence maintenance queue, ready entries from the out-of-order execution queue as execution instructions;
wherein the executable functional unit func field is a bit vector field whose width equals the number of functional units FU, and is used for recording whether an instruction of the out-of-order execution queue can be executed in each functional unit FU; a value of 0 in a bit of the func field bit vector indicates that the corresponding functional unit FU cannot execute the instruction; a value of 1 in a bit of the func field bit vector indicates that the corresponding functional unit FU can execute the instruction;
counting the names and the number of the instructions executable by each of the functional units FU;
allocating, according to the value of the executable functional unit func field of each instruction and the counted instruction names and instruction counts, the plurality of execution instructions respectively to the functional units FU for execution according to a preset instruction allocation rule, including:
allocating the execution instruction to an idle functional unit FU that has the instruction name, has the minimum number of instruction entries, and whose corresponding bit in the bit vector of the instruction's executable functional unit func field is 1.
2. The method of claim 1, wherein the number of entries in the sequence maintenance queue is greater than or equal to the number of entries in the out-of-order execution queue.
3. The method of claim 1, wherein the out-of-order execution queue comprises a valid field for recording whether an entry of the out-of-order execution queue is valid, and a ready rdy field for recording whether the instruction and data of the entry of the out-of-order execution queue are ready, and wherein the sequence maintenance queue comprises a tail entry, which is the entry pointed to by the tail pointer tail of the sequence maintenance queue.
4. The method of claim 3, further comprising, when the processor is initialized:
numbering the id fields of the sequence maintenance queue in sequence from top to bottom, setting tail to 0, and setting the valid fields of all entries in the out-of-order execution queue to 0, wherein a valid field of 0 indicates that the entry is invalid.
5. The method of claim 1, further comprising, when multiple instructions only enter the out-of-order execution queue in the processor:
when k instructions enter, entering the k instructions into the out-of-order execution queue entries indicated by the id numbers held in the entry pointed to by tail of the sequence maintenance queue and in the entries below it, and moving tail of the sequence maintenance queue down by k entries, wherein next-beat tail = current-beat tail + k.
6. The method of claim 3, wherein when multiple instructions are only executed from the out-of-order execution queue, the method further comprises:
when q instructions are executed, selecting for execution, in order from the top of the sequence maintenance queue down to tail, q executable entries of the out-of-order execution queue, indicated by the id numbers, whose valid field is 1 and whose rdy field is 1; each such entry becomes an empty entry after execution; the contents of the movable entries above the entry pointed to by the tail pointer tail of the sequence maintenance queue are moved upward, with next-beat tail = current-beat tail - q; and the id numbers of the empty entries are stored in the id fields of the entry pointed to by the next-beat tail of the sequence maintenance queue and of the entries below it, wherein a valid field of 1 indicates that the entry is valid, and an rdy field of 1 indicates that the instruction and data of the entry are ready.
7. The method of claim 6, wherein when the out-of-order execution queue is full, the method further comprises:
after the q ready executable instructions are executed, moving upward the contents of the movable entries above the entry pointed to by the tail pointer tail, with next-beat tail = current-beat tail - q, and storing the id numbers of the entries of the out-of-order execution queue emptied by instruction execution in the id fields of the entry pointed to by the next-beat tail and of the entries below it.
8. The method of claim 3, wherein when multiple instructions enter and are executed from the out-of-order execution queue simultaneously, the method further comprises:
when k new instructions enter and q instructions are executed, selecting for execution, in order from the top of the sequence maintenance queue down to tail, q executable entries of the out-of-order execution queue, indicated by the id numbers, whose valid field is 1 and whose rdy field is 1; moving upward the contents of the movable entries above entry tail + k; entering the k new instructions into the out-of-order execution queue entries indicated by the id numbers held in entry tail - q of the sequence maintenance queue and in the entries below it, with next-beat tail = current-beat tail - q + k; and storing the id numbers of the entries emptied by execution in the id fields of the entry pointed to by the next-beat tail and of the entries below it.
9. The method of claim 5 or 8, wherein k is less than or equal to the number of empty entries in the out-of-order execution queue.
10. The method of any of claims 6 to 8, wherein q is less than or equal to the number of idle functional units FU.
11. The method according to claim 1, wherein allocating the execution instructions to the functional units FU according to the preset instruction allocation rule, according to the value of the executable functional unit func field of each instruction and the counted instruction names and instruction counts, further comprises: when, for an instruction being allocated, no functional unit FU corresponding to a bit of value 1 in the bit vector of the instruction's executable functional unit func field is available to execute the instruction, stalling that instruction, and selecting for execution the next instruction whose valid field is 1, whose rdy field is 1, and whose func field bit vector has a value of 1 in a bit corresponding to an idle functional unit FU.
12. An apparatus for distributing and executing multiple instructions of an out-of-order execution queue in an out-of-order processor, comprising:
a construction module, configured to construct a sequence maintenance queue corresponding to the out-of-order execution queue and to allocate empty entries for instructions and data entering the out-of-order execution queue, wherein the out-of-order execution queue comprises an executable functional unit func field, and the sequence maintenance queue comprises an identifier id field and a tail pointer tail;
a numbering module, configured to number the multiple entries of the out-of-order execution queue in sequence, wherein the id field is used for recording the id numbers of the out-of-order execution queue entries;
an entry module, configured to enter the multiple instructions, respectively, into the out-of-order execution queue entries indicated by the id numbers held in the entry pointed to by the tail pointer tail of the sequence maintenance queue and in the entries below it;
a selection module, configured to select, according to the id number information given by the sequence maintenance queue, ready entries from the out-of-order execution queue as execution instructions;
wherein the executable functional unit func field is a bit vector field whose width equals the number of functional units FU, and is used for recording whether an instruction of the out-of-order execution queue can be executed in each functional unit FU; a value of 0 in a bit of the func field bit vector indicates that the corresponding functional unit FU cannot execute the instruction; a value of 1 in a bit of the func field bit vector indicates that the corresponding functional unit FU can execute the instruction;
a distribution module, configured to count the names and the number of the instructions executable by each of the functional units FU; and
to allocate, according to the value of the executable functional unit func field of each instruction and the counted instruction names and instruction counts, the plurality of execution instructions respectively to the functional units FU for execution according to a preset instruction allocation rule, including:
allocating the execution instruction to an idle functional unit FU that has the instruction name, has the minimum number of instruction entries, and whose corresponding bit in the bit vector of the instruction's executable functional unit func field is 1.
13. An electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor executing the program to implement the method for distributing and executing multiple instructions of an out-of-order execution queue in an out-of-order processor according to any of claims 1 to 11.
14. A non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method for distributing and executing multiple instructions of an out-of-order execution queue in an out-of-order processor according to any of claims 1 to 11.
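As an informal illustration only, and not part of the claims, the queue behaviour recited in claims 4 to 8 and the allocation rule of claims 1 and 11 can be modelled in a few lines of Python. The sketch rests on several assumptions that the claims do not state: the sequence maintenance queue has exactly as many entries as the out-of-order execution queue, the per-unit instruction counts are supplied as a precomputed list (`fu_instr_count`), ties between functional units are broken by the lower unit index, and the names `Entry`, `IssueQueue`, `pick_fu`, and `cycle` are hypothetical. It is one possible reading of the claims in software, not the patented hardware design.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    valid: bool = False  # entry currently holds an instruction
    rdy: bool = False    # instruction and data are ready
    func: int = 0        # bit i set: functional unit i can execute it
    name: str = ""       # hypothetical payload, for illustration only


class IssueQueue:
    def __init__(self, size, fu_instr_count):
        self.ooq = [Entry() for _ in range(size)]  # out-of-order execution queue
        self.ids = list(range(size))               # id fields, numbered top to bottom
        self.tail = 0                              # ids[:tail] occupied (oldest first), ids[tail:] empty
        self.fu_instr_count = fu_instr_count       # assumed per-FU count of executable instructions

    def pick_fu(self, func, idle):
        # Idle units whose func bit is 1; prefer the unit with the smallest count.
        ok = [fu for fu in idle if (func >> fu) & 1]
        return min(ok, key=lambda fu: (self.fu_instr_count[fu], fu)) if ok else None

    def cycle(self, incoming, idle):
        """One beat: issue ready entries to idle units, then accept new instructions."""
        k = len(incoming)
        if k > len(self.ids) - self.tail:
            raise ValueError("k exceeds the number of empty entries")
        idle = set(idle)
        issued, kept, freed = [], [], []
        for pos in range(self.tail):               # scan from the top down to tail
            eid = self.ids[pos]
            e = self.ooq[eid]
            fu = self.pick_fu(e.func, idle) if e.valid and e.rdy else None
            if fu is None:                         # not ready, or no usable idle unit: stays queued
                kept.append(eid)
                continue
            idle.discard(fu)
            issued.append((e.name, fu))
            self.ooq[eid] = Entry()                # entry becomes empty after execution
            freed.append(eid)
        new_ids = self.ids[self.tail:self.tail + k]
        for eid, instr in zip(new_ids, incoming):  # new instructions take the empty ids at tail
            instr.valid = True
            self.ooq[eid] = instr
        # Compact upward, then store the freed ids at the next-beat tail and below.
        self.ids = kept + new_ids + freed + self.ids[self.tail + k:]
        self.tail = self.tail - len(freed) + k     # next-beat tail = tail - q + k
        return issued
```

With a four-entry queue and `fu_instr_count = [3, 1]`, entering a not-ready instruction A followed by two ready instructions B (`func = 0b11`) and C (`func = 0b01`) leaves `tail` at 3. In the next beat, with both units idle and one new instruction entering, B is sent to unit 1 (the unit with the smaller count), C to unit 0, the id fields become `[0, 3, 1, 2]`, and `tail` becomes 2.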
CN202111369022.1A 2021-11-18 2021-11-18 Method and device for distributing and executing out-of-order execution queue multiple instructions in out-of-order processor Active CN113805944B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111369022.1A CN113805944B (en) 2021-11-18 2021-11-18 Method and device for distributing and executing out-of-order execution queue multiple instructions in out-of-order processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111369022.1A CN113805944B (en) 2021-11-18 2021-11-18 Method and device for distributing and executing out-of-order execution queue multiple instructions in out-of-order processor

Publications (2)

Publication Number Publication Date
CN113805944A CN113805944A (en) 2021-12-17
CN113805944B true CN113805944B (en) 2022-02-25

Family

ID=78938395

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111369022.1A Active CN113805944B (en) 2021-11-18 2021-11-18 Method and device for distributing and executing out-of-order execution queue multiple instructions in out-of-order processor

Country Status (1)

Country Link
CN (1) CN113805944B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114546497B (en) * 2022-04-26 2022-07-19 北京微核芯科技有限公司 Method and device for accessing queue in out-of-order processor
CN115904508B (en) * 2023-01-06 2023-05-05 北京微核芯科技有限公司 Queue item selection method and device for queues in out-of-order processor

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1266592C (en) * 2003-11-26 2006-07-26 中国人民解放军国防科学技术大学 Dynamic VLIW command dispatching method according to determination delay
US10942747B2 (en) * 2017-11-30 2021-03-09 International Business Machines Corporation Head and tail pointer manipulation in a first-in-first-out issue queue
CN109062604B (en) * 2018-06-26 2021-07-23 飞腾技术(长沙)有限公司 Emission method and device for mixed execution of scalar and vector instructions
US20210200552A1 (en) * 2019-12-27 2021-07-01 Intel Corporation Apparatus and method for non-speculative resource deallocation
CN111966406B (en) * 2020-08-06 2021-03-23 北京微核芯科技有限公司 Method and device for scheduling out-of-order execution queue in out-of-order processor

Also Published As

Publication number Publication date
CN113805944A (en) 2021-12-17

Similar Documents

Publication Publication Date Title
CN111966406B (en) Method and device for scheduling out-of-order execution queue in out-of-order processor
CN113805944B (en) Method and device for distributing and executing out-of-order execution queue multiple instructions in out-of-order processor
US7363467B2 (en) Dependence-chain processing using trace descriptors having dependency descriptors
US5313584A (en) Multiple I/O processor system
US8407454B2 (en) Processing long-latency instructions in a pipelined processor
US5465373A (en) Method and system for single cycle dispatch of multiple instructions in a superscalar processor system
TWI518504B (en) Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines
EP0605866B1 (en) Method and system for enhanced instruction dispatch in a superscalar processor system utilizing independently accessed intermediate storage
JP5548037B2 (en) Command issuing control device and method
TW201305819A (en) Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines
US8661228B2 (en) Multi-level register file supporting multiple threads
US20070143582A1 (en) System and method for grouping execution threads
TW201303736A (en) Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
CN104679663B (en) The soft sectoring of register file cache
WO2015153121A1 (en) A data processing apparatus and method for executing a stream of instructions out of order with respect to original program order
US10649780B2 (en) Data processing apparatus and method for executing a stream of instructions out of order with respect to original program order
EP1760581A1 (en) Processing operations management systems and methods
KR0122527B1 (en) Method and system for nonsequential instruction dispatch and execution a superscalar processor system
JPH06110688A (en) Computer system for parallel processing of plurality of instructions out of sequence
US10585701B2 (en) Dynamically allocating storage elements to provide registers for processing thread groups
CN117492844B (en) Register renaming method, device and storage medium
KR102170966B1 (en) Apparatus and method for managing reorder buffer of high-performance out-of-order superscalar cores
CN116244005A (en) Multithreading asynchronous data transmission system and method
CN117667223A (en) Data adventure solving method, computing engine, processor and electronic equipment
JPH1091442A (en) Processor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant