CN110297662B - Method for out-of-order execution of instructions, processor and electronic equipment - Google Patents

Method for out-of-order execution of instructions, processor and electronic equipment

Info

Publication number
CN110297662B
Authority
CN
China
Prior art keywords
instruction
scoreboard
instructions
index
index tag
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910600769.XA
Other languages
Chinese (zh)
Other versions
CN110297662A (en
Inventor
杨龚轶凡
郑瀚寻
闯小明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhonghao Xinying (Hangzhou) Technology Co.,Ltd.
Original Assignee
Zhonghao Xinying Hangzhou Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhonghao Xinying Hangzhou Technology Co ltd filed Critical Zhonghao Xinying Hangzhou Technology Co ltd
Priority to CN201910600769.XA priority Critical patent/CN110297662B/en
Publication of CN110297662A publication Critical patent/CN110297662A/en
Application granted granted Critical
Publication of CN110297662B publication Critical patent/CN110297662B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3838Dependency mechanisms, e.g. register scoreboarding

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

The invention discloses a method for out-of-order execution of instructions, a processor, and an electronic device. A scoreboard assigns an index tag to each received instruction so that every instruction has a unique tag. Instructions that have neither a structural hazard nor a read-write hazard are preferentially sent to an operator for execution, and the operation results are sent to a reorder buffer. Once the delivery conditions are met, the results are delivered to the target registers in index-tag order, which achieves out-of-order execution with in-order delivery. The scoreboard works together with the reorder buffer to eliminate the write-read and write-write hazards among the data hazards. This broadens the scenarios in which out-of-order execution applies, raises operator utilization, speeds up overall instruction execution by reducing the time instructions spend waiting, and reduces hardware cost.

Description

Method for out-of-order execution of instructions, processor and electronic equipment
Technical Field
The present invention relates to the field of processor instruction execution, and in particular, to a method for out-of-order instruction execution, a processor, and an electronic device.
Background
In a computer system that executes instructions sequentially, different instructions take different amounts of time, and waiting for slow instructions can leave operators idle. To improve program efficiency, a comprehensive optimization is needed that schedules instruction execution dynamically so that execution completes in the shortest time. Out-of-order execution can reduce the time operators spend waiting, but it may cause data to be read and written in an order that violates the original instruction order; that is, data hazards occur and the execution results are not as expected. The common way to eliminate write-read hazards (Write-After-Read Hazard) and write-write hazards (Write-After-Write Hazard) is register renaming, but the extra registers it requires increase chip complexity, area, and power consumption. A dynamic scheduling implementation is needed that strikes a good balance between performance and cost.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method, a processor, and an electronic device for instruction out-of-order execution, which are used to solve the problem of write-read hazard and write-write hazard during instruction out-of-order execution.
To achieve this purpose, the invention adopts the following technical solution: a method for out-of-order execution of instructions is provided, which uses an instruction memory, a scoreboard, operators, a reorder buffer, a data memory, and registers. The method comprises the following steps:
Step 101: the instructions in the instruction memory enter the scoreboard in sequence;
Step 102: the scoreboard assigns index tags to the received instructions; the instructions include a first instruction; the scoreboard records the information of each instruction, including its index tag; the index tags are ordered either in increasing or in decreasing sequence;
Step 103: the scoreboard checks each instruction in turn for structural hazards and for read-write hazards among the data hazards; when the first instruction has a structural hazard or a read-write hazard, the first instruction is set to a waiting state; the scoreboard may preferentially send any other instruction that has neither a structural hazard nor a read-write hazard to the corresponding operator; once the structural hazard and the read-write hazard are absent or have disappeared, the first instruction is sent to the operator;
Step 104: the operator executes the received instruction and writes the operation result into the result storage location in the reorder buffer that corresponds to the received instruction;
Step 105: the reorder buffer arranges the operation results in index-tag order and finally delivers them to the registers in sequence.
The received instructions are distributed with index labels through the scoreboard, so that each instruction has a unique identifier, the instructions without read-write hazards and structure hazards are preferentially sent to the arithmetic unit to be executed, out-of-order execution is realized, the re-order cache delivers the operation results to the target register in sequence, the utilization rate of the arithmetic unit is improved, and the instruction execution waiting time is greatly reduced. The scoreboard is matched with the re-sequencing cache to deliver the results of the instructions executed out of sequence in sequence, on the basis of scoreboard hazard detection, the write-read hazard or write-write hazard in the data hazard is eliminated, the applicable scene of out-of-sequence execution is expanded, and therefore the overall speed of instruction execution is improved.
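The hazard check that decides which instructions may be sent ahead of a waiting one can be illustrated with a minimal Python sketch. It is not taken from the patent: the names `Instr`, `has_hazard` and `issuable` are hypothetical, and the sketch assumes that a "read-write hazard" means an instruction reads a register that an earlier, not yet delivered instruction will write.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    tag: int      # index tag assigned by the scoreboard
    dest: str     # target register
    srcs: tuple   # source operand registers
    unit: str     # operator the instruction needs


def has_hazard(instr, busy_units, pending_dests):
    """True if the instruction must wait (assumed hazard definitions)."""
    structural = instr.unit in busy_units                          # operator occupied
    read_write = any(src in pending_dests for src in instr.srcs)   # source not yet written
    return structural or read_write


def issuable(instrs, busy_units, pending_dests):
    """Tags of waiting instructions that pass the hazard check right now."""
    return [i.tag for i in instrs if not has_hazard(i, busy_units, pending_dests)]


# An earlier load into R0 is still executing on the load/store pipeline.
waiting = [
    Instr(1, "R2", ("R0", "R1"), "ALU"),  # reads R0, so it has a read-write hazard
    Instr(2, "R0", ("R3", "R4"), "ALU"),  # only writes R0, so it may go first
]
print(issuable(waiting, busy_units={"L/S"}, pending_dests={"R0"}))  # prints [2]
```

In this sketch a pending write to the same target register does not block an instruction, because in-order delivery from the reorder buffer is what keeps the final register value correct.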
Preferably, in step 103, the scoreboard also passes the instruction operation information of each instruction to the reorder buffer, where the instruction operation information includes the index tag of each instruction and the operator information required to execute it. The instruction operation information is the precondition for matching each operation result with its index tag, so that the result of an instruction executed out of order can be correctly associated with that instruction's index tag. It should be understood that the instruction operation information passed by the scoreboard to the reorder buffer is not limited to the tag information and the specific operator information; other necessary information stored in the scoreboard's fields may also be transmitted as the situation requires.
More preferably, in step 104, after the operation result is written into the re-ordering cache, the re-ordering cache associates the operation result of each instruction with the index tag of each instruction one by one according to the instruction operation information. The instruction operation result and the index label of the instruction are correspondingly stored, so that the out-of-order calculation result can be rearranged according to the predetermined sequence in the scoreboard, and the correctness of the data sequence when the subsequent result is delivered is ensured. The introduction of the reordering cache avoids redundant register renaming and reservation stations, reduces power consumption, reduces chip area and reduces chip complexity when viewed from the processor.
Preferably, in step 105, the conditions for in-order final delivery are as follows. When a first index tag is the first index tag defined in the scoreboard, the reorder buffer delivers the operation result of the first index tag directly to the target register. Otherwise, the reorder buffer delivers the operation result to the corresponding register once all index tags preceding the first index tag are in the same order as in the scoreboard and their corresponding operation results have been delivered. Delivering a result only when the delivery condition is met avoids delivering a tag whose result is still empty and keeps the delivery order identical to the index-tag order in the scoreboard. Although delivery sometimes has to wait for a result, out-of-order execution turns the time-consuming computation into a parallel form, and the time saved by parallel computation clearly outweighs the time spent waiting at delivery.
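The delivery condition can be illustrated with a minimal Python sketch of a reorder buffer (reordering cache). The class and method names are hypothetical, and the sketch makes assumptions the text does not state: index tags start at 0 and increase by 1 without wrapping around, and the target register of each instruction is passed to the buffer together with its index tag.

```python
class ReorderBuffer:
    """Holds out-of-order results and delivers them in index-tag order."""

    def __init__(self, first_tag=0):
        self.next_to_deliver = first_tag  # smallest tag not yet delivered
        self.targets = {}                 # tag -> target register
        self.results = {}                 # tag -> operation result

    def register(self, tag, dest):
        """Record the instruction operation information sent at issue time."""
        self.targets[tag] = dest

    def write_result(self, tag, value):
        """Store a result from an operator; results may arrive in any order."""
        if tag not in self.targets:
            raise KeyError(f"unknown index tag {tag}")
        self.results[tag] = value

    def deliver_ready(self, registers):
        """Deliver every result whose predecessors have all been delivered."""
        delivered = []
        while self.next_to_deliver in self.results:
            tag = self.next_to_deliver
            dest = self.targets.pop(tag)
            registers[dest] = self.results.pop(tag)
            delivered.append((tag, dest))  # target information for the scoreboard
            self.next_to_deliver += 1
        return delivered


regs = {}
rob = ReorderBuffer()
for tag, dest in [(0, "R0"), (1, "R2"), (2, "R0")]:
    rob.register(tag, dest)

rob.write_result(2, 7)            # tag 2 finishes first
print(rob.deliver_ready(regs))    # prints [] because tag 0 is still missing
rob.write_result(0, 5)
rob.write_result(1, 9)
print(rob.deliver_ready(regs))    # prints [(0, 'R0'), (1, 'R2'), (2, 'R0')]
print(regs["R0"])                 # prints 7, the value written by the later instruction
```

Tags 0 and 2 both write R0, yet the register ends up holding the value from tag 2 even though that result arrived first, which is how in-order delivery removes the write-write hazard in this sketch.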
Preferably, during or after a delivery, the reorder buffer outputs the target information of that delivery, which is ultimately passed to the scoreboard; the target information includes the index tag and the register information of the corresponding instruction, and the scoreboard updates the register occupancy state of the corresponding instruction according to the target information. Because the scoreboard continuously updates the state of the target registers, the next instruction related to a target register can be executed promptly after the state is updated, which reduces the overall time required for instruction execution. This timely feedback of register state keeps the registers from sitting idle as far as possible and improves overall efficiency.
Preferably, the scoreboard further comprises a second instruction, and when the result of the second instruction is delivered and is no longer related to any other instruction, the scoreboard clears the information of the second instruction; the scoreboard can also enter the information of a new instruction after clearing the information of the second instruction. The design of timely clearing the information of the used instruction establishes a connection mechanism of old instruction elimination and new instruction input optimization, realizes instruction pipelining in the scoreboard, improves the utilization rate of the scoreboard, and accelerates the overall execution speed of the instruction.
Preferably, for each instruction in the scoreboard, the corresponding operator and source operand registers, once marked as occupied, are not marked as occupied again until that first mark is released, whereas the target register of each instruction may be marked as occupied at least once more after it is first marked. Marking the target register as occupied multiple times, that is, reusing the target register, turns cases in which existing schemes must wait for an occupied target register into cases in which the instruction can execute. This removes the waiting otherwise required when write-write or write-read hazards exist, greatly widens the range of executable instructions, and makes out-of-order execution more practical.
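The two marking rules can be illustrated with a minimal Python sketch. The class `Occupancy` and its methods are hypothetical names, and the release points (operators and source registers after execution, target registers after delivery) are assumptions chosen for illustration.

```python
from collections import Counter


class Occupancy:
    """Occupancy marks kept by a scoreboard (illustrative only)."""

    def __init__(self):
        self.busy_units = set()          # operators: one mark at a time
        self.busy_sources = set()        # source registers: one mark at a time
        self.pending_writes = Counter()  # target register -> number of marks

    def can_mark(self, unit, srcs):
        """Operators and source registers must be free before marking."""
        return unit not in self.busy_units and self.busy_sources.isdisjoint(srcs)

    def mark(self, unit, srcs, dest):
        if not self.can_mark(unit, srcs):
            raise RuntimeError("operator or source register already occupied")
        self.busy_units.add(unit)
        self.busy_sources.update(srcs)
        self.pending_writes[dest] += 1   # a target register may be marked repeatedly

    def release_execution(self, unit, srcs):
        """Assumed to happen when the operator finishes the instruction."""
        self.busy_units.discard(unit)
        self.busy_sources.difference_update(srcs)

    def release_target(self, dest):
        """Assumed to happen when the result is delivered to the register."""
        self.pending_writes[dest] -= 1
        if self.pending_writes[dest] == 0:
            del self.pending_writes[dest]


occ = Occupancy()
occ.mark("L/S", (), "R0")
occ.mark("ALU", ("R3", "R4"), "R0")  # second mark on target R0 is accepted
print(occ.pending_writes["R0"])       # prints 2
print(occ.can_mark("ALU", ("R1",)))   # prints False, the ALU is still occupied
```

Using a counter for target registers and plain sets for operators and source registers is one simple way to express the asymmetry between the two rules.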
It is another object of the present invention to provide a processor for out-of-order execution of instructions to solve the problem of hardware cost and performance conflicts.
In order to achieve the purpose, the invention adopts the technical scheme that: a processor is provided for out-of-order execution of instructions, comprising an operator, further comprising a scoreboard and a reorder cache.
The scoreboard distributes index tags for the received instructions and records the index tags of the instructions; the scoreboard comprises an index field used for recording index labels of all instructions.
The reordering cache comprises an index field, and the index field is used for recording index tags of all instructions; the reorder buffer also receives the results of the operations for each instruction. The reordering cache is also used for delivering the operation result of each instruction to a target register.
The scoreboard is directly or indirectly connected to the reorder buffer. The scoreboard ultimately sends the instruction operation information of each instruction to the reorder buffer; the instruction operation information includes the index tag of each instruction and the specific operator unit required to execute it. The reorder buffer ultimately passes the target information of each delivery to the scoreboard. The target information includes the target register information of the instruction. Based on the target information, the scoreboard updates the register occupancy state of the instructions related to the delivery.
Through the direct or indirect connections among the scoreboard, the operator, and the reorder buffer, the processor provided by the invention realizes information exchange among the three, eliminates write-write hazards and write-read hazards, and achieves out-of-order execution with in-order delivery of instructions. Complex mechanisms such as reservation stations and register renaming are not needed, so the device composition is greatly simplified; instruction performance and execution efficiency improve while the hardware cost and wiring difficulty of the processor are reduced.
Preferably, the reorder buffer is directly or indirectly connected to the operator, and the operator ultimately passes the operation result of each instruction to the reorder buffer. According to the instruction operation information, the reorder buffer writes each operation result into the result storage location that corresponds to its index tag. The scoreboard works together with the reorder buffer to deliver the results of instructions executed out of order in sequence, which ensures that subsequent operation results are delivered correctly and improves the overall speed of instruction execution.
The invention also provides an electronic device which at least comprises the processor or uses the method for out-of-order execution of the instructions.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a block diagram of a processor that provides out-of-order execution of instructions according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for out-of-order execution of instructions according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating state and information changes during instruction execution according to an embodiment of the present invention;
FIG. 4 is a flow diagram of a method for out-of-order execution of instructions in conjunction with processor 100 and portions of program instructions in an embodiment of the present invention;
FIG. 5 is a table of the status recorded in the scoreboard when the number of instructions is greater than the scoreboard capacity in an embodiment of the present invention;
FIG. 6 is a table of the status recorded in the scoreboard when the number of instructions is less than or equal to the scoreboard capacity in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that when an element is referred to as being "connected to" another element, or "coupled" to one or more other elements, it can be directly connected to the other element or indirectly connected to the other element or elements.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The following describes embodiments of a method, a processor, and an electronic device for out-of-order execution of instructions according to embodiments of the present invention.
An embodiment of the present invention provides a processor for executing instructions out of order, and fig. 1 is a schematic structural diagram of a processor 100, where the processor is merely an example of a processing apparatus for executing instructions out of order provided in the embodiment of the present invention, and in practical applications, the apparatus is not limited to a processor, and may be other apparatuses capable of implementing instructions out of order execution. The processor comprises the following components: instruction memory 101, scoreboard 102, registers 103, operator 104, data memory 105, and reorder buffer 106. Both the scoreboard 102 and the reorder buffer include an index field for recording index tags for instructions.
As shown in fig. 1, one of the inputs of the scoreboard 102 is connected to the output of the instruction memory 101; the scoreboard 102 receives the instructions output by the instruction memory 101 and assigns index tags to them. One of the outputs of the scoreboard 102 is connected to one of the inputs of the register 103, so that the scoreboard 102 can pass control signals to the register 103. Another output of the scoreboard 102 is connected to one of the inputs of the operator 104; after the hazard check, the scoreboard 102 sends instructions that have no structural hazard and no read-write hazard (or whose hazards have disappeared) to the operator for execution. A further output of the scoreboard 102 is connected to one of the inputs of the reorder buffer 106, so that the scoreboard 102 can send instruction operation information to the reorder buffer 106, where the instruction operation information includes the index tag of an instruction and the specific operator unit required to execute it.
An operator 104 is connected to the scoreboard 102 and is also connected to the registers 103, data memory 105 and reorder buffer 106. The operator 104 receives the instruction sent by the scoreboard 102, executes the instruction in combination with the data in the register 103 and the data memory 105, and then passes the instruction execution result to the reorder buffer 106. The operator 104 includes at least one arithmetic logic unit pipeline and at least one load/store pipeline.
One input end of the re-sequencing cache 106 is connected with one output end of the arithmetic unit 104, the re-sequencing cache 106 receives the instruction execution result transmitted by the arithmetic unit 104, and the instruction execution result and the index tag of the instruction are in one-to-one correspondence by combining the instruction operation information transmitted by the scoreboard 102; one output end of the reorder buffer 106 is connected to one input end of the register 103, and the reorder buffer 106 delivers the operation result which reaches the delivery condition to the register 103; one of the outputs of reorder buffer 106 is also coupled to one of the inputs of scoreboard 102 for reorder buffer 106 to return target information to scoreboard 102, and scoreboard 102 updates the target register occupancy state of the associated instruction based on the target information sent from reorder buffer 106.
Through the direct or indirect connections among the scoreboard, the operator, and the reorder buffer, information is exchanged among the three, instruction results are ultimately delivered correctly to the target registers, out-of-order execution with in-order delivery is achieved, and write-read and write-write hazards are eliminated at low cost. Instruction performance and execution efficiency improve while the hardware cost and wiring difficulty of the processor are reduced.
It should be understood that, in the embodiment of the present invention, the Processor 100 may be a Central Processing Unit (CPU), and the Processor may also be other general-purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, and the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
In an embodiment of the present invention, there is also provided a method for out-of-order execution of instructions, the method providing an instruction memory, a scoreboard, an operator, a register, a data memory, and a reorder buffer, fig. 2 is a flowchart of steps of the method, including the steps of:
Step 201: the instructions in the instruction memory enter the scoreboard in sequence. The sequence may be the original order of the instructions, or an order in which the instructions are arranged or read according to some rule.
Step 202: the scoreboard receives each instruction passed from the instruction memory and assigns it an index tag. The instructions include a first instruction. The scoreboard records the information of each instruction, including its index tag, and the index tags are ordered either in increasing or in decreasing sequence. The index tag may also be referred to as an index value. Index tags can be assigned by various rules, which may be implemented with a counter or with a queue. Index allocation methods that a person skilled in the art can obtain without creative effort also fall within the protection scope of the present invention. With a counter, the method is as follows: the instructions enter the scoreboard in sequence; the scoreboard uses the initial value of the counter as the index tag of the first instruction; each time a further instruction enters, the counter is incremented or decremented, and the new counter value becomes the index tag of the newly entered instruction. With a queue, the method is as follows: a queue is set up and, after initialization, holds the tokens 0, 1, 2, …, n from head to tail; each time a new instruction enters the scoreboard, one token is taken from the head of the queue and its value is assigned to that instruction. When the reorder buffer delivers the instruction, the token is recycled and put back at the tail of the queue.
Step 203: the scoreboard checks each instruction in turn for a structural hazard or a read-write hazard. The order may be index-tag order or another order consistent with the teachings of the present invention. The check may be performed by polling or in another way suited to the actual application scenario. If the instruction has neither a structural hazard nor a read-write hazard, step 205 is executed; if it has a structural hazard or a read-write hazard, step 204 is executed. The scoreboard may check all the instructions it holds in turn, the maximum number being limited by the scoreboard's maximum capacity, or it may check only some of them in turn; whether all or only some are checked depends on the performance requirements. When several instructions are ready at the same time, they are by convention sent to the operators in original instruction order.
Step 204: when an instruction has a structural hazard or a read-write hazard, the scoreboard marks the instruction as waiting until the structural hazard and the read-write hazard have disappeared, and then step 205 is executed.
Step 205: when the instruction has neither a structural hazard nor a read-write hazard, the scoreboard sends the instruction to the operator corresponding to it and also sends instruction operation information to the reorder buffer, where the instruction operation information includes the index tag of the instruction and the operator information required to execute it. In addition, the scoreboard marks the registers and operators related to the instruction as occupied. The operator and source operand registers corresponding to an instruction, once marked as occupied, cannot be marked as occupied again until that first mark is released, whereas the target register of any instruction in the scoreboard may, after being marked as occupied for the first time, be marked as occupied any further number of times, whether or not the earlier occupation has been released. The number of further times may be zero, one, or more, and its maximum depends on the number of instructions related to the target register.
Step 206: after receiving the instruction, the operator fetches the required data from the relevant data memory and registers, executes the instruction, and sends the result to the reorder buffer when execution finishes. At the same time, the scoreboard releases the associated operator and source operand registers. Note that only the operator and source operand registers are released here; the target register is not yet released.
Step 207: the reorder buffer receives the operation result passed from the operator and, using the instruction operation information passed from the scoreboard, puts operation results and index tags in one-to-one correspondence. When an operation result meets the delivery condition, the reorder buffer delivers it to the corresponding target register. The delivery condition is: when the index tags preceding a first index tag are in the same order as the index tags preceding it in the scoreboard and their corresponding operation results have been delivered, the reorder buffer delivers the operation result of the first index tag to the corresponding register. Note that the reorder buffer delivers the operation result of the first index tag defined in the scoreboard directly. The reorder buffer may send target information, including the corresponding index tag and register information, to the scoreboard either while the delivery is performed or after it. After receiving the target information, the scoreboard updates the target register occupancy state of the corresponding instruction.
Step 208: when the result of any instruction in the scoreboard has been delivered and the instruction has no data correlation with any other instruction, the scoreboard will clear the information of the instruction; after the information is cleared, the scoreboard can also input a new instruction, so that the instruction pipelining in the scoreboard is realized.
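The counter-based and queue-based index tag assignment described in step 202 can be illustrated with a minimal Python sketch. The class names are hypothetical, and details such as the starting value and the error raised when the queue is empty are assumptions made for illustration.

```python
from collections import deque


class CounterTagAllocator:
    """Assigns index tags from a counter that steps up (+1) or down (-1)."""

    def __init__(self, start=0, step=1):
        self.next_tag = start
        self.step = step

    def allocate(self):
        tag = self.next_tag       # the first instruction gets the initial value
        self.next_tag += self.step
        return tag


class QueueTagAllocator:
    """Hands out tokens 0..n from the queue head and recycles them at the tail."""

    def __init__(self, n):
        self.tokens = deque(range(n + 1))

    def allocate(self):
        if not self.tokens:
            raise RuntimeError("no free index tag")  # assumed behaviour when all tokens are in use
        return self.tokens.popleft()

    def recycle(self, tag):
        """Called when the reorder buffer has delivered the instruction."""
        self.tokens.append(tag)


counter = CounterTagAllocator()
print([counter.allocate() for _ in range(3)])  # prints [0, 1, 2]

queue = QueueTagAllocator(2)
print([queue.allocate() for _ in range(3)])    # prints [0, 1, 2]
queue.recycle(0)
print(queue.allocate())                        # prints 0
```

The queue variant bounds the number of tags in flight, while the counter variant in this sketch grows without limit; a hardware counter would have a fixed width.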
The method will be briefly described below in conjunction with the processor 100 and portions of the program instructions. Specifically, four instructions exist in the instruction memory, the maximum capacity in the scoreboard is 3, namely, only three pieces of instruction information can be recorded in the scoreboard at most. FIG. 3 is a diagram illustrating the state and information changes during instruction execution according to an embodiment of the present invention. FIG. 4 is a flowchart of a method for out-of-order execution of instructions in conjunction with processor 100 and portions of program instructions, according to an embodiment of the present invention.
(1) The program instructions are as follows:
R0 <- mem[5] (read data from address 5 of the memory mem, store the result in register R0, occupy the load/store pipeline);
R2 <- R0 + R1 (read registers R0 and R1, store the result in register R2, occupy the arithmetic logic unit pipeline);
R0 <- R3 + R4 (read registers R3 and R4, store the result in register R0, occupy the arithmetic logic unit pipeline);
R5 <- mem[6] (read data from address 6 of the memory mem, store the result in register R5, occupy the load/store pipeline).
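To make the dependencies among these four instructions explicit, the program can be encoded as plain data. The `Instr` type, the `PROGRAM` list, and the `dependencies` helper are hypothetical names introduced only for this sketch; the pairwise check is deliberately naive and ignores intervening writes, which is sufficient for this short program.

```python
from typing import Dict, List, NamedTuple, Tuple


class Instr(NamedTuple):
    dest: str               # target register
    srcs: Tuple[str, ...]   # source operand registers
    unit: str               # pipeline occupied in the operator
    text: str               # the instruction as written above


PROGRAM = [
    Instr("R0", (), "L/S", "R0 <- mem[5]"),
    Instr("R2", ("R0", "R1"), "ALU", "R2 <- R0 + R1"),
    Instr("R0", ("R3", "R4"), "ALU", "R0 <- R3 + R4"),
    Instr("R5", (), "L/S", "R5 <- mem[6]"),
]


def dependencies(program: List[Instr]) -> Dict[str, List[Tuple[int, int]]]:
    """List (earlier, later) index pairs for each kind of data hazard."""
    found: Dict[str, List[Tuple[int, int]]] = {"RAW": [], "WAR": [], "WAW": []}
    for j, later in enumerate(program):
        for i in range(j):
            earlier = program[i]
            if earlier.dest in later.srcs:   # later reads what earlier writes
                found["RAW"].append((i, j))
            if later.dest in earlier.srcs:   # later overwrites what earlier reads
                found["WAR"].append((i, j))
            if later.dest == earlier.dest:   # both write the same register
                found["WAW"].append((i, j))
    return found
```

For this program the helper reports one read-after-write pair (instructions 0 and 1, through R0), one write-after-read pair (instructions 1 and 2, through R0), and one write-after-write pair (instructions 0 and 2, both targeting R0).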
(2) The execution steps are as follows, as shown in fig. 4:
Step 401: The instructions in the instruction memory enter the scoreboard in their original order, and the scoreboard assigns index tags to the received instructions by means of a counter. The scoreboard records the information of the first three instructions, including their index tags. The index tags are arranged in ascending order, which is implemented with a counter according to the following rule: the initial value of the counter is set to 0; the instructions enter the scoreboard in sequence; the scoreboard uses the initial value of the counter as the index tag of the first instruction; each time a subsequent instruction enters, the counter is incremented by 1, and the new counter value becomes the index tag of the newly entered instruction. After the index tags have been assigned, the instruction state table shown in fig. 3(a) is obtained.
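The counter rule of step 401 can be sketched in a few lines. The helper names `tag_counter` and `admit` are hypothetical; the `step` parameter is included only because the claims also allow sequentially decreasing tags.

```python
import itertools
from typing import Iterator, List, Tuple


def tag_counter(start: int = 0, step: int = 1) -> Iterator[int]:
    """Counter that yields index tags: start, start + step, ..."""
    return itertools.count(start, step)


def admit(instructions: List[str], capacity: int,
          tags: Iterator[int]) -> Tuple[List[Tuple[int, str]], List[str]]:
    """Record up to `capacity` instructions with their tags.

    Returns the recorded (tag, instruction) rows and the instructions
    that still wait in the instruction memory.
    """
    recorded = [(next(tags), text) for text in instructions[:capacity]]
    return recorded, instructions[capacity:]


program = ["R0 <- mem[5]", "R2 <- R0 + R1", "R0 <- R3 + R4", "R5 <- mem[6]"]
tags = tag_counter()
recorded, waiting = admit(program, capacity=3, tags=tags)
```

Here `recorded` holds the first three instructions with tags 0, 1, and 2, and the fourth instruction remains in `waiting` until a scoreboard row is cleared; the next tag drawn from the counter is 3.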
Step 402: The scoreboard checks the instruction with index tag 0, which needs to occupy the load/store pipeline in the operator. Since no instruction currently occupies that component, the scoreboard issues the instruction and sends its instruction operation information to the reorder buffer. The instruction operation information includes the index tag 0 of the instruction and the operator unit L/S used to execute it, as shown in fig. 3(b), where L/S denotes the load/store pipeline. The scoreboard also marks the load/store pipeline and register R0 as occupied.
Step 403: The scoreboard checks the instruction with index tag 1, which needs to occupy the arithmetic logic unit (ALU) pipeline in the operator and to read registers R0 and R1. Register R0 is currently occupied and can be read only after the previous instruction has written to it; that is, a read-write hazard exists. The instruction must wait until the hazard disappears before it can enter the operator for execution.
Step 404: The scoreboard checks the instruction with index tag 2, which needs to occupy the arithmetic logic unit pipeline in the operator and to read registers R3 and R4. No instruction currently occupies the corresponding operator unit or registers, so the scoreboard issues the instruction and sends its instruction operation information to the reorder buffer. The instruction operation information includes the index tag 2 of the instruction and the operator unit ALU used to execute it, as shown in fig. 3(c), where ALU denotes the arithmetic logic unit pipeline. The scoreboard also marks the arithmetic logic unit pipeline and registers R0, R3, and R4 as occupied. Register R0 was already marked as occupied in step 402 and can still be marked as occupied again here.
It should be noted that the scoreboard performs hazard checking on the three instructions in index tag order. Since the latency of the ALU pipeline in the operator is less than the latency of the load/store pipeline, the instruction with index tag 2 finishes executing before the instruction with index tag 0. Since the instruction with index tag 1 has a read-write hazard, it must wait for the hazard to disappear before it can be executed.
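Steps 402 to 404 amount to one pass over the scoreboard in index tag order. The sketch below is one possible reading of that pass, not the patented implementation: the `Entry` fields and `issue_pass` are hypothetical names, and the read-write hazard test is a conservative assumption (a source register is treated as unavailable while any earlier instruction that writes it has not yet been delivered). Write-after-read and write-after-write conflicts are deliberately not checked here, because the text leaves them to the reorder buffer.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Entry:
    tag: int
    unit: str                # pipeline required in the operator
    dest: str                # target register
    srcs: Tuple[str, ...]    # source operand registers
    issued: bool = False
    committed: bool = False


def issue_pass(entries: List[Entry], busy_units: Set[str]) -> List[int]:
    """Poll entries in tag order and issue those without hazards."""
    issued_now: List[int] = []
    for e in sorted(entries, key=lambda x: x.tag):
        if e.issued:
            continue
        # Structural hazard: the required pipeline is occupied.
        structural = e.unit in busy_units
        # Read-write hazard (assumed rule): an earlier, undelivered
        # instruction writes one of this instruction's source registers.
        read_write = any(p.tag < e.tag and not p.committed and p.dest in e.srcs
                         for p in entries)
        if structural or read_write:
            continue  # stays in the waiting state
        e.issued = True
        busy_units.add(e.unit)
        issued_now.append(e.tag)
    return issued_now


entries = [
    Entry(0, "L/S", "R0", ()),
    Entry(1, "ALU", "R2", ("R0", "R1")),
    Entry(2, "ALU", "R0", ("R3", "R4")),
]
busy: Set[str] = set()
first_pass = issue_pass(entries, busy)
```

On the first pass the instructions with index tags 0 and 2 are issued and the instruction with index tag 1 waits, which matches the walk-through above.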
Step 405: After the operator has executed the instruction with index tag 2, the result is returned to the reorder buffer, which uses the instruction operation information to associate the operation result with the index tag, as shown in fig. 3(e). Because the result of an earlier instruction is still pending, the result of the instruction with index tag 2 cannot yet be delivered to register R0 and is held in the ready state. After the instruction with index tag 2 has completed, the scoreboard releases the corresponding arithmetic logic unit pipeline and the source operand registers R3 and R4. Note that register R0 is still marked as occupied by the instructions with index tags 0 and 2 at this time.
Step 406: After the instruction with index tag 0 has been executed, the scoreboard releases the corresponding load/store pipeline, and the operation result is returned to the reorder buffer. Since index tag 0 is the first index tag defined by the scoreboard, its result is delivered directly to register R0 once the corresponding result in the reorder buffer is ready. When the commit operation is performed, the reorder buffer sends target information 0, which includes index tag 0 and the target register R0 associated with the commit operation, to the scoreboard, as shown in fig. 3(i). Upon receiving target information 0, the scoreboard releases register R0 from the instruction with index tag 0. At this point R0 is still marked as occupied by the instruction with index tag 2. In the reorder buffer, as shown in fig. 3(f), although the operation results of all index tags before index tag 2 have already been delivered, the order of the index tags before index tag 2 does not match the order of the index tags before index tag 2 in the scoreboard, so the delivery condition is not satisfied. Furthermore, the scoreboard cannot yet clear the instruction with index tag 0, because its target register is a source operand register of the instruction with index tag 1; that is, a data dependency still exists.
Step 407: The scoreboard continues to check the instruction with index tag 1. At this time the arithmetic logic unit pipeline in the operator is free and the read-write hazard on register R0 has disappeared. The scoreboard issues the instruction and sends its instruction operation information to the reorder buffer. The instruction operation information includes the index tag 1 of the instruction and the arithmetic logic unit pipeline used to execute it, as shown in fig. 3(d), where ALU denotes the arithmetic logic unit pipeline. The scoreboard also marks the arithmetic logic unit pipeline in the operator and registers R2, R0, and R1 as occupied. In the reorder buffer, as shown in fig. 3(g), since the result of the instruction with index tag 1 is pending, the result of the instruction with index tag 2 still cannot be delivered. Now that the instruction with index tag 1 is being executed, the instruction with index tag 0 no longer has any data dependency with any other instruction in the scoreboard. The scoreboard therefore clears the information of that instruction, admits a new instruction, and sets its index tag to 3. The scoreboard performs hazard detection on the new instruction: if the conditions are met, the instruction is sent to the operator for execution; otherwise it waits until the hazard disappears. The subsequent execution process is not repeated here.
Step 408: After the instruction with index tag 1 has been executed, the result is returned to the reorder buffer. The scoreboard releases the arithmetic logic unit pipeline and the registers R0 and R1 corresponding to the instruction with index tag 1. At this time, as shown in fig. 3(h), the order of all index tags before index tag 1 is consistent with the order of all index tags before index tag 1 in the scoreboard, and their results have been committed to the corresponding registers, so the operation result of the instruction with index tag 1 satisfies the commit condition. The reorder buffer commits the result to register R2 and returns target information 1, which includes index tag 1 and the target register R2 related to the commit operation, to the scoreboard, as shown in fig. 3(j). Upon receiving target information 1, the scoreboard releases register R2 from the instruction with index tag 1. The instruction with index tag 1 now has no data dependency with any other instruction in the scoreboard, so the scoreboard clears its information; since no further instruction is waiting to be entered, no new instruction enters. Furthermore, the results of all instructions before the instruction with index tag 2 have been committed, and the order of the index tags before index tag 2 is consistent with the order of all index tags before index tag 2 in the scoreboard, so the commit condition is also met. The reorder buffer then commits the operation result of the instruction with index tag 2 to the corresponding register R0 and, at the same time, returns target information 2, which includes index tag 2 and the target register R0 related to the commit operation, to the scoreboard, as shown in fig. 3(k).
Upon receiving target information 2, the scoreboard releases register R0 from the instruction with index tag 2.
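The commit sequence of steps 405 to 408 can be replayed with a small sketch. The `ReorderBuffer` class, its method names, and all register and memory values are hypothetical and chosen only for illustration. As a simplification, the buffer delivers from the head of the scoreboard's tag order while results are ready, which has the same effect as the delivery condition when the tags are ascending.

```python
from typing import Dict, List


class ReorderBuffer:
    def __init__(self, scoreboard_order: List[int], dest_of: Dict[int, str]):
        self.order = list(scoreboard_order)   # tags in scoreboard order
        self.dest_of = dict(dest_of)          # tag -> target register
        self.results: Dict[int, int] = {}     # results that are ready
        self.committed: List[int] = []        # tags delivered so far

    def write_result(self, tag: int, value: int, regs: Dict[str, int]) -> None:
        """Store a result, then deliver every result that is now in order."""
        self.results[tag] = value
        while len(self.committed) < len(self.order):
            head = self.order[len(self.committed)]
            if head not in self.results:
                break                         # an earlier result is pending
            regs[self.dest_of[head]] = self.results[head]
            self.committed.append(head)


# Hypothetical initial state for the first three instructions.
regs = {"R1": 10, "R3": 3, "R4": 4}
mem = {5: 50}
rob = ReorderBuffer([0, 1, 2], {0: "R0", 1: "R2", 2: "R0"})

rob.write_result(2, regs["R3"] + regs["R4"], regs)  # ALU finishes first; held
rob.write_result(0, mem[5], regs)                   # load finishes; tag 0 delivered
rob.write_result(1, regs["R0"] + regs["R1"], regs)  # reads R0 from tag 0; tags 1, 2 delivered
```

Although the result for index tag 2 is produced first, it is written to R0 last, so R0 ends up holding the value of the later write and R2 is computed from the value loaded by the instruction with index tag 0.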
Further, the above example may be generalized to out-of-order execution of m instructions, as described below. Fig. 5 is a table of the states recorded in the scoreboard when the number of instructions is greater than the capacity of the scoreboard in the embodiment of the present invention. Specifically, m instructions exist in the instruction memory, the scoreboard can record at most n pieces of instruction information, and the maximum value of n depends on the maximum storage capacity of the scoreboard.
When m is greater than n, the specific implementation process is as follows:
The first n instructions in the instruction memory enter the scoreboard in their original order. The scoreboard receives the n instructions and assigns index tags to them in the manner described in the previous example, with the final record shown in fig. 5. The scoreboard polls the instructions in index tag order and checks them for structural hazards and read-write hazards. When it detects that the instruction with index tag i has neither a structural hazard nor a read-write hazard, it sends that instruction to the operator. The instruction with index tag j has a read-write hazard and is set to the waiting state until the hazard disappears. The instruction with index tag k has neither a structural hazard nor a read-write hazard and is preferentially sent to the operator.
As shown in FIG. 5, when the target registers of the instructions with index tags i and k are both Rx, both instructions can still be sent to the operator, since each calculation result is buffered in the reorder buffer and is not transferred directly to the target register. When the results are finally written into the target register, they are delivered one by one in the order of the tags in the scoreboard, so that in-order delivery is obtained. The number of times the target register of an instruction can be marked as occupied is not limited to 2, as in the present embodiment. In other words, if k denotes the number of instructions that have the same target register as the instruction with index tag i and have neither read-write hazards nor structural hazards, the value of k can theoretically range from 1 to m-2. It should be appreciated that in practical applications, the target register of an instruction can be occupied by multiple tags, depending on the actual computational requirements.
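One way to picture a target register that is marked as occupied by several instructions at once is to keep a set of owning index tags per register. The class `DestOccupancy` and its methods are hypothetical names used only in this sketch.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class DestOccupancy:
    """Tracks which index tags currently mark each target register."""

    def __init__(self) -> None:
        self._owners: DefaultDict[str, Set[int]] = defaultdict(set)

    def mark(self, reg: str, tag: int) -> None:
        # A target register may be marked again while already occupied.
        self._owners[reg].add(tag)

    def release(self, reg: str, tag: int) -> None:
        # Called when the result for `tag` has been delivered.
        self._owners[reg].discard(tag)

    def owners(self, reg: str) -> List[int]:
        return sorted(self._owners[reg])

    def is_occupied(self, reg: str) -> bool:
        return bool(self._owners[reg])


occ = DestOccupancy()
occ.mark("R0", 0)   # instruction with index tag 0 targets R0
occ.mark("R0", 2)   # instruction with index tag 2 targets R0 as well
occ.release("R0", 0)
still_occupied = occ.is_occupied("R0")
```

After the release for index tag 0, the register is still occupied by index tag 2, as in step 406 of the four-instruction example; it becomes free only when every owning tag has been released.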
In an embodiment of the present invention, x instructions have read-write hazards, and y is the index tag of the first instruction, in index tag order, that has a read-write hazard. In this case, the reorder buffer can deliver only the results of instructions whose index tags are smaller than y and which satisfy the delivery condition. The other x-y-1 subsequent instructions can be delivered according to the delivery condition only after the read-write hazards of all instructions with smaller index tags have completely disappeared.
When the result of the instruction with index tag i has been delivered and the instruction has no data dependency with any other instruction in the scoreboard, the scoreboard clears the instruction with index tag i. At the same time, the subsequent instructions move forward, and the record of the newly entered instruction is placed at the bottom of the scoreboard state table. It should be noted that operations in the above process such as releasing occupied resources and receiving results in the reorder buffer are the same as in the above embodiments and are not described again here.
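The row movement described above, in which a cleared row is removed, later rows move forward, and the next instruction is appended at the bottom, might be sketched as a bounded window over the instruction memory. `ScoreboardWindow` and its members are hypothetical names, and the instruction strings are placeholders.

```python
from collections import deque
from typing import Deque, List, Tuple


class ScoreboardWindow:
    def __init__(self, program: List[str], capacity: int) -> None:
        self._waiting: Deque[str] = deque(program)  # instruction memory
        self._next_tag = 0                          # counter for index tags
        self.capacity = capacity                    # n rows at most
        self.rows: List[Tuple[int, str]] = []       # (index tag, instruction)
        self._fill()

    def _fill(self) -> None:
        # Admit instructions in original order while rows are free.
        while self._waiting and len(self.rows) < self.capacity:
            self.rows.append((self._next_tag, self._waiting.popleft()))
            self._next_tag += 1

    def clear(self, tag: int) -> None:
        # Remove the row; later rows move forward, new row goes to the bottom.
        self.rows = [row for row in self.rows if row[0] != tag]
        self._fill()


window = ScoreboardWindow(["I0", "I1", "I2", "I3"], capacity=3)
window.clear(0)
tags_after_first_clear = [tag for tag, _ in window.rows]
```

With four instructions and a capacity of three, clearing the row with index tag 0 admits the fourth instruction with index tag 3; clearing a further row admits nothing, because the instruction memory is then empty.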
For the case where m is less than or equal to n, fig. 6 shows the states recorded in the scoreboard when the number of instructions is less than or equal to the capacity of the scoreboard in the embodiment of the present invention. The m instructions in the instruction memory enter the scoreboard in their original order, and the scoreboard assigns index tags to them, as shown in fig. 6, where fig. 6(a) is the case when m is less than n and fig. 6(b) is the case when m is equal to n. The specific processes, such as out-of-order execution and delivery, have been described in detail in the above examples and are not repeated here. Note that in this case, since all the instructions have already entered the scoreboard, no new instruction enters after the scoreboard clears an old one.
In this embodiment, the scoreboard assigns index tags to the received instructions so that each instruction has a specific tag. Instructions without structural hazards or read-write hazards are preferentially sent to the operator for execution, the operation results are sent to the reorder buffer, and the results are delivered to the target registers in order once the delivery condition is met. Out-of-order execution with in-order delivery is thereby achieved, which improves the utilization of the operator and greatly reduces instruction waiting time. The scoreboard cooperates with the reorder buffer to deliver, in order, the results of instructions executed out of order. On the basis of the scoreboard's hazard detection, the write-read hazards and write-write hazards among the data hazards are eliminated, which broadens the scenarios in which out-of-order execution is applicable and thus improves the overall speed of instruction execution.
In addition, the embodiment of the present invention further relates to an electronic device, which includes the processor 100 or an internal device thereof, and uses the method for executing the instructions out of order. The electronic device may be a computer, a server, a data processing device, a smart terminal, a mobile phone, a printer, a sensor, a watch, a vehicle data recorder, a vehicle-mounted computer, a navigator, a household appliance, and/or a medical device.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both, and that the components and steps of the examples have been described above generally in terms of their functions in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (9)

1. A method of out-of-order execution of instructions providing an instruction memory, registers and an operator, and a reorder buffer and scoreboard, the method comprising the steps of:
step 101: instructions in the instruction memory enter the scoreboard in order;
step 102: the scoreboard distributes index labels for the received instructions; each instruction comprises a first instruction; the scoreboard records the information of each instruction, including recording the index tag of each instruction; the index tag ordering mode comprises sequential increasing or sequential decreasing;
step 103: the scoreboard checks each instruction for structural hazards and read-write hazards in index tag order; when the first instruction has a structural hazard or a read-write hazard, the scoreboard sets the first instruction to a waiting state, and the scoreboard can preferentially send any instruction other than the first instruction that has neither a structural hazard nor a read-write hazard to the operator corresponding to that instruction; when the first instruction has neither a structural hazard nor a read-write hazard, or when the structural hazard and the read-write hazard have disappeared, the scoreboard sends the first instruction to the operator; after the operator and the source operand registers corresponding to each instruction in the scoreboard have been marked as occupied for the first time, they cannot be marked as occupied again until the first occupied mark is removed, whereas the target register of each instruction can be marked as occupied again after being marked as occupied for the first time;
step 104: the operator executes the received instruction and writes the operation result into the result storage position corresponding to the received instruction in the reorder buffer;
step 105: the reorder buffer arranges the operation results according to the order of the index tags and finally delivers the operation results to the registers in order.
2. The method of claim 1, wherein in step 103, the scoreboard further passes instruction operation information of the instructions to the reorder buffer, the instruction operation information comprising at least the index tag of the instructions and the operator information required to execute the instructions.
3. The method of claim 2, wherein after the operation result is written into the reorder buffer in step 104, the reorder buffer associates the operation result of each instruction with the index tag of each instruction one by one according to the instruction operation information.
4. The method of out-of-order execution of instructions according to claim 1, wherein in step 105, the condition for in-order final delivery is: the reorder buffer comprises at least one instruction operation result and an index tag, the index tag being the same as the index tag in the scoreboard and comprising a first index tag; when the first index tag is the first index tag defined in the scoreboard, the reorder buffer directly delivers the operation result of the first index tag to the target register; and when the sequence of all index tags before the first index tag is the same as the sequence of all index tags before the first index tag in the scoreboard, and the operation results corresponding to all index tags before the first index tag have been delivered, the reorder buffer delivers the operation result corresponding to the first index tag to the corresponding register.
5. The method of out-of-order execution of instructions according to claim 1, wherein the reorder buffer outputs the target information of the delivery, which comprises at least the index tag and register information of the corresponding instruction, when or after the delivery is executed, and the target information is finally passed to the scoreboard; and the scoreboard updates the register occupation state of the corresponding instruction according to the target information.
6. The method of out-of-order execution of instructions according to claim 1, wherein the scoreboard further comprises a second instruction, and when the result of the second instruction is committed and is no longer relevant to any other instruction, the scoreboard clears the information of the second instruction; and after the scoreboard clears the information of the second instruction, the information of a new instruction is also input.
7. A processor for out-of-order execution of instructions, comprising an operator, characterized by further comprising a scoreboard and a reorder buffer;
the scoreboard assigns index tags to the received instructions and records the index tag of each instruction, the scoreboard comprising an index field for recording the index tag of each instruction; after the operator and the source operand registers corresponding to each instruction in the scoreboard have been marked as occupied for the first time, they cannot be marked as occupied again until the first occupied mark is removed, whereas the target register of each instruction can be marked as occupied again after being marked as occupied for the first time;
the reorder buffer comprises an index field for recording the index tag of each instruction; the reorder buffer is used for receiving the operation result of each instruction; the reorder buffer is also used for delivering the operation result of each instruction to a target register;
the scoreboard is directly or indirectly connected with the reorder buffer, and the scoreboard finally sends the instruction operation information of each instruction to the reorder buffer, the instruction operation information comprising at least the index tag of each instruction and the operator required to execute each instruction; the reorder buffer finally transmits the target information of the delivery to the scoreboard when or after the delivery is executed; the target information comprises at least the target register information of each instruction; and the scoreboard updates the register occupation state of the instruction related to the delivery operation according to the target information.
8. The processor of claim 7, wherein the reorder buffer is directly or indirectly connected to the operator, and the operator issues the operation results of the instructions, which finally enter the reorder buffer; and the reorder buffer locates the index tag position corresponding to each operation result according to the instruction operation information and writes the operation result into the operation result storage location pointed to by the index tag.
9. An electronic device, characterized in that it uses the method of any of claims 1-6 or comprises the processor of claim 7 or 8.
CN201910600769.XA 2019-07-04 2019-07-04 Method for out-of-order execution of instructions, processor and electronic equipment Active CN110297662B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910600769.XA CN110297662B (en) 2019-07-04 2019-07-04 Method for out-of-order execution of instructions, processor and electronic equipment

Publications (2)

Publication Number Publication Date
CN110297662A CN110297662A (en) 2019-10-01
CN110297662B CN110297662B (en) 2021-11-30

Family

ID=68030285

Country Status (1)

Country Link
CN (1) CN110297662B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20210209
Address after: 311201 No. 602-11, complex building, 1099 Qingxi 2nd Road, Hezhuang street, Qiantang New District, Hangzhou City, Zhejiang Province
Applicant after: Zhonghao Xinying (Hangzhou) Technology Co.,Ltd.
Address before: 518057 5-15, block B, building 10, science and technology ecological park, Yuehai street, Nanshan District, Shenzhen City, Guangdong Province
Applicant before: Shenzhen Xinying Technology Co.,Ltd.
GR01 Patent grant