CN110647362A - Two-stage buffering transmitting device based on scoreboard principle - Google Patents


Info

Publication number
CN110647362A
Authority
CN
China
Prior art keywords
instruction
scoreboard
queue
stage
transmitting
Prior art date
Legal status
Granted
Application number
CN201910858592.3A
Other languages
Chinese (zh)
Other versions
CN110647362B (en)
Inventor
胡向东
范好好
李俊
尹飞
王国澎
Current Assignee
Shanghai Integrated Circuits with Highperformance Center
Original Assignee
Shanghai Integrated Circuits with Highperformance Center
Priority date
Filing date
Publication date
Application filed by Shanghai Integrated Circuits with Highperformance Center
Priority to CN201910858592.3A
Publication of CN110647362A
Application granted
Publication of CN110647362B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G06F9/3867 Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
    • G06F9/30043 LOAD or STORE instructions; Clear instruction
    • G06F9/3814 Implementation provisions of instruction buffers, e.g. prefetch buffer; banks
    • G06F9/3838 Dependency mechanisms, e.g. register scoreboarding

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

The invention relates to a two-stage buffering issue device based on the scoreboard principle, comprising a first-stage wait queue and a second-stage issue queue. A speculative scoreboard is arranged between the first-stage wait queue and the second-stage issue queue, and a precise scoreboard is arranged at the issue point of the second-stage issue queue. The unlock timing of the speculative scoreboard, after an instruction has been sent from the first-stage wait queue to the second-stage issue queue, is determined by the number of cycles the instruction executes in the execution unit. The unlock timing of the precise scoreboard, after the instruction has been issued from the second-stage issue queue to the execution unit, is likewise determined by that execution cycle count. The invention simplifies complex issue-selection logic, regulates the utilization of the second-stage queue, and improves issue efficiency.

Description

Two-stage buffering transmitting device based on scoreboard principle
Technical Field
The invention relates to instruction pipeline design for superscalar microprocessors, and in particular to a two-stage buffering issue device based on the scoreboard principle.
Background
Modern superscalar processors typically include basic pipeline stages such as fetch, decode, rename, issue, execute, and retire, and contain multiple execution units so that several instructions can execute in parallel. As the bridge between the instruction pipeline and the execution units, the issue unit tracks the current state of the processor in real time, finds instructions in the instruction window that can execute in parallel, and dynamically schedules them to the execution units. In the pipeline stages before the issue unit, instructions enter and leave in program order; in the issue unit, instructions enter in order but leave out of order.
To support dynamic instruction scheduling, superscalar processors often use scoreboarding. Its principle is as follows: the scoreboard centrally records the status of all current operands, indicating whether each one is available, i.e., whether it can be read by an instruction. When an instruction is going to write an operand, that operand is blocked and becomes unavailable; when the instruction completes execution, the operand is unlocked and becomes available again. While an operand is blocked, any instruction that uses it as a source operand cannot be issued to an execution unit. This eliminates read-after-write hazards and guarantees that data-dependent instructions execute strictly in program order. Instructions without data dependences do not interact through the scoreboard and can be issued and executed out of order.
Specifically, the table that maintains data dependences between instructions in the system (hereinafter the scoreboard status table) contains n bits and collectively records whether each of n operands is available or blocked. Each bit corresponds to one operand: "0" means the operand is available, i.e., unlocked; "1" means it is unavailable, i.e., blocked.
Each instruction that passes through the rename stage carries an n-bit source scoreboard, each bit corresponding to one operand. A "0" in the source scoreboard means that executing this instruction does not require the corresponding operand to be available; a "1" means that it does. Depending on the number of its source operands, an instruction's source scoreboard may contain zero, one, or several "1" bits.
Likewise, each instruction that passes through the rename stage carries an n-bit target scoreboard, each bit corresponding to one operand. A "0" in the target scoreboard means that the instruction will not modify the corresponding operand; a "1" means that it will. Depending on the number of its destination operands, an instruction's target scoreboard contains either zero or one "1" bit.
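To make these bit conventions concrete, the sketch below encodes the source and target scoreboards as Python integers, with bit m standing for operand m. It is a minimal illustration, not the patent's implementation: the helper names `make_scoreboard` and `make_target_scoreboard` are hypothetical, and the 64-bit width is an assumption borrowed from the 64-bit decode mentioned for LOAD instructions.

```python
# Sketch: source and target scoreboards as n-bit Python ints (bit m <-> operand m).

def make_scoreboard(operand_ids, n):
    """Return an n-bit mask with bit m set for every operand index m in operand_ids."""
    mask = 0
    for m in operand_ids:
        if not 0 <= m < n:
            raise ValueError(f"operand index {m} is outside 0..{n - 1}")
        mask |= 1 << m
    return mask


def make_target_scoreboard(dest_id, n):
    """Target scoreboard: no bit set if the instruction writes nothing, else exactly one."""
    return 0 if dest_id is None else make_scoreboard([dest_id], n)


N_BITS = 64  # assumed width (hypothetical here; see the 64-bit LOAD decode)

# Hypothetical instruction: reads operands 3 and 5, writes operand 2.
src = make_scoreboard([3, 5], N_BITS)
dst = make_target_scoreboard(2, N_BITS)
print(bin(src), bin(dst))  # 0b101000 0b100
```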
When an instruction enters the issue unit from the preceding stage and bit m of its target scoreboard is 1, bit m of the scoreboard status table is set to "1", i.e., blocked. After the instruction has been issued to an execution unit and has completed, bit m of the scoreboard status table is cleared to "0", i.e., unlocked.
Every cycle, each instruction in the issue unit compares its source scoreboard with the scoreboard status table. If both contain a "1" in the same position, there is a read-after-write hazard on that data and the instruction may not be issued.
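A minimal sketch of the scoreboard status table and the per-cycle issue check described above, assuming the table is held as a single integer bit vector. The class `ScoreboardStatusTable` and its method names are hypothetical; the hazard test is simply the bitwise AND of an instruction's source scoreboard with the table.

```python
class ScoreboardStatusTable:
    """n-bit scoreboard status table: bit m = 1 means operand m is blocked."""

    def __init__(self, n):
        self.n = n
        self.bits = 0  # every operand available at reset

    def block(self, target_mask):
        # Instruction with target bit m enters the issue unit: set bit m.
        self.bits |= target_mask

    def unlock(self, target_mask):
        # The instruction that writes operand m has completed: clear bit m.
        self.bits &= ~target_mask

    def has_raw_hazard(self, source_mask):
        # A 1 in the same position of both vectors is a read-after-write hazard.
        return (self.bits & source_mask) != 0


table = ScoreboardStatusTable(64)
producer_target = 1 << 2               # hypothetical producer writes operand 2
consumer_source = (1 << 2) | (1 << 7)  # hypothetical consumer reads operands 2 and 7

table.block(producer_target)
hazard_before = table.has_raw_hazard(consumer_source)  # consumer must wait
table.unlock(producer_target)
hazard_after = table.has_raw_hazard(consumer_source)   # consumer may now issue
print(hazard_before, hazard_after)
```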
Instructions buffered in the issue unit have already passed through stages such as decode and rename, so they carry a lot of information, and the associated search and selection control logic is complex. In physical implementation this results in dense wiring, many logic levels, and long delays, which makes the issue unit a difficult part of physical design. As processor frequency and issue width increase, instruction issue logic tends to become the critical path of the pipeline. The design of the issue unit therefore requires a trade-off between performance and physical implementation to achieve the best result.
Consequently, a conventional single-level buffer design for the issue unit often cannot meet both performance and timing requirements. With a two-level buffer design, instructions are held in two separate buffers. For the same total number of buffered instructions, this reduces the difficulty of physical implementation compared with a single-level issue buffer and allows the processor frequency to be raised.
In a two-level buffered issue design, to obtain good performance and make full use of both buffers, instructions whose operands are far from ready should remain in the first-level buffer near the upstream end of the pipeline as far as possible, while instructions whose operands are ready or nearly ready should be placed in the second-level buffer near the downstream end. This uses resources fully and minimizes the impact on performance. Controlling when instructions move from the first-level buffer to the second-level buffer is therefore critical.
Disclosure of Invention
The technical problem addressed by the invention is to provide a two-stage buffering issue device based on the scoreboard principle that simplifies complex issue-selection logic, regulates the utilization of the second-stage queue, and improves issue efficiency.
The technical solution adopted by the invention is as follows. The two-stage buffering issue device based on the scoreboard principle comprises a first-stage wait queue and a second-stage issue queue. A speculative scoreboard is arranged between the first-stage wait queue and the second-stage issue queue and regulates when each instruction is sent from the first-stage wait queue to the second-stage issue queue. A precise scoreboard is arranged at the issue point of the second-stage issue queue and regulates when each instruction is issued from the second-stage issue queue. The unlock timing of the speculative scoreboard, after an instruction has been sent from the first-stage wait queue to the second-stage issue queue, is determined by the number of cycles the instruction executes in the execution unit; the unlock timing of the precise scoreboard, after the instruction has been issued from the second-stage issue queue to the execution unit, is likewise determined by that execution cycle count.
Suppose bit m of an instruction's target scoreboard is valid and the instruction takes N cycles to execute. Bit m of the speculative scoreboard is blocked as soon as the instruction enters the first-stage wait queue from the rename stage, and the instruction unlocks bit m of the speculative scoreboard in cycle N-2 after it is issued from the second-stage issue queue to an execution unit.
For a single-cycle instruction, bit m of the speculative scoreboard is unlocked when the instruction is sent from the first-stage wait queue to the second-stage issue queue. For a LOAD-class instruction, the number of execution cycles is taken to be that of a hit in the L1 data cache.
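The speculative unlock rule can be written as a small timing function. This sketch assumes integer cycle numbers and follows the wording above: cycle N-2 counted from issue to the execution unit, the single-cycle exception tied to the wait-queue-to-issue-queue transfer, and a fixed 4-cycle L1-hit latency for LOADs (the value the description gives later). The function name and its parameters are hypothetical.

```python
LOAD_HIT_LATENCY = 4  # cycles assumed for a LOAD (L1 DCache hit), per the description


def speculative_unlock_cycle(transfer_cycle, issue_cycle, n_exec=None, is_load=False):
    """Cycle in which speculative-scoreboard bit m of a producer is cleared.

    transfer_cycle: cycle the producer moved from the wait queue to the issue queue
    issue_cycle:    cycle it was issued from the issue queue to an execution unit
    n_exec:         execution latency N (ignored for LOADs, which assume an L1 hit)
    """
    n = LOAD_HIT_LATENCY if is_load else n_exec
    if n is None or n < 1:
        raise ValueError("execution latency N must be at least 1")
    if n == 1:
        # Single-cycle producer: clear the bit as soon as it leaves the wait queue.
        return transfer_cycle
    # Otherwise clear the bit N-2 cycles after issue to the execution unit.
    return issue_cycle + n - 2


print(speculative_unlock_cycle(10, 11, n_exec=1))      # 10
print(speculative_unlock_cycle(10, 11, n_exec=3))      # 12
print(speculative_unlock_cycle(10, 11, is_load=True))  # 13
```

Blocking on entry from the rename stage is not modeled here; only the release point is computed.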
While an instruction is in the first-stage wait queue, if any bit of the speculative scoreboard corresponding to a valid bit of the instruction's source scoreboard is found to be blocked, the instruction may not be sent to the second-stage issue queue.
Suppose bit m of an instruction's target scoreboard is valid and the instruction takes N cycles to execute. Bit m of the precise scoreboard is blocked as soon as the instruction enters the first-stage wait queue from the rename stage, and the instruction unlocks bit m of the precise scoreboard in cycle N-1 after it is issued from the second-stage issue queue to the execution unit.
For a single-cycle instruction, bit m of the precise scoreboard is unlocked when the instruction is issued from the second-stage issue queue to the execution unit. For a LOAD instruction, the number of execution cycles is taken to be that of a hit in the L1 data cache, i.e., 4 cycles. After a LOAD instruction is issued, the scoreboard number it sets is recorded; during its execution this number is decoded into a 64-bit vector and the block on bit m of the precise scoreboard is removed. Whether the speculation succeeded is then determined from the DCache hit signal. If it succeeded, operation continues normally; if it failed, instruction issue is stalled and bit m of the precise scoreboard is blocked again until the LOAD instruction completes, at which point it is unlocked.
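The precise unlock rule can be sketched the same way, again assuming integer cycle numbers counted from the cycle in which the instruction leaves the second-stage issue queue. The function name is hypothetical; `LOAD_HIT_LATENCY = 4` is the LOAD latency stated in the description.

```python
LOAD_HIT_LATENCY = 4  # cycles assumed for a LOAD (L1 DCache hit), per the description


def precise_unlock_cycle(issue_cycle, n_exec=None, is_load=False):
    """Cycle in which precise-scoreboard bit m of a producer is cleared.

    issue_cycle: cycle the producer left the second-stage issue queue
    n_exec:      execution latency N (ignored for LOADs, which assume an L1 hit)
    """
    n = LOAD_HIT_LATENCY if is_load else n_exec
    if n is None or n < 1:
        raise ValueError("execution latency N must be at least 1")
    # N-1 cycles after issue; for N == 1 this is the issue cycle itself,
    # which matches the single-cycle rule (unlock at issue).
    return issue_cycle + n - 1


for n in (1, 2, 3):
    print(n, precise_unlock_cycle(11, n_exec=n))
print("LOAD", precise_unlock_cycle(11, is_load=True))
```

With the issue cycle taken as beat 0, the LOAD case clears its bit in beat 3, which agrees with the LOAD walkthrough in the detailed description.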
There are 3 first-stage wait queues: an integer wait queue, a floating-point wait queue, and a memory-access wait queue. There are 9 second-stage issue queues: 3 integer issue queues, 2 floating-point issue queues, 2 memory-access issue queues, 1 integer store-data issue queue, and 1 floating-point store-data issue queue. Instructions in the integer wait queue are sent to the 3 integer issue queues according to their assigned pipelines, and instructions in the floating-point wait queue are sent to the 2 floating-point issue queues according to their assigned pipelines. A LOAD instruction in the memory-access wait queue is sent to one of the 2 memory-access issue queues according to its assigned pipeline; a STORE instruction is sent both to one of the 2 memory-access issue queues according to its assigned pipeline and to the integer or floating-point store-data issue queue according to the type of the data being stored.
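The routing from the 3 first-stage wait queues to the 9 second-stage issue queues can be sketched as a lookup. The queue names, the `pipe` argument standing for the assigned pipeline, and the string class labels are hypothetical. The sketch also counts, per instruction class, how many issue queues one instruction could be sent to; under this model the maximum is 3, which is one way to read the "no more than 3 dispatch ports" statement in the advantageous-effects section.

```python
# Hypothetical queue names; the patent gives only the counts and routing rules.
WAIT_QUEUE = {"INT": "int_wait", "FP": "fp_wait", "LOAD": "mem_wait", "STORE": "mem_wait"}
PIPES = {"INT": 3, "FP": 2, "LOAD": 2, "STORE": 2}  # pipelines available per class


def issue_queues_for(kind, pipe, store_data_type=None):
    """Second-stage issue queue(s) that receive one instruction.

    kind:            "INT", "FP", "LOAD" or "STORE"
    pipe:            pipeline index assigned upstream (hypothetical field)
    store_data_type: "INT" or "FP"; used only for STORE
    """
    if not 0 <= pipe < PIPES[kind]:
        raise ValueError(f"{kind} has no pipeline {pipe}")
    if kind == "INT":
        return [f"int_issue{pipe}"]
    if kind == "FP":
        return [f"fp_issue{pipe}"]
    if kind == "LOAD":
        return [f"mem_issue{pipe}"]
    # STORE: one copy to a memory-access issue queue, one to a store-data queue.
    if store_data_type not in ("INT", "FP"):
        raise ValueError("STORE needs store_data_type 'INT' or 'FP'")
    data_q = "int_store_data" if store_data_type == "INT" else "fp_store_data"
    return [f"mem_issue{pipe}", data_q]


def candidate_queues(kind, store_data_type=None):
    """All issue queues an instruction of this class could be sent to."""
    out = set()
    for pipe in range(PIPES[kind]):
        out.update(issue_queues_for(kind, pipe, store_data_type))
    return out


CASES = [(k, d) for k in PIPES for d in (("INT", "FP") if k == "STORE" else (None,))]
all_issue_queues = set().union(*(candidate_queues(k, d) for k, d in CASES))
max_ports = max(len(candidate_queues(k, d)) for k, d in CASES)
print(len(set(WAIT_QUEUE.values())), len(all_issue_queues), max_ports)
```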
Advantageous effects
Compared with the prior art, the invention has the following advantages and positive effects. Each instruction has no more than 3 dispatch ports from the wait buffer to the issue buffer, which greatly reduces logic complexity and the difficulty of physical implementation, improves issue efficiency, and helps raise the processor frequency. The invention places a speculative scoreboard between the first-stage wait queue and the second-stage issue queue to control when an instruction moves from the first-stage wait queue into the second-stage issue queue, and uses a precise scoreboard at the exit of the second-stage issue queue. As a result, instructions whose operands are clearly not ready wait in the first-stage wait queue as far as possible, while instructions whose operands are ready or nearly ready wait in the second-stage issue queue. Together, the two scoreboards control when instructions move from the first-level buffer to the second-level buffer, regulate the utilization of the second-stage queue, and improve issue efficiency.
Drawings
FIG. 1 is a schematic structural diagram of the invention;
FIG. 2 is a schematic diagram of instructions being sent from the first-stage wait queue to the second-stage issue queue;
FIG. 3 is a schematic diagram of issue from the second-stage issue queue;
FIG. 4 is a schematic diagram of the speculative scoreboard being blocked;
FIG. 5 is a schematic diagram of the speculative scoreboard being unlocked;
FIG. 6 is a schematic diagram of the precise scoreboard being unlocked early by a LOAD-class instruction;
FIG. 7 is a schematic diagram of a failed early unlock of the precise scoreboard by a LOAD-class instruction;
FIG. 8 is a schematic diagram of the precise scoreboard being unlocked by a LOAD-class instruction.
Detailed Description
The invention is further illustrated below with reference to specific embodiments. It should be understood that these embodiments are for illustration only and are not intended to limit the scope of the invention. It should further be understood that, after reading the teachings of the invention, those skilled in the art may make various changes or modifications to it, and such equivalents likewise fall within the scope defined by the appended claims.
The embodiment of the invention relates to a two-stage buffering issue device based on the scoreboard principle, comprising a first-stage wait queue and a second-stage issue queue. A speculative scoreboard is arranged between the first-stage wait queue and the second-stage issue queue and regulates when each instruction is sent from the first-stage wait queue to the second-stage issue queue, and a precise scoreboard is arranged at the issue point of the second-stage issue queue and regulates when each instruction is issued from the second-stage issue queue.
This embodiment is mainly directed at processors that use scoreboarding and require out-of-order issue. In this embodiment, the first-level buffer near the upstream end of the pipeline is called the first-stage wait queue, and the second-level buffer near the downstream end is called the second-stage issue queue.
There are 3 first-stage wait queues: an integer wait queue, a floating-point wait queue, and a memory-access wait queue. There are 9 second-stage issue queues: 3 integer issue queues, 2 floating-point issue queues, 2 memory-access issue queues, 1 integer store-data issue queue, and 1 floating-point store-data issue queue. Instructions in the integer wait queue are sent to the 3 integer issue queues according to their assigned pipelines, and instructions in the floating-point wait queue are sent to the 2 floating-point issue queues according to their assigned pipelines. A LOAD instruction in the memory-access wait queue is sent to one of the 2 memory-access issue queues according to its assigned pipeline; a STORE instruction is sent both to one of the 2 memory-access issue queues according to its assigned pipeline and to the integer or floating-point store-data issue queue according to the type of the data being stored.
In this embodiment, instructions that have finished decoding upstream are distributed into the 3 first-stage wait queues according to whether they are integer, floating-point, or memory-access instructions. Instructions whose speculative scoreboard dependences have not yet been unlocked stay in the first-stage wait queues until they are, and instructions whose speculative scoreboard dependences are unlocked are sent to a second-stage issue queue (see FIG. 2). Instructions in the second-stage issue queues are sent to the execution units as soon as their precise scoreboard dependences are unlocked; FIG. 3 is a schematic diagram of issue from the second-stage issue queues.
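The division of labor between the two stages can be illustrated with a toy cycle model. This is a simplified sketch under stated assumptions, not the patent's design: one wait queue and one issue queue, unlimited issue width and queue capacity, and scoreboard bits released by hand instead of by the N-2 and N-1 timing rules. `Instr` and `TwoStageIssue` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    name: str
    src: int  # source scoreboard (bit m = needs operand m)
    dst: int  # target scoreboard (bit m = writes operand m)


@dataclass
class TwoStageIssue:
    spec: int = 0     # speculative scoreboard, 1 = blocked
    precise: int = 0  # precise scoreboard, 1 = blocked
    wait_q: list = field(default_factory=list)   # first-stage wait queue
    issue_q: list = field(default_factory=list)  # second-stage issue queue

    def enter(self, ins):
        """Rename -> wait queue: block the target bit in both scoreboards at once."""
        self.spec |= ins.dst
        self.precise |= ins.dst
        self.wait_q.append(ins)

    def step(self):
        """One cycle. Returns (names moved wait->issue, names issued to execution)."""
        # Second stage first, so an instruction cannot move and issue in one cycle.
        issued = [i for i in self.issue_q if not (i.src & self.precise)]
        self.issue_q = [i for i in self.issue_q if i.src & self.precise]
        # First stage: forward instructions whose sources are clear speculatively.
        moved = [i for i in self.wait_q if not (i.src & self.spec)]
        self.wait_q = [i for i in self.wait_q if i.src & self.spec]
        self.issue_q.extend(moved)
        return [i.name for i in moved], [i.name for i in issued]


dev = TwoStageIssue()
dev.enter(Instr("p", src=0, dst=1 << 2))       # producer writes operand 2
dev.enter(Instr("c", src=1 << 2, dst=1 << 5))  # consumer reads operand 2
r1 = dev.step()           # p moves; c held because speculative bit 2 is blocked
r2 = dev.step()           # p issues
dev.spec &= ~(1 << 2)     # speculative release of bit 2 (timing not modeled)
r3 = dev.step()           # c moves; precise bit 2 is still blocked
dev.precise &= ~(1 << 2)  # precise release of bit 2
r4 = dev.step()           # c issues
print(r1, r2, r3, r4)
```

The consumer `c` occupies no issue-queue entry while its operand is unavailable, yet it is already waiting in the issue queue when the precise bit clears.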
The unlock timing of the speculative scoreboard, after an instruction has been sent from the first-stage wait queue to the second-stage issue queue, is determined by the number of cycles the instruction executes in the execution unit. In this embodiment the speculative scoreboard has the same width as the precise scoreboard, and each bit has two states: "0" means unlocked and "1" means locked.
Suppose bit m of an instruction's target scoreboard is valid and the instruction takes N cycles to execute. Bit m of the speculative scoreboard is blocked as soon as the instruction enters the first-stage wait queue from the rename stage, and the instruction unlocks bit m of the speculative scoreboard in cycle N-2 after it is issued from the second-stage issue queue to an execution unit.
For a single-cycle instruction, bit m of the speculative scoreboard is unlocked when the instruction is sent from the first-stage wait queue to the second-stage issue queue.
For a LOAD-class instruction, the number of execution cycles is taken to be that of a hit in the L1 data cache.
While an instruction is in the first-stage wait queue, if any bit of the speculative scoreboard corresponding to a valid bit of its source scoreboard is found to be blocked, the instruction may not be sent to the second-stage issue queue. This ensures that instructions whose operands are not ready wait in the first-stage wait queue and do not occupy entries of the second-stage issue queue.
As shown in FIG. 4, the valid bit of the target scoreboard of instruction j is bit 2, so as soon as instruction j enters the first-stage wait queue from rename, bit 2 of the speculative scoreboard is blocked. As shown in FIG. 5, with the valid bit of its target scoreboard at bit 2 and an execution cycle count of N, instruction j unlocks bit 2 of the speculative scoreboard at the cycle determined by N after it has been sent from the first-stage wait queue to the second-stage issue queue.
The unlock timing of the precise scoreboard, after an instruction has been issued from the second-stage issue queue to the execution unit, is determined by the number of cycles the instruction executes in the execution unit.
Suppose bit m of an instruction's target scoreboard is valid and the instruction takes N cycles to execute. Bit m of the precise scoreboard is blocked as soon as the instruction enters the first-stage wait queue from the rename stage, and the instruction unlocks bit m of the precise scoreboard in cycle N-1 after it is issued from the second-stage issue queue to the execution unit.
For a single-cycle instruction, bit m of the precise scoreboard is unlocked when the instruction is issued from the second-stage issue queue to the execution unit.
For a LOAD instruction, the execution cycle count is taken to be that of a hit in the L1 data cache; if the access misses in the DCache, bit m of the precise scoreboard is blocked again and is unlocked only after the instruction completes. Specifically, an integer LOAD-class instruction is always speculated to hit in the DCache, i.e., its execution is taken to be 4 cycles. After a LOAD instruction is issued, the scoreboard number it sets is recorded. In the 3rd cycle after issue, this scoreboard number is decoded into a 64-bit vector and the corresponding scoreboard bit is cleared; in the 4th cycle, the DCache hit signal provided by the DBOX determines whether the speculation succeeded. If it succeeded, operation continues normally; if it failed, instruction issue is stalled and the speculatively cleared scoreboard bit is restored.
When the speculation succeeds, dependent instructions can be issued one cycle earlier, because the scoreboard bit is released one cycle in advance. A high DCache hit rate means a high speculation success rate, which improves performance. If the LOAD instruction misses in the DCache, the speculation fails; in that case, issue is stalled if the instruction about to be issued depends on the speculative LOAD instruction.
As shown in FIG. 6, instruction j is a LOAD-class instruction whose target scoreboard has bit 2 valid; in the 3rd cycle after instruction j is issued from the second-stage issue queue to the execution unit, bit 2 of the precise scoreboard is unlocked. As shown in FIG. 7, for the same instruction j, a DCache miss signal is received in the 4th cycle after issue, and bit 2 of the precise scoreboard is blocked again. As shown in FIG. 8, when instruction j actually completes, bit 2 of the precise scoreboard is unlocked.
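The LOAD speculation on the precise scoreboard can be traced beat by beat. The sketch assumes the issue beat is beat 0, decodes the recorded scoreboard number into a 64-bit one-hot vector, clears the bit in beat 3, and on a DCache miss sets it again in beat 4 until the LOAD completes. The completion beat is a hypothetical input, the function names are hypothetical, and cancelling any dependent instruction issued in the speculative window is not modeled.

```python
SCOREBOARD_WIDTH = 64
MASK64 = (1 << SCOREBOARD_WIDTH) - 1


def decode_scoreboard_number(idx):
    """Decode a recorded scoreboard number into a 64-bit one-hot vector."""
    if not 0 <= idx < SCOREBOARD_WIDTH:
        raise ValueError(f"scoreboard number {idx} is outside 0..{SCOREBOARD_WIDTH - 1}")
    return 1 << idx


def load_bit_timeline(idx, dcache_hit, complete_beat=None, last_beat=8):
    """Precise-scoreboard state of the LOAD's target bit for beats 0..last_beat.

    Beat 0 is the issue beat. 'B' = blocked, '.' = unlocked.
    complete_beat: beat at which a missing LOAD really completes (hypothetical input).
    """
    if not dcache_hit and (complete_beat is None or complete_beat <= 4):
        raise ValueError("a DCache miss needs a completion beat after beat 4")
    mask = decode_scoreboard_number(idx)
    status = mask  # blocked since the LOAD left rename
    states = []
    for beat in range(last_beat + 1):
        if beat == 3:
            status &= ~mask & MASK64  # beat 3: assume an L1 hit and unlock early
        if beat == 4 and not dcache_hit:
            status |= mask            # beat 4: DCache miss reported, block again
        if not dcache_hit and beat == complete_beat:
            status &= ~mask & MASK64  # LOAD really completes: final unlock
        states.append("B" if status & mask else ".")
    return "".join(states)


print(load_bit_timeline(2, dcache_hit=True))                    # BBB......
print(load_bit_timeline(2, dcache_hit=False, complete_beat=7))  # BBB.BBB..
```

In the hit case the bit is free from beat 3 rather than beat 4, which is the one-cycle gain described for successful speculation; in the miss case it is blocked again from beat 4 until completion, mirroring FIG. 7 and FIG. 8.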
It can thus be seen that the invention places a speculative scoreboard between the first-stage wait queue and the second-stage issue queue to control when instructions move from the first-stage wait queue into the second-stage issue queue, and uses a precise scoreboard at the exit of the second-stage issue queue. As a result, instructions whose operands are clearly not ready wait in the first-stage wait queue as far as possible, while instructions whose operands are ready or nearly ready wait in the second-stage issue queue. The two scoreboards together control when instructions move from the first-level buffer to the second-level buffer and regulate the utilization of the second-stage queue.

Claims (7)

1. A two-stage buffering issue device based on the scoreboard principle, comprising a first-stage wait queue and a second-stage issue queue, wherein a speculative scoreboard is arranged between the first-stage wait queue and the second-stage issue queue and regulates when each instruction is sent from the first-stage wait queue to the second-stage issue queue, and a precise scoreboard is arranged at the issue point of the second-stage issue queue and regulates when each instruction is issued from the second-stage issue queue; the unlock timing of the speculative scoreboard after an instruction has been sent from the first-stage wait queue to the second-stage issue queue is determined by the number of cycles the instruction executes in an execution unit; and the unlock timing of the precise scoreboard after the instruction has been issued from the second-stage issue queue to the execution unit is determined by the number of cycles the instruction executes in the execution unit.
2. The two-stage buffering issue device based on the scoreboard principle according to claim 1, wherein, assuming that bit m of the target scoreboard of an instruction is valid and the number of execution cycles is N, bit m of the speculative scoreboard is blocked immediately after the instruction enters the first-stage wait queue from the rename stage, and the instruction unlocks bit m of the speculative scoreboard in cycle N-2 after being issued from the second-stage issue queue to an execution unit.
3. The two-stage buffering issue device based on the scoreboard principle according to claim 2, wherein, for a single-cycle instruction, bit m of the speculative scoreboard is unlocked when the instruction is sent from the first-stage wait queue to the second-stage issue queue; and for a LOAD-class instruction, the number of execution cycles is taken to be that of a hit in the L1 data cache.
4. The two-stage buffering issue device based on the scoreboard principle according to claim 2, wherein, while an instruction is in the first-stage wait queue, if any bit of the speculative scoreboard corresponding to a valid bit of the source scoreboard of the instruction is found to be blocked, the instruction is prohibited from being sent to the second-stage issue queue.
5. The two-stage buffered issue device based on the scoreboard principle according to claim 1, wherein, assuming that bit m of an instruction's destination scoreboard is valid and its execution latency is N cycles, bit m of the accurate scoreboard is blocked as soon as the instruction enters the first-stage wait queue from the rename stage, and the instruction unlocks bit m of the accurate scoreboard in cycle N-1 after being issued from the second-stage issue queue to the execution unit.
6. The two-stage buffered issue device based on the scoreboard principle according to claim 5, wherein, for a single-cycle instruction, bit m of the accurate scoreboard is unlocked when the instruction is issued from the second-stage issue queue to the execution unit; for a LOAD instruction, the execution latency is taken to be the same as for a hit in the level-1 data Cache, namely 4 cycles; after a LOAD instruction is issued, the scoreboard number it sets is recorded; at the LOAD's unlock point, that scoreboard number is decoded into a 64-bit vector and the blocking of bit m of the accurate scoreboard is removed; whether the speculation succeeded is judged from the DCache hit signal: if it succeeded, bit m remains unlocked; if it failed, instruction issue in that cycle is blocked, and bit m of the accurate scoreboard is blocked again and is unlocked only after the blocked instruction completes.
7. The two-stage buffered issue device based on the scoreboard principle according to claim 1, wherein there are 3 first-stage wait queues, namely an integer wait queue, a floating-point wait queue and a memory-access wait queue; there are 9 second-stage issue queues, namely 3 integer issue queues, 2 floating-point issue queues, 2 memory-access issue queues, 1 integer store-data issue queue and 1 floating-point store-data issue queue; instructions in the integer wait queue are sent to the 3 integer issue queues according to their assigned pipelines; instructions in the floating-point wait queue are sent to the 2 floating-point issue queues according to their assigned pipelines; a LOAD instruction in the memory-access wait queue is sent to the 2 memory-access issue queues according to its assigned pipeline, whereas a STORE instruction is both sent to the 2 memory-access issue queues according to its assigned pipeline and sent to the integer store-data issue queue or the floating-point store-data issue queue according to the type of its store data.
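The scoreboard bookkeeping in claims 2 to 6 can be sketched in Python as below. This is a minimal, hypothetical model, not the patented circuit: each scoreboard is represented as a 64-bit integer mask, the helper names (decode, enter_wait_queue, can_send_to_issue_queue and so on) are invented for illustration, and the DCache-miss path assumes the re-blocked bit is later released when the LOAD completes, which the sketch does not model.

```python
from dataclasses import dataclass
from typing import Optional

SCOREBOARD_BITS = 64   # claim 6 decodes a scoreboard number into 64 bits
LOAD_HIT_LATENCY = 4   # claim 6: a LOAD is assumed to hit the L1 DCache in 4 cycles


def decode(number: int) -> int:
    """Decode a scoreboard number m into a one-hot 64-bit mask (bit m set)."""
    if not 0 <= number < SCOREBOARD_BITS:
        raise ValueError(f"scoreboard number {number} out of range")
    return 1 << number


@dataclass
class Scoreboard:
    """Hypothetical 64-bit scoreboard; a set bit means that bit is blocked."""
    blocked: int = 0

    def block(self, mask: int) -> None:
        self.blocked |= mask

    def unlock(self, mask: int) -> None:
        self.blocked &= ~mask

    def any_blocked(self, src_mask: int) -> bool:
        return (self.blocked & src_mask) != 0


def enter_wait_queue(spec: Scoreboard, acc: Scoreboard, dest: int) -> None:
    """Claims 2 and 5: block bit m of both scoreboards when the instruction
    enters the first-stage wait queue from the rename stage."""
    mask = decode(dest)
    spec.block(mask)
    acc.block(mask)


def can_send_to_issue_queue(spec: Scoreboard, src_mask: int) -> bool:
    """Claim 4: keep the instruction in the wait queue while any of its
    source bits is still blocked in the speculative scoreboard."""
    return not spec.any_blocked(src_mask)


def speculative_unlock_cycle(n: int) -> Optional[int]:
    """Claims 2-3: cycle N-2 after issue to the execution unit. None marks a
    single-cycle instruction, whose bit is unlocked earlier, when it moves
    from the wait queue to the issue queue."""
    if n < 1:
        raise ValueError("execution latency must be at least 1 cycle")
    return None if n == 1 else n - 2


def accurate_unlock_cycle(n: int) -> int:
    """Claims 5-6: cycle N-1 after issue; 0 (at issue) for single-cycle ops."""
    if n < 1:
        raise ValueError("execution latency must be at least 1 cycle")
    return n - 1


def load_accurate_unlock(acc: Scoreboard, dest: int, dcache_hit: bool) -> None:
    """Claim 6: unlock the LOAD's bit assuming a DCache hit, then re-block it
    if the hit signal shows the speculation failed."""
    mask = decode(dest)  # scoreboard number recorded at issue, decoded to 64 bits
    acc.unlock(mask)
    if not dcache_hit:
        acc.block(mask)  # assumed: released later, when the LOAD completes
```

For a LOAD with N = 4, this sketch yields a speculative unlock in cycle 2 and an accurate unlock in cycle 3 after issue, so in the model a dependent instruction can move into the issue queue one cycle before it is allowed to issue.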
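The queue topology of claim 7 can likewise be sketched as a routing function. The queue names, the Kind enum and the pipeline index argument are hypothetical; the claim fixes only the number and type of queues and the rule that a STORE goes both to a memory-access issue queue and to a store-data issue queue selected by its data type.

```python
from enum import Enum
from typing import List


class Kind(Enum):
    INT = "int"
    FP = "fp"
    LOAD = "load"
    STORE = "store"


# Hypothetical names for the 3 first-stage wait queues of claim 7.
WAIT_QUEUE = {Kind.INT: "int_wait", Kind.FP: "fp_wait",
              Kind.LOAD: "mem_wait", Kind.STORE: "mem_wait"}

# Hypothetical names for the 9 second-stage issue queues of claim 7.
INT_ISSUE = ["int0", "int1", "int2"]
FP_ISSUE = ["fp0", "fp1"]
MEM_ISSUE = ["mem0", "mem1"]
INT_STORE_DATA = "int_store_data"
FP_STORE_DATA = "fp_store_data"


def route(kind: Kind, pipeline: int, store_data_is_fp: bool = False) -> List[str]:
    """Return the second-stage issue queue(s) an instruction is sent to,
    given its assigned pipeline (an index into that queue group)."""
    if kind is Kind.INT:
        return [INT_ISSUE[pipeline]]
    if kind is Kind.FP:
        return [FP_ISSUE[pipeline]]
    if kind is Kind.LOAD:
        return [MEM_ISSUE[pipeline]]
    # STORE: one memory-access issue queue plus a store-data queue by data type.
    data_queue = FP_STORE_DATA if store_data_is_fp else INT_STORE_DATA
    return [MEM_ISSUE[pipeline], data_queue]
```

In this model a STORE is the only instruction that occupies two second-stage queues at once, which mirrors the dual dispatch described in claim 7.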
CN201910858592.3A 2019-09-11 2019-09-11 Two-stage buffering transmitting device based on scoreboard principle Active CN110647362B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910858592.3A CN110647362B (en) 2019-09-11 2019-09-11 Two-stage buffering transmitting device based on scoreboard principle

Publications (2)

Publication Number Publication Date
CN110647362A CN110647362A (en) 2020-01-03
CN110647362B CN110647362B (en) 2023-03-31

Family

ID=68991755

Country Status (1)

Country Link
CN (1) CN110647362B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH02211534A (en) * 1989-02-10 1990-08-22 Mitsubishi Electric Corp Parallel processor
US5509130A (en) * 1992-04-29 1996-04-16 Sun Microsystems, Inc. Method and apparatus for grouping multiple instructions, issuing grouped instructions simultaneously, and executing grouped instructions in a pipelined processor
CN105528195A (en) * 2015-12-03 2016-04-27 Shanghai High-Performance Integrated Circuit Design Center (上海高性能集成电路设计中心) Flying scoreboard processing method supporting out-of-order issue of simultaneous multithreading instructions
CN105549952A (en) * 2015-12-03 2016-05-04 Shanghai High-Performance Integrated Circuit Design Center (上海高性能集成电路设计中心) Two-stage buffer issue regulation and control device based on scoreboard principle

Non-Patent Citations (1)

Title
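Wang Lei (王磊): "Research on the Tomasulo Algorithm and the Scoreboard Scheduling Algorithm" (Tomasulo算法与记分牌调度算法研究), Techniques of Automation and Applications (《自动化技术与应用》) *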

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN117667223A (en) * 2024-02-01 2024-03-08 Shanghai Denglin Technology Co., Ltd. (上海登临科技有限公司) Data hazard resolution method, computing engine, processor and electronic device
CN117667223B (en) * 2024-02-01 2024-04-12 Shanghai Denglin Technology Co., Ltd. (上海登临科技有限公司) Data hazard resolution method, computing engine, processor and electronic device

Similar Documents

Publication Publication Date Title
US7861066B2 (en) Mechanism for predicting and suppressing instruction replay in a processor
KR102601858B1 (en) Pipelined processor with multi-issue microcode unit having local branch decoder
US7685410B2 (en) Redirect recovery cache that receives branch misprediction redirects and caches instructions to be dispatched in response to the redirects
US8782384B2 (en) Branch history with polymorphic indirect branch information
KR101508566B1 (en) Execute at commit state update instructions, apparatus, methods, and systems
US6094717A (en) Computer processor with a replay system having a plurality of checkers
US7809933B2 (en) System and method for optimizing branch logic for handling hard to predict indirect branches
US6279105B1 (en) Pipelined two-cycle branch target address cache
US7937574B2 (en) Precise counter hardware for microcode loops
US7032097B2 (en) Zero cycle penalty in selecting instructions in prefetch buffer in the event of a miss in the instruction cache
JP3871336B2 (en) Method, completion table and processor for tracking multiple outstanding instructions
US6212626B1 (en) Computer processor having a checker
US20050149689A1 (en) Method and apparatus for rescheduling operations in a processor
US6338133B1 (en) Measured, allocation of speculative branch instructions to processor execution units
JP2004501471A (en) A mechanism for delivering precise exceptions in out-of-order processors using speculative execution
US7725659B2 (en) Alignment of cache fetch return data relative to a thread
US20040216001A1 (en) Mechanism for avoiding check stops in speculative accesses while operating in real mode
US8799628B2 (en) Early branch determination
TWI457827B (en) Distributed dispatch with concurrent, out-of-order dispatch
US11645078B2 (en) Detecting a dynamic control flow re-convergence point for conditional branches in hardware
CN110647362B (en) Two-stage buffering transmitting device based on scoreboard principle
CN105549952A (en) Two-stage buffer issue regulation and control device based on scoreboard principle
US7328327B2 (en) Technique for reducing traffic in an instruction fetch unit of a chip multiprocessor
EP1296228B1 (en) Instruction Issue and retirement in processor having mismatched pipeline depths
CN213482862U (en) Out-of-order processor for scheduling out-of-order queues and determining queue kill

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant