CN111226196B - Scalable dependency matrix with one or more digest bits in an out-of-order processor - Google Patents
Scalable dependency matrix with one or more digest bits in an out-of-order processor
- Publication number
- CN111226196B
- Authority
- CN
- China
- Prior art keywords
- instructions
- instruction
- issue queue
- dependencies
- groups
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3838—Dependency mechanisms, e.g. register scoreboarding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30181—Instruction operation extension or modification
- G06F9/30189—Instruction operation extension or modification according to execution mode, e.g. mode flag
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3851—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Advance Control (AREA)
Abstract
Aspects of the invention include tracking dependencies between instructions in an issue queue. For each instruction in the issue queue, the tracking includes identifying whether the instruction depends on each of a threshold number of instructions added to the issue queue prior to the instruction. The tracking also includes identifying whether the instruction depends on one or more other instructions that were added to the issue queue prior to the instruction and that are not included in the threshold number of instructions. Dependencies between the instruction and each of the other instructions are tracked. Instructions are issued from the issue queue based at least in part on the tracking.
Description
Technical Field
Embodiments of the present invention relate generally to out-of-order (OoO) processors and, more particularly, relate to scalable dependency matrices having one or more digest bits in an issue queue of an out-of-order processor.
Background
In an out-of-order processor, an instruction sequencing unit (ISU) dispatches instructions to various issue queues, renames registers to support out-of-order execution, issues instructions from the various issue queues to the execution pipelines, completes executed instructions, and handles exception conditions. Register renaming is typically performed by mapper logic in the ISU before the instructions are placed into their respective issue queues.
The ISU includes one or more issue queues that contain dependency matrices for tracking dependencies between instructions. A dependency matrix typically includes one row and one column for each instruction in the issue queue. As the number of instructions in the issue queues increases, the space occupied and the power consumed by each dependency matrix also increase.
Disclosure of Invention
Embodiments of the present invention include methods, systems, and computer program products for implementing a scalable dependency matrix with one or more digest bits in an issue queue of an out-of-order (OoO) processor. Non-limiting example methods include tracking dependencies between instructions in an issue queue. For each instruction in the issue queue, tracking includes identifying whether the instruction depends on each of a threshold number of instructions added to the issue queue prior to the instruction.
In one embodiment having a plurality of digest bits, the tracking further includes identifying whether the instruction depends on one or more other instructions that were added to the issue queue prior to the instruction and that are not included in the threshold number of instructions. Dependencies between the instruction and each of the other instructions are tracked in groups, with the tracking indicating that a dependency exists between the instruction and a group based on identifying a dependency between the instruction and at least one of the instructions in the group. Each of the other instructions is assigned to at least one group. Instructions are issued from the issue queue based at least in part on the tracking.
In another embodiment having a single digest bit, dependencies between the instruction and each of the threshold number of instructions are tracked individually. For each instruction in the issue queue, the tracking further includes identifying whether the instruction depends on one or more other instructions that were added to the issue queue prior to the instruction and that are not included in the threshold number of instructions. Dependencies between the instruction and the other instructions are tracked as a single group, by indicating that a dependency exists between the instruction and the single group based on identifying a dependency between the instruction and at least one of the other instructions. The single group includes all instructions in the issue queue that are not included in the individually tracked threshold number of instructions. Instructions are issued from the issue queue based at least in part on the tracking.
Additional features and advantages are realized through the techniques of the present invention. Other embodiments and aspects of the invention are described in detail herein and are considered a part of the claimed invention. For a better understanding of the invention with advantages and features, refer to the description and to the drawings.
Drawings
The specifics of the exclusive rights described herein are particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other features and advantages of embodiments of the invention will be apparent from the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 depicts a block diagram of a system that includes an instruction sequencing unit (ISU) of an out-of-order processor for implementing a scalable dependency matrix with digest bits in an issue queue in accordance with one or more embodiments of the present invention;
FIG. 2 depicts a block diagram of an issue queue in an ISU of an out-of-order processor in accordance with one or more embodiments of the present invention;
FIG. 3 depicts a block diagram of an issue queue in an ISU of an out-of-order processor for implementing a scalable dependency matrix with digest bits in accordance with one or more embodiments of the invention;
FIG. 4 depicts a block diagram of a logical view of a scalable dependency matrix in accordance with one or more embodiments of the present invention;
FIG. 5 depicts a block diagram of a trapezoidal dependency matrix in accordance with one or more embodiments of the present invention;
FIG. 6 depicts a block diagram of a vertically compressed trapezoidal dependency matrix in accordance with one or more embodiments of the invention;
FIG. 7 depicts a block diagram of a horizontally compressed trapezoidal dependency matrix in accordance with one or more embodiments of the present invention;
FIG. 8 depicts a block diagram of a scalable dependency matrix that may be implemented in accordance with one or more embodiments of the invention;
FIG. 9 depicts a block diagram of a dependency matrix operating in a single-threaded (ST) mode and a simultaneous multi-threaded (SMT) mode in accordance with one or more embodiments of the present invention;
FIG. 10 depicts a block diagram of a dependency matrix operating in ST mode and SMT mode in accordance with one or more embodiments of the present invention; and
FIG. 11 is a block diagram of a computer system for implementing some or all aspects of a scalable dependency matrix with digest bits in an out-of-order processor in accordance with one or more embodiments of the invention.
The figures depicted herein are illustrative. There may be many variations to the diagrams or the operations described therein without departing from the spirit of the invention. For example, acts may be performed in a different order, or acts may be added, deleted, or modified. Also, the term "couple" and its variants describe having a communication path between two elements and do not imply a direct connection between the elements with no intervening elements or connections between them. All of these variations are considered a part of the specification.
In the drawings and the following detailed description of the described embodiments, each element shown in the drawings is provided with a two- or three-digit reference number. With a few exceptions, the leftmost digit of each reference number corresponds to the figure in which the element is first illustrated.
Detailed Description
As described above, the number of instructions stored in contemporary issue queues continues to increase. Contemporary issue queues include dependency matrices that grow with the square of the number of instructions, and the ever-increasing size of the dependency matrices may eventually affect timing. One or more embodiments of the invention described herein provide a reduced-size dependency matrix in an issue queue of an instruction sequencing unit (ISU) of an out-of-order processor. The reduction in size is based on the recognition that dependencies typically exist between instructions that are added to the issue queue in close temporal proximity to each other. According to one or more embodiments of the invention, the dependencies of an instruction on instructions that were added to the issue queue within a close time range of the instruction are tracked individually. In accordance with one or more embodiments of the invention, a digest bit is used to track, as a single group, the dependencies of the instruction on instructions that were added to the issue queue outside of that close time range.
In accordance with one or more embodiments of the invention, for a processor operating in single-threaded (ST) mode and having an issue queue that holds 64 instructions (N=64), the dependency of an instruction on the previous 32 instructions (N/2) in the issue queue may be tracked specifically (e.g., by identifying the particular instructions). A digest bit may be used to indicate that the instruction depends on any instruction older than the previous N/2 instructions. An entry (instruction) in the dependency matrix of the issue queue with the digest bit set must wait until the oldest N/2 entries have been cleared before the instruction is indicated as ready to issue. The combination of precise tracking (of the previous N/2 instructions) and imprecise tracking (a digest bit for instructions older than the previous N/2), together with the implementation of the issue queue as a first-in-first-out (FIFO) queue, allows the dependency matrix to be scalable and to be logically implemented as a trapezoid that can be physically stored in a manner that saves storage space. In addition, implementing the issue queue as a FIFO queue may eliminate the need to include an age array in the issue queue, thereby saving additional storage space. According to one or more embodiments of the invention, the digest bit is not used when the processor is executing in simultaneous multithreading (SMT) mode, and the amount of precise tracking is the same as in contemporary implementations that track all N/2-1 previous instructions belonging to a given thread.
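To make the storage argument concrete, the following minimal Python sketch compares the number of dependency bits in a conventional N-by-N matrix with a layout that keeps, per entry, one bit for each specifically tracked predecessor plus the digest bits. The function names are hypothetical, and the count is a simple per-row tally that ignores the physical layout and any other per-entry state; it is an illustration under those assumptions, not the circuit described in the figures.

```python
def full_matrix_bits(n: int) -> int:
    """Bits in a conventional dependency matrix: one row and one column per entry."""
    return n * n


def scalable_matrix_bits(n: int, threshold: int, digest_bits: int = 1) -> int:
    """Bits when each row holds `threshold` precise bits plus `digest_bits` digest bits.

    Assumption: every row stores the same number of bits, regardless of layout.
    """
    return n * (threshold + digest_bits)


N = 64
print(full_matrix_bits(N))              # 64 * 64 = 4096
print(scalable_matrix_bits(N, N // 2))  # 64 * (32 + 1) = 2112
```

Because the per-row cost depends on the threshold rather than on N, the total grows linearly with the queue size once the threshold is fixed.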
In accordance with one or more embodiments of the invention, groups of instructions that are outside the close time range of instructions added to the issue queue (i.e., those instructions that are not specifically tracked) are broken down into one or more subgroups, each subgroup being tracked using its own digest bit. In these embodiments, multiple digest bits are utilized to track the dependency of an instruction on an instruction in an instruction group.
According to one or more embodiments of the present invention, the number of previous instructions specifically tracked is selectable/programmable and is not limited to N/2 previous instructions.
Turning now to FIG. 1, a block diagram of a system 100 that includes an instruction sequencing unit (ISU) of an out-of-order processor for implementing a scalable dependency matrix with digest bits in an issue queue is generally shown in accordance with one or more embodiments of the present invention. The system 100 shown in FIG. 1 includes an instruction fetch unit/instruction decode unit (IFU/IDU) 106 that fetches and decodes instructions for input to a setup block 108, which prepares the decoded instructions for input to a mapper 110 of the ISU. In accordance with one or more embodiments, IFU/IDU 106 may fetch and decode six instructions from a thread at a time. In accordance with one or more embodiments of the present invention, the six instructions sent to setup block 108 may include six non-branch instructions, five non-branch instructions and one branch instruction, or four non-branch instructions and two branch instructions. In accordance with one or more embodiments, setup block 108 checks whether sufficient resources, such as entries in the issue queues, completion tables, mappers, and register files, are available before sending the fetched instructions to those blocks in the ISU.
The mappers 110 shown in FIG. 1 map programmer instructions (e.g., logical register names) to physical resources (e.g., physical register addresses) of the processor. The various mappers 110 shown in FIG. 1 include a condition register (CR) mapper, a link/count (LNK/CNT) register mapper, an integer exception register (XER) mapper, a unified mapper (UMapper) for mapping general purpose registers (GPRs) and vector-scalar registers (VSRs), an architected mapper (ARCH mapper) for mapping GPRs and VSRs, and a floating point status and control register (FPSCR) mapper.
The output from the setup block 108 is also input to a global completion table (GCT) 112 for tracking all instructions currently in the ISU. The output of the setup block 108 is also input to a dispatch unit 114 for dispatching instructions to an issue queue. The embodiment of the ISU shown in FIG. 1 includes a CR issue queue, CR ISQ 116, which receives and tracks instructions from the CR mapper and issues 120 them to an instruction fetch unit (IFU) 124 to execute CR logical instructions and move instructions. Also shown in FIG. 1 is a branch issue queue, branch ISQ 118, which receives and tracks branch instructions and LNK/CNT physical addresses from the LNK/CNT mapper. If the predicted branch address and/or direction is incorrect, branch ISQ 118 may issue 122 an instruction to IFU 124 to redirect instruction fetching.
Instructions output from the dispatch logic, along with renamed registers from the LNK/CNT mapper, XER mapper, UMapper (GPR/VSR), ARCH mapper (GPR/VSR), and FPSCR mapper, are input to issue queue 102. In FIG. 1, issue queue 102 tracks dispatched fixed point instructions (Fx), load instructions (L), store instructions (S), and vector-and-scalar unit (VSU) instructions. As shown in the embodiment of FIG. 1, issue queue 102 is divided into two parts, ISQ0 1020 and ISQ1 1021, each holding N/2 instructions. When the processor is executing in ST mode, issue queue 102 may be used as a single logical issue queue containing both ISQ0 1020 and ISQ1 1021 to process all instructions (in this example, all N instructions) of a single thread.
When the processor is executing in SMT (simultaneous multithreading) mode, ISQ0 1020 may be used to process N/2 instructions from a first thread and ISQ1 1021 may be used to process N/2 instructions from a second thread.
As shown in FIG. 1, issue queue 102 issues instructions to execution units 104, which are divided into two groups of execution units 1040 and 1041. The two groups of execution units 1040 and 1041 shown in FIG. 1 include complete fixed point execution units (complete FX0, complete FX1), load execution units (LU0, LU1), simple fixed point, store data, and store address execution units (simple FX0/STD0/STA0, simple FX1/STD1/STA1), and floating point, vector multimedia extension, decimal floating point, and store data execution units (FP/VMX/DFP/STD0, FP/VMX/DFP/STD1). As shown in FIG. 1, when the processor is executing in ST mode, the first group of execution units 1040 executes instructions issued by ISQ0 1020 and the second group of execution units 1041 executes instructions issued by ISQ1 1021. In an alternative embodiment, when the processor is executing in ST mode, instructions issued by both ISQ0 1020 and ISQ1 1021 in issue queue 102 may be issued to any execution unit in either the first group of execution units 1040 or the second group of execution units 1041.
According to one or more embodiments, when the processor is executing in SMT mode, the first set of execution units 1040 execute instructions of a first thread issued by ISQ0 1020 and the second set of execution units 1041 execute instructions of a second thread issued by ISQ1 1021.
The number of entries in issue queue 102 and the size of other elements (e.g., bus width, queue size) shown in FIG. 1 are exemplary in nature, as embodiments of the invention may be implemented in a variety of different sized issue queues and other elements. According to one or more embodiments of the invention, the size is selectable or programmable.
Turning now to FIG. 2, a block diagram of an issue queue 200 in accordance with one or more embodiments of the present invention is generally shown. Issue queue 200 shown in FIG. 2 includes matrices, tables, and vectors for tracking instructions that are waiting to issue. The matrices and tables each include a corresponding row for each instruction being tracked, and the vectors include an entry for each instruction being tracked. As the number of instructions in the issue queue increases, the space occupied and the power consumed by each matrix, table, and vector increase. By using digest bits to track dependencies on older instructions, embodiments of the invention described herein may be used to reduce the size of the dependency matrix 202 in the issue queue 200. In addition, by storing instructions in issue queue 200 in the order in which they are received, the age array 216 may be eliminated from issue queue 200.
Issue queue 200 tracks instructions that are waiting for execution by an execution unit. Instructions are dispatched and allocated to issue queue 200 (e.g., CR ISQ 116, branch ISQ 118, issue queue 102). An instruction is ready to issue from issue queue 200 when its dependencies are satisfied, that is, when the instructions on which it depends have issued and their corresponding results are available. Issue queue 200 issues the instruction to an execution unit (e.g., execution units 104). After issuing the instruction, issue queue 200 continues to track the instruction at least until the instruction passes its reject point. The reject point differs between instructions and refers to the point beyond which the instruction no longer needs to be reissued (e.g., in a read memory operation, the reject point may be passed once the cache has been accessed for the read data). Once the instruction passes the reject point, it can be released from the issue queue and its entry in the issue queue is cleared for reuse by a new instruction. The instruction finishes once its execution by the execution unit has completed.
The issue queue 200 shown in FIG. 2 includes: a dependency matrix 202 for tracking dependencies between the instructions in issue queue 200; a completion table 204 for indicating that execution of an instruction has passed the reject point and that the instruction may be released from issue queue 200; an instruction dispatch unit 206 (e.g., dispatch unit 114 of FIG. 1) for receiving instructions to be added to the issue queue; a result available vector 208 for indicating that all instructions on which an instruction depends have been issued; an IV vector 214 for indicating which instructions are valid and issuable; AND logic 210 for logically ANDing the output of the dependency matrix with the IV vector; a ready vector 212 for indicating that the results from all instructions on which an instruction depends are available and that the instruction is valid and issuable; an age array 216 for tracking the order in which instructions entered the issue queue so that, when two or more instructions are ready to execute, an older instruction may be selected before a newer instruction; a reset IV control 218 for updating the IV state to prevent reissue of a selected instruction or to allow reissue after a rejection; an address 220, which serves as a read index corresponding to the instruction selected for issue; and a data array 222 containing the instruction text (e.g., opcodes, pointers to register file addresses, immediate data) used by the execution unit to execute the instruction.
As shown in the dependency matrix 202 of FIG. 2, N instructions waiting in the issue queue may be tracked, with the instruction at location "u" depending on the instructions at locations "v" and "w". The dependency matrix 202 shown in FIG. 2 has N rows and N columns, one row and one column for each instruction in the issue queue. As shown in age array 216 of FIG. 2, the instructions at locations "j", "k", and "l" are older than the instruction at location "i".
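The relationship between the dependency matrix, the result available vector, the IV vector, and the ready vector can be sketched in a few lines of Python. This is a simplified software model under stated assumptions, not the hardware of FIG. 2: the matrix is a list of boolean rows in which a set bit means "still waiting on that entry", and the function name `ready_vector` is hypothetical.

```python
def ready_vector(dep_matrix, iv):
    """Return per-entry readiness.

    dep_matrix[i][j] is True while entry i still waits on entry j.
    iv[i] is True when entry i is valid and issuable.
    """
    # An entry's results are available once no dependency bit in its row is set.
    results_available = [not any(row) for row in dep_matrix]
    # Model of the AND logic: results available AND valid/issuable.
    return [ra and valid for ra, valid in zip(results_available, iv)]


# Four entries: entry 2 waits on entry 0; entry 3 waits on entries 1 and 2.
deps = [
    [False, False, False, False],
    [False, False, False, False],
    [True,  False, False, False],
    [False, True,  True,  False],
]
iv = [True, True, True, True]
print(ready_vector(deps, iv))  # entries 0 and 1 are ready

# When entry 0's result becomes available, its column is cleared in every row.
for row in deps:
    row[0] = False
print(ready_vector(deps, iv))  # entry 2 is now ready as well
```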
Turning now to FIG. 3, a block diagram of an issue queue 300 in an ISU of an out-of-order processor for implementing a scalable dependency matrix with a digest bit section 360 is generally shown in accordance with one or more embodiments of the invention. The issue queue 300 shown in FIG. 3 is similar to the issue queue 200 described above with reference to FIG. 2, except that instructions are inserted into the matrices, tables, and vectors in FIFO order. Because the instructions are inserted in the order in which they are received, the logic and circuitry associated with the age array 216 shown in FIG. 2 are not required. In place of the age array 216 of FIG. 2, the issue queue 300 shown in FIG. 3 includes priority selection logic 324 for selecting between two or more ready instructions based on their relative positions in issue queue 200. Thus, the dependency matrix 302, completion table 304, AND vector 310, ready vector 312, address 320, and data array 322 contain entries that correspond to the order in which instructions are received from the instruction dispatch unit 206. In addition, a digest bit section 360 is provided for each entry in the dependency matrix 302 to track dependencies on older instructions. The digest bit section 360 may include one or more digest bits.
Turning now to FIG. 4, a block diagram of a logical view of a scalable dependency matrix 400 in accordance with one or more embodiments of the present invention is generally shown. The scalable dependency matrix 400 shown in FIG. 4 represents an issue queue that holds N instructions and specifically tracks the dependencies of an instruction on four other instructions (the threshold in this example). The entries in the scalable dependency matrix 400 are inserted in FIFO order; thus, for example, the instruction in row 5 was received into the issue queue immediately before the instruction in row 6.
In accordance with one or more embodiments of the invention, the scalable dependency matrix 400 may wrap around in a circular fashion; thus, for example, the instruction in row N-1 was received into the issue queue immediately before the instruction in row 0. When a wrap-around circular queue is used, head and tail pointers may be used to track the oldest and newest entries, and digest bits are cleared as the head pointer (oldest entry) advances. As a result, the digest bits of all entries that are currently closer to the head than the threshold (e.g., 4) are cleared, because the head pointer advances only when an entry is released, which requires that the corresponding instruction has issued and passed its reject point.
In one or more exemplary embodiments, instead of using a wrap-around matrix, instructions are shifted within the queue toward the top row whenever space for an upward shift becomes available as queue positions are released. The contents of the matrix may be shifted by a corresponding amount. In this case, only the open (unshaded) boxes in FIG. 4 are required, and the matrix takes on a trapezoidal shape.
As shown in FIG. 4, dependencies between an instruction and the four instructions that were inserted into the issue queue immediately prior to the instruction are tracked individually in a first portion (the specific portion 404) of the dependency matrix 400, while the digest bit section 402 of the dependency matrix 400 is used to track dependencies between the instruction and any other instructions in the issue queue.
The specific portion 404 of the dependency matrix 400 shown in FIG. 4 may be used, for example, to determine whether the instruction in row 6 depends on the instruction in row 5, whether it depends on the instruction in row 4, whether it depends on the instruction in row 3, and whether it depends on the instruction in row 2. If the instruction in row 6 depends on the instruction in row 3 and not on the instructions in rows 5, 4, and 2 (and the digest bit is not set), then the instruction in row 6 is ready as soon as the instruction in row 3 has issued. The digest bit section 402 of the dependency matrix 400 may be used to determine whether the instruction in row 6 depends on any of the instructions in rows 0 through 1 and 7 through N-1. A digest bit may represent a dependency of row 6 on any of rows 0 through 1 and 7 through N-1, so that if the digest bit indicates a dependency, the instruction in row 6 will not issue until all of the instructions in rows 0 through 1 and 7 through N-1 have issued. As described herein, a digest bit or a specific bit is cleared when the corresponding instruction is released.
In accordance with one or more embodiments of the present invention, the bit in the digest bit section 402, or other latch, for the instruction in row 6 is set when that instruction depends on any of the instructions in rows 0 through 1 and 7 through N-1, and is reset when all of the instructions in rows 0 through 1 and 7 through N-1 that were present in the issue queue when instruction 6 was dispatched have issued, passed their respective reject points, and been released. Similarly, the corresponding bit or other latch in the specific portion 404 is set to indicate a dependency of the instruction in row 6 on a particular previous instruction, and is reset when that instruction has issued and passed its reject point and its entry has been released. Once all dependencies of the instruction in row 6 are satisfied (e.g., the instructions on which it depends have issued), the instruction in row 6 may issue as soon as all of the resources it requires are available.
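The split between individually tracked rows and rows covered by the digest bit can be sketched as follows. This is a minimal Python illustration that assumes the wrap-around (circular) row numbering described for FIG. 4; the function name `tracked_rows` and the choice of N=16 are hypothetical and used only for the example.

```python
def tracked_rows(row: int, n: int, threshold: int):
    """Split the other rows of an n-entry circular queue for the entry at `row`.

    Returns (precise, digest): rows tracked individually, and rows covered
    by a single digest bit. Assumes wrap-around row numbering.
    """
    # The `threshold` rows inserted immediately before `row`, newest first.
    precise = [(row - k) % n for k in range(1, threshold + 1)]
    # Every remaining row except `row` itself falls under the digest bit.
    digest = [r for r in range(n) if r != row and r not in precise]
    return precise, digest


precise, digest = tracked_rows(row=6, n=16, threshold=4)
print(precise)  # [5, 4, 3, 2]
print(digest)   # [0, 1, 7, 8, 9, 10, 11, 12, 13, 14, 15]
```

For the entry in row 6 with a threshold of 4, the result matches the example in the text: rows 5, 4, 3, and 2 are tracked individually, and rows 0 through 1 and 7 through N-1 are covered by the digest bit.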
The digest bit section 402 shown in fig. 4 may include a single digest bit for tracking dependencies between an instruction and any one of more than a threshold number of instructions immediately preceding the instruction. In the embodiment of the invention shown in FIG. 4, where the threshold is 4, a single digest bit is used to track instructions in the issue queue other than the four instructions that were inserted into the issue queue immediately prior to the instruction as a single group. A single group includes all instructions in the issue queue except for four instructions (threshold 4 in this example) that are inserted into the issue queue immediately before the instruction. Thus, if an instruction depends on one of the instructions in the single group, it must wait until all instructions in the single group have issued (and passed their reject points) before the dependency can be satisfied. Then, if all other dependencies are satisfied, the instruction may issue. The single digest bit is reset only if all instructions in the single group have been issued from the issue queue (and passed their reject points).
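The imprecision introduced by a single digest bit can be illustrated with a small Python model of one issue queue entry. The class `QueueEntry` and its fields are hypothetical; in particular, the model stores the membership of the single group explicitly so that it can tell when the group has drained, which is a software convenience and not a claim about how the hardware detects this.

```python
from dataclasses import dataclass, field


@dataclass
class QueueEntry:
    precise_deps: set = field(default_factory=set)  # rows tracked individually
    group: set = field(default_factory=set)         # rows in the single group at dispatch
    digest: bool = False                            # set if any group member is a producer

    def release(self, row: int) -> None:
        """Row `row` has issued, passed its reject point, and been released."""
        self.precise_deps.discard(row)
        self.group.discard(row)
        if not self.group:
            # The digest bit resets only once the whole group has been released.
            self.digest = False

    def ready(self) -> bool:
        return not self.precise_deps and not self.digest


# Entry in row 6 of an 8-entry queue with threshold 4. It truly depends on
# row 3 (tracked individually) and row 0 (covered by the digest bit).
entry = QueueEntry(precise_deps={3}, group={0, 1, 7}, digest=True)
entry.release(3)
entry.release(0)
print(entry.ready())  # False: rows 1 and 7 are still in the single group
entry.release(1)
entry.release(7)
print(entry.ready())  # True: the whole group has been released
```

The entry remains blocked after its real producer in row 0 is released, which is the cost of tracking the older instructions as one group rather than individually.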
In accordance with one or more embodiments of the invention, the digest bit section 402 shown in FIG. 4 includes a plurality of digest bits for tracking dependencies between an instruction and any instruction other than the threshold number of instructions that were inserted into the issue queue immediately prior to the instruction. In an embodiment of the present invention, the plurality of digest bits track those other instructions as a plurality of groups. Each instruction in the issue queue other than the threshold number of instructions inserted immediately prior to the instruction is included in at least one group. The contents of the groups may be mutually exclusive or may overlap. Using multiple digest bits for multiple groups may provide finer granularity than using a single digest bit for all instructions that are not specifically tracked.
The following simplified example is provided for ease of description, where N=16, the threshold is N/2=8, and an instruction is inserted at row 15 of an issue queue that includes rows 0 to 15. The dependencies of the instruction in row 15 on the eight instructions inserted into the issue queue immediately prior to it (i.e., the instructions in rows 14, 13, 12, 11, 10, 9, 8, and 7) are specifically tracked. Thus, it is possible to determine exactly which, if any, of the instructions in rows 14, 13, 12, 11, 10, 9, 8, and 7 the instruction in row 15 depends on. A plurality of digest bits representing dependencies on groups of the instructions in rows 0, 1, 2, 3, 4, 5, and 6 may be implemented by exemplary embodiments of the invention. For example, a first group may include the instructions in rows 3, 4, 5, and 6 of the issue queue, and a second group may include the instructions in rows 0, 1, and 2 of the issue queue. Dependencies of the instruction in row 15 on the first group of instructions may be tracked by a first digest bit, while dependencies on the second group of instructions may be tracked by a second digest bit. Alternatively, the first group may include the instructions in rows 3, 4, and 5 of the issue queue, the second group may include the instructions in rows 6 and 2, and a third group may include the instructions in rows 0 and 1. The first digest bit may then be used to track dependencies of the instruction in row 15 on the first group of instructions, the second digest bit to track dependencies on the second group, and the third digest bit to track dependencies on the third group.
In another alternative, the groups may overlap, for example, a first group may include instructions in row 0, row 1, row 2, row 3, row 4, row 5, and row 6 in the issue queue and a second group may include instructions in row 0, row 1, and row 2 in the issue queue.
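The N=16 example can be sketched in Python as follows. This is an illustrative model only: it covers the two mutually exclusive groupings from the example (not the overlapping alternative), and the names `GROUPS_A`, `GROUPS_B`, `specific_rows`, `digest_bits`, and `digest_bits_after` are hypothetical.

```python
N = 16
THRESHOLD = N // 2  # rows 7..14 are tracked individually for the instruction in row 15

# Two mutually exclusive groupings of rows 0..6 taken from the example.
GROUPS_A = [{3, 4, 5, 6}, {0, 1, 2}]
GROUPS_B = [{3, 4, 5}, {6, 2}, {0, 1}]


def specific_rows(row, producers, threshold=THRESHOLD):
    """Producer rows that fall inside the individually tracked window."""
    return {p for p in producers if row - threshold <= p < row}


def digest_bits(producers, groups):
    """One digest bit per group: set if any producer row belongs to that group."""
    return [bool(producers & group) for group in groups]


def digest_bits_after(bits, groups, done):
    """Reset each set digest bit once every row of its group has issued (is in `done`)."""
    return [bit and not (group <= done) for bit, group in zip(bits, groups)]
```

With producers in rows 12 and 5, row 12 is tracked specifically and row 5 sets the first digest bit in either grouping. Once rows 3, 4, and 5 have issued, the three-group variant clears that bit, whereas the two-group variant still waits for row 6, which illustrates the finer granularity of more groups.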
The foregoing is merely exemplary, in that any number of groups of instructions and corresponding digest bits may be implemented by exemplary embodiments of the present invention. In general, the larger the number of groups and digest bits, the higher the cost in terms of bits tracked and the lower the cost in terms of throughput. Exemplary embodiments of the present invention may be customized to adjust granularity based on implementation requirements. The two extremes are a single digest bit, which provides the coarsest granularity, and a digest bit for each instruction, which provides the finest granularity.
Turning now to FIG. 5, a block diagram of a trapezoidal dependency matrix 500 in accordance with one or more embodiments of the present invention is generally shown. As shown in FIG. 5, the scalable dependency matrix may result in open space in the dependency matrix, such as in the specific portion 404 of the dependency matrix 400 shown in FIG. 4. The specific portions 508A, 508B, 508C, and 508D occupy non-contiguous portions of the trapezoidal dependency matrix 500 shown in FIG. 5. Also shown in FIG. 5 is a ready vector with an entry corresponding to each instruction, indicating whether all dependencies of the instruction have been satisfied (e.g., all specifically tracked instructions have issued). The ready vector shown in FIG. 5 is split into a ready state vector 506A and a ready state vector 506B. FIG. 5 also shows an available vector with an entry corresponding to each instruction, indicating whether the results of all previously issued instructions on which the instruction depends are available to the instruction. The available vector shown in FIG. 5 is split into available vector 502A and available vector 502B. Other columns of bits, not shown in FIG. 5, hold the digest bits; if a digest bit is set, other logic may prevent the ready vector from indicating that the instruction is ready.
Turning now to FIG. 6, a block diagram of a vertically compressed trapezoidal dependency matrix 600 in accordance with one or more embodiments of the invention is generally shown. As shown in FIG. 6, the trapezoidal dependency matrix 500 of FIG. 5 is merged vertically to remove the open space between the specific portions 508A, 508B, 508C, and 508D. In accordance with one or more embodiments of the present invention, the vertically compressed trapezoidal dependency matrix 600 is one way to physically store the scalable dependency matrix so as to reduce space requirements. The availability information contained in 502A is used to determine whether the dependencies indicated by the specific bits in 508A are satisfied and to update ready state 506A1 accordingly. The availability information contained in 502A is likewise used to determine whether the dependencies indicated by the specific bits in 508C are satisfied and to update ready state 506B1 accordingly. Similarly, the availability information in 502B is used to determine whether the dependencies indicated by the specific bits in 508B and 508D are satisfied, so that ready states 506A2 and 506B2, respectively, may be updated. Corresponding entries of both components of a ready state (506A1 and 506A2, or 506B1 and 506B2) must each indicate that their specific dependencies have been satisfied in order to indicate that all dependencies have been satisfied.
Turning now to FIG. 7, a block diagram of a horizontally compressed trapezoidal dependency matrix 700 in accordance with one or more embodiments of the invention is generally shown. As shown in FIG. 7, the trapezoidal dependency matrix 500 of FIG. 5 is merged horizontally to remove the open space between the specific portions 508A, 508B, 508C, and 508D. In accordance with one or more embodiments of the invention, the horizontally compressed trapezoidal dependency matrix 700 is one way to physically store the scalable dependency matrix so as to reduce space requirements. The availability information contained in 502A1 is used to determine whether the dependencies indicated by the specific bits in 508D are satisfied, the availability information contained in 502B1 is used to determine whether the dependencies indicated by the specific bits in 508C are satisfied, and the merged result is used to update the ready state vector 506A accordingly. The availability information contained in 502A2 is used to determine whether the dependencies indicated by the specific bits in 508B are satisfied, the availability information contained in 502B2 is used to determine whether the dependencies indicated by the specific bits in 508A are satisfied, and the merged result is used to update the ready state vector 506B accordingly.
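The way the two ready-state components of the vertically compressed layout combine can be sketched as below. The sketch assumes each block of specific bits is a list of rows of booleans and each available vector is a list of booleans; the helper names `partial_ready` and `combine_ready` are hypothetical and do not come from the figures.

```python
def partial_ready(dep_block, available):
    """Partial ready vector for one block of specific bits (e.g., 508A with 502A).

    dep_block[i][j] is True if instruction i depends on producer j of this block;
    available[j] is True once the result of producer j is available.
    """
    return [
        all(avail or not dep for dep, avail in zip(row, available))
        for row in dep_block
    ]


def combine_ready(part1, part2):
    """Entry-wise AND of two partial ready vectors (e.g., 506A1 and 506A2)."""
    if len(part1) != len(part2):
        raise ValueError("partial ready vectors must have the same length")
    return [a and b for a, b in zip(part1, part2)]
```

An instruction is reported ready only if both of its partial ready entries are set, which mirrors the requirement that both components of the ready state indicate satisfied dependencies.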
Either the vertically compressed trapezoidal dependency matrix 600 shown in FIG. 6 or the horizontally compressed trapezoidal dependency matrix 700 shown in FIG. 7 may be selected for implementation based on, for example, the expected type of system workflow.
Turning now to FIG. 8, a block diagram of a dependency matrix 800 in accordance with one or more embodiments of the present invention is generally shown. As shown in FIG. 8, when the processor is executing in SMT mode, a first thread uses portion 8020 of the logical dependency matrix 802 and a second thread uses portion 8021 of the logical dependency matrix 802. The logical dependency matrix 802 may be physically stored as a physical dependency matrix 804, including a matrix 8040 for storing the contents of the portion 8020, and a matrix 8041 for storing the contents of the portion 8021.
As described herein, when an out-of-order processor executes in ST mode, the entire dependency matrix in the issue queue is used to track dependencies between instructions. By contrast, typically only half of the dependency matrix is used when an out-of-order processor executes in SMT mode. According to embodiments of the invention, the dependency matrix in the issue queue is reorganized such that it is logically trapezoidal. In ST mode, an entry can directly track dependencies on the previous N/2 entries; beyond those, it can track dependencies only collectively, through digest bits. Entries (instructions) with a digest bit set must wait until (at most) the oldest N/2 entries have been cleared. The same concept can be used to specifically track fewer than N/2 entries. Exemplary embodiments of the invention may also be used when the processor is executing in SMT mode to track full (i.e., specific) dependencies for the N/2 entries in each half of the issue queue. When the processor is executing in ST mode, N entries can be tracked, but with limited visibility into the older half of the entries. Such an embodiment is illustrated in FIG. 9, which generally shows a block diagram of a dependency matrix 900 operating in ST mode and SMT mode in accordance with one or more embodiments of the present invention.
Turning now to FIG. 9, the dependency matrix 900 includes a first half matrix 903 and a second half matrix 904, as previously described, and precisely tracks dependencies on the N/2 previous instructions for each of the N instructions. As shown in the embodiment of FIG. 9, source availability information is maintained in an available bit vector 901A, which is set after the instructions in rows 0 to N/2-1 issue, and in an available bit vector 901B, which is set after the instructions in rows N/2 to N-1 issue. Availability information is transferred to the dependency matrices 903 and 904 through N lines on each matrix. An instruction corresponding to a row of the first half matrix 903 that has a dependency bit set in the lower triangle area 903A receives availability information directly from the bit vector 901A. An instruction that has a dependency bit set in the upper triangle area 903B receives availability information through multiplexer 902. The multiplexer 902 may select between the bit vectors 901A and 901B, and in ST mode the multiplexer 902 routes the availability information from the bit vector 901B to the area 903B.
Similarly, an instruction corresponding to a row of the second half matrix 904 that has a dependency bit set in the lower triangle area 904A receives availability information from the available bit vector 901B. An instruction that has a dependency bit set in the upper triangle area 904B receives availability information from a second multiplexer within the multiplexer 902. Multiplexer 902 may select between bit vectors 901A and 901B, and in ST mode it routes information from available bit vector 901A to area 904B. Through multiplexer 902, as described, each row of the dependency matrix receives information corresponding to the availability of results produced by the instructions in the previous N/2 rows. Since instructions are added to the issue queue in FIFO order, these are the N/2 instructions most recently added to the queue before it. The right side of FIG. 9 illustrates the operation of an exemplary embodiment of the scalable dependency matrix in SMT mode. In this mode, the multiplexer 902 transfers the availability information contained in the available bit vectors 901A and 901B to the upper triangle areas 903B and 904B, respectively. FIG. 9 also shows state vectors 905A and 905B.
Thus, in SMT mode, an instruction of a given thread whose dependency information is contained in a row of the dependency matrix receives availability information for the results produced by the instructions in the same half of the issue queue, which holds all instructions of the same thread. The exemplary embodiment of the present invention shown in FIG. 9 therefore provides precise dependency tracking across all instructions of a given thread in SMT mode and precise dependency tracking across the previous N/2 instructions in ST mode, while earlier dependencies are tracked imprecisely by the digest bits.
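The routing performed by multiplexer 902 in the two modes can be summarized in a short Python sketch. The function name and the use of reference numerals as string keys are illustrative assumptions; the routing itself follows the description of FIG. 9 (the lower triangle areas 903A and 904A always receive 901A and 901B, respectively, and are therefore not modeled).

```python
def upper_triangle_source(half, mode):
    """Return the available bit vector ('901A' or '901B') that multiplexer 902
    routes to the upper triangle area of half matrix '903' or '904'."""
    own = {"903": "901A", "904": "901B"}    # vector of the same half of the queue
    other = {"903": "901B", "904": "901A"}  # vector of the opposite half
    if half not in own:
        raise ValueError(f"unknown half matrix: {half}")
    if mode == "ST":
        # Single-thread mode: the upper triangle sees the opposite half.
        return other[half]
    if mode == "SMT":
        # Multi-thread mode: each half only sees its own thread's instructions.
        return own[half]
    raise ValueError(f"unknown mode: {mode}")
```

Switching between ST and SMT mode therefore changes only the multiplexer select, not the connection topology of the matrix.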
One or more embodiments of the invention described herein provide a scalable dependency matrix that requires only N rows by M columns, where M is less than N (e.g., M=N/2), rather than a dependency matrix of N rows by N columns. In addition, placing instructions in the issue queue in FIFO order can save the area of an age array and can improve the performance of the issue queue.
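As a rough illustration of the storage difference, the following Python sketch counts dependency bits for both shapes. The value N=16 and the choice of one digest bit per row are assumptions made only for this example, and the helper name is hypothetical.

```python
def dependency_bits(rows, specific_cols, digest_bits_per_row=0):
    """Total number of dependency bits stored for the given matrix shape."""
    return rows * (specific_cols + digest_bits_per_row)


N = 16                                   # issue queue entries (illustrative value)
full = dependency_bits(N, N)             # N x N matrix
scaled = dependency_bits(N, N // 2, 1)   # N x N/2 matrix plus one digest bit per row
```

Under these assumptions the full matrix holds 256 bits and the scalable matrix 144 bits; adding more digest bits per row trades some of that saving for finer tracking granularity.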
Turning now to FIG. 10, a block diagram of a dependency matrix 1000 operating in ST mode and SMT mode is generally shown in accordance with one or more embodiments of the present invention. The dependency matrix 1003, 1004 shown in FIG. 10 has a different row arrangement than the dependency matrix shown in FIG. 9. With the connection topology shown in FIG. 9, the desired behavior can be achieved in both ST and SMT modes, with simple multiplexer-controlled switching between the modes. The rows in FIG. 9 may be rearranged without changing the connection topology. FIG. 10 shows one of many possible arrangements of the rows of FIG. 9 that leaves the connection topology unaltered.
Turning now to FIG. 11, a block diagram of a computer system 1100 for implementing some or all aspects of a scalable dependency matrix with digest bits in an out-of-order processor is generally shown in accordance with one or more embodiments of the invention. The processes described herein may be implemented in hardware, software (e.g., firmware), or a combination thereof. In an exemplary embodiment, the methods described can be implemented at least in part in hardware and can be part of a special purpose or general purpose computer system 1100 (e.g., a mobile device, personal computer, workstation, minicomputer, or mainframe computer).
In an exemplary embodiment, as shown in FIG. 11, computer system 1100 includes a processor 1105, a memory 1112 coupled to a memory controller 1115, and one or more input devices 1145 and/or output devices 1147, such as peripherals, that are communicatively coupled via a local I/O controller 1135. These devices 1147 and 1145 may include, for example, printers, scanners, microphones, and the like. A conventional keyboard 1150 and mouse 1155 may be coupled to the I/O controller 1135. The I/O controller 1135 may be, for example, one or more buses or other wired or wireless connections as known in the art. The I/O controller 1135 may have additional elements, omitted for simplicity, such as controllers, buffers (caches), drivers, repeaters, and receivers, to enable communications.
The I/O devices 1147, 1145 may further include devices that handle both input and output, such as disk and tape storage, a Network Interface Card (NIC) or modulator/demodulator (for accessing other files, devices, systems, or networks), a Radio Frequency (RF) or other transceiver, a telephone interface, a bridge, a router, and the like.
The processor 1105 is a hardware device for executing hardware instructions or software, particularly those stored in the memory 1112. The processor 1105 may be a custom made or commercially available processor, a Central Processing Unit (CPU), an auxiliary processor, a processor associated with the computer system 1100, a semiconductor based microprocessor (in the form of a microchip or chip set), a microprocessor, or other device for executing instructions. The processor 1105 may include caches such as, but not limited to, an instruction cache for accelerating executable instruction fetches, a data cache for accelerating data fetches and stores, and a Translation Lookaside Buffer (TLB) for accelerating virtual-to-physical address translation for both executable instructions and data. Caches may be organized into a hierarchy of multiple cache levels (L1, L2, etc.).
The memory 1112 may include volatile storage elements (e.g., random access memory, RAM, e.g., DRAM, SRAM, SDRAM, etc.) and nonvolatile storage elements (e.g., ROM, erasable programmable read-only memory (EPROM), electronically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic tape, compact disc read-only memory (CD-ROM), magnetic disk, floppy disk, cassette, cartridge, etc.). Furthermore, memory 1112 may incorporate electronic, magnetic, optical, or other types of storage media. Note that the memory 1112 may have a distributed architecture, where various components are located remotely from each other, but are accessible by the processor 1105.
The instructions in memory 1112 may include one or more separate programs, each comprising an ordered listing of executable instructions for implementing logical functions. In the example of FIG. 11, the instructions in memory 1112 include a suitable Operating System (OS) 1111. The operating system 1111 may essentially control the execution of other computer programs and provide scheduling, input-output control, file and data management, memory management, communication control, and related services.
Additional data, including, for example, instructions for the processor 1105 or other retrievable information, may be stored in the storage 1127, which may be a storage device such as a hard disk drive or a solid state drive. The instructions stored in memory 1112 or storage 1127 may include those that enable processor 1105 to perform one or more aspects of the scheduling systems and methods of the present disclosure.
The computer system 1100 may further include a display controller 1125 coupled to the display 1130. In an exemplary embodiment, computer system 1100 may further include a network interface 1160 for coupling to a network 1165. The network 1165 may be an IP-based network for communication between the computer system 1100 and external servers, clients, etc. via broadband connections. The network 1165 sends and receives data between the computer system 1100 and external systems. In an exemplary embodiment, the network 1165 may be a managed IP network managed by a service provider. The network 1165 may be implemented wirelessly (e.g., using wireless protocols and technologies such as WiFi, WiMax, etc.). The network 1165 may also be a packet-switched network such as a local area network, wide area network, metropolitan area network, regional network, the Internet, or other similar type of network environment. The network 1165 may be a fixed wireless network, a wireless Local Area Network (LAN), a wireless Wide Area Network (WAN), a Personal Area Network (PAN), a Virtual Private Network (VPN), an intranet, or other suitable network system, and may include devices for receiving and transmitting signals.
The systems and methods described herein for providing a scalable dependency matrix may be embodied in whole or in part in a computer program product or computer system 1100, as shown in FIG. 11.
Various embodiments of the present invention are described herein with reference to the associated drawings. Alternative embodiments may be devised without departing from the scope of the invention. Although various connections and positional relationships between elements (e.g., above, below, adjacent, etc.) are set forth in the following description and drawings, those skilled in the art will recognize that many of the positional relationships described herein are orientation-independent when the described functionality is maintained even though the orientation is changed. These connections and/or positional relationships may be direct or indirect, unless stated otherwise, and the invention is not intended to be limited in this regard. Thus, coupling of entities may refer to direct or indirect coupling, and the positional relationship between entities may be direct or indirect. Furthermore, the various tasks and process steps described herein may be incorporated into a more comprehensive procedure or process having additional steps or functionality not described in detail herein.
The following definitions and abbreviations will be used for the interpretation of the claims and the specification. As used herein, the terms "comprises," "comprising," "includes," "including," "having" or any other variation thereof, are intended to cover a non-exclusive inclusion. For example, a composition, mixture, process, method, article, or apparatus that comprises a list of elements is not necessarily limited to only those elements but may include other elements not expressly listed or inherent to such composition, mixture, process, method, article, or apparatus.
In addition, the term "exemplary" is used herein to mean "serving as an example, instance, or illustration." Any embodiment or design described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments or designs. The terms "at least one" and "one or more" are to be understood as including any integer greater than or equal to one, i.e., one, two, three, four, etc. The term "plurality" is to be understood as including any integer greater than or equal to two, i.e., two, three, four, five, etc. The term "coupled" may include both indirect "coupling" and direct "coupling".
The terms "about," "approximately," "substantially," and variations thereof are intended to include the degree of error associated with measurement of a particular quantity based upon the equipment available at the time of filing the application. For example, "about" may include a range of ±8%, or 5%, or 2% of a given value.
For the sake of brevity, conventional techniques related to making and using aspects of the invention may or may not be described in detail herein. In particular, various aspects of computing systems and of specific computer programs that implement the various technical features described herein are well known. Accordingly, in the interest of brevity, many conventional implementation details are only briefly mentioned or are omitted herein without providing the well-known system and/or process details.
The present invention may be a system, method and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions thereon for causing a processor to perform aspects of the present invention.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include the following: portable computer disks, hard disks, Random Access Memory (RAM), Read-Only Memory (ROM), Erasable Programmable Read-Only Memory (EPROM or flash memory), Static Random Access Memory (SRAM), portable Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Disks (DVD), memory sticks, floppy disks, mechanically encoded devices such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media, as used herein, are not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., optical pulses through fiber optic cables), or electrical signals transmitted through wires.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
Computer program instructions for carrying out operations of the present invention may be assembly instructions, Instruction Set Architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as programmable logic circuitry, Field Programmable Gate Arrays (FPGAs), or Programmable Logic Arrays (PLAs), is personalized using state information of the computer readable program instructions and can then execute the computer readable program instructions in order to implement aspects of the present invention.
Various aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The foregoing description of embodiments of the invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the technical improvement of the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Claims (31)
1. A computer-implemented method, comprising:
tracking dependencies between instructions in the issue queue, wherein tracking includes, for each instruction in the issue queue:
identifying whether the instruction depends on each of a threshold number of instructions added to an issue queue prior to the instruction, wherein dependencies between the instruction and each of the threshold number of instructions are tracked separately; and
identifying whether the instruction depends on one or more other instructions added to the issue queue before instructions that were not included in each of the threshold number of instructions, wherein dependencies between the instruction and each of the other instructions are tracked as one or more groups by indicating that dependencies exist between the instruction and one of the one or more groups based on identifying dependencies between the instruction and at least one of the other instructions or groups of the plurality of groups, the other instructions of the one group including all instructions in the issue queue that were not included in the separately tracked threshold number of instructions, and wherein each of the other instructions is assigned to at least one of the plurality of groups; and
issuing instructions from the issue queue based at least in part on the tracking.
2. The computer-implemented method of claim 1, comprising: identifying whether the instruction depends on one or more other instructions that were added to the issue queue prior to the instruction and that are not included in the threshold number of instructions, wherein dependencies between the instruction and the other instructions are tracked as a single group by indicating that a dependency exists between the instruction and the single group based on identifying a dependency between the instruction and at least one of the other instructions, the single group including all instructions in the issue queue that are not included in the separately tracked threshold number of instructions.
3. The computer-implemented method of claim 1, comprising: identifying whether the instruction depends on one or more other instructions that were added to the issue queue prior to the instruction and that are not included in the threshold number of instructions, wherein dependencies between the instruction and the other instructions are tracked as a plurality of groups by indicating that a dependency exists between the instruction and one of the plurality of groups based on identifying a dependency between the instruction and at least one of the other instructions in that group, wherein each of the other instructions is assigned to at least one of the groups.
4. The computer-implemented method of claim 1 or 2, wherein dependencies between the instruction and a single group of other instructions are tracked using a single digest bit.
5. The computer-implemented method of claim 1 or 3, wherein a different digest bit is used for each of the plurality of groups to track dependencies between the instruction and each of the plurality of groups.
6. The computer-implemented method of claim 4, wherein the single digest bit is set to indicate that there is a dependency between the instruction and at least one instruction in the single group of other instructions, and, after being set, the single digest bit is reset to indicate that there is no longer a dependency between the instruction and the single group of other instructions, based at least in part on detecting that all other instructions in the single group have been issued from the issue queue.
7. The computer-implemented method of claim 5, wherein the digest bit for each of the plurality of groups is set to indicate that there is a dependency between the instruction and at least one instruction in the group, and, after being set, the digest bit is reset to indicate that there is no longer a dependency between the instruction and the group, based at least in part on detecting that all instructions assigned to the group have been issued from the issue queue.
8. The computer-implemented method of claim 1, wherein dependencies between the instruction and each of the threshold number of instructions are tracked with one separate bit for each of the threshold number of instructions.
9. The computer-implemented method of claim 1, wherein the issue queue is a first-in-first-out (FIFO) queue and the instructions in the issue queue are ordered based on the order in which they were added to the issue queue.
10. The computer-implemented method of claim 1, wherein the instructions in the issue queue are from a single thread executed by an out-of-order processor.
11. The computer-implemented method of claim 1, wherein the issue queue holds N instructions and the threshold number of instructions is N/2.
12. The computer-implemented method of claim 1, wherein the threshold number of instructions comprises all instructions in the issue queue that correspond to a single thread in a multi-threaded environment.
13. The computer-implemented method of claim 1, wherein the threshold number is programmable.
14. A system, comprising:
a dependency matrix in an issue queue of an out-of-order processor;
a memory having computer readable instructions; and
one or more processors for executing the computer readable instructions, the computer readable instructions controlling the one or more processors to:
track dependencies between instructions in the issue queue, wherein the tracking includes, for each instruction in the issue queue:
identifying whether the instruction depends on each of a threshold number of instructions added to the issue queue prior to the instruction, wherein dependencies between the instruction and each of the threshold number of instructions are tracked separately; and
identifying whether the instruction depends on one or more other instructions that were added to the issue queue prior to the instruction and that are not included in the threshold number of instructions, wherein dependencies between the instruction and the other instructions are tracked as one or more groups by indicating that a dependency exists between the instruction and one of the one or more groups based on identifying a dependency between the instruction and at least one of the other instructions in one group or in a group of a plurality of groups, the other instructions of the one group including all instructions in the issue queue that are not included in the separately tracked threshold number of instructions, and wherein each of the other instructions is assigned to at least one of the plurality of groups; and
issue instructions from the issue queue based at least in part on the tracking.
15. The system of claim 14, comprising: identifying whether the instruction depends on one or more other instructions that were added to the issue queue prior to the instruction and that are not included in the threshold number of instructions, wherein dependencies between the instruction and the other instructions are tracked as a single group by indicating that a dependency exists between the instruction and the single group based on identifying a dependency between the instruction and at least one of the other instructions, the single group including all instructions in the issue queue that are not included in the separately tracked threshold number of instructions.
16. The system of claim 14, comprising: identifying whether the instruction depends on one or more other instructions that were added to the issue queue prior to the instruction and that are not included in the threshold number of instructions, wherein dependencies between the instruction and the other instructions are tracked as a plurality of groups by indicating that a dependency exists between the instruction and one of the plurality of groups based on identifying a dependency between the instruction and at least one of the other instructions in that group, wherein each of the other instructions is assigned to at least one of the groups.
17. The system of claim 14 or 15, wherein a single digest bit is used to track dependencies between the instruction and a single group of other instructions.
18. The system of claim 14 or 16, wherein a different digest bit is used for each of the plurality of groups to track dependencies between the instruction and each of the plurality of groups.
19. The system of claim 17, wherein the single digest bit is set to indicate that there is a dependency between the instruction and at least one instruction in the single group of other instructions, and, after being set, the single digest bit is reset to indicate that there is no longer a dependency between the instruction and the single group of other instructions, based at least in part on detecting that all other instructions in the single group have been issued from the issue queue.
20. The system of claim 18, wherein the digest bit for each of the plurality of groups is set to indicate that there is a dependency between the instruction and at least one instruction in the group, and, after being set, the digest bit is reset to indicate that there is no longer a dependency between the instruction and the group, based at least in part on detecting that all instructions assigned to the group have been issued from the issue queue.
21. The system of claim 14, wherein a separate bit is used for each of the threshold number of instructions to track dependencies between the instruction and each of the threshold number of instructions.
22. The system of claim 14, wherein the issue queue is a first-in-first-out (FIFO) queue and the instructions in the issue queue are ordered based on the order in which they were added to the issue queue.
23. The system of claim 14, wherein the instructions in the issue queue are from a single thread executed by an out-of-order processor.
24. The system of claim 14, wherein the issue queue holds N instructions and the threshold number of instructions is N/2.
25. The system of claim 14, wherein the threshold number of instructions comprises all instructions in the issue queue that correspond to a single thread in a multi-threaded environment.
26. The system of claim 14, wherein the threshold number is programmable.
27. A computer readable storage medium having program instructions embodied thereon, the program instructions being executable by a processor to cause the processor to:
track dependencies between instructions in an issue queue, wherein the tracking includes, for each instruction in the issue queue:
identifying whether the instruction depends on each of a threshold number of instructions added to the issue queue prior to the instruction, wherein dependencies between the instruction and each of the threshold number of instructions are tracked separately; and
identifying whether the instruction depends on one or more other instructions that were added to the issue queue prior to the instruction and that are not included in the threshold number of instructions, wherein dependencies between the instruction and the other instructions are tracked as one or more groups by indicating that a dependency exists between the instruction and one of the one or more groups based on identifying a dependency between the instruction and at least one of the other instructions in one group or in a group of a plurality of groups, the other instructions of the one group including all instructions in the issue queue that are not included in the separately tracked threshold number of instructions, and wherein each of the other instructions is assigned to at least one of the plurality of groups; and
issue instructions from the issue queue based at least in part on the tracking.
28. The computer-readable storage medium of claim 27, comprising: identifying whether the instruction depends on one or more other instructions that were added to the issue queue prior to the instruction and that are not included in the threshold number of instructions, wherein dependencies between the instruction and the other instructions are tracked as a single group by indicating that a dependency exists between the instruction and the single group based on identifying a dependency between the instruction and at least one of the other instructions, the single group including all instructions in the issue queue that are not included in the separately tracked threshold number of instructions.
29. The computer-readable storage medium of claim 27, comprising: identifying whether the instruction depends on one or more other instructions that were added to the issue queue prior to the instruction and that are not included in the threshold number of instructions, wherein dependencies between the instruction and the other instructions are tracked as a plurality of groups by indicating that a dependency exists between the instruction and one of the plurality of groups based on identifying a dependency between the instruction and at least one of the other instructions in that group.
30. The computer-readable storage medium of claim 27 or 28, wherein dependencies between the instruction and a single group of other instructions are tracked using a single digest bit;
the single digest bit is set to indicate that there is a dependency between the instruction and at least one instruction in the single group of other instructions; and
after being set, the single digest bit is reset to indicate that there is no longer a dependency between the instruction and the single group of other instructions, based at least in part on detecting that all other instructions in the single group have been issued from the issue queue.
31. The computer-readable storage medium of claim 27 or 29, wherein each of the other instructions is assigned to at least one group, and wherein a different digest bit is used for each of the plurality of groups to track dependencies between the instruction and each of the plurality of groups;
the digest bit for each of the plurality of groups is set to indicate that there is a dependency between the instruction and at least one instruction in the group; and
after being set, the digest bit is reset to indicate that there is no longer a dependency between the instruction and the group, based at least in part on detecting that all instructions assigned to the group have been issued from the issue queue.
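The tracking scheme recited in the claims above can be illustrated with a small software model. The following Python sketch is not taken from the patent and is not a description of the hardware; it is a minimal illustration under stated assumptions. The names `Entry`, `IssueQueue`, `precise`, `digest` and `group_size`, and the grouping rule (queue index divided by `group_size`), are hypothetical. It also assumes unique instruction names and that the individually tracked window is fixed when an instruction is added.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    name: str
    # One "bit" per individually tracked producer (its queue index).
    precise: set = field(default_factory=set)
    # One digest "bit" per group: group id -> queue indexes assigned to that group.
    digest: dict = field(default_factory=dict)
    issued: bool = False


class IssueQueue:
    """Toy model: FIFO issue queue with per-instruction and per-group tracking."""

    def __init__(self, threshold, group_size):
        self.threshold = threshold    # prior instructions that get their own bit
        self.group_size = group_size  # hypothetical rule: group id = index // group_size
        self.entries = []             # list position is insertion order

    def _index(self, name):
        # Assumes instruction names are unique within the queue.
        return next(i for i, e in enumerate(self.entries) if e.name == name)

    def add(self, name, producers=()):
        producers = set(producers)
        pending = [i for i, e in enumerate(self.entries) if not e.issued]
        split = max(0, len(pending) - self.threshold)
        far, near = pending[:split], pending[split:]
        entry = Entry(name)
        # Nearest `threshold` pending instructions: tracked separately.
        entry.precise = {i for i in near if self.entries[i].name in producers}
        # Older pending instructions: assigned to groups, one digest bit per group.
        groups = {}
        for i in far:
            groups.setdefault(i // self.group_size, set()).add(i)
        entry.digest = {
            g: members for g, members in groups.items()
            if any(self.entries[i].name in producers for i in members)
        }
        self.entries.append(entry)
        return entry

    def ready(self, name):
        e = self.entries[self._index(name)]
        return not e.issued and not e.precise and not e.digest

    def issue(self, name):
        if not self.ready(name):
            raise ValueError(f"{name} still has unresolved dependencies")
        idx = self._index(name)
        self.entries[idx].issued = True
        for e in self.entries:
            e.precise.discard(idx)  # clear the individual bit for this producer
            # Reset a digest bit only once every member of its group has issued.
            drained = [g for g, members in e.digest.items()
                       if all(self.entries[i].issued for i in members)]
            for g in drained:
                del e.digest[g]


# Usage: E depends on A (old, tracked via a group) and D (recent, tracked separately).
q = IssueQueue(threshold=2, group_size=2)
for n in "ABCD":
    q.add(n)
e = q.add("E", producers={"A", "D"})
```

In this model, `E` becomes ready only after `A`, `B` and `D` have issued: `B` shares a group with `A`, so the digest bit stays set until the whole group has drained, even though `E` does not depend on `B`. That conservatism is the trade-off for using fewer bits. Choosing a `group_size` at least as large as the queue gives a single group, which corresponds to the single-digest-bit variant.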
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/826,746 US10564976B2 (en) | 2017-11-30 | 2017-11-30 | Scalable dependency matrix with multiple summary bits in an out-of-order processor |
US15/826,734 US10929140B2 (en) | 2017-11-30 | 2017-11-30 | Scalable dependency matrix with a single summary bit in an out-of-order processor |
US15/826,746 | 2017-11-30 | ||
US15/826,734 | 2017-11-30 | ||
PCT/IB2018/058801 WO2019106462A1 (en) | 2017-11-30 | 2018-11-09 | Scalable dependency matrix with one or a plurality of summary bits in an out-of-order processor |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111226196A (en) | 2020-06-02 |
CN111226196B (en) | 2023-12-01 |
Family
ID=66665478
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201880066923.3A Active CN111226196B (en) | 2017-11-30 | 2018-11-09 | Scalable dependency matrix with one or more digest bits in an out-of-order processor |
Country Status (5)
Country | Link |
---|---|
JP (1) | JP7403450B2 (en) |
CN (1) | CN111226196B (en) |
DE (1) | DE112018006103B4 (en) |
GB (1) | GB2581945B (en) |
WO (1) | WO2019106462A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114327643B (en) * | 2022-03-11 | 2022-06-21 | 上海聪链信息科技有限公司 | Machine instruction preprocessing method, electronic device and computer-readable storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6988183B1 (en) * | 1998-06-26 | 2006-01-17 | Derek Chi-Lan Wong | Methods for increasing instruction-level parallelism in microprocessors and digital system |
CN101008928A (en) * | 2006-01-26 | 2007-08-01 | 国际商业机器公司 | Method and apparatus for tracking command order dependencies |
CN101034345A (en) * | 2007-04-16 | 2007-09-12 | 中国人民解放军国防科学技术大学 | Control method for data stream and instruction stream in stream processor |
CN102362257A (en) * | 2009-03-24 | 2012-02-22 | 国际商业机器公司 | Tracking deallocated load instructions using a dependence matrix |
CN104040492A (en) * | 2011-11-22 | 2014-09-10 | 索夫特机械公司 | Microprocessor accelerated code optimizer and dependency reordering method |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6463523B1 (en) * | 1999-02-01 | 2002-10-08 | Compaq Information Technologies Group, L.P. | Method and apparatus for delaying the execution of dependent loads |
US8255671B2 (en) | 2008-12-18 | 2012-08-28 | Apple Inc. | Processor employing split scheduler in which near, low latency operation dependencies are tracked separate from other operation dependencies |
CN102360309B (en) * | 2011-09-29 | 2013-12-18 | 中国科学技术大学苏州研究院 | Scheduling system and scheduling execution method of multi-core heterogeneous system on chip |
US10235180B2 (en) | 2012-12-21 | 2019-03-19 | Intel Corporation | Scheduler implementing dependency matrix having restricted entries |
JP6520416B2 (en) | 2015-06-02 | 2019-05-29 | 富士通株式会社 | Arithmetic processing apparatus and processing method of arithmetic processing apparatus |
US10108417B2 (en) | 2015-08-14 | 2018-10-23 | Qualcomm Incorporated | Storing narrow produced values for instruction operands directly in a register map in an out-of-order processor |
- 2018
- 2018-11-09 DE DE112018006103.5T patent/DE112018006103B4/en active Active
- 2018-11-09 GB GB2009499.1A patent/GB2581945B/en active Active
- 2018-11-09 CN CN201880066923.3A patent/CN111226196B/en active Active
- 2018-11-09 WO PCT/IB2018/058801 patent/WO2019106462A1/en active Application Filing
- 2018-11-09 JP JP2020527796A patent/JP7403450B2/en active Active
Also Published As
Publication number | Publication date |
---|---|
GB2581945B (en) | 2021-01-20 |
GB202009499D0 (en) | 2020-08-05 |
GB2581945A (en) | 2020-09-02 |
WO2019106462A1 (en) | 2019-06-06 |
JP7403450B2 (en) | 2023-12-22 |
DE112018006103T5 (en) | 2020-09-17 |
DE112018006103B4 (en) | 2022-04-21 |
CN111226196A (en) | 2020-06-02 |
JP2021504791A (en) | 2021-02-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10884753B2 (en) | Issue queue with dynamic shifting between ports | |
US10901744B2 (en) | Buffered instruction dispatching to an issue queue | |
US10802829B2 (en) | Scalable dependency matrix with wake-up columns for long latency instructions in an out-of-order processor | |
US11204772B2 (en) | Coalescing global completion table entries in an out-of-order processor | |
US10564976B2 (en) | Scalable dependency matrix with multiple summary bits in an out-of-order processor | |
US10942747B2 (en) | Head and tail pointer manipulation in a first-in-first-out issue queue | |
CN111213124B (en) | Global completion table entry to complete merging in out-of-order processor | |
US10922087B2 (en) | Block based allocation and deallocation of issue queue entries | |
US10929140B2 (en) | Scalable dependency matrix with a single summary bit in an out-of-order processor | |
US10209757B2 (en) | Reducing power consumption in a multi-slice computer processor | |
US10970079B2 (en) | Parallel dispatching of multi-operation instructions in a multi-slice computer processor | |
US9652246B1 (en) | Banked physical register data flow architecture in out-of-order processors | |
US20190087195A1 (en) | Allocating and deallocating reorder queue entries for an out-of-order processor | |
CN111226196B (en) | Scalable dependency matrix with one or more digest bits in an out-of-order processor | |
US10705847B2 (en) | Wide vector execution in single thread mode for an out-of-order processor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||