EP1050808B1 - Délivrance d'instructions d'ordinateur - Google Patents

Délivrance d'instructions d'ordinateur

Info

Publication number
EP1050808B1
EP1050808B1 (application EP99410058A)
Authority
EP
European Patent Office
Prior art keywords
instructions
pipelines
data
dependency
instruction
Prior art date
Legal status
Expired - Lifetime
Application number
EP99410058A
Other languages
German (de)
English (en)
Other versions
EP1050808A1 (fr)
Inventor
Andrew Cofler
Laurent Ducousso
Bruno Fel
Current Assignee
STMicroelectronics SA
Original Assignee
STMicroelectronics SA
Priority date
Filing date
Publication date
Application filed by STMicroelectronics SA filed Critical STMicroelectronics SA
Priority to EP99410058A
Priority to DE69938621T (de)
Priority to US09/563,154 (en)
Priority to JP2000134618A (ja)
Publication of EP1050808A1 (fr)
Application granted
Publication of EP1050808B1 (fr)
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3838Dependency mechanisms, e.g. register scoreboarding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3853Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution of compound instructions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3854Instruction completion, e.g. retiring, committing or graduating
    • G06F9/3858Result writeback, i.e. updating the architectural state or memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units

Definitions

  • the invention relates to apparatus and methods of scheduling instructions in a computer system and more particularly to a computer system arranged to operate in different instruction modes.
  • Embodiments of the invention may provide a very high rate of instruction execution while avoiding a complex programming model for the software tool chain and assembly programmer.
  • EP 0,551,090 discloses a system for processing either superscalar instructions or VLIW instructions whereby a compiler pre-arranges the VLIW instructions so as not to conflict with one another.
  • Instruction scheduling should be deterministic, i.e. the hardware should not perform uncontrollable (and non-deterministic) "intelligent" rescheduling of instructions, because this makes the task of the programmer and software toolchain extremely difficult when the real-time nature of the application has to be respected.
  • A method of operating a computer system in which a plurality of instructions are obtained from memory, decoded, and supplied in a common machine cycle to respective parallel execution pipelines, said instructions being grouped in at least two different instruction modes, one being a superscalar mode and another being a very long instruction word (VLIW) mode; characterised in that the method includes: using vertical dependency checking circuitry in the pipelines to effect and resolve vertical dependency checks between all instructions supplied in successive machine cycles; and using horizontal dependency checking circuitry to effect and resolve a horizontal dependency check between instructions supplied in the same machine cycle for instructions in superscalar mode, the horizontal dependency check being disabled for instructions in VLIW mode.
  • the invention also provides a method of scheduling instructions in a computer system which method comprises supplying simultaneously a group of instructions each machine cycle for execution in parallel pipelines, decoding each instruction in the group, checking the instructions in the group to determine if any horizontal data dependencies exist between any pair of instructions in the group during execution in a respective pair of parallel pipelines, and in response to determination of such a data dependency, providing a split signal to one of the pipelines to introduce a temporary delay in one of the pair of pipelines to resolve the data dependency, said method further including selecting an instruction grouping mode from either a superscalar mode with a first predetermined number of instructions in the same group or a very long instruction word (VLIW) mode having a second larger number of instructions in the same group, providing a control signal to indicate which grouping mode is selected and using said control signal to prevent supply of a said split signal when VLIW mode is selected and to allow supply of a said split signal when superscalar mode is selected.
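The scheduling method summarised above can be modelled as a rough behavioural sketch (not the patent's hardware; the class names, the register-set representation and the pairwise overlap test below are illustrative assumptions):

```python
from dataclasses import dataclass
from enum import Enum

class GroupingMode(Enum):
    SUPERSCALAR = 0    # GP16 / GP32: split signal allowed
    VLIW = 1           # split signal suppressed; software resolves the dependency

@dataclass
class Instr:
    dests: frozenset   # registers written by the instruction
    srcs: frozenset    # registers read by the instruction

def horizontal_dependency(a: Instr, b: Instr) -> bool:
    """True if two instructions issued in the same cycle touch a common register."""
    return bool(a.dests & (b.srcs | b.dests)) or bool(b.dests & a.srcs)

def schedule_group(group, mode: GroupingMode):
    """Return one split flag per slot: True means 'delay this slot by >= 1 cycle'."""
    split = [False] * len(group)
    for i in range(len(group)):
        for j in range(i + 1, len(group)):
            if horizontal_dependency(group[i], group[j]) and mode is GroupingMode.SUPERSCALAR:
                split[j] = True     # in VLIW mode the control signal suppresses this
    return split

pair = [Instr(frozenset({"R1"}), frozenset()),       # slot 0 writes R1
        Instr(frozenset(), frozenset({"R1"}))]       # slot 1 reads R1
assert schedule_group(pair, GroupingMode.SUPERSCALAR) == [False, True]
assert schedule_group(pair, GroupingMode.VLIW) == [False, False]
```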
VLIW: very long instruction word
  • any vertical data dependencies between instructions in successive cycles in the same pipeline are resolved by effecting the vertical dependency check within the pipeline and operating a bypass or causing a temporary stall if no bypass is available when a vertical dependency is located.
  • the dependency may be a data dependency or a guard value dependency.
  • an instruction dispatch queue is formed in each of the parallel pipelines and instructions are supplied to the instruction dispatch queue of respective pipelines after decoding the instructions.
  • said pipelines include accesses to a data memory, said pipelines including a first set of pipelines for use in executing instructions needed for memory access operations and a second set of pipelines arranged to carry out arithmetic operations, thereby providing decoupling of memory access operations from arithmetic operations.
  • a computer system comprising:
  • the invention also provides a computer system comprising a plurality of parallel execution pipelines, instruction decoding circuitry, and instruction supply circuitry for supplying simultaneously a group of instructions each machine cycle to said pipelines through the decoding circuitry, instruction grouping mode circuitry to generate a control signal indicating either instruction grouping in a superscalar mode with a first predetermined number of instructions in the same group or a very long instruction word (VLIW) having a second larger number of instructions in the same group, data dependency checking circuitry arranged to check instructions to determine if any horizontal data dependencies exist between any pair of instructions in a group during execution in a respective pair of parallel pipelines, and split signal generating circuitry responsive to said data dependency checking circuitry and to said control signal to generate a split signal for introducing a delay in one of said pair of pipelines to resolve the horizontal data dependency when in superscalar mode but preventing operation of the split signal to introduce the delay when in VLIW mode.
  • the system includes a data manipulation unit having a plurality of parallel execution pipelines accessing a first set of registers for use in executing instructions for arithmetic operations, and an address unit having a plurality of parallel pipelines accessing a second set of registers for use in executing instructions for memory access operations, whereby the execution of instructions for memory accesses are decoupled from execution of instructions for arithmetic operations.
  • said split signal generating circuitry is operable to resolve a data dependency between two instructions entering the two pipelines of the data units simultaneously or to resolve a data dependency between two instructions entering the two address unit pipelines simultaneously.
  • the computer system of this example is arranged for the parallel execution of a plurality of instructions and is particularly suited to providing a high digital signal processing (DSP) performance.
  • Instructions are held in a program memory 11 and after passing through a control unit 12 are supplied to four parallel execution pipelines 13, 14, 15 and 16.
  • Pipelines 13 and 14 are shown as slot 0 and slot 1 of a data unit 18 arranged to execute instructions carrying out arithmetic operations.
  • Pipelines 15 and 16 are shown as slot 0 and slot 1 of an address unit 19 used to execute instructions for memory accesses to a data memory 20.
  • Slot 1 or Slot 0 of the address unit 19 may also be used to supply instructions to a general unit 21 which shares some resources with the address unit 19.
  • the general unit 21 includes a control register file and branch circuitry and is used to provide instruction branch information on line 23 to the control unit 12.
  • the two pipelines 13 and 14 in the data unit 18 share a common data register file 26 and a common guard register file 27 holding guard values which may be associated with the instructions. Guarded instruction execution has the same meaning as predicated instruction execution.
  • the two pipelines also derive instructions from a common instruction dispatch queue 29 in the data unit 18 and instructions in the queue 29 are checked for data dependency by common dependency check circuitry 39 in the data unit 18. This dependency check refers to data dependency between instructions taken off the queue 29 in successive cycles into the same pipeline and is referred to as a vertical dependency.
  • the sequence of operations in each of the pipeline stages of the data unit 18 is illustrated schematically as a first stage 30 which is a data operand fetch usually accessing one of the register files 26 or 27.
  • Two successive execution stages 31 and 32 may occur in subsequent cycles using for example ALU units 33 or a multiply and accumulate unit 34 which may form part of the pipeline.
  • the second execution stage 32 is followed by a data writeback stage 35 at which the result of an arithmetic operation is returned to the register file 26 or 27.
  • A similar sequence of pipeline stages applies to each of the two parallel pipelines of the data unit 18.
  • In the address unit 19, both pipelines 15 and 16 access a common register file 40 holding pointer values for use in load or store operations on the data memory 20.
  • the two pipelines each take their instructions from a common instruction dispatch queue 41 and a similar vertical dependency check is provided in common for both pipelines 15 and 16 through the address unit 19.
  • the vertical dependency check 42 is similar to that referred to as the vertical dependency check 39 in the data unit.
  • Each of the pipelines in the address unit has pipeline stages as illustrated. The first stage is an address operand fetch 44 followed by an execution stage 45 and an address write back stage 46.
  • bypass circuitry 47 is provided to allow bypassing of some stages of the pipeline.
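For reference, the stage sequences described above can be summarised as follows (the dictionary layout is illustrative; the stage names and reference numerals are taken from the description):

```python
# Compact summary of the stage sequence in each unit's pipelines; bypass
# circuitry 47 allows some of these stages to be skipped.
PIPELINE_STAGES = {
    "data_unit":    ["data operand fetch (30)", "execute 1 (31)",
                     "execute 2 (32)", "data writeback (35)"],
    "address_unit": ["address operand fetch (44)", "execute (45)",
                     "address writeback (46)"],
}
```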
  • The machine respects the program alignment as defined by the programmer/software toolchain, i.e. if the latter has placed instructions in the program memory space aligned so as to avoid, for example, a slot 0 - slot 1 horizontal dependency, then the machine (at the align stage) will always respect that alignment, e.g. in GP32 programming mode.
  • the Control Unit 12 includes an aligner which controls supply of instructions from a prefetch buffer to the decoder 82. In the alignment stage the aligner ensures that instruction alignment is maintained in the decoder and consequently in the microinstructions that are supplied in the same cycle to each of the execution slots.
  • Both the data unit 18 and the address unit 19 are connected to the data memory 20 through a data memory interface control 50 and a data memory controller 51.
  • the data memory interface control 50 is connected by a bidirectional bus 53 to both the data unit 18 and address unit 19.
  • the interface control 50 includes a plurality of queues each connected by a bus to the bus 53. These queues include load data queues 60 and 61 for slots 0 and 1 respectively. Queues 62 and 63 hold pointer values to be transferred to data registers for slot 0 and slot 1. Queues 64 and 65 hold data values for transfer to pointer registers for slots 0 and 1.
  • the data memory controller 51 includes store data queues 70 and store address queues 71.
  • the computer system operates access decoupling in that the memory accesses are effected independently of the arithmetic operations carried out within the data unit 18. This reduces the problem of memory latency. In a digital signal processing system which operates regular and repeated operations, the effective memory latency can be hidden from the executing program.
  • the control unit 12 shown in Figure 1 is also arranged to provide a horizontal dependency check.
  • a data dependency between instructions that are supplied to the parallel pipelines in the same machine cycle is referred to as a horizontal dependency.
  • the control unit 12 includes a program counter and address generator 80 to provide a memory address for an instruction fetch operation from the program memory 11.
  • the control unit includes an instruction mode register 81 to indicate the instruction mode in which the machine is operating at any instant.
  • The machine may operate in a selected one of a plurality of modes, including superscalar modes of variable instruction bit length or a very long instruction word (VLIW) mode. Examples of the different modes of this example are shown in Figure 5.
  • In one superscalar instruction mode, a pair of 16-bit instructions is supplied during each machine cycle to a decoder 82 in the control unit 12. This pair is denoted as slot 0, slot 1 with a bit sequence W0, W1 etc.
  • Each bit sequence W0, W1 is issued in one machine cycle and this mode is referred to herein as GP16 mode which is a superscalar mode.
  • In a second superscalar instruction mode, two instructions each having a length of 32 bits are supplied to the decoder 82 in each machine cycle. In this case both bit sequences W0 and W1 are issued in cycle 0 and bit sequences W2 and W3 are issued in cycle 1.
  • This mode is referred to herein as GP32 mode.
  • In a third instruction mode, four instructions are formed by the bit sequences W0, W1, W2 and W3, each 32 bits in length. These are supplied to the decoder 82 in a single cycle as a result of a single fetch operation. This is referred to herein as VLIW mode.
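A minimal sketch of the per-cycle grouping implied by these three modes, assuming a simple list of bit sequences W0, W1, ... as input (the function name and the string mode labels are illustrative):

```python
def issue_groups(words, mode):
    """Yield (instruction width in bits, bit sequences issued together) per cycle."""
    if mode == "GP16":        # superscalar: a pair of 16-bit instructions per cycle
        width, per_cycle = 16, 2
    elif mode == "GP32":      # superscalar: a pair of 32-bit instructions per cycle
        width, per_cycle = 32, 2
    elif mode == "VLIW":      # four 32-bit instructions in one cycle, one fetch
        width, per_cycle = 32, 4
    else:
        raise ValueError(f"unknown instruction mode: {mode}")
    for i in range(0, len(words), per_cycle):
        yield width, words[i:i + per_cycle]

assert list(issue_groups(["W0", "W1", "W2", "W3"], "GP32")) == \
       [(32, ["W0", "W1"]), (32, ["W2", "W3"])]
assert list(issue_groups(["W0", "W1", "W2", "W3"], "VLIW")) == \
       [(32, ["W0", "W1", "W2", "W3"])]
```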
  • The instruction mode is not necessarily identical everywhere in the pipeline; e.g. micro-instructions being executed in the DU pipelines may originate from GP32 instructions while the decoder 82 has since changed mode and is now in GP16 mode. However, the instruction mode only matters in the decoder 82; micro-instructions are independent of the instruction mode.
  • GP16 and GP32 modes have different encoding whereas a VLIW instruction is formed of four GP32 instructions and does not have different encoding.
  • In VLIW mode, vertical dependency checks are carried out by the hardware of Figure 1 but the horizontal data dependency checks are disabled. The instructions which can be grouped together in a single word in VLIW mode are governed by specified rules of instruction compatibility. Although Figure 5 refers to slots 0-3 for VLIW mode, it will be understood that the four slots in question correspond to the two slots of the data unit and the two slots of the address unit. Consequently the grouping of the instructions within the VLIW word must always include zero, one or two instructions for the address unit and one or two instructions for the data unit.
  • One of the address unit slots of the VLIW mode may include a control instruction for use by the general unit 21.
  • The instruction mode register 81 is a two-bit register as shown in Figure 2. These two bits indicate which of the instruction modes is being used. Both bits are set to 1 in the case of a VLIW instruction.
  • the output of this register 81 is fed to an AND gate 84 in a horizontal dependency control circuit 85.
  • the output of the gate 84 thereby indicates on line 86 whether or not the instruction is in VLIW mode.
  • When the instructions obtained by a single fetch operation in one cycle are decoded by the decoder 82, they are checked for horizontal data dependency by dependency checking circuitry 87.
  • the checker 87 provides an output to the control circuit 85 to indicate if a horizontal data dependency has been located.
  • That output from the checker 87 is fed to a selector circuit 90 which, depending on the instructions which have been decoded by the decoder 82, provides either an Hdep signal on line 91 or a split signal on line 92.
  • the split signal 92 indicates that a split in the operation of a pair of parallel execution pipelines is necessary in order to resolve a horizontal dependency.
  • The split line 92 is supplied to a gate circuit 95 which also receives the signal on line 86 indicating whether or not the instruction mode is that of a VLIW instruction. If the instruction is in VLIW mode, the gate 95 disables the split signal so that output 96 from the gate 95 is inactive; this occurs only in VLIW mode.
  • the split may be enabled on line 96 in the event of a horizontal dependency being located with a GP16 or GP32 mode instruction.
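The gating described above can be modelled behaviourally as follows. The encoding of the non-VLIW mode bits and the rule used by the selector to choose between Hdep and split (a load/use pair versus other dependent pairs, following the later discussion of Hdep) are assumptions, not details given by the patent:

```python
def is_vliw(mode_bits):
    """AND gate 84: both bits of the two-bit mode register 81 set means VLIW."""
    return mode_bits == (1, 1)

def horizontal_dependency_control(mode_bits, dependency_found, load_use_pair):
    """Return (Hdep on line 91, split on line 96) for one decoded instruction pair."""
    hdep = split = False
    if dependency_found:
        if load_use_pair:     # selector 90 routes this case to Hdep (line 91)
            hdep = True
        else:                 # otherwise the selector asserts split (line 92)
            split = True
    if is_vliw(mode_bits):    # gate 95: line 86 disables the split output 96
        split = False
    return hdep, split

# any encoding other than (1, 1) is treated here as a superscalar (GP16/GP32) mode
assert horizontal_dependency_control((0, 1), True, False) == (False, True)   # split kept
assert horizontal_dependency_control((1, 1), True, False) == (False, False)  # VLIW: no split
```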
  • Instructions from the decoder 82 are passed to a microinstruction generator 98 which generates a plurality of parallel microinstructions which are output by a dispatch circuit 99 through lines 100 to the four parallel execution pipelines 13, 14, 15 and 16.
  • a split bit will be set in the microinstruction passed to the respective execution pipeline so that the data dependency can be resolved.
  • The split bit is detected so that two instructions between which a data dependency occurs pass through the pipelines with a split in the cycle synchronisation during their sequential passage through the pipelines.
  • The pipeline which requires a delay in order to await data from a pipeline stage of another instruction will receive from the instruction dispatch queue 29 signals indicating a no operand fetch for one or more cycles of operation, until the delayed instruction can proceed with the data it needs available from the other pipeline at the required stage of its execution. It will therefore be seen that the use of the split signal on line 96 enables a horizontal data dependency between parallel instructions issued in the same machine cycle to be resolved by a vertical adjustment in the timing of passage through the respective pipelines. In the absence of the split signal, microinstructions entering the two pipelines (slot 0 and slot 1) of either the data unit 18 or address unit 19 are tightly coupled so that they can only enter their respective pipelines simultaneously. The split signal decouples the two slots and allows entry into the slot 0 pipeline at least one cycle before entry is allowed in the slot 1 pipeline.
  • In VLIW mode, the split signal 96 is always disabled, so that horizontal dependencies between instructions within a single VLIW word must be resolved by software techniques in the formation of acceptable VLIW instruction words.
  • For some combinations of instructions, the selector 90 will provide an output on line 91 indicating Hdep rather than a split signal on line 92.
  • the signal Hdep on line 91 is only provided in relation to microinstructions supplied to the data unit 18.
  • The microinstruction generator 98 will include in the microinstructions an indication that Hdep from line 91 has been set, and this will be supplied with the microinstructions to the instruction dispatch queue of the data unit 18.
  • one pipeline 13 of the data unit 18 may be executing an instruction to load a value into a register in the register file 26 while the other pipeline 14 is attempting to use the value of that data register as an operand in an arithmetic operation.
  • the use of the Hdep signal in the microinstruction will indicate to the arithmetic operation attempting to locate the correct operand that it is not necessary to stall the pipeline as the value it requires as an operand will already be available from a load data queue 60.
  • the Hdep indication is supplied to the instruction dispatch queue 29 of the data unit 18 as part of the microinstructions. It is however information available for use by the execution units of the data unit and these instructions normally include guard values to determine whether or not the instructions should be executed within the data unit 18. Consequently the bypasses which are indicated as a possibility by inclusion of the Hdep signal will only be activated if the guard values of the instructions confirm that the instructions are to be executed.
  • The two illustrated DU micro-instructions have a horizontal RAW dependency on R1: split is therefore set by the control unit (CU).
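A hedged illustration of this case follows. The mnemonics and the micro-instruction fields are invented for the example (the patent does not give a micro-instruction format), but the behaviour matches the description: the control unit sets a split bit on the slot 1 micro-instruction and dispatch then holds that slot back with a no-operand-fetch cycle.

```python
from dataclasses import dataclass

@dataclass
class MicroInstruction:
    slot: int
    text: str
    split: bool = False     # set by the control unit on a horizontal dependency

def dispatch_cycles(pair):
    """Return the cycle in which each micro-instruction enters its pipeline."""
    cycles = {}
    cycle = 0
    for uop in pair:
        if uop.split:
            cycle += 1      # insert a no-operand-fetch cycle before this slot
        cycles[uop.slot] = cycle
    return cycles

pair = [MicroInstruction(0, "ADD R1, R2, R3"),              # slot 0 writes R1
        MicroInstruction(1, "SUB R4, R1, R5", split=True)]  # slot 1 reads R1 (RAW)
assert dispatch_cycles(pair) == {0: 0, 1: 1}   # slot 1 enters one cycle after slot 0
```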
  • An example of the bypass in such a vertical dependency is shown in Figure 3.
  • This example illustrates a repeated multiply and accumulate operation in two MAC pipeline stages. Two source values 110 and 111 are obtained in one cycle of operation and multiplied in unit 112 during a second cycle. The output 113 is supplied to an accumulate operation 114 in cycle 3. The output 115 of the accumulate operation is fed back through a pipeline bypass to be available to MAC 2 in the next cycle.
  • the accumulate operation 114 has immediately available the result of the preceding accumulate operation without needing to go through the write back stage of the pipeline before the accumulated value is available.
  • The output 115 is therefore fed to the multiplexer 116 at the input of the accumulate operation 114, thereby providing the bypass.
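A behavioural sketch of this repeated multiply and accumulate with the accumulator bypass (a pure software model; it only shows that each accumulate can consume the previous result without waiting for the writeback stage):

```python
def repeated_mac(operand_pairs):
    """Multiply-accumulate a sequence of (source 110, source 111) operand pairs."""
    acc = 0
    for a, b in operand_pairs:
        product = a * b     # multiply stage (unit 112), one cycle
        acc += product      # accumulate stage (114); 'acc' stands in for bypass path 115
    return acc

assert repeated_mac([(1, 2), (3, 4)]) == 14
```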
  • FIG 4 shows in more detail the full bypass circuits available in the particular example of data unit 18 shown in Figure 1 .
  • Pipeline 13 is represented as slot 0 (DU0) whereas pipeline 14 is represented as slot 1 (DU1).
  • each pipeline is shown with the data operand fetch stage 30 followed by the execution stages 31 and 32 and the final data write back stage 35.
  • the first execution stage 31 includes an Arithmetic and Logic Unit (ALU) 33 and a first multiply and accumulate operation (MAC) 34.
  • the second execution stage includes a second multiply and accumulate operation (MAC) 120.
  • Bypasses which exist between various stages of the same pipeline have been indicated by solid lines whereas bypasses which exist between the two pipelines 13 and 14 are indicated in broken lines.
  • FIG. 4 illustrates that each pipeline has four possible bypasses operative within the various stages of the pipeline.
  • Bypass 1 allows the result of the ALU operation to be used immediately in the operand fetch of the next cycle.
  • Bypass 2 allows the result of an MAC operation (available after two cycles of pipeline operation) to be directly used in the next cycle for an ALU operation (which will have needed to stall for one cycle) or in a next cycle for a new MAC operation without any stall.
  • Bypass 3 uses a property of the register files in that a value written into the data register file during a data write back operation is available in the data operand fetch on the same cycle.
  • Bypass 4 acts as a buffer for one cycle for the output of an ALU operation so that the write back into data register file can be done in the same pipe stage for the output of an ALU operation which required one cycle or an MAC operation which required two cycles.
  • Bypass 4 simplifies synchronisation in that all data unit operations have the same latency of 2 cycles even though it is only MAC operations which make use of the second cycle.
  • The bypasses make DU0 and DU1 totally symmetrical.
  • results of different stages of execution may be supplied to an earlier stage of the same pipeline or to an appropriate stage of the parallel pipeline so as to avoid unnecessary stall operations in either of the pipelines.
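An illustrative operand-selection model of these bypasses (the priority order, helper names and tuple representation are assumptions; the patent only identifies the forwarding paths):

```python
from typing import Optional, Tuple

Result = Optional[Tuple[str, int]]   # (destination register, value) or None

def select_operand(reg: str, alu_result: Result, mac_result: Result,
                   wb_result: Result, register_file: dict) -> int:
    """Return the freshest value of 'reg' visible to the data operand fetch stage."""
    for result in (alu_result,   # bypass 1: ALU result usable in the next cycle
                   mac_result,   # bypass 2: MAC result, available after two cycles
                   wb_result):   # bypass 3: writeback value readable in the same cycle
        if result is not None and result[0] == reg:
            return result[1]
    return register_file[reg]    # no forwarding needed: read the register file

# Bypass 4 is not an operand source: it buffers an ALU result for one cycle so that
# ALU and MAC results reach the writeback stage with the same two-cycle latency.
assert select_operand("R1", ("R1", 42), None, None, {"R1": 7}) == 42
assert select_operand("R2", ("R1", 42), None, None, {"R2": 7}) == 7
```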
  • the destination register of a load operation is visible as the source register of an arithmetic operation in the data unit.
  • the destination register of an arithmetic operation in the data unit is visible as the source register of a store operation in the data unit. Consequently in executing a plurality of instructions which are formed as part of the same VLIW word, some instructions will be executed using old register values from the preceding VLIW word whereas other instructions to which Hdep has been applied will use new values related to other instructions in the same VLIW word. This allows the correct data values to be used while minimising the extent of pipeline stalls.
  • a VLIW instruction of this type is shown in Figure 6 .
  • In VLIW instr1, DR1 is a source register of the DU operation and DR4 is the destination register of the DU operation; in VLIW instr2, DR4 is the source register of the DU operation and DR3 is the destination register of the DU operation.
  • Horizontal dependency between instructions 0 and 1 may cause Hdep resulting in the data unit using the new value of DR1 in executing instruction 1.
  • Hdep will not be set for the pair of instructions 1 and 2 as two arithmetic operations will not cause the decoder 82 of Figure 2 to set Hdep. Consequently execution of instruction 2 will use the old value of DR4.
  • Hdep will be set for the horizontal dependency of instructions 2 and 3 and consequently execution of instruction 3 will use the new value of DR3.
  • Hdep may be used to resolve a dependency between two instructions entering the data unit, but it may also be used to resolve dependencies between instructions entering the address unit.
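The Figure 6 example can be restated as a small model of which register value each later instruction sees, assuming instruction 0 is a load and instruction 3 is a store as suggested by the visibility rules above (the dictionaries and function name are illustrative):

```python
old = {"DR1": "old DR1", "DR3": "old DR3", "DR4": "old DR4"}   # from the preceding VLIW word
new = {"DR1": "new DR1", "DR3": "new DR3", "DR4": "new DR4"}   # produced within this word

hdep_set = {(0, 1): True,    # load -> DU op pair: decoder sets Hdep
            (1, 2): False,   # two arithmetic DU ops: Hdep is not set
            (2, 3): True}    # DU op -> following instruction (e.g. a store): Hdep set

def operand_value(pair, reg):
    """Value seen by the later instruction of the pair for register 'reg'."""
    return new[reg] if hdep_set[pair] else old[reg]

assert operand_value((0, 1), "DR1") == "new DR1"   # instruction 1 uses the new DR1
assert operand_value((1, 2), "DR4") == "old DR4"   # instruction 2 uses the old DR4
assert operand_value((2, 3), "DR3") == "new DR3"   # instruction 3 uses the new DR3
```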

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)
  • Executing Machine-Instructions (AREA)

Claims (17)

  1. A method of operating a computer system in which a plurality of instructions are obtained from a memory (11), decoded, and supplied in a common machine cycle to respective parallel execution pipelines (D0, D1, A0, A1), said instructions being grouped in at least two different instruction modes, one being a superscalar mode and the other a very long instruction word (VLIW) mode,
    characterised in that the method comprises:
    using vertical dependency checking circuitry (39, 42) in the pipelines to effect and resolve vertical dependency checks between all instructions supplied in successive machine cycles;
    and using horizontal dependency checking circuitry (87) to effect and resolve a horizontal dependency check between instructions supplied in the same machine cycle for instructions in superscalar mode, the horizontal dependency check being disabled for instructions in VLIW mode.
  2. The method of claim 1, comprising scheduling instructions in a computer system by:
    supplying simultaneously a group of instructions each machine cycle for execution in parallel pipelines (D0, D1, A0, A1);
    decoding each instruction in the group;
    checking the instructions in the group to determine whether any horizontal data dependencies exist between any pair of instructions in the group during execution in a corresponding pair of parallel pipelines;
    and, in response to determination of such a data dependency, providing a split signal to one of the pipelines in order to introduce a temporary delay in one of the pair of pipelines to resolve the data dependency;
    which method further comprises selecting an instruction grouping mode from either a superscalar mode with a first predetermined number of instructions in the same group or a very long instruction word (VLIW) mode with a second, larger, number of instructions in the same group;
    providing a control signal to indicate which grouping mode is selected;
    and using said control signal to prevent supply of a split signal when the VLIW mode is selected and to allow supply of said split signal when the superscalar mode is selected.
  3. The method of claim 2, in which vertical data dependencies are checked for in the instructions supplied in successive cycles to each pipeline and resolved by a bypass or by temporary delays in a pipeline where a vertical dependency is detected.
  4. The method of claim 3, in which vertical data dependencies between instructions in successive cycles on the same pipeline are resolved by effecting the vertical dependency check within the pipeline and operating a bypass or causing a temporary stall of pipeline operation when a vertical data dependency is located.
  5. The method of any preceding claim, in which an instruction dispatch queue (29, 41) is formed in each of the parallel pipelines (D0, D1, A0, A1) and instructions are supplied to the instruction dispatch queue of the respective pipelines after decoding of the instructions.
  6. The method of claim 5, in which, after decoding, each instruction is used to generate the micro-instructions required for each pipeline (D0, D1, A0, A1), which micro-instructions are supplied to the appropriate instruction dispatch queue (29, 41) of each pipeline together with any split signal indicating a horizontal data dependency.
  7. The method of any preceding claim, in which said pipelines (D0, D1, A0, A1) include accesses to a data memory (20), which pipelines comprise a first set of pipelines (A0, A1) used to execute the instructions needed for memory access operations and a second set of pipelines (D0, D1) arranged to carry out arithmetic operations, thereby decoupling the memory access operations from the arithmetic operations.
  8. The method of claim 7, in which two parallel data manipulation pipelines (D0, D1) are provided, each accessing a common set of data registers (29).
  9. The method of claim 8, in which stages of the two data manipulation pipelines (D0, D1) operate synchronously unless a split signal is generated to interrupt the synchronism temporarily.
  10. The method of any one of claims 7 to 9, in which two parallel pipelines (A0, A1) are provided for addressing operations used for memory accesses, said two pipelines accessing a common register file (40) for the memory access operations.
  11. The method of claim 10, in which memory access instructions executed in the two parallel pipelines (A0, A1) used for memory access are executed synchronously in the two pipelines unless the split signal causes a temporary interruption of the synchronism.
  12. The method of any preceding claim, in which the computer system is used as a digital signal processor and said execution pipelines (D0, D1, A0, A1) include the execution of repeated multiply and accumulate operations.
  13. A computer system comprising:
    a plurality of parallel execution pipelines (D0, D1, A0, A1);
    instruction decoding circuitry;
    instruction supply circuitry for supplying simultaneously a group of instructions in each machine cycle to said pipelines through the decoding circuitry; and
    instruction grouping mode circuitry to generate a control signal indicating either instruction grouping in a superscalar mode with a first predetermined number of instructions in the same group, or a very long instruction word (VLIW) having a second, larger, number of instructions in the same group;
    characterised in that the computer system further comprises:
    vertical data dependency checking circuitry (39, 42) in the pipelines, arranged to effect and resolve vertical dependency checks between all instructions supplied in successive machine cycles; and
    data dependency checking circuitry arranged to check the instructions to determine whether any horizontal data dependencies exist between any pair of instructions in a group during execution in a corresponding pair of parallel pipelines, the horizontal dependency check being disabled for instructions in VLIW mode.
  14. The computer system of claim 13, further comprising:
    split signal generating circuitry responsive to said data dependency checking circuitry and to said control signal to produce a split signal introducing a delay in one of said pair of pipelines so as to resolve the horizontal data dependency in superscalar mode, but preventing operation of the split signal introducing the delay in VLIW mode.
  15. The computer system of claim 13 or 14, comprising a data manipulation unit (18) having a plurality of parallel execution pipelines (D0, D1) accessing a first set of registers used to execute instructions for arithmetic operations, and an address unit (19) having a plurality of parallel pipelines (A0, A1) accessing a second set of registers used to execute instructions for memory access operations, the execution of the memory access instructions being decoupled from the execution of the instructions for arithmetic operations.
  16. The computer system of claim 15, in which said split signal generating circuitry (85) is operable to resolve a data dependency between two instructions entering the two pipelines of the data unit (18) simultaneously or to resolve a data dependency between two instructions entering the two address unit (19) pipelines simultaneously.
  17. The computer system of any one of claims 13 to 16, in which said vertical dependency checking circuitry (39, 42) comprises a vertical dependency check circuit in each execution pipeline (D0, D1, A0, A1) for checking for vertical data dependencies between instructions entering that pipeline in successive cycles and for producing a temporary delay in the execution of the pipeline so as to resolve the dependency.
EP99410058A 1999-05-03 1999-05-03 Délivrance d'instructions d'ordinateur Expired - Lifetime EP1050808B1 (fr)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP99410058A EP1050808B1 (fr) 1999-05-03 1999-05-03 Délivrance d'instructions d'ordinateur
DE69938621T DE69938621D1 (de) 1999-05-03 1999-05-03 Befehlausgabe in einem Rechner
US09/563,154 US7281119B1 (en) 1999-05-03 2000-05-02 Selective vertical and horizontal dependency resolution via split-bit propagation in a mixed-architecture system having superscalar and VLIW modes
JP2000134618A JP2000330790A (ja) 1999-05-03 2000-05-08 コンピュータシステム動作方法、コンピュータシステムにおける命令スケジューリング方法およびコンピュータシステム

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP99410058A EP1050808B1 (fr) 1999-05-03 1999-05-03 Délivrance d'instructions d'ordinateur

Publications (2)

Publication Number Publication Date
EP1050808A1 EP1050808A1 (fr) 2000-11-08
EP1050808B1 true EP1050808B1 (fr) 2008-04-30

Family

ID=8242261

Family Applications (1)

Application Number Title Priority Date Filing Date
EP99410058A Expired - Lifetime EP1050808B1 (fr) 1999-05-03 1999-05-03 Délivrance d'instructions d'ordinateur

Country Status (4)

Country Link
US (1) US7281119B1 (fr)
EP (1) EP1050808B1 (fr)
JP (1) JP2000330790A (fr)
DE (1) DE69938621D1 (fr)

Families Citing this family (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6351802B1 (en) * 1999-12-03 2002-02-26 Intel Corporation Method and apparatus for constructing a pre-scheduled instruction cache
US6658551B1 (en) 2000-03-30 2003-12-02 Agere Systems Inc. Method and apparatus for identifying splittable packets in a multithreaded VLIW processor
WO2007143278A2 (fr) 2006-04-12 2007-12-13 Soft Machines, Inc. Appareil et procédé de traitement d'une matrice d'instruction spécifiant des opérations parallèles et dépendantes
EP2527972A3 (fr) 2006-11-14 2014-08-06 Soft Machines, Inc. Appareil et procédé de traitement de formats d'instruction complexes dans une architecture multifilière supportant plusieurs modes de commutation complexes et schémas de virtualisation
US8135975B2 (en) * 2007-03-09 2012-03-13 Analog Devices, Inc. Software programmable timing architecture
WO2009000624A1 (fr) * 2007-06-27 2008-12-31 International Business Machines Corporation Transfert de données dans un processeur
US7769987B2 (en) 2007-06-27 2010-08-03 International Business Machines Corporation Single hot forward interconnect scheme for delayed execution pipelines
US7984272B2 (en) 2007-06-27 2011-07-19 International Business Machines Corporation Design structure for single hot forward interconnect scheme for delayed execution pipelines
US7870368B2 (en) * 2008-02-19 2011-01-11 International Business Machines Corporation System and method for prioritizing branch instructions
US20090210677A1 (en) * 2008-02-19 2009-08-20 Luick David A System and Method for Optimization Within a Group Priority Issue Schema for a Cascaded Pipeline
US20090210672A1 (en) * 2008-02-19 2009-08-20 Luick David A System and Method for Resolving Issue Conflicts of Load Instructions
US8108654B2 (en) * 2008-02-19 2012-01-31 International Business Machines Corporation System and method for a group priority issue schema for a cascaded pipeline
US20090210669A1 (en) * 2008-02-19 2009-08-20 Luick David A System and Method for Prioritizing Floating-Point Instructions
US7996654B2 (en) * 2008-02-19 2011-08-09 International Business Machines Corporation System and method for optimization within a group priority issue schema for a cascaded pipeline
US7865700B2 (en) * 2008-02-19 2011-01-04 International Business Machines Corporation System and method for prioritizing store instructions
US20090210666A1 (en) * 2008-02-19 2009-08-20 Luick David A System and Method for Resolving Issue Conflicts of Load Instructions
US7877579B2 (en) * 2008-02-19 2011-01-25 International Business Machines Corporation System and method for prioritizing compare instructions
US7984270B2 (en) * 2008-02-19 2011-07-19 International Business Machines Corporation System and method for prioritizing arithmetic instructions
US7882335B2 (en) * 2008-02-19 2011-02-01 International Business Machines Corporation System and method for the scheduling of load instructions within a group priority issue schema for a cascaded pipeline
US8095779B2 (en) * 2008-02-19 2012-01-10 International Business Machines Corporation System and method for optimization within a group priority issue schema for a cascaded pipeline
JP2010257199A (ja) * 2009-04-24 2010-11-11 Renesas Electronics Corp プロセッサ及びプロセッサにおける命令発行の制御方法
GB2487684B (en) * 2009-11-16 2016-09-14 Ibm Method for scheduling plurality of computing processes including all-to-all (a2a) communication across plurality of nodes (processors) constituting network, p
CN103250131B (zh) 2010-09-17 2015-12-16 索夫特机械公司 包括用于早期远分支预测的影子缓存的单周期多分支预测
KR101966712B1 (ko) 2011-03-25 2019-04-09 인텔 코포레이션 분할가능한 엔진에 의해 인스턴스화된 가상 코어를 이용한 코드 블록의 실행을 지원하는 메모리 프래그먼트
EP2710481B1 (fr) 2011-05-20 2021-02-17 Intel Corporation Attribution décentralisée de ressources et structures d'interconnexion pour la prise en charge de l'exécution de séquences d'instructions par une pluralité de moteurs
KR101842550B1 (ko) 2011-11-22 2018-03-28 소프트 머신즈, 인크. 다중 엔진 마이크로프로세서용 가속 코드 최적화기
CN104040491B (zh) 2011-11-22 2018-06-12 英特尔公司 微处理器加速的代码优化器
US9558003B2 (en) * 2012-11-29 2017-01-31 Samsung Electronics Co., Ltd. Reconfigurable processor for parallel processing and operation method of the reconfigurable processor
US10275255B2 (en) 2013-03-15 2019-04-30 Intel Corporation Method for dependency broadcasting through a source organized source view data structure
US9904625B2 (en) 2013-03-15 2018-02-27 Intel Corporation Methods, systems and apparatus for predicting the way of a set associative cache
US9569216B2 (en) 2013-03-15 2017-02-14 Soft Machines, Inc. Method for populating a source view data structure by using register template snapshots
WO2014150806A1 (fr) 2013-03-15 2014-09-25 Soft Machines, Inc. Procédé d'alimentation de structure de donnees de vues de registre au moyen d'instantanés de modèle de registre
US10140138B2 (en) 2013-03-15 2018-11-27 Intel Corporation Methods, systems and apparatus for supporting wide and efficient front-end operation with guest-architecture emulation
WO2014150991A1 (fr) 2013-03-15 2014-09-25 Soft Machines, Inc. Procédé de mise en œuvre de structure de données de vue de registre à taille réduite dans un microprocesseur
WO2014151018A1 (fr) 2013-03-15 2014-09-25 Soft Machines, Inc. Procédé pour exécuter des instructions multi-fils groupées en blocs
KR20150130510A (ko) 2013-03-15 2015-11-23 소프트 머신즈, 인크. 네이티브 분산된 플래그 아키텍처를 이용하여 게스트 중앙 플래그 아키텍처를 에뮬레이션하는 방법
US9811342B2 (en) 2013-03-15 2017-11-07 Intel Corporation Method for performing dual dispatch of blocks and half blocks
KR102179385B1 (ko) * 2013-11-29 2020-11-16 삼성전자주식회사 명령어를 실행하는 방법 및 프로세서, 명령어를 부호화하는 방법 및 장치 및 기록매체

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2810068B2 (ja) * 1988-11-11 1998-10-15 株式会社日立製作所 プロセッサシステム、コンピュータシステム及び命令処理方法
CA2016068C (fr) * 1989-05-24 2000-04-04 Robert W. Horst Architecture d'ordinateur pour l'emission d'instructions multiples
JP2911278B2 (ja) * 1990-11-30 1999-06-23 松下電器産業株式会社 プロセッサ
JP2874351B2 (ja) * 1991-01-23 1999-03-24 日本電気株式会社 並列パイプライン命令処理装置
US5408658A (en) * 1991-07-15 1995-04-18 International Business Machines Corporation Self-scheduling parallel computer system and method
EP0551090B1 (fr) * 1992-01-06 1999-08-04 Hitachi, Ltd. Ordinateur possédant une capacité de traitement en parallèle
JP3146707B2 (ja) * 1992-01-06 2001-03-19 株式会社日立製作所 並列演算機能を有する計算機
US5416913A (en) * 1992-07-27 1995-05-16 Intel Corporation Method and apparatus for dependency checking in a multi-pipelined microprocessor
JPH0793152A (ja) * 1993-09-20 1995-04-07 Fujitsu Ltd マイクロプロセッサ制御装置
EP0652510B1 (fr) * 1993-11-05 2000-01-26 Intergraph Corporation Architecture d'ordinateur superscalaire avec ordonnancement logiciel
WO1995022102A1 (fr) * 1994-02-08 1995-08-17 Meridian Semiconductor, Inc. Procede et appareil d'execution simultanee d'instructions dans un microprocesseur a chevauchement
US5727177A (en) * 1996-03-29 1998-03-10 Advanced Micro Devices, Inc. Reorder buffer circuit accommodating special instructions operating on odd-width results
JPH09274567A (ja) * 1996-04-08 1997-10-21 Hitachi Ltd プログラムの実行制御方法及びそのためのプロセッサ
JP3745450B2 (ja) * 1996-05-13 2006-02-15 株式会社ルネサステクノロジ 並列処理プロセッサ
US5832205A (en) * 1996-08-20 1998-11-03 Transmeta Corporation Memory controller for a microprocessor for detecting a failure of speculation on the physical nature of a component being addressed
US6170051B1 (en) * 1997-08-01 2001-01-02 Micron Technology, Inc. Apparatus and method for program level parallelism in a VLIW processor
US6076159A (en) * 1997-09-12 2000-06-13 Siemens Aktiengesellschaft Execution of a loop instructing in a loop pipeline after detection of a first occurrence of the loop instruction in an integer pipeline

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI402752B (zh) * 2009-01-07 2013-07-21 Azbil Corp Information processing device, scheduler, and scheduling method

Also Published As

Publication number Publication date
DE69938621D1 (de) 2008-06-12
EP1050808A1 (fr) 2000-11-08
JP2000330790A (ja) 2000-11-30
US7281119B1 (en) 2007-10-09

Similar Documents

Publication Publication Date Title
EP1050808B1 (fr) Délivrance d'instructions d'ordinateur
EP0111776B1 (fr) Processeur d'interruption
US5185872A (en) System for executing different cycle instructions by selectively bypassing scoreboard register and canceling the execution of conditionally issued instruction if needed resources are busy
JP2864421B2 (ja) 命令の多機能ユニットへの同時ディスパッチのための方法及び装置
US6260189B1 (en) Compiler-controlled dynamic instruction dispatch in pipelined processors
EP0150177A1 (fr) Systeme de traitement de donnees
JP2010526392A (ja) システムおよびパイプラインプロセッサにおける条件命令実行の加速のためのローカル条件コードレジスタの使用方法
US20010005882A1 (en) Circuit and method for initiating exception routines using implicit exception checking
JPH02201651A (ja) データ処理装置
JPH1021074A (ja) 割り込み制御方式、プロセッサ及び計算機システム
US6725365B1 (en) Branching in a computer system
JP2620511B2 (ja) データ・プロセッサ
EP0378415A2 (fr) Mécanisme d'aiguillage de plusieurs instructions
US11789742B2 (en) Pipeline protection for CPUs with save and restore of intermediate results
US7111152B1 (en) Computer system that operates in VLIW and superscalar modes and has selectable dependency control
US6629238B1 (en) Predicate controlled software pipelined loop processing with prediction of predicate writing and value prediction for use in subsequent iteration
US5745725A (en) Parallel instruction execution with operand availability check during execution
JP2874351B2 (ja) 並列パイプライン命令処理装置
US6401195B1 (en) Method and apparatus for replacing data in an operand latch of a pipeline stage in a processor during a stall
EP1050805B1 (fr) Transmission de valeurs de protection dans un système d' ordinateur
US6789185B1 (en) Instruction control apparatus and method using micro program
US6260133B1 (en) Processor having operating instruction which uses operation units in different pipelines simultaneously
US6721873B2 (en) Method and apparatus for improving dispersal performance in a processor through the use of no-op ports
EP1050800A1 (fr) Unité d'exécution en pipeline
CN112579168B (zh) 指令执行单元、处理器以及信号处理方法

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): DE FR GB IT

AX Request for extension of the european patent

Free format text: AL;LT;LV;MK;RO;SI

17P Request for examination filed

Effective date: 20010419

AKX Designation fees paid

Free format text: DE FR GB IT

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: STMICROELECTRONICS S.A.

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): DE FR GB IT

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

REF Corresponds to:

Ref document number: 69938621

Country of ref document: DE

Date of ref document: 20080612

Kind code of ref document: P

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20080530

Year of fee payment: 10

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20080626

Year of fee payment: 10

ET Fr: translation filed
PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20080731

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed

Effective date: 20090202

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20080430

GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 20090503

REG Reference to a national code

Ref country code: FR

Ref legal event code: ST

Effective date: 20100129

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: FR

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20090602

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20090503