EP1290548A2 - Synchronization of partially pipelined instructions in VLIW processors - Google Patents

Synchronization of partially pipelined instructions in VLIW processors

Info

Publication number
EP1290548A2
EP1290548A2 EP01938991A EP01938991A EP1290548A2 EP 1290548 A2 EP1290548 A2 EP 1290548A2 EP 01938991 A EP01938991 A EP 01938991A EP 01938991 A EP01938991 A EP 01938991A EP 1290548 A2 EP1290548 A2 EP 1290548A2
Authority
EP
European Patent Office
Prior art keywords
pipeline
subcommand
subcommands
execution
instruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP01938991A
Other languages
German (de)
English (en)
Inventor
Marc Tremblay
Sharada Yeluri
Jeffrey Meng Wah Chan
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Microsystems Inc
Original Assignee
Sun Microsystems Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2000-06-02
Filing date
2001-05-30
Publication date
2003-03-12
Application filed by Sun Microsystems Inc
Publication of EP1290548A2

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/3001 Arithmetic instructions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3853 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution of compound instructions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3861 Recovery, e.g. branch miss-prediction, exception handling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3867 Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines

Definitions

  • The invention relates to the field of very long instruction word (VLIW) processor architecture.
  • In particular, the invention relates to synchronization of subcommands in the pipelines of VLIW machines.
  • A pipeline is generally a hardware execution unit that provides multiple stages of execution, each stage occupying one or more clock cycles. Several instructions, each at a different stage, may execute simultaneously. Modern computing machines often possess two or more execution unit pipelines, each providing multiple stages of execution. For example, a processor may have an integer execution pipeline for executing integer subcommands and a floating point execution pipeline for executing floating point subcommands. Often, the integer execution pipeline and the floating point execution pipeline of a machine can execute one or more stages of subcommands simultaneously.
  • A late stage often found in typical execution pipelines is a "trap" stage. The trap stage is where it is determined whether a processor exception (including any of the trap conditions in the IEEE 754 specification) is to occur, or whether an interrupt is to suspend execution of the currently executing instruction sequence.
  • When a trap or interrupt occurs, it is often necessary to determine the state of the processor at the time the trap or interrupt occurred. It is preferable that traps be handled precisely, such that they can be easily diagnosed, possible corrections made, and execution resumed. If traps are to be handled precisely, it is desirable that no prior-state data be overwritten by any current or later instruction before all pipelines executing the current instruction reach the trap stage, when it can be determined whether a trap is required.
  • Many engineers believe that a processor having a large number of pipelines capable of executing in parallel can provide better overall performance than one that has fewer pipelines, provided that instructions can be decoded, operands fetched, and both fed to the pipelines in parallel and at a high rate.
  • Unfortunately, many processors execute binary instruction languages that have no inherent parallelism. To execute instructions in parallel, such processors must parse their instruction sequence and discover which instructions can be executed in parallel, a non-trivial task. Further, this parsing for potential parallelism is done at execution time, and therefore must be done quickly by very complex hardware.
  • VLIW processors normally execute a binary instruction language that has explicit parallelism, where each instruction may incorporate subcommands for simultaneous execution in parallel in separate pipelines. This generally requires more bits per binary instruction word than are required on conventional processors, because each subcommand requires a bit field in the instruction word. These instruction words therefore become very long, hence the term Very Long Instruction Word.
  • VLIW processors allow the parsing for potential parallelism to be separated from execution; this parsing may therefore occur in a separate instruction translation unit, or may be done at compile time. Parsing at compile time for potential parallelism has the advantage that it can permit the use of simpler hardware than would otherwise be required for the same high performance.
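As a rough illustration of an instruction word carrying one bit field per subcommand, the sketch below packs several subcommand encodings into a single wide word and unpacks them again. The field width (`SLOT_BITS`), the number of fields (`SLOTS`), and the function names are assumptions made for illustration only; the source does not specify an encoding.

```python
# Hypothetical layout: one fixed-width field per pipeline in each VLIW word.
SLOT_BITS = 32   # assumed width of one subcommand field
SLOTS = 4        # assumed number of subcommand fields per instruction word


def pack_word(subcommands):
    """Pack SLOTS subcommand encodings into one long instruction word."""
    if len(subcommands) != SLOTS:
        raise ValueError("expected one subcommand per slot")
    word = 0
    for slot, sub in enumerate(subcommands):
        if not 0 <= sub < (1 << SLOT_BITS):
            raise ValueError("subcommand does not fit in its field")
        # Slot 0 occupies the least significant field.
        word |= sub << (slot * SLOT_BITS)
    return word


def unpack_word(word):
    """Split a long instruction word back into its subcommand fields."""
    mask = (1 << SLOT_BITS) - 1
    return [(word >> (slot * SLOT_BITS)) & mask for slot in range(SLOTS)]
```

With these assumed sizes, each instruction word is 128 bits wide, which shows why the word length grows with the number of pipelines that can be addressed in parallel.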
  • Many processors have pipelines that do not execute all instructions in exactly the same number of clock cycles. It is known that division generally requires more clock cycles than multiplication. Further, it is known that integer operations are much simpler to execute than floating point operations, hence floating point pipelines generally require more clock cycles to perform an addition than integer pipelines do.
  • The number of clock cycles a pipeline takes to execute an instruction is the latency of the pipeline. While extra stages may be added to integer pipelines so that they have the same latency as a floating point pipeline of the same machine, and extra stages may be added to a floating point pipeline so that all instructions have the same latency, this is known to be inefficient. It is desirable that subcommands executing in a pipeline complete as early as possible unless a dependency requires that they wait for another pipeline to complete execution. Many modern processors, including VLIW processors, are optimized for execution of thirty-two-bit data.
  • sixty-four-bit-operand instructions such as double-precision floating point instructions as described in the IEEE-754 floating point specification
  • A sixty-four-bit multiply may be executed in a thirty-two-bit array multiplier stage of an integer pipeline by making four passes through the array multiplier, while a thirty-two-bit multiply requires only one pass through the array multiplier.
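The four-pass multiply can be sketched as four thirty-two-bit partial products that are shifted and summed. This is a minimal arithmetic model that assumes unsigned operands; `mul32` and `mul64_four_pass` are illustrative names, and the source does not describe how the hardware accumulates the partial products.

```python
MASK32 = 0xFFFFFFFF


def mul32(x, y):
    """Model one pass through a 32x32 array multiplier (64-bit product)."""
    assert 0 <= x <= MASK32 and 0 <= y <= MASK32
    return x * y


def mul64_four_pass(a, b):
    """Multiply two unsigned 64-bit values using four 32-bit passes."""
    a_lo, a_hi = a & MASK32, a >> 32
    b_lo, b_hi = b & MASK32, b >> 32
    # Each tuple is (partial product, bit position of that product).
    passes = [
        (mul32(a_lo, b_lo), 0),
        (mul32(a_lo, b_hi), 32),
        (mul32(a_hi, b_lo), 32),
        (mul32(a_hi, b_hi), 64),
    ]
    # The sum of the shifted partial products is the 128-bit result.
    return sum(product << shift for product, shift in passes)
```

A thirty-two-bit multiply corresponds to a single call to `mul32`, which is why it needs only one pass.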
  • Execution of a sixty-four-bit instruction in a pipeline may be controlled by passing a base instruction into the pipeline, followed by a helper instruction.
  • A base instruction may, for example, process the low half of a sixty-four-bit addition, saving the carry output from the addition.
  • A following helper instruction for the pipeline may then process the high half of the sixty-four-bit addition, injecting the saved carry during the addition.
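The base and helper steps of a sixty-four-bit addition can be modelled as two thirty-two-bit additions linked by a saved carry. The following is a minimal sketch that assumes unsigned operands; the function names are hypothetical and do not come from the source.

```python
MASK32 = 0xFFFFFFFF


def base_add(a_lo, b_lo):
    """Base subcommand: add the low halves and return (sum_lo, carry_out)."""
    total = a_lo + b_lo
    return total & MASK32, total >> 32


def helper_add(a_hi, b_hi, carry_in):
    """Helper subcommand: add the high halves plus the saved carry."""
    total = a_hi + b_hi + carry_in
    return total & MASK32, total >> 32


def add64(a, b):
    """Add two unsigned 64-bit values on a 32-bit datapath model."""
    lo, carry = base_add(a & MASK32, b & MASK32)
    hi, carry_out = helper_add(a >> 32, b >> 32, carry)
    return (hi << 32) | lo, carry_out
```

For example, `add64(0xFFFFFFFF, 1)` carries out of the low half, so the helper step produces a high half of 1 and the combined result is `0x100000000` with no final carry.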
  • VLIW processors provide for parallel execution of subcommands of instructions in multiple, often dissimilar, pipelines. These subcommands tend to complete at slightly different times, especially if some are thirty-two-bit and some are sixty-four-bit and the pipelines are optimized for thirty-two-bit data. Worse, a given pipeline may complete some subcommands substantially more quickly than others. Hence, a given pipeline may have a latency that varies from instruction to instruction.
  • Figure 1 is a block diagram of a computer system having a VLIW processor;
  • Figure 2 is a block diagram of a prior-art VLIW processor having several pipelines that does not enforce precise traps;
  • Figure 3 is a timing diagram of data flow in the pipelines of a VLIW processor, showing how subcommands tend to reach the trap stage at different times if no stalls are injected;
  • Figure 4 is a block diagram of a VLIW processor incorporating the present invention;
  • Figure 5 is a timing diagram of data flow in the pipelines of the preferred embodiment of a VLIW processor of the present invention, showing the stall states required to align the trap stages of several pipelines after the execution of the instruction; and
  • Figure 6 is a timing diagram of data flow in the pipelines of a VLIW processor of the present invention, showing the stall states required to align the trap stages of several pipelines during execution of the instruction.
  • A computer system has at least one processor 100 (Figure 1) having an internal first level cache.
  • The system also has a second level cache 101 and may, but need not, have a third level cache 102.
  • There may, but need not, be an additional processor 105, having its own second level and optional third level caches (not shown).
  • References by the processor 100 that are not satisfied from cache are directed over a high speed local bus 106 to a main memory 107 or through a bus bridge 108 to a system bus 109, which is preferably a PCI bus.
  • Attached to the PCI bus is a storage controller 115, typically of the Ultra Wide SCSI type, for connection to one or more storage subsystems 116.
  • The storage subsystems 116 typically include a CD reader and/or writer and a disk drive; multiple disk drives may be utilized, as may other peripherals, such as RAID storage systems and tape drives.
  • Many computer systems also have a video display subsystem 118, a network interface 120, a USB (universal serial bus) interface 122, as well as keyboard, mouse, serial, printer, and floppy disk ports 124.
  • The first level cache of the processor 100 may be implemented as a separate instruction cache 126 and data cache 128; alternatively these may be combined into a single fast combined cache.
  • The processor 100 of the computer system may be a VLIW processor.
  • Instructions from the instruction cache 126 are aligned by an instruction aligner 200 (Figure 2) and buffered in an instruction buffer 202. Instructions are then processed by an instruction decoder and dispatcher 204 and dispatched to the various execution pipelines 206, 208, and 210 of the processor.
  • Figure 2 illustrates three pipelines of a machine that may have more than three pipelines.
  • The pipelines illustrated have operand fetch stages 212, 214, and 216 that are connected to and may fetch operands from a register file 218, and operand store stages 220, 222, and 224 that are connected to and may store results to the register file 218.
  • The processor also has a load/store unit 226 for transferring data between the data cache 128 and the register file 218.
  • Assume that a VLIW instruction dispatches a thirty-two-bit subtract subcommand to one pipeline and a sixty-four-bit add subcommand to another pipeline. Further, assume, as in the preferred embodiment of the present invention, that sixty-four-bit operations are performed through execution of a sequence of thirty-two-bit sub-operations. As illustrated in Figure 3, the sixty-four-bit subcommand then requires a pair of fetch operations 300 to fetch the operands, each taking a cycle in the fetch stage of the pipeline upon which it executes, and a pair of operate cycles 301, while the thirty-two-bit subcommand requires fewer cycles 302 to fetch operands and to operate 303 upon them, because the thirty-two-bit operands match the width of the datapath of the pipeline.
  • Stall stage 307 may take the form of a recycling of the operation within a stage of the pipeline, assuming that the stage has adequate storage in a recycle buffer for the longer operands and for intermediate or final results. Therefore, if no stall is injected into the thirty-two-bit subcommand, the sixty-four-bit subcommand will reach its trap cycle 305 after the thirty-two-bit subcommand reaches its trap cycle 306.
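The skew between trap cycles can be sketched with a simplified three-stage model (fetch, operate, trap) in which a subcommand is split into thirty-two-bit sub-operations issued on consecutive cycles. The stage count, the cycle numbering, and the name `trap_cycle` are assumptions for illustration; the real pipelines may have more stages.

```python
def trap_cycle(passes, issue_cycle=1):
    """Cycle in which a subcommand reaches the trap stage with no stalls.

    `passes` is the number of 32-bit sub-operations the subcommand needs
    (1 for a 32-bit operation, 2 for a 64-bit add in this model).
    """
    last_fetch = issue_cycle + passes - 1   # final sub-operation is fetched
    last_operate = last_fetch + 1           # and operated on one cycle later
    return last_operate + 1                 # trap stage follows the operate stage


sub32_trap = trap_cycle(passes=1)   # thirty-two-bit subtract
add64_trap = trap_cycle(passes=2)   # sixty-four-bit add
skew = add64_trap - sub32_trap      # cycles by which the traps are misaligned
```

In this model the thirty-two-bit subcommand reaches its trap stage one cycle before the sixty-four-bit subcommand, which is the misalignment that a stall has to absorb.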
  • Instructions are received from the instruction cache 126 into an instruction aligner 400 (Figure 4) and instruction buffer 402, and processed by an instruction decoder and dispatcher 404.
  • A helper subcommand inserter 406, which may be a part of the instruction decoder and dispatcher 404, inserts helper subcommands as required for proper execution of any sixty-four-bit or other subcommands that require additional time in execution.
  • Suppose a sixty-four-bit addition is executed in a first pipeline 410 of the processor, the first pipeline being based upon thirty-two-bit data paths.
  • A fetch stage 412 of the pipeline therefore fetches the low halves of the operands in a cycle 500 (Figure 5). The fetch stage 412 fetches the high halves of the operands in a following cycle 501, while the operate stage 414 executes the addition on the low halves of the operands 502.
  • The helper subcommand executes 504 in the operate stage 414.
  • The result for the low halves of the operands is held 506 in a recycle buffer 416 of the operate stage, or in a stall stage of the pipeline.
  • Any trap conditions are resolved, and in the next cycles 512 and 514 the results of the operation are stored in the register file 420 by storage stage 418 of the pipeline.
  • When a sixty-four-bit subcommand is processed that requires one cycle of operation before the trap stage beyond the timing of a thirty-two-bit subcommand from the same VLIW instruction word, the helper subcommand inserter 406 also inserts a NOP, or no-operation, helper subcommand into the instruction stream flowing to the second pipeline 425 in which that thirty-two-bit subcommand executes, and marks that subcommand as having a stall.
  • Each NOP helper subcommand, also known as a helper stall subcommand, is inserted after the subcommand dispatched to a pipeline, and causes data at the stage prior to the trap stage of the associated pipeline to remain unchanged, or be recycled, for one cycle.
  • This NOP helper subcommand is not injected if the first pipeline 410 receives a thirty-two-bit subcommand having similar timing to that intended for the second pipeline 425. That thirty-two-bit subcommand therefore executes with a fetch in the first cycle 530 and a NOP in the second cycle 532 at the fetch stage.
  • The operate stage executes the operation 534, and in the third cycle the operate stage does a NOP 536.
  • The results of the operation are held 538, such that they enter the trap cycle 540, or stage 430, of the second pipeline 425 simultaneously with the results of the sixty-four-bit subcommand executing in the first pipeline 410 reaching its trap stage 422.
  • The results of the long, sixty-four-bit operation and the shorter, thirty-two-bit operation are therefore synchronized at the trap stages.
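The effect of a trailing NOP helper can be sketched by listing which sub-operation occupies the fetch and operate stages in each cycle. This is a simplified model with assumed names (`timeline`, `add64.lo`, `add64.hi`, `sub32`) and an assumed three-stage pipeline; it is meant only to show that both pipelines reach the trap stage in the same cycle.

```python
def timeline(subops):
    """Return (rows, trap) for sub-operations issued one per cycle.

    rows maps a cycle number to the sub-operation held in each stage;
    trap is the cycle in which the subcommand enters the trap stage.
    """
    rows = {}
    for i, name in enumerate(subops):
        fetch_cycle = i + 1       # sub-operation i is fetched in cycle i + 1
        operate_cycle = i + 2     # and reaches the operate stage a cycle later
        rows.setdefault(fetch_cycle, {})["fetch"] = name
        rows.setdefault(operate_cycle, {})["operate"] = name
    trap = len(subops) + 2        # trap stage follows the last operate cycle
    return rows, trap


# First pipeline: 64-bit add as a base sub-operation plus one helper.
rows_a, trap_a = timeline(["add64.lo", "add64.hi"])
# Second pipeline: 32-bit subtract followed by an inserted NOP helper.
rows_b, trap_b = timeline(["sub32", "NOP"])
```

In the second pipeline, cycle 2 holds the NOP in the fetch stage while the subtract is in the operate stage, and cycle 3 holds the NOP in the operate stage, so both subcommands enter their trap stages together in cycle 4 of this model.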
  • When a sixty-four-bit subcommand is processed that requires one cycle of operation before the trap stage beyond the timing of a thirty-two-bit subcommand from the same VLIW instruction word, the helper subcommand inserter 406 also inserts a NOP, or no-operation, helper subcommand into the instruction stream flowing to the second pipeline 425 in which that thirty-two-bit subcommand executes.
  • Each NOP helper subcommand, also known as a helper stall subcommand, is dispatched ahead of the associated thirty-two-bit instruction, as shown in Figure 6. This NOP helper subcommand is not injected if the first pipeline 410 receives a thirty-two-bit subcommand having similar timing to that intended for the second pipeline 425. Execution of the sixty-four-bit subcommand in this embodiment is as shown in Figure 5.
  • The thirty-two-bit subcommand therefore executes with a fetch in the second cycle 630 and a NOP in the first cycle 632 at the fetch stage.
  • The operate stage executes the operation 534, and in the second cycle the operate stage does a NOP 536.
  • Data associated with the subcommand then enters the trap cycle 640, or stage 430, of the second pipeline 425 simultaneously with the results of the sixty-four-bit subcommand executing in the first pipeline 410 reaching its trap stage 422 at 512.
  • The results of the long, sixty-four-bit operation and the shorter, thirty-two-bit operation are therefore synchronized at the trap stages.
  • There are subcommands where multiple NOP helper subcommands must be entered into the pipeline to ensure that the results of a sixty-four-bit subcommand are synchronized with the results of a thirty-two-bit subcommand at the trap stage.
  • Sixty-four-bit multiplication operations are performed in a thirty-two-bit array multiplier. Multiplying a pair of sixty-four-bit operands in a thirty-two-bit multiplier, generating a one-hundred-twenty-eight-bit result, requires four passes through the multiplier.
  • The VLIW instruction word is therefore decoded to insert three helper subcommands into the instruction stream for a pipeline receiving a sixty-four-bit multiply subcommand.
  • The VLIW instruction word having such a multiply subcommand is decoded to inject three NOP helper subcommands after each simultaneously executed thirty-two-bit addition or subtraction operation, instead of the one required when a sixty-four-bit addition is executed.
  • Any simultaneously executing sixty-four-bit addition receives two NOP helper subcommands, such that its results will be held in the recycle buffers of the execute stage and enter the trap stage simultaneously with the results of the sixty-four-bit multiply subcommand.
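The NOP counts given above follow from padding every subcommand in an instruction word up to the pass count of the longest one. The sketch below assumes a hypothetical table `PASSES` of passes per subcommand kind (one for thirty-two-bit operations, two for a sixty-four-bit add, four for a sixty-four-bit multiply, as stated in the text); the operation names are illustrative labels only.

```python
# Assumed number of 32-bit passes per subcommand kind (base plus helpers).
PASSES = {"add32": 1, "sub32": 1, "add64": 2, "mul64": 4}


def nop_helpers(word):
    """Return the number of NOP helpers needed by each subcommand in a word.

    `word` is a list of subcommand kinds, one per pipeline. Each entry is
    padded so that all pipelines reach the trap stage in the same cycle.
    """
    longest = max(PASSES[op] for op in word)
    return [longest - PASSES[op] for op in word]
```

For a word holding a sixty-four-bit multiply, a thirty-two-bit add, and a sixty-four-bit add, `nop_helpers(["mul64", "add32", "add64"])` gives `[0, 3, 2]`, matching the three and two NOP helper subcommands described above.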
  • It is expected that the number of pipeline stages may vary from those discussed, and that operations other than addition and subtraction may be executed. It is also expected that the present invention is applicable to machines that process data in words of lengths other than the thirty-two-bit and sixty-four-bit lengths operating on thirty-two-bit hardware herein disclosed, it being applicable to machines having multiple pipelines where a pipeline may execute upon a longer data word than the width of the hardware.
  • The no-operation helper subcommand herein described as following a thirty-two-bit subcommand may instead be injected ahead of the thirty-two-bit subcommand, such that the results of the thirty-two-bit subcommand reach the trap stage simultaneously with the results of a sixty-four-bit subcommand.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Advance Control (AREA)
  • Executing Machine-Instructions (AREA)

Abstract

A very long instruction word (VLIW) processor has multiple pipelines (410, 425) for executing subcommands of a VLIW instruction in parallel. Each pipeline has at least one execution stage (412, 414) and a trap stage (422, 430). At least one of these stages can operate on operands of a first and a second word length, the second word length being longer than the first, and the first word length being the same as the width of the datapath of the pipeline (410, 425). Executing operations on operands of the second word length requires multiple cycles in at least one execution stage (412, 414) of the pipeline. An instruction decoder (404) decodes subcommands of a sequence of VLIW instructions into pipeline subcommands and dispatches them to the first and second pipelines (410, 425). The instruction decoder (404) injects at least one helper subcommand into the first pipeline (410) when a first subcommand of the VLIW instruction operates on operands of the second word length. The instruction decoder also inserts no-operation helper subcommands into the second pipeline (425), as necessary, to ensure that information associated with the first subcommand enters a trap stage (422) of the first pipeline (410) in synchrony with information associated with a second subcommand of the same VLIW instruction, dispatched to the second pipeline (425), entering a trap stage (430) of the second pipeline (425). These no-operation helper subcommands maintain synchronous arrival of information at the trap stages (422, 430), even if the first subcommand operates on operands of the second word length and the second subcommand operates on operands of the first word length.
EP01938991A 2000-06-02 2001-05-30 Synchronization of partially pipelined instructions in VLIW processors Withdrawn EP1290548A2

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US58619000A 2000-06-02 2000-06-02
US586190 2000-06-02
PCT/US2001/010839 WO2001095101A2 2000-06-02 2001-05-30 Synchronization of partially pipelined instructions in VLIW processors

Publications (1)

Publication Number Publication Date
EP1290548A2 2003-03-12

Family

ID=24344684

Family Applications (1)

Application Number Title Priority Date Filing Date
EP01938991A Withdrawn EP1290548A2 Synchronization of partially pipelined instructions in VLIW processors 2000-06-02 2001-05-30

Country Status (5)

Country Link
EP (1) EP1290548A2 (fr)
JP (1) JP2003536132A (fr)
KR (1) KR20030017982A (fr)
AU (1) AU2001264560A1 (fr)
WO (1) WO2001095101A2 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11204768B2 (en) 2019-11-06 2021-12-21 Onnivation Llc Instruction length based parallel instruction demarcator

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6934951B2 (en) * 2002-01-17 2005-08-23 Intel Corporation Parallel processor with functional pipeline providing programming engines by supporting multiple contexts and critical section
EP1479039A1 (fr) * 2002-02-26 2004-11-24 Eisei Matsumura Dispositif de commande d'imprimante
JP5395383B2 (ja) * 2008-08-21 2014-01-22 株式会社東芝 パイプライン演算プロセッサを備える制御システム

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0779577B1 (fr) * 1993-10-18 2002-05-22 VIA-Cyrix, Inc. Commande de pipeline et traduction de registre pour microprocesseur
US6128721A (en) * 1993-11-17 2000-10-03 Sun Microsystems, Inc. Temporary pipeline register file for a superpipelined superscalar processor
TW448403B (en) * 1995-03-03 2001-08-01 Matsushita Electric Ind Co Ltd Pipeline data processing device and method for executing multiple data processing data dependent relationship
US6279100B1 (en) * 1998-12-03 2001-08-21 Sun Microsystems, Inc. Local stall control method and structure in a microprocessor

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO0195101A2 *

Also Published As

Publication number Publication date
AU2001264560A1 (en) 2001-12-17
JP2003536132A (ja) 2003-12-02
KR20030017982A (ko) 2003-03-04
WO2001095101A3 (fr) 2002-03-21
WO2001095101A2 (fr) 2001-12-13

Similar Documents

Publication Publication Date Title
US6675376B2 (en) System and method for fusing instructions
US5799165A (en) Out-of-order processing that removes an issued operation from an execution pipeline upon determining that the operation would cause a lengthy pipeline delay
US6279105B1 (en) Pipelined two-cycle branch target address cache
US5559977A (en) Method and apparatus for executing floating point (FP) instruction pairs in a pipelined processor by stalling the following FP instructions in an execution stage
US6085312A (en) Method and apparatus for handling imprecise exceptions
US20070022277A1 (en) Method and system for an enhanced microprocessor
US5619664A (en) Processor with architecture for improved pipelining of arithmetic instructions by forwarding redundant intermediate data forms
KR100507415B1 (ko) 마이크로프로세서내의공유데이터경로를통해정수데이터및부동소수점데이터를통신하기위한장치및그통신방법
US7418580B1 (en) Dynamic object-level code transaction for improved performance of a computer
US7228403B2 (en) Method for handling 32 bit results for an out-of-order processor with a 64 bit architecture
JP3773769B2 (ja) 命令のインオーダ処理を効率的に実行するスーパースケーラ処理システム及び方法
US5590351A (en) Superscalar execution unit for sequential instruction pointer updates and segment limit checks
US6021488A (en) Data processing system having an apparatus for tracking a status of an out-of-order operation and method thereof
US7809932B1 (en) Methods and apparatus for adapting pipeline stage latency based on instruction type
US6539471B2 (en) Method and apparatus for pre-processing instructions for a processor
US6799266B1 (en) Methods and apparatus for reducing the size of code with an exposed pipeline by encoding NOP operations as instruction operands
US20100211762A1 (en) Mechanism for Efficient Implementation of Software Pipelined Loops in VLIW Processors
JP2001142699A (ja) パイプラインプロセッサにおける命令データの転送メカニズム
US20040230782A1 (en) Method and system for processing loop branch instructions
US6829699B2 (en) Rename finish conflict detection and recovery
US6115730A (en) Reloadable floating point unit
JP2001142701A (ja) プロセッサにおけるパイプライン制御用メカニズムおよび方法
WO2001095101A2 Synchronization of partially pipelined instructions in VLIW processors
US6044460A (en) System and method for PC-relative address generation in a microprocessor with a pipeline architecture
Ozer et al. A fast interrupt handling scheme for VLIW processors

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20021227

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR

AX Request for extension of the european patent

Extension state: AL LT LV MK RO SI

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: SUN MICROSYSTEMS, INC.

17Q First examination report despatched

Effective date: 20030508

RBV Designated contracting states (corrected)

Designated state(s): DE FR GB

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20061010