EP1105793A4 - Processing element with special application for branch functions - Google Patents

Processing element with special application for branch functions

Info

Publication number
EP1105793A4
Authority
EP
European Patent Office
Prior art keywords
processor
instructions
instruction
branch
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP99943848A
Other languages
German (de)
English (en)
Other versions
EP1105793A1 (fr
Inventor
Rajit Manohar
Alain Martin
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
California Institute of Technology CalTech
Original Assignee
California Institute of Technology CalTech
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by California Institute of Technology CalTech
Publication of EP1105793A1
Publication of EP1105793A4
Legal status: Withdrawn (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/43 Checking; Contextual analysis
    • G06F8/433 Dependency analysis; Data or control flow analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/44 Encoding
    • G06F8/445 Exploiting fine grain parallelism, i.e. parallelism at instruction level
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3802 Instruction prefetching
    • G06F9/3804 Instruction prefetching for branches, e.g. hedging, branch folding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3802 Instruction prefetching
    • G06F9/3808 Instruction prefetching for instruction reuse, e.g. trace cache, branch target cache
    • G06F9/381 Loop buffering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3824 Operand accessing
    • G06F9/3826 Bypassing or forwarding of data results, e.g. locally between pipeline stages or within a pipeline stage
    • G06F9/3828 Bypassing or forwarding of data results, e.g. locally between pipeline stages or within a pipeline stage with global bypass, e.g. between pipelines, between clusters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3824 Operand accessing
    • G06F9/383 Operand prefetching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3842 Speculative instruction execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3851 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units

Definitions

  • The present application describes a processor architecture that uses two independent instruction streams: one for the main processor, which decodes and performs the instructions forming the actual computation, and another for the branch processor, which determines the sequence of program counter values used to fetch instructions for the main processor.
  • Branches can be found in many places in programs. Examples include branches to subroutine calls, loops and if statements. Fixed-length loops and subroutine calls make it possible to predict, when the program is compiled, how the branches will behave.
  • the present application describes a processor architecture that provides additional information that has an application in determining branch information.
  • This processor uses two separate instruction streams.
  • a main or data processor instruction stream includes the instruction information.
  • the branch processor instruction stream determines the program flow.
  • One aspect describes an asynchronous processor which carries out this function.
  • Another aspect teaches a synchronous design style.
  • Figure 1 shows a block diagram of a basic branch processor-modified processing system.
  • a processor that does not have a branch delay slot has a number of different instruction types. Different instructions are used to compute different functions. Each instruction has implicit control flow information based on its content. An instruction such as
  • The next program counter is either pc+4 or L, depending on whether or not registers r1 and r3 are equal.
  • a processor computes the sequence of program counter values.
  • existing instruction sets can encode this information very inefficiently from the density point of view. Most of the time, an instruction needs to be examined simply to determine that the next instruction to be executed is at pc+4.
  • the present application defines a new separate and independent instruction sequence that encodes the sequence of program counter (PC) values.
  • a program such as the one shown in (1) is compiled into two instruction streams. A first instruction stream determines the computations to be performed. The second instruction stream determines the flow of control.
  • A "data processor" executes the first instruction stream that determines the computation to be performed. This is a traditional processor operation.
  • the second instruction stream includes branch processor instructions. These are executed on a separate processor, the branch processor 110.
  • this second instruction stream is a sequence of instructions that computes, or includes information to compute, the sequence of program counter values.
  • The program counter information is sent to memory over memory interface 102.
  • The sequence of instructions 104 is responsively received from memory. These instructions are passed as instruction stream 106 to the data processor 120, which is shown as a traditional processor with an instruction decoder and instruction-executing registers.
  • the branch processor 110 also receives specified feedback from the data processor 120.
  • The feedback indicates information such as synchronization information from a sync register 124. This information is used by certain types of instructions to enable information from the data processor 120 to control, and provide information to, the branch processor 110. For example, some feedback from the data processor 120 is obtained when executing code that has conditional branches.
  • the feedback channel 122 can also occasionally synchronize the two processors.
  • Control flow in a program normally follows a call/return pattern.
  • a hardware stack in the branch processor is used for storing program counter values. However, there are times when the control flow information is only available at run time.
  • A special instruction in the main data processor called "send!" is defined to allow executing programs when the control flow is only available at run time.
  • the send instruction sends a data value from the data processor 120 to the branch processor 110 via the synchronization channel 122.
  • The branch processor provides instructions that read data from this channel. Instructions that read values from this channel have a "?" appended to their names.
  • the information is described in the following.
  • In the instructions, addr refers to the address of instructions to be executed on the data processor, and baddr refers to addresses of branch processor instructions.
  • Block fetch instructions are introduced to compress control flow information within basic blocks. The instruction "fetch addr, N" means "fetch and execute N instructions that begin at address addr." This enables static determination of the number N of instructions that will be executed sequentially. The control-flow information for the whole block is compressed into this single instruction.
  • This instruction can be used to implement "straight-line" microcode.
  • a sequential stream of instructions can implement a complex task without increasing code size significantly.
  • the single fetch instruction can result in a smaller instruction cache footprint for a program in the case when common code can be shared among different parts of the program.
  • the push instruction stores the pair (baddr, N) on the hardware stack. Branch processor execution continues with the next instruction.
  • The "dec" instruction examines the pair (baddr, N) stored on the top of the stack, and decrements N. If the result is zero (or negative), the stack is popped; otherwise, the branch processor begins execution at address baddr. For example, the code corresponding to a loop that executes a sequence of 15 instructions 10 times would be:
  • the number of iterations in a loop is not always known at compile time.
  • The following instruction is used to permit the execution of loops with iteration counts determined at run time: "pushN? baddr". This instruction receives the next data value from the synchronization channel and uses it as the loop count N; other than that, it behaves like a normal push instruction.
  • When breaking out of a loop, the hardware stack still has state information in it which needs to be destroyed.
  • the pop instruction can explicitly pop the top of the hardware stack.
  • The instruction "call baddr" pushes (nextpc, 1) onto the branch processor stack, where nextpc is the program counter address immediately following the call, and then transfers control to baddr.
  • Returning from a function is implemented by a "ret" instruction that jumps to the address on the top of the stack and pops the stack.
  • a function call to an address determined at run time may occur when executing a function determined by looking at a function pointer stored in a table, or in the case of dynamic dispatch of methods in object-oriented languages.
  • The call? instruction reads the address to branch to from the synchronization channel. It otherwise behaves like a call.
  • the push and pop instructions can be used to implement control flow in loops.
  • Two flavors of goto instruction are introduced: "goto baddr" and "goto?".
  • The first instruction unconditionally changes the branch processor execution address to baddr.
  • The second instruction reads the address to branch to from the synchronization channel.
  • the synchronization channel is used to determine the direction of the branch.
  • the if? instruction is used for this purpose.
  • The instruction reads a value from the synchronization channel. It continues execution at address baddr if the value received is non-negative. Otherwise, execution continues with the next branch processor instruction. Performance is maximized if the matching send! is executed earlier in the data processor. Therefore, programs that have short sequences of instructions that are interspersed with conditional branches and depend on computation just performed would not be executed efficiently. In such cases, predicated execution, which executes the instructions conditionally, could be used to improve performance.
  • A block of instructions is predicated using the instruction "fetch? addr, N". If the value received from the data processor is non-negative, then the block of N instructions stored at address addr is executed; otherwise, the instruction behaves like a no-op (nop).
  • Table 1 shows a summary of the new instructions.
  • the following provides examples showing how code is generated for the branch processor. This code is generated, for example, in a compiler.
  • Embodiment 1 - code that has a control flow that can be determined when the program is compiled.
  • the branch processor does not synchronize with the data processor because the control flow can be determined when the program is compiled.
  • Embodiment 2 - the same program with a modification that permits the program to exit the loop early.
  • The underlined instructions are deleted. In one case, since the branch is conditional, it is replaced by the send! instruction shown.
  • the additional branch processor code would be: fetch E, 5; push LI, 100; LI: fetch L, 10; if? B; fetch P, 2; dec; push LI, 1;
  • Another stream of instructions synchronizes the branch processor to the data processor. Since the two instruction streams are separate, misoperation between them can cause deadlock and complicates exceptions and context switching.
  • the architecture includes instructions that synchronize the data processor 120 with the branch processor 110. Incorrect code could deadlock the hardware based on lack of synchronization.
  • a deadlock detector is provided. The deadlock detector detects conditions which indicate deadlock, and responds thereto. At least two expected sources of deadlock include
  • Every send! instruction must be fetched before the corresponding receive is executed in the branch processor. Therefore, the first case can only be caused by an incorrect program. This possibility can be avoided in the compiler.
  • the second case could occur if multiple sends have been dispatched in advance, causing the synchronization channel to become full before any receives could be executed.
  • This case can also be prevented by a compiler.
  • the compiler must keep track of the number of outstanding send! operations at any point in the program, and ensure that the number of pending send operations does not exceed the hardware limit.
  • Deadlock can be detected by using a timing assumption or by running a deadlock detection program. Simple timing assumptions include assuming that the processor has deadlocked if instructions have not been decoded for a long interval, e.g. a microsecond. A simple termination detection algorithm could also be executed to detect deadlock. In the latter case, only the two ends of the synchronization channel need to be involved in the termination detection algorithm, along with counters to detect that there are no data values in transit from the branch processor to the data processor.
  • the processor architecture just proposed has state stored in both the data processor and the branch processor.
  • the processor should also include the capability of storing the entire state to memory.
  • the state of the data processor can be saved to and restored from memory in the same way as in traditional processors.
  • the state of the branch processor is stored in the contents of the branch processor stack and the contents of the synchronization channel between the data processor and branch processor.
  • The hardware stack as well as the synchronization channel is memory-mapped. Therefore, the state of these parts of the branch processor can be saved and restored using load and store instructions from the data processor. Since a mechanism for saving and restoring the state of the processor is described, context switching can be used. Exceptions can be handled using a conventional handling mechanism.
  • The send! instructions can be treated as instructions that modify the state of the processor. In addition, if an exception is encountered in the middle of a block fetch instruction, execution should be restored from the middle of the block. Therefore, the branch processor should keep track of pending block fetch instructions, allowing them to be restarted.
  • Exceptions that occur in the branch processor itself include items such as address translation errors and stack underflow. These can be handled by sending them to the data processor with a special bit set to indicate a branch processor exception. The instruction is executed as a nop in the data processor, and raises an exception in the usual way. Since the writeback unit in the data processor handles branch processor exceptions, the exceptions can be handled in program order.
  • PC is the channel on which program counter values are sent to the data processor.
  • SYNC is the channel used to read data values from the data processor.
  • Variable bpc is the program counter for the branch processor instructions
  • S is a stack.
  • a stack element has an addr field and an N field. Three stack operations are used.
  • Top(S) is the top element of stack S; Push(S, addr, N) pushes the pair (addr, N) onto the stack and returns a new stack; Pop(S) deletes the top element of stack S and returns the new stack.
  • the branch processor can be compared to the instruction fetch in a standard processor.
  • the channel SYNC corresponds to the channel from the core of the processor that is used to communicate register values and immediate values to the instruction fetch.
  • An additional COND channel is used on which condition codes for branches are sent to the instruction fetch.
  • the branch processor computes program counter values earlier than this simple instruction fetch because the communication I?i , which synchronizes the instruction fetch with the rest of the data processor, is not done on every instruction. Further, the branch processor only synchronizes with the data processor when the synchronization becomes necessary.
  • The slowest possible execution of the branch processor architecture corresponds to the case when the last communication on the PC channel in the branch processor fetches a send! instruction, and the next branch processor instruction is either pushN?, call?, goto?, if?, or fetch?.
  • The branch processor waits for the send! instruction to be fetched, decoded, and executed. Assume the time taken to fetch, decode, and execute the send! instruction is $\delta_0$.
  • The branch processor overhead can be analyzed for each potentially slow instruction.
  • The branch processor waits for the data on SYNC to arrive before it can fetch the next branch processor instruction.
  • The next data processor instruction has an additional data processor latency of $\delta_0 + \delta$ seconds.
  • The branch processor stalls for $\delta_0 + \delta$ seconds for branches that are taken, goto? and call? instructions, and for $\delta_0$ seconds for branches that are not taken and fetch? instructions.
  • The latency of fetching the branch instruction and executing the instruction ($\delta_0$) is typically avoided by the introduction of $\lceil \delta_0 / \delta \rceil$ branch delay slots.
  • A standard instruction set is directly translated to branch processor code by replacing branches by send! instructions. Each send! instruction is followed by the $\lceil \delta_0 / \delta \rceil$ instructions that correspond to the branch delay slots. Therefore, the only additional stall that the branch processor encounters is $\delta$. This would be completely hidden if the original architecture had an additional branch delay slot.
  • An additional memory read for branch processor instructions is unsynchronized with the memory for data processor instructions or the data memory.
  • each unique data processor opcode would be stored once. This implies that an upper bound on the number of instructions required to be stored in the instruction cache is given by the number of distinct instructions in the program.
  • the inventors collected instruction count statistics for 267 executables that were compiled using the GNU C compiler for an R3000-based DECstation. It was found that the number of distinct opcodes grows at a rate that is less than linear in the size of the executable.
  • Table 2 shows the percentage of programs that would completely fit in an instruction cache depending on whether total instructions or the number of unique instructions in the program are counted.
  • With a branch processor architecture, most programs would fit in a typical instruction cache (8K words). Therefore, the number of instruction cache misses in the data processor is reduced.
  • the cache misses for the branch processor are increased.
  • The number of cache misses for the branch processor can be bounded by the number of cache misses for the original instruction set, since each ordinary instruction is translated into at most one branch processor instruction. Therefore, the additional memory bandwidth requirements for a branch processor can be reduced significantly by sharing instructions from the data processor, but at a performance cost. This conservative analysis shows that introducing a branch processor will not have a large impact on the instruction memory bandwidth required by the processor.
  • Branch prediction and prefetching techniques attempt to improve performance by predicting what the program will execute.
  • Incorporating branch prediction into this architecture corresponds to guessing the value being sent on the feedback channel for if? instructions. Since simple loops no longer contribute branch instructions, the effectiveness of branch prediction will be decreased because the cases which can be easily predicted (loops) are no longer present.
  • Prefetch instructions attempt to hide the latency of cache misses by dispatching reads to the caches before the data value is actually needed. These prefetch instructions can be inserted into the instruction stream of both the branch processor (for instruction cache prefetches) and the data processor (for data cache prefetches) .
  • The instruction "sfetch addr,N" means "fetch and speculatively execute N instructions that begin at address addr." These instructions are fetched from memory and dispatched to the data processor.
  • The commit instruction informs the data processor whether the last speculatively executed block should be permitted to modify the state of the processor. Therefore, the sequence "sfetch addr,N; commit true" is equivalent to "fetch addr,N".
  • The sequence "sfetch addr,N; commit false" is equivalent to a skip. Speculative execution is used to begin execution of a block of code before knowing whether it should be executed.
  • The condition under which the code should be permitted to execute is computed in the data processor, and sent back to the branch processor via a send! instruction. Often, this information determines which of "commit true" or "commit false" should be executed. To optimize this case, the sfetch? instruction is used. "sfetch? addr,N" behaves like sfetch. In addition, it receives a value from the data processor and uses this value to determine which commit instruction should be executed.
  • Table 3 Instructions supporting speculative execution.
  • a standard instruction set is translated directly into branch processor instructions by replacing conditional branches with send! and if? pairs, and using fetch instructions to dispatch instructions within a basic block.
  • Both fixed length and variable length loops can be detected by modern compilation systems.
  • Most programming languages have constructs for simple iterated loops, simplifying the problem of loop detection. Therefore, a compiler can generate push instructions for loops.
  • Subroutine calls and returns are explicit in the language. Therefore, these instructions can be easily generated by standard compilation systems. Indeed, the branch processor instruction set is easier to map to because the call and return semantics are provided by the hardware directly.
  • Peephole optimization can be used to move a send! instruction before any other instructions in the data processor that it depends on. Recall that executing send! instructions early improves the performance of the branch processor architecture.
  • Loop unrolling and loop peeling are transformations used to improve the performance of programs. Both transformations replicate the body of the loop in order to statically determine the direction of some of the branches in the loop body. Observe that such program transformations replicate code just in the branch processor; streams of instructions in the data processor can be re-used because they no longer encode any control flow information. This implies that we will not worsen instruction cache performance by applying such transformations.
  • Fetch instructions provide a simple interface for implementing microcode.
  • a sequence of instructions stored at fixed addresses in memory can be used to create complex "instructions" of the form of fetch addr,N.
  • the effect of executing these instructions is to execute the sequence of instructions stored at the specified memory address, providing the same effect as an architecture that included programmable microcode.
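
The branch processor instructions described above (fetch, push, dec, pop, call, ret, goto, if? and the other "?" variants) can be sketched as a small interpreter. The Python sketch below is illustrative only and is not taken from the application. It assumes that branch processor addresses (baddr) are indices into a list of instructions, that data processor instructions are 4 bytes apart (following the pc+4 convention used above), and that the values arriving on the synchronization channel are supplied up front in a queue instead of being produced by send! instructions in a running data processor. The names run_branch_processor and WORD and the halt pseudo-instruction are hypothetical.

```python
from collections import deque

WORD = 4  # assumed size of one data processor instruction (the text uses pc+4)


def run_branch_processor(program, sync_values=(), max_steps=100_000):
    """Expand branch processor code into the PC values sent to the data processor.

    program     -- list of tuples such as ("fetch", addr, n); baddr = list index
    sync_values -- values that would arrive on the synchronization channel
    """
    sync = deque(sync_values)  # stands in for the SYNC channel
    stack = []                 # hardware stack of [baddr, N] pairs
    pcs = []                   # values sent on the PC channel
    bpc = 0                    # branch processor program counter
    for _ in range(max_steps):
        if not 0 <= bpc < len(program):
            return pcs
        op, *args = program[bpc]
        nxt = bpc + 1
        if op == "fetch":          # fetch and execute N instructions at addr
            addr, n = args
            pcs.extend(addr + WORD * i for i in range(n))
        elif op == "fetch?":       # predicated block: run only if value >= 0
            addr, n = args
            if sync.popleft() >= 0:
                pcs.extend(addr + WORD * i for i in range(n))
        elif op == "push":         # store (baddr, N) and continue
            stack.append([args[0], args[1]])
        elif op == "pushN?":       # loop count N comes from the channel
            stack.append([args[0], sync.popleft()])
        elif op == "dec":          # decrement N; pop at zero, else jump to baddr
            top = stack[-1]
            top[1] -= 1
            if top[1] <= 0:
                stack.pop()
            else:
                nxt = top[0]
        elif op == "pop":
            stack.pop()
        elif op == "call":         # push (nextpc, 1), then transfer control
            stack.append([nxt, 1])
            nxt = args[0]
        elif op == "call?":        # call target comes from the channel
            stack.append([nxt, 1])
            nxt = sync.popleft()
        elif op == "ret":          # jump to the address on top of the stack
            nxt = stack.pop()[0]
        elif op == "goto":
            nxt = args[0]
        elif op == "goto?":
            nxt = sync.popleft()
        elif op == "if?":          # branch to baddr if the value is non-negative
            if sync.popleft() >= 0:
                nxt = args[0]
        elif op == "halt":         # hypothetical: stop the simulation
            return pcs
        else:
            raise ValueError(f"unknown instruction {op!r}")
        bpc = nxt
    raise RuntimeError("step limit exceeded")


# A loop that executes a block of 15 instructions 10 times, written with
# push/fetch/dec; label L is index 1 and the block address 0x100 is made up.
loop = [("push", 1, 10), ("fetch", 0x100, 15), ("dec",)]
loop_pcs = run_branch_processor(loop)
```

In this sketch an empty queue makes popleft raise IndexError, whereas the hardware described above would wait on the channel. That is the situation the deadlock discussion is concerned with.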

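The compiler-side rule described above (every send! must be fetched before its matching receive, and the number of pending send! operations must not exceed the hardware limit) can be illustrated with a simple counter. The Python sketch below is a conservative illustration under stated assumptions, not the method of the application: it walks one straight-line path of branch processor code, where each fetch step is annotated with the number of send! instructions in the fetched block and each "recv" step stands for any instruction that reads the synchronization channel (pushN?, call?, goto?, if?, fetch?). The function name, the step encoding and the capacity parameter are hypothetical.

```python
def max_outstanding_sends(path, capacity):
    """Return the peak number of pending send! operations along one path.

    path     -- sequence of ("fetch", sends_in_block) and ("recv",) steps,
                in branch processor program order
    capacity -- assumed number of values the synchronization channel can hold
    Raises ValueError if either source of deadlock could arise on this path.
    """
    outstanding = 0
    peak = 0
    for index, step in enumerate(path):
        kind = step[0]
        if kind == "fetch":
            # The fetched block dispatches this many send! instructions.
            outstanding += step[1]
            peak = max(peak, outstanding)
            if outstanding > capacity:
                raise ValueError(
                    f"step {index}: {outstanding} pending sends exceed "
                    f"the channel capacity of {capacity}"
                )
        elif kind == "recv":
            # A receive with no send! fetched before it would wait forever.
            if outstanding == 0:
                raise ValueError(f"step {index}: receive without a prior send")
            outstanding -= 1
        else:
            raise ValueError(f"step {index}: unknown step kind {kind!r}")
    return peak


# Example path: one send, one receive, then two sends and two receives.
example_path = [("fetch", 1), ("recv",), ("fetch", 2), ("recv",), ("recv",)]
example_peak = max_outstanding_sends(example_path, capacity=2)
```

A real compiler would have to apply such a check to every path through the control flow graph; the single-path version only shows the bookkeeping involved.
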
Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Advance Control (AREA)

Abstract

A processing system is formed from a branch processor (Figure 1, unit 110) and a main processor. The main data processor (Figure 1, unit 120) operates like conventional processors. The branch processor acts to determine the number of branches and the information that makes it possible to use information that is usually computed theoretically. A synchronizer is used occasionally to synchronize the branch processor and the data processor via a feedback channel (Figure 1, unit 122).
EP99943848A 1998-08-21 1999-08-20 Processing element with special application for branch functions Withdrawn EP1105793A4 (fr)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US9751598P 1998-08-21 1998-08-21
US97515P 1998-08-21
PCT/US1999/019197 WO2000011547A1 (fr) 1998-08-21 1999-08-20 Processing element with special application for branch functions

Publications (2)

Publication Number Publication Date
EP1105793A1 EP1105793A1 (fr) 2001-06-13
EP1105793A4 EP1105793A4 (fr) 2007-07-25

Family

ID=22263771

Family Applications (1)

Application Number Title Priority Date Filing Date
EP99943848A Withdrawn EP1105793A4 (fr) 1998-08-21 1999-08-20 Processing element with special application for branch functions

Country Status (3)

Country Link
EP (1) EP1105793A4 (fr)
AU (1) AU5686599A (fr)
WO (1) WO2000011547A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0284364A2 (fr) * 1987-03-27 1988-09-28 Seiko Instruments Inc. High-speed computer system
WO1994016383A1 (fr) * 1993-01-06 1994-07-21 The 3Do Company Digital signal processor architecture
EP0660223A2 (fr) * 1993-11-30 1995-06-28 Texas Instruments Incorporated Three-input arithmetic logic unit with barrel rotator and mask generator
US5485629A (en) * 1993-01-22 1996-01-16 Intel Corporation Method and apparatus for executing control flow instructions in a control flow pipeline in parallel with arithmetic instructions being executed in arithmetic pipelines

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4338661A (en) * 1979-05-21 1982-07-06 Motorola, Inc. Conditional branch unit for microprogrammed data processor
US5539911A (en) * 1991-07-08 1996-07-23 Seiko Epson Corporation High-performance, superscalar-based computer system with out-of-order instruction execution
US5781752A (en) * 1996-12-26 1998-07-14 Wisconsin Alumni Research Foundation Table based data speculation circuit for parallel processing computer

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0284364A2 (fr) * 1987-03-27 1988-09-28 Seiko Instruments Inc. High-speed computer system
WO1994016383A1 (fr) * 1993-01-06 1994-07-21 The 3Do Company Digital signal processor architecture
US5485629A (en) * 1993-01-22 1996-01-16 Intel Corporation Method and apparatus for executing control flow instructions in a control flow pipeline in parallel with arithmetic instructions being executed in arithmetic pipelines
EP0660223A2 (fr) * 1993-11-30 1995-06-28 Texas Instruments Incorporated Three-input arithmetic logic unit with barrel rotator and mask generator

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of WO0011547A1 *

Also Published As

Publication number Publication date
EP1105793A1 (fr) 2001-06-13
WO2000011547A9 (fr) 2000-08-10
WO2000011547A1 (fr) 2000-03-02
AU5686599A (en) 2000-03-14

Similar Documents

Publication Publication Date Title
US6157988A (en) Method and apparatus for high performance branching in pipelined microsystems
McFarling et al. Reducing the cost of branches
US6631514B1 (en) Emulation system that uses dynamic binary translation and permits the safe speculation of trapping operations
EP0459232B1 (fr) Partially decoded instruction cache and corresponding method
Ditzel et al. Branch folding in the CRISP microprocessor: Reducing branch delay to zero
US6523110B1 (en) Decoupled fetch-execute engine with static branch prediction support
US6928645B2 (en) Software-based speculative pre-computation and multithreading
US5692169A (en) Method and system for deferring exceptions generated during speculative execution
US7730263B2 (en) Future execution prefetching technique and architecture
US5421020A (en) Counter register implementation for speculative execution of branch on count instructions
Schlansker et al. EPIC: An architecture for instruction-level parallel processors
US20020087849A1 (en) Full multiprocessor speculation mechanism in a symmetric multiprocessor (smp) System
US6687812B1 (en) Parallel processing apparatus
GB2293671A (en) Reducing delays due to branch instructions
US20100287358A1 (en) Branch Prediction Path Instruction
Nakra et al. Value prediction in VLIW machines
US20020161987A1 (en) System and method including distributed instruction buffers holding a second instruction form
US5737562A (en) CPU pipeline having queuing stage to facilitate branch instructions
EP1105793A1 (fr) Processing element with special application for branch functions
Hwu et al. Efficient instruction sequencing with inline target insertion
Song Demystifying EPIC and IA-64
Steven et al. Using a resource-limited instruction scheduler to evaluate the iHARP processor
Tyagi et al. Dynamic branch decoupled architecture
Thakkar et al. An instruction fetch unit for a graph reduction machine
González A survey of branch techniques in pipelined processors

Legal Events

Date Code Title Description
PUAI Public reference made under Article 153(3) EPC to a published international application that has entered the European phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20010320

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

RBV Designated contracting states (corrected)

Designated state(s): DE FR GB

A4 Supplementary search report drawn up and despatched

Effective date: 20070621

RIC1 Information provided on ipc code assigned before grant

Ipc: G06F 9/45 20060101ALI20070615BHEP

Ipc: G06F 9/38 20060101ALI20070615BHEP

Ipc: G06F 9/32 20060101AFI20070615BHEP

17Q First examination report despatched

Effective date: 20071008

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20100302