WO2000011547A9 - Processing element with special application for branch functions - Google Patents
Processing element with special application for branch functions
- Publication number
- WO2000011547A9 (application PCT/US1999/019197)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- processor
- instructions
- instruction
- branch
- information
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/43—Checking; Contextual analysis
- G06F8/433—Dependency analysis; Data or control flow analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
- G06F8/445—Exploiting fine grain parallelism, i.e. parallelism at instruction level
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3804—Instruction prefetching for branches, e.g. hedging, branch folding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3808—Instruction prefetching for instruction reuse, e.g. trace cache, branch target cache
- G06F9/381—Loop buffering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3824—Operand accessing
- G06F9/3826—Bypassing or forwarding of data results, e.g. locally between pipeline stages or within a pipeline stage
- G06F9/3828—Bypassing or forwarding of data results, e.g. locally between pipeline stages or within a pipeline stage with global bypass, e.g. between pipelines, between clusters
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3824—Operand accessing
- G06F9/383—Operand prefetching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3842—Speculative instruction execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3851—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
Definitions
- The present application describes a processor architecture which uses independent instruction streams: one for the main processor, which decodes and executes the instructions forming the actual computation, and another for the branch processor, which determines the sequence of program counter values used to fetch instructions for the main processor.
- Branches can be found in many places in programs. Examples include branches for subroutine calls, loops and if statements. Fixed length loops and subroutine calls make it possible to predict how the branches behave when the program is compiled.
- the present application describes a processor architecture that provides additional information that has an application in determining branch information.
- This processor uses two separate instruction streams.
- a main or data processor instruction stream includes the instruction information.
- the branch processor instruction stream determines the program flow.
- an asynchronous processor which carries out this function.
- Another aspect teaches a synchronous design style.
- Figure 1 shows a block diagram of a basic branch processor-modified processing system.
- a processor that does not have a branch delay slot has a number of different instruction types. Different instructions are used to compute different functions. Each instruction has implicit control flow information based on its content. An instruction such as
- The next program counter is either pc+4 or L, depending on whether or not registers r1 and r3 are equal.
- a processor computes the sequence of program counter values.
- existing instruction sets can encode this information very inefficiently from the density point of view. Most of the time, an instruction needs to be examined simply to determine that the next instruction to be executed is at pc+4.
- the present application defines a new separate and independent instruction sequence that encodes the sequence of program counter (PC) values.
- a program such as the one shown in (1) is compiled into two instruction streams. A first instruction stream determines the computations to be performed. The second instruction stream determines the flow of control.
- A "data processor" executes the first instruction stream that determines the computation to be performed. This is a traditional processor operation.
- the second instruction stream includes branch processor instructions. These are executed on a separate processor, the branch processor 110.
- this second instruction stream is a sequence of instructions that computes, or includes information to compute, the sequence of program counter values.
- the program counter information constitutes memory interface 102 that is sent to memory.
- The sequence of instructions 104 is responsively received from memory. These instructions are passed as instruction stream 106 to the data processor 120, which is shown as a traditional processor with an instruction decoder and instruction-executing registers.
- the branch processor 110 also receives specified feedback from the data processor 120.
- The feedback indicates information such as synchronization information from a sync register 124. This information is used by certain types of instructions to enable the data processor 120 to control, and provide information to, the branch processor 110. For example, some feedback from the data processor 120 is obtained when executing code that has conditional branches.
- the feedback channel 122 can also occasionally synchronize the two processors.
- Control flow in a program normally follows a call/return pattern.
- a hardware stack in the branch processor is used for storing program counter values. However, there are times when the control flow information is only available at run time.
- A special instruction in the main data processor called "send!" is defined to allow executing programs when the control flow is only available at run time.
- the send instruction sends a data value from the data processor 120 to the branch processor 110 via the synchronization channel 122.
- The branch processor has instructions that read data from this channel. Instructions that read values from this channel have a "?" appended to their names.
- the information is described in the following.
- In the instructions below, addr refers to the address of instructions to be executed on the data processor, and baddr refers to addresses of branch processor instructions.
- Block fetch instructions are introduced to compress control flow information within basic blocks. The instruction fetch addr, N means "fetch and execute N instructions that begin at address addr." The number N of instructions that will be executed sequentially is determined statically, and this control-flow information is compressed into the single instruction.
- This instruction can be used to implement "straight-line" microcode.
- a sequential stream of instructions can implement a complex task without increasing code size significantly.
- the single fetch instruction can result in a smaller instruction cache footprint for a program in the case when common code can be shared among different parts of the program.
- the push instruction stores the pair (baddr, N) on the hardware stack. Branch processor execution continues with the next instruction.
- The "dec" instruction examines the pair (baddr, N) stored on the top of the stack, and decrements N. If the result is zero (or negative), the stack is popped; otherwise, the branch processor begins execution at address baddr. For example, the code corresponding to a loop that executes a sequence of 15 instructions 10 times would be:
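The push, fetch and dec semantics described above can be pictured with a small Python model. This is only a sketch under stated assumptions: the tuple encoding of instructions, the function name `run_branch_program`, the 4-byte instruction stride and the address 0x100 are all hypothetical and do not come from the application.

```python
from typing import List, Tuple

STRIDE = 4  # assumed byte distance between consecutive data-processor instructions


def run_branch_program(program: List[Tuple], stride: int = STRIDE) -> List[int]:
    """Return the program counter values sent to the data processor."""
    stack: List[List[int]] = []  # hardware stack of [baddr, N] pairs
    pcs: List[int] = []
    bpc = 0  # branch-processor program counter (index into program)
    while bpc < len(program):
        op, *args = program[bpc]
        if op == "fetch":  # fetch addr, N: emit N sequential addresses
            addr, n = args
            pcs.extend(addr + i * stride for i in range(n))
            bpc += 1
        elif op == "push":  # push baddr, N: store the pair, continue with next instruction
            baddr, n = args
            stack.append([baddr, n])
            bpc += 1
        elif op == "dec":  # dec: decrement N; pop when it reaches zero, else jump to baddr
            stack[-1][1] -= 1
            if stack[-1][1] <= 0:
                stack.pop()
                bpc += 1
            else:
                bpc = stack[-1][0]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return pcs


# A loop that executes a 15-instruction body 10 times.
loop = [
    ("push", 1, 10),       # baddr 0: loop body starts at baddr 1, 10 iterations
    ("fetch", 0x100, 15),  # baddr 1: 15-instruction body at hypothetical address 0x100
    ("dec",),              # baddr 2: count down, branch back while N > 0
]
pcs = run_branch_program(loop)
print(len(pcs))  # 150
```

Three branch processor instructions stand in for 150 program counter values, which is the compression the block fetch and hardware stack are meant to provide.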
- the number of iterations in a loop is not always known at compile time.
- The following instruction is used to permit the execution of loops with iteration counts determined at run time: pushN? baddr. This instruction receives the next data value from the synchronization channel and uses it as the loop count N; other than that, it behaves like a normal push instruction.
- When breaking out of a loop, the hardware stack still has state information in it which needs to be destroyed.
- the pop instruction can explicitly pop the top of the hardware stack.
- call baddr pushes (nextpc, 1) onto the branch processor stack, where nextpc is the program counter address immediately following the call, and then transfers control to baddr.
- Returning from a function is implemented by a "ret" instruction, that jumps to the address on the top of the stack and pops the stack.
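The call and ret behavior just described can be sketched in Python as follows. The instruction encoding, the function name `run_calls`, the addresses, and the `halt` pseudo-instruction (used only to stop the simulation) are assumptions made for illustration.

```python
from typing import List, Tuple


def run_calls(program: List[Tuple], stride: int = 4) -> List[int]:
    """Simulate fetch/call/ret and return the emitted program counter values."""
    stack: List[Tuple[int, int]] = []  # branch processor stack of (baddr, N) pairs
    pcs: List[int] = []
    bpc = 0
    while True:
        op, *args = program[bpc]
        if op == "fetch":
            addr, n = args
            pcs.extend(addr + i * stride for i in range(n))
            bpc += 1
        elif op == "call":  # push (nextpc, 1), then transfer control to baddr
            stack.append((bpc + 1, 1))
            bpc = args[0]
        elif op == "ret":   # jump to the address on top of the stack and pop it
            bpc, _ = stack.pop()
        elif op == "halt":  # hypothetical: ends the simulation
            return pcs
        else:
            raise ValueError(f"unknown instruction: {op}")


call_demo = [
    ("fetch", 0x100, 2),  # baddr 0: caller code before the call
    ("call", 4),          # baddr 1: call the subroutine at baddr 4
    ("fetch", 0x200, 1),  # baddr 2: caller code after the call
    ("halt",),            # baddr 3
    ("fetch", 0x300, 3),  # baddr 4: subroutine body
    ("ret",),             # baddr 5: return to baddr 2
]
print([hex(pc) for pc in run_calls(call_demo)])
```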
- a function call to an address determined at run time may occur when executing a function determined by looking at a function pointer stored in a table, or in the case of dynamic dispatch of methods in object-oriented languages.
- The call? instruction reads the address to branch to from the synchronization channel. It otherwise behaves like a call.
- the push and pop instructions can be used to implement control flow in loops.
- goto instructions of two flavors are introduced: goto baddr and goto?
- the first instruction unconditionally changes the branch processor execution address to baddr.
- The second instruction reads the address to branch to from the synchronization channel.
- the synchronization channel is used to determine the direction of the branch.
- the if? instruction is used for this purpose.
- The instruction reads a value from the synchronization channel. It continues execution at address baddr if the value received is non-negative. Otherwise, execution continues with the next branch processor instruction. Performance is maximized if the matching send! is executed early in the data processor. Therefore, programs that have short sequences of instructions interspersed with conditional branches that depend on a computation just performed would not be executed efficiently. In such cases, predicated execution, which executes the instructions conditionally, could be used to improve performance.
- A block of instructions is predicated using the instruction fetch? addr, N. If the value received from the data processor is non-negative, then the block of N instructions stored at address addr is executed; otherwise, the instruction behaves like a no-op (nop).
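A rough Python model of if? and fetch? is given below. It assumes the values that the data processor would deliver through send! are already queued in the synchronization channel; real hardware would block until each value arrives. The instruction encoding, the name `run_with_sync` and the addresses are hypothetical.

```python
from collections import deque
from typing import Iterable, List, Tuple


def run_with_sync(program: List[Tuple], sync_values: Iterable[int],
                  stride: int = 4) -> List[int]:
    """Simulate fetch, if? and fetch? against a pre-filled synchronization channel."""
    sync = deque(sync_values)  # values produced by send! in the data processor
    pcs: List[int] = []
    bpc = 0
    while bpc < len(program):
        op, *args = program[bpc]
        if op == "fetch":
            addr, n = args
            pcs.extend(addr + i * stride for i in range(n))
            bpc += 1
        elif op == "if?":  # branch to baddr when the received value is non-negative
            value = sync.popleft()
            bpc = args[0] if value >= 0 else bpc + 1
        elif op == "fetch?":  # predicated block: execute it or behave like a nop
            addr, n = args
            if sync.popleft() >= 0:
                pcs.extend(addr + i * stride for i in range(n))
            bpc += 1
        else:
            raise ValueError(f"unknown instruction: {op}")
    return pcs


branchy = [
    ("fetch", 0x100, 2),   # baddr 0
    ("if?", 3),            # baddr 1: skip baddr 2 when the value is non-negative
    ("fetch", 0x200, 1),   # baddr 2: executed only on the fall-through path
    ("fetch?", 0x300, 2),  # baddr 3: predicated block
]
print(run_with_sync(branchy, [5, -1]))  # branch taken, predicated block skipped
print(run_with_sync(branchy, [-1, 0]))  # branch not taken, predicated block executed
```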
- Table 1 shows a summary of the new instructions.
- the following provides examples showing how code is generated for the branch processor. This code is generated, for example, in a compiler.
- Embodiment 1 - code that has a control flow that can be determined when the program is compiled.
- the branch processor does not synchronize with the data processor because the control flow can be determined when the program is compiled.
- Embodiment 2 - the same program with a modification that permits the program to exit the loop early.
- Another stream of instructions synchronizes the branch processor to the data processor. Since the two instruction streams are separate, faulty interaction between them can lead to deadlock, and it complicates exceptions and context switching.
- the architecture includes instructions that synchronize the data processor 120 with the branch processor 110. Incorrect code could deadlock the hardware based on lack of synchronization.
- A deadlock detector is provided. The deadlock detector detects conditions which indicate deadlock, and responds thereto. At least two sources of deadlock are expected.
- Every send! instruction must be fetched before the corresponding receive is executed in the branch processor. Therefore, the first case can only be caused by an incorrect program. This possibility can be avoided in the compiler.
- the second case could occur if multiple sends have been dispatched in advance, causing the synchronization channel to become full before any receives could be executed.
- This case can also be prevented by a compiler.
- the compiler must keep track of the number of outstanding send! operations at any point in the program, and ensure that the number of pending send operations does not exceed the hardware limit.
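The bookkeeping described above might look like the following Python sketch. It is a simplification: it assumes the compiler can linearize the send! operations and the matching branch processor receives into a single ordered event list, and the function name, event names and capacity parameter are hypothetical.

```python
from typing import Iterable, Optional


def check_send_schedule(events: Iterable[str], capacity: int) -> Optional[str]:
    """Return None if the schedule is safe, otherwise a description of the problem.

    events is a sequence of "send" and "recv" markers in the order they occur;
    capacity is the number of values the synchronization channel can hold.
    """
    pending = 0  # send! operations issued but not yet received
    for i, event in enumerate(events):
        if event == "send":
            pending += 1
            if pending > capacity:
                return f"channel overflow at event {i}"
        elif event == "recv":
            if pending == 0:
                return f"receive without matching send at event {i}"
            pending -= 1
        else:
            raise ValueError(f"unknown event: {event}")
    return None


print(check_send_schedule(["send", "send", "recv", "recv"], capacity=2))
print(check_send_schedule(["send", "send", "send"], capacity=2))
print(check_send_schedule(["recv"], capacity=4))
```

A compiler pass built on this idea would reject or reschedule code whenever the check reports a problem, covering both deadlock cases discussed above.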
- Deadlock can be detected by using a timing assumption or by running a deadlock detection program. A simple timing assumption is that the processor has deadlocked if no instructions have been decoded for a long interval, e.g. a microsecond. Alternatively, a simple termination detection algorithm could be executed to detect deadlock. In the latter case, only the two ends of the synchronization channel need to be involved in the termination detection algorithm, along with counters to detect that there are no data values in transit from the branch processor to the data processor.
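The timing assumption can be illustrated with a small watchdog model in Python. The class name, its methods and the use of nanosecond timestamps are assumptions for this sketch; only the one-microsecond example interval comes from the text.

```python
class DecodeWatchdog:
    """Flags a possible deadlock when no instruction has been decoded recently."""

    def __init__(self, timeout_ns: float = 1000.0) -> None:
        self.timeout_ns = timeout_ns  # 1000 ns = one microsecond
        self.last_decode_ns = 0.0

    def on_decode(self, now_ns: float) -> None:
        """Record the time at which the data processor decoded an instruction."""
        self.last_decode_ns = now_ns

    def deadlocked(self, now_ns: float) -> bool:
        """True if more than timeout_ns has passed since the last decode."""
        return now_ns - self.last_decode_ns > self.timeout_ns


watchdog = DecodeWatchdog()
watchdog.on_decode(100.0)
print(watchdog.deadlocked(600.0))   # False: 500 ns since the last decode
print(watchdog.deadlocked(1200.0))  # True: 1100 ns since the last decode
```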
- the processor architecture just proposed has state stored in both the data processor and the branch processor.
- the processor should also include the capability of storing the entire state to memory.
- the state of the data processor can be saved to and restored from memory in the same way as in traditional processors.
- the state of the branch processor is stored in the contents of the branch processor stack and the contents of the synchronization channel between the data processor and branch processor.
- The hardware stack as well as the synchronization channel is memory-mapped. Therefore, the state of these parts of the branch processor can be saved and restored using load and store instructions from the data processor. Since a mechanism for saving and restoring the state of the processor is described, context switching can be used. Exceptions can be handled using a conventional handling mechanism.
- The send! instructions can be treated as instructions that modify the state of the processor. In addition, if an exception is encountered in the middle of a block fetch instruction, execution should be restored from the middle of the block. Therefore, the branch processor should keep track of pending block fetch instructions, allowing them to be restarted.
- Exceptions that occur in the branch processor itself include items such as address translation errors and stack underflow. These can be handled by sending them to the data processor with a special bit set to indicate a branch processor exception. The instruction is executed as a nop in the data processor, and raises an exception in the usual way. Since the writeback unit in the data processor handles branch processor exceptions, the exceptions can be handled in program order.
- PC the channel on which program counter values are sent to the data processor
- SYNC the channel used to read data values from the data processor
- Variable bpc is the program counter for the branch processor instructions
- S is a stack.
- a stack element has an addr field and an N field. Three stack operations are used.
- Top (S) is the top element of stack S; Push (S, addr, N) pushes the pair (addr,N) onto the stack and returns a new stack; Pop(S) deletes the top element of stack S and returns the new stack.
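The three stack operations can be written as pure functions that return a new stack, matching the description above. This is a minimal Python sketch; the `Entry` type, the tuple representation and the underflow error are choices made for illustration.

```python
from typing import NamedTuple, Tuple


class Entry(NamedTuple):
    addr: int  # address field of a stack element
    N: int     # count field of a stack element


Stack = Tuple[Entry, ...]  # immutable stack; the last element is the top


def push(s: Stack, addr: int, n: int) -> Stack:
    """Push the pair (addr, N) and return the new stack."""
    return s + (Entry(addr, n),)


def top(s: Stack) -> Entry:
    """Return the top element of the stack."""
    if not s:
        raise IndexError("stack underflow")
    return s[-1]


def pop(s: Stack) -> Stack:
    """Delete the top element and return the new stack."""
    if not s:
        raise IndexError("stack underflow")
    return s[:-1]


s: Stack = push(push((), 8, 10), 12, 1)
print(top(s))  # Entry(addr=12, N=1)
print(pop(s))  # (Entry(addr=8, N=10),)
```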
- the data processor reads the PC channel to determine which instruction should be executed next.
- the high-level CHP for the data processor is shown below.
- the branch processor can be compared to the instruction fetch in a standard processor.
- the channel SYNC corresponds to the channel from the core of the processor that is used to communicate register values and immediate values to the instruction fetch.
- An additional COND channel is used on which condition codes for branches are sent to the instruction fetch.
- the branch processor computes program counter values earlier than this simple instruction fetch because the communication I?i , which synchronizes the instruction fetch with the rest of the data processor, is not done on every instruction. Further, the branch processor only synchronizes with the data processor when the synchronization becomes necessary.
- The slowest possible execution of the branch processor architecture corresponds to the case when the last PC communication in the branch processor fetches a send! instruction, and the next branch processor instruction is either pushN?, call?, goto?, if?, or fetch?.
- The branch processor waits for the send! instruction to be fetched, decoded, and executed. Assume the time taken to fetch, decode, and execute the send! instruction is $\tau_0$.
- The branch processor overhead can be analyzed for each potentially slow instruction.
- The branch processor waits for the data on SYNC to arrive before it can fetch the next branch processor instruction.
- The next data processor instruction has an additional data processor latency of $\tau_0 + \tau$ seconds.
- The branch processor stalls for $\tau_0 + \tau$ seconds for branches that are taken, goto? and call? instructions, and $\tau_0$ seconds for branches that are not taken and fetch? instructions.
- The latency of fetching the branch instruction and executing the instruction ($\tau_0$) is typically avoided by the introduction of $\lceil \tau_0 / \tau \rceil$ branch delay slots.
- A standard instruction set is directly translated to branch processor code by replacing branches by send! instructions. Each send! instruction is followed by the $\lceil \tau_0 / \tau \rceil$ instructions that correspond to the branch delay slots. Therefore, the only additional stall that the branch processor encounters is $\tau$. This would be completely hidden if the original architecture had an additional branch delay slot.
- An additional memory read for branch processor instructions is unsynchronized with the memory reads for data processor instructions or the data memory.
- each unique data processor opcode would be stored once. This implies that an upper bound on the number of instructions required to be stored in the instruction cache is given by the number of distinct instructions in the program.
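The bound above amounts to counting distinct instruction words rather than total instructions. The Python sketch below shows that comparison; the function name, the synthetic program, and the default cache size of 8K words are assumptions for illustration.

```python
from typing import Dict, Sequence, Union


def cache_footprint(instructions: Sequence[int],
                    cache_words: int = 8 * 1024) -> Dict[str, Union[int, bool]]:
    """Compare total and distinct instruction counts against a cache size in words."""
    total = len(instructions)
    unique = len(set(instructions))  # each distinct instruction word is stored once
    return {
        "total": total,
        "unique": unique,
        "fits_total": total <= cache_words,
        "fits_unique": unique <= cache_words,
    }


# Synthetic program: 20000 instructions drawn from 3000 distinct encodings.
program_words = [i % 3000 for i in range(20000)]
print(cache_footprint(program_words))
```

In this synthetic case the program does not fit when every instruction is counted, but it does fit when only distinct instructions are stored.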
- the inventors collected instruction count statistics for 267 executables that were compiled using the GNU C compiler for an R3000-based DECstation. It was found that the number of distinct opcodes grows at a rate that is less than linear in the size of the executable.
- Table 2 shows the percentage of programs that would completely fit in an instruction cache depending on whether total instructions or the number of unique instructions in the program are counted.
- With a branch processor architecture, most programs would fit in a typical instruction cache (8K words). Therefore, the number of instruction cache misses in the data processor is reduced.
- the cache misses for the branch processor are increased.
- The number of cache misses for the branch processor can be bounded by the number of cache misses for the original instruction set, since each ordinary instruction is translated into at most one branch processor instruction. Therefore, the additional memory bandwidth requirements for a branch processor can be reduced significantly by sharing instructions from the data processor, but at a performance cost. This conservative analysis shows that introducing a branch processor will not have a large impact on the instruction memory bandwidth required by the processor.
- Branch prediction and prefetching techniques attempt to improve performance by predicting what the program will execute.
- Incorporating branch prediction into this architecture corresponds to guessing the value being sent on the feedback channel for if? instructions. Since simple loops no longer contribute branch instructions, the effectiveness of branch prediction will be decreased because the cases which can be easily predicted (loops) are no longer present.
- Prefetch instructions attempt to hide the latency of cache misses by dispatching reads to the caches before the data value is actually needed. These prefetch instructions can be inserted into the instruction stream of both the branch processor (for instruction cache prefetches) and the data processor (for data cache prefetches) .
- The instruction sfetch addr,N means "fetch and speculatively execute N instructions that begin at address addr." These instructions are fetched from memory and dispatched to the data processor.
- The commit instruction informs the data processor whether the last speculatively executed block should be permitted to modify the state of the processor. Therefore, the sequence "sfetch addr,N; commit true" is equivalent to "fetch addr,N".
- The sequence "sfetch addr,N; commit false" is equivalent to a skip. Speculative execution is used to begin execution of a block of code before knowing whether it should be executed.
- The condition under which the code should be permitted to execute is computed in the data processor, and sent back to the branch processor via a send! instruction. Often, this information determines which of "commit true" or "commit false" should be executed. To optimize this case, the sfetch? instruction is used. "sfetch? addr,N" behaves like sfetch. In addition, it receives a value from the data processor and uses this value to determine which commit instruction should be executed.
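One way to picture the sfetch and commit pair is a data processor model that buffers speculative register writes until the commit arrives. The text only specifies the observable behavior, so the buffering scheme, the class name `SpeculativeState` and the register names below are assumptions.

```python
from typing import Callable, Dict, List, Optional, Tuple

ReadFn = Callable[[str], int]
Block = List[Tuple[str, Callable[[ReadFn], int]]]  # (destination register, computation)


class SpeculativeState:
    """Register file whose speculative writes take effect only on commit."""

    def __init__(self, registers: Optional[Dict[str, int]] = None) -> None:
        self.registers: Dict[str, int] = dict(registers or {})
        self.pending: Dict[str, int] = {}  # writes from the last speculative block

    def read(self, name: str) -> int:
        # Speculative instructions see earlier writes from the same block.
        return self.pending.get(name, self.registers.get(name, 0))

    def sfetch(self, block: Block) -> None:
        """Speculatively execute a block, buffering every register write."""
        self.pending = {}
        for dest, compute in block:
            self.pending[dest] = compute(self.read)

    def commit(self, keep: bool) -> None:
        """commit true applies the buffered writes; commit false discards them."""
        if keep:
            self.registers.update(self.pending)
        self.pending = {}


block: Block = [
    ("r1", lambda rd: rd("r2") + 1),  # r1 = r2 + 1
    ("r3", lambda rd: rd("r1") * 2),  # r3 = r1 * 2
]
state = SpeculativeState({"r2": 4})
state.sfetch(block)
state.commit(False)
print(state.registers)  # unchanged, as with a skip
state.sfetch(block)
state.commit(True)
print(state.registers)  # same result as a plain fetch of the block
```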
- Table 3 Instructions supporting speculative execution.
- a standard instruction set is translated directly into branch processor instructions by replacing conditional branches with send! and if? pairs, and using fetch instructions to dispatch instructions within a basic block.
- Both fixed length and variable length loops can be detected by modern compilation systems.
- Most programming languages have constructs for simple iterated loops, simplifying the problem of loop detection. Therefore, a compiler can generate push instructions for loops.
- subroutine call and returns are explicit in the language. Therefore, these instructions can be easily generated by standard compilation systems.
- The branch processor instruction set is easier to map to because the call and return semantics are provided by the hardware directly. Peephole optimization can be used to move a send! instruction before any other instructions in the data processor that it depends on. Recall that executing send! instructions early improves the performance of the branch processor architecture.
- Loop unrolling and loop peeling are transformations used to improve the performance of programs. Both transformations replicate the body of the loop in order to statically determine the direction of some of the branches in the loop body. Observe that such program transformations replicate code just in the branch processor; streams of instructions in the data processor can be re-used because they no longer encode any control flow information. This implies that we will not worsen instruction cache performance by applying such transformations.
- Fetch instructions provide a simple interface for implementing microcode.
- a sequence of instructions stored at fixed addresses in memory can be used to create complex "instructions" of the form of fetch addr,N.
- the effect of executing these instructions is to execute the sequence of instructions stored at the specified memory address, providing the same effect as an architecture that included programmable microcode .
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Advance Control (AREA)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP99943848A EP1105793A4 (en) | 1998-08-21 | 1999-08-20 | Processing element with special application for branch functions |
AU56865/99A AU5686599A (en) | 1998-08-21 | 1999-08-20 | Processing element with special application for branch functions |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US9751598P | 1998-08-21 | 1998-08-21 | |
US60/097,515 | 1998-08-21 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2000011547A1 (en) | 2000-03-02 |
WO2000011547A9 (en) | 2000-08-10 |
Family
ID=22263771
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1999/019197 WO2000011547A1 (en) | 1998-08-21 | 1999-08-20 | Processing element with special application for branch functions |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP1105793A4 (en) |
AU (1) | AU5686599A (en) |
WO (1) | WO2000011547A1 (en) |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4338661A (en) * | 1979-05-21 | 1982-07-06 | Motorola, Inc. | Conditional branch unit for microprogrammed data processor |
JP3137117B2 (en) * | 1987-03-27 | 2001-02-19 | 将容 曽和 | High-speed processing computer |
US5539911A (en) * | 1991-07-08 | 1996-07-23 | Seiko Epson Corporation | High-performance, superscalar-based computer system with out-of-order instruction execution |
WO1994016383A1 (en) * | 1993-01-06 | 1994-07-21 | The 3Do Company | Digital signal processor architecture |
US5485629A (en) * | 1993-01-22 | 1996-01-16 | Intel Corporation | Method and apparatus for executing control flow instructions in a control flow pipeline in parallel with arithmetic instructions being executed in arithmetic pipelines |
EP0660223B1 (en) * | 1993-11-30 | 2001-10-04 | Texas Instruments Incorporated | Three input arithmetic logic unit with barrel rotator |
US5781752A (en) * | 1996-12-26 | 1998-07-14 | Wisconsin Alumni Research Foundation | Table based data speculation circuit for parallel processing computer |
-
1999
- 1999-08-20 WO PCT/US1999/019197 patent/WO2000011547A1/en active Application Filing
- 1999-08-20 EP EP99943848A patent/EP1105793A4/en not_active Withdrawn
- 1999-08-20 AU AU56865/99A patent/AU5686599A/en not_active Abandoned
Also Published As
Publication number | Publication date |
---|---|
EP1105793A4 (en) | 2007-07-25 |
WO2000011547A1 (en) | 2000-03-02 |
EP1105793A1 (en) | 2001-06-13 |
AU5686599A (en) | 2000-03-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZA ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): GH GM KE LS MW SD SL SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
AK | Designated states |
Kind code of ref document: C2 Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZA ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: C2 Designated state(s): GH GM KE LS MW SD SL SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
|
COP | Corrected version of pamphlet |
Free format text: PAGE 1/1, DRAWINGS, REPLACED BY A NEW PAGE 1/1; DUE TO LATE TRANSMITTAL BY THE RECEIVING OFFICE |
|
WWE | Wipo information: entry into national phase |
Ref document number: 1999943848 Country of ref document: EP |
|
WWP | Wipo information: published in national office |
Ref document number: 1999943848 Country of ref document: EP |
|
REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 |