WO2001061469A2 - Apparatus and method for reducing register write traffic in processors with exception routines - Google Patents

Apparatus and method for reducing register write traffic in processors with exception routines

Info

Publication number
WO2001061469A2
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
result
pipeline
register file
execution device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/EP2001/000775
Other languages
English (en)
French (fr)
Other versions
WO2001061469A3 (en)
Inventor
Paul Stravers
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Koninklijke Philips NV
Original Assignee
Koninklijke Philips Electronics NV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips Electronics NV
Priority to EP01953031A (EP1208424B1)
Priority to DE2001602777 (DE60102777T2)
Priority to AT01953031T (ATE264520T1)
Priority to JP2001560791A (JP2004508607A)
Publication of WO2001061469A2
Publication of WO2001061469A3

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3867 Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3824 Operand accessing
    • G06F9/3826 Bypassing or forwarding of data results, e.g. locally between pipeline stages or within a pipeline stage
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3861 Recovery, e.g. branch miss-prediction, exception handling

Definitions

  • the present invention pertains generally to the field of digital computation circuits, and in particular, the invention relates to a system and method for an instruction execution device for use with a processor with exception routines.
  • ISAs microprocessor Instruction Set Architectures
  • pipeline a so-called “pipeline” method to overlap different execution stages of subsequent instructions.
  • a conventional four-stage pipeline employs (1) Fetch, (2) Decode, (3) Execute and (4) Write-back stages.
  • For data transfer type instructions such as a load instruction, one extra instruction pipeline stage is usually required.
  • the processor fetches an instruction from memory.
  • the address of the instruction to fetch is stored in the internal register, named the program counter, or PC.
  • PC program counter
  • the processor increments the PC. This means the fetch phase of the next cycle will fetch the instruction in the next sequential location in memory (unless the PC is modified by a later phase of the cycle).
  • the processor stores the information returned by the memory in another internal register, known as the instruction register, or IR.
  • the IR now holds a single machine instruction encoded as a binary number.
  • the processor decodes the value in the IR in order to figure out which operations to perform in the next stage. In the execution stage, the processor actually carries out the instruction.
  • the instruction may direct the processor to fetch two operands from memory (for example, storing them in operand registers), add them and store the result in a third location (the destination addresses of the operands and the result are also encoded as part of the instruction).
  • the result computed upstream in the pipeline is written (retired) to a destination register in a register file.
  • circuitry that allows operand or result values to bypass the register file.
  • the operands or result values are already available to subsequent instructions before the operand-producing instructions are retired (e.g., written-back to register file).
  • the large register file typically contributes significantly to the overall power consumption and size of the processor.
  • the instruction execution device includes an instruction pipeline for producing a result for an instruction, wherein exception routines may interrupt the instruction pipeline at random intervals; a register file that includes at least one write port for storing the result; a bypass circuit for allowing access to the result, wherein an indication is provided to a register file control as to whether the result is used by only one other instruction, and the register file control prevents the result from being stored in the write port when the result has been accessed via the bypass circuit and is used by only one other instruction; a First-In-First-Out (FIFO) buffer for storing the result; and a FIFO control for writing the contents of the FIFO buffer to the register file when an exception occurs.
  • FIFO First-In-First-Out
  • the indication whether the result is used by only one other instruction includes encoding each instruction.
  • a so-called “dead value” field is designated in the "opcode" of each instruction to indicate whether the result will be used by only one other instruction.
  • the means for indicating whether the first result is used by only the second instruction includes the instruction pipeline determining whether a result of an instruction in the instruction pipeline and another result of another instruction in the instruction pipeline are designated for storage in the same write port in the register file. Since the write port in the register file is "reused" by a subsequent instruction already in the instruction pipeline, this is used to indicate that the first result will be used by only one other instruction.
  • Fig. 1 is a block diagram of one illustrative arrangement of an instruction execution device in accordance with the teachings of the present invention.
  • Fig. 2 is a block diagram of another illustrative arrangement of an instruction execution device in accordance with the teachings of the present invention.
  • In Fig. 1, a block diagram is shown illustrating one embodiment of an instruction execution device in accordance with the teachings of the present invention. It will be recognized that Fig. 1 is simplified for explanation purposes and that the full processor environment suitable for use with the invention will comprise, for example, cache memory, RAM and ROM memory, compiler or assembler, I/O devices, etc., all of which need not be shown here.
  • instruction execution device 10 uses an n-stage pipeline instruction set register architecture (ISA) 12_1 through 12_n (hereinafter collectively known as "pipeline 12"), a conventional bypass circuit 14, a register file 16, a register file control 18, a First-In-First-Out (FIFO) buffer 24 and a FIFO buffer control 26.
  • the pipeline 12 includes a number of pipeline stages
  • pipeline 12 also includes a "consumer-id" field in the pipeline stage registers; its use is explained in more detail below. It should be understood, however, that the invention is not limited to a particular pipeline architecture.
  • the stages in a pipeline may include: instruction fetch, decode, operand fetch, ALU execution, memory access and write back of the operation results.
  • the chain of stages in the pipeline can be subdivided more finely or combined. The number of stages in the pipeline is an architectural feature which can be changed according to the intended exploitation of instruction level parallelism.
  • Register file 16 includes at least one addressable destination write port 20 for storing data.
  • the register file can be any conventional database/indexing storage means that can store and allow access to records/data.
  • Register file control 18 contains the majority of the logic, control, supervisory and translation functions required for controlling the operation of writing back write data to register file 16.
  • Register file control 18 also includes programs for the operations functionally described in Fig. 3. As described in detail below, execution of these programs implements the functionality necessary to reduce the number of register file write operations in the pipeline. Instructions can be classified as one of three major types: arithmetic/logic, data transfer, and control. Arithmetic and logic instructions apply primitive functions of one or two arguments, for example, addition, multiplication, or logical AND.
  • the timing of each stage depends on the internal construction of the processor and the complexity of the instructions it executes.
  • the quantum time unit for measuring operations is known as a clock cycle.
  • the logic that directs operations within a processor is controlled by an external clock, which, for example, may be a circuit that generates a square wave with a fixed period. The number of clock cycles required to carry out an operation determines the amount of time it will take.
  • the consumer enters the pipeline architecture before the producer retires. This holds even more strongly for specific types of processors, such as superscalar and VLIW processors. Accordingly, the consumer obtains the result value through a method other than the register file, for example, the bypass circuit. However, the result is nevertheless written back to the register file.
  • the register file control 18 determines whether the result of a particular instruction will be used by more than one consumer and, if so, the result is written back to register file 16 in stage-n, for example, the Write-Back stage. In either case the result (which includes the write data, destination address and consumer-id) is temporarily stored in FIFO buffer 24.
  • explicit encoding in each instruction is used to indicate whether the result value of a particular instruction will be used by only one consumer (or only by other consumers in the pipeline).
  • a dedicated "dead value" bit in the instruction encoding (the so-called "opcode" of an instruction) is used, which is set or cleared by a compiler or assembler (not shown), depending on the degree of consumption. If the dead value bit is set, then the result value is not written back (for example, via a write-enable signal) to register file 16 in stage-n of pipeline 12, but is stored in FIFO buffer 24.
  • the dead value bit, as well as the instruction-id of the associated instruction, is provided to register file control 18, which in turn controls register file 16 via a write-enable signal to write back the result value or not.
  • If a dedicated bit is unavailable, then a few commonly used instructions can be selected (e.g., ADD and LOAD), which are assigned an alternative opcode to indicate the degree of consumption.
  • By storing the result in FIFO buffer 24, the invention accounts for the effects of exceptions that are allowed to interrupt the instruction flow through the pipeline.
  • Exceptions can disrupt the instruction flow in such a way that the producer does not re-enter the pipeline after the exception is handled while the consumer does.
  • When the consumer re-enters the pipeline after the exception routine finishes, it will read its operands from register file 16 because now the producer is not downstream of the consumer in the pipeline anymore. But since the producer had not updated register file 16 before the exception occurred, the consumer reads stale data from register file 16, and as a consequence the program behaves incorrectly.
  • Certain pipeline architectures, such as the Trimedia ISA, do not allow interrupts to be taken at random points in the instruction flow. Instead, the ISA provides special instructions that cause the CPU to synchronize with pending exceptions. As long as the producer and the consumer are not separated by such instructions, no harm can be done and it is safe to discard dead values.
  • MIPS Microprocessor without Interlocking Pipeline Stages
  • most other conventional ISAs do not allow application programs to decide when the processor should synchronize with pending exceptions.
  • FIFO buffer 24 stores the results of each instruction at the tail of the FIFO buffer 24. Again, the result includes the write data, destination address of register file 16 and consumer-id from the pipeline stage register that uniquely identifies a consumer.
  • FIFO control 26 continuously monitors the flow of retiring instructions and compares their identity to the "consumer-id" at the head of FIFO buffer 24. Whenever it detects a match, it "pops" the head from FIFO buffer 24 and discards its value. However, during an exception, FIFO control 26 writes the content of FIFO buffer 24 to register file 16, starting with the head and progressing to the tail, until FIFO buffer 24 is empty. As will be understood by persons skilled in the art, the FIFO buffer write operation can be performed in a variety of ways, for example, via the register file control or a direct connection to the register file. Thereafter a consumer will read the correct data from register file 16 after the exception routine finishes.
  • pipeline 12 determines whether another instruction in the pipeline will use the same destination address (e.g., write port 20) in register file 16 as a subsequent instruction in the pipeline. This can be determined in a number of ways. For example, if the result value of the instruction in stage-3 12_3 is designated for the destination address of write port 20 in register file 16 and the result value of the instruction in stage-1 12_1 is also designated for the same destination address in register file 16, then the result value of the instruction in stage-3 is "dead" because its destination address will be reused by another instruction in the pipeline.
  • a reduction in power consumption is dependent on the actual properties of the implementation technology. For instance, the data in the FIFO buffer is only stored there for a few clock cycles before it gets discarded. With this in mind, it may be possible to store the data in the FIFO buffer as dynamic charges with no refresh facilities. This could reduce the area complexity of the FIFO buffer significantly, which would be particularly advantageous for processors that exploit a high degree of instruction-level parallelism and which consequently must be able to retire many instructions per clock cycle.
  • any element expressed as a means for performing a specified function is intended to encompass any way of performing that function including, for example, a) a combination of circuit elements which performs that function or b) software in any form, including, therefore, firmware, microcode or the like, combined with appropriate circuitry for executing that software to perform the function.
  • the invention as defined by such claims resides in the fact that the functionalities provided by the various recited means are combined and brought together in the manner which the claims call for. Applicant thus regards any means which can provide those functionalities as equivalent to those shown herein.
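
The interplay of the dead value bit, FIFO buffer 24 and FIFO control 26 described in the definitions above can be illustrated with a small behavioural model. This is a minimal Python sketch under stated assumptions, not the patented circuit: the names `Result`, `ExecutionDeviceModel`, `produce`, `retire` and `exception` are hypothetical, the register file is modelled as a plain list, each result has exactly one recorded consumer, and an exception flushes every buffered result regardless of its dead value bit.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Result:
    data: int         # write data
    dest: int         # destination address in the register file
    consumer_id: int  # identifies the consuming instruction
    dead: bool        # dead value bit: True means "skip the write-back"


class ExecutionDeviceModel:
    """Hypothetical model of register file 16, FIFO buffer 24 and their controls."""

    def __init__(self, num_regs=8):
        self.regs = [0] * num_regs  # register file contents
        self.writes = 0             # number of register file writes performed
        self.fifo = deque()         # FIFO buffer: head on the left, tail on the right

    def _write(self, dest, data):
        self.regs[dest] = data
        self.writes += 1

    def produce(self, result):
        """Stage n: buffer every result; write back only if it is not dead."""
        self.fifo.append(result)
        if not result.dead:
            self._write(result.dest, result.data)

    def retire(self, instruction_id):
        """FIFO control: pop and discard the head when its consumer retires."""
        if self.fifo and self.fifo[0].consumer_id == instruction_id:
            self.fifo.popleft()

    def exception(self):
        """FIFO control: on an exception, drain head to tail into the register file."""
        while self.fifo:
            entry = self.fifo.popleft()
            self._write(entry.dest, entry.data)


# Run 1: the consumer (id 7) retires normally, so the dead value is never written.
normal = ExecutionDeviceModel()
normal.produce(Result(data=42, dest=3, consumer_id=7, dead=True))
normal.retire(7)

# Run 2: an exception arrives before the consumer retires, so the value is flushed.
interrupted = ExecutionDeviceModel()
interrupted.produce(Result(data=42, dest=3, consumer_id=7, dead=True))
interrupted.exception()
```

In the first run the dead value reaches its consumer through the bypass and is dropped without a register file write. In the second run the buffered value is written to the register file during the exception, so a consumer that re-enters the pipeline afterwards reads current rather than stale data.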
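
Dead-value detection by destination reuse, in which the pipeline compares the destination addresses of instructions that are in flight at the same time, can be sketched as follows. This is an illustrative Python sketch only: the function name `dead_by_destination_reuse` and the list layout (index 0 is stage 1, holding the most recently entered instruction; `None` marks an instruction with no destination register) are assumptions made for the example and are not taken from the patent.

```python
from typing import List, Optional


def dead_by_destination_reuse(stage_dests: List[Optional[int]]) -> List[bool]:
    """Return one flag per pipeline stage.

    A result is flagged dead when a younger instruction (one in an earlier
    stage) is designated for the same destination address, because that
    address will be overwritten by an instruction already in the pipeline.
    """
    flags = []
    for stage_index, dest in enumerate(stage_dests):
        younger_dests = stage_dests[:stage_index]
        flags.append(dest is not None and dest in younger_dests)
    return flags


# Stage 1 and stage 3 both target address 20, so the stage-3 result is dead.
flags = dead_by_destination_reuse([20, 5, 20, None])
```

A hardware implementation would presumably perform these comparisons in parallel with comparators on the pipeline stage registers; the loop here only makes the rule explicit.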

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)
  • Executing Machine-Instructions (AREA)
  • Computer And Data Communications (AREA)
PCT/EP2001/000775 2000-02-16 2001-01-24 Apparatus and method for reducing register write traffic in processors with exception routines Ceased WO2001061469A2 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP01953031A EP1208424B1 (en) 2000-02-16 2001-01-24 Apparatus and method for reducing register write traffic in processors with exception routines
DE2001602777 DE60102777T2 (de) 2000-02-16 2001-01-24 Vorrichtung und verfahren zur verminderung von datenschreibverkehr in prozessoren mit ausnahmeroutinen
AT01953031T ATE264520T1 (de) 2000-02-16 2001-01-24 Vorrichtung und verfahren zur verminderung von datenschreibverkehr in prozessoren mit ausnahmeroutinen
JP2001560791A JP2004508607A (ja) 2000-02-16 2001-01-24 例外ルーチンを有するプロセッサのレジスタライトトラフィックを減じる装置及び方法

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/505,986 2000-02-16
US09/505,986 US6851044B1 (en) 2000-02-16 2000-02-16 System and method for eliminating write backs with buffer for exception processing

Publications (2)

Publication Number Publication Date
WO2001061469A2 (en) 2001-08-23
WO2001061469A3 (en) 2002-02-21

Family

ID=24012699

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2001/000775 Ceased WO2001061469A2 (en) 2000-02-16 2001-01-24 Apparatus and method for reducing register write traffic in processors with exception routines

Country Status (6)

Country Link
US (1) US6851044B1 (en)
EP (1) EP1208424B1 (en)
JP (1) JP2004508607A (en)
AT (1) ATE264520T1 (en)
DE (1) DE60102777T2 (en)
WO (1) WO2001061469A2 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004010288A1 (en) * 2002-07-19 2004-01-29 Xelerated Ab Method and apparatus for pipelined processing of data packets
WO2004010287A1 (en) * 2002-07-19 2004-01-29 Xelerated Ab A processor and a method in the processor, the processor comprising a programmable pipeline and at least one interface engine
WO2007057831A1 (en) * 2005-11-15 2007-05-24 Nxp B.V. Data processing method and apparatus
WO2014190699A1 (zh) * 2013-05-31 2014-12-04 华为技术有限公司 一种cpu指令处理方法和处理器
US8977774B2 (en) 2004-12-22 2015-03-10 Marvell International Ltd. Method for reducing buffer capacity in a pipeline processor

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
US7290153B2 (en) * 2004-11-08 2007-10-30 Via Technologies, Inc. System, method, and apparatus for reducing power consumption in a microprocessor

Family Cites Families (13)

Publication number Priority date Publication date Assignee Title
US4228497A (en) * 1977-11-17 1980-10-14 Burroughs Corporation Template micromemory structure for a pipelined microprogrammable data processing system
AU553416B2 (en) * 1984-02-24 1986-07-17 Fujitsu Limited Pipeline processing
US6370623B1 (en) 1988-12-28 2002-04-09 Philips Electronics North America Corporation Multiport register file to accommodate data of differing lengths
JPH0719222B2 (ja) * 1989-03-30 1995-03-06 日本電気株式会社 ストアバッフア
AU629007B2 (en) 1989-12-29 1992-09-24 Sun Microsystems, Inc. Apparatus for accelerating store operations in a risc computer
US5222240A (en) 1990-02-14 1993-06-22 Intel Corporation Method and apparatus for delaying writing back the results of instructions to a processor
GB2241801B (en) 1990-03-05 1994-03-16 Intel Corp Data bypass structure in a register file on a microprocessor chip to ensure data integrity
JPH04367936A (ja) * 1991-06-17 1992-12-21 Mitsubishi Electric Corp スーパースカラープロセッサ
US5471626A (en) * 1992-05-06 1995-11-28 International Business Machines Corporation Variable stage entry/exit instruction pipeline
US5898882A (en) * 1993-01-08 1999-04-27 International Business Machines Corporation Method and system for enhanced instruction dispatch in a superscalar processor system utilizing independently accessed intermediate storage
JPH08212083A (ja) 1995-02-07 1996-08-20 Oki Electric Ind Co Ltd 割り込み処理装置
JP3490191B2 (ja) 1995-06-30 2004-01-26 株式会社東芝 計算機
US20020161985A1 (en) * 1999-10-01 2002-10-31 Gearty Margaret Rose Microcomputer/floating point processor interface and method for synchronization of cpu and fpu pipelines

Cited By (7)

Publication number Priority date Publication date Assignee Title
WO2004010288A1 (en) * 2002-07-19 2004-01-29 Xelerated Ab Method and apparatus for pipelined processing of data packets
WO2004010287A1 (en) * 2002-07-19 2004-01-29 Xelerated Ab A processor and a method in the processor, the processor comprising a programmable pipeline and at least one interface engine
US7644190B2 (en) 2002-07-19 2010-01-05 Xelerated Ab Method and apparatus for pipelined processing of data packets
US8977774B2 (en) 2004-12-22 2015-03-10 Marvell International Ltd. Method for reducing buffer capacity in a pipeline processor
WO2007057831A1 (en) * 2005-11-15 2007-05-24 Nxp B.V. Data processing method and apparatus
WO2014190699A1 (zh) * 2013-05-31 2014-12-04 华为技术有限公司 一种cpu指令处理方法和处理器
CN104216681A (zh) * 2013-05-31 2014-12-17 华为技术有限公司 一种cpu指令处理方法和处理器

Also Published As

Publication number Publication date
DE60102777T2 (de) 2009-10-08
US6851044B1 (en) 2005-02-01
EP1208424B1 (en) 2004-04-14
JP2004508607A (ja) 2004-03-18
EP1208424A2 (en) 2002-05-29
WO2001061469A3 (en) 2002-02-21
ATE264520T1 (de) 2004-04-15
DE60102777D1 (de) 2004-05-19

Similar Documents

Publication Publication Date Title
US6862677B1 (en) System and method for eliminating write back to register using dead field indicator
US5881280A (en) Method and system for selecting instructions for re-execution for in-line exception recovery in a speculative execution processor
JP3870973B2 (ja) スーパースケーラマイクロプロセサ
US7133969B2 (en) System and method for handling exceptional instructions in a trace cache based processor
US6973563B1 (en) Microprocessor including return prediction unit configured to determine whether a stored return address corresponds to more than one call instruction
US6260189B1 (en) Compiler-controlled dynamic instruction dispatch in pipelined processors
US5864689A (en) Microprocessor configured to selectively invoke a microcode DSP function or a program subroutine in response to a target address value of branch instruction
US6219778B1 (en) Apparatus for generating out-of-order results and out-of-order condition codes in a processor
US5469552A (en) Pipelined data processor having combined operand fetch and execution stage to reduce number of pipeline stages and penalty associated with branch instructions
IL155298A (en) Locking source registers in a data processing apparatus
US6983359B2 (en) Processor and method for pre-fetching out-of-order instructions
EP1208424B1 (en) Apparatus and method for reducing register write traffic in processors with exception routines
JP3146058B2 (ja) 並列処理型プロセッサシステムおよび並列処理型プロセッサシステムの制御方法
US5850563A (en) Processor and method for out-of-order completion of floating-point operations during load/store multiple operations
JPH06214785A (ja) マイクロプロセッサ
US20040128482A1 (en) Eliminating register reads and writes in a scheduled instruction cache
JPH07121371A (ja) 複数命令同時取込み機構
Changwatchai et al. Optimizing Instruction Execution in the PowerPC™ 603e Superscalar Microprocessor

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): JP

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR

WWE Wipo information: entry into national phase

Ref document number: 2001953031

Country of ref document: EP

ENP Entry into the national phase

Ref country code: JP

Ref document number: 2001 560791

Kind code of ref document: A

Format of ref document f/p: F

121 Ep: the epo has been informed by wipo that ep was designated in this application
AK Designated states

Kind code of ref document: A3

Designated state(s): JP

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR

WWP Wipo information: published in national office

Ref document number: 2001953031

Country of ref document: EP

WWG Wipo information: grant in national office

Ref document number: 2001953031

Country of ref document: EP