WO2002057893A2 - Method and apparatus for reducing power consumption in a digital processor - Google Patents

Method and apparatus for reducing power consumption in a digital processor

Info

Publication number
WO2002057893A2
Authority
WO
WIPO (PCT)
Prior art keywords
pipeline
instruction
processor
data
logic
Prior art date
Application number
PCT/US2001/051064
Other languages
English (en)
Other versions
WO2002057893A3 (fr)
Inventor
Daniel Hansson
Original Assignee
Arc International (Uk) Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Arc International (Uk) Limited
Priority to AU2002246904A (AU2002246904A1)
Publication of WO2002057893A2
Publication of WO2002057893A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30076 Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
    • G06F9/30083 Power or thermal control instructions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3206 Monitoring of events, devices or parameters that trigger a change in power modality
    • G06F1/3228 Monitoring task completion, e.g. by use of idle timers, stop commands or wait commands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234 Power saving characterised by the action undertaken
    • G06F1/3237 Power saving characterised by the action undertaken by disabling clock generation or distribution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234 Power saving characterised by the action undertaken
    • G06F1/3243 Power saving in microcontroller unit
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30145 Instruction analysis, e.g. decoding, instruction word fields
    • G06F9/30156 Special purpose encoding of instructions, e.g. Gray coding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3842 Speculative instruction execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3867 Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present invention relates to the field of integrated circuit design. It concerns specifically (i) power reduction techniques and (ii) the use of a hardware description language (HDL) to implement related instructions and control, in a pipelined central processing unit (CPU) or user-customizable microprocessor.
  • HDL hardware description language
  • RISC reduced instruction set computer
  • RISC processors are well known in the computing arts.
  • RISC processors generally have the fundamental characteristic of utilizing a substantially reduced instruction set as compared to non-RISC (commonly known as "CISC") processors.
  • CISC complex instruction set computer (non-RISC)
  • RISC processor machine instructions are not all microcoded, but rather may be executed immediately without decoding, thereby affording significant economies in terms of processing speed.
  • This "streamlined" instruction handling capability furthermore allows greater simplicity in the design of the processor (as compared to non-RISC devices), thereby allowing smaller silicon and reduced cost of fabrication.
  • RISC processors are also typically characterized by (i) load/store memory architecture (i.e., only the load and store instructions have access to memory; other instructions operate via internal registers within the processor); (ii) unity of processor and compiler; and (iii) pipelining.
  • a significant concern in RISC processors (and, for that matter, almost every integrated circuit) is power consumption and dissipation.
  • the power that is consumed only when a signal toggles (i.e., changes from 0 to 1 or from 1 to 0) is defined as dynamic power consumption.
  • Toggles are also commonly referred to as switching activity.
  • the much smaller amount of power that is consumed in a cell (e.g. a gate or flipflop) when there is no switching activity is called static power consumption or cell leakage power. In a modern CMOS technology, static power consumption represents less than 1% of the total power consumption and can thus be ignored in most applications.
  • Dynamic power in turn consists of two components: net switching power and cell internal power.
  • Net switching power is the power consumed on a net when the signal it is carrying is toggling. Net switching power is proportionally dependent on the switching activity, the net load and the squared voltage.
  • the net load is the capacitive load of the net itself plus the capacitive loads of all input pins of the cells connected to the net. The net load therefore depends on the net's length (its own capacitance) and its fanout (the input-pin load of the connected cells).
  • Alternatively, net switching power can be defined as covering only the net's own load, with the capacitive load of the input pins counted as part of the cell internal power instead. The total power consumption is the same either way, since both definitions include the same loads in aggregate.
  • V power supply voltage to the gate
  • f switching frequency
  • Cell internal power is the power consumed when one or more cell input signals toggle. During the transition time when an input or an output signal changes state, both the pull-down and the pull-up transistors conduct momentarily, and a large current flows through the cell. This is also often called short-circuit power. The transition time depends on the chosen technology, but the number of times the transition occurs depends on the switching activity. Cell internal power is proportional to the switching activity and to the squared voltage. Voltage is generally the most important parameter in determining total power consumption, as it is the only squared term in the power equation. Therefore, the choice of technology (which defines the voltage) is the most important factor that determines total power consumption.
  • HDL specifications typically do not permit designers to set the operating voltage of the target design. Instead, HDL lets designers address the second and third most important parameters, switching activity and net load, whose product affects the power.
  • the principle of most power reduction strategies at the HDL level is to add logic that reduces the switching activity and thereby the power consumption.
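The way these parameters combine can be sketched with the textbook first-order CMOS estimate, which is assumed here. In that estimate each toggle of a net with load C dissipates on average 0.5·C·V², so P ≈ 0.5·C·V²·α·f. V and f are as listed above, and α is the number of toggles per clock period. The function name and all numbers are illustrative only:

```python
def switching_power(c_load_f: float, vdd_v: float, toggles_per_cycle: float, f_hz: float) -> float:
    """First-order net switching power in watts (textbook CMOS model, assumed).

    c_load_f: net load (wire plus connected input pins) in farads
    vdd_v: supply voltage in volts
    toggles_per_cycle: switching activity, i.e. toggles per clock period
    f_hz: clock frequency in hertz
    """
    energy_per_toggle = 0.5 * c_load_f * vdd_v ** 2  # average energy per 0->1 or 1->0 toggle
    return energy_per_toggle * toggles_per_cycle * f_hz


# Illustrative values only (not from the patent): 1 pF net, 1.8 V supply, 100 MHz clock.
base = switching_power(1e-12, 1.8, 0.5, 100e6)
half_activity = switching_power(1e-12, 1.8, 0.25, 100e6)  # lever available at the HDL level
half_voltage = switching_power(1e-12, 0.9, 0.5, 100e6)    # lever fixed by the technology

print(f"base:          {base * 1e6:.1f} uW")
print(f"half activity: {half_activity / base:.2f} x base")
print(f"half voltage:  {half_voltage / base:.2f} x base")
```

Halving the switching activity halves the power, while halving the voltage quarters it. This matches the ranking above: voltage comes first, and switching activity and net load are the levers left at the HDL level.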
  • Gray codes also called cyclical or progressive codes
  • Gray codes have historically been useful in mechanical encoders since a slight change in location only affects one bit.
  • these same codes offer other benefits well understood to one skilled in the art including being hazard-free for logic races and other conditions that could give rise to faulty operation of the circuit.
  • the use of such Gray codes also has important advantages in power-saving designs. Because only one bit changes per state change, a minimal number of circuit elements switch per input change. This in turn reduces dynamic power by limiting the number of nodes toggled per clock change. With a typical binary code, up to n bits can change, with up to n subnets switching per clock or input change.
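The toggle argument can be checked by counting bit flips. The sketch below runs a 3-bit counter through all eight states, including the wrap back to zero. It does this once in plain binary and once in the reflected binary Gray code, using the standard conversion b XOR (b >> 1). That conversion is an assumption made for illustration, since the text does not prescribe a particular Gray code, and the helper names are hypothetical:

```python
def to_gray(b: int) -> int:
    """Convert a binary value to its reflected binary Gray code."""
    return b ^ (b >> 1)


def count_toggles(sequence: list) -> int:
    """Total number of bit flips over a cyclic sequence of register values."""
    total = 0
    for i, value in enumerate(sequence):
        nxt = sequence[(i + 1) % len(sequence)]  # include the wrap-around step
        total += bin(value ^ nxt).count("1")
    return total


N_BITS = 3
binary_seq = list(range(2 ** N_BITS))        # 000, 001, 010, ..., 111
gray_seq = [to_gray(b) for b in binary_seq]  # 000, 001, 011, 010, 110, 111, 101, 100

print("binary toggles per full cycle:", count_toggles(binary_seq))
print("Gray toggles per full cycle:  ", count_toggles(gray_seq))
```

Binary counting flips 14 bits per full cycle, with all n = 3 bits changing on the 011 to 100 and 111 to 000 steps. The Gray sequence flips exactly one bit per step, 8 in total.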
  • the present invention satisfies the aforementioned needs by providing an improved method and apparatus for reducing power consumption within a digital processor using sleep modes.
  • an improved method for reducing power consumption within a digital processor comprises: first defining an instruction which invokes a "sleep mode" within the processor and its pipeline; inserting the instruction into the pipeline during operation of the processor; decoding and executing the instruction; stalling the pipeline in response to the sleep mode instruction; disabling processor memory in response to the sleep mode instruction; and waking the core from sleep mode based on the occurrence of a predetermined event.
  • the programmer can selectively shut down portions of the processor under certain circumstances, thereby significantly reducing power consumption during such periods, and reducing the power consumption of the processor as a whole.
  • the aforementioned sleep mode methodology is combined with a pipeline low power enable configuration which stalls unnecessary data in the pipeline, thereby conserving power within the processor.
  • the method comprises: providing a logic circuit adapted for detection of a predetermined condition of the data within the pipeline; inserting data into the pipeline; detecting, using the aforementioned logic circuit, that the predetermined condition exists with respect to certain of the data; invoking a sleep mode within the pipeline in response to the detected condition; and restarting the pipeline when the condition no longer exists.
  • Gray coding is used in the design of the pipeline logic and in conjunction with the aforementioned sleep mode technique to further reduce power consumption.
  • Such Gray coding comprises forming a binary sequence of data in which only one bit changes at any given time.
  • an improved instruction format for invoking the aforementioned "sleep mode" comprises (i) a base instruction element or kernel, (ii) one or more operand bits or fields, and (iii) one or more flag bits or fields.
  • the instruction is coded within the base instruction set of the processor.
  • an improved method of synthesizing the design of an integrated circuit incorporating the aforementioned sleep mode functionality comprises: obtaining user input regarding the design configuration; creating a customized HDL functional block description based on the user input and existing libraries of functions; determining a design hierarchy based on the user input and existing libraries; running a makefile to create the structural HDL and script; running the script to create a makefile for the simulator and a synthesis script; and synthesizing and/or simulating the design from the synthesis script or the simulation makefile, respectively.
  • an improved computer program useful for synthesizing processor designs and embodying the aforementioned sleep mode functionality comprises an object code representation stored on the magnetic storage device of a microcomputer, and adapted to run on the central processing unit thereof.
  • the computer program further comprises an interactive, menu-driven graphical user interface (GUI), thereby facilitating ease of use.
  • GUI graphical user interface
  • an improved apparatus for running the aforementioned computer program used for synthesizing gate logic associated with the aforementioned sleep mode functionality comprises a stand-alone microcomputer system having a display, central processing unit, data storage device(s), and input device.
  • the processor comprises a reduced instruction set computer (RISC) having a four stage pipeline comprising instruction fetch, decode, execute, and writeback stages and an instruction set comprising at least one SLEEP instruction, which is used in a delay slot of the pipeline of the processor.
  • Fig. 1a is a graphical representation of a first embodiment ("base case") of the SLEEP instruction format according to the present invention.
  • Fig. 1b is a graphical representation of a second embodiment of the SLEEP instruction format according to the present invention, having associated operand and flag fields.
  • Fig. 1c is a graphical representation of the debug register of the processor core, including ZZ and ED fields.
  • Fig. 2 is a logical flow diagram illustrating a first embodiment of the method of reducing power consumption within a digital processor according to the present invention.
  • Figs. 3a and 3b are schematic diagrams illustrating exemplary embodiments of the logic used to implement the sleep mode functionality according to the present invention.
  • Fig. 4a is a functional block diagram illustrating the relationship of the core clock module to other components within the processor core.
  • Figs. 4b and 4c are schematic diagrams illustrating exemplary clock module gate logic for the instances where clock gating is selected and not selected during core build, respectively.
  • Figs 4d-4f are schematic diagrams illustrating exemplary embodiments of the logic used to implement the clock gating functionality according to the present invention.
  • Fig. 5 is a logical flow diagram illustrating a second embodiment of the method of reducing power consumption within a digital processor by stalling the pipeline in response to the detection of invalid data.
  • Fig. 6 is a logical flow diagram illustrating the generalized methodology of synthesizing processor logic which incorporates the sleep mode functionality of the present invention.
  • Fig. 7 is a block diagram of a pipelined processor design incorporating the sleep mode functionality of the present invention.
  • Fig. 8 is a functional block diagram of one exemplary embodiment of a computer system useful for synthesizing gate logic implementing the aforementioned sleep mode functionality within a processor device.
  • processor is meant to include any integrated circuit or other electronic device capable of performing an operation on at least one instruction word including, without limitation, reduced instruction set core (RISC) processors such as the ARC user-configurable core manufactured by the Assignee hereof, central processing units (CPUs), and digital signal processors (DSPs).
  • RISC reduced instruction set core
  • CPUs central processing units
  • DSPs digital signal processors
  • the hardware of such devices may be integrated onto a single piece of silicon (“die”), or distributed among two or more die.
  • various functional aspects of the processor may be implemented solely as software or firmware associated with the processor.
  • stage refers to various successive stages within a pipelined processor; i.e., stage 1 refers to the first pipelined stage, stage 2 to the second pipelined stage, and so forth.
  • toggle refers to the number of times a signal changes from 0 to 1 or from 1 to 0. If a signal changes from 0 to 1, it has toggled once; if it then changes back to 0, it has toggled twice.
  • a clock signal generally toggles twice per clock period, and all other signals toggle at a maximum of once per clock period (except if the signals are generated on both clock edges, etc.).
  • VHSIC hardware description language (VHDL)
  • other hardware description languages such as Verilog® may be used to describe various embodiments of the invention with equal success.
  • an exemplary Synopsys® synthesis engine such as the Design Compiler 1999.05 (DC99) is used to synthesize the various embodiments set forth herein
  • other synthesis engines, such as Buildgates® available from Cadence Design Systems, Inc., may also be used.
  • IEEE Std 1076.3-1997, IEEE Standard VHDL Synthesis Packages, describes an industry-accepted language for specifying a Hardware Description Language-based design and the synthesis capabilities that may be expected to be available to one of ordinary skill in the art. Appendix I hereto provides relevant portions of the HDL code relating to the various aspects of the invention.

Sleep Mode

  • the present invention comprises a "sleep mode" wherein the core pipeline (and optionally memory devices associated with the core) is shut down to conserve power.
  • the sleep mode is initiated using a SLEEP instruction which comprises an assembly language instruction of the type well known in the art which is placed within an instruction slot in the processor pipeline.
  • the SLEEP instruction, when executed by the processor, allows the processor core to go into a sleep mode which, inter alia, stalls the processor pipeline until an interrupt or designated restart event occurs, thereby reducing power consumption.
  • interrupt refers to a state wherein the processor causes programmatic control to be transferred to an interrupt service routine
  • restart refers to that condition when the processor is re-enabled after having been halted.
  • SLEEP instruction of the invention is configured only to be detected in pipeline stage 2, and has no associated options or operands. Such embodiment represents the "baseline" functionality. It will be appreciated, however, that other configurations which utilize operands and/or flags may be employed with equal success, depending on the required attributes for the particular core design.
  • Fig. 1b illustrates an exemplary embodiment of such an alternative instruction encoding (format) for the SLEEP instruction. As illustrated in Fig. 1b, the format 100 comprises (i) a base instruction element or kernel 102; (ii) one or more operand fields 104; and (iii) one or more flag fields 106. Other configurations are also possible consistent with the invention.
  • the SLEEP instruction of the present invention may advantageously be put anywhere in the code, for example as shown below:
  • the SLEEP instruction comprises a single operand instruction without flags or other operands. This instruction is part of the base case instruction set of the core.
  • one or more additional control bits are introduced in the debug register 190 of the core of the present embodiment to control lower power modes. The following outlines the general functionality of the sleep mode control bits:
  • ZZ (Sleep Mode): indicates whether the core is in sleep mode; 0 = core is not in sleep mode (default), 1 = core is in sleep mode.
  • the Sleep Mode flag is set when the core enters sleep mode as previously described.
  • the ZZ flag is set when a SLEEP instruction arrives in pipeline stage 2, and cleared when the core is restarted or receives an interrupt request of the type previously described.
  • the timer register aux_timer of Assignee's ARC core is incremented by one on every clock cycle. If the least significant bit in the aux_tcontrol register is set, the timer generates an interrupt when the aux_timer register "wraps." This wrapping occurs one cycle after aux_timer has reached the maximum value of 0x00FFFFFF. Hence, when the timer wraps, the interrupt signal is generated, and the core wakes up from sleep mode as previously described.
  • the following exemplary code illustrates this concept:
        JAL _start
        ivech_handler
        <User defined code>
        sr 0x0, [aux_tcontrol] ; Disable interrupt generation
  • the "sr 0x1" instruction (aux_tcontrol) enables interrupt generation, while "flag 2" enables level 1 interrupts.
  • the "sr 0x00FF0000" instruction sets the start value of the timer to 0x00FF0000.
  • when the core encounters the SLEEP instruction, it goes into sleep mode until the timer has counted to 0x00FFFFFF (from the starting value of 0x00FF0000).
  • the timer then wraps (i.e., is set to the value 0x00000000) and generates an interrupt signal on IRQ3, whereupon the core wakes up.
  • the interrupt enable flag for level 1 has been set to allow the interrupt signal (IRQ3) to be recognized.
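The sleep duration in this example follows directly from the start value. The timer needs 0x00FFFFFF - 0x00FF0000 = 0xFFFF increments to reach its maximum, plus one more cycle to wrap, for a total of 0x10000 = 65,536 cycles. The Python sketch below models only the behaviour described here and makes no attempt to reproduce the core's HDL. The function and parameter names are hypothetical:

```python
from typing import Optional

TIMER_MAX = 0x00FFFFFF  # maximum aux_timer value before it wraps (per the text)


def cycles_until_wakeup(start: int, tcontrol: int, level1_enabled: bool) -> Optional[int]:
    """Clock cycles spent asleep before the timer interrupt wakes the core.

    Simplified behavioural model (an assumption):
    - the timer increments by one every clock cycle;
    - it wraps to 0 one cycle after reaching TIMER_MAX;
    - the wrap raises an interrupt only if bit 0 of tcontrol is set, and the
      core only recognises it if level-1 interrupts are enabled ("flag 2").
    Returns None if the timer interrupt can never wake the core.
    """
    if not (tcontrol & 0x1) or not level1_enabled:
        return None
    timer = start & TIMER_MAX
    cycles = 0
    while True:
        cycles += 1
        wrapped = timer == TIMER_MAX
        timer = (timer + 1) & TIMER_MAX
        if wrapped:
            return cycles  # IRQ asserted on the wrap; the core leaves sleep mode


# Mirrors the example: interrupt generation enabled, start value 0x00FF0000.
print(cycles_until_wakeup(0x00FF0000, tcontrol=0x1, level1_enabled=True))
print(cycles_until_wakeup(0x00FF0000, tcontrol=0x0, level1_enabled=True))
```

With interrupt generation disabled (tcontrol bit 0 clear), this model never produces a timer wake-up, so the core would have to rely on some other interrupt or restart event.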
  • the first step 202 of the method 200 comprises defining a sleep mode for the processor via an instruction word format (such as the foregoing SLEEP word).
  • the SLEEP instruction is also coded to invoke a pipeline stall and optional disabling of the RAM via the HDL code that defines the pipeline operation.
  • the SLEEP instruction is inserted into the pipeline at stage 1.
  • the pipeline is advanced, with the SLEEP instruction being advanced to stage 2 (decode) of the pipeline.
  • the SLEEP instruction at stage 2 sets the ZZ flag when stage 2 is allowed to move into stage 3.
  • the processor enters the sleep mode. No more instruction fetches are allowed, and pipeline stage 1 is prevented from moving into stage 2 (step 210). Stages 2 and above flow freely, however, which means that pipeline stages 2 and above are flushed at the beginning of the sleep mode (step 212). The SLEEP instruction itself is therefore also flushed, since the SLEEP instruction in stage 2 is advanced to stage 3 as described above. Also, upon execution, the RAM associated with the processor is optionally disabled per step 213, depending on the HDL coding of the instruction.
  • the sleep mode duration may then be optionally controlled using a timer or similar function, such as the aux_timer function as previously described herein (step 216).
  • an interrupt is generated (step 220), and the core wakes from the sleep mode per step 222.
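The method steps above can be illustrated with a small cycle-level model. The Python sketch below is a toy under stated assumptions, not the core's HDL:

- a 4-stage fetch/decode/execute/writeback pipeline;
- SLEEP written as the string "sleep";
- stage 1 held in the same cycle that ZZ is set.

The class and method names are hypothetical, and the model leaves out the optional RAM disable and the interrupt service routine:

```python
from typing import List, Optional


class SleepPipelineModel:
    """Toy cycle-level model of a 4-stage pipeline with a SLEEP instruction.

    Stage indices: 0 = fetch (stage 1), 1 = decode (stage 2),
    2 = execute (stage 3), 3 = writeback (stage 4). None is a bubble.
    """

    def __init__(self, program: List[str]) -> None:
        self.program = program
        self.pc = 0
        self.stages: List[Optional[str]] = [None, None, None, None]
        self.zz = False              # models the ZZ (sleep mode) flag
        self.retired: List[str] = []

    def _fetch(self) -> Optional[str]:
        if self.pc < len(self.program):
            self.pc += 1
            return self.program[self.pc - 1]
        return None

    def step(self, irq: bool = False) -> None:
        """Advance one clock cycle; irq models an interrupt or restart event."""
        if irq and self.zz:
            self.zz = False          # the interrupt clears ZZ and the core wakes
        if self.stages[1] == "sleep":
            self.zz = True           # SLEEP in stage 2 sets ZZ as it moves to stage 3
        # Stages 2 and above always flow freely, so they drain while asleep.
        if self.stages[3] is not None:
            self.retired.append(self.stages[3])
        self.stages[3] = self.stages[2]
        self.stages[2] = self.stages[1]
        if self.zz:
            self.stages[1] = None    # stage 1 is held and no fetches occur
        else:
            self.stages[1] = self.stages[0]
            self.stages[0] = self._fetch()


core = SleepPipelineModel(["add", "sleep", "sub", "mov"])
for _ in range(10):
    core.step()
print(core.stages, core.zz, core.retired)  # asleep: only "sub" remains, parked in stage 1
core.step(irq=True)
print(core.stages, core.zz)                # awake: "sub" moves to decode, "mov" is fetched
```

Once SLEEP reaches stage 2, the preceding add and the SLEEP instruction itself drain out of stages 2 to 4. Meanwhile, sub stays parked in stage 1 and nothing further is fetched until the interrupt clears ZZ.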
  • the aforementioned interrupt signal may be generated by another function within the core, or may be generated by an external module, such as a disk drive.
  • the SLEEP instruction of the present invention may also advantageously be put in a delay slot present in the pipeline, as in the following code example:
        bal.d after_sleep
        sleep
  • the term "delay slot” refers to the slot within a pipeline subsequent to a branching or jump instruction being decoded.
  • Branching used consistent with the present invention may be conditional (i.e., based on the truth or value of one or more parameters, such as the value of a flag bit) or unconditional. It may also be absolute (e.g., based on an absolute memory address), or relative (e.g., based on a relative addressing scheme and independent of any particular memory address).
  • the processor core enters the sleep mode after the branch instruction has been executed.
  • the program counter PC points to the "add” instruction after the label "after_sleep".
  • the core wakes up, executes the interrupt service routine, and continues with the add instruction to which the PC is pointing.
  • the SLEEP instruction of the present invention can be put in the delay slot of a jump instruction. This solves a problem with a real-time operating system (RTOS) that sets interrupt flags in main memory, since those flags must be cleared before the sleep mode is entered.
  • RTOS real-time operating system
  • the current flag settings are first stored in core register r1.
  • the PC address to which the program jumps after it has been woken up from sleep mode is also stored in r1. Consequently, core register r1 contains both the current flag settings and the exit address to which the program proceeds after the sleep mode.
  • the interrupt enable flags are disabled so that no new interrupt requests can be detected by the processor. All interrupt flags in the memory are serviced until there are no more interrupt flags set.
  • the SLEEP instruction of the present invention acts as a no-operation (NOP) instruction during single-step mode since every single-step is treated as a restart and the core wakes up at the next single-step.
  • NOP no-operation
  • single-step mode refers generally to modes in which the processor steps sequentially through a limited number of cycles. A specific example is a mode in which one processor cycle is initiated per switch closure on the single-step pin of the processor. This mode is useful for software debugging and for evaluating pipeline contents during execution.
  • Figs. 3a and 3b illustrate first and second exemplary embodiments, respectively, of synthesized gate logic 300, 320 used to implement the foregoing sleep mode power reduction functionality within the core.
  • clock gating is a hardware option that is selected when the core build is created by the hardware engineer (described in greater detail below). Hence, the software programmer has no control over clock gating.
  • enable debug is a clock gating option for the action points of the core. If this option is selected, then the action point clock is gated when the action points are not used.
  • ED enable debug
  • the enable debug (ED) flag is used to enable the debug clock and thereby turn on the debug extensions.
  • debug extensions refers to optional instructions and other hardware capabilities that are included in the processor to facilitate the debugging process, such as for example extension instructions included as part of the extension instruction set designed to facilitate debug or related processes.
  • ED flag setting is typically accomplished via the host by the debugger just before it needs to access the debug extensions. When the ED flag is clear the debug clock is gated, and the debug extensions are thereby completely switched off. Conversely, when the flag is set, the debug clock is not gated, and the debug extensions are enabled.
  • the ED flag does not affect the sleep mode in any way; rather, it only controls the clock gating of the debug extensions.
  • the ED flag only works if clock gating was selected during the core build. If clock gating was not selected, the ED flag is removed during the synthesis process, which is described below.
  • Fig. 4a illustrates the relationship of the core clock module to the rest of the design.
  • the clock module 450 is a part of all core builds, even if clock gating was not selected in the build; however, the content of the clock module varies accordingly. If clock gating was selected, the clock module 450 contains the clock gating (see Fig. 4b). If this option was not selected during core build, the clock module 450 is empty, with all clock outputs directly connected to the input clock (see Fig. 4c).
  • a constant called ck_gating (defined in extutil.vhdl) controls the clock module configuration.
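Neither the exact gate of Figs. 4b-4f nor its HDL is reproduced here. As an assumption, the Python sketch below models a latch-based clock gate, a common way to gate a clock without glitches, together with the build-time choice made by ck_gating. The function gated_clock and its parameters are hypothetical names, and the enable could stand for a flag such as ED:

```python
from typing import List


def gated_clock(clk: List[int], enable: List[int], ck_gating: bool = True) -> List[int]:
    """Sample-by-sample model of a clock gate (assumed latch-based style).

    The enable is captured by a latch that is transparent only while the
    clock is low, and the output clock is clk AND latched_enable. With
    ck_gating False, the output is simply the input clock, mirroring the
    empty clock module of a build without clock gating.
    """
    if not ck_gating:
        return list(clk)
    out = []
    latched = 0
    for c, en in zip(clk, enable):
        if c == 0:
            latched = en     # latch is transparent only while the clock is low
        out.append(c & latched)
    return out


clk = [0, 1, 0, 1, 0, 1, 0, 1]
en = [1, 1, 0, 0, 0, 1, 1, 1]               # enable rises while clk is high at index 5
print(gated_clock(clk, en))                 # latch-based gate
print([c & e for c, e in zip(clk, en)])     # naive AND gate, for comparison
print(gated_clock(clk, en, ck_gating=False))
```

The naive AND gate passes a pulse at index 5, where the enable changes while the clock is high; in real hardware that would be a truncated, glitchy clock edge. The latch holds the enable during the high phase, so the gated clock only resumes on the next full period.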
  • Figs. 4d-4f illustrate exemplary embodiments of logic 440, 460, 480 used to implement the foregoing clock gating functionality within the processor core. It will be recognized, however, that other logic configurations may be substituted to perform the foregoing functions with equal success, such other configurations being readily determined by those of ordinary skill in the processor design and logic synthesis arts.

Gray Coding

  • Gray coding comprises forming a binary sequence in which only one bit changes at any given time. By restricting the core design during build such that only one bit changes at a time, power consumption is reduced. Specifically, Gray coding reduces power consumption by reducing the number of nodes that toggle per clock cycle. Since the core's pipeline employs a clock that operates at the highest frequency of the processor, reductions in the number of nodes toggled per clock cycle can be significant. Pipeline control logic is often implemented by state machine logic.
  • Gray code can generally be implemented in two ways within the processor core of the present invention: (i) within the HDL; or (ii) within the synthesis script. Full control over the Gray coding is often best achieved in the HDL.
  • the significant benefit to Gray coding, in contrast to many other power reduction techniques, is that it does not add any extra control logic to the design. Consequently there are very few if any downsides to implementing Gray coding.
  • Gray coding may be implemented in conjunction with the sleep mode functionality described above to further reduce core power consumption with effectively no detriments to other aspects of core operation.
  • Gray code for 3 bits is (000, 010, 011, 001, 101, 111, 110, 100).
  • An n-bit Gray code corresponds to a Hamiltonian cycle on an n-dimensional hypercube. While the term Gray code is used herein as if there is only one Gray code, it will be recognized that Gray codes are not unique.
  • One way to construct a Gray code for n bits is to take a Gray code for n-1 bits with each code prefixed by 0 (for the first half of the code), and then append the same (n-1)-bit Gray code in reverse order with each code prefixed by 1 (for the second half).
  • the following example illustrates the creation of a 3-bit Gray code from a 2-bit Gray code (algorithm derived from "Combinatorial Algorithms," Reingold, Nievergelt, Deo):
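The reflect-and-prefix construction described above can be sketched in Python as follows; the function names are hypothetical. The 3-bit code it produces differs from the 3-bit sequence listed earlier. The checker confirms that both are valid cyclic Gray codes, which is consistent with Gray codes not being unique:

```python
from typing import List


def reflected_gray(n: int) -> List[str]:
    """Build an n-bit Gray code by the reflect-and-prefix construction."""
    if n == 0:
        return [""]
    prev = reflected_gray(n - 1)
    first_half = ["0" + code for code in prev]             # (n-1)-bit code, prefixed by 0
    second_half = ["1" + code for code in reversed(prev)]  # same code reversed, prefixed by 1
    return first_half + second_half


def is_cyclic_gray(codes: List[str]) -> bool:
    """True if consecutive codes (including last -> first) differ in exactly one bit."""
    return all(
        sum(a != b for a, b in zip(codes[i], codes[(i + 1) % len(codes)])) == 1
        for i in range(len(codes))
    )


print(reflected_gray(2))  # ['00', '01', '11', '10']
print(reflected_gray(3))  # ['000', '001', '011', '010', '110', '111', '101', '100']
print(is_cyclic_gray(reflected_gray(3)))
print(is_cyclic_gray(["000", "010", "011", "001", "101", "111", "110", "100"]))
```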
  • the method 500 of reducing power consumption comprises first providing a logic circuit adapted for detection of a predetermined condition of the data within the pipeline (step 502); inserting data into the pipeline (step 504); detecting, using the logic circuit, that the predetermined condition exists with respect to certain of the data (step 506); invoking a sleep mode within the pipeline in response to the detected condition (step 508); and restarting the pipeline when the condition no longer exists (step 510).
  • Such conditions include anticipatory execution of an instruction that is subsequently stopped by a conditional evaluation.
  • The pipeline logic may be modified to prevent unnecessary switching activity in two ways: (i) by generating a "low power" version of the pipeline enable signal en1 (e.g., en1_lowpower); and (ii) by generating the enable signal en2 (which controls the data path to the arithmetic logic unit (ALU) of the core) differently.
  • The modification comprises activating each of the two enable signals only if the pipeline stage contains valid data. Accordingly, data determined to be no longer valid, or of no further use, is not propagated down the pipeline, thereby conserving power.
  • The foregoing modification may add delay to the critical path and thereby reduce the maximum clock frequency. If this is not acceptable, it is a simple matter to use the non-low-power version. If a timing problem exists with one of the extensions, the normal data path (s1val and s2val) is selected for that extension. It is acceptable to change only the extension that is on the critical path, while letting all the other extensions use the low-power version of the data path. Hence, the only reason not to use the low-power version is if the extension in question lies on the critical path and adds too much delay, thereby adversely impacting the target clock frequency of the resulting design.
  • The power consumption of the small multi-cycle extensions of the ARC core can be further reduced by using Gray code of the type previously described herein.
  • Of the two methods of introducing Gray code previously discussed (i.e., in the synthesis script or in the HDL code), only the HDL solution gives a robust result, even though it provides only a few percent overall power reduction. Further reduction in overall power consumption can be achieved by modifying the extension ALU of the core.
  • The fact that the exemplary ARC core described herein is configurable is highly advantageous from a power point of view. By choosing only those modules that will actually be used by the design, much unnecessary power consumption can be eliminated. This is a major advantage of configurable cores (such as the ARC) over non-configurable cores. Another important feature of such cores is the ability to design extensions that minimize cycle counts for common or recurring functions, thereby reducing power consumption. Hence, by (i) choosing only the modules used by the design; (ii) designing extensions adapted to minimize cycle counts; and (iii) utilizing one or more of the foregoing power reduction functions (e.g., sleep mode, clock gating, pipeline logic modification), the overall power consumption of the core can be significantly reduced.
  • The technology library files in the present invention store all of the information related to cells that is necessary for the synthesis process, including, for example, logical function, input/output timing, and any associated constraints.
  • Each user can define his or her own library name and location(s), thereby adding further flexibility.
  • In step 603, the user creates customized HDL functional blocks based on the user's input and the existing library of functions specified in step 602.
  • In step 604, the design hierarchy is determined based on user input and the aforementioned library files.
  • A hierarchy file, a new library file, and a makefile are subsequently generated based on the design hierarchy.
  • The term "makefile" refers to the commonly used UNIX makefile function or a similar function of a computer system well known to those of skill in the computer programming arts.
  • The makefile function causes other programs or algorithms resident in the computer system to be executed in the specified order.
  • It further specifies the names or locations of data files and other information necessary to the successful operation of the specified programs. It is noted, however, that the invention disclosed herein may utilize file structures other than the "makefile" type to produce the desired functionality.
  • The user is interactively asked via display prompts to input information relating to the desired design, such as the type of "build" (e.g., overall device or system configuration), the width of the external memory system data bus, the types of extensions, cache type/size, use of clock gating, Gray coding restrictions, etc. Many other configurations and sources of input information may be used, however, consistent with the invention.
  • The user runs the makefile generated in step 604 to create the structural HDL.
  • This structural HDL ties the discrete functional blocks in the design together so as to make a complete design.
  • In step 608, the script generated in step 606 is run to create a makefile for the simulator.
  • The user also runs the script to generate a synthesis script in step 608.
  • A decision is then made whether to synthesize or simulate the design (step 610). If simulation is chosen, the user runs the simulation using the generated design and simulation makefile (and user program) in step 612. Alternatively, if synthesis is chosen, the user runs the synthesis using the synthesis script(s) and generated design in step 614. After the synthesis or simulation run is complete, the adequacy of the design is evaluated in step 616.
  • For example, a synthesis engine may create a specific physical layout of the design that meets the performance criteria of the overall design process yet does not meet the die size requirements. In this case, the designer will make changes to the control files, libraries, or other elements that can affect the die size. The resulting set of design information is then used to re-run the synthesis script. If the generated design is acceptable, the design process is completed. If the design is not acceptable, the process steps beginning with step 602 are re-performed until an acceptable design is achieved. In this fashion, the method 600 is iterative.
  • Fig. 7 illustrates an exemplary pipelined processor fabricated using a 1.0 µm process.
  • The processor 700 is an ARC microprocessor-like CPU device having, inter alia, a processor core 702, on-chip memory 704, and an external interface 706.
  • The device is fabricated using the customized VHDL design obtained using the method 600 of the present invention, which is subsequently synthesized into a logic level representation and then reduced to a physical device using compilation, layout, and fabrication techniques well known in the semiconductor arts.
  • The processor of Fig. 7 may contain any commonly available peripheral, such as serial communications devices, parallel ports, timers, counters, high current drivers, analog-to-digital (A/D) converters, digital-to-analog (D/A) converters, interrupt processors, LCD drivers, memories, and other similar devices. Further, the processor may also include custom or application-specific circuitry.
  • The present invention is not limited to the type, number, or complexity of peripherals and other circuitry that may be combined using the method and apparatus. Rather, any limitations are imposed by the physical capacity of extant semiconductor processes, which improve over time. Therefore, it is anticipated that the complexity and degree of integration possible employing the present invention will further increase as semiconductor processes improve.
  • The computing device 800 comprises a motherboard 801 having a central processing unit (CPU) 802, random access memory (RAM) 804, and a memory controller 805.
  • A storage device 806 (such as a hard disk drive or CD-ROM), an input device 807 (such as a keyboard or mouse), and a display device 808 (such as a CRT, plasma, or TFT display), as well as the buses necessary to support the operation of the host and peripheral components, are also provided.
  • The aforementioned VHDL descriptions and synthesis engine are stored in the form of an object code representation of a computer program in the RAM 804 and/or storage device 806 for use by the CPU 802 during design synthesis, the latter being well known in the computing arts.
  • The user (not shown) synthesizes logic designs by inputting design configuration specifications into the synthesis program via the program displays and the input device 807 during system operation. Synthesized designs generated by the program are stored in the storage device 806 for later retrieval, displayed on the graphic display device 808, or output to an external device such as a printer, data storage unit, or other peripheral component via a serial or parallel port 812 if desired.
  • Sleep mode signals: AP_p3disable_r (output, to flags.vhdl). This signal indicates to the ARC that the pipeline has been flushed due to a breakpoint or sleep instruction. If the flush was due to a breakpoint instruction, the ARC is halted via the 'en' bit, and the AH bit is set to '1' in the debug register.
  • The sleep instruction is determined at stage 2 from:
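Returning to the Gray coding discussion above, the reflected construction (building an n-bit Gray code from an (n-1)-bit code) can be sketched in Python as follows. This is an illustrative sketch only; the function names `reflected_gray_code` and `is_cyclic_gray_code` are hypothetical and are not part of the ARC design flow. The checker also confirms that the 3-bit sequence quoted earlier (000, 010, 011, 001, 101, 111, 110, 100) is a valid cyclic Gray code even though it differs from the reflected code, which illustrates that Gray codes are not unique.

```python
from __future__ import annotations


def reflected_gray_code(n: int) -> list[str]:
    """Return the n-bit binary-reflected Gray code as a list of bit strings."""
    if n < 1:
        raise ValueError("n must be at least 1")
    code = ["0", "1"]  # the 1-bit Gray code
    for _ in range(n - 1):
        # First half: previous code, each codeword prefixed with 0.
        # Second half: previous code in reverse order, prefixed with 1.
        code = ["0" + c for c in code] + ["1" + c for c in reversed(code)]
    return code


def is_cyclic_gray_code(seq: list[str]) -> bool:
    """True if all codewords are distinct and every successive pair,
    including last -> first, differs in exactly one bit position."""
    if len(set(seq)) != len(seq):
        return False
    for a, b in zip(seq, seq[1:] + seq[:1]):
        if sum(x != y for x, y in zip(a, b)) != 1:
            return False
    return True


if __name__ == "__main__":
    print(reflected_gray_code(3))
    # -> ['000', '001', '011', '010', '110', '111', '101', '100']

    # The 3-bit sequence quoted in the text is a different, equally valid code.
    quoted = ["000", "010", "011", "001", "101", "111", "110", "100"]
    print(is_cyclic_gray_code(quoted))  # -> True
```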
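To illustrate why Gray coding reduces the number of nodes that toggle per clock cycle, the following sketch counts state-register bit flips over one complete wrap-around of a counter-like state sequence, once with plain binary encoding and once with a binary-reflected Gray encoding. It is a toy model that assumes switching activity is proportional to the number of flipped state bits; it ignores capacitance, glitching, and the combinational logic around the register. The names `to_gray` and `toggles_per_wrap` are hypothetical.

```python
def to_gray(i: int) -> int:
    """Binary-reflected Gray encoding of the integer i."""
    return i ^ (i >> 1)


def toggles_per_wrap(width: int, use_gray: bool) -> int:
    """Total state-register bit flips over one full wrap-around of a
    width-bit counter-like sequence 0, 1, ..., 2**width - 1, 0."""
    n = 1 << width
    total = 0
    for i in range(n):
        cur, nxt = i, (i + 1) % n
        if use_gray:
            cur, nxt = to_gray(cur), to_gray(nxt)
        # Each set bit in cur XOR nxt is one flip-flop that toggles.
        total += bin(cur ^ nxt).count("1")
    return total


if __name__ == "__main__":
    for w in (3, 8):
        print(w, toggles_per_wrap(w, use_gray=False), toggles_per_wrap(w, use_gray=True))
    # -> 3 14 8
    # -> 8 510 256
```

Plain binary encoding flips 2^(w+1) - 2 bits per wrap-around, whereas the Gray-coded sequence flips exactly one bit per step, 2^w in total, so the saving approaches one half as the width grows.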
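The steps of method 500 (steps 502 through 510) can be illustrated with a small cycle-level behavioral model. This is a sketch under assumptions, not the ARC implementation: the "predetermined condition" is assumed here to be a `sleep` instruction occupying the last pipeline stage while no wake-up event is pending, and the class `SleepyPipeline`, its `wake()` method, and its counters are hypothetical names introduced for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class SleepyPipeline:
    """Toy cycle-level model of method 500 (illustrative only, not ARC RTL)."""
    depth: int = 3
    stages: list[Optional[str]] = field(init=False, default_factory=list)
    asleep: bool = False
    wake_pending: bool = False   # hypothetical external wake-up event
    clocked_cycles: int = 0      # cycles in which pipeline registers were updated
    sleep_cycles: int = 0        # cycles in which the pipeline was held idle

    def __post_init__(self) -> None:
        self.stages = [None] * self.depth

    def condition(self) -> bool:
        # Steps 502/506: the assumed "predetermined condition" is a 'sleep'
        # instruction in the last stage with no wake-up event pending.
        return self.stages[-1] == "sleep" and not self.wake_pending

    def wake(self) -> None:
        self.wake_pending = True

    def step(self, fetch: Iterator[str]) -> None:
        """Advance the model by one clock cycle."""
        if self.condition():
            # Step 508: sleep mode, so no fetch and no register writes.
            self.asleep = True
            self.sleep_cycles += 1
            return
        if self.asleep:
            # Step 510: the condition no longer holds, so restart the pipeline.
            self.asleep = False
            self.wake_pending = False
        # Step 504: insert the next instruction (None once the program ends).
        self.stages = [next(fetch, None)] + self.stages[:-1]
        self.clocked_cycles += 1


if __name__ == "__main__":
    pipe = SleepyPipeline(depth=3)
    program = iter(["add", "sleep", "sub", "mul"])
    for _ in range(6):
        pipe.step(program)
    print(pipe.asleep, pipe.clocked_cycles, pipe.sleep_cycles)  # -> True 4 2
    pipe.wake()  # hypothetical wake-up event, e.g. an interrupt
    pipe.step(program)
    print(pipe.asleep, pipe.stages)  # -> False [None, 'mul', 'sub']
```

In this model the pipeline registers are simply not updated while asleep, which stands in for stalling the pipeline; on wake-up the sleep instruction is shifted out and normal operation resumes.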
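The pipeline-enable modification described above can be sketched as follows: a stage register is written only when its enable is asserted, and the low-power variant (corresponding to en1_lowpower in the text) additionally requires the incoming data to be valid. The model assumes en1 is asserted every cycle and that invalid slots still carry arbitrary values on the data bus. `Slot` and `register_toggles` are hypothetical names, and counting flipped register bits is only a proxy for switching power.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """Value presented to a pipeline stage register in one cycle."""
    valid: bool
    value: int


def register_toggles(inputs: list[Slot], low_power: bool) -> int:
    """Count bit flips in a stage register that loads only when enabled.

    en1 is assumed asserted every cycle (no stalls in this toy model).
    The low-power enable (en1_lowpower in the text) also requires valid data.
    """
    reg = 0
    toggles = 0
    for slot in inputs:
        en1 = True
        enable = (en1 and slot.valid) if low_power else en1
        if enable:
            toggles += bin(reg ^ slot.value).count("1")
            reg = slot.value
    return toggles


if __name__ == "__main__":
    stream = [
        Slot(True, 0b1010),
        Slot(False, 0b0101),  # stale or cancelled data still on the bus
        Slot(True, 0b1010),
        Slot(False, 0b1111),
        Slot(True, 0b1011),
    ]
    print(register_toggles(stream, low_power=False))  # -> 13
    print(register_toggles(stream, low_power=True))   # -> 3
```

Both variants end holding the same last valid value; the gated version simply avoids loading the invalid slots in between.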
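The per-extension choice between the low-power and normal data paths can be expressed as a simple timing check: use the low-power variant unless the added delay would push that extension's path past the target clock period, in which case fall back to the normal data path (s1val and s2val). The delay figures, the extension names, and the `Extension` and `choose_data_paths` names below are hypothetical; in practice the figures would come from static timing analysis of the synthesized design.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Extension:
    name: str
    path_delay_ns: float      # worst-case delay through the normal data path
    lowpower_extra_ns: float  # extra delay added by the low-power enable logic


def choose_data_paths(extensions: list[Extension], clock_period_ns: float) -> dict[str, str]:
    """Pick the low-power data path for every extension whose timing still
    closes with the added delay; otherwise fall back to the normal path."""
    choice: dict[str, str] = {}
    for ext in extensions:
        if ext.path_delay_ns + ext.lowpower_extra_ns <= clock_period_ns:
            choice[ext.name] = "low_power"
        else:
            choice[ext.name] = "normal"  # i.e. the s1val/s2val data path
    return choice


if __name__ == "__main__":
    exts = [
        Extension("mul32", 4.8, 0.4),  # hypothetical critical-path extension
        Extension("barrel_shift", 3.0, 0.4),
        Extension("swap", 2.1, 0.3),
    ]
    print(choose_data_paths(exts, clock_period_ns=5.0))
    # -> {'mul32': 'normal', 'barrel_shift': 'low_power', 'swap': 'low_power'}
```

Only the extension that would violate the clock period reverts to the normal path; all others keep the low-power version, matching the per-extension policy described in the text.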

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Microcomputers (AREA)
  • Power Sources (AREA)
  • Advance Control (AREA)

Abstract

The invention relates to a method and apparatus for reducing power consumption in a pipelined processor. In one embodiment, the method of the invention comprises defining an instruction that invokes a "sleep mode" in the processor and pipeline, inserting the instruction into the pipeline, decoding and executing the instruction, stalling the pipeline in response to the sleep mode instruction, disabling memory in response to the sleep mode instruction, and waking the core from sleep mode based on the occurrence of a predetermined event. The invention also relates to methods for structuring the pipeline core logic and extension instructions to reduce core power consumption under various conditions. Finally, the invention relates to methods and apparatus for synthesizing logic implementing the foregoing methodology.

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2002246904A AU2002246904A1 (en) 2000-10-27 2001-10-25 Method and apparatus for reducing power consuption in a digital processor

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US24407100P 2000-10-27 2000-10-27
US60/244,071 2000-10-27

Publications (2)

Publication Number Publication Date
WO2002057893A2 true WO2002057893A2 (fr) 2002-07-25
WO2002057893A3 WO2002057893A3 (fr) 2003-05-30

Family

ID=22921254

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/051064 WO2002057893A2 (fr) 2000-10-27 2001-10-25 Procede et appareil de reduction de la consommation d'energie dans un processeur numerique

Country Status (2)

Country Link
AU (1) AU2002246904A1 (fr)
WO (1) WO2002057893A2 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5485625A (en) * 1992-06-29 1996-01-16 Ford Motor Company Method and apparatus for monitoring external events during a microprocessor's sleep mode
WO1996009583A2 (fr) * 1994-09-23 1996-03-28 Cambridge Consultants Limited Circuits et interfaces de traitement de donnees
US5584031A (en) * 1993-11-09 1996-12-10 Motorola Inc. System and method for executing a low power delay instruction
US5774709A (en) * 1995-12-06 1998-06-30 Lsi Logic Corporation Enhanced branch delay slot handling with single exception program counter
JP2001282548A (ja) * 2000-03-29 2001-10-12 Matsushita Electric Ind Co Ltd 通信装置及びその方法

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06332693A (ja) * 1993-05-27 1994-12-02 Hitachi Ltd タイムアウト機能付き休止命令の発行方式

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004051450A2 (fr) * 2002-12-04 2004-06-17 Koninklijke Philips Electronics N.V. Commande logicielle de la dissipation de puissance de microprocesseur
WO2004051450A3 (fr) * 2002-12-04 2005-02-03 Koninkl Philips Electronics Nv Commande logicielle de la dissipation de puissance de microprocesseur
WO2006094196A2 (fr) * 2005-03-03 2006-09-08 Qualcomm Incorporated Procede et appareil destines a la reduction de la consommation electrique au moyen d'un processeur a multiples pipelines heterogenes
WO2006094196A3 (fr) * 2005-03-03 2007-02-01 Qualcomm Inc Procede et appareil destines a la reduction de la consommation electrique au moyen d'un processeur a multiples pipelines heterogenes
WO2007101216A2 (fr) * 2006-02-27 2007-09-07 Qualcomm Incorporated Processeur à virgule flottante à besoins en énergie réduits pour la précision inférieure au choix
WO2007101216A3 (fr) * 2006-02-27 2008-01-03 Qualcomm Inc Processeur à virgule flottante à besoins en énergie réduits pour la précision inférieure au choix
US8595279B2 (en) 2006-02-27 2013-11-26 Qualcomm Incorporated Floating-point processor with reduced power requirements for selectable subprecision
WO2008009366A1 (fr) * 2006-07-21 2008-01-24 Sony Service Centre (Europe) N.V. Système ayant une pluralité de blocs matériels et procédé pour son exploitation
US8161276B2 (en) 2006-07-21 2012-04-17 Sony Service Centre (Europe) N.V. Demodulator device and method of operating the same
US8918446B2 (en) 2010-12-14 2014-12-23 Intel Corporation Reducing power consumption in multi-precision floating point multipliers
CN107977227A (zh) * 2016-10-21 2018-05-01 超威半导体公司 包括不同指令类型的独立硬件数据路径的管线

Also Published As

Publication number Publication date
AU2002246904A1 (en) 2002-07-30
WO2002057893A3 (fr) 2003-05-30

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP