US20070234089A1 - Programmable pipeline fabric having mechanism to terminate signal propagation - Google Patents

Programmable pipeline fabric having mechanism to terminate signal propagation

Publication number
US20070234089A1
Authority
US
United States
Prior art keywords
pass register
register files
register
pass
registers
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US11/543,717
Inventor
Herman Schmit
Benjamin Levine
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098 Register arrangements
    • G06F9/30141 Implementation provisions of register files, e.g. ports
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234 Power saving characterised by the action undertaken
    • G06F1/3243 Power saving in microcontroller unit
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3824 Operand accessing
    • G06F9/383 Operand prefetching
    • G06F9/3832 Value prediction for operands; operand history buffers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3867 Concurrent instruction execution, e.g. pipeline, look ahead using instruction pipelines
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the information that a stripe is either the first or last virtual stripe can also be used by the mask unit to save power.
  • the application knows that any data coming from previous stripes is not meaningful for this application. This bogus data could be the results from a prior computation that was executed on the stripes in the fabric.
  • a mask unit that is informed that a stripe is the first virtual stripe could mask the clock or gate the data for any data arriving from a physical stripe prior to the physical stripe containing the first virtual stripe.
  • FIG. 10 shows a complex register file with four registers, two read ports, one write port, and a set of four gates that can make the output values from a register that has been read for the last time constant.
  • FIG. 11 shows a register file with the same parameters as FIG. 10, but with separate clocks that would be generated by a mask unit. The register file in FIG. 11, if it were reduced to containing two registers, could be used in FIG. 7 to replace 44.
  • a register file should have unused register file entries masked (e.g. see FIG. 10) or have their clocks gated by, for example, providing separate clock signals for each register (see FIG. 11).

Abstract

A method and apparatus for storing and using “register use” information to determine when a register is being used for the last time so that power savings may be achieved is disclosed. The register use information may take the form of “last read” information for a particular register. The last read information may be used to force the value of the register, after being read, to zero or to clock only that register while masking off the other registers. Several methods and hardware variations are disclosed for using the register use information to achieve power savings.

Description

  • CROSS REFERENCE TO RELATED APPLICATIONS
  • This application is a continuation of copending U.S. application Ser. No. 10/222,608 filed 16 Aug. 2002 and entitled Programmable Pipeline Fabric Having Mechanism to Terminate Signal Propagation.
  • STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH
  • This invention was developed in part through funding provided by DARPA-ITO/TTO under contract No. DABT63-96-C-0083. The federal government may have rights in this invention.
  • BACKGROUND OF THE INVENTION
  • 1. Field of the Invention
  • The present invention is related to reconfigurable architectures and, more particularly, to reconfigurable architectures used to process information in a pipelined fashion.
  • 2. Description of the Background
  • Traditional approaches to reconfigurable computing statically configure programmable hardware to perform a user-defined application. The static nature of such a configuration causes two significant problems: a computation may require more hardware than is available, and a single hardware design cannot exploit the additional resources that will inevitably become available in future process generations. A technique called pipelined reconfiguration implements a large logical configuration on a small piece of hardware through rapid reconfiguration of that hardware. With this technique, the compiler is no longer responsible for satisfying fixed hardware constraints. In addition, a design's performance improves in proportion to the amount of hardware allocated to that design.
  • Pipelined reconfiguration involves virtualizing pipelined computations by breaking a single static configuration into pieces that correspond to pipeline stages in the application. Each pipeline stage is loaded, one per cycle, into the fabric. This makes performing the computation possible, even if the entire configuration is never present in the fabric at one time.
  • FIG. 1 illustrates the virtualization process, showing a five-stage pipeline virtualized on a three-stage fabric. FIG. 1A shows the five-stage application and each logical (or virtual) pipeline stage's state in six consecutive cycles. FIG. 1B shows the state of the physical stages in the fabric as it executes this application. In this example, virtual pipe stage 1 is configured in cycle 1 and ready to execute in the next cycle; it executes for two cycles. There is no physical pipe stage 4; therefore, in cycle 4, the fourth virtual pipe stage is configured in physical pipe stage 1, replacing the first virtual stage. Once the pipeline is full, each period of five cycles generates two results, one in each of two consecutive cycles. For example, cycles 2, 3, 7, 8, . . . consume inputs and cycles 6, 7, 11, 12, . . . generate outputs.
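  • The schedule in this example can be reproduced with a short sketch. It assumes, as a reading of the example rather than a statement of the actual hardware controller, that one stage is configured per cycle, that virtual stages are loaded in order and wrap around, and that physical stages are reused round-robin. The function names are hypothetical.

```python
def configure_events(num_virtual, num_physical, num_cycles):
    """List (cycle, virtual_stage, physical_stage) for each configuration event.

    Assumption: one stage is configured per cycle, virtual stages are loaded
    in order and wrap around, and physical stages are reused round-robin.
    """
    events = []
    for cycle in range(1, num_cycles + 1):
        virtual = (cycle - 1) % num_virtual + 1
        physical = (cycle - 1) % num_physical + 1
        events.append((cycle, virtual, physical))
    return events


def execute_cycles(virtual_stage, num_virtual, num_physical, num_cycles):
    """Cycles (up to num_cycles) in which the given virtual stage executes.

    A stage configured in cycle c runs in cycles c+1 .. c+num_physical-1,
    after which its physical stage is reconfigured with another virtual stage.
    """
    cycles = []
    for cycle, virtual, _ in configure_events(num_virtual, num_physical, num_cycles):
        if virtual == virtual_stage:
            for run in range(cycle + 1, cycle + num_physical):
                if run <= num_cycles:
                    cycles.append(run)
    return cycles


# Five virtual stages on a three-stage fabric, observed for 12 cycles.
input_cycles = execute_cycles(1, 5, 3, 12)   # first virtual stage consumes inputs
output_cycles = execute_cycles(5, 5, 3, 12)  # last virtual stage generates outputs
print(input_cycles, output_cycles)
```

    Under these assumptions the fourth configuration event places virtual stage 4 in physical stage 1, and the input and output cycles follow the pattern given in the example above.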
  • FIG. 2 is an abstract view of the architectural class of a pipelined fabric. Each row of processing elements (PEs) together with its associated interconnections is referred to as a stripe. Each PE typically contains an arithmetic logic unit (ALU) and a pass register file. Each ALU contains lookup tables (LUTs) and extra circuitry for carry chains, zero detection, and so on. Designers implement combinational logic using a set of N B-bit-wide ALUs. The ALU operation is static while a particular virtual stripe resides in a physical stripe. Designers can cascade, chain or otherwise connect the carry lines of the ALUs to construct wider ALUs, and chain PEs together via an interconnection network to build complex combinational functions.
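  • As an illustration of cascading carry lines to build a wider ALU, the following sketch chains several B-bit adders. It is a behavioral model only; the function names and the choice of addition as the ALU operation are assumptions made for the example.

```python
def alu_add(a, b, carry_in, width=8):
    """One B-bit ALU configured as an adder: returns (sum, carry_out)."""
    total = a + b + carry_in
    mask = (1 << width) - 1
    return total & mask, total >> width


def wide_add(a, b, num_alus, width=8):
    """Chain num_alus narrow ALUs through their carry lines."""
    mask = (1 << width) - 1
    result = 0
    carry = 0
    for index in range(num_alus):
        shift = index * width
        # Each ALU handles one B-bit slice and receives the previous carry.
        partial, carry = alu_add((a >> shift) & mask, (b >> shift) & mask, carry, width)
        result |= partial << shift
    return result, carry


print(wide_add(0x12FF, 0x0001, num_alus=2))
```

    Two 8-bit ALUs chained in this way behave as one 16-bit adder, with the carry out of the low slice feeding the high slice.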
  • One of the key enabling structures for pipeline reconfiguration is the pass register file. An example pass register file 10 is shown in FIG. 3. Pass register file 10 comprises four registers 12, 14, 16, 18 (which may have an arbitrary bitwidth); a write port consisting of, in this figure, four multiplexers 20, 22, 24, 26 and a write address decoder 28; and a read port consisting of, in this figure, a 4-to-1 multiplexer 30 responsive to a read address. The structure of FIG. 3 allows a functional unit connected to this register file 10 to read one value from the register file 10 and also allows a functional unit to write one value into one of the specific registers 12, 14, 16, 18. If a value is not written into one of the registers 12, 14, 16, 18 by the write port, then the value from the corresponding pass register in the previous pass register file in the previous stripe is written into registers 12, 14, 16, 18 via lines 32, 34, 36, 38, respectively.
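  • The behavior of the pass register file described above may be sketched in software as follows. This is a behavioral model under stated assumptions, not the circuit of FIG. 3; the class and method names are hypothetical.

```python
class PassRegisterFile:
    """Behavioral sketch of a pass register file with one read and one write port."""

    def __init__(self, num_registers=4):
        self.registers = [0] * num_registers

    def read(self, read_address):
        # Read port: an N-to-1 multiplexer selected by the read address.
        return self.registers[read_address]

    def clock(self, previous_values, write_address=None, write_value=None):
        # Write port: one 2-to-1 multiplexer per register. The decoded write
        # address selects the functional-unit value for one register; every
        # other register takes the value passed from the previous stripe.
        for index in range(len(self.registers)):
            if index == write_address:
                self.registers[index] = write_value
            else:
                self.registers[index] = previous_values[index]


register_file = PassRegisterFile()
register_file.clock(previous_values=[10, 11, 12, 13], write_address=2, write_value=99)
print(register_file.registers)
```

    Only the register named by the write address receives the functional unit's value; the remaining registers simply pass along the values from the previous stripe, which is the behavior that gives rise to the propagation problem discussed next.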
  • FIG. 4 illustrates how four pass register files 42, 44, 46, 48 might be used in an application. In this figure, the pass register files 42, 44, 46, 48 are connected in a ring, but need not be so connected. In FIG. 4, only one register is shown in each of the register files 42, 44, 46, 48 although each of the register files could be arbitrarily large. In FIG. 4, data generated by Functional Unit 1 proceeds to Functional Unit 2 through one pass register file 44.
  • A chief problem with the structure of FIG. 4 is that the value, which is only meant for use by Functional Unit 2, continues through the other pass register files 46, 48, 42, in subsequent stripes. If the value is not overwritten by other stripes using this register, such values continue to propagate all the way back to Functional Unit 1. This activity is worthless for the computation, and dissipates significant power.
  • A related power consumption problem that occurs in pass register files in pipeline reconfigurable devices is that old values from previous applications that were in the chip continue to propagate through the chip, consuming power even though they are irrelevant to the current computation. Thus, a need exists for a mechanism in the pipeline fabric for terminating signals that are no longer needed for the computation.
  • SUMMARY OF THE PRESENT INVENTION
  • The present invention is directed to a method and apparatus for storing and using “register use” information to determine when a register is being used for the last time so that power savings may be achieved. The register use information may take the form of “last read” information for a particular register. The last read information may be used to force the value of the register, after being read, to a constant or to clock only that register while masking off the other registers. Several methods and hardware variations are disclosed for using the “register use” information to achieve power savings. Those advantages and benefits, and others, will be apparent from the Detailed Description of the Invention herein below.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • For the present invention to be easily understood and readily practiced, the present invention will now be described, for purposes of illustration and not limitation, in conjunction with the following figures, wherein:
  • FIGS. 1A and 1B illustrate the process of virtualizing a five-stage pipeline on a three stage reconfigurable fabric;
  • FIG. 2 illustrates a stripe of a reconfigurable fabric;
  • FIG. 3 is an example of a pass register file;
  • FIG. 4 illustrates four pass register files, each having a single register, to demonstrate unwanted signal propagation;
  • FIG. 5 illustrates one embodiment of the present invention for terminating unwanted signal propagation by forcing the value of the signal to zero;
  • FIG. 6 illustrates another embodiment of the present invention for terminating unwanted signal propagation by clocking only the registers needed to produce the value to be read;
  • FIG. 7 illustrates another embodiment of the present invention for terminating unwanted signal propagation by clocking only the registers needed to produce the value to be read;
  • FIG. 8 is a diagram illustrating an embodiment for a mask unit;
  • FIG. 9 illustrates a modification to the circuit of FIG. 6 so as to use local mask units;
  • FIG. 10 illustrates a circuit in which registers are clocked by a common clock signal and four AND gates and a decoder are used to force one register to a value of zero; and
  • FIG. 11 illustrates a modification to the circuit of FIG. 10 to enable each register to be clocked by its own clock signal.
  • DETAILED DESCRIPTION OF THE INVENTION
  • FIG. 5 illustrates one embodiment of the present invention for terminating unwanted signal propagation. In FIG. 5, as is known, each physical stripe is configured with a virtual stripe by, for example, writing a configuration word to the physical stripe. A detailed explanation of configuration management and data management is provided in Schmit, et al., “Managing Pipeline-Reconfigurable FPGAs,” published in ACM 6th International Symposium on FPGAs, February 1998, the entirety of which is hereby incorporated by reference. The reader desiring more details on the task of writing a configuration word to a physical stripe is referred to the above-identified article. Additional details regarding the construction and operation of reconfigurable fabrics may be found in the following publications, each of which is hereby incorporated by reference in its entirety: Schmit, et al., “PipeRench: a virtualized programmable data path in 0.18 Micron Technology,” Proceedings of the IEEE Custom Integrated Circuits Conference (CICC), 2002; Schmit, “PipeRench: a reconfigurable architecture and compiler,” IEEE Computer, pp. 70-76 (April 2000); Schmit, “Incremental Reconfiguration for Pipelined Applications,” Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines, pp. 47-55, 1997; and Schmit et al., “PipeRench: A Coprocessor for Streaming Multimedia Acceleration,” International Symposium on Computer Architecture, pp. 38-49, 1999.
  • One aspect of the present invention is to include some additional information in the encoding of a stripe (e.g. in the configuration word) that indicates whether a read from the register file is the last read of that data value in the application. The “last read” information can be generated by the compiler or physical design tool that generates the virtual stripe information, or it can be generated by a separate program that analyzes a set of virtual stripes to determine when the last read occurs. The first and last stripes in an application present special cases. In the last stripe in a virtual application, there are no subsequent stripes. Therefore, there are no further reads of values in the register file. In the first virtual stripe, none of the values currently in the register files in physical stripes that are located before the first virtual stripe are going to be used. For stripes other than the first and last stripes in an application, the information about the last time a value in a register needs to be read (sometimes referred to as the last read information) can be used in a number of ways to reduce power consumption.
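  • One way a separate analysis program might derive the last read information is sketched below. The algorithm is not specified in this description; the sketch assumes each virtual stripe is summarized by the sets of registers it reads and writes, that a stripe's reads see the incoming value before that stripe's own write replaces it, and that a read is a last read when no later stripe reads the register before it is overwritten. All names are hypothetical.

```python
def mark_last_reads(stripes):
    """Compute last-read flags for every register read in every stripe.

    stripes: list of (reads, writes) pairs of register-index sets, in
    pipeline order. Returns one dict per stripe mapping each register read
    in that stripe to True if the read is the last read of that value.
    """
    result = []
    for position, (reads, writes) in enumerate(stripes):
        flags = {}
        for register in sorted(reads):
            last = True
            # If this stripe overwrites the register, the old value dies here.
            if register not in writes:
                for later_reads, later_writes in stripes[position + 1:]:
                    if register in later_reads:
                        last = False  # the value is still needed downstream
                        break
                    if register in later_writes:
                        break  # overwritten before any further read
            flags[register] = last
        result.append(flags)
    return result


stripes = [
    (set(), {0}),      # stripe 0 writes register 0
    ({0}, {1}),        # stripe 1 reads register 0, writes register 1
    ({0, 1}, set()),   # stripe 2 reads both registers
    (set(), set()),    # stripe 3 uses neither
]
print(mark_last_reads(stripes))
```

    In this example the read of register 0 in stripe 1 is not a last read, because stripe 2 reads the same value again, while both reads in stripe 2 are last reads.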
  • FIG. 5 illustrates one embodiment for using the last read information to reduce power consumption by masking the value after a final read. In FIG. 5, there are four register files 42, 44, 46, 48 each having one register 42′, 44′, 46′, 48′, respectively, for purposes of simplicity. The reader will understand that in practice each register file will have a plurality of registers as shown, for example, in FIG. 3. In addition, the reader will understand that each register could store more than one bit. In the actual PipeRench implementation described in the previous publications, each register in each register file stores eight bits. In the embodiment of FIG. 5, the last read information is used to fix the value in subsequent stripes in the fabric to a constant value. In the embodiment of FIG. 5 that is accomplished with an AND gate 52 located prior to (or in) register file 42, an AND gate 54 located prior to (or in) register file 44, an AND gate 56 located prior to (or in) register file 46, and an AND gate 58 located prior to (or in) register file 48. Assuming that the value read from register 44′ is the last time that value needs to be read, inputting a zero on one of the input terminals of the AND gate 56 forces the value at the output terminal of the AND gate 56, and in the subsequent pass register files, to zero. The value input to the input terminals of the other AND gates 52, 54, and 58 is not of significance in terminating the propagation of the signal produced by the register 44′. Other gates that can be used in place of the AND gates include OR gates and NAND gates. Any type of gate that exhibits a monotonic function, i.e. a gate that “forces” the output based on a controlling value at one of the inputs, can be used.
  • It will be noticed that the value output by register 44′ is terminated, i.e. prevented from propagating, by AND gate 56, which forces that value to zero. In a register, clocking in a constant value consumes less power than clocking in a changing value. Thus, forcing the value to zero results in power savings. A similar result can be achieved by masking the read bit of the multiplexer that serves the last-read register, so that the value output by the register is no longer read once it is no longer needed.
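  • The effect of forcing the value to zero can be illustrated with a small simulation of the four single-register files connected in a ring. The sketch counts bit flips in the three registers downstream of register 44′ and uses that count as a rough stand-in for dynamic power; that proxy, the 8-bit width, and the function name are assumptions of the example, not measurements.

```python
def count_downstream_toggles(values, terminate):
    """Count bit flips in registers 42', 46' and 48' over a stream of values.

    values: 8-bit values written into register 44' on successive cycles.
    terminate: if True, the gate ahead of register 46' is held at zero,
    modeling a last read of the value in register 44'.
    """
    registers = [0, 0, 0, 0]  # 42', 44', 46', 48'
    toggles = 0
    for value in values:
        gate = 0x00 if terminate else 0xFF  # AND-gate control, applied per bit
        following = [
            registers[3],          # 42' takes the value passed from 48'
            value & 0xFF,          # 44' is written by the functional unit
            registers[1] & gate,   # 46' takes 44' through the AND gate
            registers[2],          # 48' takes the value passed from 46'
        ]
        for index in (0, 2, 3):
            toggles += bin(registers[index] ^ following[index]).count("1")
        registers = following
    return toggles


stream = [0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00]
print(count_downstream_toggles(stream, terminate=False))
print(count_downstream_toggles(stream, terminate=True))
```

    With the gate held at zero, the downstream registers clock in a constant and never change state, whereas without termination every change written into register 44′ ripples through the remaining three registers.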
  • In FIG. 6 another method of using the last read information to stop a signal from propagating and for saving power is illustrated. The circuit of FIG. 6 is similar to the circuit of FIG. 5 except that the AND gates 52, 54, 56, 58 are positioned to receive a clock signal 60. The clock signal output by AND gates 52, 54, 56, 58 is input to registers 42′, 44′, 46′ and 48′, respectively. Another way the last read information can be used to reduce power in a register is to stop the register from clocking. In FIG. 6, that is performed by masking (blocking) the clock signal 60 to those registers 42′, 46′, 48′ that are unused by inputting a zero to one of the input terminals of AND gates 52, 56, 58, respectively. Only the one register 44′ in use is actually clocked by inputting a one to one of the input terminals of the AND gate 54, which saves significant clock distribution power, as well as the power dissipated in the register itself. The set of values input to AND gates 52, 54, 56, 58 (e.g. 0100) may be referred to as a clocking mask.
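  • A clocking mask such as 0100 can be modeled as follows. The sketch treats each AND gate as the bitwise AND of the shared clock with one mask bit and lets only the registers that receive a clock edge capture their inputs; the function names are hypothetical.

```python
def gated_clocks(clock, clocking_mask):
    """Output of each AND gate: the shared clock ANDed with one mask bit."""
    return [clock & bit for bit in clocking_mask]


def clock_edge(registers, inputs, clocking_mask):
    """Apply one rising edge; a register captures its input only if unmasked."""
    return [
        new if bit else old
        for old, new, bit in zip(registers, inputs, clocking_mask)
    ]


mask = [0, 1, 0, 0]  # only the second register (44') is clocked
print(gated_clocks(1, mask))
print(clock_edge([1, 2, 3, 4], [9, 9, 9, 9], mask))
```

    The masked registers keep their previous contents and see no clock activity, which is where the savings in clock distribution and register power would come from.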
  • FIG. 7 illustrates a somewhat more complex embodiment of the circuit shown in FIG. 6 in that, instead of providing a plurality of gates and a clocking mask to the gates, information is provided to a plurality of mask units 62, 64, 66, 68 which locally determine if registers within register files 42, 44, 46, 48, respectively, should be clocked. The design of FIG. 7 requires the additional circuitry of the mask units 62, 64, 66, 68 and two AND gates per mask unit to compute the value of the clock mask variable for each stripe (register file). The clock mask bit is determined based on what happened “most recently” in each register within each register file. What happened most recently is determined from the inputs “ReadAdd0”, “ReadAdd1”, “WriteAdd”, “LastRead0”, “LastRead1”, and “LastVirtual”, as well as information on the state of the previous mask unit. If that register has been “read for the last time”, then the clock is masked off. If the register has been written more recently than it has been “read for the last time”, the clock is enabled. That can be implemented with a small finite state machine receiving the inputs identified above.
  • In this state machine, shown in FIG. 8, a register in the register file would be clocked if that register is not in the last virtual stripe and was either written in this stripe (as indicated by the write address) or was clocked in the previous stripe and was not the last read (as indicated by the read address and the last read bit corresponding to that port).
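The clocking rule stated for the state machine can be written out as a Boolean function of the mask unit inputs. The sketch below is one reading of that sentence, not the circuit of FIG. 8: the Python names mirror the inputs “ReadAdd0”, “ReadAdd1”, “WriteAdd”, “LastRead0”, “LastRead1” and “LastVirtual”, `None` is assumed to mean that a port is unused, and the last read is assumed to be signalled in the same stripe in which the decision is made.

```python
def clock_enable(reg, prev_clocked, write_add=None,
                 read_add0=None, read_add1=None,
                 last_read0=False, last_read1=False,
                 last_virtual=False):
    """Decide whether register `reg` of one stripe should be clocked."""
    # A port marks the last read of `reg` if it addresses `reg` and its
    # last-read bit is set.
    last_read_here = bool((last_read0 and read_add0 == reg) or
                          (last_read1 and read_add1 == reg))
    written_here = write_add == reg
    return (not last_virtual) and (written_here or
                                   (prev_clocked and not last_read_here))


def trace(reg, stripes):
    """Chain the decision through successive stripes, passing each mask
    unit's result on as the state seen by the next one."""
    enables, prev = [], False
    for cfg in stripes:
        prev = clock_enable(reg, prev, **cfg)
        enables.append(prev)
    return enables


stripes = [
    {"write_add": 0},                          # value produced here
    {"read_add0": 0},                          # read, but needed again
    {"read_add0": 0, "last_read0": True},      # final read
    {},                                        # register no longer live
    {"write_add": 0, "last_virtual": True},    # last virtual stripe
]
result = trace(0, stripes)
```

Under these assumptions register 0 is clocked in the first two stripes only: the final read switches its clock off, the idle stripe stays off because the previous stripe was off, and the last virtual stripe is never clocked even though it is written.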
  • FIG. 9 illustrates the circuit of FIG. 6 modified to provide local mask units.
  • The previous embodiments use exactly the same information, namely whether a value in a register is being read for the last time, to determine that the value should not be allowed to propagate, either by forcing the value to a constant (e.g. zero) or by not clocking the registers, to reduce power. When the pass register file includes more than one register, the read port address (which specifies which register is being accessed) and the bit indicating “last read” can be combined to determine which value is being read for the last time in the application. There are other ways to encode this information which, at present, seem less efficient. For example, it is possible to have an explicit “in-use” bit for each register in each register file such that it would not be necessary to combine the information with the read port address. Thus, the present invention is directed to using any “register use” information for power savings.
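The two encodings mentioned here can be related with a short sketch: the read port addresses and last-read bits are decoded into a per-register bit vector, which then updates an explicit “in-use” vector. The helper names and the four-register default are assumptions made for illustration; a write is given priority over a last read, in line with the rule that a register written more recently than its last read is enabled.

```python
def last_read_vector(read_ports):
    """Decode (address, last_read_bit) pairs into a one-bit-per-register
    vector marking registers that are being read for the last time."""
    vec = 0
    for addr, last in read_ports:
        if last:
            vec |= 1 << addr
    return vec


def next_in_use(in_use, read_ports, write_add, num_regs=4):
    """Update the explicit in-use bits for one stripe."""
    cleared = in_use & ~last_read_vector(read_ports)
    written = 0 if write_add is None else 1 << write_add
    return (cleared | written) & ((1 << num_regs) - 1)


# Registers 1 and 2 are live; register 1 is read for the last time,
# register 2 is read again later, and register 3 is written.
state = next_in_use(0b0110, [(1, True), (2, False)], write_add=3)
```

Here `state` ends up with registers 2 and 3 marked in use, so only those two would need to be clocked or left unmasked in the following stripe.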
  • Furthermore, the information that a stripe is either the first or last virtual stripe can also be used by the mask unit to save power. At the first virtual stripe, the application knows that any data coming from previous stripes is not meaningful for this application. This bogus data could be the results from a prior computation that was executed on the stripes in the fabric. As a result, a mask unit that is informed that a stripe is the first virtual stripe could mask the clock or gate the data for any data arriving from a physical stripe prior to the physical stripe containing the first virtual stripe.
  • FIG. 10 shows a complex register file with four registers, two read ports, one write port, and a set of four gates that can make the output values from a register that has been read for the last time constant. FIG. 11 shows a register file with the same parameters as FIG. 10, but with separate clocks that would be generated by a mask unit. The register file in FIG. 11, if it were reduced to containing two registers, could be used in FIG. 7 to replace 44.
  • Finally, to address the special cases of the first and last virtual stripe, a register file should have unused register file entries masked (e.g. see FIG. 10) or have their clocks gated by, for example, providing separate clock signals for each register (See FIG. 11).
  • While the present invention has been described in connection with preferred embodiments thereof, those of ordinary skill in the art will recognize that many modifications and variations are possible. The present invention is intended to be limited only by the following claims and not by the foregoing description.

Claims (16)

1.-16. (canceled)
17. A power saving method, comprising:
providing configuration information to each of a plurality of series connected pass register files, each pass register file comprised of a plurality of registers;
providing clock pulses to each of said pass register files;
determining for each pass register file, if the registers within said pass register file should be clocked with said clock pulses based on a read address, a write address, and a last read data for said pass register file; and
selectively applying said clock pulses to the registers within each of said pass register files based on said determining.
18. The method of claim 17 wherein said determining is performed one of remotely or locally with respect to each of said pass register files.
19. The method of claim 17 wherein said determining is additionally based on a state of a preceding pass register file in said plurality of series connected pass register files.
20. The method of claim 19 wherein said determining is performed by a state machine.
21. The method of claim 17 wherein said determining is performed by a plurality of mask units each positioned locally with respect to one of said pass register files, and wherein said selectively applying is performed by a plurality of logic gates, each responsive to one of said plurality of mask units and each receiving said clock pulses.
22. A power saving circuit for use in a reconfigurable apparatus of the type constructed of a plurality of serially connected pass register files, each pass register file constructed of a plurality of registers, said power saving circuit comprising:
a plurality of mask units, each producing a signal for controlling the application of clock pulses to one of said pass register files based on a read address, a write address, and a last read data for said pass register file; and
a plurality of logic gates, each responsive to one of said mask units for selectively applying clock pulses to the registers within one of said pass register files.
23. The circuit of claim 22 wherein each of said mask units is located one of remotely or locally with respect to each of said pass register files.
24. The circuit of claim 22 wherein each of said mask units is additionally responsive to a state of a mask unit for a preceding pass register file in said plurality of series connected pass register files.
25. The circuit of claim 22 wherein each of said mask units includes a state machine.
26. The circuit of claim 22 wherein said plurality of logic gates includes a plurality of AND gates.
27. A reconfigurable apparatus, comprising:
a plurality of series connected pass register files each comprised of a plurality of registers, each of said pass register files adapted to receive configuration information;
a plurality of mask units, each producing a signal for controlling the application of clock pulses to one of said pass register files based on a read address, a write address, and a last read data for said pass register file; and
a plurality of logic gates, each responsive to one of said mask units for selectively applying clock pulses to the registers within one of said pass register files.
28. The apparatus of claim 27 wherein each of said mask units is located one of remotely or locally with respect to each of said pass register files.
29. The apparatus of claim 27 wherein each of said mask units is additionally responsive to a state of a mask unit for a preceding pass register file in said plurality of series connected pass register files.
30. The apparatus of claim 27 wherein each of said mask units includes a state machine.
31. The apparatus of claim 27 wherein said plurality of logic gates includes a plurality of AND gates.
US11/543,717 2002-08-16 2006-10-05 Programmable pipeline fabric having mechanism to terminate signal propagation Abandoned US20070234089A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US11/543,717 US20070234089A1 (en) 2002-08-16 2006-10-05 Programmable pipeline fabric having mechanism to terminate signal propagation

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US10/222,608 US7131017B2 (en) 2002-08-16 2002-08-16 Programmable pipeline fabric having mechanism to terminate signal propagation
US11/543,717 US20070234089A1 (en) 2002-08-16 2006-10-05 Programmable pipeline fabric having mechanism to terminate signal propagation

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US10/222,608 Continuation US7131017B2 (en) 2002-08-16 2002-08-16 Programmable pipeline fabric having mechanism to terminate signal propagation

Publications (1)

Publication Number Publication Date
US20070234089A1 true US20070234089A1 (en) 2007-10-04

Family

ID=31715014

Family Applications (2)

Application Number Title Priority Date Filing Date
US10/222,608 Expired - Fee Related US7131017B2 (en) 2002-08-16 2002-08-16 Programmable pipeline fabric having mechanism to terminate signal propagation
US11/543,717 Abandoned US20070234089A1 (en) 2002-08-16 2006-10-05 Programmable pipeline fabric having mechanism to terminate signal propagation

Country Status (6)

Country Link
US (2) US7131017B2 (en)
EP (1) EP1535145A2 (en)
JP (1) JP2005539292A (en)
CN (1) CN100409179C (en)
AU (1) AU2003265422A1 (en)
WO (1) WO2004017222A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100254618A1 (en) * 2009-04-01 2010-10-07 Yu-Min Chen Method for Accessing Image Data and Related Apparatus

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4143907B2 (en) * 2002-09-30 2008-09-03 ソニー株式会社 Information processing apparatus and method, and program
US7395411B2 (en) * 2005-03-14 2008-07-01 Sony Computer Entertainment Inc. Methods and apparatus for improving processing performance by controlling latch points
US7398153B2 (en) * 2005-03-31 2008-07-08 Trimble Navigation Limited Portable motion-activated position reporting device
JP4621604B2 (en) * 2006-02-20 2011-01-26 株式会社東芝 Bus device, bus system, and information transfer method
US20090055632A1 (en) * 2007-08-22 2009-02-26 Chao-Wu Chen Emulation Scheme for Programmable Pipeline Fabric
JP5231800B2 (en) * 2007-12-26 2013-07-10 株式会社東芝 Semiconductor integrated circuit device and clock control method for semiconductor integrated circuit device
US9286072B2 (en) 2011-10-03 2016-03-15 International Business Machines Corporation Using register last use infomation to perform decode-time computer instruction optimization
US8615745B2 (en) 2011-10-03 2013-12-24 International Business Machines Corporation Compiling code for an enhanced application binary interface (ABI) with decode time instruction optimization
US9697002B2 (en) 2011-10-03 2017-07-04 International Business Machines Corporation Computer instructions for activating and deactivating operands
US9329869B2 (en) 2011-10-03 2016-05-03 International Business Machines Corporation Prefix computer instruction for compatibily extending instruction functionality
US8612959B2 (en) 2011-10-03 2013-12-17 International Business Machines Corporation Linking code for an enhanced application binary interface (ABI) with decode time instruction optimization
US10078515B2 (en) 2011-10-03 2018-09-18 International Business Machines Corporation Tracking operand liveness information in a computer system and performing function based on the liveness information
US8756591B2 (en) 2011-10-03 2014-06-17 International Business Machines Corporation Generating compiled code that indicates register liveness
US9354874B2 (en) 2011-10-03 2016-05-31 International Business Machines Corporation Scalable decode-time instruction sequence optimization of dependent instructions
US9690583B2 (en) 2011-10-03 2017-06-27 International Business Machines Corporation Exploiting an architected list-use operand indication in a computer system operand resource pool

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5983339A (en) * 1995-08-21 1999-11-09 International Business Machines Corporation Power down system and method for pipelined logic functions
US6247134B1 (en) * 1999-03-31 2001-06-12 Synopsys, Inc. Method and system for pipe stage gating within an operating pipelined circuit for power savings
US6247124B1 (en) * 1993-12-15 2001-06-12 Mips Technologies, Inc. Branch prediction entry with target line index calculated using relative position of second operation of two step branch operation in a line of instructions
US6393579B1 (en) * 1999-12-21 2002-05-21 Intel Corporation Method and apparatus for saving power and improving performance in a collapsable pipeline using gated clocks
US6609209B1 (en) * 1999-12-29 2003-08-19 Intel Corporation Method and apparatus for reducing the power consumed by a processor by gating the clock signal to pipeline stages
US6965991B1 (en) * 2000-05-12 2005-11-15 Pts Corporation Methods and apparatus for power control in a scalable array of processor elements

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5666300A (en) * 1994-12-22 1997-09-09 Motorola, Inc. Power reduction in a data processing system using pipeline registers and method therefor
US6393597B1 (en) * 1999-06-01 2002-05-21 Sun Microsystems, Inc. Mechanism for decoding linearly-shifted codes to facilitate correction of bit errors due to component failures

Also Published As

Publication number Publication date
AU2003265422A1 (en) 2004-03-03
EP1535145A2 (en) 2005-06-01
AU2003265422A8 (en) 2004-03-03
CN1688968A (en) 2005-10-26
US20040034804A1 (en) 2004-02-19
WO2004017222A2 (en) 2004-02-26
JP2005539292A (en) 2005-12-22
CN100409179C (en) 2008-08-06
WO2004017222A3 (en) 2004-10-07
US7131017B2 (en) 2006-10-31

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO PAY ISSUE FEE