US7774729B1 - Method and apparatus for reducing dynamic power in a system - Google Patents
- Publication number
- US7774729B1 (application US11/807,437)
- Authority
- US
- United States
- Prior art keywords
- sequential
- sequential element
- clock signal
- combinatorial logic
- logic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/30—Circuit design
- G06F30/32—Circuit design at the digital level
- G06F30/327—Logic synthesis; Behaviour synthesis, e.g. mapping logic, HDL to netlist, high-level language to RTL or netlist
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2119/00—Details relating to the type or aim of the analysis or the optimisation
- G06F2119/06—Power analysis or power optimisation
Definitions
- the present invention relates to the field of field programmable gate arrays (FPGAs) and other target devices. More specifically, the present invention relates to a method and apparatus for reducing dynamic power in a system.
- FPGAs field programmable gate arrays
- Support(P) The inputs of a combinatorial cone of logic driving P are referred to as Support(P). Changes in the value observed at P are caused by changes in Support(P). Glitches may be observed at P whenever multiple transitions occur at P within a single clock period; all transitions except the last are deemed to be glitches. Multiple transitions observed at P may be caused by transitions in the values in Support(P): multiple transitions may occur in Support(P), and their effects may arrive at P at different times. Alternatively, a single change in Support(P) may propagate along multiple paths through the combinatorial logic, and those paths may have differing delays.
- FIGS. 1a and 1b illustrate an example of glitching.
- FIG. 1a illustrates an exemplary circuit with input registers RA 101, RB 102, RC 103, an XOR gate Fgate 104, and output register RF 105.
- the propagation delay is 1 unit from the output of RA to the input of Fgate, 2 units from the output of RB to the input of Fgate, 5 units from the output of RC to the input of Fgate, 1 unit from each input of Fgate to the output of Fgate, and 1 unit from the output of Fgate to RF.
- FIG. 1b illustrates an exemplary timing diagram for the circuit shown in FIG. 1a.
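The glitching in the FIG. 1a circuit can be reproduced with a short event-driven sketch. It assumes (this is not stated above) that RA, RB, and RC all start at 0 and all toggle to 1 on the same launch edge, and it uses a simple transport-delay model in which every input change reaches RF after the sum of the listed path delays. The names `transitions_at_rf` and `glitch_count` are illustrative only.

```python
from itertools import groupby

# Path delays from the FIG. 1a description (arbitrary time units).
REG_TO_GATE = {"RA": 1, "RB": 2, "RC": 5}  # register output -> Fgate input
GATE_DELAY = 1                             # Fgate input -> Fgate output
GATE_TO_RF = 1                             # Fgate output -> RF input


def transitions_at_rf(toggling):
    """List (time, value) transitions seen at the RF input in one clock period.

    Assumes all registers start at 0, the registers named in `toggling`
    change from 0 to 1 on the launch edge at time 0, and every input change
    propagates to the XOR output after its path delay (transport delay).
    """
    values = {"RA": 0, "RB": 0, "RC": 0}
    out = 0  # 0 ^ 0 ^ 0
    arrivals = sorted(
        (REG_TO_GATE[reg] + GATE_DELAY + GATE_TO_RF, reg) for reg in toggling
    )
    events = []
    # Changes arriving at the same instant are applied together.
    for t, group in groupby(arrivals, key=lambda arrival: arrival[0]):
        for _, reg in group:
            values[reg] ^= 1
        new_out = values["RA"] ^ values["RB"] ^ values["RC"]
        if new_out != out:
            events.append((t, new_out))
            out = new_out
    return events


def glitch_count(events):
    """All transitions except the last are glitches."""
    return max(len(events) - 1, 0)


events = transitions_at_rf(["RA", "RB", "RC"])
print(events)                # [(3, 1), (4, 0), (7, 1)]
print(glitch_count(events))  # 2
```

Under these assumptions the RF input changes at times 3, 4, and 7; only the last change carries the final value, so the first two are glitches. The waveforms in FIG. 1b may differ if the figure assumes different input transitions.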
- Some types of logic are more susceptible to glitching than others.
- Logic such as XOR gates, adders, multipliers, multiplexers, crossbars, and barrel shifters tends to be more susceptible to glitching because its output tends to change in response to any change in its inputs.
- An AND gate is much less susceptible to glitching because its output is sensitive to a given input bit only if all the other inputs are 1, which is a small fraction of its input space. Glitches are especially harmful when the logic cone is deep: a glitch in an early stage of the logic cone will often propagate through the rest of the logic and cause a cascade of wasted power.
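The susceptibility argument can be made concrete by counting, for one input, the fraction of the input space in which flipping that input changes the output. The sketch below is an illustration rather than part of the described method; `sensitivity_fraction`, `xor_n`, and `and_n` are hypothetical helpers.

```python
from itertools import product


def xor_n(bits):
    result = 0
    for b in bits:
        result ^= b
    return result


def and_n(bits):
    return int(all(bits))


def sensitivity_fraction(func, n, i):
    """Fraction of the 2**n input vectors for which flipping input i flips func's output."""
    sensitive = 0
    for bits in product((0, 1), repeat=n):
        flipped = list(bits)
        flipped[i] ^= 1
        if func(bits) != func(tuple(flipped)):
            sensitive += 1
    return sensitive / 2 ** n


for n in (2, 4):
    print(n, sensitivity_fraction(xor_n, n, 0), sensitivity_fraction(and_n, n, 0))
# 2 1.0 0.5
# 4 1.0 0.125
```

An n-input XOR is sensitive to each input everywhere in its input space, while an n-input AND is sensitive to a given input on only 1/2^(n-1) of it.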
- glitches are eliminated in circuits of a system by insertion of one or more stages of pipeline sequential elements into glitch-prone combinatorial cones of logic.
- the sequential elements only change value at most once per clock cycle and prevent glitches from propagating downstream, effectively filtering glitches out of the system.
- the insertion of sequential elements in the circuits of the system may, however, increase the latency of the system.
- the clock signal transmitted to the inserted sequential elements is phase shifted.
- the phase by which each set of pipelined sequential elements is shifted is determined based on the number of sets of pipelined sequential elements inserted into a combinatorial cone of logic.
- FIG. 1a illustrates an exemplary circuit that experiences glitching.
- FIG. 1b is an exemplary timing diagram that illustrates glitching experienced by the circuit shown in FIG. 1a.
- FIG. 2 is a flow chart that illustrates a method for designing a system on a target device according to an embodiment of the present invention.
- FIG. 3 illustrates a target device according to an embodiment of the present invention.
- FIG. 4 is a flow chart that illustrates a method for performing sequential element insertion according to an embodiment of the present invention.
- FIG. 5 illustrates an example of performing sequential element insertion according to an embodiment of the present invention.
- FIG. 6a illustrates an exemplary cone of combinatorial logic.
- FIG. 6b illustrates an example of cut line enumeration according to an embodiment of the present invention.
- FIG. 7 illustrates a system designer according to an embodiment of the present invention.
- FIG. 8 is an exemplary computer system that implements a system designer according to an embodiment of the present invention.
- FIG. 2 is a flow chart illustrating a method for designing a system on a target device according to an embodiment of the present invention.
- the target device may be an integrated circuit such as a field programmable gate array (FPGA), a structured application specific integrated circuit (ASIC), or other circuit.
- the method described in FIG. 2 may be implemented as a computer aided design (CAD) flow executed on a system designer.
- CAD computer aided design
- circuit and constraint entries are made.
- a user may specify their circuit design and constraints associated with the implementation of the system.
- the user may provide a circuit description in a hardware description language (HDL) such as VHSIC HDL (VHDL) or Verilog.
- the user may specify constraints on the implementation such as timing constraints, power budgets, or other constraints.
- HDL hardware description language
- VHDL VHSIC HDL
- Verilog Verilog
- the system is synthesized.
- Synthesis includes generating a logic design of the system to be implemented by the target device.
- synthesis generates an optimized logical representation of the system from an HDL design definition.
- the optimized logical representation of the system may include a representation that has a minimized number of functional blocks and registers, such as logic gates and logic elements, required for the system.
- Synthesis also includes mapping the optimized logic design. Mapping includes determining how to implement logic gates and logic elements in the optimized logic representation with specific resources on the target device.
- a netlist is generated from mapping. This netlist may be an optimized technology-mapped netlist generated from the HDL.
- placement works on the optimized technology-mapped netlist to produce a placement for each of the functional blocks.
- placement includes fitting the system on the integrated circuit by determining which resources on the integrated circuit are to be used for specific logic elements, and other functional blocks.
- Routing involves determining how to connect the functional blocks in the system.
- a cost function may be used to generate a cost associated with each routing option.
- the cost function may take into account delay, capacitive loading, cross-sink loading, power, and/or other criteria.
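A routing cost function of this kind is often a weighted sum of the listed criteria. The sketch below assumes such a weighted sum; the `RouteOption` fields, the weight values, and the function names are hypothetical and are not taken from any particular tool.

```python
from dataclasses import dataclass


@dataclass
class RouteOption:
    delay_ns: float
    capacitance_pf: float
    cross_sink_load: float
    power_mw: float


def route_cost(opt, w_delay=1.0, w_cap=0.5, w_xsink=0.25, w_power=1.0):
    """Hypothetical weighted-sum cost; lower is better."""
    return (w_delay * opt.delay_ns
            + w_cap * opt.capacitance_pf
            + w_xsink * opt.cross_sink_load
            + w_power * opt.power_mw)


def cheapest(options, **weights):
    """Pick the routing option with the lowest cost under the given weights."""
    return min(options, key=lambda o: route_cost(o, **weights))


a = RouteOption(delay_ns=1.0, capacitance_pf=2.0, cross_sink_load=0.0, power_mw=0.5)
b = RouteOption(delay_ns=0.8, capacitance_pf=1.0, cross_sink_load=4.0, power_mw=1.0)
print(route_cost(a))                       # 2.5
print(cheapest([a, b]) is a)               # True
print(cheapest([a, b], w_xsink=0.0) is b)  # True: ignoring cross-sink loading favors b
```

Changing the weights shifts which criterion dominates, which is how a power-aware router could trade delay against power.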
- sequential elements are inserted into the system design.
- glitches are eliminated in circuits of the system by insertion of stages of pipeline sequential elements into glitch-prone combinatorial cones of logic.
- the sequential elements only change value at most once per clock cycle and prevent glitches from propagating downstream, effectively filtering glitches out of the system.
- the clock signal transmitted to the inserted sequential elements is phase shifted.
- the phase by which each set of pipelined sequential elements is shifted is determined based on the number of sets of pipelined sequential elements inserted into a combinatorial cone of logic.
- the insertion of sequential elements may be made after any one or more of the synthesis, placement, or routing procedures 202-204. To accommodate the added sequential elements, incremental synthesis, placement, and/or routing may be performed without re-executing the entire design flow described in FIG. 2.
- an assembly procedure is performed.
- the assembly procedure involves creating a data file or set of files that includes information determined by the procedures described by 201-205.
- the data file may be a bit stream that may be used to program the target device.
- the procedures illustrated in FIG. 2 may be performed by an electronic design automation (EDA) tool executed on a first computer system.
- EDA electronic design automation
- the data file generated may be transmitted to a second computer system to allow the design of the system to be further processed.
- the data file may be transmitted to a second computer system which may be used to program the target device according to the system design.
- the design of the system may also be output in other forms such as on a display device or other medium.
- FIG. 3 illustrates an exemplary target device 300 on which a system may be implemented utilizing an FPGA according to an embodiment of the present invention.
- the target device 300 is a chip having a hierarchical structure that may take advantage of wiring locality properties of circuits formed therein.
- the target device 300 includes a plurality of logic-array blocks (LABs). Each LAB may be formed from a plurality of logic blocks, carry chains, LAB control signals, lookup table (LUT) chain, and register chain connection lines.
- a logic block is a small unit of logic providing efficient implementation of user logic functions.
- a logic block includes registers and one or more combinational cells, where each combinational cell has a single output. According to one embodiment of the present invention, the logic block may operate similarly to a logic element (LE), such as those found in Stratix™ devices manufactured by Altera® Corporation, or a configurable logic block (CLB), such as those found in Virtex™ devices manufactured by Xilinx® Inc.
- LABs are grouped into rows and columns across the target device 300. Columns of LABs are shown as 311-316. It should be appreciated that the logic block may include additional or alternate components.
- the target device 300 includes memory blocks.
- the memory blocks may be, for example, dual port random access memory (RAM) blocks that provide dedicated true dual-port, simple dual-port, or single port memory up to various bits wide at up to various frequencies.
- RAM random access memory
- the memory blocks may be grouped into columns across the target device in between selected LABs or located individually or in pairs within the target device 300. Columns of memory blocks are shown as 321-324.
- the target device 300 includes digital signal processing (DSP) blocks.
- DSP digital signal processing
- the DSP blocks may be used to implement multipliers of various configurations with add or subtract features.
- the DSP blocks include shift registers, multipliers, adders, and accumulators.
- the DSP blocks may be grouped into columns across the target device 300 and are shown as 331.
- the target device 300 includes a plurality of input/output elements (IOEs) 340. Each IOE feeds an I/O pin (not shown) on the target device 300.
- the IOEs are located at the end of LAB rows and columns around the periphery of the target device 300.
- Each IOE includes a bidirectional I/O buffer and a plurality of registers for registering input, output, and output-enable signals. When used with dedicated clocks, the registers provide performance and interface support with external memory devices.
- the target device 300 includes LAB local interconnect lines (not shown) that transfer signals between LEs in the same LAB.
- the LAB local interconnect lines are driven by column and row interconnects and LE outputs within the same LAB.
- Neighboring LABs, memory blocks, or DSP blocks may also drive the LAB local interconnect lines through direct link connections.
- the target device 300 also includes a plurality of row interconnect lines (“H-type wires”) (not shown) that span fixed distances.
- Dedicated row interconnect lines route signals to and from LABs, DSP blocks, and memory blocks within the same row.
- row interconnect lines of different lengths may span four, eight, and twenty-four LABs, respectively, and are used for fast row connections in a four-LAB, eight-LAB, and twenty-four-LAB region.
- the row interconnects may drive and be driven by LABs, DSP blocks, RAM blocks, and horizontal IOEs.
- the target device 300 also includes a plurality of column interconnect lines (“V-type wires”) (not shown) that operate similarly to the row interconnect lines.
- the column interconnect lines vertically route signals to and from LABs, memory blocks, DSP blocks, and IOEs.
- Each column of LABs is served by a dedicated column interconnect, which vertically routes signals to and from LABs, memory blocks, DSP blocks, and IOEs.
- the column interconnect lines may traverse a distance of four, eight, and sixteen blocks respectively, in a vertical direction.
- FIG. 3 illustrates an exemplary embodiment of a target device.
- a system may include a plurality of target devices, such as that illustrated in FIG. 3, cascaded together.
- the target device may include programmable logic devices arranged in a manner different than that on the target device 300.
- a target device may also include FPGA resources other than those described in reference to the target device 300.
- the invention described herein may be utilized on the architecture described in FIG.
- FIG. 4 is a flow chart that illustrates a method for performing sequential element insertion according to an embodiment of the present invention.
- the procedure described with reference to FIG. 4 may be used to implement procedure 205 illustrated in FIG. 2.
- power estimates are computed.
- the power estimates may include a metric that describes the estimated overall power consumption for the system design.
- the power estimates may include a metric that describes the power consumption for each circuit or sub-circuit in the system design.
- the circuit may include a combinational or combinatorial cone of logic.
- the power estimate may include an estimate of signal activities for each resource, such as a net or block, in the system design.
- the signal activities may include a toggle rate and static probability (time-averaged fraction of time at which a signal is logic high) for each signal in the design.
- the signal activities may be determined by using the procedure described in application Ser. No. 11/414,855 entitled “Method and Apparatus for Deriving Signal Activities for Power Analysis and Optimization”, which is herein incorporated by reference.
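The two activity measures can be computed directly from a simulated waveform. This sketch assumes the waveform is given as a list of `(time, value)` change points starting at time 0; it only illustrates the definitions and is not the procedure of the referenced application. `signal_activity` is a hypothetical helper.

```python
def signal_activity(events, t_end):
    """Return (toggle_rate, static_probability) for a 0/1 waveform.

    events: list of (time, value) change points sorted by time, starting at
            time 0; each value holds until the next change point or t_end.
    toggle_rate is transitions per time unit; static_probability is the
    fraction of [0, t_end) during which the signal is 1.
    """
    if not events or events[0][0] != 0:
        raise ValueError("waveform must start at time 0")
    toggles = 0
    high_time = 0
    boundaries = events[1:] + [(t_end, None)]
    for (t, value), (t_next, next_value) in zip(events, boundaries):
        if value == 1:
            high_time += t_next - t
        if next_value is not None and next_value != value:
            toggles += 1
    return toggles / t_end, high_time / t_end


print(signal_activity([(0, 0), (2, 1), (5, 0), (6, 1)], t_end=10))  # (0.3, 0.7)
```

Here the signal toggles three times in 10 time units and is high from 2 to 5 and from 6 to 10, i.e. 7 of 10 units.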
- the power estimates computed at 401 may also include a glitchiness score for each signal and functional block.
- the glitchiness score indicates the impact a signal may have in producing glitches, or the degree to which a functional block is susceptible to glitches.
- the glitchiness score may be higher for functional blocks whose outputs tend to toggle faster than the maximum toggle rate of any of their inputs. For example, a functional block implementing an XOR function will have a high glitchiness score because its output generally toggles each time any one of its inputs toggles.
- a functional block implementing an AND function will have a low glitchiness score because generally its output tends to toggle under very limited conditions.
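The text does not give a formula for the glitchiness score. One possible sketch, swapped in here as an assumption, uses Najm's transition-density model: with independent inputs, an output's toggle rate is estimated as the sum over inputs of P(output is sensitive to that input) times that input's toggle rate, and the score is that estimate divided by the fastest input toggle rate. All function names are hypothetical.

```python
from itertools import product


def xor_n(bits):
    result = 0
    for b in bits:
        result ^= b
    return result


def and_n(bits):
    return int(all(bits))


def boolean_difference_prob(func, probs, i):
    """P(output flips when input i flips), inputs independent with P(x_j = 1) = probs[j]."""
    total = 0.0
    for bits in product((0, 1), repeat=len(probs)):
        if bits[i]:
            continue  # visit each assignment of the other inputs exactly once
        weight = 1.0
        for j, b in enumerate(bits):
            if j != i:
                weight *= probs[j] if b else 1.0 - probs[j]
        flipped = bits[:i] + (1,) + bits[i + 1:]
        if func(bits) != func(flipped):
            total += weight
    return total


def glitchiness_score(func, probs, densities):
    """Estimated output toggle rate divided by the fastest input toggle rate."""
    out_density = sum(
        boolean_difference_prob(func, probs, i) * d for i, d in enumerate(densities)
    )
    return out_density / max(densities)


print(glitchiness_score(xor_n, [0.5, 0.5], [1.0, 1.0]))  # 2.0
print(glitchiness_score(and_n, [0.5, 0.5], [1.0, 1.0]))  # 1.0
print(glitchiness_score(and_n, [0.5] * 4, [1.0] * 4))    # 0.5
```

A score above 1 flags a block whose output is expected to toggle faster than any of its inputs, matching the XOR example; the wider AND scores lower.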
- combinatorial logic or combinatorial logic clouds that are candidates for sequential element insertion are identified.
- the combinatorial logic clouds are bounded by sequential elements, such as registers, which are clocked by the same clock.
- the bounding sequential elements may be referred to as source and destination sequential elements.
- candidates for sequential element insertion are identified from combinatorial logic clouds that have significant dynamic power and in which there is significant glitching. This may be achieved by identifying combinatorial logic clouds having signals and/or functional blocks with associated glitchiness scores that exceed a threshold value.
- Combinatorial logic clouds which are candidates for optimization through sequential element insertion may include combinatorial logic where enough glitches could be filtered to offset power tradeoffs associated with the additional power required from the additional sequential elements inserted and resources for performing phase shifting (local clock generation logic, global signals, and/or clock delay elements).
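The candidate test can be sketched as a threshold on glitchiness followed by a power trade-off. The fraction of glitch power that insertion removes, the per-register power, and the fixed clocking overhead are hypothetical parameters chosen for illustration; a real flow would derive them from its power estimates.

```python
from dataclasses import dataclass


@dataclass
class Cloud:
    name: str
    max_glitchiness: float   # highest glitchiness score of any signal/block in the cloud
    glitch_power_mw: float   # estimated dynamic power spent on glitches
    extra_registers: int     # pipeline registers an insertion would need


def select_candidates(clouds, threshold, register_power_mw, clock_overhead_mw,
                      filtered_fraction=0.8):
    """Keep clouds that are glitchy enough and whose filtered glitch power
    outweighs the cost of the added registers and phase-shifted clocking."""
    selected = []
    for cloud in clouds:
        if cloud.max_glitchiness <= threshold:
            continue
        saving = filtered_fraction * cloud.glitch_power_mw
        overhead = cloud.extra_registers * register_power_mw + clock_overhead_mw
        if saving > overhead:
            selected.append(cloud.name)
    return selected


clouds = [
    Cloud("multiplier", 2.5, 4.0, 16),
    Cloud("and_tree", 0.6, 0.2, 4),
    Cloud("adder", 1.8, 0.5, 32),
]
print(select_candidates(clouds, threshold=1.5, register_power_mw=0.05,
                        clock_overhead_mw=0.5))  # ['multiplier']
```

The adder passes the glitchiness threshold but is rejected because its 32 registers plus clocking overhead would cost more power than the glitches they filter.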
- cut lines are determined in the identified combinatorial logic clouds.
- placement of a cut line determines an upper bound on the number of sequential elements which may be inserted.
- Each edge across a cut line is a candidate location for a pipeline sequential element.
- Each inserted sequential element will absorb glitches and reduce dynamic power.
- cut lines may be positioned to separate levels of logic.
- a cut line is inserted after a first level of functional blocks and before a second level of functional blocks.
- New (intermediate) cut lines may also be generated or positioned by moving one or more logic gates across a cut line previously positioned to separate levels of logic.
- cut lines and/or intermediate cut lines may be inserted at each level of functional blocks and then evaluated based upon the glitchiness scores of signals and/or functional blocks in proximity to the cut lines.
- cut lines may be positioned based primarily on glitchiness scores of signals and/or functional blocks in the combinatorial logic cloud.
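Level-based cut lines and intermediate cut lines can both be described by the set of nodes on the early side of the cut; the edges leaving that set are the candidate register locations. The netlist below is hypothetical (it is not FIG. 6a), and `logic_levels` and `crossing_edges` are illustrative helpers.

```python
def logic_levels(fanins, sources):
    """Level 0 for source registers; otherwise 1 + the deepest fanin level."""
    levels = {src: 0 for src in sources}

    def level(node):
        if node not in levels:
            levels[node] = 1 + max(level(f) for f in fanins[node])
        return levels[node]

    for node in fanins:
        level(node)
    return levels


def crossing_edges(fanins, early):
    """Edges (driver, load) that leave the `early` side of a cut line."""
    return [
        (src, dst)
        for dst, srcs in fanins.items()
        if dst not in early
        for src in srcs
        if src in early
    ]


# Hypothetical cloud: source registers a, b, c; destination register "out".
GATES = {
    "x1": ["a", "b"],    # XOR
    "n1": ["b", "c"],    # AND
    "i2": ["x1"],        # INV
    "n2": ["x1", "n1"],  # AND
    "x3": ["i2", "n2"],  # XOR
}
NETLIST = {**GATES, "out": ["x3"]}
SOURCES = {"a", "b", "c"}

levels = logic_levels(GATES, SOURCES)
# Level-based cut after level 1: everything at level <= 1 is on the early side.
cut_l1 = {n for n, lv in levels.items() if lv <= 1}
print(crossing_edges(NETLIST, cut_l1))
# [('x1', 'i2'), ('x1', 'n2'), ('n1', 'n2')]
# Intermediate cut: move the inverter i2 across to the early side.
print(crossing_edges(NETLIST, cut_l1 | {"i2"}))
# [('x1', 'n2'), ('n1', 'n2'), ('i2', 'x3')]
```

The second cut mirrors moving a gate across an existing cut line: the edge count is unchanged here, but the candidate locations move. Because x1 drives two crossing edges, one register on x1 could serve both, which is why the edge count is only an upper bound.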
- pipelined sequential elements are placed at the cut lines.
- one sequential element is placed at each intersection between a cut line and a path between a source sequential element and a destination sequential element. Since there may be constraints on the number of sequential elements that may be implemented on a target device, it is advantageous to begin insertions at the cut lines that produce the greatest reduction in power.
- registers such as edge triggered registers, may be inserted as sequential elements. It should be appreciated that other types of sequential elements may also be utilized. For example, back-to-back latches may be configured to operate as an edge-triggered register. Single latches may also be utilized where the bounding sequential elements are positive-edge triggered.
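One way to prefer the cut lines that save the most power under a register limit is a greedy ranking by saving per register. This heuristic is an assumption, not the method stated above; it treats each cut line as independent, which ignores interactions between cut lines in the same cloud. The cut names and numbers are hypothetical.

```python
def choose_cut_lines(cuts, register_budget):
    """Greedy heuristic: pick cut lines with the best power saving per register
    until the register budget is exhausted.

    cuts: list of (name, registers_needed, power_saved_mw); registers_needed > 0.
    """
    chosen, used = [], 0
    for name, registers, saved in sorted(cuts, key=lambda c: c[2] / c[1], reverse=True):
        if used + registers <= register_budget:
            chosen.append(name)
            used += registers
    return chosen


cuts = [("cloudA:L1", 3, 1.2), ("cloudB:L2", 3, 1.5), ("cloudC:L1", 2, 0.4)]
print(choose_cut_lines(cuts, register_budget=5))  # ['cloudB:L2', 'cloudC:L1']
```

Greedy ratio ordering is not guaranteed optimal (this is a knapsack-style problem), but it is a simple starting point when the register budget is tight.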
- the data delay through the cloud of logic may be doubled if the inserted sequential elements are clocked by the same clock signal clocking the source and destination sequential elements. For example, data may take two clock cycles instead of one to arrive at the destination sequential element. Because the latency of the circuit has increased, the functionality of the circuit may also have changed. To reduce the data delay and preserve the functionality of the circuit, the frequency of the clock through the cloud of logic may be doubled. This, however, has the adverse effect of doubling the dynamic power required to distribute the clock. If some parts of the system use the original clock and others use the doubled clock, this solution requires still more power, since special purpose clock generation logic and additional distribution mechanisms would both be required.
- the inserted pipelined sequential elements are clocked with a clocking signal that is phase shifted with respect to the clock signal clocking the source and destination sequential elements.
- let Affected(P) be the set of destination registers which may change in value in response to a change in P.
- the sequential elements in Affected(P) are also clocked by clock Clk.
- MAXDELAY(P, Affected(P)) is the maximum delay of any path from P to any register in Affected(P).
- the setup timing requirement through P is satisfied if the maximum path delay from Support(P) to Affected(P) is no larger than Period(Clk), i.e., value changes caused by the launch edge at Support(P) all arrive at Affected(P) within one clock period. This condition can be written as $$\mathrm{MAXDELAY}(\mathrm{Support}(P),\mathrm{Affected}(P)) \le \mathrm{Period}(\mathrm{Clk})$$
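- clocking the pipeline sequential element Pipereg(P) inserted at P with a version of Clk delayed by Phasedelay(Pipereg(P),Clk) results in a more stringent requirement, given by the following relationship: $$\mathrm{Phasedelay}(\mathrm{Pipereg}(P),\mathrm{Clk}) + \mathrm{MAXDELAY}(\mathrm{Pipereg}(P),\mathrm{Affected}(P)) \le \mathrm{Period}(\mathrm{Clk}) \qquad [5]$$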
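The effect of relationship [5] can be checked per path. This sketch assumes a single path through P, splits its delay into the part before P and the part after P, and treats setup, hold, and clock-to-output delays as negligible. The early-side check is inferred from the requirement that the value at P be captured by Pipereg(P), and the function names are hypothetical.

```python
def unpipelined_setup_ok(period, delay_support_to_p, delay_p_to_affected):
    """Original requirement for a path through P: launch-to-capture within one period."""
    return delay_support_to_p + delay_p_to_affected <= period


def pipelined_setup_ok(period, phase_delay, delay_support_to_p, delay_p_to_affected):
    """Requirements once Pipereg(P) is clocked phase_delay after Clk.

    Setup, hold, and clock-to-output delays are treated as negligible.
    """
    # Early side: the value at P must settle before Pipereg(P) captures it.
    early_ok = delay_support_to_p <= phase_delay
    # Late side: relationship [5].
    late_ok = phase_delay + delay_p_to_affected <= period
    return early_ok and late_ok


# A 3 ns + 6 ns path meets a 10 ns period without the pipeline register...
print(unpipelined_setup_ok(10, 3, 6))   # True
# ...but fails [5] when Pipereg(P) uses the inverted (180-degree) clock.
print(pipelined_setup_ok(10, 5, 3, 6))  # False
print(pipelined_setup_ok(10, 5, 4, 5))  # True
```

The first two calls show why [5] is more stringent: a path can fit in one period overall yet still fail once the late portion must fit in the time remaining after the phase-shifted edge.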
- FIG. 5 illustrates an example of performing sequential element insertion according to an embodiment of the present invention.
- a cloud of combinatorial logic bounded by sequential elements Rinput and Routput is identified.
- a cut line is used to cut the combinatorial logic into early and late portions.
- Sequential elements, such as registers, are inserted at the cut line.
- the sequential elements are clocked by a clock signal that is phase shifted with respect to the clock signal clocking the sequential elements bounding the combinatorial logic, Rinput and Routput.
- the inserted pipelined sequential elements may be clocked by an inversion of the clock signal clocking the bounding sequential elements, i.e., a clock signal with a 180-degree phase difference from the original clock signal that clocks the source and destination sequential elements.
- multiple pipeline stages may be inserted into a combinatorial logic cloud where each stage is clocked by a suitably phase shifted version of the clocking signal used to clock the source and destination sequential elements (the original clock).
- the combinatorial logic cloud may be split into three parts: early, middle, and late combinatorial logic.
- a first pipeline stage may be inserted between the early and middle combinatorial logic, where the sequential elements in the first pipeline stage are clocked by a version of the original clock phase shifted by 120 degrees.
- the data delay through the early combinatorial logic is bounded above by 1/3 of the clock period of the original clock.
- a second pipeline stage may be inserted between the middle and late combinatorial logic, where the sequential elements in the second pipeline stage are clocked by a version of the original clock phase shifted by 240 degrees.
- the data delay through the middle combinatorial logic is bounded above by 1/3 of the clock period of the original clock.
- the data delay through the late combinatorial logic is bounded above by 1/3 of the clock period of the original clock.
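The one-stage (180 degree) and two-stage (120 and 240 degree) cases follow an equal split of the clock period. The sketch below generalizes that pattern to any number of stages; the generalization beyond two stages, and the helper names, are assumptions.

```python
def stage_phases_deg(num_stages):
    """Equally spaced clock phases (degrees) for num_stages inserted pipeline stages."""
    parts = num_stages + 1
    return [360 * s / parts for s in range(1, parts)]


def stage_phase_delays(period, num_stages):
    """The same phases expressed as delays after the original clock edge.

    Each of the num_stages + 1 logic segments then has a delay budget of
    period / (num_stages + 1).
    """
    parts = num_stages + 1
    return [period * s / parts for s in range(1, parts)]


print(stage_phases_deg(1))         # [180.0]
print(stage_phases_deg(2))         # [120.0, 240.0]
print(stage_phase_delays(9.0, 2))  # [3.0, 6.0]
```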
- appropriate clocking signals are provided to the sequential elements inserted into the system.
- the clocking signals may be phase shifted according to a number of cut lines inserted into the system to allow the functionality of the system to be preserved without increasing the latency of data transmitted through the cloud of combinatorial logic.
- the appropriate clocking signal may be provided using special purpose clock generation logic such as a delay-locked loop (DLL) or phase-locked loop (PLL), regional or local clock delay elements, special purpose registers having tunable clock delays, local routing to delay a clock, logic elements to delay or invert a clock signal, and/or other appropriate mechanisms and techniques.
- FIG. 6a illustrates an exemplary cone of combinatorial logic 600.
- the cone of combinatorial logic is bounded by a plurality of source sequential elements Rin1-Rin3 and a plurality of destination sequential elements Rout1-Rout3.
- exemplary cut lines may be inserted into the combinatorial logic 600.
- cut lines may be inserted to separate levels of logic. Cut line L1 is inserted after a first level of functional blocks, XOR1 and AND1, and before a second level of functional blocks, INV2 and AND2. Cut line L2 is inserted after the second level of functional blocks, AND2 and INV2, and before a third level of functional blocks, XOR3.
- Intermediate cut line L1.1 may be generated by taking existing cut line L1 and moving functional block INV2 across the cut line from right to left. It should be appreciated that one or more of the cut lines identified may be utilized after determining the effectiveness of insertion at the cut lines and also determining the resources available on the target device for sequential element insertion.
- in one example, it is assumed that the clock duty cycle for the original clock clocking the bounding sequential elements is 50%, where the clock is low for Period(Clk)/2 and high for Period(Clk)/2; that the bounding sequential elements are clocked at the same frequency; that the clock skew is negligible; and that the setup, hold, and clock-to-output delays on the pipelined sequential elements are negligible.
- the intrinsic setup delay on a sequential element is the time before an active clock edge during which a data signal must be kept steady.
- the intrinsic hold delay is the time after an active clock edge during which the data signal must be kept steady. If the data is not steady for these regions of time before and after the active clock edge, then the sequential element may not capture the data and the sequential element may be in an unstable state.
- the intrinsic clock to output delay is the delay between the arrival of the active clock edge at the sequential element and the time at which a change in the value stored in the register is propagated to the output of the sequential element.
- the propagation delay through the combinatorial logic is bounded above by Period(Clk).
- the worst-case delay through the early combinatorial logic is bounded by Period(Clk)/2 so that the final result at the cut line appears in time to be latched by the pipeline sequential elements clocked by the inverted clock.
- the worst-case delay through the late combinatorial logic is bounded by Period(Clk)/2 so that the final results computed from the values latched in the pipeline registers can reach the destination registers.
- if the duty cycle is not 50%, other adjustments may be made to compensate for the condition.
- the maximum delay through the early combinatorial logic may be bounded above by 8 ns.
- the final values of the early combinatorial logic will be captured by the inserted sequential elements on the falling clock edge 8 ns after the clock period starts.
- the maximum delay through the late combinatorial logic may be bounded above by 2 ns. This allows any change in the output of the inserted sequential elements to propagate through the late combinatorial logic and be captured by the output sequential elements of the circuit on the next rising clock edge (10 ns after the start of the clock period).
- if the clock skew is non-negligible, other adjustments may be made to compensate for the condition.
- for example, suppose the clock period is 10 ns, the duty cycle is 50%, and the delay from the clock source to the source sequential elements is negligible, but the delay from the clock source to both the pipelined sequential elements and the destination sequential elements is 3 ns.
- the maximum delay through the early combinatorial logic may be up to 8 ns. This accounts for 5 ns of delay between the rising clock edge and the falling clock edge and also 3 ns extra delay in the propagation of the clock signal to the inserted sequential elements.
- the maximum delay through the late combinatorial logic is still 5 ns because both the inserted sequential elements and the late combinatorial logic see the same clock delay of 3 ns so the launch and capture edges are 5 ns apart.
- the intrinsic setup delay may be subtracted from the overall delay budget for the early combinatorial logic.
- the maximum delay through the early logic must be at most Period(Clk)/2 minus the intrinsic setup delay on the pipeline sequential elements.
- if the intrinsic hold time on the pipelined sequential elements is non-negligible, other adjustments may be made to compensate for the condition.
- the minimum delay through the early combinatorial logic should be no less than the intrinsic hold time of the pipelined sequential elements. Otherwise, changes will be propagated through the early combinatorial logic in effectively zero clock cycles rather than a half clock cycle.
- the intrinsic clock to output delay should be subtracted from the maximum delay budget of the late combinatorial logic.
- the maximum delay through the late combinatorial logic is bounded above by Period(Clk)/2 minus the intrinsic clock-to-output delay of the pipeline registers.
- the bounding sequential elements of a combinatorial logic cloud may operate at the same frequency but may be clocked with a clocking signal having a fixed phase difference.
- the pre-existing phase difference should be taken into account when computing the critical path delay budgets for the corresponding Early and Late clouds of combinatorial logic.
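The single-stage budgets above (50% duty cycle, a capture edge 8 ns into a 10 ns period, 3 ns of clock delay at the pipeline and destination elements, and intrinsic setup and clock-to-output delays) reduce to edge arithmetic: the early budget is the time from the launch edge to the pipeline capture edge, and the late budget is the time from the pipeline edge to the destination capture edge. A fixed phase difference between the bounding elements can be modeled as a destination clock offset. The function names and parameters are hypothetical; destination setup and source clock-to-output delays are ignored, as in the text.

```python
def single_stage_budgets(period, capture_edge, src_clk=0.0, pipe_clk=0.0,
                         dst_clk=0.0, setup=0.0, clk_to_q=0.0):
    """Maximum delay budgets (early, late) for one inserted pipeline stage.

    period:          clock period of the original clock
    capture_edge:    time within the period of the edge that clocks the
                     pipeline registers (Period/2 for a 50% duty cycle)
    src_clk, pipe_clk, dst_clk: clock arrival delay at the source, pipeline,
                     and destination elements (skew or fixed phase offset)
    setup, clk_to_q: intrinsic delays of the pipeline registers
    """
    launch = src_clk                         # source registers launch
    pipe_edge = capture_edge + pipe_clk      # pipeline registers capture and launch
    dst_capture = period + dst_clk           # destination registers capture
    early = pipe_edge - launch - setup
    late = dst_capture - pipe_edge - clk_to_q
    return early, late


def early_hold_ok(min_early_delay, hold):
    """The fastest early path must not be shorter than the pipeline registers' hold time."""
    return min_early_delay >= hold


print(single_stage_budgets(10, 5))                             # (5.0, 5.0)
print(single_stage_budgets(10, 8))                             # (8.0, 2.0)
print(single_stage_budgets(10, 5, pipe_clk=3, dst_clk=3))      # (8.0, 5.0)
print(single_stage_budgets(10, 5, setup=0.5, clk_to_q=0.25))   # (4.5, 4.75)
```

The four calls reproduce, in order, the 50% case, the 8 ns / 2 ns case, the 3 ns clock-delay case, and the setup and clock-to-output adjustments.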
- FIG. 7 illustrates a system designer unit 700 according to an embodiment of the present invention.
- the system designer unit 700 may be an EDA tool.
- FIG. 7 illustrates software modules implementing an embodiment of the present invention.
- system design may be performed by a computer system (not shown) executing sequences of instructions represented by the software modules shown in FIG. 7. Execution of the sequences of instructions causes the computer system to support system design as will be described hereafter.
- hard-wired circuitry may be used in place of or in combination with software instructions to implement the system design unit 700.
- the system design unit 700 is not limited to any specific combination of hardware circuitry and software.
- Block 710 represents a system designer manager.
- the system designer manager 710 is coupled to and transmits information between the components in the system design unit 700.
- Block 720 represents a synthesis unit.
- the synthesis unit 720 generates a logic design of a system to be implemented by a target device.
- the synthesis unit 720 takes a conceptual Hardware Description Language (HDL) design definition and generates an optimized logical representation of the system.
- the optimized logical representation of the system generated by the synthesis unit 720 may include a representation that has a minimized number of functional blocks and registers, such as logic gates and logic elements, required for the system.
- the optimized logical representation of the system generated by the synthesis unit 720 may include a representation that has a reduced depth of logic and that generates a lower signal propagation delay.
- the synthesis unit 720 also determines how to implement the functional blocks and registers in the optimized logic representation utilizing specific resources on a target device, thus creating an optimized “technology-mapped” netlist.
- the technology-mapped netlist indicates how the resources on the target device can be utilized to implement the system.
- the technology-mapped netlist may, for example, contain components such as LEs on the target device.
- block 730 represents a placement unit.
- the placement unit 730 fits the system on the target device by determining which resources on the target device are to be used for specific functional blocks and registers.
- the placement unit 730 first determines how to implement portions of the optimized logic design in clusters.
- Clusters may represent a subset of the components on the target device 300 (shown in FIG. 3), such as, for example, a logic array block (LAB) having a plurality of logic blocks.
- the clusters may be placed by assigning the clusters to specific LABs on the target device.
- routing interconnections between the logic blocks may be performed.
- the placement unit 730 may utilize a cost function in order to determine a good assignment of resources on the target device.
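The source does not say which cost function the placement unit 730 uses. Purely as a hedged illustration, the sketch below swaps in half-perimeter wirelength (HPWL), a cost commonly used by FPGA placers; it is an assumption, not the method of this patent. The `Placement` mapping, block names, and grid coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical placement: block name -> (x, y) location on the device grid.
Placement = Dict[str, Tuple[int, int]]


def hpwl(net: List[str], placement: Placement) -> int:
    """Half-perimeter of the bounding box that encloses every block on one net."""
    xs = [placement[block][0] for block in net]
    ys = [placement[block][1] for block in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def placement_cost(nets: List[List[str]], placement: Placement) -> int:
    """Total estimated wirelength over all nets; a placer tries to lower it."""
    return sum(hpwl(net, placement) for net in nets)


if __name__ == "__main__":
    nets = [["a", "b"], ["a", "c", "d"]]
    placement = {"a": (0, 0), "b": (2, 1), "c": (1, 3), "d": (4, 0)}
    print(placement_cost(nets, placement))  # 10
    # A move-based placer would accept relocating block d if the cost drops.
    moved = dict(placement, d=(1, 0))
    print(placement_cost(nets, moved))  # 7
```

A move that shortens the bounding box of a net lowers the total cost, so an iterative placer can accept or reject candidate moves by comparing costs before and after.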
- Block 740 represents a routing unit.
- the routing unit 740 determines the routing resources on the target device to use to provide interconnection between the functional blocks and registers on the target device.
- Block 750 represents a sequential element insertion unit.
- the sequential element insertion unit 750 inserts one or more stages of pipelined sequential elements into glitch-prone combinatorial cones of logic to eliminate glitches in circuits of the system.
- the inserted sequential elements change value at most once per clock cycle and prevent glitches from propagating downstream, effectively filtering glitches out of the system.
- the clock signal transmitted to the inserted sequential elements is phase shifted.
- the phase by which each set of pipelined sequential elements is shifted is determined based on the number of sets of pipelined sequential elements inserted into a combinatorial cone of logic.
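To make the filtering behavior concrete, the following sketch counts transitions on a glitchy combinatorial net P and at the output of a register that samples P once per clock cycle. It is a simplified behavioral model written for illustration: the event-list format, the `sample_time` parameter (standing in for the possibly phase-shifted clock edge of the inserted register), and the function names are hypothetical, and the model assumes P has settled to its final value before the sampling edge.

```python
from typing import List, Tuple

# Events on net P during one clock period: (arrival time in ns, new logic value).
Cycle = List[Tuple[float, int]]


def unregistered_transitions(initial: int, cycles: List[Cycle]) -> int:
    """Count every value change on the combinatorial net, glitches included."""
    value, count = initial, 0
    for events in cycles:
        for _, new in sorted(events):
            if new != value:
                count += 1
                value = new
    return count


def registered_transitions(initial: int, cycles: List[Cycle], sample_time: float) -> int:
    """Count value changes at the output of a register that samples P once per cycle."""
    net, reg, count = initial, initial, 0
    for events in cycles:
        sampled = net  # value P held at the start of this cycle
        for time, new in sorted(events):
            if time <= sample_time:
                sampled = new  # latest value that arrived before the clock edge
            net = new
        if sampled != reg:
            count += 1
            reg = sampled
    return count


cycles = [[(1.0, 1), (2.0, 0), (3.0, 1)], [(0.5, 0), (1.5, 1)], []]
print(unregistered_transitions(0, cycles))     # 5
print(registered_transitions(0, cycles, 4.0))  # 1
```

In this example the logic downstream of the register sees one transition instead of five; the avoided transitions are where the dynamic power saving comes from.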
- the insertion of sequential elements may be made after any one or more of the synthesis, placement, or routing procedures performed by the synthesis unit 720 , placement unit 730 , and routing unit 740 .
- incremental synthesis, placement, and/or routing may be performed without requiring entire design procedures to be re-executed.
- the sequential element insertion unit 750 includes a power estimation unit 751 .
- the power estimation unit 751 computes power estimates.
- the power estimates may include a metric that describes the overall power required for the system design.
- the power estimates may include a metric that describes the power consumption for each circuit or sub-circuit in the system design.
- the circuit may include a combinational or combinatorial cone of logic.
- the power estimate may include an estimate of signal activities for each resource, such as a net or block, in the system design.
- the signal activities may include a toggle rate and static probability.
- the power estimate may also include a glitchiness score for each signal and functional block. The glitchiness score indicates the impact a signal may have in producing glitches, or the degree to which a functional block is susceptible to glitches.
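A minimal sketch of how the signal activities named above might be computed from a simulated trace follows. The trace format (the sequence of values a net takes within each cycle), the `NetActivity` fields, and the glitchiness formula (glitch transitions divided by all transitions, counting every transition except the last one in a clock period as a glitch) are assumptions for illustration; the source does not define the score numerically. The power helper uses the standard estimate of roughly 0.5 * C * V^2 of switching energy per transition.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NetActivity:
    toggle_rate: float         # transitions per clock cycle, glitches included
    static_probability: float  # fraction of cycles whose settled value is 1
    glitchiness: float         # hypothetical score: share of transitions that are glitches


def estimate_activity(initial: int, cycles: List[List[int]]) -> NetActivity:
    """cycles[i] lists the successive values a net takes during clock cycle i."""
    value = initial
    total = glitches = ones = 0
    for values in cycles:
        k = 0
        for v in values:
            if v != value:
                k += 1
                value = v
        total += k
        glitches += max(0, k - 1)  # every transition except the last in a period
        ones += value              # settled value at the end of the period
    n = len(cycles)
    return NetActivity(total / n, ones / n, glitches / total if total else 0.0)


def dynamic_power_w(act: NetActivity, cap_f: float, vdd: float, f_clk_hz: float) -> float:
    """Switching power, assuming about 0.5 * C * V^2 of energy per transition."""
    return 0.5 * cap_f * vdd ** 2 * act.toggle_rate * f_clk_hz


act = estimate_activity(0, [[1, 0, 1], [1], [0, 1, 0, 1], [0]])
print(act)  # NetActivity(toggle_rate=2.0, static_probability=0.75, glitchiness=0.625)
```

Static probability is approximated here per cycle from the settled value; a time-weighted estimate would need the transition times as well.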
- the sequential element insertion unit 750 includes a combinatorial logic identifier unit 752 .
- the combinatorial logic identifier unit 752 identifies combinatorial logic clouds that are candidates for sequential element insertion.
- the combinatorial logic clouds are bounded by sequential elements, such as registers, which are clocked by the same clock.
- the bounding sequential elements may be referred to as source and destination sequential elements.
- candidates for sequential element insertion are identified from combinatorial logic clouds that have significant dynamic power and significant glitching. This may be achieved by identifying combinatorial logic clouds having signals and/or functional blocks whose glitchiness scores exceed a threshold value.
- Combinatorial logic clouds that are candidates for optimization through sequential element insertion may include combinatorial logic in which enough glitches can be filtered to offset the additional power consumed by the inserted sequential elements and by the resources used for phase shifting (local clock generation logic, global signals, and/or clock delay elements).
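The selection rule above combines a glitchiness threshold with a power trade-off. The sketch below shows one way to express it, under stated assumptions: the `LogicCloud` fields, the per-register and clock-overhead power figures, and `filter_fraction` (the share of glitch power assumed to disappear once registers are inserted) are hypothetical inputs that a power estimation unit would have to supply.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LogicCloud:
    name: str
    glitch_power_uw: float    # estimated power spent on glitch transitions (microwatts)
    max_glitchiness: float    # highest glitchiness score of any signal/block in the cloud
    registers_needed: int     # sequential elements a cut would insert
    register_power_uw: float  # estimated power of each inserted sequential element
    clock_overhead_uw: float  # cost of generating/distributing the phase-shifted clock


def select_candidates(clouds: List[LogicCloud], threshold: float,
                      filter_fraction: float = 0.8) -> List[str]:
    """Names of clouds that are glitchy enough and whose savings exceed the overhead."""
    chosen = []
    for c in clouds:
        if c.max_glitchiness <= threshold:
            continue  # not glitch-prone enough to be a candidate
        saved = filter_fraction * c.glitch_power_uw
        cost = c.registers_needed * c.register_power_uw + c.clock_overhead_uw
        if saved > cost:
            chosen.append(c.name)
    return chosen


clouds = [
    LogicCloud("A", 100.0, 0.7, 8, 2.0, 10.0),  # saves 80, costs 26: selected
    LogicCloud("B", 150.0, 0.2, 4, 2.0, 10.0),  # below the glitchiness threshold
    LogicCloud("C", 20.0, 0.6, 16, 2.0, 10.0),  # saves 16, costs 42: rejected
]
print(select_candidates(clouds, threshold=0.5))  # ['A']
```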
- the sequential element insertion unit 750 includes a cut line unit 753 .
- the cut line unit 753 identifies cut lines to make in the identified combinatorial logic clouds.
- cut lines may be positioned to separate levels of logic.
- a cut line is inserted after a first level of functional blocks and before a second level of functional blocks.
- New (intermediate) cut lines may also be generated by moving one or more logic gates across a cut line previously positioned to separate levels of logic. Additional cut lines and/or intermediate cut lines may be inserted at each level of functional blocks and then evaluated based upon the glitchiness scores of signals and/or functional blocks in proximity to the cut lines.
- cut lines may be positioned based primarily on glitchiness scores of signals and/or functional blocks in the combinatorial logic cloud.
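The level-based positioning of cut lines can be sketched as follows. The graph model is an assumption: nodes are source registers and logic gates, edges are nets, and a gate's level is the length of the longest path to it from a source register (the cloud is assumed acyclic). The scoring rule, the total glitchiness of the nets crossing a cut minus a hypothetical per-register `penalty`, is illustrative only and is not the scoring used in the source.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str]  # (driver, sink)


def levelize(edges: List[Edge], sources: Set[str]) -> Dict[str, int]:
    """Level = longest path, in gates, from any source register (sources are level 0)."""
    preds: Dict[str, List[str]] = defaultdict(list)
    nodes = set(sources)
    for u, v in edges:
        preds[v].append(u)
        nodes.update((u, v))
    level: Dict[str, int] = {}

    def visit(n: str) -> int:
        if n not in level:
            if n in sources or not preds[n]:
                level[n] = 0
            else:
                level[n] = 1 + max(visit(p) for p in preds[n])
        return level[n]

    for n in nodes:
        visit(n)
    return level


def crossing_drivers(edges: List[Edge], level: Dict[str, int], k: int) -> Set[str]:
    """Nets that a cut placed after level k would intersect."""
    return {u for u, v in edges if level[u] <= k < level[v]}


def best_cut(edges: List[Edge], sources: Set[str], glitchiness: Dict[str, float],
             penalty: float) -> Optional[Tuple[int, List[str]]]:
    level = levelize(edges, sources)
    best = None
    for k in range(1, max(level.values())):  # after the first gate level, before the last
        drivers = crossing_drivers(edges, level, k)
        score = sum(glitchiness.get(u, 0.0) for u in drivers) - penalty * len(drivers)
        if best is None or score > best[0]:
            best = (score, k, sorted(drivers))
    return (best[1], best[2]) if best else None


edges = [("r1", "g1"), ("r2", "g1"), ("r2", "g2"), ("r3", "g2"), ("g1", "g3"),
         ("g2", "g3"), ("g2", "g4"), ("g3", "g5"), ("g4", "g5")]
glitch = {"g1": 0.2, "g2": 0.3, "g3": 0.6, "g4": 0.5}
print(best_cut(edges, {"r1", "r2", "r3"}, glitch, penalty=0.1))  # (2, ['g3', 'g4'])
```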
- the sequential element insertion unit 750 includes a sequential element placement unit 754 .
- the sequential element placement unit 754 places pipelined sequential elements at the cut lines. According to an embodiment of the present invention, in order to preserve functionality, one sequential element is placed where a cut line intersects a path between a source sequential element and a destination sequential element.
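The placement rule can be illustrated on the same kind of graph model, which is an assumption rather than the source's data structure. For every net crossing the cut, one hypothetical pipeline register named `<driver>_pipe` is spliced in, and a net that fans out to several sinks beyond the cut shares a single register. Because gate levels strictly increase along a path, every path that reaches beyond a level-based cut crosses it exactly once; the helper `registers_per_path` checks this by enumerating the paths of a small graph.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (driver, sink)


def insert_pipeline_registers(edges: List[Edge], level: Dict[str, int], k: int):
    """Splice one register into every net that crosses the cut placed after level k."""
    new_edges: List[Edge] = []
    regs: Dict[str, str] = {}  # driver -> inserted register
    for u, v in edges:
        if level[u] <= k < level[v]:
            reg = regs.setdefault(u, f"{u}_pipe")  # fanout branches share one register
            new_edges.append((reg, v))
        else:
            new_edges.append((u, v))
    new_edges.extend((u, reg) for u, reg in regs.items())
    return new_edges, set(regs.values())


def registers_per_path(edges: List[Edge], sources: Set[str], regs: Set[str]) -> Set[int]:
    """Distinct counts of inserted registers seen on source-to-sink paths."""
    succ: Dict[str, List[str]] = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
    counts: Set[int] = set()

    def walk(n: str, hits: int) -> None:
        hits += 1 if n in regs else 0
        if not succ[n]:
            counts.add(hits)
        for m in succ[n]:
            walk(m, hits)

    for s in sources:
        walk(s, 0)
    return counts


level = {"r1": 0, "r2": 0, "r3": 0, "g1": 1, "g2": 1, "g3": 2, "g4": 2, "g5": 3}
edges = [("r1", "g1"), ("r2", "g1"), ("r2", "g2"), ("r3", "g2"), ("g1", "g3"),
         ("g2", "g3"), ("g2", "g4"), ("g3", "g5"), ("g4", "g5")]
new_edges, regs = insert_pipeline_registers(edges, level, k=2)
print(sorted(regs))                                             # ['g3_pipe', 'g4_pipe']
print(registers_per_path(new_edges, {"r1", "r2", "r3"}, regs))  # {1}
```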
- the sequential element insertion unit 750 includes a clocking unit 755 .
- the clocking unit 755 provides appropriate clocking signals to the sequential elements inserted into the system.
- the clocking signals may be phase shifted according to a number of cut lines inserted into the system to allow the functionality of the system to be preserved without increasing the latency of data transmitted through the cloud of combinatorial logic.
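The source states only that the phase of each inserted stage depends on the number of cut lines. One simple policy consistent with that description, assumed here for illustration, spaces N stages evenly within the clock period, clocking stage i at an offset of i/(N+1) of the period; the function names are hypothetical, and a real tool would still have to check each offset against the path delays.

```python
from typing import List


def stage_offsets_ns(num_stages: int, period_ns: float) -> List[float]:
    """Evenly spaced clock offsets (hypothetical policy) for N inserted pipeline stages."""
    if num_stages < 1:
        raise ValueError("at least one inserted stage is required")
    return [period_ns * i / (num_stages + 1) for i in range(1, num_stages + 1)]


def to_degrees(offset_ns: float, period_ns: float) -> float:
    """Express a clock offset as a phase shift in degrees."""
    return 360.0 * offset_ns / period_ns


print(stage_offsets_ns(1, 10.0))  # [5.0]
print(stage_offsets_ns(3, 8.0))   # [2.0, 4.0, 6.0]
print([to_degrees(t, 8.0) for t in stage_offsets_ns(3, 8.0)])  # [90.0, 180.0, 270.0]
```

Because every inserted stage fires within the same period of the original clock, data can still reach the destination sequential elements on the next clock edge, which is how phase shifting avoids adding a cycle of latency.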
- Block 760 represents an assembly unit that performs an assembly procedure that creates a data file that includes the design of the system generated by the system designer 700 .
- the data file may be a bit stream that may be used to program the target device.
- the assembly unit 760 may output the data file so that the data file may be stored or alternatively transmitted to a separate machine used to program the target device. It should be appreciated that the assembly unit 760 may also output the design of the system in other forms, such as on a display device or other medium.
- FIG. 8 is a block diagram of an exemplary computer system 800 in which an example embodiment of the present invention resides.
- the computer system 800 may be used to implement the system designer 700 shown in FIG. 7 .
- the computer system 800 includes a processor 801 that processes data signals.
- the processor 801 is coupled to a CPU bus 810 that transmits data signals between other components in the computer system 800 .
- the computer system 800 includes a memory 813 .
- the memory 813 may be a dynamic random access memory device, a static random access memory device, and/or other memory device.
- the memory 813 may store instructions and code represented by data signals that may be executed by the processor 801 .
- a bridge memory controller 811 is coupled to the CPU bus 810 and the memory 813 .
- the bridge memory controller 811 directs data signals between the processor 801 , the memory 813 , and other components in the computer system 800 and bridges the data signals between the CPU bus 810 , the memory 813 , and a first IO bus 820 .
- the first IO bus 820 may be a single bus or a combination of multiple buses.
- the first IO bus 820 provides communication links between components in the computer system 800 .
- a network controller 821 is coupled to the first IO bus 820 .
- the network controller 821 may link the computer system 800 to a network of computers (not shown) and supports communication among the machines.
- a display device controller 822 is coupled to the first IO bus 820 .
- the display device controller 822 allows coupling of a display device (not shown) to the computer system 800 and acts as an interface between the display device and the computer system 800 .
- a second IO bus 830 may be a single bus or a combination of multiple buses. The second IO bus 830 provides communication links between components in the computer system 800 .
- a data storage device 831 is coupled to the second IO bus 830 .
- the data storage device 831 may be a hard disk drive, a floppy disk drive, a CD-ROM device, a flash memory device or other mass storage device.
- An input interface 832 is coupled to the second IO bus 830 .
- the input interface 832 may be, for example, a keyboard and/or mouse controller or other input interface.
- the input interface 832 may be a dedicated device or can reside in another device such as a bus controller or other controller.
- the input interface 832 allows coupling of an input device to the computer system 800 and transmits data signals from an input device to the computer system 800 .
- a bus bridge 823 couples the first IO bus 820 to the second IO bus 830 .
- the bus bridge 823 operates to buffer and bridge data signals between the first IO bus 820 and the second IO bus 830 . It should be appreciated that computer systems having a different architecture may also be used to implement the computer system 800 .
- a system designer 840 may reside in memory 813 and be executed by the processor 801 .
- the system designer 840 may operate to synthesize a system, place the system on a target device, route the system, insert sequential elements into combinatorial logic in the system to reduce glitches, and assemble data for the system design. The inserted sequential elements are clocked by a clock that is phase shifted relative to the clock driving the source and destination sequential elements that bound the combinatorial logic.
- Embodiments of the present invention may be provided as a computer program product, or software, that may include an article of manufacture on a machine accessible or machine readable medium having instructions.
- the instructions on the machine accessible or machine readable medium may be used to program a computer system or other electronic device.
- the machine-readable medium may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs, and magneto-optical disks or other type of media/machine-readable medium suitable for storing or transmitting electronic instructions.
- the techniques described herein are not limited to any particular software configuration. They may find applicability in any computing or processing environment.
- the terms “machine accessible medium” or “machine readable medium” used herein shall include any medium that is capable of storing, encoding, or transmitting a sequence of instructions for execution by the machine and that cause the machine to perform any one of the methods described herein.
- it is common in the art to speak of software, in one form or another (e.g., program, procedure, process, application, module, unit, logic, and so on), as taking an action or causing a result. Such expressions are merely a shorthand way of stating that the execution of the software by a processing system causes the processor to perform an action to produce a result.
- FIGS. 1 and 4 are flow charts illustrating embodiments of the present invention. Some of the techniques illustrated in these figures may be performed sequentially, in parallel or in an order other than that which is described. It should be appreciated that not all of the techniques described are required to be performed, that additional techniques may be added, and that some of the illustrated techniques may be substituted with other techniques.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Computation (AREA)
- Geometry (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Design And Manufacture Of Integrated Circuits (AREA)
Abstract
Description
MAXDELAY(Support(P), P) + MAXDELAY(P, Affected(P)) < Period(Clk)   [1]
MAXDELAY(Support(P), P) < Phasedelay(Pipereg(P), Clk)   [2]
Phasedelay(Pipereg(P), Clk) + MAXDELAY(Pipereg(P), Affected(P)) < Period(Clk)   [3]
MAXDELAY(Support(P), Pipereg(P)) + MAXDELAY(Pipereg(P), Affected(P)) < Period(Clk)   [4]
Phasedelay(Pipereg(P), Clk) + MAXDELAY(Pipereg(P), Affected(P)) < Period(Clk)   [5]
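These constraints can be exercised numerically. The sketch below ignores register clock-to-output and setup times and assumes that inserting Pipereg(P) at P leaves MAXDELAY(Pipereg(P), Affected(P)) equal to MAXDELAY(P, Affected(P)). Under those assumptions, constraints [2] and [3] bound the phase delay to an open window, and adding [2] and [3] gives back [1], so a non-empty window exists only when the original path already met [1]. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class PathDelays:
    support_to_p_ns: float   # MAXDELAY(Support(P), P)
    p_to_affected_ns: float  # MAXDELAY(P, Affected(P)), assumed equal to MAXDELAY(Pipereg(P), Affected(P))
    period_ns: float         # Period(Clk)


def phase_window(d: PathDelays) -> Tuple[float, float]:
    """Open interval of Phasedelay(Pipereg(P), Clk) values allowed by [2] and [3]."""
    return d.support_to_p_ns, d.period_ns - d.p_to_affected_ns


def check_constraints(d: PathDelays, phase_ns: float) -> Dict[str, bool]:
    """Evaluate constraints [1]-[3] for a candidate phase delay of the pipeline register."""
    return {
        "[1]": d.support_to_p_ns + d.p_to_affected_ns < d.period_ns,
        "[2]": d.support_to_p_ns < phase_ns,
        "[3]": phase_ns + d.p_to_affected_ns < d.period_ns,
    }


d = PathDelays(support_to_p_ns=3.0, p_to_affected_ns=4.0, period_ns=10.0)
print(phase_window(d))            # (3.0, 6.0)
print(check_constraints(d, 4.5))  # {'[1]': True, '[2]': True, '[3]': True}
print(check_constraints(d, 6.5))  # {'[1]': True, '[2]': True, '[3]': False}
```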
Claims (24)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US11/807,437 US7774729B1 (en) | 2006-06-02 | 2007-05-29 | Method and apparatus for reducing dynamic power in a system |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US81077406P | 2006-06-02 | 2006-06-02 | |
| US11/807,437 US7774729B1 (en) | 2006-06-02 | 2007-05-29 | Method and apparatus for reducing dynamic power in a system |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| US7774729B1 (en) | 2010-08-10 |
Family
ID=42536697
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US11/807,437 Expired - Fee Related US7774729B1 (en) | 2006-06-02 | 2007-05-29 | Method and apparatus for reducing dynamic power in a system |
Country Status (1)
| Country | Link |
|---|---|
| US (1) | US7774729B1 (en) |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20120137263A1 (en) * | 2010-11-29 | 2012-05-31 | International Business Machines Corporation | Timing closure in chip design |
| US20180232475A1 (en) * | 2015-02-20 | 2018-08-16 | Altera Corporation | Method and apparatus for performing register retiming in the presence of false path timing analysis exceptions |
| US10339238B2 (en) * | 2015-02-20 | 2019-07-02 | Altera Corporation | Method and apparatus for performing register retiming in the presence of timing analysis exceptions |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20040015803A1 (en) * | 2002-07-18 | 2004-01-22 | Huang Steve C. | Timing based scan chain implementation in an IC design |
| US6957403B2 (en) * | 2001-03-30 | 2005-10-18 | Syntest Technologies, Inc. | Computer-aided design system to automate scan synthesis at register-transfer level |
| US20050268265A1 (en) * | 2004-06-01 | 2005-12-01 | Mentor Graphics Corporation | Metastability effects simulation for a circuit description |
| US7017132B2 (en) * | 2003-11-12 | 2006-03-21 | Taiwan Semiconductor Manufacturing Company | Methodology to optimize hierarchical clock skew by clock delay compensation |
| US7296249B2 (en) * | 2003-10-10 | 2007-11-13 | Thomas Hans Rinderknecht | Using constrained scan cells to test integrated circuits |
2007
- 2007-05-29 US US11/807,437 patent/US7774729B1/en not_active Expired - Fee Related
Patent Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6957403B2 (en) * | 2001-03-30 | 2005-10-18 | Syntest Technologies, Inc. | Computer-aided design system to automate scan synthesis at register-transfer level |
| US20040015803A1 (en) * | 2002-07-18 | 2004-01-22 | Huang Steve C. | Timing based scan chain implementation in an IC design |
| US7127695B2 (en) * | 2002-07-18 | 2006-10-24 | Incentia Design Systems Corp. | Timing based scan chain implementation in an IC design |
| US7296249B2 (en) * | 2003-10-10 | 2007-11-13 | Thomas Hans Rinderknecht | Using constrained scan cells to test integrated circuits |
| US7017132B2 (en) * | 2003-11-12 | 2006-03-21 | Taiwan Semiconductor Manufacturing Company | Methodology to optimize hierarchical clock skew by clock delay compensation |
| US20050268265A1 (en) * | 2004-06-01 | 2005-12-01 | Mentor Graphics Corporation | Metastability effects simulation for a circuit description |
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20120137263A1 (en) * | 2010-11-29 | 2012-05-31 | International Business Machines Corporation | Timing closure in chip design |
| US8769470B2 (en) * | 2010-11-29 | 2014-07-01 | International Business Machines Corporation | Timing closure in chip design |
| US20180232475A1 (en) * | 2015-02-20 | 2018-08-16 | Altera Corporation | Method and apparatus for performing register retiming in the presence of false path timing analysis exceptions |
| US10339238B2 (en) * | 2015-02-20 | 2019-07-02 | Altera Corporation | Method and apparatus for performing register retiming in the presence of timing analysis exceptions |
| US10671781B2 (en) * | 2015-02-20 | 2020-06-02 | Altera Corporation | Method and apparatus for performing register retiming in the presence of false path timing analysis exceptions |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| US8640067B1 (en) | Method and apparatus for implementing a field programmable gate array clock skew | |
| US8291356B2 (en) | Methods and apparatuses for automated circuit design | |
| US8381142B1 (en) | Using a timing exception to postpone retiming | |
| US9026967B1 (en) | Method and apparatus for designing a system on multiple field programmable gate array device types | |
| US8739101B1 (en) | Systems and methods for reducing logic switching noise in parallel pipelined hardware | |
| US8856702B1 (en) | Method and apparatus for performing multiple stage physical synthesis | |
| US9148151B2 (en) | Configurable storage elements | |
| US8701069B1 (en) | Systems and methods for optimizing allocation of hardware resources to control logic in parallel pipelined hardware | |
| US8671371B1 (en) | Systems and methods for configuration of control logic in parallel pipelined hardware | |
| US9275178B1 (en) | Method and apparatus for considering paths influenced by different power supply domains in timing analysis | |
| US8185854B1 (en) | Method and apparatus for performing parallel slack computation within a shared netlist region | |
| US8793629B1 (en) | Method and apparatus for implementing carry chains on FPGA devices | |
| US8954906B1 (en) | Method and apparatus for performing parallel synthesis on a field programmable gate array | |
| US9230047B1 (en) | Method and apparatus for partitioning a synthesis netlist for compile time and quality of results improvement | |
| US8578306B2 (en) | Method and apparatus for performing asynchronous and synchronous reset removal during synthesis | |
| US7774729B1 (en) | Method and apparatus for reducing dynamic power in a system | |
| US8443334B1 (en) | Method and apparatus for generating graphical representations of slack potential for slack paths | |
| US8286109B1 (en) | Method and apparatus for performing incremental delay annotation | |
| Quinton et al. | Asynchronous IC interconnect network design and implementation using a standard ASIC flow | |
| US7308671B1 (en) | Method and apparatus for performing mapping onto field programmable gate arrays utilizing fracturable logic cells | |
| Lung et al. | Clock skew optimization considering complicated power modes | |
| US8930175B1 (en) | Method and apparatus for performing timing analysis that accounts for rise/fall skew | |
| US7725856B1 (en) | Method and apparatus for performing parallel slack computation | |
| US8558599B1 (en) | Method and apparatus for reducing power spikes caused by clock networks | |
| US8468487B1 (en) | Method and apparatus for implementing cross-talk based booster wires in a system on a field programmable gate array |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: ALTERA CORPORATION, CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:NETO, DAVID;REEL/FRAME:019411/0693. Effective date: 20070523 |
| | FPAY | Fee payment | Year of fee payment: 4 |
| | FEPP | Fee payment procedure | Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.) |
| | LAPS | Lapse for failure to pay maintenance fees | Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | STCH | Information on status: patent discontinuation | Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
| | FP | Expired due to failure to pay maintenance fee | Effective date: 20180810 |