US20010032067A1 - Method and system for determining optimal delay allocation to datapath blocks based on area-delay and power-delay curves - Google Patents

Publication number
US20010032067A1
Authority
US
United States
Prior art keywords
delay
candidate binding
feasible
constraints
datapath
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US09/474,008
Other versions
US6327552B2 (en)
Inventor
Mahadevamurty Nemani
Franklin Baez
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intel Corp
Original Assignee
Intel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corp
Priority to US09/474,008
Assigned to INTEL CORPORATION. Assignment of assignors interest (see document for details). Assignors: BAEZ, FRANKLIN; NEMANI, MAHADEVAMURTY
Publication of US20010032067A1
Application granted
Publication of US6327552B2
Anticipated expiration
Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/30 Circuit design
    • G06F30/32 Circuit design at the digital level
    • G06F30/327 Logic synthesis; Behaviour synthesis, e.g. mapping logic, HDL to netlist, high-level language to RTL or netlist
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/06 Power analysis or power optimisation

Definitions

  • the present invention relates to computer systems.
  • the invention relates to circuit design techniques and related computer-aided design (“CAD”) software tools.
  • Register transfer language (“RTL”) to schematic partitioning has also made the power-delay optimization problem more difficult for designers. Without proper knowledge of power-delay tradeoff points at the micro architecture level, circuit designers are forced to upsize entire blocks to meet circuit performance targets. For some designs, however, certain timing can be reallocated to adjacent blocks, and these blocks can then be concurrently downsized and upsized to achieve a lower power design at the same original delay specification. Unfortunately, while some aspects of recalculating reallocated power designs and delays between blocks have been automated, existing systems still require the designers to manually reallocate the power designs and delays using alternate implementations of the blocks within the design. As the number of blocks and the number of possible implementations for each block both increase, so does the difficulty of manually redesigning and reallocating the power designs and delays.
  • Embodiments of the present invention provide a method, system and computer program product for automatically determining optimal design parameters of a subsystem to meet design constraints.
  • the subsystem comprises a plurality of circuits.
  • the optimal design parameters are determined by performing a parameter-delay curve optimization of the subsystem design parameters.
  • FIG. 1A is a diagram illustrating an engineering design cycle in accordance with the teachings of the invention.
  • FIG. 1B is a diagram illustrating a computer system in which one embodiment of the present invention may be utilized.
  • FIG. 2 is a diagram illustrating a design optimization phase according to one embodiment of the invention.
  • FIG. 3 is a diagram illustrating power-delay curves according to one embodiment of the invention.
  • FIG. 4 is a diagram illustrating a macrograph of datapath macros representing a circuit design for use according to one embodiment of the invention.
  • FIG. 5 is a diagram illustrating a piece-wise approximation of an area-delay trade-off curve for use according to one embodiment of the invention.
  • FIG. 6 is a flow diagram illustrating a method for performing an area-delay curve based determination of optimal design parameter values according to one embodiment of the invention.
  • FIG. 7 is a diagram illustrating an example of an arithmetic logic unit datapath functional block according to one embodiment of the invention.
  • FIG. 8A is a diagram illustrating a power-delay curve for the input multiplexer shown in FIG. 7 according to one embodiment of the invention.
  • FIG. 8B is a diagram illustrating a power-delay curve for the comparator shown in FIG. 7 according to one embodiment of the invention.
  • FIG. 8C is a diagram illustrating a power-delay curve for the static adder shown in FIG. 7 according to one embodiment of the invention.
  • FIG. 8D is a diagram illustrating a power-delay curve for the output multiplexer shown in FIG. 7 according to one embodiment of the invention.
  • FIG. 9 is a diagram illustrating a comparison of the power-delay curves for the three different implementations of an example circuit according to one embodiment of the invention.
  • Embodiments of the present invention provide a method and computer program product for determining optimal values for the design parameters of a circuit block, which result in optimally assigned delay targets for datapath blocks at the minimum power/area point.
  • the problem/solution space is extended to solve the problem of figuring out the best possible implementation (for example, static vs. domino) for each datapath block.
  • Parameter functions relating the design parameters for circuits in the circuit block are created. Based on these parameter functions, the design parameters are optimized to satisfy the design constraints.
  • the design parameters include power and delay and the parameter functions are power-delay curves.
  • the power-delay curves are generated using a timing simulator, a power estimator, and transistor sizing tools.
  • the design parameters include area and delay and the parameter functions are area-delay curves.
  • Embodiments of the present invention provide a technique to help designers automatically perform trade-off analyses to optimize the design within the specified design constraints.
  • In a circuit design, the designer, usually a design engineer, is typically faced with a number of design parameters and design constraints.
  • the design constraints are usually dictated by the system requirements and specifications. Examples of the design constraints include propagation delay, power consumption, packaging, number of input/output (“I/O”) lines, etc.
  • the design constraints are typically imposed on one or more design parameters, while leaving other parameters to be optimized to achieve high performance.
  • the design parameters therefore, are divided into two parameter sets: a constraint set and an optimizing set.
  • the “constraint set” includes constraint parameters which are the parameters that have to meet the design constraints.
  • the “optimizing set” includes the optimizing parameters which are the parameters that need to be optimized.
  • a constraint parameter is the propagation delay and an optimizing parameter is the power consumption.
  • the propagation delay is the optimizing parameter and the power consumption is the constraint parameter.
  • a “parameter function” describes the variation of one parameter as a function of another parameter.
  • a parameter function may describe the variation of the power consumption as a function of the delay.
  • the variation of one parameter as a function of another is typically caused by a configuration of the circuit such as the size of the transistors, the choice of circuit technology (for example, domino versus static), etc.
  • a configuration of the circuit that gives rise to the particular values of the design parameters corresponds to a design point.
  • a system, a subsystem, a module or a functional block may consist of a number of circuits. Each circuit is characterized by a parameter function. Optimizing the design of a subsystem or functional block involves a trade-off consideration of all the parameter functions of all the individual circuits of the subsystem or functional block. For a parameter function of a given circuit, there are many design points corresponding to different circuit configurations. Therefore, optimizing a subsystem or functional block involves the selection of the design points on the parameter functions that provide the optimal values of the optimizing parameters and acceptable values of the constraint parameters.
  • the present invention provides a technique to automatically determine an optimal design based on the parameter functions using linear programming techniques.
  • FIG. 1A is a diagram illustrating an example of an engineering design cycle in accordance with the teachings of the invention.
  • the engineering design cycle 100 includes a first logic synthesis phase 110 , a circuit design phase 120 , a design optimization phase 130 , and a second logic synthesis phase 140 .
  • the first logic synthesis phase 110 provides the high level logic description and/or design of the circuits.
  • the designer synthesizes the circuits manually or using a number of tools including Computer-Aided Design (“CAD”) tools.
  • CAD tools include hardware description language (“HDL”) compilers and schematic entry tools.
  • the circuit design phase 120 receives the generated logic synthesis files to generate the synthesized circuits.
  • the synthesized circuits may be represented by circuit schematics, a netlist of the circuits, or any other convenient form that can be further processed by additional CAD tools.
  • the circuit design phase 120 represents an unoptimized complete design that shows subsystems or functional blocks at the detailed implementation level for the synthesized circuits.
  • the design optimization phase 130 determines the optimal values for the design parameters to meet the design constraints.
  • the design engineer uses a design workstation or a computer system 132 .
  • the computer system 132 is supported by a design environment which includes the operating system and many CAD tools, such as a timing analyzer, a power estimator, and a transistor sizing tool, to adjust the design parameters according to the allowable design budgets.
  • the design optimization phase 130 typically produces a number of parameter functions that relate the design parameters for the circuits.
  • An example of such a parameter function is a power-delay curve 135 .
  • the power-delay curve 135 shows the relationship between the power consumption and the propagation delay for a particular circuit in a functional block.
  • the power-delay curve 135 has a number of design points corresponding to different implementations or configurations of the circuit under consideration.
  • the power-delay curve 135 provides the design engineer the basic information to optimize his or her circuit under the specified design constraints.
  • the exemplary power-delay curve 135 has three design points A, B, and C.
  • the design point A corresponds to a circuit implementation that has high power consumption and fast speed, representing an undesirable implementation because of excessive power consumption.
  • the design point B corresponds to the optimal power consumption and optimal speed, also representing the best circuit implementation.
  • the design point C corresponds to low power consumption and acceptable speed, representing a desirable implementation. If the circuit implementation is at the design point A, the design engineer will have the option to go back to the first logic synthesis phase 110 or the circuit design phase 120 . If the circuit implementation is at the design point C, the design engineer will go to the second logic synthesis phase 140 .
  • the second logic synthesis phase 140 is essentially the same as the first logic synthesis phase 110 with the exception that the design engineer now focuses more on giving the extra design margin to other circuits in the subsystem or functional block.
  • the low power consumption at the design point C provides more margin to the power budget for other circuits.
  • the design engineer modifies the circuit synthesis based on the extra margin, such as repartitioning, floor-plan editing, sizing, etc.
  • FIG. 1B is a diagram illustrating one embodiment of a computer system 132 in which one embodiment of the present invention may be utilized.
  • the computer system 132 comprises a processor 150, a host bus 155, a peripheral bridge 160, a storage device 165, an advanced graphics processor 175, a video monitor 177, and a peripheral bus 180.
  • the processor 150 represents a central processing unit of any type of architecture, such as complex instruction set computers (“CISC”), reduced instruction set computers (“RISC”), very long instruction word (“VLIW”), or hybrid architecture.
  • the processor 150 is coupled to the peripheral bridge 160 via the host bus 155 . While this embodiment is described in relation to a single processor computer system, the invention can be implemented in a multi-processor computer system.
  • the peripheral bridge 160 provides an interface between the host bus 155 and a peripheral bus 180.
  • the peripheral bus 180 is the Peripheral Components Interconnect (“PCI”) bus.
  • the peripheral bridge 160 also provides the graphic port, for example, Accelerated Graphics Port (“AGP”), or the graphics bus 172 for connecting to a graphics controller or advanced graphics processor 175 .
  • the advanced graphics processor 175 is coupled to a video monitor 177 .
  • the video monitor 177 displays graphics and images rendered or processed by the graphics controller or advanced graphics processor 175.
  • the peripheral bridge 160 also provides an interface to the storage device 165 .
  • the storage device 165 represents one or more mechanisms for storing data.
  • the storage device 165 may include non-volatile or volatile memories. Examples of these memories include flash memory, read only memory (“ROM”), or random access memory (“RAM”).
  • FIG. 1B also illustrates that the storage device 165 has stored therein data 167 and program code 166 .
  • the data 167 stores graphics data and temporary data.
  • Program code 166 represents the necessary code for performing any and/or all of the techniques in the present invention.
  • the storage device 165 preferably contains additional software (not shown), which is not necessary to understanding the invention.
  • the peripheral bus 180 represents a bus that allows the processor 150 to communicate with a number of peripheral devices.
  • the peripheral bus 180 provides an interface to a peripheral-to-expansion bridge 185 , peripheral devices 190 1 to 190 N , a mass storage controller 192 , a mass storage device 193 , and mass storage media 194 .
  • the peripheral devices 190 1 to 190 N represent any device that is interfaced to the peripheral bus 180 . Examples of peripheral devices are fax/modem controller, audio card, network controller, etc.
  • the mass storage controller 192 provides control functions to the mass storage device 193 .
  • the mass storage device 193 is any device that stores information in a non-volatile manner. Examples of the mass storage device 193 include a hard disk, a floppy disk, and a compact disk (“CD”) drive.
  • the mass storage device 193 receives the mass storage media 194 and reads their contents to configure the design environment for the design engineer.
  • the mass storage media 194 contain programs or software packages used in the environment.
  • the mass storage media 194 represent a computer program product having program code or code segments that are readable by the processor 150 .
  • a program code or a code segment includes a program, a routine, a function, a subroutine, or a software module that is written in any computer language (for example, high level language, assembly language, machine language) that can be read, processed, compiled, assembled, edited, downloaded, transferred, or executed by the processor 150 .
  • the mass storage media 194 include any convenient media such as floppy diskettes, compact disk read only memory (“CD-ROM”), digital audio tape (“DAT”), optical laser disc, or communication media (e.g., Internet, radio frequency link, fiber optics link).
  • FIG. 1B shows floppy diskettes 195 and CD-ROM 196 .
  • the floppy diskettes 195 and/or CD-ROM 196 contain design environment 198 .
  • Examples of the tools or computer readable program code in the design environment 198 include operating system, computer aided design (“CAD”) tools such as schematic capture, hardware description language (“HDL”) compiler, text editors, netlist generator, timing analyzer, power vector generator, timing simulator, power simulator, circuit configuration, component sizer, parameter function generator, parameter optimizer, and graphics design environment.
  • the peripheral-to-expansion bridge 185 represents an interface device between the peripheral bus 180 and an expansion bus 187.
  • the expansion bus 187 represents a bus that interfaces to a number of expansion devices 188 1 to 188 K .
  • Examples of an expansion device include a parallel input/output (“I/O”) device and a serial communication interface device.
  • the expansion bus 187 is an Industry Standard Architecture (“ISA”) or Extended Industry Standard Architecture (“EISA”) bus.
  • the computer system 132 can be used in all or part of the phases of the design process.
  • the processor 150 executes instructions in the program 166 to access data 167 and interact with the design environment 198 .
  • the computer system 132 is used in the design optimization phase 130 .
  • FIG. 2 is a diagram illustrating a design optimization phase according to one embodiment of the invention.
  • the design optimization phase 130 includes a netlist generation module 210 , a critical path generation module 223 , a power vector generation module 227 , a delay calculation module 233 , a power calculation module 237 , a circuit configuration module 240 , a parameter function generation module 250 , and an optimization module 260 .
  • Each of these modules may be a software module, a hardware module, or a combination of both. In one embodiment, these modules are implemented by program code that is readable and executed by the processor 150.
  • the netlist generation module 210 generates the circuit netlist which provides the information on component identification and how the components of the circuit are interconnected.
  • the circuit netlist becomes the input to the critical path generation module 223 and the power vector generation module 227 .
  • the critical path generation module 223 generates timing delays of various paths in the circuit based on circuit components and interconnection patterns. From these timing delays, the critical path(s) is (are) identified.
  • the critical path represents the path through which the overall propagation delay is the most critical, e.g., timing parameters (e.g., setup time, hold time) are difficult to satisfy.
  • the timing files generated by the critical path generation module 223 become the input to the delay calculation module 233 .
  • the delay calculation module 233 calculates the delays of the critical paths and other paths using a timing simulator.
  • the timing simulator is the PathMill tool, developed by Epic Technologies, now owned by Synopsys, of Mountain View, Calif.
  • the timing values are then forwarded to the circuit configuration module 240 .
  • the power vector generation module 227 generates power vectors as input to the power calculation module 237 .
  • the power calculation module 237 calculates the power consumption of the circuit using a power estimator tool.
  • the power estimator tool is the PowerMill tool, developed by Epic Technologies of Mountain View, Calif. The power values are then forwarded to the circuit configuration module 240 .
  • the circuit configuration module 240 configures the circuit to effectuate the power consumption and delay.
  • One configuration is scaling the sizes (e.g., transistor size) of the circuit components using a sizing tool.
  • the sizing tool is Amps developed by Epic Technologies of Mountain View, Calif. The sizing tool applies scale factors to scale down the circuit elements either globally or locally. The resulting circuit is then simulated again for the next delay and power values.
  • the circuit configuration module 240 generates new circuit information to be fed back to the delay calculation module 233 and the power calculation module 237 . The process continues until all the values within the range of the scaling have been used. Then the delay and power values are forwarded to the parameter function generation module 250 .
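The feedback loop between the configuration, delay calculation, and power calculation modules can be sketched as a simple sweep over scale factors. This is a minimal illustration only: `sweep_power_delay`, `toy_delay`, and `toy_power` are hypothetical names, and the two toy models are invented stand-ins for a real timing simulator and power estimator, not the behavior of any tool named above.

```python
from typing import Callable, List, Tuple


def sweep_power_delay(
    scale_factors: List[float],
    simulate_delay: Callable[[float], float],
    estimate_power: Callable[[float], float],
) -> List[Tuple[float, float]]:
    """Return one (delay, power) design point per scale factor, sorted by delay."""
    points = []
    for scale in scale_factors:
        # Each iteration mimics one pass: resize, re-simulate delay, re-estimate power.
        points.append((simulate_delay(scale), estimate_power(scale)))
    return sorted(points)


# Hypothetical toy models (assumptions): smaller devices are slower but use less power.
def toy_delay(scale: float) -> float:
    return 1.0 + 0.5 / scale


def toy_power(scale: float) -> float:
    return 4.0 * scale


curve = sweep_power_delay([1.0, 0.8, 0.5], toy_delay, toy_power)
for delay, power in curve:
    print(f"delay={delay:.3f} power={power:.3f}")
```

The resulting list of points is the raw material from which a parameter function such as a power-delay curve would be drawn.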
  • the parameter function generation module 250 generates the parameter function (e.g., power-delay curves) showing the relationship between the design parameters.
  • the parameter function generation module 250 may also generate the design parameters in any other convenient forms for later processing.
  • the optimization module 260 receives the values of the design parameters either in the form of a parameter curve, or in any other convenient format. The optimization module 260 determines the optimal values of the design parameters.
  • FIG. 3 is a diagram illustrating a power-delay curve according to one embodiment of the invention.
  • the power-delay curves show two curves: a domino curve 310 and a static curve 320 .
  • the power-delay curves in FIG. 3 show the parameter function for an arithmetic circuit.
  • the arithmetic circuit can be designed using a domino circuit technology or a static circuit technology.
  • the domino curve 310 is the power-delay curve for the circuit using the domino circuit technology and the static curve 320 is the power-delay curve for the circuit using the static circuit technology.
  • the domino curve 310 has two design points A and B.
  • the design point A corresponds to the current domino design. At this design point, the circuit has a delay of approximately 1.35 nsec and a power consumption of approximately 14 mA.
  • the design point B corresponds to another domino design with longer delay at approximately 1.62 nsec and a power consumption of approximately 6.1 mA. Therefore the saving in power to go from design point A to design point B is 53% for a delay penalty of 23%.
  • the static curve 320 has a design point C.
  • the static curve 320 has a delay limit at approximately 1.42 nsec.
  • the design point C is at a delay of approximately 1.62 nsec and a power consumption of approximately 4.5 mA. Therefore, the design point C has approximately the same delay as the design point B of the domino curve 310 but has an additional power saving of 16%.
  • the parameter curve therefore provides the design engineer with an immediate visualization of the relationship between the design parameters, e.g., power and delay, so that optimization can be carried out.
  • In FIG. 4, a macro graph of datapath macros representing a circuit design for use according to one embodiment of the invention is illustrated.
  • M1 410, M2 420, M3 430, and M4 440 are datapath macros for which area-delay trade-off curves, such as that shown in FIG. 5, for their different implementations are available. Therefore, when a designer wishes to meet a specified delay target from data A 450 to output O 470 and from data B 460 to output O 470, the designer needs to answer the following questions:
  • determining the optimal power solution is performed using the same method used to optimize for area. The only difference is that in order to optimize for power, the power-delay curves are used instead of the area-delay curves for each macro block. Also, the objective function when optimizing for power is the sum of the powers dissipated by the macros.
  • An embodiment of the present invention assumes that there are m macros in the macro graph to be optimized. For example, in FIG. 4, there are 4 macros in the macro graph. Also, in this embodiment specific implementations for each of the macros are assumed. Given this information, the delay assignment for each of these implementations can be calculated so as to meet the delay constraints. As stated earlier, in this embodiment of the present invention, the area-delay trade-off curve for each implementation of all the macros is known a priori. These curves can be generated very efficiently using external CAD vendor tools like AMPS or more advanced internal Intel proprietary tools that employ the methods shown in FIG. 2.
  • this embodiment of the present invention begins by forming a piecewise linear approximation of each of the area-delay trade-off curves.
  • These piecewise approximations can be made arbitrarily accurate by increasing the number of linear pieces. While an exemplary piecewise linear approximation of an area-delay curve 510 is shown in FIG. 5 with three separate piecewise approximation sections 520a, 520b, and 520c, respectively, the number of approximation sections can easily be increased to four or more for more accurate approximations. Similarly, the number of approximation sections can be decreased to two or one with an attendant decrease in the accuracy of the approximation.
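A piecewise linear approximation of this kind can be sketched as follows. This is an illustrative sketch under assumptions: the sample points are invented, the names `fit_segments` and `approx_area` are hypothetical, and the curve is assumed to be convex and decreasing so that the approximation equals the maximum over all linear pieces.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (slope, intercept) of one linear piece


def fit_segments(points: List[Tuple[float, float]]) -> List[Segment]:
    """Build one line through each pair of adjacent (delay, area) samples."""
    pts = sorted(points)
    segments = []
    for (d0, a0), (d1, a1) in zip(pts, pts[1:]):
        slope = (a1 - a0) / (d1 - d0)
        segments.append((slope, a0 - slope * d0))
    return segments


def approx_area(segments: List[Segment], delay: float) -> float:
    """For a convex curve, the approximation is the upper envelope of the pieces."""
    return max(slope * delay + intercept for slope, intercept in segments)


# Invented samples (assumption): four points give three sections, as in FIG. 5.
samples = [(1.0, 8.0), (2.0, 4.0), (3.0, 2.0), (4.0, 1.0)]
segments = fit_segments(samples)
print(len(segments), approx_area(segments, 2.5))
```

Adding more sample points yields more sections and a tighter fit, at the cost of more linear constraints per macro.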
  • where A_i and D_i are the area and delay variables, respectively, associated with the implementation of macro ‘i’, and each piecewise linear approximation is normalized.
  • Since the area-delay curve represents a Pareto-optimal curve, it follows that the piece-wise linear approximation of the area-delay curve generates a convex set of all feasible realizations of the implementation.
  • When optimizing for power, the above equations still apply, with the area variable A_i replaced by a power variable C_i.
  • Let the different paths (p_j) through the macro graph be contained in the set P.
  • For example, the set P for the macro graph in FIG. 4 contains two paths: the first from data ‘A’ 450 to output ‘O’ 470 and the second from data ‘B’ 460 to output ‘O’ 470.
  • Assume the set P contains N paths. Then, for each p ∈ P, the delay constraint on it can be written as,
  • where m is equal to the number of macros and N is equal to the number of paths.
  • D_i,min and D_i,max are the minimum and maximum possible delays associated with the implementation of macro ‘i’.
  • Since OPT 1 is a linear programming problem, it can be solved efficiently using tools like COPL_LP (a linear programming solver from the University of Iowa) or commercial tools like MATLAB.
  • MATLAB is developed by The MathWorks, Incorporated of Natick, Mass. It is important to note that if the problem is infeasible, the implementations chosen for the macros cannot meet the delay constraints. However, if the problem is feasible, solving it yields the optimal delays to allocate to the various macros in the graph, leading to a minimum area solution.
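A linear program with the structure described for OPT 1 might be written as below. This is a sketch under assumptions, not the patent's own equations: the segment coefficients \alpha_{ik} and \beta_{ik}, the segment count S_i, and the path delay targets T_j are hypothetical symbols introduced here, and the normalization of the linear pieces is omitted.

```latex
\begin{aligned}
\text{minimize}\quad   & \sum_{i=1}^{m} A_i \\
\text{subject to}\quad & A_i \ge \alpha_{ik}\, D_i + \beta_{ik},
                         && k = 1,\dots,S_i,\quad i = 1,\dots,m \\
                       & \sum_{i \in p_j} D_i \le T_j,
                         && j = 1,\dots,N \\
                       & D_{i,\min} \le D_i \le D_{i,\max},
                         && i = 1,\dots,m
\end{aligned}
```

For power optimization the same structure would apply with A_i replaced by C_i and the coefficients taken from the power-delay curves.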
  • a check for the feasibility of the optimization problem is performed by determining whether the minimum-delay values of the implementations satisfy the generated delay constraints. For example, in FIG. 4, the minimum delays of the implementations of macros 1 through 4 are checked to determine if they satisfy the delay constraints on the paths from data ‘A’ 450 to output ‘O’ 470 and from data ‘B’ 460 to output ‘O’ 470. If the implementations of macros 1 through 4 are not feasible, then the current implementations chosen for the macros cannot meet the designer specified delay constraints. If the implementations of macros 1 through 4 are feasible, then the optimization problem OPT 1 can be solved.
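The feasibility check described above can be sketched in a few lines. The path membership, the minimum delays (in picoseconds), and the targets below are all invented for illustration, and `is_feasible` is a hypothetical helper name; the source does not give the actual topology of FIG. 4.

```python
from typing import Dict, List


def is_feasible(
    paths: Dict[str, List[str]],
    min_delay_ps: Dict[str, int],
    target_ps: Dict[str, int],
) -> bool:
    """True if every path meets its target when each macro runs at minimum delay."""
    return all(
        sum(min_delay_ps[macro] for macro in macros) <= target_ps[name]
        for name, macros in paths.items()
    )


# Assumed topology: both paths share M3 and M4.
paths = {"A->O": ["M1", "M3", "M4"], "B->O": ["M2", "M3", "M4"]}
min_delay_ps = {"M1": 300, "M2": 400, "M3": 500, "M4": 200}

relaxed = is_feasible(paths, min_delay_ps, {"A->O": 1200, "B->O": 1200})
tight = is_feasible(paths, min_delay_ps, {"A->O": 1000, "B->O": 1000})
print(relaxed, tight)
```

If even the minimum delays violate a path target, no delay allocation can satisfy it, so the linear program need not be built for that set of implementations.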
  • the present invention solves the problem of finding the optimal delay assignment to the macros, when an implementation has already been chosen for the macros by the designer. However, this will not always produce the “best”, that is the most optimal, solution. Therefore, in another embodiment, the present invention determines the “best” implementation for the macros under designer specified delay constraints.
  • An embodiment of the present invention provides an approach to simultaneously search for the optimal solution among all the possible binding solutions. This is achieved as follows. Let A_ik and D_ik be the area and delay associated with implementation ‘k’ of macro ‘i’. Also define a new variable called A to measure the optimal area of the solution. Then it follows that:
  • where D_ik,min and D_ik,max are the minimum and maximum possible delays associated with implementation ‘k’ of macro ‘i’.
  • a pre-requisite for this optimization is a database 680 of area-delay curves for the macros under consideration and their corresponding implementations.
  • the system takes as input a macro graph, which has been previously generated from the RTL description of the design. From this macro graph, the system automatically generates all of the possible paths in the graph. The designer specifies the delay constraints on the paths.
  • Based on the implementations for the macros in the database, the system generates candidate binding solutions. For each binding solution, the system checks for infeasibility. If the solution is infeasible, the system moves on to the next candidate binding. Otherwise, the system generates the constraints for the linear program. This procedure is repeated until all binding solutions have been exhausted. Finally, the system solves the linear program to produce the optimal solution, which corresponds to the optimal binding along with the optimal delay allocation.
  • In block 610, the RTL description of a circuit Functional Unit Block (“FUB”) is translated into a graph describing the connectivity of the macros (adders, multiplexers, etc.) in the FUB and transmitted to block 620.
  • Block 610 can either be done automatically (using an internally developed CAD tool like REAL) or manually by the designer.
  • REAL is a proprietary Intel® Corporation CAD tool
  • the graph provides information regarding how data and control flow through the FUB.
  • the graph also forms the basis for extracting the various paths through which information can flow in the design.
  • the system automatically extracts the various paths through which information can move in the design to ensure that the timing constraints, which determine design performance, are met on all of these paths for a successful design.
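Path extraction from a macro graph can be sketched as a depth-first enumeration over a directed acyclic graph. The adjacency list below is an assumed topology chosen only to produce two paths like those described for FIG. 4, and `extract_paths` is a hypothetical name.

```python
from typing import Dict, List


def extract_paths(
    graph: Dict[str, List[str]], sources: List[str], sink: str
) -> List[List[str]]:
    """Enumerate every source-to-sink path in an acyclic macro graph."""
    paths: List[List[str]] = []

    def walk(node: str, trail: List[str]) -> None:
        trail = trail + [node]  # copy so sibling branches do not share state
        if node == sink:
            paths.append(trail)
            return
        for successor in graph.get(node, []):
            walk(successor, trail)

    for source in sources:
        walk(source, [])
    return paths


# Assumed connectivity: inputs A and B converge on M3, then M4, then output O.
graph = {
    "A": ["M1"], "B": ["M2"],
    "M1": ["M3"], "M2": ["M3"],
    "M3": ["M4"], "M4": ["O"],
}
paths = extract_paths(graph, ["A", "B"], "O")
print(paths)
```

Each extracted path then receives one delay constraint, so the number of constraints grows with the number of paths in the graph.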
  • Since each macro in the design can potentially be implemented in several ways, an association of implementations with macros is referred to as a candidate binding solution.
  • a candidate solution is generated by choosing an implementation for each macro block in the FUB and then applying the subsequent steps in the flow.
  • Each candidate solution is then transmitted to block 640 to determine if that implementation meets the timing constraints on the extracted paths found in block 620 .
  • the area-delay curves for all of the possible implementations of the macros are read from the database 680 and used to determine whether the candidate solution is feasible.
  • If the candidate solution is infeasible, it is discarded and the system returns to block 630 to generate another candidate solution. If, in block 640, the candidate solution is found to be feasible, then, in block 650, the system generates constraints for the linear program for that candidate solution, using the area-delay curves associated with the chosen implementations in database 680. These constraints are shown in OPT 2.
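  • The generate-and-check loop of blocks 630 and 640 can be sketched as follows. This is only an illustration under stated assumptions: the library contents, the delay ranges, and the feasibility rule (a binding is rejected when some path misses its limit even with every macro at its fastest delay) are hypothetical stand-ins, not the feasibility test of the described system.

```python
from itertools import product

# Hypothetical library: macro -> implementation -> (min_delay_ps, max_delay_ps).
library = {
    "mux_in": {"static": (200, 400)},
    "adder": {"static": (1400, 2000), "domino": (1000, 1500)},
    "mux_out": {"static": (300, 600)},
}
paths = [["mux_in", "adder", "mux_out"]]  # paths extracted from the macro graph
path_limit_ps = {0: 1800}                 # designer's delay constraint per path


def candidate_bindings(lib):
    """Yield every macro -> implementation assignment (one per candidate binding)."""
    macros = sorted(lib)
    for choice in product(*(sorted(lib[m]) for m in macros)):
        yield dict(zip(macros, choice))


def is_feasible(binding):
    """Assumed rule: infeasible if a path exceeds its limit even at the
    fastest delay of every macro on that path."""
    for idx, path in enumerate(paths):
        fastest = sum(library[m][binding[m]][0] for m in path)
        if fastest > path_limit_ps[idx]:
            return False
    return True


feasible = [b for b in candidate_bindings(library) if is_feasible(b)]
```

  • With these illustrative numbers, the all-static binding needs at least 1900 ps and is discarded, while the binding that uses the domino adder needs at least 1500 ps and would go on to constraint generation.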
  • the area-delay curves in the library of database 680 can be generated by using either a commercial CAD tool or internally developed Intel CAD tools.
  • LP complete linear program
  • the system is configured and used to optimize power for the macro graph of FIG. 4, as described above for the area optimization.
  • FIG. 7 is a diagram illustrating an example of an arithmetic logic unit (“ALU”) datapath subsystem or FUB according to one embodiment of the invention.
  • the ALU datapath FUB 700 includes an input multiplexer (“MUX”) 710 , a comparator 720 , a static adder 730 , and an output MUX 740 .
  • the ALU datapath FUB 700 is a common design used in the processor 150 or the graphic processor 175 in FIG. 1B
  • the design parameters include power and delay and the parameter function is the power-delay curve.
  • the constraint parameter is the propagation delay through the ALU FUB 700 and the optimizing parameter is the power. The optimization is to minimize the overall power consumption while keeping the propagation delay within the specified design constraint.
  • the input MUX 710 , the comparator 720 , the static adder 730 and the output MUX 740 form a cascaded chain of circuit elements which has a critical path going from one end to the other end.
  • the composite delay is the sum of the individual delays through each of the circuit elements.
  • these circuit elements are active, i.e., the power consumption of the ALU FUB 700 is the sum of the individual power consumptions.
  • the delay requirement from input to output was specified as 4350 picoseconds (“ps”).
  • the power-delay curves were linearized, that is, “approximated,” similar to FIG. 5, with 6 linear pieces for the input MUX 710, comparator 720 and output MUX 740, and 5 linear pieces for the static adder 730.
  • All of the generated constraints were fed to a linear program solver, in this case COPL_LP.
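  • One common way to write such a problem as a linear program is sketched below. It is an assumption-laden illustration, not a reproduction of OPT 2: it assumes each linearized power-delay curve is convex, so that the curve is the upper envelope of its linear pieces. The symbols $d_i$ (delay allocated to block $i$), $P_i$ (power of block $i$), $a_{ik}$ and $b_{ik}$ (slope and intercept of piece $k$ of block $i$), $T_\pi$ (delay limit on path $\pi$), and the delay bounds $d_i^{\min}$, $d_i^{\max}$ are all introduced here for illustration.

```latex
\begin{aligned}
\min_{d,\,P}\quad & \sum_{i} P_i \\
\text{s.t.}\quad  & P_i \ge a_{ik}\, d_i + b_{ik} && \text{for every block } i \text{ and piece } k, \\
                  & \sum_{i \in \pi} d_i \le T_\pi && \text{for every extracted path } \pi, \\
                  & d_i^{\min} \le d_i \le d_i^{\max} && \text{for every block } i.
\end{aligned}
```

  • Under the convexity assumption, minimizing the sum of the $P_i$ pushes each $P_i$ down onto its piecewise-linear curve, so the solver returns a delay allocation together with the corresponding power of each block.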
  • the final solution generated by the solver was 19.2% smaller in power than a manually determined solution by a designer.
  • the solution generated using the embodiment of the present invention was also 9.8% smaller than the solution generated by hand optimization of the power-delay curves.
  • FIG. 8A is a diagram illustrating a power-delay curve 810 A for the input multiplexer shown in FIG. 7 according to one embodiment of the invention.
  • the power-delay curve 810 A has two design points, A and B.
  • the design point A has a delay value of 0.25 nsec and a power value of 3.2 mA.
  • the design point B has a delay value of 0.29 nsec and a power value of 1.79 mA.
  • A and B are the initial and new design points, respectively.
  • the arrow shows the move from design point A to design point B during the design optimization phase 130 .
  • FIG. 8B is a diagram illustrating a power-delay curve 810 B for the comparator shown in FIG. 7 according to one embodiment of the invention.
  • the power-delay curve 810 B has two design points, C and D.
  • the design point C has a delay value of 1.12 nsec and a power value of 1.0 mA.
  • the design point D has a delay value of 1.06 nsec and a power value of 1.04 mA.
  • C and D are the initial and new design points, respectively.
  • the arrow shows the move from design point C to design point D during the design optimization phase 130 .
  • FIG. 8C is a diagram illustrating a power-delay curve 810 C for the static adder shown in FIG. 7 according to one embodiment of the invention.
  • the power-delay curve 810 C has two design points, E and F.
  • the design point E has a delay value of 1.23 nsec and a power value of 10.0 mA.
  • the design point F has a delay value of 1.36 nsec and a power value of 5.92 mA.
  • E and F are the initial and new design points, respectively.
  • the arrow shows the move from design point E to design point F during the design optimization phase 130 .
  • FIG. 8D is a diagram illustrating a power-delay curve 810 D for the output multiplexer shown in FIG. 7 according to one embodiment of the invention.
  • the power-delay curve 810 D has two design points, G and H.
  • the design point G has a delay value of 1.75 nsec and a power value of 4.0 mA.
  • the design point H has a delay value of 1.64 nsec and a power value of 5.93 mA.
  • G and H are the initial and new design points, respectively.
  • the arrow shows the move from design point G to design point H during the design optimization phase 130 .
  • the power-delay curves in FIGS. 8A, 8B, 8C, and 8D illustrate the optimization process by varying the variable design parameter and selecting the best overall values.
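  • The design points quoted for FIGS. 8A through 8D can be checked with a few lines of arithmetic. The sketch below uses only the delay and power values stated above; the variable names and the totals are illustrative calculations added here, not figures taken from the source.

```python
# Design points read from FIGS. 8A-8D: (delay in nsec, power in mA).
initial = {
    "input MUX": (0.25, 3.2),      # point A
    "comparator": (1.12, 1.0),     # point C
    "static adder": (1.23, 10.0),  # point E
    "output MUX": (1.75, 4.0),     # point G
}
optimized = {
    "input MUX": (0.29, 1.79),     # point B
    "comparator": (1.06, 1.04),    # point D
    "static adder": (1.36, 5.92),  # point F
    "output MUX": (1.64, 5.93),    # point H
}


def totals(points):
    """Sum delay and power along the cascaded chain of blocks."""
    delay = sum(d for d, _ in points.values())
    power = sum(p for _, p in points.values())
    return delay, power


delay_before, power_before = totals(initial)
delay_after, power_after = totals(optimized)
reduction_pct = 100.0 * (power_before - power_after) / power_before
```

  • Both sets of points sum to 4.35 nsec, which matches the 4350 ps delay requirement, while the total power falls from 18.2 mA to 14.68 mA. That is a reduction of roughly 19.3%, close to the 19.2% reported above; the small difference presumably reflects rounding of the values read from the figures.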
  • the variable design parameter is common to all the curves.
  • the variable design parameter is the transistor size, or the power of the block.
  • the optimization process can be applied for different circuit configurations.
  • a circuit block can be designed using a static circuit technology or a domino circuit technology as illustrated in FIG. 3.
  • a circuit block may be designed using a multiplexer or a decoder.
  • the optimization process can be carried out based on the parameter function, for example, power-delay curve.
  • FIG. 9 is a diagram illustrating a comparison of the power-delay curves for three different implementations of an example circuit according to one embodiment of the invention.
  • the power-delay curves 910 , 920 , and 930 correspond to the initial, better, and worse designs, respectively.
  • the power-delay curve 910 has high power consumption but fast speed.
  • the power-delay curve 920 has a wider delay range and reasonable power consumption.
  • the power-delay curve 930 is similar to 920 but the delay covers a slower range.
  • the design constraint is a delay of approximately 1.5 nsec. Under this timing constraint, it is seen that the design depicted by the power-delay curve 930 is not acceptable. Both designs depicted by the power-delay curves 910 and 920 are acceptable because they cover the specified timing constraint. However, the power-delay curve 920 shows a better design because at 1.5 nsec, it results in a 50% power reduction.
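  • The comparison in FIG. 9 amounts to evaluating each curve at the constrained delay and keeping the lowest-power curve whose delay range covers it. The sketch below illustrates that selection; the curve coordinates are invented for illustration (the figure's actual values are not given in the text), as are the names `power_at` and `curves`.

```python
from typing import List, Optional, Tuple

Curve = List[Tuple[float, float]]  # (delay_nsec, power_mA), sorted by delay


def power_at(curve: Curve, delay: float) -> Optional[float]:
    """Linearly interpolate power at `delay`; None if outside the curve's range."""
    if delay < curve[0][0] or delay > curve[-1][0]:
        return None
    for (d0, p0), (d1, p1) in zip(curve, curve[1:]):
        if d0 <= delay <= d1:
            return p0 + (p1 - p0) * (delay - d0) / (d1 - d0)
    return curve[0][1]  # single-point curve evaluated at its only delay


# Hypothetical curves standing in for 910, 920, and 930.
curves = {
    "910": [(1.2, 12.0), (1.6, 8.0)],
    "920": [(1.3, 5.5), (2.1, 1.5)],
    "930": [(1.7, 5.0), (2.5, 2.0)],
}
target_nsec = 1.5
candidates = {name: power_at(c, target_nsec) for name, c in curves.items()}
acceptable = {name: p for name, p in candidates.items() if p is not None}
best = min(acceptable, key=acceptable.get)
```

  • With these invented numbers, the curve standing in for 930 does not reach 1.5 nsec and is rejected, and the curve standing in for 920 is selected because it needs about half the power of the one standing in for 910 at the same delay.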
  • the present invention therefore is a technique to automatically determine the optimal design of a subsystem or functional block having a number of circuits.
  • the subsystem or functional block has a set of design parameters which are divided into two groups: optimizing parameters and constraint parameters.
  • the technique includes the generation of parameter functions or data files which show the relationship between the design parameters.
  • An optimization process is then carried out to select the optimal values for the optimizing parameters while keeping the constraint parameters to be within the specified range.
  • the technique provides the design engineer a global picture of the overall design so that global optimization can be performed.

Abstract

A method, system and computer program product for automatically determining optimal design parameters of a subsystem to meet design constraints. The subsystem comprises a plurality of circuits. The optimal design parameters are determined by performing a parameter-delay curve optimization of the subsystem design parameters.

Description

    FIELD OF THE INVENTION
  • The present invention relates to computer systems. In particular, the invention relates to circuit design techniques and related computer-aided design (“CAD”) software tools. [0001]
  • BACKGROUND
  • 1. Introduction [0002]
  • While microprocessor speeds have historically doubled with every new processor generation, power consumption of circuit blocks in the microprocessors has gone up by six orders of magnitude during each new processor generation. Even with processor operating voltage reduction and capacitance reduction coming from new manufacturing processes which shrink transistor sizes, chip power consumption is still growing at a rate of three orders of magnitude per processor generation. This growth in power consumption is largely due to an increased use of on chip hardware to get parallelism and improve microprocessor performance. In addition, to get extra performance on certain critical timing paths, device sizes are being increased to get shorter delays at the circuit level. However, size optimization of all transistor sizes in a given design is very time consuming, and often, the penalty of upsizing transistors to get performance boosts comes at the expense of a much larger increase in circuit power consumption. [0003]
  • To achieve further performance increases in very critical arithmetic and control circuitry, designers are converting a larger portion of the static lower power portion of the chip to more power hungry dynamic (also referred to as domino) blocks to attain the very aggressive delay specifications dictated by the chip architecture. Therefore, the use of dynamic logic is becoming more prevalent and an increasing part of microprocessor circuit designs. It has been demonstrated that dynamic or domino logic consumes three times more power than static complementary metal-oxide-semiconductor (“CMOS”) designs. However, for some delay range, some domino designs can be made static at the same performance point, and power optimizations can become possible under these circumstances. [0004]
  • Register transfer language (“RTL”) to schematic partitioning has also made the power-delay optimization problem more difficult for designers. Without proper knowledge of power-delay tradeoff points at the micro architecture level, circuit designers are forced to upsize entire blocks to meet circuit performance targets. For some designs, however, certain timing can be reallocated to adjacent blocks, and these blocks can then be concurrently downsized and upsized to further achieve a lower power design at the same original delay specification. Unfortunately, while some aspects of recalculating reallocated power designs and delays between blocks have been automated, existing systems still require the designers to manually reallocate the power designs and delays using alternate implementations of the blocks within the design. As the number of blocks and the number of possible implementations for each block both increase, so does the difficulty of manually redesigning and reallocating the power designs and delays. For example, even in a small circuit with only five blocks and three possible implementations for each block, there are 3^5 = 243 possible configurations of the circuit. This is too many combinations for a designer to create manually and then evaluate efficiently and effectively. [0005]
  • High chip power consumption continues to be a major limiting factor for the introduction of new microprocessor designs to the market and as the demand for faster processor operating frequencies continues to increase, chip power consumption problems have only become worse. As a result, currently used power saving techniques are being nullified by the overwhelming trend in power increase. [0006]
  • Therefore, new Computer-Aided Design (“CAD”) tools and methodologies are needed for the next generations of microprocessor designs to optimize for power-delay or area-delay or both and enable higher productivity from designers during the design cycle. [0007]
  • SUMMARY OF THE INVENTION
  • Embodiments of the present invention provide a method, system and computer program product for automatically determining optimal design parameters of a subsystem to meet design constraints. The subsystem comprises a plurality of circuits. The optimal design parameters are determined by performing a parameter-delay curve optimization of the subsystem design parameters. [0008]
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The features and advantages of the present invention will become apparent from the following detailed description of the present invention in which: [0009]
  • FIG. 1A is a diagram illustrating an engineering design cycle in accordance with the teachings of the invention. [0010]
  • FIG. 1B is a diagram illustrating a computer system in which one embodiment of the present invention may be utilized. [0011]
  • FIG. 2 is a diagram illustrating a design optimization phase according to one embodiment of the invention. [0012]
  • FIG. 3 is a diagram illustrating power-delay curves according to one embodiment of the invention. [0013]
  • FIG. 4 is a diagram illustrating a macrograph of datapath macros representing a circuit design for use according to one embodiment of the invention. [0014]
  • FIG. 5 is a diagram illustrating a piece-wise approximation of an area-delay trade-off curve for use according to one embodiment of the invention. [0015]
  • FIG. 6 is a flow diagram illustrating a method for performing an area-delay curve based determination of optimal design parameter values according to one embodiment of the invention. [0016]
  • FIG. 7 is a diagram illustrating an example of an arithmetic logic unit datapath functional block according to one embodiment of the invention. [0017]
  • FIG. 8A is a diagram illustrating a power-delay curve for the input multiplexer shown in FIG. 7 according to one embodiment of the invention. [0018]
  • FIG. 8B is a diagram illustrating a power-delay curve for the comparator shown in FIG. 7 according to one embodiment of the invention. [0019]
  • FIG. 8C is a diagram illustrating a power-delay curve for the static adder shown in FIG. 7 according to one embodiment of the invention. [0020]
  • FIG. 8D is a diagram illustrating a power-delay curve for the output multiplexer shown in FIG. 7 according to one embodiment of the invention. [0021]
  • FIG. 9 is a diagram illustrating a comparison of the power-delay curves for three different implementations of an example circuit according to one embodiment of the invention. [0022]
  • DETAILED DESCRIPTION
  • Embodiments of the present invention provide a method and computer program product for determining optimal values for the design parameters of a circuit block, which result in optimally assigned delay targets for datapath blocks at the minimum power/area point. The problem/solution space is extended to solve the problem of figuring out the best possible implementation (for example, static vs. domino) for each datapath block. Parameter functions relating the design parameters for circuits in the circuit block are created. Based on these parameter functions, the design parameters are optimized to satisfy the design constraints. In one embodiment, the design parameters include power and delay and the parameter functions are power-delay curves. The power-delay curves are generated using a timing simulator, a power estimator, and transistor sizing tools. In another embodiment, the design parameters include area and delay and the parameter functions are area-delay curves. Embodiments of the present invention provide a technique to help designers automatically perform trade-off analyses to optimize the design within the specified design constraints. [0023]
  • In the following description, for purposes of explanation, numerous details are set forth in order to provide a thorough understanding of the present invention. However, these specific details are not required in order to practice the present invention. In other instances, well known electrical structures and circuits are shown in block diagram form in order not to obscure the present invention. [0024]
  • A simple and efficient method for optimizing the design through the use of power-delay and area-delay curves to minimize chip power consumption is described herein. However, the method for optimizing is not able to automatically generate all of the possible solutions and then select the optimal solution from among all of the possible solutions. [0025]
  • In a circuit design, the designer, usually a design engineer, is typically faced with a number of design parameters and design constraints. The design constraints are usually dictated by the system requirements and specifications. Examples of the design constraints include propagation delay, power consumption, packaging, number of input/output (“I/O”) lines, etc. The design constraints are typically imposed on one or more design parameters, while leaving other parameters to be optimized to achieve high performance. The design parameters, therefore, are divided into two parameter sets: a constraint set and an optimizing set. The “constraint set” includes constraint parameters which are the parameters that have to meet the design constraints. The “optimizing set” includes the optimizing parameters which are the parameters that need to be optimized. In an exemplary scenario, a constraint parameter is the propagation delay and an optimizing parameter is the power consumption. In another scenario, the propagation delay is the optimizing parameter and the power consumption is the constraint parameter. [0026]
  • The relationship between the constraint parameters and the optimizing parameters is described by a parameter function. A “parameter function” describes the variation of one parameter as a function of another parameter. For example, a parameter function may describe the variation of the power consumption as a function of the delay. The variation of one parameter as a function of another is typically caused by a configuration of the circuit such as the size of the transistors, the choice of circuit technology (for example, domino versus static), etc. A configuration of the circuit that gives rise to the particular values of the design parameters corresponds to a design point. [0027]
  • A system, a subsystem, a module or a functional block may consist of a number of circuits. Each circuit is characterized by a parameter function. Optimizing the design of a subsystem or functional block involves a trade-off consideration of all the parameter functions of all the individual circuits of the subsystem or functional block. For a parameter function of a given circuit, there are many design points corresponding to different circuit configurations. Therefore, optimizing a subsystem or functional block involves the selection of the design points on the parameter functions that provide the optimal values of the optimizing parameters and acceptable values of the constraint parameters. The present invention provides a technique to automatically determine an optimal design based on the parameter functions using linear programming techniques. [0028]
  • FIG. 1A is a diagram illustrating an example of an engineering design cycle in accordance with the teachings of the invention. The engineering design cycle 100 includes a first logic synthesis phase 110, a circuit design phase 120, a design optimization phase 130, and a second logic synthesis phase 140. [0029]
  • The first logic synthesis phase 110 provides the high level logic description and/or design of the circuits. In the first logic synthesis phase 110, the designer synthesizes the circuits manually or using a number of tools including Computer-Aided Design (“CAD”) tools. Examples of CAD tools include hardware description language (“HDL”) compilers and schematic entry tools. The result of the first logic synthesis phase 110 includes the design in high level form such as a textual description of the circuit at the behavioral level, register transfer language (“RTL”), or micro architecture. [0030]
  • The circuit design phase 120 receives the generated logic synthesis files to generate the synthesized circuits. The synthesized circuits may be represented by circuit schematics, a netlist of the circuits, or any other convenient form that can be further processed by additional CAD tools. Essentially, the circuit design phase 120 represents an unoptimized complete design that shows subsystems or functional blocks at the detailed implementation level for the synthesized circuits. [0031]
  • In FIG. 1A, the design optimization phase 130 determines the optimal values for the design parameters to meet the design constraints. In the design optimization phase 130, the design engineer uses a design workstation or a computer system 132. The computer system 132 is supported by a design environment which includes the operating system and many CAD tools, such as a timing analyzer, a power estimator, and a transistor sizing tool, to adjust the design parameters according to the allowable design budgets. The design optimization phase 130 typically produces a number of parameter functions that relate the design parameters for the circuits. An example of such a parameter function is a power-delay curve 135. The power-delay curve 135 shows the relationship between the power consumption and the propagation delay for a particular circuit in a functional block. The power-delay curve 135 has a number of design points corresponding to different implementations or configurations of the circuit under consideration. The power-delay curve 135 provides the design engineer the basic information to optimize his or her circuit under the specified design constraints. [0032]
  • As shown in FIG. 1A, from the information provided by the power-delay curve 135, the design engineer modifies the circuit design according to the design points. The exemplary power-delay curve 135 has three design points A, B, and C. The design point A corresponds to a circuit implementation that has high power consumption and fast speed, representing an undesirable implementation because of excessive power consumption. The design point B corresponds to the optimal power consumption and optimal speed, representing the best circuit implementation. [0033]
  • The design point C corresponds to low power consumption and acceptable speed, representing a desirable implementation. If the circuit implementation is at the design point A, the design engineer will have the option to go back to the first logic synthesis phase 110 or the circuit design phase 120. If the circuit implementation is at the design point C, the design engineer will go to the second logic synthesis phase 140. [0034]
  • The second logic synthesis phase 140 is essentially the same as the first logic synthesis phase 110 with the exception that the design engineer now focuses more on giving the extra design margin to other circuits in the subsystem or functional block. The low power consumption at the design point C provides more margin to the power budget for other circuits. In the second logic synthesis phase 140, the design engineer modifies the circuit synthesis based on the extra margin, such as repartitioning, floor-plan editing, sizing, etc. [0035]
  • FIG. 1B is a diagram illustrating one embodiment of a computer system 132 in which one embodiment of the present invention may be utilized. The computer system 132 comprises a processor 150, a host bus 155, a peripheral bridge 160, a storage device 165, an advanced graphics processor 175, a video monitor 177, and a peripheral bus 180. [0036]
  • The processor 150 represents a central processing unit of any type of architecture, such as complex instruction set computers (“CISC”), reduced instruction set computers (“RISC”), very long instruction word (“VLIW”), or hybrid architecture. The processor 150 is coupled to the peripheral bridge 160 via the host bus 155. While this embodiment is described in relation to a single processor computer system, the invention can be implemented in a multi-processor computer system. [0037]
  • The peripheral bridge 160 provides an interface between the host bus 155 and a peripheral bus 180. In one embodiment, the peripheral bus 180 is the Peripheral Components Interconnect (“PCI”) bus. The peripheral bridge 160 also provides the graphic port, for example, Accelerated Graphics Port (“AGP”), or the graphics bus 172 for connecting to a graphics controller or advanced graphics processor 175. The advanced graphics processor 175 is coupled to a video monitor 177. The video monitor 177 displays graphics and images rendered or processed by the advanced graphics processor 175. The peripheral bridge 160 also provides an interface to the storage device 165. [0038]
  • The storage device 165 represents one or more mechanisms for storing data. For example, the storage device 165 may include non-volatile or volatile memories. Examples of these memories include flash memory, read only memory (“ROM”), or random access memory (“RAM”). FIG. 1B also illustrates that the storage device 165 has stored therein data 167 and program code 166. The data 167 stores graphics data and temporary data. Program code 166 represents the necessary code for performing any and/or all of the techniques in the present invention. Of course, the storage device 165 preferably contains additional software (not shown), which is not necessary to understanding the invention. [0039]
  • The peripheral bus 180 represents a bus that allows the processor 150 to communicate with a number of peripheral devices. The peripheral bus 180 provides an interface to a peripheral-to-expansion bridge 185, peripheral devices 190 1 to 190 N, a mass storage controller 192, a mass storage device 193, and mass storage media 194. The peripheral devices 190 1 to 190 N represent any device that is interfaced to the peripheral bus 180. Examples of peripheral devices are fax/modem controller, audio card, network controller, etc. The mass storage controller 192 provides control functions to the mass storage device 193. The mass storage device 193 is any device that stores information in a non-volatile manner. Examples of the mass storage device 193 include a hard disk, a floppy disk, and a compact disk (“CD”) drive. The mass storage device 193 receives the mass storage media 194 and reads their contents to configure the design environment for the design engineer. [0040]
  • The mass storage media 194 contain programs or software packages used in the environment. The mass storage media 194 represent a computer program product having program code or code segments that are readable by the processor 150. A program code or a code segment includes a program, a routine, a function, a subroutine, or a software module that is written in any computer language (for example, high level language, assembly language, machine language) that can be read, processed, compiled, assembled, edited, downloaded, transferred, or executed by the processor 150. The mass storage media 194 include any convenient media such as floppy diskettes, compact disk read only memory (“CD-ROM”), digital audio tape (“DAT”), optical laser disc, or communication media (e.g., Internet, radio frequency link, fiber optics link). For illustrative purposes, FIG. 1B shows floppy diskettes 195 and CD-ROM 196. The floppy diskettes 195 and/or CD-ROM 196 contain design environment 198. Examples of the tools or computer readable program code in the design environment 198 include operating system, computer aided design (“CAD”) tools such as schematic capture, hardware description language (“HDL”) compiler, text editors, netlist generator, timing analyzer, power vector generator, timing simulator, power simulator, circuit configuration, component sizer, parameter function generator, parameter optimizer, and graphics design environment. These tools, together with the operating system of the computer system 132, form the design environment 198 on which the design and optimization process can be carried out. [0041]
  • The peripheral-to-expansion bridge 185 represents an interface device between the peripheral bus 180 and an expansion bus 187. The expansion bus 187 represents a bus that interfaces to a number of expansion devices 188 1 to 188 K. Examples of an expansion device include a parallel input/output (“I/O”) device and a serial communication interface device. In one embodiment, the expansion bus 187 is an Industry Standard Architecture (“ISA”) or Extended Industry Standard Architecture (“EISA”) bus. [0042]
  • The computer system 132 can be used in all or part of the phases of the design process. The processor 150 executes instructions in the program code 166 to access data 167 and interact with the design environment 198. In particular, the computer system 132 is used in the design optimization phase 130. [0043]
  • FIG. 2 is a diagram illustrating a design optimization phase according to one embodiment of the invention. The design optimization phase 130 includes a netlist generation module 210, a critical path generation module 223, a power vector generation module 227, a delay calculation module 233, a power calculation module 237, a circuit configuration module 240, a parameter function generation module 250, and an optimization module 260. Each of these modules may be a software module or a hardware module or a combination of both. In one embodiment, these modules are implemented by program code that is readable and executed by the processor 150. [0044]
  • The netlist generation module 210 generates the circuit netlist which provides the information on component identification and how the components of the circuit are interconnected. The circuit netlist becomes the input to the critical path generation module 223 and the power vector generation module 227. The critical path generation module 223 generates timing delays of various paths in the circuit based on circuit components and interconnection patterns. From these timing delays, the critical path(s) is (are) identified. The critical path represents the path through which the overall propagation delay is the most critical, i.e., the path on which timing parameters (e.g., setup time, hold time) are difficult to satisfy. The timing files generated by the critical path generation module 223 become the input to the delay calculation module 233. The delay calculation module 233 calculates the delays of the critical paths and other paths using a timing simulator. In one embodiment, the timing simulator is the PathMill tool, developed by Epic Technologies, now owned by Synopsys, of Mountain View, Calif. The timing values are then forwarded to the circuit configuration module 240. On the power side, the power vector generation module 227 generates power vectors as input to the power calculation module 237. The power calculation module 237 calculates the power consumption of the circuit using a power estimator tool. In one embodiment, the power estimator tool is the PowerMill tool, developed by Epic Technologies of Mountain View, Calif. The power values are then forwarded to the circuit configuration module 240. [0045]
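  • Identifying a critical path from per-component delays can be sketched as a longest-path search over an acyclic netlist graph. The sketch assumes that "most critical" simply means the largest summed delay, which is a simplification of the setup and hold considerations mentioned above; the graph, the delay values, and the function name `critical_path` are hypothetical.

```python
from functools import lru_cache

# Hypothetical netlist connectivity and per-component delays in picoseconds.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
delay_ps = {"a": 100, "b": 400, "c": 250, "d": 150}


def critical_path(graph, delay_ps):
    """Return (total_delay, path) for the largest-delay path in a DAG."""

    @lru_cache(maxsize=None)
    def longest_from(node):
        # Best continuation after `node`; (0, ()) when node is a sink.
        best_tail = (0, ())
        for succ in graph[node]:
            best_tail = max(best_tail, longest_from(succ))
        return (delay_ps[node] + best_tail[0], (node,) + best_tail[1])

    return max(longest_from(node) for node in graph)


total_ps, path = critical_path(graph, delay_ps)
```

  • Memoizing `longest_from` keeps the search linear in the number of connections, which matters once the number of distinct paths grows much faster than the number of components.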
  • The circuit configuration module 240 configures the circuit to effectuate the power consumption and delay. One configuration is scaling the sizes (e.g., transistor size) of the circuit components using a sizing tool. In one embodiment, the sizing tool is Amps developed by Epic Technologies of Mountain View, Calif. The sizing tool applies scale factors to scale down the circuit elements either globally or locally. The resulting circuit is then simulated again for the next delay and power values. The circuit configuration module 240 generates new circuit information to be fed back to the delay calculation module 233 and the power calculation module 237. The process continues until all the values within the range of the scaling have been used. Then the delay and power values are forwarded to the parameter function generation module 250. The parameter function generation module 250 generates the parameter function (e.g., power-delay curves) showing the relationship between the design parameters. The parameter function generation module 250 may also generate the design parameters in any other convenient forms for later processing. [0046]
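  • The resize, re-simulate, and collect loop described above can be sketched as a sweep over scale factors. In the sketch, `simulate` is a toy closed-form stand-in for the timing simulator and power estimator (it is not output from PathMill, PowerMill, or Amps), and the base delay and power values are invented for illustration.

```python
def simulate(scale: float) -> tuple:
    """Toy stand-in for the timing and power tools: larger devices are
    assumed to be faster but to draw proportionally more current."""
    base_delay_nsec, base_power_ma = 1.6, 6.0
    return base_delay_nsec / scale, base_power_ma * scale


def sweep(scales):
    """Resize, re-simulate, and collect one (delay, power) design point per
    scale factor, ordered by delay to form a power-delay curve."""
    points = [simulate(s) for s in scales]
    return sorted(points)


curve = sweep([1.0, 0.8, 1.25, 2.0])
```

  • Each tuple in `curve` plays the role of one design point; in the described flow the same loop would be driven by the sizing tool and the simulators rather than by a formula.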
  • [0047] The optimization module 260 receives the values of the design parameters either in the form of a parameter curve, or in any other convenient format. The optimization module 260 determines the optimal values of the design parameters.
  • [0048] FIG. 3 is a diagram illustrating a power-delay curve according to one embodiment of the invention. FIG. 3 shows two curves: a domino curve 310 and a static curve 320.
  • [0049] The power-delay curves in FIG. 3 show the parameter function for an arithmetic circuit. The arithmetic circuit can be designed using a domino circuit technology or a static circuit technology. The domino curve 310 is the power-delay curve for the circuit using the domino circuit technology and the static curve 320 is the power-delay curve for the circuit using the static circuit technology.
  • [0050] The domino curve 310 has two design points A and B. The design point A corresponds to the current domino design. At this design point, the circuit has a delay of approximately 1.35 nsec and a power consumption of approximately 14 mA. The design point B corresponds to another domino design with a longer delay of approximately 1.62 nsec and a power consumption of approximately 6.1 mA. Therefore the saving in power to go from design point A to design point B is 53% for a delay penalty of 23%.
  • [0051] The static curve 320 has a design point C. The static curve 320 has a delay limit at approximately 1.42 nsec. The design point C is at a delay of approximately 1.62 nsec and a power consumption of approximately 4.5 mA. Therefore, the design point C has approximately the same delay as the design point B of the domino curve 310 but has an additional power saving of 16%.
  • [0052] The parameter curve therefore provides the design engineer an immediate visualization of the relationship between the design parameters, e.g., power and delay, so that optimization can be carried out.
  • [0053] In accordance with an embodiment of the present invention, a mathematical approach is presented that automatically solves for the optimal delay allocation of datapath blocks during the circuit design phase of a chip design. For example, FIG. 4 illustrates a macro graph of datapath macros representing a circuit design for use according to one embodiment of the invention.
  • [0054] In FIG. 4, M1 410, M2 420, M3 430, and M4 440 are datapath macros for which area-delay trade-off curves, such as that shown in FIG. 2, for their different implementations are available. Therefore, when a designer wishes to meet a specified delay target from data ‘A’ 450 to output ‘O’ 470 and from data ‘B’ 460 to output ‘O’ 470, the designer needs to answer the following questions:
  • [0055] 1. What is the best implementation for each of the macros that minimizes the area (measured as total transistor width) or power (measured as the sum of the powers dissipated by the macros in the design)? and
  • [0056] 2. What is the optimal delay assignment to each of these macros so as to obtain a minimum area or power solution?
  • [0057] In the following description of this embodiment of the present invention, a solution to determine the optimized area is presented. Determining the optimal power solution uses the same method. The only difference is that, in order to optimize for power, the power-delay curves are used instead of the area-delay curves for each macro block, and the objective function is the sum of the powers dissipated by the macros.
  • [0058] An embodiment of the present invention assumes that there are m macros in the macro graph to be optimized. For example, in FIG. 4, there are 4 macros in the macro graph. Also, in this embodiment specific implementations for each of the macros are assumed. Given this information, the delay assignment for each of these implementations can be calculated so as to meet the delay constraints. As stated earlier, in this embodiment of the present invention, the area-delay trade-off curve for each implementation of all the macros is known a priori. These curves can be generated very efficiently using external CAD vendor tools like AMPS or more advanced internal Intel proprietary tools that employ the methods shown in FIG. 2. Given an area-delay trade-off curve for each macro, this embodiment of the present invention begins by forming a piecewise linear approximation of each of the area-delay trade-off curves. These piecewise approximations can be made arbitrarily accurate by increasing the number of linear pieces. While an exemplary piecewise linear approximation of an area-delay curve 510 is shown in FIG. 5 with three separate piecewise approximation sections 520 a, 520 b, and 520 c, respectively, the number of approximation sections can easily be increased to four or more for more accurate approximations. Similarly, the number of approximation sections can be decreased to two or one with an attendant decrease in the accuracy of the approximation.
  • [0059] Therefore, in this embodiment of the present invention, the piecewise linear approximation of an implementation of a given macro ‘i’ can be expressed as follows:
  • $a_{i,11} A_i + a_{i,21} D_i \ge 1$
  • $a_{i,12} A_i + a_{i,22} D_i \ge 1$
  • $\vdots$
  • $a_{i,1n} A_i + a_{i,2n} D_i \ge 1$
  • [0061] Here, $A_i$ and $D_i$ are the area and delay variables, respectively, associated with the implementation of macro ‘i’ and each piecewise linear approximation is normalized. As the area-delay curve represents a Pareto-optimal curve, it follows that the piecewise linear approximation of the area-delay curve generates a convex set of all feasible realizations of the implementation. In another embodiment of the present invention, which optimizes for power, the above inequalities still apply and the area variable, $A_i$, is replaced by a power variable, $C_i$.
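The normalized inequalities for one macro can be derived from sampled points on its area-delay curve. The following Python fragment is only a sketch under stated assumptions: the breakpoint values and the names `segment_coefficients` and `is_feasible` are hypothetical and do not come from the source, and the curve is assumed to be convex and decreasing with positive coordinates. Each pair of adjacent breakpoints yields one inequality of the form $a_{i,1k} A_i + a_{i,2k} D_i \ge 1$.

```python
def segment_coefficients(points):
    """Return [(c_area, c_delay), ...], one pair per linear piece.

    points: (delay, area) breakpoints sorted by increasing delay, sampled
    from a convex, decreasing area-delay curve (hypothetical sample data).
    Each pair defines the half-plane c_area*A + c_delay*D >= 1.
    """
    coeffs = []
    for (d1, area1), (d2, area2) in zip(points, points[1:]):
        # Line through both breakpoints, scaled so its right-hand side is 1.
        det = area1 * d2 - area2 * d1
        coeffs.append(((d2 - d1) / det, (area1 - area2) / det))
    return coeffs


def is_feasible(point, coeffs, tol=1e-9):
    """True if a (delay, area) point lies on or above every linear piece."""
    delay, area = point
    return all(ca * area + cd * delay >= 1 - tol for ca, cd in coeffs)


# Three breakpoints give two linear pieces.
curve = [(1.0, 1.0), (2.0, 0.5), (4.0, 0.25)]
pieces = segment_coefficients(curve)
```

Adding breakpoints adds pieces, which tightens the approximation in the way the paragraph on FIG. 5 describes.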
  • [0062] Let the different paths ($p_j$) through the macro graph be contained in the set P. For example, the set P for the macro graph in FIG. 4 contains two paths, the first one from data ‘A’ 450 to output ‘O’ 470 and the second from data ‘B’ 460 to output ‘O’ 470. Assume that the set P contains N paths. Then, for each $p_j \in P$, the delay constraint on it can be written as,
  • $\sum_{i=1}^{m} b_{ij} D_i \le 1$, where $b_{ij} = 0$ if macro ‘i’ is absent on path $p_j$; $1 \le i \le m$; $1 \le j \le N$.
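The path set P and the membership pattern of the coefficients $b_{ij}$ can be illustrated with a small depth-first traversal. This is a sketch only: the connectivity below is an assumed topology chosen so that exactly two paths reach the output, since FIG. 4 is not reproduced here, and `enumerate_paths` is a hypothetical helper. The matrix holds plain 0/1 membership flags rather than the normalized coefficients used in the inequality.

```python
def enumerate_paths(graph, sources, sink):
    """List every path from each source node to the sink in a DAG."""
    paths = []

    def walk(node, trail):
        trail = trail + [node]
        if node == sink:
            paths.append(trail)
            return
        for nxt in graph.get(node, []):
            walk(nxt, trail)

    for src in sources:
        walk(src, [])
    return paths


# Assumed connectivity (not taken from FIG. 4): A and B feed M1 and M2,
# which merge into M3, followed by M4 and the output O.
graph = {
    "A": ["M1"], "B": ["M2"],
    "M1": ["M3"], "M2": ["M3"],
    "M3": ["M4"], "M4": ["O"],
}
paths = enumerate_paths(graph, ["A", "B"], "O")

macros = ["M1", "M2", "M3", "M4"]
# membership[j][i] is 1 when macro i lies on path j, otherwise 0.
membership = [[1 if m in path else 0 for m in macros] for path in paths]
```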
  • [0063] Again, m is equal to the number of macros and N is equal to the number of paths. In this embodiment of the present invention, the overall area of the design, given by $\sum_{i=1}^{m} A_i$, is to be minimized. Combining the above inequalities results in the following optimization problem (OPT1):
  • Objective: $\min \sum_{i=1}^{m} A_i$
  • [0064] Constraints:
  • $\sum_{i=1}^{m} b_{ij} D_i \le 1$, where $b_{ij} = 0$ if macro ‘i’ is absent on path $p_j$; $1 \le i \le m$; $1 \le j \le N$.
  • [0065] For each macro $i$, $1 \le i \le m$:
  • $a_{i,11} A_i + a_{i,21} D_i \ge 1$
  • $a_{i,12} A_i + a_{i,22} D_i \ge 1$
  • $\vdots$
  • $a_{i,1n} A_i + a_{i,2n} D_i \ge 1$
  • $D_{i,\min} \le D_i \le D_{i,\max}$, $1 \le i \le m$
  • $A_i \ge 0$, $1 \le i \le m$
  • [0067] Here, $D_{i,\min}$ and $D_{i,\max}$ are the minimum and maximum possible delays associated with the implementation of macro ‘i’.
  • [0068] Note that since OPT1 is a linear programming problem, it can be solved efficiently using tools like COPL_LP (a linear programming solver from the University of Iowa), or commercial tools like MATLAB. MATLAB is developed by The MathWorks, Incorporated of Natick, Mass. It is important to note that if the problem is infeasible, the implementations chosen for the macros cannot meet the delay constraints. However, if the problem is feasible, its solution gives the optimal delays to allocate to the various macros in the graph so as to obtain a minimum area solution.
  • [0069] In this embodiment of the present invention, a check for the feasibility of the optimization problem is performed by determining if the minimum-delay values of the implementations satisfy the generated delay constraints. For example, in FIG. 4, the minimum delays of the implementations of macros 1 through 4 are checked to determine if they satisfy the delay constraints on the paths from data ‘A’ 450 to output ‘O’ 470 and from data ‘B’ 460 to output ‘O’ 470. If they do not, the implementations currently chosen for the macros cannot meet the designer specified delay constraints. If they do, the optimization problem OPT1 can be solved.
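The feasibility check described above amounts to summing the minimum delays along each path and comparing the sum with that path's limit. The Python fragment below is a minimal sketch of that test; the delay values (in picoseconds), the path membership, and the name `binding_is_feasible` are illustrative assumptions, not data from the source.

```python
def binding_is_feasible(paths, min_delay, limits):
    """Check that the sum of minimum delays on every path meets its limit.

    paths:     list of paths, each a list of macro names
    min_delay: macro name -> minimum achievable delay of its implementation
    limits:    per-path delay limit, in the same order as paths
    """
    return all(
        sum(min_delay[m] for m in path) <= limit
        for path, limit in zip(paths, limits)
    )


# Hypothetical minimum delays in picoseconds and assumed path membership.
min_delay = {"M1": 300, "M2": 400, "M3": 500, "M4": 200}
paths = [["M1", "M3", "M4"], ["M2", "M3", "M4"]]

relaxed = binding_is_feasible(paths, min_delay, [1200, 1200])
tight = binding_is_feasible(paths, min_delay, [1200, 1000])
```

With the second set of limits the path through M2 needs at least 1100 ps against a 1000 ps limit, so no delay allocation can satisfy it and the linear program need not be formed.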
  • [0070] The above embodiment of the present invention solves the problem of finding the optimal delay assignment to the macros when an implementation has already been chosen for the macros by the designer. However, this will not always produce the “best”, that is, the optimal, solution. Therefore, in another embodiment, the present invention determines the “best” implementation for the macros under designer specified delay constraints.
  • [0071] To solve this more general problem, since the system does not have a priori knowledge of the implementation of each block, the system assumes that each macro in the graph, $M_i$, has $\Lambda_i$ possible implementations. One approach to solving the problem of finding the best implementation for each macro, which is referred to as “binding” in high-level synthesis, is to solve OPT1 for each of the candidate binding solutions. The number of problems of type OPT1 that need to be solved to get the best possible implementation of the design is given by $\prod_{i=1}^{m} \Lambda_i$. This can quickly become a large number if the number of possible implementations is large.
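The growth in the number of candidate bindings is easy to see by enumerating them. In the sketch below the implementation names are hypothetical placeholders; only the counting rule, the product of the per-macro implementation counts, comes from the text.

```python
from itertools import product
from math import prod

# Hypothetical implementation lists for the four macros of a macro graph.
implementations = {
    "M1": ["ripple", "lookahead"],
    "M2": ["static", "domino", "pass-gate"],
    "M3": ["static"],
    "M4": ["static", "domino"],
}

# Number of OPT1-type problems if each binding were solved separately.
binding_count = prod(len(options) for options in implementations.values())

# Each binding assigns exactly one implementation to every macro.
bindings = [
    dict(zip(implementations, choice))
    for choice in product(*implementations.values())
]
```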
  • [0072] An embodiment of the present invention provides an approach to simultaneously search for the optimal solution among all the possible binding solutions. This is achieved as follows. Let $A_{ik}$ and $D_{ik}$ be the area and delay associated with implementation ‘k’ of macro ‘i’. Also define a new variable called $A$ to measure the optimal area of the solution. Then it follows that:
  • $A \le \sum A_{ik}$; $1 \le k \le \Lambda_i$; $1 \le i \le m$;
  • [0073] Any candidate binding solution must satisfy the delay constraints. This implies that
  • $\sum b_{ij,k} D_{ik} \le 1$, where $b_{ij,k} = 0$ if macro ‘i’ is absent on path $p_j$;
  • $1 \le k \le \Lambda_i$; $1 \le i \le m$; $1 \le j \le N$.
  • [0074] The optimization problem (OPT2) can now be stated as follows:
  • [0075] Objective: $\max A$
  • [0076] Constraints:
  • $A \le \sum A_{ik}$; $1 \le k \le \Lambda_i$; $1 \le i \le m$;
  • $\sum b_{ij,k} D_{ik} \le 1$, where $b_{ij,k} = 0$ if macro ‘i’ is absent on path $p_j$;
  • $1 \le k \le \Lambda_i$; $1 \le i \le m$; $1 \le j \le N$.
  • [0077] For each macro $i$, $1 \le i \le m$, and each implementation $k$, $1 \le k \le \Lambda_i$:
  • $a_{ik,11} A_{ik} + a_{ik,21} D_{ik} \ge 1$
  • $a_{ik,12} A_{ik} + a_{ik,22} D_{ik} \ge 1$
  • $\vdots$
  • $a_{ik,1n} A_{ik} + a_{ik,2n} D_{ik} \ge 1$
  • $D_{ik,\min} \le D_{ik} \le D_{ik,\max}$, $1 \le i \le m$
  • $A_{ik} \ge 0$, $1 \le i \le m$; $A \ge 0$.
  • [0079] Here, $D_{ik,\min}$ and $D_{ik,\max}$ are the minimum and maximum possible delays associated with implementation ‘k’ of macro ‘i’.
  • [0080] Note that if every possible implementation of every macro leads to a feasible final solution, OPT2 is feasible. Then, the optimal solution to OPT2 is the optimal solution to the binding problem. However, if OPT2 is infeasible, there exists a binding that cannot meet the delay constraints. Therefore, this embodiment of the present invention eliminates such bindings while formulating the optimization problem rather than discovering infeasibility after forming the constraints. This can potentially save run time. As described above, checking a given binding for infeasibility is quite simple. At the time of forming the linear program, the system simply performs the test for feasibility described above for OPT1. If the problem is infeasible, the current binding is dropped and the system moves on to the next binding. If the problem is feasible, the system adds the constraints of the current binding to the linear program and moves on to the next binding. This process is repeated until all of the potential bindings have been checked. As a result, this approach guarantees that the optimization problem OPT2 is feasible and that the optimal solution of OPT2 gives the optimal binding.
  • [0081] The flow diagram for solving the optimal binding problem is shown in FIG. 6. A prerequisite for this optimization is a database 680 of area-delay curves for the macros under consideration and their corresponding implementations. The system takes as input a macro graph, which has been previously generated from the RTL description of the design. From this macro graph, the system automatically generates all of the possible paths in the graph. The designer specifies the delay constraints on the paths. Based on the implementations for the macros in the database, the system generates candidate binding solutions. For each binding solution, the system checks for infeasibility. If the solution is infeasible, the system moves on to the next candidate binding. Otherwise, the system generates the constraints for the linear program. This procedure is repeated until all binding solutions have been exhausted. Finally, the system solves the linear program to produce the optimal solution, which corresponds to the optimal binding along with the optimal delay allocation.
  • [0082] In FIG. 6, in block 610, the RTL description of a circuit Functional Unit Block (“FUB”) is translated into a graph describing the connectivity of the macros (adders, multiplexers, etc.) in the FUB and transmitted to block 620. Block 610 can be done either automatically (using an internally developed CAD tool like REAL) or manually by the designer. REAL is a proprietary Intel® Corporation CAD tool. The graph provides information regarding how data and control flow through the FUB. The graph also forms the basis for extracting the various paths through which information can flow in the design. In block 620, the system automatically extracts the various paths through which information can move in the design, because the timing constraints, which determine design performance, must be met on all of these paths for a successful design. Currently, generating these paths requires a designer to perform exhaustive and time consuming manual searches on the graph. Since each macro in the design can potentially be implemented in several ways, an association of implementations with macros is referred to as a candidate binding solution. In block 630, a candidate solution is generated by choosing an implementation for each macro block in the FUB and then applying the subsequent steps in the flow. Each candidate solution is then transmitted to block 640 to determine if that implementation meets the timing constraints on the extracted paths found in block 620. In block 640, the area-delay curves for all of the possible implementations of the macros are read from the database 680 and used to determine whether the candidate solution is feasible. If, in block 640, the candidate solution is found to be infeasible, then the candidate solution is discarded and the system returns to block 630 to generate another candidate solution. If, in block 640, the candidate solution is found to be feasible, then, in block 650, the system generates constraints for the candidate solution for the linear program, using the area-delay curves associated with the chosen implementations in database 680. These constraints are shown in OPT2. The area-delay curves in the library of database 680 can be generated by using either a commercial CAD tool or internally developed Intel CAD tools. Then, in block 660, a check is made to determine if all of the possible binding solutions have been generated. If they have not all been generated, the system returns to block 630 to continue generating candidate binding solutions. If, in block 660, it is determined that all of the possible binding solutions have been generated, then a complete linear program (LP) exists which captures all the feasible binding solutions associated with the graph, along with the area-delay curves for the chosen macro implementations and the timing constraints on the design. Then, in block 670, the system solves the LP for the optimal solution using either a commercial or an internally developed Intel LP solver. The solution to this LP gives the optimal implementations for the macros in the macro graph, along with their respective delays, that meet the designer specified timing constraints and have the lowest area requirements.
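The loop through blocks 630 to 660 of FIG. 6 can be sketched as follows. This is an illustrative outline under stated assumptions, not the system's implementation: the library layout (`d_min`, `pieces`), the implementation names, and the numeric values are hypothetical, and the collected rows would still have to be handed to an LP solver (block 670), which is not shown.

```python
from itertools import product


def feasible_bindings(library, paths, limits):
    """Yield each candidate binding that passes the minimum-delay test."""
    macros = list(library)
    # Block 630: choose one implementation per macro.
    for choice in product(*(library[m] for m in macros)):
        binding = dict(zip(macros, choice))
        # Block 640: minimum delays must fit within every path limit.
        fits = all(
            sum(library[m][binding[m]]["d_min"] for m in path) <= limit
            for path, limit in zip(paths, limits)
        )
        if fits:
            yield binding


def build_program(library, paths, limits):
    """Collect curve constraints for every feasible binding (blocks 650, 660)."""
    program = []
    for binding in feasible_bindings(library, paths, limits):
        rows = {m: library[m][impl]["pieces"] for m, impl in binding.items()}
        program.append({"binding": binding, "curve_rows": rows})
    return program


# Hypothetical library: delays in ps, pieces as (c_area, c_delay) pairs.
library = {
    "M1": {
        "fast": {"d_min": 300, "pieces": [(0.5, 0.001)]},
        "small": {"d_min": 600, "pieces": [(0.8, 0.0005)]},
    },
    "M2": {
        "only": {"d_min": 400, "pieces": [(0.4, 0.001)]},
    },
}
program = build_program(library, paths=[["M1", "M2"]], limits=[800])
```

In this example the binding that pairs the "small" implementation of M1 with M2 needs at least 1000 ps and is dropped before any constraints are generated for it.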
  • [0083] In an alternate embodiment of the present invention, the system is configured and used to optimize power for the macro graph of FIG. 4, as described above for the area optimization.
  • [0084] FIG. 7 is a diagram illustrating an example of an arithmetic logic unit (“ALU”) datapath subsystem or FUB according to one embodiment of the invention. The ALU datapath FUB 700 includes an input multiplexer (“MUX”) 710, a comparator 720, a static adder 730, and an output MUX 740. The ALU datapath FUB 700 is a common design used in the processor 150 or the graphic processor 175 in FIG. 1B.
  • [0085] In this illustrative example, the design parameters include power and delay and the parameter function is the power-delay curve. The constraint parameter is the propagation delay through the ALU FUB 700 and the optimizing parameter is the power. The optimization is to minimize the overall power consumption while keeping the propagation delay within the specified design constraint.
  • [0086] The input MUX 710, the comparator 720, the static adder 730 and the output MUX 740 form a cascaded chain of circuit elements which has a critical path going from one end to the other end. The composite delay is the sum of the individual delays through each of the circuit elements. In addition, it is assumed that these circuit elements are all active, so that the power consumption of the ALU FUB 700 is the sum of the individual power consumptions.
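For a single cascaded chain like this one, where both delay and power add up and there is one delay budget, the allocation can also be found without a general LP solver. The sketch below uses a greedy marginal allocation over the slopes of piecewise linear power-delay curves. It is a substitute technique named here for illustration, not the linear programming flow of the source, and it is only valid for a single path with convex, decreasing curves. The breakpoint data and the name `allocate_delay` are hypothetical.

```python
def allocate_delay(blocks, budget):
    """Spread a delay budget over cascaded blocks to minimize total power.

    blocks: one list of (delay, power) breakpoints per block, sorted by
            increasing delay, each describing a convex, decreasing curve.
    Returns (delays, total_power), or None if even the minimum delays
    exceed the budget.
    """
    delays = [pts[0][0] for pts in blocks]      # start at minimum delay
    power = sum(pts[0][1] for pts in blocks)    # and therefore maximum power
    slack = budget - sum(delays)
    if slack < 0:
        return None

    # One entry per linear piece: (slope, block index, piece length).
    pieces = []
    for i, pts in enumerate(blocks):
        for (d0, p0), (d1, p1) in zip(pts, pts[1:]):
            pieces.append(((p1 - p0) / (d1 - d0), i, d1 - d0))

    # Spend slack where each unit of extra delay saves the most power.
    # Convexity guarantees a block's pieces are taken in left-to-right order.
    for slope, i, length in sorted(pieces):
        if slack <= 0 or slope >= 0:
            break
        step = min(length, slack)
        delays[i] += step
        power += slope * step
        slack -= step
    return delays, power


# Two hypothetical blocks with three breakpoints each.
blocks = [[(1, 10), (2, 4), (3, 2)], [(1, 8), (2, 5), (3, 4)]]
result = allocate_delay(blocks, budget=4)
```

With a budget of 4, the two units of slack go to the steepest pieces first, one in each block.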
  • [0087] In an actual test case used in one embodiment of the present invention, the delay requirement from input to output was specified as 4350 picoseconds (“ps”). The power-delay curves were linearized, that is, “approximated,” similar to FIG. 5, with 6 linear pieces for the input MUX 710, comparator 720 and output MUX 740, and 5 linear pieces for the adder 730. All of the generated constraints were fed to a linear program solver, in this case COPL_LP. For the given delay constraint the final solution generated by the solver was 19.2% smaller in power than a manually determined solution by a designer. In fact, the solution generated using the embodiment of the present invention was also 9.8% smaller than the solution generated by hand optimization of the power-delay curves.
  • [0088] FIG. 8A is a diagram illustrating a power-delay curve 810A for the input multiplexer shown in FIG. 7 according to one embodiment of the invention. The power-delay curve 810A has two design points, A and B. The design point A has a delay value of 0.25 nsec and a power value of 3.2 mA. The design point B has a delay value of 0.29 nsec and a power value of 1.79 mA. A and B are the initial and new design points, respectively. The arrow shows the move from design point A to design point B during the design optimization phase 130.
  • [0089] FIG. 8B is a diagram illustrating a power-delay curve 810B for the comparator shown in FIG. 7 according to one embodiment of the invention. The power-delay curve 810B has two design points, C and D. The design point C has a delay value of 1.12 nsec and a power value of 1.0 mA. The design point D has a delay value of 1.06 nsec and a power value of 1.04 mA. C and D are the initial and new design points, respectively. The arrow shows the move from design point C to design point D during the design optimization phase 130.
  • [0090] FIG. 8C is a diagram illustrating a power-delay curve 810C for the static adder shown in FIG. 7 according to one embodiment of the invention. The power-delay curve 810C has two design points, E and F. The design point E has a delay value of 1.23 nsec and a power value of 10.0 mA. The design point F has a delay value of 1.36 nsec and a power value of 5.92 mA. E and F are the initial and new design points, respectively. The arrow shows the move from design point E to design point F during the design optimization phase 130.
  • [0091] FIG. 8D is a diagram illustrating a power-delay curve 810D for the output multiplexer shown in FIG. 7 according to one embodiment of the invention. The power-delay curve 810D has two design points, G and H. The design point G has a delay value of 1.75 nsec and a power value of 4.0 mA. The design point H has a delay value of 1.64 nsec and a power value of 5.93 mA. G and H are the initial and new design points, respectively. The arrow shows the move from design point G to design point H during the design optimization phase 130.
  • [0092] The power and delay parameters obtained from the power-delay curves 810A, 810B, 810C, and 810D have the following values:
    Initial design points:
    Total delay 0.25 + 1.12 + 1.23 + 1.75 = 4.35 nsec
    Total current 3.2 + 1.0 + 10.0 + 4.0 = 18.2 mA
    New design points:
    Total delay 0.29 + 1.06 + 1.36 + 1.64 = 4.35 nsec
    Total current 1.79 + 1.04 + 5.92 + 5.93 = 14.7 mA
  • [0093] Therefore, it is seen that the new design points B, D, F, H result in the same composite delay of 4.35 nsec, but with a 19.2% saving in power.
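The totals and the saving quoted above can be rechecked directly from the design points of FIGS. 8A through 8D. The short Python check below uses only the values stated in the text; the variable and function names are illustrative. The 19.2% figure follows when the new total current is rounded to 14.7 mA as in the listing (the unrounded sum is 14.68 mA).

```python
# (delay in nsec, current in mA) for each block, taken from the text.
initial = {
    "input MUX": (0.25, 3.2),      # design point A
    "comparator": (1.12, 1.0),     # design point C
    "static adder": (1.23, 10.0),  # design point E
    "output MUX": (1.75, 4.0),     # design point G
}
new = {
    "input MUX": (0.29, 1.79),     # design point B
    "comparator": (1.06, 1.04),    # design point D
    "static adder": (1.36, 5.92),  # design point F
    "output MUX": (1.64, 5.93),    # design point H
}


def totals(points):
    """Sum the delays and currents of a cascaded chain of blocks."""
    delay = sum(d for d, _ in points.values())
    current = sum(c for _, c in points.values())
    return delay, current


delay_initial, current_initial = totals(initial)
delay_new, current_new = totals(new)

# Saving computed from totals rounded to one decimal, as listed in the text.
saving_percent = (
    (round(current_initial, 1) - round(current_new, 1))
    / round(current_initial, 1) * 100
)
```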
  • [0094] The power-delay curves in FIGS. 8A, 8B, 8C, and 8D illustrate the optimization process by varying the variable design parameter and selecting the best overall values. The variable design parameter is common to all the curves. In this example, the variable design parameter is the transistor size, or the power of the block.
  • [0095] The optimization process can be applied for different circuit configurations. For example, a circuit block can be designed using a static circuit technology or a domino circuit technology as illustrated in FIG. 3. In another example, a circuit block may be designed using a multiplexer or a decoder. In these cases, the optimization process can be carried out based on the parameter function, for example, the power-delay curve.
  • [0096] FIG. 9 is a diagram illustrating a comparison of the power-delay curves for three different implementations of an example circuit according to one embodiment of the invention. The power-delay curves 910, 920, and 930 correspond to the initial, better, and worse designs, respectively.
  • [0097] The power-delay curve 910 has high power consumption but fast speed. The power-delay curve 920 has a wider delay range and reasonable power consumption. The power-delay curve 930 is similar to 920 but the delay covers a slower range.
  • [0098] Suppose the design constraint is a delay of approximately 1.5 nsec. Under this timing constraint, it is seen that the design depicted by the power-delay curve 930 is not acceptable. Both designs depicted by the power-delay curves 910 and 920 are acceptable because they cover the specified timing constraint. However, the power-delay curve 920 shows a better design because at 1.5 nsec, it results in a 50% power reduction.
  • [0099] The present invention therefore is a technique to automatically determine the optimal design of a subsystem or functional block having a number of circuits. The subsystem or functional block has a set of design parameters which are divided into two groups: optimizing parameters and constraint parameters. The technique includes the generation of parameter functions or data files which show the relationship between the design parameters. An optimization process is then carried out to select the optimal values for the optimizing parameters while keeping the constraint parameters within the specified range. The technique provides the design engineer a global picture of the overall design so that global optimization can be performed.
  • [0100] The disclosed embodiments are illustrative of the various ways in which the present invention may be practiced. Other embodiments can be implemented by those skilled in the art without departing from the spirit and scope of the present invention. Accordingly, all such embodiments which fall within the spirit and the broad scope of the appended claims will be embraced by the principles of the present invention.

Claims (30)

What is claimed is:
1. A method for automatically determining optimal design parameters of a subsystem to meet design constraints, the subsystem comprising a plurality of circuits, the method comprising:
performing a parameter-delay curve optimization of the subsystem design parameters to determine the optimal design parameters.
2. The method of claim 1, wherein the parameter-delay curve is selected from the group comprising power-delay curves and area-delay curves.
3. The method of claim 1, wherein performing a parameter-delay curve optimization of the subsystem design parameters to determine the optimal design parameters comprises:
receiving a macro graph description of the subsystem;
extracting all possible paths through the macro graph;
generating all possible candidate binding solutions for the macro graph;
determining which of the possible candidate binding solutions are feasible;
generating constraints for each of the feasible candidate binding solutions; and
solving all constraints for each of the feasible candidate binding solutions to determine the optimal solution.
4. The method of claim 3, wherein said extracting all possible paths through the macro graph comprises:
determining each unique pathway from each input datapath block to each output datapath block in the macro graph.
5. The method of claim 3, wherein said generating all possible candidate binding solutions for the macro graph comprises:
determining an implementation for each datapath block in a pathway; and
associating each of the datapath blocks into a candidate binding solution for the pathway.
6. The method of claim 5, wherein said associating each of the datapath blocks into a candidate binding solution for the pathway comprises:
creating a piecewise linear approximation for each feasible candidate binding solution.
7. The method of claim 3, wherein said determining all feasible candidate binding solutions comprises:
obtaining a first parameter-delay curve for each of the datapath blocks in the candidate binding solution;
combining values from the first parameter-delay curves for the datapath blocks; and
comparing the combined values against a delay constraint value, wherein the candidate binding solution is feasible if the combined values are not greater than the delay constraint value.
8. The method of claim 7, wherein said delay constraint value comprises:
a sum of each specified datapath block delay constraint value for the pathway.
9. The method of claim 3, wherein said generating constraints for each of the feasible candidate binding solutions comprises:
creating constraints for each of the feasible candidate binding solutions using a specified delay.
10. The method of claim 9, wherein said solving all constraints for each of the feasible candidate binding solutions to determine the optimal solution comprises:
associating each of the piecewise linear approximations and the constraints for each feasible candidate binding solution in a linear program; and
solving the linear program to determine the optimal solution, wherein the optimal solution produces a minimal delay value for the subsystem and provides an optimal delay value for each of the datapath blocks.
11. A computer-readable medium having stored therein a computer program for automatically determining optimal design parameters of a subsystem to meet design constraints, the subsystem comprising a plurality of circuits, said computer program, when executed:
performs a parameter-delay curve optimization of the subsystem design parameters to determine the optimal design parameters.
12. The computer-readable medium of claim 11, wherein the parameter-delay curve is selected from the group comprising power-delay curves and area-delay curves.
13. The computer-readable medium of claim 11, wherein performing a parameter-delay curve optimization of the subsystem design parameters to determine the optimal design parameters comprises:
receiving a macro graph description of the subsystem;
extracting all possible paths through the macro graph;
generating all possible candidate binding solutions for the macro graph;
determining which of the possible candidate binding solutions are feasible;
generating constraints for each of the feasible candidate binding solutions; and
solving all constraints for each of the feasible candidate binding solutions to determine the optimal solution.
14. The computer-readable medium of claim 13, wherein said extracting all possible paths through the macro graph comprises:
determining each unique pathway from each input datapath block to each output datapath block in the macro graph.
15. The computer-readable medium of claim 13, wherein said generating all possible candidate binding solutions for the macro graph comprises:
determining an implementation for each datapath block in a pathway; and
associating each of the datapath blocks into a candidate binding solution for the pathway.
16. The computer-readable medium of claim 15, wherein said associating each of the datapath blocks into a candidate binding solution for the pathway comprises:
creating a piecewise linear approximation for each feasible candidate binding solution.
17. The computer-readable medium of claim 13, wherein said determining all feasible candidate binding solutions comprises:
obtaining a first parameter-delay curve for each of the datapath blocks in the candidate binding solution;
combining values from the first parameter-delay curves for the datapath blocks; and
comparing the combined values against a delay constraint value, wherein the candidate binding solution is feasible if the combined values are not greater than the delay constraint value.
18. The computer-readable medium of claim 17, wherein said delay constraint value comprises:
a sum of each specified datapath block delay constraint value for the pathway.
19. The computer-readable medium of claim 13, wherein said generating constraints for each of the feasible candidate binding solutions comprises:
creating constraints for each of the feasible candidate binding solutions using a specified delay.
20. The computer-readable medium of claim 19, wherein said solving all constraints for each of the feasible candidate binding solutions to determine the optimal solution comprises:
associating each of the piecewise linear approximations and the constraints for each feasible candidate binding solution in a linear program; and
solving the linear program to determine the optimal solution, wherein the optimal solution produces a minimal delay value for the subsystem and provides an optimal delay value for each of the datapath blocks.
21. A method for automatically determining an optimal delay allocation for datapath blocks of a subsystem, the subsystem comprising a plurality of circuits, the method comprising:
receiving a macro graph description of the subsystem;
extracting all possible paths through the macro graph;
generating all possible candidate binding solutions for the macro graph;
determining which of the possible candidate binding solutions are feasible;
generating constraints for each of the feasible candidate binding solutions; and
solving all constraints for each of the feasible candidate binding solutions to determine the optimal solution.
22. The method of claim 21, wherein said extracting all possible paths through the macro graph comprises:
determining each unique pathway from each input datapath block to each output datapath block in the macro graph.
23. The method of claim 21, wherein said generating all possible candidate binding solutions for the macro graph comprises:
determining an implementation for each datapath block in a pathway; and
associating each of the datapath blocks into a candidate binding solution for the pathway.
24. The method of claim 23, wherein said associating each of the datapath blocks into a candidate binding solution for the pathway comprises:
creating a piecewise linear approximation for each feasible candidate binding solution.
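A piecewise linear approximation of a parameter-delay curve might be built as follows. The sketch assumes the curve is available as sampled (delay, cost) points and that it is convex, so the approximation can be evaluated as the maximum over the segment lines; both assumptions, and the names `piecewise_linear` and `evaluate`, are illustrative rather than drawn from the claims.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (slope, intercept) of one linear piece


def piecewise_linear(points: List[Tuple[float, float]]) -> List[Segment]:
    """Build one (slope, intercept) pair per pair of consecutive samples."""
    pts = sorted(points)
    segments: List[Segment] = []
    for (d0, c0), (d1, c1) in zip(pts, pts[1:]):
        slope = (c1 - c0) / (d1 - d0)
        segments.append((slope, c0 - slope * d0))
    return segments


def evaluate(segments: List[Segment], delay: float) -> float:
    """For a convex curve, the approximation is the upper envelope of the lines."""
    return max(slope * delay + intercept for slope, intercept in segments)


# Hypothetical area-delay samples: area falls as the allowed delay grows.
segments = piecewise_linear([(1.0, 8.0), (2.0, 4.0), (4.0, 2.0)])
area_at_3 = evaluate(segments, 3.0)
```

Representing the curve as a set of lines is what lets it enter a linear program later: each segment becomes one inequality of the form cost >= slope * delay + intercept.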
25. The method of claim 21, wherein said determining all feasible candidate binding solutions comprises:
obtaining a first parameter-delay curve for each of the datapath blocks in the candidate binding solution;
combining values from the first parameter-delay curves for the datapath blocks; and
comparing the combined values against a delay constraint value, wherein the candidate binding solution is feasible if the combined values are not greater than the delay constraint value.
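The feasibility test of claim 25 can be sketched as below. The claim does not say how the curve values are combined; this example assumes the combination is the sum of each block's smallest achievable delay along one pathway. The curve data and the name `is_feasible` are hypothetical.

```python
from typing import Dict, List, Tuple

Curve = List[Tuple[float, float]]  # sampled (delay, cost) points for one block


def is_feasible(binding_curves: Dict[str, Curve], delay_constraint: float) -> bool:
    """Feasible if the best-case total delay does not exceed the constraint."""
    # Assumption: "combining values" means summing each block's minimum delay.
    best_case_delay = sum(min(delay for delay, _ in curve)
                          for curve in binding_curves.values())
    return best_case_delay <= delay_constraint


# Hypothetical curves for a two-block candidate binding solution.
curves = {
    "add": [(2.0, 10.0), (4.0, 6.0)],
    "mux": [(1.0, 3.0), (2.0, 2.0)],
}
feasible_at_3 = is_feasible(curves, 3.0)
feasible_at_2_5 = is_feasible(curves, 2.5)
```

A binding whose fastest possible configuration still misses the delay constraint can be discarded before any constraints are generated for it.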
26. The method of claim 24, wherein the first parameter-delay curves are selected from the group comprising power-delay curves and area-delay curves.
27. The method of claim 21, wherein said generating constraints for each of the feasible candidate binding solutions comprises:
creating constraints for each of the feasible candidate binding solutions using a specified delay.
28. The method of claim 23, wherein said solving all constraints for each of the feasible candidate binding solutions to determine the optimal solution comprises:
associating each of the piecewise linear approximations and the constraints for each feasible candidate binding solution in a linear program; and
solving the linear program to determine the optimal solution, wherein the optimal solution produces a minimal delay value for the subsystem and provides an optimal delay value for each of the datapath blocks.
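One plausible way to write the linear program for a single feasible candidate binding solution is shown below. Every symbol is an assumption introduced for illustration: B is the set of datapath blocks, P the set of extracted paths, d_i the delay allocated to block i, c_i its area or power cost, (a_{ik}, b_{ik}) the slope and intercept of segment k of block i's piecewise linear approximation, and T the specified delay. The claim does not fix the form of the objective; minimizing total area or power is assumed here from the title of the document.

```latex
\begin{aligned}
\min_{d,\,c}\quad & \sum_{i \in B} c_i \\
\text{s.t.}\quad  & c_i \ge a_{ik}\, d_i + b_{ik}, && i \in B,\; k = 1,\dots,K_i \\
                  & \sum_{i \in p} d_i \le T,      && p \in P \\
                  & d_i^{\min} \le d_i \le d_i^{\max}, && i \in B
\end{aligned}
```

Under this reading, the solution values d_i give the delay allocated to each datapath block, and comparing the optimal objective across the feasible candidate binding solutions selects the overall optimum.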
29. A system for automatically determining an optimal delay allocation for datapath blocks of a subsystem, the subsystem comprising a plurality of circuits, the system comprising:
a computer system; and
a computer program stored in the computer system, said computer program, when executed, automatically determines an optimal delay allocation for datapath blocks of a subsystem by performing a parameter-delay curve optimization of the subsystem design using linear programming.
30. The system of claim 29, wherein the parameter-delay curve is selected from the group comprising power-delay curves and area-delay curves.
US09/474,008 1999-12-28 1999-12-28 Method and system for determining optimal delay allocation to datapath blocks based on area-delay and power-delay curves Expired - Fee Related US6327552B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US09/474,008 US6327552B2 (en) 1999-12-28 1999-12-28 Method and system for determining optimal delay allocation to datapath blocks based on area-delay and power-delay curves

Publications (2)

Publication Number Publication Date
US20010032067A1 (en) 2001-10-18
US6327552B2 US6327552B2 (en) 2001-12-04

Family

ID=23881846

Family Applications (1)

Application Number Title Priority Date Filing Date
US09/474,008 Expired - Fee Related US6327552B2 (en) 1999-12-28 1999-12-28 Method and system for determining optimal delay allocation to datapath blocks based on area-delay and power-delay curves

Country Status (1)

Country Link
US (1) US6327552B2 (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040168097A1 (en) * 2003-02-24 2004-08-26 International Business Machines Corporation Machine Code Builder Derived Power Consumption Reduction
US20060203602A1 (en) * 2005-03-14 2006-09-14 Rambus, Inc. Self-timed interface for strobe-based systems
US7181565B2 (en) 2005-05-03 2007-02-20 Atmel Corporation Method and system for configuring parameters for flash memory
US20100118627A1 (en) * 2004-08-20 2010-05-13 Best Scott C Strobe-offset control circuit
EP2257900A2 (en) * 2008-02-05 2010-12-08 Nangate A/S Optimization of integrated circuit design and library
US8743635B2 (en) 2004-12-21 2014-06-03 Rambus Inc. Memory controller for strobe-based memory systems
CN109583070A (en) * 2018-11-23 2019-04-05 拓卡奔马机电科技有限公司 A kind of method and system, computer readable storage medium, terminal optimizing trimmed curve quality
CN112417609A (en) * 2020-12-15 2021-02-26 中国第一汽车股份有限公司 Steering transmission shaft optimization design method, computer equipment and storage medium
US11361133B2 (en) * 2017-09-26 2022-06-14 Intel Corporation Method of reporting circuit performance for high-level synthesis
US20230075565A1 (en) * 2021-09-07 2023-03-09 International Business Machines Corporation Signal pre-routing in an integrated circuit design

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6658581B1 (en) * 1999-03-29 2003-12-02 Agency Of Industrial Science & Technology Timing adjustment of clock signals in a digital circuit
US6578176B1 (en) * 2000-05-12 2003-06-10 Synopsys, Inc. Method and system for genetic algorithm based power optimization for integrated circuit designs
US6826733B2 (en) * 2002-05-30 2004-11-30 International Business Machines Corporation Parameter variation tolerant method for circuit design optimization
US7003747B2 (en) * 2003-05-12 2006-02-21 International Business Machines Corporation Method of achieving timing closure in digital integrated circuits by optimizing individual macros
US7085942B2 (en) * 2003-05-21 2006-08-01 Agilent Technologies, Inc. Method and apparatus for defining an input state vector that achieves low power consumption in a digital circuit in an idle state
US20050050494A1 (en) * 2003-09-02 2005-03-03 Mcguffin Tyson R. Power estimation based on power characterizations of non-conventional circuits
US7000204B2 (en) * 2003-09-02 2006-02-14 Hewlett-Packard Development Company, L.P. Power estimation based on power characterizations
US7657416B1 (en) * 2005-06-10 2010-02-02 Cadence Design Systems, Inc Hierarchical system design
US7840451B2 (en) * 2005-11-07 2010-11-23 Sap Ag Identifying the most relevant computer system state information
US7716618B2 (en) * 2006-05-31 2010-05-11 Stmicroelectronics, S.R.L. Method and system for designing semiconductor circuit devices to reduce static power consumption
CN103810354B (en) * 2014-03-11 2016-08-17 大连交通大学 The Optimization Design of cylinder roller bearing logarithm modification curve
US9378326B2 (en) 2014-09-09 2016-06-28 International Business Machines Corporation Critical region identification
CN108446423B (en) * 2018-01-30 2021-11-26 中国人民解放军国防科技大学 Process and parameter selection for optical element surface shape processing and application method thereof

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5555201A (en) * 1990-04-06 1996-09-10 Lsi Logic Corporation Method and system for creating and validating low level description of electronic design from higher level, behavior-oriented description, including interactive system for hierarchical display of control and dataflow information
US5500805A (en) * 1993-10-06 1996-03-19 Nsoft Systems, Inc. Multiple source equalization design utilizing metal interconnects for gate arrays and embedded arrays
US5612892A (en) * 1993-12-16 1997-03-18 Intel Corporation Method and structure for improving power consumption on a component while maintaining high operating frequency
JP3299842B2 (en) * 1994-05-19 2002-07-08 富士通株式会社 Method and apparatus for arranging and wiring semiconductor integrated circuits
US5880967A (en) * 1995-05-01 1999-03-09 Synopsys, Inc. Minimization of circuit delay and power through transistor sizing
US5619420A (en) * 1995-05-04 1997-04-08 Lsi Logic Corporation Semiconductor cell having a variable transistor width
TW305958B (en) * 1995-05-26 1997-05-21 Matsushita Electric Ind Co Ltd
US5774367A (en) * 1995-07-24 1998-06-30 Motorola, Inc. Method of selecting device threshold voltages for high speed and low power
JP3625923B2 (en) * 1995-09-28 2005-03-02 フジノン株式会社 Retro focus lens
US5910898A (en) * 1995-12-14 1999-06-08 Viewlogic Systems, Inc. Circuit design methods and tools
US5867397A (en) * 1996-02-20 1999-02-02 John R. Koza Method and apparatus for automated design of complex structures using genetic programming
US5838947A (en) * 1996-04-02 1998-11-17 Synopsys, Inc. Modeling, characterization and simulation of integrated circuit power behavior
US5835380A (en) * 1996-06-11 1998-11-10 Lsi Logic Corporation Simulation based extractor of expected waveforms for gate-level power analysis tool
US5768145A (en) * 1996-06-11 1998-06-16 Lsi Logic Corporation Parametrized waveform processor for gate-level power analysis tool
US5889685A (en) * 1996-08-02 1999-03-30 Cirrus Logic, Inc. Method and apparatus for automatically characterizing short circuit current and power consumption in a digital circuit

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7185215B2 (en) * 2003-02-24 2007-02-27 International Business Machines Corporation Machine code builder derived power consumption reduction
US20040168097A1 (en) * 2003-02-24 2004-08-26 International Business Machines Corporation Machine Code Builder Derived Power Consumption Reduction
US8311761B2 (en) 2004-08-20 2012-11-13 Rambus Inc. Strobe-offset control circuit
US11551743B2 (en) 2004-08-20 2023-01-10 Rambus, Inc. Strobe-offset control circuit
US10741237B2 (en) 2004-08-20 2020-08-11 Rambus Inc. Strobe-offset control circuit
US10056130B2 (en) 2004-08-20 2018-08-21 Rambus Inc. Strobe-offset control circuit
US20100118627A1 (en) * 2004-08-20 2010-05-13 Best Scott C Strobe-offset control circuit
US9111608B2 (en) 2004-08-20 2015-08-18 Rambus Inc. Strobe-offset control circuit
US8135555B2 (en) 2004-08-20 2012-03-13 Rambus Inc. Strobe-offset control circuit
US8688399B2 (en) 2004-08-20 2014-04-01 Rambus Inc. Strobe-offset control circuit
US10332583B2 (en) 2004-12-21 2019-06-25 Rambus Inc. Memory controller for strobe-based memory systems
US10861532B2 (en) 2004-12-21 2020-12-08 Rambus Inc. Memory controller for strobe-based memory systems
US11842760B2 (en) 2004-12-21 2023-12-12 Rambus Inc. Memory controller for strobe-based memory systems
US8743635B2 (en) 2004-12-21 2014-06-03 Rambus Inc. Memory controller for strobe-based memory systems
US9105325B2 (en) 2004-12-21 2015-08-11 Rambus Inc. Memory controller for strobe-based memory systems
US11450374B2 (en) 2004-12-21 2022-09-20 Rambus Inc. Memory controller for strobe-based memory systems
US9390777B2 (en) 2004-12-21 2016-07-12 Rambus Inc. Memory controller for strobe-based memory systems
US9728247B2 (en) 2004-12-21 2017-08-08 Rambus Inc. Memory controller for strobe-based memory systems
US9905286B2 (en) 2004-12-21 2018-02-27 Rambus Inc. Memory controller for strobe-based memory systems
US7688672B2 (en) * 2005-03-14 2010-03-30 Rambus Inc. Self-timed interface for strobe-based systems
US8295118B2 (en) 2005-03-14 2012-10-23 Rambus Inc. Self-timed interface for strobe-based systems
US20060203602A1 (en) * 2005-03-14 2006-09-14 Rambus, Inc. Self-timed interface for strobe-based systems
US7249215B2 (en) 2005-05-03 2007-07-24 Atmel Corporation System for configuring parameters for a flash memory
US7181565B2 (en) 2005-05-03 2007-02-20 Atmel Corporation Method and system for configuring parameters for flash memory
EP2257900A2 (en) * 2008-02-05 2010-12-08 Nangate A/S Optimization of integrated circuit design and library
EP2257900A4 (en) * 2008-02-05 2012-10-17 Nangate As Optimization of integrated circuit design and library
US11361133B2 (en) * 2017-09-26 2022-06-14 Intel Corporation Method of reporting circuit performance for high-level synthesis
CN109583070A (en) * 2018-11-23 2019-04-05 拓卡奔马机电科技有限公司 A kind of method and system, computer readable storage medium, terminal optimizing trimmed curve quality
CN112417609A (en) * 2020-12-15 2021-02-26 中国第一汽车股份有限公司 Steering transmission shaft optimization design method, computer equipment and storage medium
US20230075565A1 (en) * 2021-09-07 2023-03-09 International Business Machines Corporation Signal pre-routing in an integrated circuit design

Also Published As

Publication number Publication date
US6327552B2 (en) 2001-12-04

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTEL CORPORATION, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:NEMANI, MAHADEVAMURTY;BAEZ, FRANKLIN;REEL/FRAME:010515/0872

Effective date: 19991220

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

CC Certificate of correction
FPAY Fee payment

Year of fee payment: 4

REMI Maintenance fee reminder mailed
LAPS Lapse for failure to pay maintenance fees
STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20091204