WO2005072054A2 - Predictable design of low power systems by pre-implementation estimation and optimization - Google Patents

Predictable design of low power systems by pre-implementation estimation and optimization

Info

Publication number
WO2005072054A2
Authority
WO
WIPO (PCT)
Prior art keywords
design
power
hardware
algorithmic
architecture
Prior art date
Application number
PCT/IB2005/000962
Other languages
English (en)
Inventor
Wolfgang Nebel
Ansgar Stammermann
Domenik Helms
Eike Schmidt
Milan Schulte
Lars Kruse
Gerd Von Colln
Arne Schulz
Original Assignee
Chipvision Design Systems Ag
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chipvision Design Systems Ag
Priority to PCT/IB2005/000962
Priority to US11/044,646 (US7725848B2)
Priority claimed from US11/044,646 (US7725848B2)
Publication of WO2005072054A2

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/30 Circuit design
    • G06F30/32 Circuit design at the digital level
    • G06F30/33 Design verification, e.g. functional simulation or model checking
    • G06F30/3308 Design verification, e.g. functional simulation or model checking using simulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/30 Circuit design
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/30 Circuit design
    • G06F30/32 Circuit design at the digital level
    • G06F30/33 Design verification, e.g. functional simulation or model checking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/06 Power analysis or power optimisation

Definitions

  • This invention relates to the design of low power circuits and systems.
  • the invention features a method of designing a low power circuit that implements specified functionality.
  • the method includes: analyzing code for an algorithmic description of the specified functionality to generate a design representation of the circuit at the algorithmic level; instrumenting the code for the algorithmic description so as to capture data streams during execution/simulation of the algorithmic description; executing/simulating the instrumented code to generate an activity profile for the algorithmic description, wherein said activity profile includes at least portions of data traces of at least a portion of the executed algorithmic description; using the design representation to generate an initial hardware design that is an estimate of hardware resources necessary to implement at least some of the specified functionality;
  • the activity profile includes a complete data trace of at least a portion of the executed algorithmic description or it includes samples of a data trace of at least a portion of the executed algorithmic description.
  • the generated initial hardware design is an estimate of hardware resources necessary to implement all of the specified functionality.
  • the method also includes, based on the activity profile and the initial hardware design, iteratively modifying a derived hardware design to arrive at a final hardware design that is optimized for low power, wherein iteratively modifying involves using the activity profile to compute a power consumption estimation for the derived hardware design at multiple iterations of the design. Iteratively modifying involves using the activity profile to compute a power consumption estimation for the derived hardware design at each iteration of the design.
  • Iteratively modifying involves using the activity profile and power models for the hardware resources to compute a power consumption estimation for the derived hardware design at each iteration of the design.
  • the design representation is a control data flow graph.
  • Using the design representation to generate the initial hardware design involves using the design representation to allocate and schedule operations of the algorithmic description among the hardware resources.
  • the algorithmic description does not carry implementation-related information.
  • the activity profile is generated without reference to a hardware description.
  • the activity profile contains (1) information about the sequence of an execution of statements of the algorithmic description; and (2) information about data processed by those statements.
  • the initial hardware design includes a schedule, a set of allocated resources, an initial binding, and a floorplan.
  • the method also includes outputting the final hardware design that is optimized for low power.
  • algorithmic level automatically instrument the code for the algorithmic description so as to capture data streams during execution of the algorithmic description; and execute the instrumented code to generate an activity profile for the algorithmic description, wherein said activity profile includes data traces of at least a portion of the executed algorithm; (2) an architecture estimator module programmed to cause the processor system to use the design representation and a target architecture template to generate an initial hardware design that is an estimate of hardware resources necessary to implement the specified functionality; and (3) a power estimator module programmed to cause the processor system to compute power consumption for the initial hardware design based on the activity profile and power models for the hardware resources.
  • Fig. 10 shows the data streams for the inputs of the combined adder.
  • The upper part of Fig. 1 is applied when a pre-implementation power estimate is needed.
  • the architecture of the design is unknown yet and the set of components to be allocated to implement the device is still to be determined. Consequently, neither their interconnection and communication structure nor their activation patterns are defined.
  • An evaluation of Eqs. 1 and 2 is not possible yet, even if higher-level capacitance and activation models were to be applied.
  • ASIC vendors can provide average power figures for logic in a given technology based on their experience and characterizations. This figure will come in terms of Milliwatts per Megahertz per kilogate and needs to be weighted with an activity ratio expected for the application. The number of registers can be a useful input to an estimate of the clock tree power.
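As a back-of-the-envelope illustration of the vendor-figure estimate described above, the following sketch multiplies a power figure in mW/MHz/kgate by the clock frequency, the gate count, and an expected activity ratio. The function name, the multiplicative weighting, and all numeric values are assumptions made for illustration; they are not taken from the patent or from any vendor data.

```python
def logic_power_mw(mw_per_mhz_per_kgate, clock_mhz, kilogates, activity_ratio):
    """Rough average logic power in mW (all inputs are hypothetical)."""
    if not 0.0 <= activity_ratio <= 1.0:
        raise ValueError("activity_ratio must be in [0, 1]")
    # Vendor figure scaled by frequency and size, then weighted by activity.
    return mw_per_mhz_per_kgate * clock_mhz * kilogates * activity_ratio


# Hypothetical design: 0.02 mW/MHz/kgate, 100 MHz, 50 kgates, 25 % activity.
estimate = logic_power_mw(0.02, 100.0, 50.0, 0.25)
print(f"estimated logic power: {estimate:.2f} mW")
```

Such a figure covers logic only; clock tree and interconnect power would have to be estimated separately.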
  • interconnect power can exhibit a significant percentage of the system power. Analyzing the power consumption of interconnect requires input of physical layout and material properties. This can be partly available for a platform based on measurements or simulations. Off-chip interconnect capacitive loads which can easily be several orders of magnitude larger than on-chip loads, can be derived from the system specification. The power analysis becomes more difficult for on-chip interconnect and in case of complex bus encoding schemes.
  • the design tasks at the algorithmic-level of abstraction include optimizing algorithms, which are to be implemented either by software, application specific hardware, or by a combination of both.
  • the objectives include performance, cost and power optimizations.
  • Means of improvements include selection of the most suitable algorithm performing the requested function, optimizing this algorithm, and partitioning the algorithm into parts, which will finally be implemented in software, and others, which will be realized by hardware.
  • the algorithm can formally be represented by a Control and Data Flow Graph (CDFG) [Ref. 21: Girczyc, E.F., and Knight, J.P., An ADA to Standard Cell Hardware Compiler Based on Graph Grammars and Scheduling, in Proc. IEEE Int. Conf. on Computer Design, October, 1984.], [Ref. 22: Raghunathan, A., and Jha, N.K., Behavioral Synthesis for Low Power, in Proc. IEEE Int. Conf. on Computer Design, October, 1994.].
  • the vertices of the CDFG represent either the arithmetic or logic statements of the algorithm, or the control statements.
  • the edges model the data and control flow dependencies.
  • a CDFG implies a partial order on the execution of the statements as required by the data and control dependencies of the algorithm.
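The partial order implied by a CDFG can be made concrete with a small sketch. It assumes a hypothetical three-statement algorithm (`t1 = a + b; t2 = c * d; y = t1 - t2`) and stores each vertex together with the vertices it depends on; the vertex names are illustrative only. Any topological order of this graph is a legal execution order.

```python
from graphlib import TopologicalSorter

# Hypothetical CDFG: each vertex maps to the set of its predecessors
# (the statements whose results it consumes).
cdfg = {
    "add_t1": set(),                 # t1 = a + b
    "mul_t2": set(),                 # t2 = c * d
    "sub_y": {"add_t1", "mul_t2"},   # y = t1 - t2
}

# One execution order that respects all data dependencies.
order = list(TopologicalSorter(cdfg).static_order())
print(order)
```

The addition and the multiplication are unordered with respect to each other, which is exactly the freedom a scheduler can later exploit.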
  • different programs, to select processors, and to optimize the software.
  • the different levels offer a trade-off in accuracy versus effort to generate the power reports.
  • This number can be estimated by mapping the source code on instruction classes, which have been empirically characterized with respect to the instructions per cycle of each class.
  • the total energy E program needed for the execution of a program, can thus be estimated using Eq. 4 with T execution being the total execution time of the program and E proc the energy per Megahertz clock frequency of the processor. The accuracy of the approach is within 20% of an instruction level power analysis.
  • $E_{program} = T_{execution} \cdot E_{proc} \cdot f$ (Eq. 4)
  • Simunic et al. [Ref. 25] could achieve a power reduction of more than 50% by replacing an L2 cache by a burst SDRAM while even improving the throughput of a signal processing system.
  • the disadvantage of working at this low level is the long execution time of the simulation.
  • Eq. 1 is replaced by Eq. 5 [see, Ref. 27: Mehra, R., and Rabaey, J., Behavioral Level Power Estimation and Exploration, in Proc. First Int. Workshop on Low Power Design, Napa Valley, April, 1994.]:
  • algorithmic-level power analysis includes the following general operations: architecture estimation (scheduling, allocation, binding of operations and memory accesses, communication architecture estimation including wire length prediction), activation estimation, and power model evaluation.
  • The structure of an algorithmic-level power estimator tool that performs these functions is shown in Fig. 5.
  • the main challenge of algorithmic-level power estimation for hardware implementations is the difficulty of predicting the structural and physical properties of a yet to be designed power optimized circuit.
  • Existing approaches to solve this problem rely on a power optimizing architectural synthesis of the design before power analysis.
  • the accuracy of the power analysis depends on how well the assumed architecture matches the final architecture.
  • This final architecture is subject to many parameters, e.g. the design style specific architecture templates, which are the main differentiating factors in times of fabless semiconductor vendors, or the tool chain applied at the later phases of the design process (RT-level synthesis, floorplanning, routing, clock tree generation).
  • an architecture estimator should either consider the design flow and style applied to the real design, or generate an architecture of such high quality that it can be implemented without further changes.
  • FIG. 6 shows such a generic target architecture for the HW implementation of a CDFG. It consists of three parts: the datapath, which implements the dataflow of the CDFG, the controller, which organizes the dataflow and the control flow, and the clock tree. It is the task of the architecture synthesis to schedule the operations under timing and resource constraints, as well as to allocate the required resources in terms of operation units.
  • the operation units can be arithmetic or logic modules as well as memories.
  • architecture synthesis is the set of operation units and registers allocated as well as the steering logic, which implements the data transfer connections between the operation units and the registers.
  • the second output is the controller, which is a state machine generating the necessary control signals to steer the multiplexers, operation units, and enable signals of the registers. To be able to do so, it needs to implement the control flow and the schedule based on the status signals of comparator operation units in the datapath.
  • Early work on architectural synthesis for low power has analyzed the impact of binding and allocation during high-level synthesis on the power consumption and integrated power optimizations into high level synthesis tools [Ref. 22], [Ref. 30: Martin, R.S., and Knight, J.P., Power-Profiler: Optimizing ASICs Power Consumption at the Behavioral Level, in Proc. Design Automation Conference, San Francisco, June, 1995.].
  • the schedule of a datapath defines at which control step each of the operations is performed. It has an impact on the power consumption. It defines the level of parallelism in the datapath and hence the number of required resources.
  • the schedule determines the usage of pipelining and chaining. While pipelining can be a means to reduce power by isolating the propagation of unnecessary signal transitions even within one operation unit, chaining causes the propagation of such glitches through several operation units in one clock cycle and hence increases the power consumption. Musoll et al. [Ref. 31: Musoll, E. and Cortadella, J., Scheduling and Resource Binding for Low Power, in Proc. Int.
  • the operations of the CDFG.
  • Several operations can be assigned to the same operation unit if they are scheduled into disjoint control steps and the operation belongs to a subset of the operations which can be implemented by the same unit.
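The sharing condition stated above can be sketched as a simple predicate: a group of operations may be bound to one unit only if the unit supports every operation type in the group and no two operations occupy the same control step. The `Operation` class, the function name, and the example operations are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    name: str
    kind: str          # operation type, e.g. "add", "sub", "mul"
    steps: frozenset   # control steps occupied by this operation


def can_share(ops, unit_kinds):
    """True if all ops can be bound to a single unit supporting unit_kinds."""
    if any(op.kind not in unit_kinds for op in ops):
        return False           # the unit cannot implement some operation
    used = set()
    for op in ops:
        if used & op.steps:
            return False       # two operations overlap in a control step
        used |= op.steps
    return True


add1 = Operation("add1", "add", frozenset({0}))
sub1 = Operation("sub1", "sub", frozenset({1}))
add2 = Operation("add2", "add", frozenset({1}))

alu = {"add", "sub"}    # hypothetical ALU supporting both types
adder = {"add"}         # plain adder

print(can_share([add1, sub1], alu))     # disjoint steps, both types supported
print(can_share([add1, sub1], adder))   # adder cannot subtract
print(can_share([sub1, add2], alu))     # both scheduled in step 1
```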
  • These operation units are pre-designed and power-characterized modules, like multipliers, memories, adders, ALUs, comparators, subtractors etc.
  • the valid set of target units of the resource binding depends on the set of operations these units can perform. This opens further possibilities for power optimization, because more than one type of operation unit can be chosen as target unit, influencing the resulting power consumption. For example, an addition can be bound to a carry-look-ahead adder, a carry-save adder or an ALU. Similarly, variables and arrays can be mapped to registers or memories. Typically, arrays will be mapped to memories while single variables will be mapped to registers.
  • the resource allocation and binding affects the power consumption of the datapath due to several effects.
  • the power consumption of each operation unit strongly depends on the switching activity of its inputs.
  • the internal data applied to the operation units will usually not be independent, but highly correlated in a similar way over a wide range of input data.
  • Applying consecutive input data of high correlation to an operation unit reduces its power consumption.
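To illustrate why binding matters for switching activity, the sketch below compares two slowly changing (highly correlated) data streams when each has its own operation unit against the case where one shared unit sees them interleaved. The streams are made-up 8-bit values and the helper names are assumptions; the average number of toggling bits between consecutive inputs serves as the activity measure.

```python
def hamming(a, b):
    """Number of bit positions in which two integers differ."""
    return bin(a ^ b).count("1")


def avg_hamming(stream):
    """Average Hamming distance between consecutive values of a stream."""
    pairs = list(zip(stream, stream[1:]))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


# Two hypothetical, internally correlated input streams.
x = [0b00000001, 0b00000011, 0b00000010, 0b00000110]
y = [0b11110000, 0b11110001, 0b11110011, 0b11110010]

# A shared unit alternates between the two streams: x0, y0, x1, y1, ...
shared = [value for pair in zip(x, y) for value in pair]

print("separate units:", avg_hamming(x), avg_hamming(y))
print("shared unit:   ", avg_hamming(shared))
```

In this example the interleaving destroys the correlation within each stream, so the shared unit sees far more input toggles per activation than either dedicated unit would.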
  • An established measure for the input switching activity is the average Hamming Distance of a sequence of input patterns [Ref. 28]. Analyzing the input streams of the operations allows assigning the operations to operation units in a power optimized way by exploiting these data correlations. Since this assignment is an NP-complete problem, different heuristics have been proposed. Khouri et al. [Ref.
  • an algorithmic specification and its CDFG may contain calls to non-standard functions, e.g. combinational logic functions, which are defined by their input/output behavior. Since these are not part of the power-characterized library, they require a special treatment during algorithmic-level power estimation. Two main approaches are possible in principle: synthesis or complexity estimation.
  • will be the effort to compute this value.
  • the entropy of a module output can thus be used as an indicator of its computing power consumption [Ref. 35: Marculescu, D., Marculescu, R., and Pedram, M., Information Theoretic Measures of Energy Consumption at Register Transfer Level, in Proc. International Symposium on Low Power Electronics and Design, Dana Point, April, 1995.], [Ref. 36: Nemani, M., and Najm, F.N., High-Level Area and Power Estimation for VLSI Circuits, in Proc. Int. Conference on Computer Aided Design, San Jose, November, 1997.], [Ref.
  • the input parameters for the regression include: the number of states, inputs, outputs, and the state coding as well as the input signal probabilities, which can be extracted from the schedule, the status and control signals.
  • can be separately analyzed or estimated once they have been allocated and their input activity was captured.
  • the power consumption of the communication between these components and their synchronization by the clock requires physical information of the placement of these components and their interconnect as well as their clock tree. It is important to consider the effect of the interconnect on the total power consumption during the different steps of the architecture definition.
  • a power aware interconnect design can significantly reduce the total power consumption of the system. For example, [Ref. 38: Zhong, L., and Jha, N.K., Interconnect-aware High-level Synthesis for Low Power, in Proc. Conference on Computer Aided Design, San Jose, November, 2002.] and [Ref.
  • This interconnect aware power optimization requires an estimation technique for the interconnect and its power consumption.
  • the interconnect power models applied so far are based on the switched capacitance of the wires, as formulated in Eq. 1. For a global power estimate it is sufficient to estimate the total switched capacitance.
  • Empirical wire models like Rent's Rule [Ref. 40: Christie, P., and Stroobandt, D., The Interpretation and Application of Rent's Rule, IEEE Transactions on VLSI Systems, Vol. 8, No. 6, December, 2000.] can be applied to predict the number and average length of wires.
  • this average figure is too pessimistic.
  • Such a power optimal floorplan will locate components, which are communicating at a high data rate as close together as possible and thus save power.
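The idea of placing heavily communicating components close together can be sketched with a toy one-dimensional floorplan. The sketch assumes modules sit in a single row, that wire length equals the slot distance, and that the cost of a connection is its data rate times its length; the module names and rates are invented, and exhaustive search is used only because the example is tiny.

```python
from itertools import permutations


def wire_cost(order, traffic):
    """Sum of (data rate x slot distance) over all connections."""
    pos = {module: slot for slot, module in enumerate(order)}
    return sum(rate * abs(pos[a] - pos[b]) for (a, b), rate in traffic.items())


def best_linear_floorplan(modules, traffic):
    """Exhaustively search all row orderings (feasible only for few modules)."""
    return min(permutations(modules), key=lambda order: wire_cost(order, traffic))


# Hypothetical relative data rates between module pairs.
traffic = {("alu", "reg"): 0.9, ("reg", "mem"): 0.1, ("alu", "mul"): 0.2}
modules = ["alu", "mem", "mul", "reg"]

plan = best_linear_floorplan(modules, traffic)
print(plan, wire_cost(plan, traffic))
```

Real floorplanning is two-dimensional and far too large for enumeration, which is why the approaches discussed in this section rely on heuristic moves.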
  • the capacitance and switching activity of individual wires should be known.
  • the problem of interconnect power estimation is to estimate the capacitance and activity of each wire of a RT-level architecture. Since the activity can be derived from the activity data of the modules, which have been discussed previously, the remaining problem is to estimate the wire capacitance which is primarily determined by the wire's length, the physical layers used to implement the wire, and the number of vias.
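A per-wire estimate along these lines can be sketched as follows. Eq. 1 is not reproduced in this section, so the sketch assumes the standard switched-capacitance form P = ½ · α · C · V_dd² · f. The capacitance helper uses a simple linear dependence on length plus a per-via term, which is a simplification chosen for brevity; all parameter names and values are hypothetical.

```python
def wire_capacitance_f(length_um, cap_per_um_f, via_count=0, cap_per_via_f=0.0):
    """Wire capacitance in farads (assumed linear in length, plus vias)."""
    return length_um * cap_per_um_f + via_count * cap_per_via_f


def wire_power_w(activity, capacitance_f, vdd_v, clock_hz):
    """Dynamic power of one wire: 0.5 * activity * C * Vdd^2 * f (assumed form)."""
    return 0.5 * activity * capacitance_f * vdd_v ** 2 * clock_hz


# Hypothetical wire: 500 um long, 0.2 fF/um, two vias of 1 fF each.
cap = wire_capacitance_f(500.0, 0.2e-15, via_count=2, cap_per_via_f=1e-15)
power = wire_power_w(activity=0.3, capacitance_f=cap, vdd_v=1.2, clock_hz=200e6)
print(f"C = {cap:.3e} F, P = {power:.3e} W")
```

Summing `wire_power_w` over all wires, with activities taken from the module activity data, would give the interconnect contribution to the total estimate.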
  • The wire length depends on the location of the modules of the design on the floorplan and the routing structure between the modules.
  • the capacitance of a wire typically correlates with the wire's length in a non-linear way [Ref. 41: Stammermann, A., Helms, D., Schulte, M., Schulz, A., and Nebel, W., Binding, Allocation and Floorplanning in Low Power High-Level Synthesis, in Proc. Int. Conference on Computer Aided Design, San Jose, November 2003.].
  • the main problem remaining is to calculate the expected length of each wire in a power-optimized floorplan. This requires including floorplanning and routing into the estimation.
  • high-level synthesis consists of the phases: allocation, scheduling, and binding, which are typically performed in a sequential manner.
  • High-level synthesis for low power adds a further step: interconnect optimization.
  • Each of these steps is an NP-complete problem; thus the entire problem is NP-complete, i.e., a guaranteed optimal solution cannot be found in reasonable computation time.
  • An optimal design can further not be achieved by applying these steps sequentially, basically, because the optimizations are not independent. Consequently, since power analysis at this level requires a detailed understanding of the target architecture, heuristics are needed to synthesize such architecture in a power-optimized way under simultaneous consideration of allocation, scheduling, binding, and floorplanning.
  • Prabhakaran et al. [Ref. 42] apply moves changing the schedule and the binding. Before evaluating the cost function, they perform a floorplanning step during each iteration. Zhong et al. [Ref. 38] use allocation and binding moves followed by a floorplanning for cost estimation.
  • Stammermann et al. [Ref. 41] include allocation, binding and floorplanning moves in their optimization heuristics (see Fig. 7).
  • the upper part of the figure shows the outer loop of the optimization, during which binding and allocation moves are performed. If, based on a preliminary power estimate, a binding/allocation move is promising, then the floorplan is updated and optimized by several floorplan moves in an inner loop, as shown in the lower part of Fig. 7. In this case, the moves consist of resource allocation (sharing/splitting) and binding moves as well as floorplan related moves.
  • the results show a significant improvement compared to interconnect unaware power optimization. (This is further described in German Patent Application DE 103 24 565.0 filed May 30, 2003 and corresponding U.S. Patent Application U.S.S.N. 10/857,212, both of which are incorporated herein by reference.)
  • the floorplan and the allocated registers are also the basis for the generation of a clock tree model, which can be used for clock power prediction.
  • the result is a power-optimized architecture automatically generated from an algorithmic-level description.
  • the expected power consumption of this architecture is analyzed during the optimization loops. This power figure will have a high relative accuracy and can serve as an estimate of the power consumption of the input algorithm. It can be taken as a guide for optimizing the input algorithm for low power.
  • the power estimate is a good prediction of the absolute power consumption to be expected for the design.
  • the tool includes three main components, namely, an activity estimator 50, an architecture estimator 52, and a power calculation module or power estimator 54. These elements combined with a library of power models for macro modules 68 make up the key elements of the algorithmic-level power estimation flow.
  • Activity estimator 50 takes as input an algorithmic or pre-implementation specification 56, which is, for example, in the C/C++ or System C languages, and a testbench specification 58; it analyzes and automatically instruments the algorithmic specification; and it generates a Control Data Flow Graph (CDFG) 60, which is needed for optimization, and an activity profile 62 that contains complete data traces of at least portions of the algorithm.
  • the code instrumentation inserts protocol statements, which capture activity of the algorithm during execution.
  • Architecture estimator 52, using constraints supplied by a constraints library 66 and component models from a component library 68, iteratively estimates or predicts an architecture 70 based on power calculations performed by the power estimator 54.
  • Power estimator 54 combines the activity profile 62, which was generated from the instrumented algorithm, with the predicted architecture 70 from architecture estimator 52 and the component models from components library 68 to estimate power consumption of the predicted architectures. The power estimations are fed back to architecture estimator 52 to enable iterative modifications toward a more optimal low power design.
  • the power estimation tool is largely implemented by code which runs on a platform that includes one or more processors or CPUs, local memory, and data storage for storing the code which implements the functionality described herein.
  • the tool provides a computational framework and user interface which integrates the techniques presented above into an EDA tool.
  • the power estimation tool provides a language front-end which allows reading algorithmic specifications in a suitable language.
  • the front-end includes a language parser which extracts the CDFG 60 of the algorithm and automatically instruments the source code. The system-level designer then executes the instrumented source code 57 with application stimuli or other representative testbenches 58.
  • execution the values of the variables and the input and output vectors or operations are captured in an activity file. This activity can be attributed to the respective resources of the datapath and interconnect for later power calculation.
  • The top right hand part of Fig. 5 represents the architecture estimation function. It is key that the estimated architecture is optimized for low power. Hence, it constructs a datapath and respective controller that minimizes the switching activity. As described above, iterative optimization techniques have to be applied to generate a power efficient resource allocation, scheduling, binding, and floorplanning. This iterative procedure employs a feed-back from intermediate power estimates of the temporary solutions as indicated in Fig. 5.
  • the accuracy of the power analysis depends on how well the assumed architecture matches the final architecture.
  • This final architecture is subject to many parameters, e.g. the design style specific architecture templates which are the main differentiating factors in times of fabless semiconductor vendors, or the tool chain applied at the later phases of the design process (RT-level synthesis, floorplanning, routing, clock tree generation).
  • an architecture estimator should either consider the design flow and style applied to the real design, or generate an abstract architecture of such high quality that it can be implemented without further global changes, however, without limiting local optimizations.
  • the architecture output contains such a description of the architecture.
  • Activity estimator operates before hardware has been inferred. Thus, the activity is independent of the concrete hardware that is inferred. This means that in evaluating the different design options one can change the inference of the hardware without having to perform another simulation of the system. It is straightforward to estimate the activity of the algorithm simply by executing the algorithm and sampling the activity of the variables and operations of the algorithm. This process is automated by an automatic instrumentation of the source code. This instrumentation takes care of capturing the data streams during execution.
  • Pre-implementation specification 56 is a specification of a system or part of a system on a level of abstraction that does not incorporate aspects of its eventual hardware implementation.
  • the input specification for algorithmic-level power estimation is typically an executable specification in terms of a programming or system-level design language, e.g. C or SystemC.
  • the eventual goal of the tool is to estimate the power consumed by a hardware implementation of the system specified.

Testbench:

  • Testbenches are prominent concepts in hardware description languages.
  • a testbench models the environment of a system (as described in pre-implementation specification 56) and its interaction with the system. In state of the art design flows, it is used for the purpose of verifying the system by means of simulation.
  • the testbench injects stimuli (i.e. data) into the system under test and reports and/or asserts the system's responses.
  • the testbench fully defines the dynamic behaviour of a deterministic system under test. Different testbenches, on the other hand, can stimulate the system under test in a different way and lead to different behavior.
  • pre-implementation specification 56 and the testbench 58 form one joint executable specification. Quite often testbenches read their stimuli from a disk file. Herein this file is considered to be part of the definition of the testbench.
  • During source code analysis the (lexical and) syntactic content of pre-implementation specification 56 is analyzed. This process is called parsing. During parsing control and dataflow is extracted from the specification and brought into a
  • Code instrumentation is a technique to obtain dynamic information about the execution of computer code. The basic concept is to add statements to the original code to log such information and then execute this "instrumented" version of the code 57. Code instrumentation is a well-established technique, for example widely used for code profiling (i.e. analyzing the execution time and frequency of code passages).
  • This phase involves translating testbench 58 and instrumented code 57 into an executable/simulatable form and the subsequent execution/simulation. Due to the instrumentation, this step will produce the same output as the original specification plus a profile of the activity of the circuit as background files, i.e., activity profile 62.
  • Activity profile 62 contains: (1) information about the sequence of execution of statements of the original specification; and (2) information about the data processed by these statements. It has the form of one or several files. Note that while item (1) is a typical type of information for instrumentation, item (2) is rather specific to the problem of power estimation, as it is necessary to estimate the switching activity in the circuits here. Note that both kinds of information come from the execution/simulation of an abstract, i.e. not implementation-related, specification. The activity observed in an eventual hardware implementation of the specification can therefore not be trivially deduced from this activity profile.
  • the refinement process from an algorithmic description to a register transfer level architecture is called behavioural synthesis or high-level synthesis (e.g. [Ref. 17]).
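A minimal sketch of what instrumented code and the resulting activity profile might look like is given below. It assumes a hypothetical two-tap filter as the algorithmic specification and a hand-written `trace` helper standing in for the automatically inserted logging statements; the patent's tool targets C/C++ or SystemC and generates the instrumentation automatically, so this Python version only illustrates the principle. Each profile entry records which statement executed (item 1) and the data it processed (item 2).

```python
import operator

# In-memory stand-in for the activity file: one entry per executed statement.
activity_profile = []


def trace(stmt_id, op, *operands):
    """Execute one operation and log its statement id, operands, and result."""
    result = op(*operands)
    activity_profile.append((stmt_id, operands, result))
    return result


def fir2(samples, c0, c1):
    """Instrumented version of: out[n] = c0 * x[n] + c1 * x[n-1]."""
    out, prev = [], 0
    for x in samples:
        p0 = trace("mul0", operator.mul, c0, x)
        p1 = trace("mul1", operator.mul, c1, prev)
        out.append(trace("add0", operator.add, p0, p1))
        prev = x
    return out


# The "testbench": stimuli applied to the instrumented specification.
outputs = fir2([1, 2, 3], 2, 1)
for entry in activity_profile:
    print(entry)
```

The instrumented function returns the same outputs as the original algorithm would; the profile is produced as a side effect and contains no reference to any hardware.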
  • the three main problems to be solved during high-level synthesis are: scheduling, allocation and binding.
  • Scheduling is the process of fixing the execution time for each operation.
  • Allocation refers to defining the number of hardware functional units of each type.
  • Binding is the process of assigning a number of operations to each hardware functional unit.
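The three sub-problems can be illustrated on a toy example. The sketch below assumes that every operation takes exactly one control step and that no resource constraints apply, so an as-soon-as-possible (ASAP) schedule is used; allocation then follows from the peak per-step demand for each operation type, and binding assigns unit instances greedily. The operation graph and all names are hypothetical, and this is not the power-driven heuristic discussed in the text.

```python
from collections import defaultdict

# Hypothetical operations: name -> (type, list of predecessor operations).
ops = {
    "a1": ("add", []),
    "a2": ("add", []),
    "m1": ("mul", ["a1", "a2"]),
    "a3": ("add", ["m1"]),
}


def asap_schedule(ops):
    """Scheduling: earliest control step for each single-cycle operation."""
    step = {}

    def visit(name):
        if name not in step:
            preds = ops[name][1]
            step[name] = 1 + max((visit(p) for p in preds), default=-1)
        return step[name]

    for name in ops:
        visit(name)
    return step


def allocate(ops, step):
    """Allocation: units per type = peak number of same-type ops in one step."""
    usage = defaultdict(int)
    for name, (kind, _) in ops.items():
        usage[(kind, step[name])] += 1
    units = defaultdict(int)
    for (kind, _), count in usage.items():
        units[kind] = max(units[kind], count)
    return dict(units)


def bind(ops, step):
    """Binding: assign each operation to a unit instance of its type."""
    next_free = defaultdict(int)   # (type, step) -> next unused unit index
    binding = {}
    for name, (kind, _) in ops.items():
        key = (kind, step[name])
        binding[name] = f"{kind}{next_free[key]}"
        next_free[key] += 1
    return binding


schedule = asap_schedule(ops)
allocation = allocate(ops, schedule)
binding = bind(ops, schedule)
print(schedule, allocation, binding)
```

Here the third addition reuses the first adder because it runs in a different control step, which is the sharing opportunity that power-aware binding tries to exploit selectively.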
  • the architecture estimator predicts the power relevant aspects of the result of high-level synthesis for the specification initially given. It does not necessarily generate a complete architecture in full detail.
  • the target architecture template includes: a data path made up of hardware functional units like arithmetic units, logic and memory; a controller that coordinates activity of the hardware functional units; registers for storing the intermediate values; and a crossbar switch network for interconnecting the units.
  • each operation node of the CDFG is assigned to: • exactly one control step, or • in case of chaining, to an execution position within one control step, or • in case of pipelining, to a sequence of control steps.
  • The valid set of target units of the resource binding depends on the set of operations these units can perform. This opens further possibilities for power optimization, because more than one type of operation unit can be chosen as a target unit, and the choice influences the resulting power consumption. For example, an addition can be bound to a carry-look-ahead adder, a carry-save adder or an ALU. Similarly, variables and arrays
  • The component library contains models of the timing, area, and power consumption of the hardware functional units. It is used to determine the different cost aspects of design solutions.
  Power Models:
  • The operation units used in the generated architecture are pre-designed and power-characterized modules, such as multipliers, memories, adders, ALUs, comparators, subtractors, etc.
  • These models can be generated by simulation and power characterization based on lower-level power analysis tools and appropriate power models [Ref. 28], [Ref. 29].
  • These power models should be parameterized with respect to structural aspects, e.g. bit-width, and activity parameters.
  • The Hamming distance between consecutive input vectors has proven to be a reliable parameter to capture the input activity for such modules.
  • Higher-order functions of the switching probability distribution of input signals, e.g. moments, have been applied as parameters of high-level power models for macro modules [Ref.
  • Interconnect power depends on the topology of individual wires and their activity.
  • Power models for interconnects are parameterized by the wire length and the signal activity. These models need to be calibrated with respect to the placement and routing tools used as well as the process technology. Such empirical models can include estimators for wire topology and the number of vias [Ref. 29].
  Architecture:
  • The architecture prediction generates the following information for each operation of the source code: (1) the clock step in which the execution of the operation is started ("schedule"); and (2) the instance of a computational resource on which it is executed ("binding").
  • The simulation/instrumentation produces an activity profile that includes: (1) a valid sequence of execution of the operations; and (2) an ordered list of data vectors consumed and/or produced by each operation.
  • Component library 68 contains: (1) information about the delay of components; and (2) models for computing their power consumption.
  • Each data vector is assigned a time stamp that denotes the time at which the vector would have been produced had the predicted architecture, rather than the algorithm, been simulated.
  • The operations are visited in the order given by the information generated by the above-mentioned simulation/instrumentation, namely the valid sequence of execution of the operations.
  • A time counter is used during the process to time-stamp the operations. For every
  • The correct time is computed by combining the information from the time counter with the clock step assigned by the schedule produced in the architecture prediction phase.
  • The execution sequence then continues to the multiply operation (i.e., * 2).
  • This operation merges the vector lists that would be mapped on the same functional resources.
  • The vector merge operation produces the timed vector lists that would be visible at the inputs of the hardware functional units. This operation therefore makes the transition from algorithmic operations to hardware resources.
  • $Hd_{total} = Hd(5,8) + Hd(8,1) + Hd(1,2) + Hd(3,3) + Hd(3,1) + Hd(1,3)$
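
The time-stamping, vector-merge and Hamming-distance steps excerpted above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method as implemented in the patent: the function names, the `(time stamp, value)` tuple layout, the operation and unit labels, and the reading of the Hd_total example as two operand streams (5, 8, 1, 2 and 3, 3, 1, 3) arriving at one shared functional unit are all hypothetical.

```python
from typing import Dict, List, Tuple

# (time stamp, data value) as it would be seen at a functional-unit input.
TimedVector = Tuple[int, int]


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def merge_vectors(per_operation: Dict[str, List[TimedVector]],
                  binding: Dict[str, str]) -> Dict[str, List[TimedVector]]:
    """Merge the timed vector lists of all operations bound to the same unit."""
    per_unit: Dict[str, List[TimedVector]] = {}
    for op, vectors in per_operation.items():
        per_unit.setdefault(binding[op], []).extend(vectors)
    for vectors in per_unit.values():
        vectors.sort(key=lambda tv: tv[0])  # order by time stamp
    return per_unit


def total_hamming_distance(values: List[int]) -> int:
    """Sum of Hamming distances between consecutive values of one stream."""
    return sum(hamming_distance(x, y) for x, y in zip(values, values[1:]))


# Hypothetical example: two multiply operations share one multiplier.
# Time stamps are assumed to come from the schedule plus a time counter.
per_operation = {
    "mul1": [(0, 5), (2, 1)],
    "mul2": [(1, 8), (3, 2)],
}
binding = {"mul1": "MUL_A", "mul2": "MUL_A"}

merged = merge_vectors(per_operation, binding)
stream_a = [value for _, value in merged["MUL_A"]]  # first operand stream
stream_b = [3, 3, 1, 3]                             # assumed second operand stream

hd_total = total_hamming_distance(stream_a) + total_hamming_distance(stream_b)
print(stream_a, hd_total)  # [5, 8, 1, 2] 9
```

With these inputs the six terms are 3, 2, 2, 0, 1 and 1, so the total is 9. A power model parameterized by bit-width and Hamming distance, as described in the excerpts, would take such a total (or the per-transition values) as its activity input.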

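The three high-level synthesis sub-problems named in the excerpts above (scheduling, allocation and binding) can be illustrated with a deliberately simple sketch. It uses as-soon-as-possible (ASAP) scheduling with unit-latency operations and a greedy binding; these are textbook simplifications chosen for illustration, not the architecture estimator described in the patent, and all names (`asap_schedule`, `allocate`, `bind`, the operation labels) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def asap_schedule(deps: Dict[str, List[str]]) -> Dict[str, int]:
    """Scheduling: give each operation the earliest control step after all
    of its predecessors (unit latency, acyclic dependencies assumed)."""
    step: Dict[str, int] = {}

    def visit(op: str) -> int:
        if op not in step:
            step[op] = 1 + max((visit(p) for p in deps[op]), default=-1)
        return step[op]

    for op in deps:
        visit(op)
    return step


def allocate(schedule: Dict[str, int], op_type: Dict[str, str]) -> Dict[str, int]:
    """Allocation: units needed per type = most operations of that type
    scheduled in any single control step."""
    usage = defaultdict(int)
    for op, s in schedule.items():
        usage[(op_type[op], s)] += 1
    need = defaultdict(int)
    for (kind, _), count in usage.items():
        need[kind] = max(need[kind], count)
    return dict(need)


def bind(schedule: Dict[str, int], op_type: Dict[str, str]) -> Dict[str, str]:
    """Binding: within each control step, hand out unit instances of the
    matching type in order, so units are reused across steps."""
    next_free = defaultdict(int)
    binding: Dict[str, str] = {}
    for op in sorted(schedule, key=lambda o: (schedule[o], o)):
        key = (op_type[op], schedule[op])
        binding[op] = f"{op_type[op]}{next_free[key]}"
        next_free[key] += 1
    return binding


# Hypothetical data-flow graph for: y = (a*b) + (c*d); z = y * e
deps = {"m1": [], "m2": [], "a1": ["m1", "m2"], "m3": ["a1"]}
op_type = {"m1": "mul", "m2": "mul", "a1": "add", "m3": "mul"}

schedule = asap_schedule(deps)
units = allocate(schedule, op_type)
binding = bind(schedule, op_type)
print(schedule, units, binding)
```

In this example the two independent multiplications land in control step 0, so two multipliers are allocated, and the third multiplication reuses the first multiplier in step 2. A power-aware estimator would choose among such bindings by the activity each one produces at the unit inputs, which this sketch does not attempt.
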
Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Design And Manufacture Of Integrated Circuits (AREA)

Abstract

A method of designing a low-power circuit that implements a specified functionality, the method comprising: analyzing code for an algorithmic description of the specified functionality to generate a design representation of the circuit at the algorithmic level; instrumenting the code for the algorithmic description so as to capture data streams during execution/simulation of the algorithmic description; executing/simulating the instrumented code to generate an activity profile for the algorithmic description, the activity profile including at least portions of data traces of at least a portion of the executed algorithmic description; using the design representation to generate an initial hardware design that is an estimate of the hardware resources needed to implement at least part of the specified functionality; and computing the power consumption of the initial hardware design based on the activity profile and on power models for the hardware resources.
PCT/IB2005/000962 2004-01-27 2005-01-27 Predictable design of low power systems by pre-implementation estimation and optimization WO2005072054A2 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/IB2005/000962 WO2005072054A2 (fr) 2004-01-27 2005-01-27 Predictable design of low power systems by pre-implementation estimation and optimization
US11/044,646 US7725848B2 (en) 2005-01-27 2005-01-27 Predictable design of low power systems by pre-implementation estimation and optimization

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US60/539,448 2004-01-27
PCT/IB2005/000962 WO2005072054A2 (fr) 2004-01-27 2005-01-27 Predictable design of low power systems by pre-implementation estimation and optimization
US11/044,646 US7725848B2 (en) 2005-01-27 2005-01-27 Predictable design of low power systems by pre-implementation estimation and optimization

Publications (1)

Publication Number Publication Date
WO2005072054A2 true WO2005072054A2 (fr) 2005-08-11

Family

ID=34921977

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2005/000962 WO2005072054A2 (fr) 2004-01-27 2005-01-27 Predictable design of low power systems by pre-implementation estimation and optimization

Country Status (1)

Country Link
WO (1) WO2005072054A2 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
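CN114722746A (zh) * 2022-05-24 2022-07-08 Suzhou Inspur Intelligent Technology Co., Ltd. Chip-aided design method, apparatus, device and readable medium
CN114722746B (zh) * 2022-05-24 2022-11-01 Suzhou Inspur Intelligent Technology Co., Ltd. Chip-aided design method, apparatus, device and readable medium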

Similar Documents

Publication Publication Date Title
US7725848B2 (en) Predictable design of low power systems by pre-implementation estimation and optimization
Li et al. A framework for estimation and minimizing energy dissipation of embedded HW/SW systems
US7134100B2 (en) Method and apparatus for efficient register-transfer level (RTL) power estimation
US7558719B1 (en) System and method for runtime analysis of system models for variable fidelity performance analysis
Gries Methods for evaluating and covering the design space during early design development
US7694249B2 (en) Various methods and apparatuses for estimating characteristics of an electronic system's design
Choi et al. HLScope+: Fast and accurate performance estimation for FPGA HLS
Zuo et al. A polyhedral-based systemc modeling and generation framework for effective low-power design space exploration
Bergamaschi et al. State-based power analysis for systems-on-chip
Posadas et al. System-level performance analysis in SystemC
Grüttner et al. An ESL timing & power estimation and simulation framework for heterogeneous SoCs
Stammermann et al. System level optimization and design space exploration for low power
Beltrame et al. Multi-accuracy power and performance transaction-level modeling
Nebel High-level power estimation and analysis
Makni et al. Hardware resource estimation for heterogeneous FPGA-based SoCs
Makni et al. HAPE: A high-level area-power estimation framework for FPGA-based accelerators
Oyamada et al. Applying neural networks to performance estimation of embedded software
Ahuja High level power estimation and reduction techniques for power aware hardware design
Schliebusch et al. A framework for automated and optimized ASIP implementation supporting multiple hardware description languages
Nebel System-level power optimization
WO2005072054A2 (fr) Predictable design of low power systems by pre-implementation estimation and optimization
Zaccaria et al. Power estimation and optimization methodologies for VLIW-based embedded systems
Sørensen et al. Generation of formal CPU profiles for embedded systems
Sharma et al. Real-time automated register abstraction active power-aware electronic system level verification framework
Brandolese A codesign approach to software power estimation for embedded systems

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

WWW Wipo information: withdrawn in national office

Ref document number: 2005207884

Country of ref document: AU

NENP Non-entry into the national phase

Ref country code: DE

WWW Wipo information: withdrawn in national office

Country of ref document: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 05718426

Country of ref document: EP

Kind code of ref document: A2

122 Ep: pct application non-entry in european phase

Ref document number: 05718426

Country of ref document: EP

Kind code of ref document: A2

WWW Wipo information: withdrawn in national office

Ref document number: 5718426

Country of ref document: EP