WO2012050678A1 - Method and apparatus for using entropy in ant colony optimization circuit design from high level systhesis - Google Patents

Info

Publication number
WO2012050678A1
Authority
WO
WIPO (PCT)
Prior art keywords
solution
cost
data flow
flow graph
operations
Prior art date
Application number
PCT/US2011/050081
Other languages
English (en)
French (fr)
Inventor
Mustafa Ispir
Levent Oktem
Original Assignee
Synopsys, Inc.
Priority date
Filing date
Publication date
Priority claimed from US12/894,842 external-priority patent/US8296712B2/en
Priority claimed from US12/894,902 external-priority patent/US8296713B2/en
Priority claimed from US12/894,756 external-priority patent/US8296711B2/en
Application filed by Synopsys, Inc. filed Critical Synopsys, Inc.
Priority to EP11832913.5A priority Critical patent/EP2622549A4/de
Priority to CN2011800476031A priority patent/CN103140853A/zh
Publication of WO2012050678A1 publication Critical patent/WO2012050678A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/30Circuit design
    • G06F30/34Circuit design for reconfigurable circuits, e.g. field programmable gate arrays [FPGA] or programmable logic devices [PLD]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/04Manufacturing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2111/00Details relating to CAD techniques
    • G06F2111/06Multi-objective optimisation, e.g. Pareto optimisation using simulated annealing [SA], ant colony algorithms or genetic algorithms [GA]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Definitions

  • the disclosed embodiments relate to circuit design, and more particularly to selecting solutions for time constrained scheduling of operations for a circuit design.
  • VLSI Very Large Scale Integration
  • VHDL VHSIC Hardware Description Language
  • Verilog HDL Verilog HDL
  • RTL register transfer level
  • the HDL source code describes the circuit elements, and a synthesis process produces an RTL netlist from this source code.
  • the RTL netlist is typically a technology independent netlist, in that it is independent of the technology/architecture of a specific vendor's integrated circuit, such as a field programmable gate array (FPGA) or an
  • the RTL netlist corresponds to a schematic representation of circuit elements (as opposed to a behavioral
  • a mapping operation is then performed to convert from the technology independent RTL netlist to a technology specific netlist which can be used to create circuits in the vendor's technology/architecture.
  • Field Programmable Gate Array (FPGA) vendors use different technologies and architectures to implement logic circuits within their integrated circuits. This results in a final netlist which is specific to a particular vendor's technology and architecture.
  • High Level Synthesis is a process of converting the behavioral descriptions of HLD (High Level Description) to register transfer level (RTL) descriptions.
  • HLS is typically done with a set of design goals and constraints. So while there may be many different ways to implement the behavior of the HLD, HLS seeks to do so while minimizing particular defined costs.
  • the defined costs are typically things such as cycle time, part count, silicon area, power, interconnections, pin count, etc.
  • the constraints are typically driven by form factors, packaging constraints, interoperability and similar concerns.
  • HLS can be described as compiling a specification written in a high level language (HLL), allocating hardware resources to the operations in the specification and then generating the RTL description.
  • HLL high level language
  • the HLS schedules the operations, allocates the operation to particular functional hardware units, allocates any variables to storage elements, and allocates any data transfers to
  • DSP Digital Signal Processors
  • the RTL description provides inputs and outputs of the system and the algorithms that are to be performed. These are described as frames.
  • Frame based algorithms are described by using frame data. The input data is received in frames and the output data is produced in frames.
  • Frame based algorithms are typically synthesized in HLS as follows: First the device collects the frame data from an input stream; then the device processes the frame data; and finally the device sends the output frame as an output stream.
  • the frame synthesis includes scheduling of the operations and binding the operations to hardware to obtain an optimized device design. This methodology suffers from low throughput.
  • Ant Colony Optimization is a recent optimization method that has been applied to many different problems.
  • each ant constructs a candidate solution and leaves pheromones according to the cost associated with each solution it constructs.
  • ACO allows several different solutions to be found. These can then be compared to each other to find an optimum solution.
  • ACO has distinct limitations that prevent it from being directly applied to existing solution methodologies.
  • a method and apparatus for using entropy in ant colony optimization circuit design from high level synthesis is described.
  • an operation to be performed by a circuit is selected.
  • a plurality of hardware components for performing the operation are represented with a data flow graph having edges and nodes.
  • a plurality of solutions for performing the operation are simulated as hardware component combinations represented as paths on the data flow graph. For each solution the cost including a number of edges and nodes traversed on the data flow graph and a supplemental sub-integer cost is determined.
  • a solution is selected with the lowest cost as a hardware component combination for a circuit.
  • FIG. 1 is an example of a process flow diagram for performing high level synthesis for circuit design based on a high level description.
  • FIG. 2 is an example representation of a data flow graph for circuit design.
  • FIG. 3 is an alternative example representation of the data flow graph of FIG. 2.
  • FIG. 4 is a process flow diagram for one embodiment for pipelining operations for a circuit design using input and output data frames.
  • FIG. 5 shows one embodiment of a system for implementing the process of FIG. 4.
  • FIG. 6 is an example of a process flow diagram for solving circuit design using ant colony optimization.
  • FIG. 7 shows one embodiment of a system for implementing the process of FIG. 6.
  • FIG. 8 is an example of a process flow diagram for determining a supplementary cost of a circuit design for use in the process of FIG. 6.
  • FIG. 9 is an example of a process flow diagram of estimating an interconnection cost for use in the process of FIG. 6.
  • FIG. 10 is an example of a process flow diagram of determining a guiding function for selecting solutions for use in the process of FIG. 6.
  • FIG. 11 is an example of a process flow diagram of determining a function for selecting neighbors in a local search for use in the process of FIG. 6.
  • FIG. 12 is a block diagram example of a data processing system configured for use with the disclosed embodiments.
  • At least one embodiment of the disclosed embodiments seeks to use an ant colony optimization (ACO) method to improve the design of an integrated circuit.
  • ACO ant colony optimization
  • an additional cost is added to the cost of a candidate solution to improve the selection of additional candidate solutions.
  • High Level Synthesis is a process that is used to convert behavioral descriptions of a complex integrated circuit system to RTL descriptions that can be used to construct the system. Some of the behavioral descriptions may include frame synthesis, in which an input frame and a corresponding output frame are described.
  • a basic process for designing a circuit with HLS is shown in the context of FIG. 1.
  • the process of FIG. 1 starts with establishing the high level description, for example in HLD 102.
  • This description will provide the operations to be performed by the circuit, which may, in one embodiment, include one or more types of partial operations.
  • a partial operation is a portion of a larger operation that is performed to complete the larger operation.
  • the partial operations may include additions and register shifts.
  • Embodiments can be applied to any type of operation, whether complete or partial. All operations, whether full or partial, will be referred to herein simply as operations.
  • the operations in the HLD are identified at 104 and variables are assigned to the operations at 106.
  • the variables for the operations are identified and ordered based on the time order in which they will be used.
  • the operations can be ordered based on this same time order at 108.
  • hardware components for performing these operations can be defined at 110.
  • a particular difficulty in frame synthesis for fully pipelined architectures is mapping or binding the frame data to memory registers.
  • the design of the memory mapping drastically affects the cost of the multiplexing logic and the control logic that is required to support the pipelined architecture. If the memory mapping is performed first, then there must be assumptions about the sequence of operations. These assumptions may turn out to be wrong after the scheduling algorithm is completed. On the other hand, if the scheduling is done first, then the scheduling algorithm may produce a solution which makes it difficult to map the variables to at least some of the memory blocks. Therefore, in one embodiment, the scheduling algorithm to support pipelining is linked to the corresponding binding algorithms and the memory mapping is performed as part of the scheduling.
  • the Input/Output frame synthesis can be accommodated at the scheduling phase.
  • input frame data that comes in a predetermined order, and input frame data that has no determined order can both be
  • Scheduling and binding algorithms can be defined using a graph structure or data flow graph.
  • a graph structure can be represented as (V, E, W).
  • V is the set of operations v.
  • Each operation has an operation type, which provides the hardware unit types upon which the corresponding operation can be executed.
  • the term operation includes partial operations.
  • E is the set of edges e which are the connections from one operation to another.
  • W is a function which gives the register number w of an edge.
  • Data flow graphs can be composed of nodes that represent the combinational computation units and edges interconnecting the nodes. Delays (e.g. registers) are represented as weights (w) on the edges. Each node has an execution time associated with it. Examples of data flow graphs are shown in FIGS. 2 and 3, which illustrate a method to construct a data flow graph for retiming. FIGS. 2 and 3 are two different representations of the same graph so that, for example, adder 205 and node 225 represent the same adder.
  • the combinational computation units (e.g., adder 205, multipliers 207 and 209) in FIG. 2 are represented as computation nodes (e.g., nodes 225, 227 and 229 in FIG. 3).
  • FIG. 2 has an input 201 and an output 203. The same path applies to FIG. 3.
  • the execution time of the combinational computation units can be represented by the computation time of the associated nodes.
  • node 225 may have a computation time of 2 ns, which is required by adder 205; and each of nodes 227 and 229 may have a computation time of 4 ns (nanoseconds), which is required by a multiplier (e.g., 209 or 207).
  • Edges represent connections between the computation units.
  • Edge 231 represents the connection between multiplier 207 and adder 205.
  • Edge 231 has a weight of 1, representing register 217 (or the one clock cycle latency due to register 217).
  • edge 233 has a one clock cycle latency due to register 215.
  • Edge 235 represents the connection between multipliers 209 and 207; and, there is no delay associated with edge 235.
  • the data flow graph can be used to compare paths and latencies. For example, in FIG. 3, the path from node 229 to node 227 contains edge 235 that has zero delay, but the path from node 229 to node 227 takes the longest
  • the delay on edge 233 can be moved to edge 235 so that the critical path becomes the path between nodes 225 and 229, which takes only 6 ns of computation time.
  • moving the delay from edge 233 to edge 235, which can be implemented by moving register 215 from between adder 205 and multiplier 209 to between multipliers 209 and 207, allows the modified (retimed) circuit to be operated at a reduced delay of 6 ns.
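  • As an illustrative sketch of the retiming comparison above (not taken from the filing), the graph of FIG. 3 can be encoded as node computation times plus register counts on edges, and the critical path computed as the longest chain of nodes joined by zero-register edges. The edge directions (225 to 229, 229 to 227, 227 to 225) and the name `critical_path_ns` are assumptions made for this example.

```python
from functools import lru_cache

def critical_path_ns(node_time, edges):
    """Longest combinational path: nodes chained by edges that carry no register.

    node_time: dict node -> computation time in ns
    edges:     dict (src, dst) -> register count w on that edge
    Assumes the zero-register subgraph is acyclic (no combinational loop).
    """
    succ = {v: [] for v in node_time}
    for (src, dst), w in edges.items():
        if w == 0:
            succ[src].append(dst)

    @lru_cache(maxsize=None)
    def longest_from(v):
        return node_time[v] + max((longest_from(n) for n in succ[v]), default=0)

    return max(longest_from(v) for v in node_time)

# Node times from the description: adder 225 = 2 ns, multipliers 227/229 = 4 ns.
node_time = {225: 2, 227: 4, 229: 4}

# Edge directions below are assumed for illustration.
before = {(227, 225): 1,   # edge 231, register 217
          (225, 229): 1,   # edge 233, register 215
          (229, 227): 0}   # edge 235, no delay
after = {(227, 225): 1,    # edge 231 unchanged
         (225, 229): 0,    # delay moved off edge 233
         (229, 227): 1}    # delay moved onto edge 235

print(critical_path_ns(node_time, before))  # 8
print(critical_path_ns(node_time, after))   # 6
```

  • Under these assumptions the retimed graph yields the 6 ns critical path stated above, while the original graph is limited by the two multipliers in series.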
  • a timing model for a circuit module can be constructed by breaking down the module into registers and combinational computing elements and assigning one node to each combinational computing element.
  • the timing model of each hardware module is a combination of the timing models of the combinational computation units, delays, and interconnections.
  • the aggregation of the set of nodes and edges used in the translation of a particular hardware module is effectively the timing model (data flow graph) of that hardware module.
  • a data flow graph can be represented by diagrams of the type shown in FIGS. 2 and 3.
  • a data flow graph can also be represented in other ways, including by tables, text with metadata, and mathematical equations.
  • V the set of values for v
  • E the set of values for e
  • W the set of values for w
  • Input and output frame data represents the input and the output data for a circuit that uses framed data.
  • the data frames can be one dimensional or multi-dimensional. Embodiments are described in the context of a one dimensional frame. However, the same principles can be used to extend the principles to more dimensions.
  • a one dimensional frame (F) can be represented as a set of variables $\{v_1, v_2, \ldots, v_n\}$, where n is the size of the frame ($|F|$).
  • a pseudo code of a transformation algorithm to generate RTL specifications for a given data flow graph (V, E, W) can be represented as follows: for each input frame F
  • op is a variable name that refers to a newly created operation for a variable of the frame.
  • OpConsume refers to an operation which takes a variable from the input frame as its input.
  • OpSource refers to an operation which produces a variable of an output frame as its output.
  • the above transformation may be performed for each variable of a frame.
  • This provides a set of edges E that can be used to synthesize the frame input/output (I/O).
  • frame synthesis problems can be solved while meeting scheduling and binding objectives.
  • each frame since each frame has its own unique operation type, only one hardware unit can be assigned to a whole frame of data. This automatically converts the frame data to fully pipelined serial data. The memory and multiplexing cost of the synthesized frame can also be minimized.
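  • The frame transformation described above can be sketched as follows. This is one possible reading of the description, not the pseudo code of the filing: every frame receives its own operation type, every frame variable receives a new operation, and zero-weight edges connect that operation to the operations that consume the variable (input frame) or from the operation that produces it (output frame). The class `DataFlowGraph`, the functions `add_input_frame` and `add_output_frame`, and the naming scheme for operations are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class DataFlowGraph:
    ops: dict = field(default_factory=dict)    # V: operation name -> operation type
    edges: dict = field(default_factory=dict)  # E with W: (src, dst) -> register count

def add_input_frame(graph, frame_name, variables, consumers):
    """Create one operation per input-frame variable and link it to its consumers.

    consumers: dict variable -> list of existing operations that read it
    """
    op_type = f"frame:{frame_name}"            # unique type for this frame
    new_ops = []
    for v in variables:
        op = f"{frame_name}.{v}"
        graph.ops[op] = op_type
        for consume_op in consumers.get(v, []):
            graph.edges[(op, consume_op)] = 0  # no register on the new edge
        new_ops.append(op)
    return new_ops

def add_output_frame(graph, frame_name, variables, sources):
    """Create one operation per output-frame variable and link it from its source.

    sources: dict variable -> existing operation that produces it
    """
    op_type = f"frame:{frame_name}"
    new_ops = []
    for v in variables:
        op = f"{frame_name}.{v}"
        graph.ops[op] = op_type
        if v in sources:
            graph.edges[(sources[v], op)] = 0
        new_ops.append(op)
    return new_ops

g = DataFlowGraph(ops={"mul0": "mul", "add0": "add"}, edges={("mul0", "add0"): 0})
in_ops = add_input_frame(g, "in", ["v1", "v2"],
                         {"v1": ["mul0"], "v2": ["mul0", "add0"]})
out_ops = add_output_frame(g, "out", ["y1"], {"y1": "add0"})
```

  • Because all operations of a frame share one operation type, a binder that assigns one hardware unit per type would serialize the frame, which is the effect described above.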
  • the transformation described above can be used for any serial input sequence. If the serial sequence is predetermined, then the order of the sequence can be transformed to the schedules. In other words, the operations which are produced by frame transformation are scheduled as a pre-step of the scheduling algorithm.
  • the pseudo code of this pre-step can be represented as follows: for each op ∈ V
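  • A minimal sketch of such a pre-step, assuming that each frame operation is simply pinned to the time step given by its position in the predetermined serial order. The function name `preschedule_frame_ops` and the `first_step` parameter are hypothetical and do not come from the filing.

```python
def preschedule_frame_ops(frame_ops_in_arrival_order, first_step=0):
    """Fix the time step of each frame operation from its serial arrival order."""
    return {op: first_step + i
            for i, op in enumerate(frame_ops_in_arrival_order)}

fixed = preschedule_frame_ops(["in.v1", "in.v2", "in.v3"])
print(fixed)  # {'in.v1': 0, 'in.v2': 1, 'in.v3': 2}
```

  • The remaining operations would then be scheduled around these fixed assignments by the main scheduling algorithm.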
  • FIG. 4 shows a process flow diagram corresponding to one embodiment of the pseudocode example shown above.
  • the variables from the input data frame of the high level description (HLD) are initialized for all of the variables v of the data frame.
  • operation types are defined for each of the variables in the HLD.
  • op is created for one of the variables.
  • op is a variable name that refers to a newly created operation for the variable of the frame.
  • the new op is added to the operations of the data flow graph.
  • a ConsumeOp refers to an operation which takes a variable from the input frame as its input. If a variable is not used by a consume operation, then an edge is created in the data flow graph from the new operation created at 403 to a consume operation. The process flow then continues to 407 to determine whether there are any additional variables.
  • if the variable is used by a consume operation, then it is determined at 406 whether the variable is produced by a SourceOp.
  • a source operation is an operation that produces a variable of the output frame. If the variable is not produced by a source operation, then at 412, an edge is created in the data flow graph from a source operation to the new operation created for that variable at 403. In addition, the weight on that edge can be set at 0. After creating the edge, then the process continues to 407 to determine if there are any additional variables.
  • if the variable is produced by the source operation, then at 407 it is determined whether there are any additional variables. If there are additional variables, then the process flow returns to 402 to define an operation type for the next variable. In one embodiment, this process is repeated for all of the variables of an input data frame until all of the defined variables have been bound to consume operations and bound to source operations.
  • the variables can be ordered into a frame. In one embodiment this is done using conventional methodologies.
  • the operations can be ordered based on the order of the variables.
  • this process can be repeated for all of the additional input time frames. After all the input time frames have been characterized and defined in the data flow graph and bound to operations, and after the operations have been ordered, in one embodiment this information can be used to determine hardware component combinations as suggested in FIG. 1 at 110.
  • FIG. 5 shows one embodiment of frame binder modules for implementing the frame binding process.
  • the system can be implemented as discrete components of an application specific integrated circuit (ASIC), digital signal processor (DSP), or another electronic device.
  • the system may be implemented in a software simulation system running on a computer system.
  • the modules of FIG. 5 include a high level description (HLD) analyzer 501 which provides its analysis to an operation and variable binder 503.
  • the high level description analyzer 501 initializes all the variables for a data frame, defines operation types, and creates operations for each variable.
  • the HLD analyzer 501 is supplied by a high level description (HLD) 511.
  • the HLD 511 can be stored in any type of memory which is available to the HLD analyzer 501 and provides to the operation and variable binder 503, the operations, the variables, and the data frames that are desired for the intended final circuit design.
  • the operation and variable binder 503 binds variables to operations and binds operations to hardware types.
  • the operation and variable binder 503 is coupled to a stored set of design constraints 513 which establish the desired performance and hardware limitations and any other design considerations intended to apply to the solutions.
  • the operation and variable binder 503 provides the bound operations and variables to a solutions simulator 505.
  • This simulator 505 creates solutions in the form of hardware modules and hardware connections.
  • the solution in one embodiment can be created by reference to a data flow graph or in a variety of other ways.
  • the solutions from the solutions simulator 505 are in one
  • the selection module 509 in one embodiment looks at each of the solutions and the costs of those solutions from the estimator 507 and selects a final design for the integrated circuit design.
  • an operation can be selected to be performed by the integrated circuit that is to be designed.
  • This operation can include one or more partial operations of different types.
  • the operation may be a complex larger operation such as a mathematical algorithm, a conversion, or a transformation, and this operation may include a variety of individual steps within that operation. These individual steps can be treated as separate operations or as partial operations within the overall operation.
  • the operations and the performance of the circuit can all be described in the high level description. These operations are identified in the HLD, including any partial operations that may be a part of the overall operations.
  • the variables to be used by the operations are identified and ordered based on the times at which the variables will be used by the partial or full operations.
  • the partial operations can be ordered based on the ordering of the variables. Solutions are developed using, for example, a solution simulator which represents different hardware components for performing the operations in any of a variety of different ways. In one embodiment, a data flow graph such as that shown in FIGS. 2 and 3 that has edges and nodes as explained above can be used to simulate solutions.
  • the edges and nodes are connected based on the ordering of the partial operations.
  • Different solutions can be simulated for performing these operations, in one embodiment.
  • the simulations represent the operations as hardware component combinations and these combinations can be represented as paths on the data flow graphs.
  • a cost can be determined so that the different solutions can be compared.
  • the term "cost" can refer to a time to complete the path.
  • the cost can be calculated in a wide range of different ways. A simple approach is to include the number of edges and nodes that are traversed to perform the entire solution on the data flow graph.
  • the solution with the lowest cost can be selected as the hardware component combination for the intended circuit design. In one embodiment, this process can be repeated until all of the operations of the high level description have been characterized and solutions have been found.
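  • The simple cost described above, counting the nodes and edges traversed by a solution's path, can be sketched as follows. The path encoding (a list of node labels) and the names `path_cost` and `select_lowest_cost` are assumptions for illustration; the candidate paths are invented examples.

```python
def path_cost(path):
    """Cost of a path = nodes visited + edges traversed between them."""
    nodes = len(path)
    edges = max(nodes - 1, 0)
    return nodes + edges

def select_lowest_cost(solutions):
    """Return the name of the candidate whose path has the lowest cost."""
    return min(solutions, key=lambda name: path_cost(solutions[name]))

# Hypothetical candidate hardware combinations, each expressed as a path.
candidates = {
    "A": ["in", "mul0", "mul1", "add0", "out"],
    "B": ["in", "mul0", "add0", "out"],
}
print(select_lowest_cost(candidates))  # B
```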
  • a subset of possible solutions may be evaluated.
  • the ordering of the operations can have a significant impact on the solution.
  • the operations which produce a variable are ordered after the operations that consume the variable are ordered.
  • the consume operations are all defined and ordered first, and then the source operations are ordered based on the ordering of the consume operations. This helps to ensure that whenever a variable is consumed, the variable has been produced by a prior operation so that the variable is available for consumption.
  • the quality of the resulting circuit depends on the quality of the simulated solutions. With particularly complex circuits, the number of possible solutions becomes very large. Rather than simulate all possible solutions, techniques have been developed to try to simulate only the best solutions. In some techniques, a baseline is established and the process tries to find solutions that are better than the baseline. Another technique for generating candidate solutions is referred to as Ant Colony Optimization (ACO) which attempts to optimize a solution using a technique modeled on how ants optimize a path between their colony and a food source.
  • ACO Ant Colony Optimization
  • FIG. 6 shows a simplified process flow diagram for one embodiment of ACO.
  • the parameters of the process are first initialized at 601.
  • this initialization can include generating the operations and variables, and creating a net diagram including nodes and edges.
  • One embodiment of the operations included in initialization is described above in the context of FIGS. 4 and 5.
  • the termination condition at 604 can be based on many different factors. Typically a predefined number of cycles is used. However, the termination condition could be based on the variance in the cost of the solutions, the amount of change in the pheromones, or more complex determinations, such as inflection points and graphed costs for the constructed solutions.
  • the selection of a solution is not shown as a separate block because this is included in the local search at 603.
  • the local search 603 can compare a constructed solution at 602 to previous solutions or to different local possibilities in order to select one or more local solutions for simulation. In doing so, the prior solutions can be compared to the current solution and a current best solution can be determined. The pheromones can be updated based on the difference between the current solutions and the best prior solution. With such a methodology, a best solution is tracked. When the termination condition is met, this best solution can be used as the final result. Alternatively, a separate process (not shown) can be used to examine all of the results and pick a best solution.
  • solutions are produced one at a time.
  • a single solution is constructed and then one or a few neighboring solutions are constructed at 603.
  • the pheromones associated with the first solution are deposited, then at 605, the pheromones for another solution and its neighbors are deposited.
  • 20 or 30 solutions are constructed at each instance, compared, and then the local search tries to find a better neighboring solution for the best current solution.
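  • The loop of FIG. 6 can be sketched generically as follows. This is a textbook-style ACO loop written for illustration, not the algorithm of the filing: it ignores data dependencies, uses a fixed number of cycles as the termination condition, and deposits pheromone only on the best solution found so far. All function names, the evaporation rate `rho`, and the deposit rule are assumptions.

```python
import random

def max_ops_per_step(schedule, n_steps):
    """Integer cost: the largest number of operations sharing one time step."""
    hist = [0] * n_steps
    for t in schedule:
        hist[t] += 1
    return max(hist)

def construct(tau, rng):
    """602: each operation picks a time step with probability ~ pheromone."""
    return [rng.choices(range(len(row)), weights=row)[0] for row in tau]

def local_search(schedule, n_steps):
    """603: examine single-operation moves and keep the best neighbor found."""
    best, best_cost = list(schedule), max_ops_per_step(schedule, n_steps)
    for op in range(len(schedule)):
        for t in range(n_steps):
            cand = list(schedule)
            cand[op] = t
            cost = max_ops_per_step(cand, n_steps)
            if cost < best_cost:
                best, best_cost = cand, cost
    return best, best_cost

def aco_schedule(n_ops, n_steps, cycles=50, ants=20, rho=0.1, seed=0):
    rng = random.Random(seed)
    tau = [[1.0] * n_steps for _ in range(n_ops)]      # 601: initialize
    best, best_cost = None, float("inf")
    for _ in range(cycles):                            # 604: fixed cycle count
        for _ in range(ants):
            sol, cost = local_search(construct(tau, rng), n_steps)
            if cost < best_cost:
                best, best_cost = sol, cost
        for op in range(n_ops):                        # 605: update pheromones
            for t in range(n_steps):
                tau[op][t] *= 1.0 - rho                # evaporation
            tau[op][best[op]] += 1.0 / best_cost       # reinforce best so far
    return best, best_cost

best, cost = aco_schedule(n_ops=10, n_steps=4, cycles=5, ants=5)
```

  • The returned `best` is a list giving the time step chosen for each operation, and `cost` is its maximum number of operations per step.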
  • The process flow of FIG. 6 can be performed by hardware or software modules as shown in FIG. 7 in one embodiment. As with FIG. 5, these modules can be implemented in hardware as discrete or blended functional blocks of ASIC, DSP, or other circuitry. In another embodiment, these modules can be implemented in software on a computer system. As shown in FIG. 7, an ant construction module 703 generates one or more solutions based on the provided problem constraints. In one embodiment, the solutions are then applied to a local search module 705. This module searches for neighboring solutions that may produce locally better results. In one embodiment, the best local solution selected by the local search module 705 can be fed back to the ant construction module so that a complete solution can be constructed and simulated.
  • each solution is simulated in the ant construction module
  • pheromones are updated and stored in a memory 707.
  • each new solution is compared to the current best solution, and pheromones are updated based on that comparison.
  • the pheromones can then be used by the ant construction module to build and simulate solutions and by the local search module to help guide the local search.
  • the entire system described in FIG. 7 corresponds to the solution simulator 505 of FIG. 5.
  • TCS time constrained scheduling
  • the ants select a solution based on the costs in the local search and the costs in the pheromones. Adjusting these costs can change the behavior of the ants. However, these costs are also used to select the best solution, so any adjustment to the costs should consider its impact on the final design solution choice.
  • a virtual cost factor is added to the actual cost.
  • the virtual cost factor is designed to change the shape of the solution space.
  • the supplemental virtual cost can be used instead of the improved randomization techniques or as an addition to it, depending on the application.
  • the virtual cost can be used to guide the ants, but not to select a solution. Separating this virtual cost from actual costs can guide the ants within a plain of solutions without affecting the final design choice.
  • the plains within the solution space are caused by the cost function and are determined by how the cost function is traditionally (and naturally) defined.
  • under a traditional definition, a large set of different but neighboring solutions is expected to have the same maximum number of operations scheduled to the same time step. Since the cost function is expressed as a number of operations, it is an integer, and this produces a "terraced landscape" in the solution space. In other words, many neighboring solutions may have the same number of operations in a step, and many other neighboring solutions differ by one in either direction.
  • the cost function does not provide a way to distinguish between different solutions that have the same maximum value for the number of operations.
  • This "terraced landscape" can be contoured in one embodiment with a sub-integer supplemental cost factor.
  • the sub-integer cost factor can give values between the integer steps in order to give a "natural continuous slope" to the solution space landscape. This allows the ants to use the sub-integer costs for local navigation and be guided towards lower local levels of cost.
  • the supplemental cost factor is incorporated into the actual cost, supplementing it. The supplemental portion is then not counted as part of the cost of the solution.
  • the supplemental cost factor is virtual in the sense that it is not minimized for the final solution. It is used to enhance navigation. This can be done by using it to compare two candidate solutions whose traditionally defined integer costs are equal. The supplemental cost can then be used to favor a solution which is closer to a better solution.
  • a variety of different costs can be used as a supplemental cost function, such as probabilities, variances, covariances, etc.
  • a normalized entropy of the histogram of the operations on the time steps (schedules) is used. With normalized entropy of the histogram incorporated into the cost function, the cost for purposes of the pheromones can be calculated as the real cost (maximum number of operations per time step) minus the normalized entropy of the histogram.
  • the ants' search at local optima can be inhibited from stagnating by incorporating this supplemental virtual cost factor into the traditional integer cost function.
  • a high level pseudo code of a basic ACO algorithm such as that of FIG. 6 can be presented as follows:
  • the incremental value Ay can be determined as follows:
  • Cs: cost of the solution
  • HD: histogram array
  • the cost can be determined, in one embodiment, in the manner below:
  • X as the time steps, an integer from 0 to tmax (where tmax is the maximum number of time steps or time slots)
  • One solution might have the following schedule of operations (0, 2, 1, 0, 3, 3, 1, 1, 1, 0) where each number corresponds to an operation, and the numerical value corresponds to its timeslot.
  • the operations may be scheduled so that:
  • the entropy ($E_n$) then becomes the negated sum, over the HD values, of $P(k)\,\log P(k)$: $E_n = -\sum_k P(k)\,\log P(k)$.
  • the P(k) sequence is (0.3, 0.4, 0.1, 0.2).
  • FIG. 8 shows one embodiment of a process flow diagram for calculating a virtual cost.
  • a histogram array of time steps is created. This histogram corresponds, in one embodiment, to the histogram array identified as capital HD in the example above.
  • the maximum value of HD is determined.
  • the number of operations is determined. In one embodiment, this is assigned capital value N.
  • the probability of reinforcement (P) is calculated. The value of P is determined as HD/N.
  • the entropy can be normalized based on the maximum value for HD, in one embodiment.
  • the entropy may be normalized based on another value, in another embodiment.
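  • The steps of FIG. 8 can be sketched in Python as shown here. This is a minimal illustration, not the patented implementation: the function names are hypothetical, the natural logarithm is assumed, and the entropy is normalized by the logarithm of the number of time steps, which is only one possible choice of normalizer.

```python
import math
from collections import Counter

def histogram(schedule, num_steps):
    """HD[k] = number of operations scheduled to time step k."""
    counts = Counter(schedule)
    return [counts.get(k, 0) for k in range(num_steps)]

def normalized_entropy(hd):
    """Shannon entropy of P = HD / N, scaled into [0, 1].

    Assumption: the normalizer is log(number of time steps).
    """
    n = sum(hd)                                  # N, the number of operations
    probs = [h / n for h in hd if h > 0]         # empty steps contribute nothing
    entropy = -sum(p * math.log(p) for p in probs)
    max_entropy = math.log(len(hd)) if len(hd) > 1 else 1.0
    return entropy / max_entropy

def virtual_cost(schedule, num_steps):
    """Actual cost (max operations in any step) minus normalized entropy."""
    hd = histogram(schedule, num_steps)
    return max(hd) - normalized_entropy(hd)

# The schedule from the example above: ten operations over four time steps.
schedule = (0, 2, 1, 0, 3, 3, 1, 1, 1, 0)
hd = histogram(schedule, 4)
cost = virtual_cost(schedule, 4)
print(hd)    # [3, 4, 1, 2]
print(cost)  # lies between 3 and 4: integer cost 4 less a sub-integer term
```

  • With this normalizer the supplemental term stays between 0 and 1, so the virtual cost lies between the integer cost and the next lower integer level.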
  • the cost can be determined as a combination of an actual cost and a supplementary cost. In one embodiment, this cost can then be used in the local search to further enhance the selection of solutions. In one embodiment, as shown in the diagram of FIG. 7, the local search 705 can be enhanced with a supplementary cost that is used in a solution simulator for designing an integrated circuit.
  • the design of an integrated circuit can be enhanced using a supplementary cost.
  • the operations from a high level description or some other source are identified and the hardware components for executing these operations are determined. This can be done with a data flow graph or in a variety of other ways. Given the operations and hardware components, a variety of different solutions are simulated for performing these operations.
  • the solutions are typically represented as hardware component combinations and interconnections, represented as paths on a data flow graph for each solution.
  • a cost is determined and this cost can include not only the number of edges and nodes traversed on a data flow graph, but also the supplemental sub-integer cost, such as the entropy described above.
  • the optimal solution can then be selected as the solution with the actual lowest cost.
  • the supplemental cost is not included in this selection.
  • in another embodiment, the supplemental cost is sub-integer and therefore need not be excluded.
  • the supplemental cost can be used in one embodiment for supplementing pheromone values in an ant colony optimization technique.
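  • The separation between guiding the ants and selecting the final design can be illustrated with a short sketch. The dictionary keys and function names are hypothetical, and the supplement is assumed to lie in [0, 1).

```python
def pick_final(solutions):
    """Final design choice: lowest actual (integer) cost only."""
    return min(solutions, key=lambda s: s["actual_cost"])

def guidance_cost(solution):
    """Cost seen by the ants: actual cost minus the sub-integer supplement."""
    return solution["actual_cost"] - solution["supplement"]

def better_for_guidance(a, b):
    """Prefer the solution with the lower guidance cost (breaks integer ties)."""
    return a if guidance_cost(a) <= guidance_cost(b) else b

a = {"name": "a", "actual_cost": 4, "supplement": 0.92}
b = {"name": "b", "actual_cost": 4, "supplement": 0.50}
c = {"name": "c", "actual_cost": 3, "supplement": 0.10}

# a and b tie on integer cost; the supplement favors a for navigation.
print(better_for_guidance(a, b)["name"])  # a
# The final selection ignores the supplement entirely.
print(pick_final([a, b, c])["name"])      # c
```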
  • a circuit can be designed so that the same hardware components can be used by different operations in different time steps. Paths are folded back to the same hardware component when the HLD is transformed through HLS. This allows the total number of hardware components to be reduced. Folding transformation allows hardware units of a system to be shared among multiple operations of the behavioral descriptions by time multiplexing. In other words, processes are folded back to a single hardware component, so that the component serves different parts of different processes at different times.
  • Folding depends upon the scheduling of operations and the binding of operations to particular hardware components. Scheduling can be considered to be a pre-process for folding and binding can be considered a primary sub-process of folding. For each operation, a scheduling algorithm can determine a time step at which the operation is executed and a binding algorithm can determine a hardware unit upon which the operation is executed.
  • the interconnection cost in one embodiment includes routing registers and the multiplexing logic to route data from one operation to another.
  • results from ACO can be improved by adding some functions to the basic ACO routine described, for example, in the context of FIG. 6.
  • an interconnection cost function, a guiding function, and a local search neighbor selection function are described. These functions, in one embodiment, are combined to better consider interconnections when adding folding to a circuit design. While all three functions work well together, any one or more of the three functions can be used without the others depending on the particular application.
  • the interconnection cost function is related to the number of pairs of candidate folding edges and folding weights.
  • the guiding function is related to a density function (ED) based on the probability of a candidate folding edge and folding weight pair in an unscheduled netlist.
  • the neighbor selection function is related to the change of this density in edges connected to neighboring solutions.
  • This density function can be referred to as an edge density (ED) because it is defined for edges.
  • the density can be used to analyze and compare the numbers of edges of different solutions.
  • the actual interconnection cost occurs as a result of the communication buses, registers, timing gates, multiplexers, and similar components that are required to interconnect the hardware components of the circuit. Any circuit with an input and an output will have some cost for making connections. However, with folding, the number of hardware components required can be decreased but the interconnection cost can be significantly increased. The examples below are described in the context of solutions with folding, but can also be adapted to other types of circuit simulation.
  • the interconnection cost is a real cost incurred in any circuit, as mentioned above. However, at the scheduling phase, the actual interconnection cost cannot yet be determined. The actual interconnection cost depends upon the binding results which are not known until after scheduling is determined. An estimate can be made at the scheduling phase and this can be used in an ACO context to guide the selection of candidate solutions and also to guide the final selection of a solution. In this way, interconnection cost is considered even if it is not precisely determined. In the context of FIG. 6, in one embodiment the estimated interconnection cost can be used to select local solutions at 603, and can also be used to enhance the effectiveness of the pheromones at 605.
  • the interconnection cost can be estimated using a candidate folded edge (cfe) and a folding weight (fw).
  • the number of different (cfe, fw) pairs can be taken as an estimate of the cost of the interconnection from multiplexing and other sources.
  • the cfe is a candidate edge from a data flow graph in the final folded design.
  • a folded weight (fw) is the weight (w) of an edge (e) in the folded design and it is determined according to the folding formulation. This weight can be used as a weight factor to scale the interconnection cost when it is added to the scheduling cost.
  • the weight is determined by the number of registers or delay states on the respective edge. This weight (w) corresponds to the weight w discussed above with respect to creating the netlist.
  • Folding can be viewed as a function or a transformation which transforms a base design to a folded design.
  • the aim of this transformation is to reduce the number of hardware components. This typically reduces the design area or the amount of space required for all of the components of the circuit.
  • the circuit design as shown in FIGS. 2 and 3 can be represented as a data flow graph or netlist structure with (V, E, w).
  • the netlist is a list of the logic gates of a circuit and their interconnections. It can be represented as a data flow graph.
  • V is the set of nodes v.
  • a node in the base design before operations are bound to hardware refers to an operation.
  • the nodes refer to hardware units (HU).
  • E is the set of edges e. An edge is a connection from one output port of an operation to an input port of another operation as shown in FIGS. 2 and 3.
  • Scheduling in one embodiment determines the time step when each operation is executed. The time step assigned to an operation is called the schedule of the operation. Binding determines the hardware unit in which the scheduled operation is executed. If the scheduling is determined then the weight (fw) of the edge for the folded netlist can be calculated for a particular edge e, which is a part of the set of edges E (e ∈ E), using a function referred to herein as FW.
  • FW(e) = w(e) * foldingFactor + schedule(e.targetOperation) - schedule(e.sourceOperation)
  • schedule (operation) refers to the time step at which an operation is scheduled to be performed. This is typically indicated by an integer count of the sequence of time steps.
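  • The FW function can be written directly from its definition. In this sketch the `Edge` class, the operation names, and the example values are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    source: str   # name of the source operation
    target: str   # name of the target operation
    w: int        # number of registers (delays) on the edge in the base design

def folding_weight(edge, schedule, folding_factor):
    """FW(e) = w(e) * foldingFactor + schedule(target) - schedule(source)."""
    return edge.w * folding_factor + schedule[edge.target] - schedule[edge.source]

schedule = {"mul1": 0, "add1": 2}      # operation -> time step
forward = Edge("mul1", "add1", w=1)
backward = Edge("add1", "mul1", w=1)

fw_forward = folding_weight(forward, schedule, folding_factor=4)    # 1*4 + 2 - 0
fw_backward = folding_weight(backward, schedule, folding_factor=4)  # 1*4 + 0 - 2
print(fw_forward, fw_backward)  # 6 2
```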
  • a candidate folded edge (cfe) can be defined.
  • An edge definition for an edge of a final folded design netlist can be defined as being a connection from one hardware unit to another hardware unit. This is what is shown in e.g. FIG. 3.
  • the hardware units are not yet bound to any operations, so the cfe is defined by source and destination hardware unit types instead. In other words, the cfe is a pair (source hardware unit type, destination hardware unit type).
  • e2cfe: E -> CFE (edge to candidate folding edge)
  • the e2cfe function is determined based on the operations between the source operation and the destination operation on either side of a candidate folding edge.
  • the number of different (cfe, fw) pairs can be used as an interconnection cost function. In one embodiment, the number of different (cfe, fw) pairs can be used as an estimate of an actual interconnection cost.
  • CFE_FW is a set of individual (cfe, fw) pairs.
  • the total interconnection cost can then be estimated as follows:
  • Total cost = Cs + interconnection cost, where T is the set of hardware unit types.
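  • The estimate can be sketched as counting distinct (cfe, fw) pairs for a fully scheduled solution. In this sketch, edges are plain `(source, target, w)` tuples, `unit_type` maps each operation to its hardware unit type, and the unweighted sum with Cs is an assumption; all names are hypothetical.

```python
def e2cfe(edge, unit_type):
    """Map a base-design edge to (source unit type, destination unit type)."""
    src, dst, _w = edge
    return (unit_type[src], unit_type[dst])

def cfe_fw_pairs(edges, unit_type, schedule, folding_factor):
    """CFE_FW: the set of distinct (cfe, fw) pairs realized by a schedule."""
    pairs = set()
    for src, dst, w in edges:
        fw = w * folding_factor + schedule[dst] - schedule[src]
        pairs.add((e2cfe((src, dst, w), unit_type), fw))
    return pairs

def total_cost(cs, edges, unit_type, schedule, folding_factor):
    """Scheduling cost Cs plus the estimated interconnection cost."""
    return cs + len(cfe_fw_pairs(edges, unit_type, schedule, folding_factor))

unit_type = {"m1": "mul", "m2": "mul", "a1": "add", "a2": "add"}
edges = [("m1", "a1", 0), ("m2", "a2", 0), ("a1", "m2", 1)]
schedule = {"m1": 0, "m2": 0, "a1": 1, "a2": 1}

pairs = cfe_fw_pairs(edges, unit_type, schedule, folding_factor=2)
print(len(pairs))  # 2: both mul->add edges share the pair (("mul", "add"), 1)
print(total_cost(2, edges, unit_type, schedule, folding_factor=2))  # 4
```

  • Two base-design edges that map to the same pair can share one connection in the folded design, which is why only distinct pairs are counted.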
  • the interconnection cost can be used in the solution construction phase of the ACO. This is shown, in one embodiment, in FIG. 6 as constructing a solution for each ant, 602.
  • a guiding function can be used in this phase to guide the construction of the solution.
  • a variety of different functions can be used. In one embodiment, described below, a heuristic value is used to guide the ants when they are constructing a solution.
  • Another density function (ND) can be defined which gives the probability of the realization of a candidate folding edge, folding weight (cfe, fw) pair in an unscheduled netlist. This density can be referred to as a node density.
  • uniformity is improved using the node density function, but in the interconnection cost case, all density is collected on some points which is the inverse of uniformity.
  • the maximum of the node density value for an edge can be used as the heuristic value.
  • each ant generates a schedule solution.
  • probabilities of choices are determined by the strength of the pheromones on a particular portion of the path. These probabilities can be modified by the guiding function. This guiding function accommodates the interconnection cost by guiding the ants to a schedule which generates the most frequently used (cfe, fw) pairs.
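  • One common way to combine pheromone strength with a guiding value is the standard ACO proportional rule, sketched here. The source does not state which rule is used, so the exponents `alpha` and `beta` and the function names are assumptions.

```python
import random

def choice_probabilities(pheromone, heuristic, alpha=1.0, beta=1.0):
    """Probability of each candidate time step, proportional to
    pheromone**alpha * heuristic**beta (standard ACO rule)."""
    weights = {s: (pheromone[s] ** alpha) * (heuristic[s] ** beta)
               for s in pheromone}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}

def pick_step(pheromone, heuristic, rng=random):
    """Sample one candidate time step according to those probabilities."""
    probs = choice_probabilities(pheromone, heuristic)
    steps = list(probs)
    return rng.choices(steps, weights=[probs[s] for s in steps], k=1)[0]

pheromone = {0: 1.0, 1: 1.0, 2: 2.0}   # candidate time step -> pheromone
heuristic = {0: 1.0, 1: 3.0, 2: 1.0}   # candidate time step -> guiding value
probs = choice_probabilities(pheromone, heuristic)
print(probs[1])  # 0.5: the guiding value outweighs the stronger pheromone on step 2
```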
  • ASAP is a function which gives the minimum feasible schedule value for a given operation.
  • ASAP can be determined as the earliest schedule for an operation which does not conflict with feasibility constraints. For example, any values used in an operation must be generated prior to the operation taking place.
  • ALAP is a function which gives the maximum feasible schedule value for a given operation. ALAP can be determined as the latest schedule for an operation which does not conflict with feasibility constraints. For example, if the results of an operation are used by a subsequent operation, the operation must occur prior to that subsequent operation.
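  • ASAP and ALAP values can be computed with one forward and one backward pass over the dependency graph. This sketch assumes that every operation takes one time step, that a consumer runs at least one step after its producer, and that `last_step` is the final available time step; the function and variable names are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def asap_alap(ops, deps, last_step):
    """Return (asap, alap) dicts for operations with producer->consumer deps."""
    preds = {o: set() for o in ops}
    succs = {o: set() for o in ops}
    for producer, consumer in deps:
        preds[consumer].add(producer)
        succs[producer].add(consumer)

    # static_order() yields each operation after all of its predecessors.
    order = list(TopologicalSorter(preds).static_order())

    asap = {}
    for o in order:                       # forward pass: earliest feasible step
        asap[o] = max((asap[p] + 1 for p in preds[o]), default=0)

    alap = {}
    for o in reversed(order):             # backward pass: latest feasible step
        alap[o] = min((alap[s] - 1 for s in succs[o]), default=last_step)
    return asap, alap

ops = ["a", "b", "c", "d"]
deps = [("a", "c"), ("b", "c"), ("c", "d")]
asap, alap = asap_alap(ops, deps, last_step=3)
print(asap)  # a and b at 0, c at 1, d at 2
print(alap)  # a and b at 1, c at 2, d at 3
```

  • The gap between the two values for an operation is the range of time steps from which an ant may choose.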
  • ND[index] = ND[index] + 1 / (maxFW - minFW + 1)
  • a guiding function which determines the heuristic value of setting the schedule of an operation to a particular selected schedule (sched) can be determined in one embodiment as provided in the pseudocode below.
  • heuristicValue = MAX(heuristicValue, ND[index])
  • the total heuristic value can be calculated as:
  • the heuristic value for each interconnection is calculated, in one embodiment.
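  • A possible reading of the density update and the guiding function is sketched here. Only the two update lines given above come from the source; the way minFW and maxFW are derived from ASAP and ALAP values, the use of the (cfe, fw) pair itself as the dictionary key in place of a hashed index, and all names are assumptions. Dependency feasibility is not checked.

```python
from collections import defaultdict

def node_density(edges, unit_type, asap, alap, folding_factor):
    """Spread each edge uniformly over the folding weights it could still take."""
    nd = defaultdict(float)
    for src, dst, w in edges:
        cfe = (unit_type[src], unit_type[dst])
        min_fw = w * folding_factor + asap[dst] - alap[src]
        max_fw = w * folding_factor + alap[dst] - asap[src]
        for fw in range(min_fw, max_fw + 1):
            nd[(cfe, fw)] += 1.0 / (max_fw - min_fw + 1)
    return nd

def heuristic_value(op, sched, edges, unit_type, asap, alap, folding_factor, nd):
    """Largest density reachable on any edge of `op` if it is set to `sched`."""
    value = 0.0
    for src, dst, w in edges:
        if src == dst or op not in (src, dst):
            continue
        cfe = (unit_type[src], unit_type[dst])
        base = w * folding_factor
        if op == src:   # the target is still free between its ASAP and ALAP
            lo, hi = base + asap[dst] - sched, base + alap[dst] - sched
        else:           # the source is still free between its ASAP and ALAP
            lo, hi = base + sched - alap[src], base + sched - asap[src]
        for fw in range(lo, hi + 1):
            value = max(value, nd.get((cfe, fw), 0.0))
    return value

unit_type = {"m1": "mul", "m2": "mul", "a1": "add", "a2": "add"}
edges = [("m1", "a1", 0), ("m2", "a2", 0)]
asap = {"m1": 0, "m2": 0, "a1": 1, "a2": 1}
alap = {"m1": 1, "m2": 1, "a1": 2, "a2": 2}

nd = node_density(edges, unit_type, asap, alap, folding_factor=3)
h = heuristic_value("m1", 0, edges, unit_type, asap, alap, 3, nd)
print(h)  # two edges each contribute 1/3 to the same pair
```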
  • the local search 603 of FIG. 6 can be improved by considering the interconnection cost.
  • a significant part of the local search is to select a particular neighbor to compare against. Calculating the cost for all of the possible neighboring solutions can be complex and time-consuming.
  • a neighbor selection function can produce similar results more simply and in less time.
  • the neighbor selection function uses the change in density of the edges that connect an operation. The neighbor selection function seeks to have more use of each interconnection. As a result, there may be fewer total interconnections in the final design. This is represented as a density value (ID).
  • local search starts with a current or a best solution and searches for a better solution by evaluating neighboring solutions and moving to the best neighboring solution.
  • Neighbors can be defined in different ways. One simple definition that will be used here for illustrative purposes is that if the only difference between solution A and solution B is a schedule of one operation then A and B are neighbors. In other words, solution A can be achieved by changing the scheduling of only one operation in solution B.
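  • Under this definition, the neighbors of a schedule can be enumerated by moving one operation at a time. The sketch below limits each move to the operation's ASAP to ALAP range and does not re-check dependencies between operations, which a real local search would need to do; the names are hypothetical.

```python
def neighbors(schedule, asap, alap):
    """Yield every schedule that differs from `schedule` in exactly one operation."""
    for op, current in schedule.items():
        for step in range(asap[op], alap[op] + 1):
            if step != current:
                neighbor = dict(schedule)   # copy, then move one operation
                neighbor[op] = step
                yield neighbor

def are_neighbors(a, b):
    """True if the two schedules differ in the time step of exactly one operation."""
    return a.keys() == b.keys() and sum(a[o] != b[o] for o in a) == 1

schedule = {"a": 0, "b": 1}
asap = {"a": 0, "b": 1}
alap = {"a": 1, "b": 2}

found = list(neighbors(schedule, asap, alap))
print(found)  # [{'a': 1, 'b': 1}, {'a': 0, 'b': 2}]
```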
  • this function is a density function (ID), defined from a (cfe, fw) pair to a double (CFE x int -> double). If all the schedules are determined, the density function gives an integer value which shows how many base design edges are mapped to a (cfe, fw) pair. Since in the context of local search all the schedules are determined, in one embodiment densities are integer values.
  • the output of the density function (ID) is a higher precision floating point number such as a double value, an integer, or a standard floating point decimal. This is described above in the context of defining the node density function.
  • a hash function (h) can be used in one embodiment to generate a unique index for each (cfe, fw) pair.
  • the particular hash function can be selected based on the particular application and the level of precision desired.
  • the output of h(cfe, fw) is an integer from 0 up to 2*|E|*FoldingFactor.
  • the FoldingFactor is a given value which defines the maximum possible number of operations shared by a single hardware unit.
  • Density values (ID) for a schedule solution can be calculated as described in the following pseudocode example:
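  • For a fully scheduled solution, the density of a (cfe, fw) pair is the number of base-design edges mapped to it, which can be sketched with a counter. The pair itself is used as the key instead of a hashed index, and the edge tuples, names, and example values are assumptions.

```python
from collections import Counter

def interconnection_density(edges, unit_type, schedule, folding_factor):
    """ID[(cfe, fw)] = number of base-design edges mapped to that pair."""
    density = Counter()
    for src, dst, w in edges:
        cfe = (unit_type[src], unit_type[dst])
        fw = w * folding_factor + schedule[dst] - schedule[src]
        density[(cfe, fw)] += 1
    return density

unit_type = {"m1": "mul", "m2": "mul", "a1": "add", "a2": "add"}
edges = [("m1", "a1", 0), ("m2", "a2", 0), ("a1", "m2", 1)]
schedule = {"m1": 0, "m2": 0, "a1": 1, "a2": 1}

density = interconnection_density(edges, unit_type, schedule, folding_factor=2)
print(density[(("mul", "add"), 1)])  # 2: two edges share one folded connection
print(len(density))                  # 2 distinct pairs in use
```

  • All schedules are fixed here, so every density is an integer, and the number of keys equals the number of distinct (cfe, fw) pairs.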
  • a selection function for changing an operation (o) schedule to a new schedule (newSched) can be represented in one embodiment as pseudocode as follows:
  • selectionValue = selectionValue + direction * (maxDensity - MIN(preDensity, postDensity))
  • FIG. 9 is a process flow diagram of one embodiment of estimating an interconnection cost.
  • the process in one embodiment, corresponds to the pseudocode representation described above.
  • the interconnection cost can be used, as mentioned above, for updating pheromones and for selecting neighbor solutions, for example in the process flow of FIG. 6.
  • candidate folding edges are determined for each edge in a data flow graph for a potential solution.
  • the source and target operations for each candidate folding edge (cfe) are determined.
  • a folding weight (fw) is determined for each candidate folding edge using the source and target operations.
  • an interconnection cost can be determined for each edge of a solution based on the number of cfe, fw pairs associated with the edge.
  • the interconnection cost can be weighted for each edge using the folding weight for that edge.
  • the total interconnection cost is determined by adding up the values for all of the edges that are traversed for the solution.
  • these operations can be applied to the general integrated circuit design process of FIG. 2 in selecting hardware component combinations 210. In the ant colony optimization of FIG. 6 in one embodiment these operations can be applied to updating pheromones as well as in the local search.
  • the use of an interconnection cost can begin with a high level description which includes one or more operations to be performed by the circuit that is being designed.
  • a data flow graph or some other representation can be used to represent the hardware components that will be performing the operations. Different solutions are then simulated for performing the operations in the HLD. These solutions can be simulated as hardware component and schedule combinations.
  • a data flow graph in one embodiment the
  • a cost is determined that includes, for example, the number of edges and nodes traversed on the data flow graph.
  • This cost can be augmented with the interconnection cost, determined with the process flow diagram of FIG. 9 for example.
  • the interconnection cost is related to the number of different hardware components in the path.
  • a pheromone trail can also be associated with each path which includes a cost of the respective scheduling solution.
  • the solution with the highest value pheromone trail can then be selected as a hardware and schedule combination for the circuit. As indicated in FIG. 2, this can be repeated until all of the operations are scheduled and bound to hardware.
  • the candidate folding edges of FIG. 9 provide a way to represent the steps for each solution.
  • each candidate folding edge can have a source hardware type paired with a destination hardware type and be represented as an edge on the data flow graph.
  • interconnection cost can be weighted by the number of different types of hardware units used by the solution. In one embodiment, this weight can represent the number of different types of hardware units as a ratio of the number of hardware types for one solution to the total number of different hardware types in the data flow graph.
  • the interconnection cost can also be weighted by the number of registers used to perform the simulated solution. In one embodiment, the interconnection cost can further be weighted by a folding factor that is related to the reuse of hardware resources. In one embodiment, the interconnection cost can further be weighted by a number of time steps to perform the simulated solution.
  • a guiding function in one embodiment can be determined using the process flow diagram of FIG. 10.
  • the source and target operations are determined for each candidate folding edge.
  • the folding weight is determined for each candidate folding edge. These operations are similar to the operations 901 and 903 of FIG. 9 and in one embodiment the same values can be used reducing calculation steps and the complexity of the overall solution.
  • an index can be determined for edges of the data flow graph using the number of (cfe, fw) pairs for each edge.
  • An index is a unique value, in one embodiment determined using a hash function.
  • the values for the current edge are compared to values for neighboring edges.
  • this comparison can be used to populate a histogram array of time steps for the edges.
  • the maximum and minimum feasible schedule values are determined using the histogram. This maximum and minimum can represent the highest and lowest number of time steps for the edges of each solution.
  • these determined schedule values can be used to select the next solution to simulate.
  • the comparison of the determined schedule values can be used, in one embodiment, to guide the selection of the next solution in a local search, such as the one shown in FIG. 6.
  • Such a local search in FIG. 2 can in one embodiment guide the determination of which hardware component combination to simulate next, as shown in FIG. 2 at 210.
  • the guiding function of FIG. 10 can be applied to an overall circuit design process as in FIG. 2 by first selecting an operation to be performed by the circuit to be designed.
  • the operation including any partial operations can be represented with nodes on a data flow graph for each of the hardware components performing the operations. Edges can be used for the paths between components. Solutions can then be simulated for performing these operations as hardware component and schedule combinations and represented as particular paths on the data flow graph.
  • a cost can be determined for each solution which includes for example a number of edges and nodes traversed on the path and any other additional or supplemental costs.
  • a pheromone trail can be associated with each path.
  • additional solutions are simulated that neighbor the previous solutions.
  • These solutions can be selected using a neighbor selection function, such as the one discussed with respect to FIG. 10 which is based on a number of operations performed by hardware components that neighbor the hardware units used by a solution.
  • a solution with the lowest cost or a low cost can be selected for the integrated circuit design.
  • the neighbor selection function can be designed to compare the number of operations performed using different schedules that start at different edges on the data flow graph to perform the same operation.
  • This function can be a function of the edge density, or the density of folding operations for each edge that neighbors the initial edge of a respective solution.
  • the next solution to be selected in the local search can be a solution which maximizes the density function that presents the greatest positive change, or presents the greatest difference in the density.
  • the neighbor selection function can determine an index for each edge of the graph based on the number of operations in a particular solution and the amount of folding for each included edge. Then the next solution to be selected can be one that has the highest index of the candidates considered.
  • FIG. 11 shows a process flow diagram of one embodiment for determining a neighbor selection function.
  • This function can be used in the local search 603 of an ant colony optimization for example.
  • a histogram array of time steps is determined.
  • source and target operations are determined for each candidate folding edge.
  • the folding is determined and candidate folding edges are presented.
  • the processes described above for FIGS. 9 and 10 may be used to do this.
  • a folding weight is determined for each candidate folding edge and, at 1107, indices are determined for the edges of the data flow graph. These indices can be determined using the number of (cfe, fw) pairs for each edge.
  • the index for a current edge is compared to indices for neighboring edges and at 1111, using this comparison, the neighboring edge with the highest index can be selected as the next solution to simulate. This process can be repeated to evaluate additional solutions.
  • an integrated circuit design can be augmented with a guiding function.
  • the operations to be performed by the integrated circuit design are characterized, for example, using a high level description, and the hardware components for performing these operations can be represented on a data flow graph with edges between the hardware components.
  • the guiding function can be used to select from among different solutions for performing the operations.
  • the solutions, similar to those described above with respect to FIGS. 9 and 10, can be represented as hardware component and schedule combinations on the graph.
  • the costs for each simulation are determined and then a solution with the lowest cost is selected.
  • the guiding function can be related to the amount of hardware reuse on an edge of the data flow graph for the particular solution. This can be combined with pheromone trails to select a solution with a lower cost.
  • a register refers to a sequential element in general (e.g., a delay element, a memory cell, a flip-flop, or others).
  • a register samples and holds (stores) the input signal so that it can be output in synchronization with the clock of the circuit.
  • one delay on an edge of a data flow graph represents a unit of latency typically introduced by the presence of a register on the corresponding path.
  • the unit of latency can also be introduced through other means, such as different control signals for reading a memory cell, multiplexers, dividers, or path delays.
  • a digital processing system such as a conventional, general-purpose computer system.
  • Special purpose computers which are designed or programmed to perform only one function, may also be used.
  • FIG. 12 shows one example of a typical computer system which may be used with the disclosed embodiments.
  • the processes described with respect to FIGS. 1-4, 6, and 8-11 are operational through the example computing system.
  • the modules described in FIGS. 5 and 7 are configurable in a data processing system structured similar to the example computing system.
  • While FIG. 12 illustrates various components of a computer system, it is not intended to represent any particular architecture or manner of interconnecting the components but rather provides an example representation of how the components and architecture may be configured.
  • network computers and other data processing systems which have fewer components or perhaps more components may also be used with the disclosed embodiments.
  • the computer system of FIG. 12 may be any computing system capable of performing the described operations.
  • the computer system 1201 which is a form of a data processing system, includes a bus 1202 which is coupled to a
  • computer system 1201 includes one or more of a read only memory (ROM) 1207, volatile memory (RAM) 1205, and a non-volatile memory (EEPROM, Flash) 1206.
  • the microprocessor 1203 is coupled to cache memory 1204 as shown in the example of FIG. 12.
  • Cache memory 1204 may be volatile or non-volatile memory.
  • the bus 1202 interconnects these various components together and in one embodiment interconnects these components 1203, 1207, 1205, and 1206 to a display controller and display device 1208.
  • the computer system 1201 may further include peripheral devices such as input/output (I/O) devices which may be mice, keyboards, modems, network interfaces, printers, scanners, video cameras and other devices which are well known in the art.
  • the input/output devices 1210 are coupled to the system through input/output controllers 1209.
  • the volatile RAM 1205 is typically implemented as dynamic RAM (DRAM) which requires power continually in order to refresh or maintain data in the memory.
  • the non-volatile memory 1206 is typically a magnetic hard drive, magnetic optical drive, an optical drive, a DVD RAM, a Flash memory, or other type of memory system which maintains data even after power is removed from the system.
  • typically, the non-volatile memory will also be a random access memory, although this is not required.
  • While FIG. 12 shows that the non-volatile memory is a local device coupled directly to the rest of the components in the data processing system, it will be appreciated that the disclosed embodiments may utilize a non-volatile memory which is remote from the system, such as a network storage device which is coupled to the data processing system through a network interface such as a modem or Ethernet interface.
  • the bus 1202 may include one or more buses connected to each other through various bridges, controllers and/or adapters as is well known in the art.
  • the I/O controller 1209 includes a USB (Universal Serial Bus) adapter for controlling USB peripherals, and/or an IEEE-1394 bus adapter for controlling IEEE-1394 peripherals.
  • aspects of the disclosed embodiments may be embodied, at least in part, in software (or computer-readable instructions). That is, the techniques, for example the processes of FIGS. 1-4, 6, and 8-11, may be carried out in a computer system or other data processing system in response to its processor, such as a microprocessor, executing sequences of instructions contained in a memory, such as ROM 1207, volatile RAM 1205, non-volatile memory 1206, cache 1204 or a remote storage device.
  • hardwired circuitry may be used in combination with software instructions to implement the disclosed embodiments.
  • the techniques are not limited to any specific combination of hardware circuitry and software nor to any particular source for the instructions executed by the data processing system.
  • various functions and operations are described as being performed by or caused by software code to simplify description.
  • a machine readable medium can be used to store software and data which, when executed by a data processing system, cause the system to perform various methods of the disclosed embodiments.
  • This executable software and data may be stored in various places including for example ROM 1207, volatile RAM 1205, non-volatile memory 1206 and/or cache 1204 as shown in FIG. 12. Portions of this software and/or data may be stored in any one of these storage devices.
  • a machine readable medium includes any mechanism that stores any information in a form accessible by a machine (e.g., a computer, network device, personal digital assistant, manufacturing tool, any device with a set of one or more processors, etc.).
  • a machine readable medium includes recordable/non-recordable media (e.g., read only memory (ROM); random access memory (RAM); magnetic disk storage media; optical storage media; flash memory devices; etc.).

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Computer Hardware Design (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Tourism & Hospitality (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Manufacturing & Machinery (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

PCT/US2011/050081 2010-09-30 2011-08-31 Method and apparatus for using entropy in ant colony optimization circuit design from high level systhesis WO2012050678A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP11832913.5A EP2622549A4 (de) 2010-09-30 2011-08-31 Method and apparatus for using entropy in ant colony optimization circuit design from high level synthesis
CN2011800476031A CN103140853A (zh) 2010-09-30 2011-08-31 Method and apparatus for using entropy in ant colony optimization circuit design from high level synthesis

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US12/894,842 2010-09-30
US12/894,842 US8296712B2 (en) 2010-09-30 2010-09-30 Method and apparatus for improving the interconnection and multiplexing cost of circuit design from high level synthesis using ant colony optimization
US12/894,902 2010-09-30
US12/894,902 US8296713B2 (en) 2010-09-30 2010-09-30 Method and apparatus for synthesizing pipelined input/output in a circuit design from high level synthesis
US12/894,756 2010-09-30
US12/894,756 US8296711B2 (en) 2010-09-30 2010-09-30 Method and apparatus for using entropy in ant colony optimization circuit design from high level synthesis

Publications (1)

Publication Number Publication Date
WO2012050678A1 (en) 2012-04-19

Family

ID=45938606

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2011/050081 WO2012050678A1 (en) 2010-09-30 2011-08-31 Method and apparatus for using entropy in ant colony optimization circuit design from high level systhesis

Country Status (3)

Country Link
EP (1) EP2622549A4 (de)
CN (1) CN103140853A (de)
WO (1) WO2012050678A1 (de)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115032997A (zh) * 2022-06-22 2022-09-09 江南大学 A fourth-party logistics transportation route planning method based on the ant colony algorithm

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
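CN104765667B (zh) * 2015-04-17 2018-09-28 西安电子科技大学 A method for finding vulnerable branches of an FPGA program based on the ant colony algorithm
CN108710587B (zh) * 2018-06-04 2021-03-26 中国电子科技集团公司第十四研究所 General-purpose signal processing FPGA architecture system based on the AXI bus
CN111523698B (zh) * 2020-03-20 2023-08-08 全球能源互联网集团有限公司 Ant colony site selection method and apparatus for macro-siting of clean energy bases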

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010029600A1 (en) * 2000-03-30 2001-10-11 Hitachi, Ltd. Storage media being readable by a computer, and a method for designing a semiconductor integrated circuit device
US20040143560A1 (en) * 2003-01-20 2004-07-22 Chun Bao Zhu Path searching system using multiple groups of cooperating agents and method thereof
US20070208677A1 (en) * 2006-01-31 2007-09-06 The Board Of Trustees Of The University Of Illinois Adaptive optimization methods
US7284228B1 (en) * 2005-07-19 2007-10-16 Xilinx, Inc. Methods of using ant colony optimization to pack designs into programmable logic devices
US20080228764A1 (en) * 2004-07-27 2008-09-18 Srikanth Soogoor Hypercube topology based advanced search algorithm
US20080244500A1 (en) * 2006-11-13 2008-10-02 Solomon Research Llc System, methods and apparatuses for integrated circuits for nanorobotics
US20090052321A1 (en) * 2007-08-20 2009-02-26 Kamath Krishna Y Taxonomy based multiple ant colony optimization approach for routing in mobile ad hoc networks
US20090070550A1 (en) * 2007-09-12 2009-03-12 Solomon Research Llc Operational dynamics of three dimensional intelligent system on a chip
US20090089035A1 (en) * 2007-07-07 2009-04-02 Solomon Research Llc Hybrid multi-layer artificial immune system
US20100125820A1 (en) * 2008-11-14 2010-05-20 Ispir Mustafa Unfolding algorithm in multirate system folding

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8989349B2 (en) * 2004-09-30 2015-03-24 Accuray, Inc. Dynamic tracking of moving targets
JP4488027B2 (ja) * 2007-05-17 2010-06-23 ソニー株式会社 Information processing apparatus and method, and information processing system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2622549A4 *

Also Published As

Publication number Publication date
EP2622549A4 (de) 2014-04-23
CN103140853A (zh) 2013-06-05
EP2622549A1 (de) 2013-08-07

Similar Documents

Publication Publication Date Title
US8296711B2 (en) Method and apparatus for using entropy in ant colony optimization circuit design from high level synthesis
US8296712B2 (en) Method and apparatus for improving the interconnection and multiplexing cost of circuit design from high level synthesis using ant colony optimization
US7162704B2 (en) Method and apparatus for circuit design and retiming
Sun et al. FPGA pipeline synthesis design exploration using module selection and resource sharing
JP2009519528A (ja) System and method for criticality prediction in statistical timing analysis
US8296713B2 (en) Method and apparatus for synthesizing pipelined input/output in a circuit design from high level synthesis
WO2012050678A1 (en) Method and apparatus for using entropy in ant colony optimization circuit design from high level systhesis
Prost-Boucle et al. A fast and autonomous HLS methodology for hardware accelerator generation under resource constraints
EP3805995A1 (de) Method and apparatus for data processing of a deep neural network
Wang et al. A survey of FPGA placement algorithm research
US20070028198A1 (en) Method and apparatus for allocating data paths to minimize unnecessary power consumption in functional units
CN116795508A (zh) Resource scheduling method and system for a tiled accelerator
Goswami et al. Mlsbench: A benchmark set for machine learning based fpga hls design flows
Meribout et al. A-combined approach to high-level synthesis for dynamically reconfigurable systems
Ohm et al. A comprehensive estimation technique for high-level synthesis
JP4083491B2 (ja) Automatic synthesis apparatus for inter-module interfaces, synthesis method, program, and portable storage medium
Bytyn et al. Dataflow aware mapping of convolutional neural networks onto many-core platforms with network-on-chip interconnect
WO2005114500A1 (en) Method and apparatus for allocating data paths
Nielsen et al. Towards behavioral synthesis of asynchronous circuits-an implementation template targeting syntax directed compilation
Minnen CNN Accelerator Throughput Improvement using High-Level Synthesis for FPGA
Thepayasuwan et al. Hardware-software co-design of resource constrained systems on a chip
Ryan FPGA Hardware Accelerators-Case Study on Design Methodologies and Trade-Offs
Liu COMPILING APPLICATIONS TO RECONFIGURABLE PUSH-MEMORY ACCELERATORS
TW200915123A (en) Architectural physical synthesis
Temmerman et al. Optimizing data structures at the modeling level in embedded multimedia

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase
Ref document number: 201180047603.1
Country of ref document: CN

121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 11832913
Country of ref document: EP
Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

WWE WIPO information: entry into national phase
Ref document number: 2011832913
Country of ref document: EP