CN116776816A - Low-power consumption clock tree comprehensive implementation method based on deferred merging strategy - Google Patents

Low-power consumption clock tree comprehensive implementation method based on deferred merging strategy

Publication number
CN116776816A
Authority
CN
China
Prior art keywords
merging
buffer
clock
subtrees
power consumption
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310676747.8A
Other languages
Chinese (zh)
Inventor
俞文心
丁劲皓
文茄汁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest University of Science and Technology
Original Assignee
Southwest University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2023-06-08
Filing date
2023-06-08
Publication date
2023-09-19
Application filed by Southwest University of Science and Technology
Priority to CN202310676747.8A
Publication of CN116776816A

Landscapes

  • Design And Manufacture Of Integrated Circuits (AREA)

Abstract

The invention discloses a low-power clock tree synthesis method based on a deferred-merge strategy, comprising the following stages: input, a bottom-up phase, a top-down phase, and output. The invention belongs to the technical field of low-power clocks. The method inserts buffers bottom-up in its buffer insertion algorithm, which minimizes the influence of the buffers on clock skew, and it uses a bidirectional breadth-first search algorithm for obstacle-avoiding clock routing.

Description

Low-power clock tree synthesis method based on a deferred-merge strategy
Technical Field
The invention belongs to the technical field of low-power clocks, and particularly relates to a low-power clock tree synthesis method based on a deferred-merge strategy.
Background
The clock is a critical part of digital chip design. The clock signal is the reference for data transfer and plays a decisive role in the function, performance and stability of synchronous digital systems, so the characteristics of the clock signal and of its distribution network receive particular attention. The clock signal attenuates continuously as it propagates; if the signal becomes too weak, the clocks arriving at the registers can no longer be kept synchronized, which seriously degrades chip performance. Inserting buffers restores the clock signal and shortens its transition time. At the same time, inserting buffers also reduces latency. Clock tree synthesis refers to designing a clock tree with zero clock skew and automatically inserting buffers along the designed clock paths so as to balance all clock delays and meet the required clock transition time. Even under worst-case conditions the clock signal must meet timing requirements such as clock skew (skew) and clock transition time (slew); otherwise, improper control of the clock signal may cause race conditions in which erroneous data is latched into registers, leading to functional errors in the system. In today's chip designs, 30% or more of the total dynamic power is consumed by the clock network, so reducing the total power consumption of the clock tree is also an important goal. An efficient clock tree synthesis method is therefore important to the back-end physical design of the whole chip.
The first step is to design the topology of the clock tree. The quality of the clock tree depends on the merging strategy, and the clock trees generated by different merging strategies differ considerably. The traditional strategy preferentially merges the two subtrees with the smallest Manhattan distance, but when the delays of the two subtrees differ greatly, the merge requires additional serpentine (snaked) routing. Excessive snaking increases the interconnect length and therefore the power consumption. Snaked wires are also strongly affected by process variation, which increases the uncertainty of the clock delay and ultimately the clock skew. To reduce snaking, the merging strategy cannot consider the Manhattan distance alone; it must consider all the resources required for the merge. In this invention, the resources are defined as the sum of the interconnect capacitance and the buffer capacitance.
The clock transition time constraint is met by inserting buffers. Conventional methods often insert buffers after the topology of the clock tree has been constructed, which easily perturbs the clock skew and thus affects the performance of the whole clock tree. The invention therefore performs buffer insertion and clock tree merging at the same time, which avoids affecting the clock skew.
The clock source and the clock pins must be connected by interconnect wires. In the clock routing stage, macro cells and similar blocks placed during the placement stage act as obstacles. Interconnect wires are allowed to overlap the obstacles, but buffers are not. Conventional methods therefore route completely around the obstacles, which may increase the wire length and, in turn, the power consumption. The invention designs an algorithm that reduces the wire length while keeping the buffers clear of the obstacles.
Disclosure of Invention
To solve the above problems, the invention provides a low-power clock tree synthesis method based on a deferred-merge strategy. It inserts buffers bottom-up in the buffer insertion algorithm, which minimizes the influence of the buffers on clock skew, and it uses a bidirectional breadth-first search algorithm for obstacle-avoiding clock routing.
The technical solution adopted by the invention is a low-power clock tree synthesis method based on a deferred-merge strategy, comprising the following steps:
step 1: to equalize the clock propagation delay from the root node to every leaf node, the invention constructs the clock tree using the deferred-merge embedding (DME) algorithm. In its first stage the algorithm merges bottom-up to obtain all merging segments, that is, all possible merging point positions; in its second stage, after the position of the root node has been determined, it determines the exact position of each merging point so that the interconnect length is shortest and the power consumption is lowest;
step 2: during merging, if a buffer might overlap an obstacle, the merging segment is searched for again;
step 3: to meet the clock transition time constraint without letting the inserted buffers disturb the delay balance, buffer insertion is performed on a level as soon as each pair of subtrees has been merged and the corresponding merging segment has been generated. At this point only the spacing between the buffers is determined, not their exact positions; the exact buffer positions are determined once the final position of each merging point has been fixed in the second stage of DME.
Further, step 1 comprises the steps of:
step 1.1, when searching for subtrees to merge (in the first round every subtree is a single flip-flop), constructing a Delaunay triangulation over all subtrees. Delaunay triangulation reduces the time complexity of finding nearest neighbors from O(n^2) to O(n log n);
step 1.2, calculating the weight of each edge in the Delaunay triangulation, where the weight is the sum of the interconnect capacitance and the buffer capacitance required to merge the two subtrees joined by that edge;
step 1.3, quick-sorting all edges, finding the edge with the minimum weight, merging the two subtrees joined by that edge to generate the corresponding merging segment, and adding the merging segment to the Delaunay triangulation as a newly generated subtree.
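Steps 1.1 to 1.3 can be sketched as the loop below. This is a simplified illustration under stated assumptions, not the patented implementation: each subtree is reduced to a single representative point, all pairs of subtrees stand in for the Delaunay edges (a Delaunay library would supply a sparser edge set), the buffer capacitance defaults to zero, and a merged subtree is placed at the midpoint of its children. The names `Subtree`, `merge_cost` and `greedy_merge` are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Subtree:
    x: float
    y: float
    cap: float            # load capacitance seen at the subtree root
    children: tuple = ()  # empty for a flip-flop (leaf)


def merge_cost(a, b, c_unit, c_buf=0.0):
    """Edge weight: wire capacitance plus buffer capacitance for merging a and b."""
    length = abs(a.x - b.x) + abs(a.y - b.y)  # Manhattan distance
    return c_unit * length + c_buf


def greedy_merge(sinks, c_unit):
    """Repeatedly merge the cheapest pair of subtrees until one root remains."""
    active = list(sinks)
    while len(active) > 1:
        # Assumption: all pairs stand in for the Delaunay edges of step 1.1.
        edges = sorted(
            combinations(range(len(active)), 2),
            key=lambda e: merge_cost(active[e[0]], active[e[1]], c_unit),
        )
        i, j = edges[0]  # minimum-weight edge (step 1.3)
        a, b = active[i], active[j]
        wire_cap = merge_cost(a, b, c_unit)
        # Assumption: the merged subtree is represented by the midpoint.
        parent = Subtree((a.x + b.x) / 2, (a.y + b.y) / 2,
                         a.cap + b.cap + wire_cap, (a, b))
        active = [s for k, s in enumerate(active) if k not in (i, j)]
        active.append(parent)
    return active[0]


sinks = [Subtree(0, 0, 1.0), Subtree(1, 0, 1.0), Subtree(10, 0, 1.0)]
root = greedy_merge(sinks, c_unit=0.5)
```

In this toy input the two flip-flops that are one unit apart are merged first, and the distant one joins at the next level.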
Further, step 2 comprises the steps of:
step 2.2, constructing a rectangular area bounded by the newly generated merging segment and the merging segments of the two subtrees;
step 2.3, if the rectangular area intersects an obstacle, searching for the merging segment again using a bidirectional breadth-first search algorithm.
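The check in steps 2.2 and 2.3 can be illustrated as follows. The sketch assumes that the rectangular area is the axis-aligned bounding box of the three merging segments and that obstacles are axis-aligned rectangles; `Rect`, `bounding_rect`, `intersects` and `needs_reroute` are hypothetical names.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def bounding_rect(segments):
    """Axis-aligned bounding box of segments given as ((x0, y0), (x1, y1))."""
    points = [p for seg in segments for p in seg]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return Rect(min(xs), min(ys), max(xs), max(ys))


def intersects(a, b):
    """True if two axis-aligned rectangles share interior area."""
    return (a.x_min < b.x_max and b.x_min < a.x_max
            and a.y_min < b.y_max and b.y_min < a.y_max)


def needs_reroute(ms_a, ms_b, ms_v, obstacles):
    """Step 2.3: the merging segment must be searched for again on any overlap."""
    region = bounding_rect([ms_a, ms_b, ms_v])
    return any(intersects(region, obs) for obs in obstacles)


ms_a = ((0, 2), (2, 0))
ms_b = ((8, 6), (6, 8))
ms_v = ((3, 5), (5, 3))
```

Rectangles that only touch along an edge are treated as non-overlapping here; a real flow might add a keep-out margin around each obstacle.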
Further, step 3 comprises the steps of:
step 3.1, the delay and output transition time of a buffer depend on its input transition time and its load capacitance. In the invention, the input transition time (input slew) is a pre-specified value. To obtain relatively accurate values of the buffer delay and output transition time, a lookup table of buffer delay and output transition time is built using NGSPICE simulation;
step 3.2, manually specifying an initial spacing value and calculating the clock transition time according to the PERI model; if a clock transition time violation exists, reducing the spacing until the clock transition time constraint is met;
step 3.3, changing the size of the buffer and repeating step 3.2;
step 3.4, after all feasible buffer solutions have been found, selecting the feasible solution with the lowest power consumption;
step 3.5, once the buffers of the level have been determined, they affect the propagation delays of the two subtrees, so the position of the merging segment is recalculated starting from the last buffer of each of the two subtrees;
step 3.6, iterating to the next level up until every level of the clock tree has been merged and its inserted buffers determined; then executing the second stage of DME to determine the exact position of each merging point and buffer; routing, that is, connecting the merging points and buffers with interconnect wires; and finally outputting the solution produced by clock tree synthesis.
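The spacing search of steps 3.2 to 3.4 might look like the sketch below. Several assumptions are made that the text does not state: the step-response slew of a wire is approximated as ln 9 times its Elmore delay, the PERI model combines it with the input slew as the root of the sum of squares, the spacing is shrunk by a fixed factor, and power is approximated by switched capacitance per unit length. `Buffer`, `wire_end_slew`, `max_spacing` and `pick_buffer` are hypothetical names, and the numbers are unitless toy values.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Buffer:
    name: str
    input_cap: float  # input pin capacitance
    out_slew: float   # slew at the buffer output (a table lookup in practice)


def wire_end_slew(buf, spacing, next_cap, r_unit, c_unit):
    """PERI-style slew at the end of a wire of length `spacing` driven by `buf`."""
    elmore = r_unit * spacing * (c_unit * spacing / 2 + next_cap)
    step_slew = math.log(9) * elmore  # assumed step-input slew metric
    return math.sqrt(buf.out_slew ** 2 + step_slew ** 2)


def max_spacing(buf, start, slew_limit, r_unit, c_unit,
                shrink=0.9, min_spacing=1e-3):
    """Step 3.2: shrink the spacing from `start` until the slew limit is met."""
    spacing = start
    while spacing >= min_spacing:
        if wire_end_slew(buf, spacing, buf.input_cap, r_unit, c_unit) <= slew_limit:
            return spacing
        spacing *= shrink
    return None  # this buffer size cannot meet the constraint


def pick_buffer(library, start, slew_limit, r_unit, c_unit):
    """Steps 3.3 and 3.4: try every size, keep the lowest capacitance per length."""
    best = None
    for buf in library:
        spacing = max_spacing(buf, start, slew_limit, r_unit, c_unit)
        if spacing is None:
            continue
        cap_per_len = (c_unit * spacing + buf.input_cap) / spacing
        if best is None or cap_per_len < best[2]:
            best = (buf, spacing, cap_per_len)
    return best


small = Buffer("X1", 1.0, 3.0)
big = Buffer("X4", 4.0, 1.0)
bad = Buffer("X0", 0.5, 6.0)  # output slew already exceeds the limit
best = pick_buffer([bad, small, big], 2.0, 5.0, 1.0, 1.0)
```

A geometric shrink keeps the search short; a bisection on the spacing would find a tighter feasible value at the cost of a few more slew evaluations.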
The beneficial effects of the invention are as follows: in the low-power clock tree synthesis method based on a deferred-merge strategy, the clock tree synthesis algorithm keeps the clock skew of the clock tree small, and the accurate buffer insertion algorithm keeps its power consumption low. The clock tree synthesis algorithm was applied to the ispd09 and ispd10 benchmark circuits; compared with similar design methods, clock skew and total power consumption are significantly reduced. Clock tree synthesis is an important part of back-end physical design and contributes greatly to final timing closure and to the performance of the chip in operation.
Drawings
FIG. 1 is an overall flow chart of the present invention;
FIG. 2 shows the rectangular region formed by a newly generated merging segment and the merging segments of the two subtrees, wherein FIG. 2 (a) shows the rectangular region itself, where ms_a and ms_b are the subtree merging segments and ms_v is the newly generated merging segment; FIG. 2 (b) is a schematic diagram of the rectangular region overlapping an obstacle;
FIG. 3 is a schematic diagram of a bi-directional breadth-first search algorithm of the present invention;
FIG. 4 is a schematic diagram of a buffer insertion according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention; all other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The low-power clock tree synthesis method based on a deferred-merge strategy provided by this scheme comprises the following steps:
step 1: to equalize the clock propagation delay from the root node to every leaf node, the invention constructs the clock tree using the deferred-merge embedding (DME) algorithm. In its first stage the algorithm merges bottom-up to obtain all merging segments, that is, all possible merging point positions; in its second stage, after the position of the root node has been determined, it determines the exact position of each merging point so that the interconnect length is shortest and the power consumption is lowest;
step 2: during merging, if a buffer might overlap an obstacle, the merging segment is searched for again;
step 3: to meet the clock transition time constraint without letting the inserted buffers disturb the delay balance, buffer insertion is performed on a level as soon as each pair of subtrees has been merged and the corresponding merging segment has been generated. At this point only the spacing between the buffers is determined, not their exact positions; the exact buffer positions are determined once the final position of each merging point has been fixed in the second stage of DME.
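The second (top-down) stage of DME can be illustrated with the sketch below, which places each merging point on its merging segment as close as possible to its already placed parent. It assumes, as in standard DME, that every merging segment is a Manhattan arc (a segment of slope +1 or -1, or a single point); the dictionary layout and the names `nearest_on_arc` and `embed` are hypothetical.

```python
def nearest_on_arc(arc, p):
    """Closest point to p, in the Manhattan metric, on a Manhattan arc.

    arc is ((x0, y0), (x1, y1)) with slope +1 or -1, or a repeated point.
    """
    (x0, y0), (x1, y1) = arc
    # Rotating by 45 degrees turns Manhattan distance in (x, y) into
    # Chebyshev distance in (u, v), where the arc is axis-aligned.
    u_lo, u_hi = sorted((x0 + y0, x1 + y1))
    v_lo, v_hi = sorted((x0 - y0, x1 - y1))
    pu, pv = p[0] + p[1], p[0] - p[1]
    u = min(max(pu, u_lo), u_hi)  # clamp onto the arc
    v = min(max(pv, v_lo), v_hi)
    return ((u + v) / 2, (u - v) / 2)


def embed(node, parent_xy):
    """Fix the position of node, then recurse into its children."""
    node["xy"] = nearest_on_arc(node["arc"], parent_xy)
    for child in node.get("children", ()):
        embed(child, node["xy"])


leaf = {"arc": ((0, 2), (2, 0))}
root = {"arc": ((4, 4), (4, 4)), "children": [leaf]}
embed(root, (9, 9))  # (9, 9) plays the role of the clock source
```

Deferring the choice of the exact point until the parent is known is what lets the top-down pass keep the wire length short.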
Step 1 comprises the following steps:
step 1.1, when searching for subtrees to merge (in the first round every subtree is a single flip-flop), constructing a Delaunay triangulation over all subtrees. Delaunay triangulation reduces the time complexity of finding nearest neighbors from O(n^2) to O(n log n);
step 1.2, calculating the weight of each edge in the Delaunay triangulation, where the weight is the sum of the interconnect capacitance and the buffer capacitance required to merge the two subtrees joined by that edge;
step 1.3, quick-sorting all edges, finding the edge with the minimum weight, merging the two subtrees joined by that edge to generate the corresponding merging segment, and adding the merging segment to the Delaunay triangulation as a newly generated subtree.
Step 2 comprises the following steps:
step 2.2, constructing a rectangular area bounded by the newly generated merging segment and the merging segments of the two subtrees;
step 2.3, if the rectangular area intersects an obstacle, searching for the merging segment again using a bidirectional breadth-first search algorithm.
Step 3 comprises the following steps:
step 3.1, the delay and output transition time of a buffer depend on its input transition time and its load capacitance. In the invention, the input transition time (input slew) is a pre-specified value. To obtain relatively accurate values of the buffer delay and output transition time, a lookup table of buffer delay and output transition time is built using NGSPICE simulation;
step 3.2, manually specifying an initial spacing value and calculating the clock transition time according to the PERI model; if a clock transition time violation exists, reducing the spacing until the clock transition time constraint is met;
step 3.3, changing the size of the buffer and repeating step 3.2;
step 3.4, after all feasible buffer solutions have been found, selecting the feasible solution with the lowest power consumption;
step 3.5, once the buffers of the level have been determined, they affect the propagation delays of the two subtrees, so the position of the merging segment is recalculated starting from the last buffer of each of the two subtrees;
step 3.6, iterating to the next level up until every level of the clock tree has been merged and its inserted buffers determined; then executing the second stage of DME to determine the exact position of each merging point and buffer; routing, that is, connecting the merging points and buffers with interconnect wires; and finally outputting the solution produced by clock tree synthesis.
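The lookup table of step 3.1 is typically read with bilinear interpolation between the simulated grid points. The sketch below assumes a table indexed by input slew and load capacitance; the axis values and delays are made-up placeholders, not NGSPICE results, and `interpolate` and `lookup` are hypothetical names.

```python
from bisect import bisect_right


def interpolate(axis, value):
    """Return (index, fraction) locating value between axis[index] and axis[index + 1]."""
    i = min(max(bisect_right(axis, value) - 1, 0), len(axis) - 2)
    frac = (value - axis[i]) / (axis[i + 1] - axis[i])
    return i, min(max(frac, 0.0), 1.0)  # clamp instead of extrapolating


def lookup(table, slews, caps, slew, cap):
    """Bilinear interpolation of table[slew_index][cap_index]."""
    i, fs = interpolate(slews, slew)
    j, fc = interpolate(caps, cap)
    low = table[i][j] * (1 - fc) + table[i][j + 1] * fc
    high = table[i + 1][j] * (1 - fc) + table[i + 1][j + 1] * fc
    return low * (1 - fs) + high * fs


# Placeholder characterization data (assumed units: ps for slew and delay, fF for load).
SLEWS = [10.0, 50.0, 100.0]
CAPS = [5.0, 20.0, 80.0]
DELAY = [
    [20.0, 30.0, 60.0],
    [25.0, 36.0, 68.0],
    [32.0, 44.0, 80.0],
]

delay_mid = lookup(DELAY, SLEWS, CAPS, 30.0, 12.5)
```

A second table with the same axes would hold the output transition time, so a single pair of index and fraction computations can serve both lookups.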
The input data are the coordinates of all flip-flops, a buffer library and a wire library. The constraints are the maximum clock transition time and the maximum total capacitance. The final goal is to construct a buffered clock tree that satisfies the given constraints.
When selecting subtrees for merging, the capacitance required for the merge is used as the cost function, and Equations 1-3 are used to calculate the required capacitance.
$$\mathrm{cost}(i,j) = cL + C_{buf} \qquad \text{(Equation 1)}$$
where $c$ is the capacitance per unit length, $r$ is the resistance per unit length, $D_1$ and $D_2$ are the propagation delays of the two subtrees, $C_1$ and $C_2$ are the load capacitances of the two subtrees, $d_{min}$ is the minimum Manhattan distance between the two subtrees, $d_1$ is the Manhattan distance from the left subtree to the merging segment, and $C_{buf}$ is the capacitance of the buffers needed for the merge.
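To make the cost function concrete, the sketch below computes a wire length L and the left-side distance d_1 and then evaluates Equation 1. Because only Equation 1 is given here, it uses the textbook zero-skew merge under the Elmore delay model; that choice is an assumption and may differ from the patent's Equations 2 and 3. The function names are hypothetical.

```python
import math


def zero_skew_merge(d_min, D1, D2, C1, C2, r, c):
    """Return (L, d1): total wire length and the length on the left subtree's side."""
    if d_min > 0:
        # Fraction of d_min on the left side that equalizes both Elmore delays.
        x = (D2 - D1 + r * d_min * (C2 + c * d_min / 2)) / (
            r * d_min * (c * d_min + C1 + C2))
        if 0.0 <= x <= 1.0:
            return d_min, x * d_min  # balanced without snaking
    if D1 > D2:
        # Left subtree is slower: merge at its root and lengthen the right wire.
        L = (math.sqrt((r * C2) ** 2 + 2 * r * c * (D1 - D2)) - r * C2) / (r * c)
        return L, 0.0
    # Right subtree is slower (or equal): all of the wire is on the left side.
    L = (math.sqrt((r * C1) ** 2 + 2 * r * c * (D2 - D1)) - r * C1) / (r * c)
    return L, L


def merge_cost(d_min, D1, D2, C1, C2, r, c, C_buf=0.0):
    """Equation 1: wire capacitance c * L plus buffer capacitance."""
    L, _ = zero_skew_merge(d_min, D1, D2, C1, C2, r, c)
    return c * L + C_buf


balanced = zero_skew_merge(10.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0)
snaked = zero_skew_merge(1.0, 100.0, 0.0, 1.0, 1.0, 1.0, 1.0)
```

In the second call the delays differ so much that L exceeds d_min, which is exactly the snaking that a capacitance-based cost penalizes and a pure distance-based cost would miss.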
When buffers are inserted, a buffer may overlap an obstacle, in which case the merging segment is regenerated.
The bidirectional breadth-first search algorithm divides the routing area into an m × n grid and maps the merging segments of the two subtrees onto the grid. A breadth-first search is then run from both merging segments simultaneously, and the corresponding delay is added each time a search expands outward by one ring. The cells where the two searches finally meet form the regenerated merging segment. If the delays of the two subtrees are unbalanced, the search starts from the grid cells of the subtree with the lower delay and continues until the delays of the two sides are equal, after which both sides search simultaneously.
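The search described above can be sketched as follows. The sketch assumes 4-connected grid cells, a constant delay increment per ring (the text only says that the corresponding delay is added), and obstacle cells that neither search may enter; `bidirectional_bfs` and its parameters are hypothetical names.

```python
def bidirectional_bfs(rows, cols, blocked, src_a, src_b,
                      delay_a, delay_b, step_delay):
    """Grow two wavefronts, always expanding the side with the lower delay.

    Returns (meeting_cells, (delay_a, delay_b)); meeting_cells is empty when
    the two sources cannot be connected.
    """
    seen = [set(src_a), set(src_b)]
    frontier = [set(src_a), set(src_b)]
    delay = [delay_a, delay_b]
    while True:
        meet = seen[0] & seen[1]
        if meet:
            return meet, tuple(delay)
        # The lower-delay side searches first; ties alternate between sides.
        side = 0 if delay[0] <= delay[1] else 1
        if not frontier[side]:
            return set(), tuple(delay)  # region exhausted: no connection
        ring = set()
        for r, c in frontier[side]:
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                cell = (r + dr, c + dc)
                if (0 <= cell[0] < rows and 0 <= cell[1] < cols
                        and cell not in blocked and cell not in seen[side]):
                    ring.add(cell)
        seen[side] |= ring
        frontier[side] = ring
        delay[side] += step_delay  # assumed constant delay per ring


even = bidirectional_bfs(1, 7, set(), {(0, 0)}, {(0, 6)}, 0, 0, 1)
skewed = bidirectional_bfs(1, 7, set(), {(0, 0)}, {(0, 6)}, 0, 2, 1)
```

With equal starting delays the wavefronts meet in the middle; when the second subtree starts with two units of delay, the meeting cell shifts one cell toward it so that both sides arrive with the same accumulated delay.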
The buffer insertion spacing varies with the buffer size: a larger buffer has stronger drive capability, so its spacing is larger. Because the subtrees driven by the first buffer may differ, the spacing of the first buffer may differ from case to case. The buffers inserted after the first one all have the same spacing, because a buffer isolates the load behind it, so that only the buffer's own input capacitance is seen from upstream.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (5)

1. A low-power clock tree synthesis method based on a deferred-merge strategy, characterized by comprising the following steps:
step 1: to equalize the clock propagation delay from the root node to every leaf node, the clock tree is constructed using the deferred-merge embedding algorithm. In its first stage the algorithm merges bottom-up to obtain all merging segments, that is, all possible merging point positions; in its second stage, after the position of the root node has been determined, it determines the exact position of each merging point so that the interconnect length is shortest and the power consumption is lowest;
step 2: during merging, if a buffer might overlap an obstacle, the merging segment is searched for again;
step 3: to meet the clock transition time constraint without letting the inserted buffers disturb the delay balance, buffer insertion is performed on a level as soon as each pair of subtrees has been merged and the corresponding merging segment has been generated. At this point only the spacing between the buffers is determined, not their exact positions; the exact buffer positions are determined once the final position of each merging point has been fixed in the second stage of DME.
2. The low-power clock tree synthesis method based on a deferred-merge strategy according to claim 1, characterized in that step 1 comprises the following steps:
step 1.1, when searching for subtrees to merge, constructing a Delaunay triangulation over all subtrees. Delaunay triangulation reduces the time complexity of finding nearest neighbors from O(n^2) to O(n log n);
step 1.2, calculating the weight of each edge in the Delaunay triangulation, where the weight is the sum of the interconnect capacitance and the buffer capacitance required to merge the two subtrees joined by that edge;
step 1.3, quick-sorting all edges, finding the edge with the minimum weight, merging the two subtrees joined by that edge to generate the corresponding merging segment, and adding the merging segment to the Delaunay triangulation as a newly generated subtree.
3. The low-power clock tree synthesis method based on a deferred-merge strategy according to claim 1, characterized in that step 2 comprises the following steps:
step 2.2, constructing a rectangular area bounded by the newly generated merging segment and the merging segments of the two subtrees;
step 2.3, if the rectangular area intersects an obstacle, searching for the merging segment again using a bidirectional breadth-first search algorithm.
4. The low-power clock tree synthesis method based on a deferred-merge strategy according to claim 1, characterized in that step 3 comprises the following steps:
step 3.1, the delay and output transition time of a buffer depend on its input transition time and its load capacitance. The input transition time is a pre-specified value. To obtain relatively accurate values of the buffer delay and output transition time, a lookup table of buffer delay and output transition time is built using NGSPICE simulation;
step 3.2, manually specifying an initial spacing value and calculating the clock transition time according to the PERI model; if a clock transition time violation exists, reducing the spacing until the clock transition time constraint is met;
step 3.3, changing the size of the buffer and repeating step 3.2;
step 3.4, after all feasible buffer solutions have been found, selecting the feasible solution with the lowest power consumption;
step 3.5, once the buffers of the level have been determined, they affect the propagation delays of the two subtrees, so the position of the merging segment is recalculated starting from the last buffer of each of the two subtrees;
step 3.6, iterating to the next level up until every level of the clock tree has been merged and its inserted buffers determined; then executing the second stage of DME to determine the exact position of each merging point and buffer; routing, that is, connecting the merging points and buffers with interconnect wires; and finally outputting the solution produced by clock tree synthesis.
5. The low-power clock tree synthesis method based on a deferred-merge strategy according to claim 1, characterized in that, when selecting subtrees for merging, the capacitance required for the merge is used as the cost function, and Equations 1-3 are used to calculate the required capacitance.
$$\mathrm{cost}(i,j) = cL + C_{buf} \qquad \text{(Equation 1)}$$
where $c$ is the capacitance per unit length, $r$ is the resistance per unit length, $D_1$ and $D_2$ are the propagation delays of the two subtrees, $C_1$ and $C_2$ are the load capacitances of the two subtrees, $d_{min}$ is the minimum Manhattan distance between the two subtrees, $d_1$ is the Manhattan distance from the left subtree to the merging segment, and $C_{buf}$ is the capacitance of the buffers needed for the merge.
CN202310676747.8A 2023-06-08 2023-06-08 Low-power consumption clock tree comprehensive implementation method based on deferred merging strategy Pending CN116776816A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310676747.8A CN116776816A (en) 2023-06-08 2023-06-08 Low-power consumption clock tree comprehensive implementation method based on deferred merging strategy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310676747.8A CN116776816A (en) 2023-06-08 2023-06-08 Low-power consumption clock tree comprehensive implementation method based on deferred merging strategy

Publications (1)

Publication Number Publication Date
CN116776816A true CN116776816A (en) 2023-09-19

Family

ID=88010802

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310676747.8A Pending CN116776816A (en) 2023-06-08 2023-06-08 Low-power consumption clock tree comprehensive implementation method based on deferred merging strategy

Country Status (1)

Country Link
CN (1) CN116776816A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117113915A (en) * 2023-10-25 2023-11-24 深圳鸿芯微纳技术有限公司 Buffer insertion method and device and electronic equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117113915A (en) * 2023-10-25 2023-11-24 深圳鸿芯微纳技术有限公司 Buffer insertion method and device and electronic equipment
CN117113915B (en) * 2023-10-25 2024-02-02 深圳鸿芯微纳技术有限公司 Buffer insertion method and device and electronic equipment

Similar Documents

Publication Publication Date Title
US5666290A (en) Interactive time-driven method of component placement that more directly constrains critical paths using net-based constraints
US8205182B1 (en) Automatic synthesis of clock distribution networks
US7216322B2 (en) Clock tree synthesis for low power consumption and low clock skew
US8572542B2 (en) Clock-tree structure and method for synthesizing the same
Thonnart et al. A pseudo-synchronous implementation flow for WCHB QDI asynchronous circuits
CN116776816A (en) Low-power consumption clock tree comprehensive implementation method based on deferred merging strategy
JP2001147948A (en) Delay time calculating method for cell and layout optimizing method for semiconductor integrated circuit
Cong et al. Minimum-cost bounded-skew clock routing
US20150363530A1 (en) Lsi design method
CN114861591B (en) Chip layout optimization method capable of realizing differential time sequence driving
US7917880B2 (en) Method for reducing power consumption of integrated circuit
Cong et al. Interconnect sizing and spacing with consideration of coupling capacitance
US6910195B2 (en) Flip-flop insertion in a circuit design
CN113792520A (en) Layout wiring method, layout wiring device, synchronous circuit and integrated circuit chip
Chen et al. Clock tree synthesis under aggressive buffer insertion
CN114386352A (en) Time sequence driving layout method and device, equipment and storage medium
CN112906342A (en) Method and device for setting clock tree wiring rule
Kwon et al. Lightweight buffer insertion for clock tree synthesis visualization
Zhou et al. Minimization of circuit delay and power through gate sizing and threshold voltage assignment
JP3251686B2 (en) Automatic wiring method for integrated circuits
Deng et al. Fast synthesis of low power clock trees based on register clustering
CN109388839A (en) Clock system method for analyzing performance and device
CN100470556C (en) Right-angle steiner tree method of obstacle at standard unit overall wiring
JP2004280439A (en) Crosstalk noise detecting method, method for designing semiconductor integrated circuit and design verifying method
CN116861842B (en) Implementation method and related device for adjustable segmented reverse clock tree

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination