CN116776816A - Low-power consumption clock tree comprehensive implementation method based on deferred merging strategy - Google Patents
- Publication number
- CN116776816A (application number CN202310676747.8A)
- Authority
- CN
- China
- Prior art keywords
- merging
- buffer
- clock
- subtrees
- power consumption
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Design And Manufacture Of Integrated Circuits (AREA)
Abstract
The invention discloses a low-power clock tree synthesis method based on a deferred-merge strategy, comprising four stages: input, a bottom-up phase, a top-down phase, and output. The invention belongs to the technical field of low-power clocks. The method inserts buffers bottom-up during merging, which minimizes the effect of the buffers on clock skew, and uses a bidirectional breadth-first search algorithm to route the clock around obstacles.
Description
Technical Field
The invention belongs to the technical field of low-power clocks, and particularly relates to a low-power clock tree synthesis method based on a deferred-merge strategy.
Background
The clock is a critical part of digital chip design. The clock signal is the reference for data transfer and plays a decisive role in the function, performance, and stability of a synchronous digital system, so the characteristics of the clock signal and its distribution network receive particular attention. The clock signal attenuates as it propagates; if it becomes too weak, the registers can no longer be clocked synchronously, which seriously degrades chip performance. Inserting buffers restores the clock signal and shortens its transition time, and also reduces latency. Clock tree synthesis means designing a clock tree with zero clock skew and automatically inserting buffers along the clock paths so that all clock delays are balanced and the required clock transition times are met. The clock signal must meet timing requirements such as clock skew (skew) and clock transition time (slew) under worst-case conditions; otherwise, a poorly controlled clock signal can cause timing failures in which erroneous data is latched into registers, leading to functional errors in the system. In today's chip designs, 30% or more of the total dynamic power is consumed by the clock network, so reducing the total power consumption of the clock tree is also an important goal. An efficient clock tree synthesis method is therefore important to the back-end physical design of the whole chip.
The first step is to design the topology of the clock tree. The quality of the clock tree depends on the merging strategy, and different merging strategies produce very different clock trees. The traditional strategy preferentially merges the two subtrees with the smallest Manhattan distance, but when the delays of the two subtrees differ greatly, merging them requires additional snaking (serpentine) wire. Excessive snaking increases interconnect length and therefore power consumption. Snaking wires are also strongly affected by process variation, which increases the uncertainty of the clock delay and ultimately the clock skew. To reduce snaking, the merging strategy must therefore consider not only the Manhattan distance but all the resources required for the merge. In this invention, the resources are defined as the sum of the interconnect capacitance and the buffer capacitance.
The clock transition time constraint is met by inserting buffers. Conventional methods usually insert buffers after the clock tree topology has been constructed, which easily perturbs the clock skew and degrades the performance of the whole clock tree. The invention therefore performs buffer insertion and clock tree merging simultaneously, which avoids disturbing the clock skew.
The clock source and the clock pins must be connected by interconnect. In the clock routing stage, macro cells and similar blocks placed during the placement stage act as obstacles. Interconnect is allowed to cross obstacles, but buffers are not allowed to overlap them. Conventional methods route entirely around obstacles, which may increase wire length and therefore power consumption. The invention provides an algorithm that reduces wire length while keeping buffers clear of obstacles.
Disclosure of Invention
To solve the above problems, the invention provides a low-power clock tree synthesis method based on a deferred-merge strategy. Its buffer insertion algorithm uses a bottom-up insertion strategy, which minimizes the effect of buffers on clock skew, and it uses a bidirectional breadth-first search algorithm to route the clock around obstacles.
The technical scheme adopted by the invention is as follows: a low-power clock tree synthesis method based on a deferred-merge strategy, comprising the following steps:
Step 1: to equalize the clock propagation delay from the root node to every leaf node, the invention constructs the clock tree with the deferred-merge embedding (DME) algorithm. In the first stage, the algorithm merges subtrees bottom-up to obtain all merging segments, that is, all possible positions of each merging point. In the second stage, once the position of the root node is fixed, it determines the exact position of each merging point so that the interconnect length is shortest and the power consumption is lowest;
Step 2: during merging, if a buffer may overlap an obstacle, the merging segment is searched for again;
Step 3: to meet the clock transition time constraint without letting the inserted buffers disturb the delay balance, buffer insertion is performed on a level as soon as each pair of subtrees on that level has been merged and the corresponding merging segment generated. At this point only the spacing between buffers is determined, not their exact positions; the exact buffer positions are fixed once the final position of each merging point has been determined in the second stage of DME.
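To make the notion of a merging segment concrete, the following is a minimal Python sketch, not the patent's implementation. It assumes two point subtrees and two wire lengths `ra` and `rb` already chosen to balance delay, and uses the common trick of rotating coordinates (u = x + y, v = x - y) so that a Manhattan ball becomes an axis-aligned square. All function names are hypothetical.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (u_lo, u_hi, v_lo, v_hi) in rotated coordinates


def to_rotated(x: float, y: float) -> Point:
    # In (u, v) = (x + y, x - y), |x| + |y| equals max(|u|, |v|).
    return x + y, x - y


def from_rotated(u: float, v: float) -> Point:
    return (u + v) / 2.0, (u - v) / 2.0


def manhattan_ball(p: Point, radius: float) -> Rect:
    """All points within Manhattan distance `radius` of p, as a square in (u, v)."""
    u, v = to_rotated(*p)
    return (u - radius, u + radius, v - radius, v + radius)


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    u_lo, u_hi = max(a[0], b[0]), min(a[1], b[1])
    v_lo, v_hi = max(a[2], b[2]), min(a[3], b[3])
    if u_lo > u_hi or v_lo > v_hi:
        return None  # the two wire lengths are too short to reach each other
    return (u_lo, u_hi, v_lo, v_hi)


def merging_segment(pa: Point, pb: Point, ra: float, rb: float) -> Optional[Tuple[Point, Point]]:
    """Opposite corners of the region reachable with wire ra from pa and rb from pb.

    When ra + rb equals the Manhattan distance between pa and pb, the region
    degenerates to a 45-degree segment, i.e. a merging segment.
    """
    region = intersect(manhattan_ball(pa, ra), manhattan_ball(pb, rb))
    if region is None:
        return None
    u_lo, u_hi, v_lo, v_hi = region
    return from_rotated(u_lo, v_lo), from_rotated(u_hi, v_hi)
```

For example, `merging_segment((0, 0), (4, 2), 3, 3)` yields the segment between (1, 2) and (3, 0); every point on it is at Manhattan distance 3 from both inputs. Deferring the choice of the exact point on this segment until the top-down stage is what gives DME its name.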
Further, step 1 comprises the steps of:
Step 1.1: when searching for subtrees to merge (in the first round every subtree is a single flip-flop), construct a Delaunay triangulation over all subtrees. Delaunay triangulation reduces the time complexity of finding nearest neighbors from O(n^2) to O(n log n);
Step 1.2: compute the weight of each edge in the Delaunay triangulation, where the weight is the sum of the interconnect capacitance and the buffer capacitance required to merge the two subtrees joined by that edge;
Step 1.3: sort all edges with quicksort, find the edge with the minimum weight, merge the two subtrees joined by that edge to generate the corresponding merging segment, and add the merging segment to the Delaunay triangulation as a newly generated subtree.
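One round of the edge-weighting and selection in steps 1.2 and 1.3 can be sketched as follows. This is an illustration under stated assumptions: the candidate edges are taken as given (in the method they would come from the Delaunay triangulation), each subtree is reduced to one representative point, the required wire length is approximated by the Manhattan distance, and the buffer capacitance is a constant. Python's built-in `sorted` is used in place of a hand-written quicksort. All names and numbers are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Edge = Tuple[int, int]


def manhattan(p: Point, q: Point) -> float:
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def merge_cost(wire_len: float, c_per_len: float, c_buf: float) -> float:
    """Capacitance needed for a merge: wire capacitance plus buffer capacitance."""
    return c_per_len * wire_len + c_buf


def rank_edges(
    positions: Dict[int, Point],
    edges: List[Edge],
    c_per_len: float,
    c_buf: float,
) -> List[Tuple[float, int, int]]:
    """Return (cost, i, j) for every candidate edge, cheapest first."""
    weighted = [
        (merge_cost(manhattan(positions[i], positions[j]), c_per_len, c_buf), i, j)
        for i, j in edges
    ]
    return sorted(weighted)


if __name__ == "__main__":
    # Three subtrees and the edges of their (assumed) triangulation.
    positions = {0: (0.0, 0.0), 1: (10.0, 0.0), 2: (1.0, 1.0)}
    edges = [(0, 1), (0, 2), (1, 2)]
    cost, i, j = rank_edges(positions, edges, c_per_len=0.2, c_buf=5.0)[0]
    print(f"merge subtrees {i} and {j} first (cost {cost:.2f})")
```

After the cheapest pair is merged, the new subtree would replace the two old ones in the triangulation and the ranking would be repeated for the next round.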
Further, step 2 comprises the steps of:
Step 2.2: construct a rectangular region bounded by the newly generated merging segment and the merging segments of the two subtrees;
Step 2.3: if the rectangular region intersects an obstacle, search for the merging segment again using a bidirectional breadth-first search algorithm.
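The obstacle check in steps 2.2 and 2.3 can be illustrated with a short sketch. It assumes that the rectangular region is the axis-aligned bounding box of the three merging segments and that obstacles are axis-aligned rectangles; the text does not spell out either detail, so both are assumptions, and the function names are hypothetical.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]
Rect = Tuple[float, float, float, float]  # (x_lo, y_lo, x_hi, y_hi)


def bounding_box(segments: Iterable[Segment]) -> Rect:
    """Axis-aligned box enclosing the endpoints of all given segments."""
    points = [p for seg in segments for p in seg]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def rects_overlap(a: Rect, b: Rect) -> bool:
    """True if the two rectangles overlap; merely touching edges does not count."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def needs_reroute(ms_a: Segment, ms_b: Segment, ms_v: Segment, obstacles: Iterable[Rect]) -> bool:
    """Decide whether the merging segment ms_v has to be searched for again."""
    region = bounding_box([ms_a, ms_b, ms_v])
    return any(rects_overlap(region, obs) for obs in obstacles)
```

A call such as `needs_reroute(ms_a, ms_b, ms_v, obstacles)` returning `True` would trigger the bidirectional breadth-first search of step 2.3.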
Further, step 3 comprises the steps of:
Step 3.1: the delay and output transition time of a buffer depend on its input transition time and its load capacitance. In the invention the input transition time (input slew) is a pre-specified value. To obtain reasonably accurate buffer delay and output transition time, a lookup table of buffer delay and output transition time is built from NGSPICE simulations;
Step 3.2: start from a manually chosen buffer spacing and compute the clock transition time with the PERI model; if the clock transition time constraint is violated, reduce the spacing until the constraint is met;
Step 3.3: change the buffer size and repeat step 3.2;
Step 3.4: after all feasible buffer solutions have been found, select the feasible solution with the lowest power consumption;
Step 3.5: once the buffers of the level are determined, they change the propagation delay of the two subtrees, so the position of the merging segment is recalculated starting from the last buffer of each subtree;
Step 3.6: iterate to the next level up until every level of the clock tree has been merged and its buffers determined. Then execute the second stage of DME to determine the exact position of each merging point and buffer, route the interconnect that connects the merging points and buffers, and finally output the synthesized clock tree.
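The spacing search of steps 3.2 to 3.4 can be sketched as a small loop. The sketch makes several assumptions that are not taken from the patent: the step-input slew of a wire stage is approximated as ln 9 times its Elmore delay, the ramp-input slew is the root-sum-square combination commonly associated with the PERI model, total capacitance serves as the proxy for power, and all electrical values are invented for illustration (units: kOhm, fF, ps, um).

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Buffer:
    name: str
    r_drive: float   # output resistance in kOhm (hypothetical)
    c_in: float      # input capacitance in fF (hypothetical)
    out_slew: float  # slew at the buffer output in ps (would come from the lookup table)


def stage_slew(buf: Buffer, dist: float, r: float, c: float) -> float:
    """Slew at the end of a wire of length `dist` driven by `buf` into an identical buffer."""
    elmore = buf.r_drive * (c * dist + buf.c_in) + r * dist * (c * dist / 2 + buf.c_in)
    step_slew = math.log(9) * elmore             # 10%-90% slew of a single-pole response
    return math.hypot(buf.out_slew, step_slew)   # sqrt(input_slew^2 + step_slew^2)


def max_spacing(buf: Buffer, start: float, r: float, c: float, limit: float,
                shrink: float = 0.9, min_dist: float = 1.0) -> Optional[float]:
    """Shrink the spacing from `start` until the slew limit is met (step 3.2)."""
    dist = start
    while dist >= min_dist:
        if stage_slew(buf, dist, r, c) <= limit:
            return dist
        dist *= shrink
    return None  # this buffer size cannot meet the constraint


def choose_buffer(buffers: List[Buffer], wire_len: float, start: float,
                  r: float, c: float, limit: float) -> Optional[Tuple[float, Buffer, float, int]]:
    """Try every size (step 3.3) and keep the lowest-capacitance solution (step 3.4)."""
    best = None
    for buf in buffers:
        dist = max_spacing(buf, start, r, c, limit)
        if dist is None:
            continue
        count = math.ceil(wire_len / dist)
        total_cap = count * buf.c_in + c * wire_len
        if best is None or total_cap < best[0]:
            best = (total_cap, buf, dist, count)
    return best
```

The returned tuple holds the total capacitance, the chosen buffer, its spacing, and the buffer count for a wire of length `wire_len`. Only the spacing is fixed here; consistent with step 3, exact buffer coordinates would be assigned later in the top-down stage.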
The beneficial effects of the invention are as follows: in the low-power clock tree synthesis method based on a deferred-merge strategy, the clock tree synthesis algorithm keeps the clock skew small, and the accurate buffer insertion algorithm keeps the power consumption low. Applied to the ISPD09 and ISPD10 benchmark circuits, the algorithm significantly reduces clock skew and total power consumption compared with similar design methods. Clock tree synthesis is an important part of back-end physical design and contributes substantially to final timing closure and to chip performance in operation.
Drawings
FIG. 1 is an overall flow chart of the present invention;
FIG. 2 shows the rectangular region formed by a newly generated merging segment and the merging segments of the two subtrees: FIG. 2 (a) shows the rectangular region, where ms_a and ms_b are the merging segments of the subtrees and ms_v is the newly generated merging segment; FIG. 2 (b) is a schematic diagram of the rectangular region overlapping an obstacle;
FIG. 3 is a schematic diagram of a bi-directional breadth-first search algorithm of the present invention;
FIG. 4 is a schematic diagram of a buffer insertion according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention; all other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The low-power clock tree synthesis method based on a deferred-merge strategy provided by this scheme comprises the following steps:
Step 1: to equalize the clock propagation delay from the root node to every leaf node, the invention constructs the clock tree with the deferred-merge embedding (DME) algorithm. In the first stage, the algorithm merges subtrees bottom-up to obtain all merging segments, that is, all possible positions of each merging point. In the second stage, once the position of the root node is fixed, it determines the exact position of each merging point so that the interconnect length is shortest and the power consumption is lowest;
Step 2: during merging, if a buffer may overlap an obstacle, the merging segment is searched for again;
Step 3: to meet the clock transition time constraint without letting the inserted buffers disturb the delay balance, buffer insertion is performed on a level as soon as each pair of subtrees on that level has been merged and the corresponding merging segment generated. At this point only the spacing between buffers is determined, not their exact positions; the exact buffer positions are fixed once the final position of each merging point has been determined in the second stage of DME.
Step 1 comprises the following steps:
Step 1.1: when searching for subtrees to merge (in the first round every subtree is a single flip-flop), construct a Delaunay triangulation over all subtrees. Delaunay triangulation reduces the time complexity of finding nearest neighbors from O(n^2) to O(n log n);
Step 1.2: compute the weight of each edge in the Delaunay triangulation, where the weight is the sum of the interconnect capacitance and the buffer capacitance required to merge the two subtrees joined by that edge;
Step 1.3: sort all edges with quicksort, find the edge with the minimum weight, merge the two subtrees joined by that edge to generate the corresponding merging segment, and add the merging segment to the Delaunay triangulation as a newly generated subtree.
Step 2 comprises the following steps:
Step 2.2: construct a rectangular region bounded by the newly generated merging segment and the merging segments of the two subtrees;
Step 2.3: if the rectangular region intersects an obstacle, search for the merging segment again using a bidirectional breadth-first search algorithm.
Step 3 comprises the following steps:
Step 3.1: the delay and output transition time of a buffer depend on its input transition time and its load capacitance. In the invention the input transition time (input slew) is a pre-specified value. To obtain reasonably accurate buffer delay and output transition time, a lookup table of buffer delay and output transition time is built from NGSPICE simulations;
Step 3.2: start from a manually chosen buffer spacing and compute the clock transition time with the PERI model; if the clock transition time constraint is violated, reduce the spacing until the constraint is met;
Step 3.3: change the buffer size and repeat step 3.2;
Step 3.4: after all feasible buffer solutions have been found, select the feasible solution with the lowest power consumption;
Step 3.5: once the buffers of the level are determined, they change the propagation delay of the two subtrees, so the position of the merging segment is recalculated starting from the last buffer of each subtree;
Step 3.6: iterate to the next level up until every level of the clock tree has been merged and its buffers determined. Then execute the second stage of DME to determine the exact position of each merging point and buffer, route the interconnect that connects the merging points and buffers, and finally output the synthesized clock tree.
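The buffer lookup table of step 3.1 is indexed by input transition time and load capacitance, so querying it between characterized points requires interpolation. The sketch below shows one plausible way to do that with bilinear interpolation and clamping at the table edges. The interpolation scheme, the function names, and the table values are assumptions for illustration, not NGSPICE results and not details from the patent.

```python
from bisect import bisect_right
from typing import List, Sequence


def _bracket(axis: Sequence[float], x: float) -> int:
    """Index i such that axis[i] and axis[i + 1] bracket x (clamped to the table)."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def _fraction(lo: float, hi: float, x: float) -> float:
    """Relative position of x between lo and hi, clamped to [0, 1]."""
    return max(0.0, min(1.0, (x - lo) / (hi - lo)))


def lookup(slews: Sequence[float], loads: Sequence[float],
           table: List[List[float]], slew: float, load: float) -> float:
    """Bilinear interpolation of table[slew_index][load_index]."""
    i = _bracket(slews, slew)
    j = _bracket(loads, load)
    ts = _fraction(slews[i], slews[i + 1], slew)
    tl = _fraction(loads[j], loads[j + 1], load)
    top = table[i][j] * (1 - tl) + table[i][j + 1] * tl
    bottom = table[i + 1][j] * (1 - tl) + table[i + 1][j + 1] * tl
    return top * (1 - ts) + bottom * ts


if __name__ == "__main__":
    # Hypothetical delay table in ps: rows are input slews (ps), columns are loads (fF).
    slews = [10.0, 50.0]
    loads = [1.0, 5.0]
    delay = [[10.0, 30.0], [20.0, 40.0]]
    print(lookup(slews, loads, delay, slew=30.0, load=3.0))
```

The same function can serve both the delay table and the output transition time table, since both share the same two axes.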
The input data are the coordinates of all flip-flops, a buffer library, and a wire library. The constraints are a maximum clock transition time and a maximum total capacitance. The goal is to construct a buffered clock tree that satisfies the given constraints.
When selecting subtrees to merge, the capacitance required for the merge is used as the cost function, and equations 1-3 are used to calculate it.
$$\mathrm{cost}(i,j) = cL + C_{buf}$$
Equation 1
where $c$ is the capacitance per unit length, $r$ is the resistance per unit length, $D_1$ and $D_2$ are the propagation delays of the two subtrees, $C_1$ and $C_2$ are the load capacitances of the two subtrees, $d_{min}$ is the minimum Manhattan distance between the two subtrees, $d_1$ is the Manhattan distance from the left subtree to the merging segment, and $C_{buf}$ is the capacitance of the buffers required for the merge.
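Only Equation 1 survives in this text, so the way the wire length and $d_1$ are obtained is not shown. As a hedged illustration of how the listed symbols typically interact, the sketch below uses the classic zero-skew merge under the Elmore delay model: it solves for the split of a wire of length `L` between the two subtrees so that both branches have equal delay, and lengthens one branch (snaking) when no balanced split exists. This is a standard textbook formulation offered as an assumption, not a reconstruction of the patent's equations 2 and 3. Reading `L` in Equation 1 as the total wire length `d1 + d2` is likewise an assumption.

```python
import math
from typing import Tuple


def zero_skew_split(L: float, r: float, c: float,
                    D1: float, C1: float, D2: float, C2: float) -> Tuple[float, float]:
    """Wire lengths (d1, d2) from the merge point to subtrees 1 and 2 with equal Elmore delay.

    L is the distance between the subtrees (L > 0); r and c are per-unit-length
    resistance and capacitance; D and C are each subtree's delay and load capacitance.
    """
    x = (D2 - D1 + r * L * (C2 + c * L / 2)) / (r * L * (C1 + C2 + c * L))
    if 0.0 <= x <= 1.0:
        return x * L, (1.0 - x) * L
    if x < 0.0:
        # Subtree 1 is too slow: merge at its root and lengthen the wire to subtree 2.
        d2 = (math.sqrt((r * C2) ** 2 + 2 * r * c * (D1 - D2)) - r * C2) / (r * c)
        return 0.0, d2
    # Subtree 2 is too slow: merge at its root and lengthen the wire to subtree 1.
    d1 = (math.sqrt((r * C1) ** 2 + 2 * r * c * (D2 - D1)) - r * C1) / (r * c)
    return d1, 0.0


def branch_delay(d: float, r: float, c: float, D: float, C: float) -> float:
    """Elmore delay through a wire of length d into a subtree with delay D and load C."""
    return D + r * d * (c * d / 2 + C)


def merge_cost(d1: float, d2: float, c: float, c_buf: float) -> float:
    """Equation 1 with L taken as the total wire length d1 + d2 (an assumption)."""
    return c * (d1 + d2) + c_buf
```

When the delays differ greatly, the snaking branch makes `d1 + d2` exceed the Manhattan distance, which raises the cost. This is consistent with the motivation for using capacitance instead of distance alone when choosing which subtrees to merge.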
Inserting a buffer may cause it to overlap an obstacle, in which case the merging segment is regenerated.
The bidirectional breadth-first search algorithm divides the routing region into an m × n grid and maps the merging segments of the two subtrees onto the grid. A breadth-first search is then run from both merging segments simultaneously, and each outward ring of the search adds the corresponding delay. The points where the two searches meet form the regenerated merging segment. If the delays of the two subtrees are unbalanced, the search first expands only from the subtree with the lower delay until the delays on both sides are equal, and then expands from both sides simultaneously.
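The search described above can be sketched on a small grid. This is a simplified illustration, not the patent's implementation: each ring is assumed to add the same fixed delay `step_delay`, cells are 4-connected, blocked cells stand for positions where the merge point may not be placed, and the function and parameter names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Cell = Tuple[int, int]


def bidirectional_bfs(rows: int, cols: int, blocked: Set[Cell],
                      src_a: Iterable[Cell], src_b: Iterable[Cell],
                      delay_a: float, delay_b: float,
                      step_delay: float = 1.0) -> Tuple[Set[Cell], List[float]]:
    """Expand two fronts ring by ring; return the meeting cells and final delays.

    The side with the lower accumulated delay always expands next, so an
    initially faster subtree searches alone until the delays are level.
    """
    fronts = [set(src_a), set(src_b)]
    seen = [set(src_a), set(src_b)]
    delays = [delay_a, delay_b]
    meet = seen[0] & seen[1]
    while not meet and (fronts[0] or fronts[1]):
        k = 0 if delays[0] <= delays[1] else 1
        if not fronts[k]:
            k = 1 - k  # this side is exhausted, let the other side continue
        ring: Set[Cell] = set()
        for r, c in fronts[k]:
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                cell = (r + dr, c + dc)
                inside = 0 <= cell[0] < rows and 0 <= cell[1] < cols
                if inside and cell not in blocked and cell not in seen[k]:
                    ring.add(cell)
        seen[k] |= ring
        fronts[k] = ring
        delays[k] += step_delay
        meet = ring & seen[1 - k]
    return meet, delays
```

On a 1 x 7 grid with sources at columns 0 and 6 and equal starting delays, the fronts meet at column 3. If the second source starts with 2 units of extra delay, the first front advances alone for two rings and the meeting cell shifts to column 4, closer to the slower subtree.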
The buffer spacing varies with buffer size: a larger buffer has stronger drive capability, so its spacing is larger. The spacing of the first buffer may vary, because the subtrees driven by the first buffer can differ. The buffers inserted after the first one all use the same spacing, because a buffer shields its downstream capacitance, so that only the buffer's own input capacitance is seen from upstream.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (5)
1. A low-power clock tree synthesis method based on a deferred-merge strategy, characterized by comprising the following steps:
Step 1: to equalize the clock propagation delay from the root node to every leaf node, the invention constructs the clock tree with the deferred-merge embedding (DME) algorithm. In the first stage, the algorithm merges subtrees bottom-up to obtain all merging segments, that is, all possible positions of each merging point. In the second stage, once the position of the root node is fixed, it determines the exact position of each merging point so that the interconnect length is shortest and the power consumption is lowest;
Step 2: during merging, if a buffer may overlap an obstacle, the merging segment is searched for again;
Step 3: to meet the clock transition time constraint without letting the inserted buffers disturb the delay balance, buffer insertion is performed on a level as soon as each pair of subtrees on that level has been merged and the corresponding merging segment generated. At this point only the spacing between buffers is determined, not their exact positions; the exact buffer positions are fixed once the final position of each merging point has been determined in the second stage of DME.
2. The low-power clock tree synthesis method based on a deferred-merge strategy according to claim 1, characterized in that step 1 comprises the following steps:
Step 1.1: when searching for subtrees to merge, construct a Delaunay triangulation over all subtrees. Delaunay triangulation reduces the time complexity of finding nearest neighbors from O(n^2) to O(n log n);
Step 1.2: compute the weight of each edge in the Delaunay triangulation, where the weight is the sum of the interconnect capacitance and the buffer capacitance required to merge the two subtrees joined by that edge;
Step 1.3: sort all edges with quicksort, find the edge with the minimum weight, merge the two subtrees joined by that edge to generate the corresponding merging segment, and add the merging segment to the Delaunay triangulation as a newly generated subtree.
3. The low-power clock tree synthesis method based on a deferred-merge strategy according to claim 1, characterized in that step 2 comprises the following steps:
Step 2.2: construct a rectangular region bounded by the newly generated merging segment and the merging segments of the two subtrees;
Step 2.3: if the rectangular region intersects an obstacle, search for the merging segment again using a bidirectional breadth-first search algorithm.
4. The low-power clock tree synthesis method based on a deferred-merge strategy according to claim 1, characterized in that step 3 comprises the following steps:
Step 3.1: the delay and output transition time of a buffer depend on its input transition time and its load capacitance. In the invention the input transition time is a pre-specified value. To obtain reasonably accurate buffer delay and output transition time, a lookup table of buffer delay and output transition time is built from NGSPICE simulations;
Step 3.2: start from a manually chosen buffer spacing and compute the clock transition time with the PERI model; if the clock transition time constraint is violated, reduce the spacing until the constraint is met;
Step 3.3: change the buffer size and repeat step 3.2;
Step 3.4: after all feasible buffer solutions have been found, select the feasible solution with the lowest power consumption;
Step 3.5: once the buffers of the level are determined, they change the propagation delay of the two subtrees, so the position of the merging segment is recalculated starting from the last buffer of each subtree;
Step 3.6: iterate to the next level up until every level of the clock tree has been merged and its buffers determined. Then execute the second stage of DME to determine the exact position of each merging point and buffer, route the interconnect that connects the merging points and buffers, and finally output the synthesized clock tree.
5. The low-power clock tree synthesis method based on a deferred-merge strategy according to claim 1, characterized in that, when selecting subtrees to merge, the capacitance required for the merge is used as the cost function, and equations 1-3 are used to calculate it.
$$\mathrm{cost}(i,j) = cL + C_{buf}$$
Equation 1
where $c$ is the capacitance per unit length, $r$ is the resistance per unit length, $D_1$ and $D_2$ are the propagation delays of the two subtrees, $C_1$ and $C_2$ are the load capacitances of the two subtrees, $d_{min}$ is the minimum Manhattan distance between the two subtrees, $d_1$ is the Manhattan distance from the left subtree to the merging segment, and $C_{buf}$ is the capacitance of the buffers required for the merge.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310676747.8A CN116776816A (en) | 2023-06-08 | 2023-06-08 | Low-power consumption clock tree comprehensive implementation method based on deferred merging strategy |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310676747.8A CN116776816A (en) | 2023-06-08 | 2023-06-08 | Low-power consumption clock tree comprehensive implementation method based on deferred merging strategy |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116776816A (en) | 2023-09-19 |
Family
ID=88010802
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310676747.8A (Pending), published as CN116776816A (en) | 2023-06-08 | 2023-06-08 | Low-power consumption clock tree comprehensive implementation method based on deferred merging strategy |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116776816A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117113915A (en) * | 2023-10-25 | 2023-11-24 | 深圳鸿芯微纳技术有限公司 | Buffer insertion method and device and electronic equipment |
CN117113915B (en) * | 2023-10-25 | 2024-02-02 | 深圳鸿芯微纳技术有限公司 | Buffer insertion method and device and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US5666290A (en) | Interactive time-driven method of component placement that more directly constrains critical paths using net-based constraints | |
US8205182B1 (en) | Automatic synthesis of clock distribution networks | |
US7216322B2 (en) | Clock tree synthesis for low power consumption and low clock skew | |
US8572542B2 (en) | Clock-tree structure and method for synthesizing the same | |
Thonnart et al. | A pseudo-synchronous implementation flow for WCHB QDI asynchronous circuits | |
CN116776816A (en) | Low-power consumption clock tree comprehensive implementation method based on deferred merging strategy | |
JP2001147948A (en) | Delay time calculating method for cell and layout optimizing method for semiconductor integrated circuit | |
Cong et al. | Minimum-cost bounded-skew clock routing | |
US20150363530A1 (en) | Lsi design method | |
CN114861591B (en) | Chip layout optimization method capable of realizing differential time sequence driving | |
US7917880B2 (en) | Method for reducing power consumption of integrated circuit | |
Cong et al. | Interconnect sizing and spacing with consideration of coupling capacitance | |
US6910195B2 (en) | Flip-flop insertion in a circuit design | |
CN113792520A (en) | Layout wiring method, layout wiring device, synchronous circuit and integrated circuit chip | |
Chen et al. | Clock tree synthesis under aggressive buffer insertion | |
CN114386352A (en) | Time sequence driving layout method and device, equipment and storage medium | |
CN112906342A (en) | Method and device for setting clock tree wiring rule | |
Kwon et al. | Lightweight buffer insertion for clock tree synthesis visualization | |
Zhou et al. | Minimization of circuit delay and power through gate sizing and threshold voltage assignment | |
JP3251686B2 (en) | Automatic wiring method for integrated circuits | |
Deng et al. | Fast synthesis of low power clock trees based on register clustering | |
CN109388839A (en) | Clock system method for analyzing performance and device | |
CN100470556C (en) | Right-angle steiner tree method of obstacle at standard unit overall wiring | |
JP2004280439A (en) | Crosstalk noise detecting method, method for designing semiconductor integrated circuit and design verifying method | |
CN116861842B (en) | Implementation method and related device for adjustable segmented reverse clock tree |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||