CN116776816A

CN116776816A - Low-power clock tree synthesis method based on a deferred-merge strategy

Info

Publication number: CN116776816A
Application number: CN202310676747.8A
Authority: CN
Inventors: 俞文心; 丁劲皓; 文茄汁
Original assignee: Southwest University of Science and Technology
Current assignee: Southwest University of Science and Technology
Priority date: 2023-06-08
Filing date: 2023-06-08
Publication date: 2023-09-19

Abstract

The invention discloses a low-power clock tree synthesis method based on a deferred-merge strategy, comprising the following stages: input, bottom-up phase, top-down phase, and output. The invention belongs to the technical field of low-power clocks. The method applies a bottom-up strategy in its buffer insertion algorithm, which minimizes the influence of the buffers on clock skew, and uses a bidirectional breadth-first search algorithm for obstacle-avoiding clock routing.

Description

Low-power clock tree synthesis method based on a deferred-merge strategy

Technical Field

The invention belongs to the technical field of low-power clocks, and particularly relates to a low-power clock tree synthesis method based on a deferred-merge strategy.

Background

The clock is a very important part of digital chip design. The clock signal is the reference for data transfer and plays a decisive role in the function, performance and stability of synchronous digital systems, so the characteristics of the clock signal and of its distribution network are of particular interest. The clock signal attenuates continuously as it propagates, and if the signal becomes too weak the clocks of the registers cannot be synchronized, which seriously degrades chip performance. Inserting buffers strengthens the clock signal and shortens its transition time. At the same time, inserting buffers also reduces latency. Clock tree synthesis refers to designing a clock tree with zero clock skew and automatically inserting buffers along the designed clock paths to balance all clock delays and meet the required clock transition times. The clock signal must meet timing requirements such as clock skew (skew) and clock transition time (slew) under worst-case conditions; otherwise, any improper control of the clock signal may cause erroneous data to be latched into registers and thus cause errors in system function. In today's chip designs, 30% or more of the total dynamic power is consumed in the clock network, so reducing the total power consumption of the clock tree is also an important goal. An efficient clock tree synthesis method is therefore important to the back-end physical design of the whole chip.

The topology of the clock tree is designed first. The quality of the clock tree depends on the merging strategy, and the clock trees produced by different merging strategies differ considerably. The traditional strategy preferentially merges the two subtrees with the smallest Manhattan distance, but when the delays of the two subtrees differ greatly, the merge requires additional serpentine (snaking) routing. Excessive serpentine routing increases interconnect length and therefore power consumption. Serpentine wires are also strongly affected by process variation, which increases the uncertainty of the clock delay and ultimately the clock skew. To reduce serpentine routing, the merging strategy therefore cannot consider the Manhattan distance alone; it must consider all the resources needed for the merge. In this invention the resources are defined as the sum of the interconnect capacitance and the buffer capacitance.

The clock transition time constraint is met by inserting buffers. Conventional methods often insert buffers after the topology of the clock tree has been constructed, which easily perturbs the clock skew and thus degrades the performance of the whole clock tree. The invention therefore inserts buffers at the same time as it merges the clock tree, so that the clock skew is not disturbed.

The clock source and the clock pins must be connected by interconnect wires. In the clock routing stage, the macro cells and similar blocks placed during the placement stage act as obstacles. Interconnect wires are allowed to overlap obstacles, but buffers are not. The conventional approach is to avoid obstacles completely during routing, but this may increase the wire length and therefore the power consumption. The invention provides an algorithm that reduces the wire length while still keeping buffers clear of obstacles.

Disclosure of Invention

To solve the existing problems, the invention provides a low-power clock tree synthesis method based on a deferred-merge strategy, which applies a bottom-up strategy in the buffer insertion algorithm, minimizing the influence of the buffers on clock skew, and uses a bidirectional breadth-first search algorithm for obstacle-avoiding clock routing.

The technical scheme adopted by the invention is as follows: a low-power clock tree synthesis method based on a deferred-merge strategy, comprising the following steps:

step 1: to equalize the clock propagation delay from the root node to every leaf node, the invention constructs the clock tree with the deferred-merge embedding (DME) algorithm. In its first stage the algorithm merges bottom-up to obtain all merging segments, that is, all possible positions of the merging points; in its second stage, after the position of the root node has been determined, it determines the exact position of each merging point so that the interconnect length is shortest and the power consumption is lowest;

step 2: during merging, if a buffer could overlap an obstacle, the merging segment is searched for again;

step 3: to meet the clock transition time constraint without letting the inserted buffers disturb the delay balance, buffer insertion is performed for a layer each time a pair of subtrees has been merged and the corresponding merging segment has been generated. At this point only the spacing between buffers is determined, not their specific positions; the specific positions of the buffers are determined once the final position of each merging point has been fixed in the second stage of DME.

Further, step 1 comprises the steps of:

step 1.1, when searching for subtrees to merge (in the first round every subtree is a single flip-flop), constructing a Delaunay triangulation over all subtrees. Delaunay triangulation reduces the time complexity of finding nearest neighbors from O(n^2) to O(n log n);

step 1.2, calculating the weight of each edge in the Delaunay triangulation, where the weight is the sum of the interconnect capacitance and the buffer capacitance required to merge the two subtrees joined by that edge;

step 1.3, sorting all edges with quicksort, finding the edge with the minimum weight, merging the two subtrees joined by that edge to generate the corresponding merging segment, and adding the merging segment to the Delaunay triangulation as a newly generated subtree.
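The edge selection of steps 1.2 and 1.3 can be illustrated with a minimal Python sketch. It assumes the candidate edges have already been produced by a Delaunay triangulation (for example with a library such as `scipy.spatial.Delaunay`), and it approximates the wire length of a merge by the Manhattan distance between two representative points. The names `edge_weight` and `pick_cheapest_edge` and the parameters `c_wire` and `c_buf` are hypothetical and do not come from the patent.

```python
def edge_weight(a, b, c_wire, c_buf=0.0):
    """Capacitance needed to merge subtrees at points a and b.

    Assumption: wire length is approximated by the Manhattan distance,
    so detour (serpentine) wire is ignored in this sketch.
    """
    manhattan = abs(a[0] - b[0]) + abs(a[1] - b[1])
    return c_wire * manhattan + c_buf


def pick_cheapest_edge(points, edges, c_wire, c_buf=0.0):
    """Sort the candidate edges by weight and return the cheapest one."""
    ranked = sorted(
        edges,
        key=lambda e: edge_weight(points[e[0]], points[e[1]], c_wire, c_buf),
    )
    return ranked[0]


# Illustrative data: three subtrees and the edges of their triangulation.
points = [(0, 0), (1, 0), (5, 5)]
edges = [(0, 1), (1, 2), (0, 2)]
best = pick_cheapest_edge(points, edges, c_wire=0.2)
```

After the chosen pair is merged, the new subtree replaces its two children in the point set and the neighborhood graph is updated before the next selection.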

Further, step 2 comprises the steps of:

step 2.2, constructing the rectangular region bounded by the newly generated merging segment and the merging segments of the two subtrees;

step 2.3, if the rectangular region intersects an obstacle, the merging segment is searched for again using a bidirectional breadth-first search algorithm.
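The check in steps 2.2 and 2.3 can be sketched as an axis-aligned rectangle overlap test. This is only an illustration under assumptions: merging segments are given as pairs of endpoints, obstacles as `(xmin, ymin, xmax, ymax)` tuples, and the rectangular region is taken to be the bounding box of the three merging segments. All function names are hypothetical.

```python
def bounding_rect(*segments):
    """Bounding box (xmin, ymin, xmax, ymax) of the given segments."""
    xs = [p[0] for seg in segments for p in seg]
    ys = [p[1] for seg in segments for p in seg]
    return (min(xs), min(ys), max(xs), max(ys))


def rects_overlap(r1, r2):
    """True if two rectangles share interior area (touching edges do not count)."""
    return r1[0] < r2[2] and r2[0] < r1[2] and r1[1] < r2[3] and r2[1] < r1[3]


def needs_new_merging_segment(ms_a, ms_b, ms_v, obstacles):
    """True if the region spanned by the three merging segments hits an obstacle."""
    region = bounding_rect(ms_a, ms_b, ms_v)
    return any(rects_overlap(region, ob) for ob in obstacles)


# Illustrative geometry: two point-like child segments and a diagonal parent segment.
ms_a = ((0, 0), (0, 0))
ms_b = ((4, 0), (4, 0))
ms_v = ((1, -1), (3, 1))
blocked = needs_new_merging_segment(ms_a, ms_b, ms_v, [(1, -1, 3, 1)])
clear = needs_new_merging_segment(ms_a, ms_b, ms_v, [(10, 10, 12, 12)])
```

When the test reports an overlap, the flow falls back to the bidirectional breadth-first search to regenerate the merging segment.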

Further, step 3 comprises the steps of:

step 3.1, the delay and output transition time of a buffer depend on its input transition time and its load capacitance. In the present invention the input transition time (input slew) is a pre-specified value. To obtain reasonably accurate buffer delays and output transition times, a lookup table of buffer delay and output transition time is built using NGSPICE simulation;

step 3.2, manually specifying an initial spacing value and calculating the clock transition time according to the PERI model; if the clock transition time constraint is violated, the spacing is reduced until the constraint is met;

step 3.3, changing the size of the buffer, and repeating the step 3.2;

step 3.4, after all feasible buffer solutions have been found, selecting the feasible solution with the smallest power consumption;

step 3.5, once the buffers of the layer have been determined, they change the propagation delays of the two subtrees, so the position of the merging segment is recalculated starting from the last buffer of each of the two subtrees;

step 3.6, iterating to the next layer up until every layer of the clock tree has been merged and its inserted buffers determined, then executing the second stage of DME to determine the specific position of each merging point and buffer, routing the interconnect wires that connect the merging points and buffers, and finally outputting the solution produced by clock tree synthesis.
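How a lookup table such as the one in step 3.1 might be queried can be shown with bilinear interpolation over input slew and load capacitance. This is a sketch under assumptions: the patent does not specify the table layout or the interpolation scheme, the function names are hypothetical, and the table values below are made-up numbers, not NGSPICE results.

```python
from bisect import bisect_right


def _bracket(axis, value):
    """Index i of the interval [axis[i], axis[i+1]] used for value (clamped)."""
    i = bisect_right(axis, value) - 1
    return max(0, min(i, len(axis) - 2))


def _fraction(lo, hi, value):
    """Position of value inside [lo, hi], clamped to the range 0..1."""
    return min(max((value - lo) / (hi - lo), 0.0), 1.0)


def lut_lookup(slews, loads, table, slew, load):
    """Bilinear interpolation in table[slew_index][load_index].

    Queries outside the characterized range are clamped to the table edge.
    """
    i = _bracket(slews, slew)
    j = _bracket(loads, load)
    ts = _fraction(slews[i], slews[i + 1], slew)
    tl = _fraction(loads[j], loads[j + 1], load)
    low = table[i][j] * (1 - tl) + table[i][j + 1] * tl
    high = table[i + 1][j] * (1 - tl) + table[i + 1][j + 1] * tl
    return low * (1 - ts) + high * ts


# Illustrative 2x2 delay table (arbitrary units, invented values).
slews = [10.0, 20.0]
loads = [1.0, 3.0]
delay_table = [[100.0, 200.0], [300.0, 400.0]]
mid_delay = lut_lookup(slews, loads, delay_table, 15.0, 2.0)
```

The same routine can serve a second table that stores output transition time instead of delay.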

The beneficial effects of the invention are as follows: in the low-power clock tree synthesis method based on a deferred-merge strategy, the clock tree synthesis algorithm keeps the clock skew of the clock tree small, and the accurate buffer insertion algorithm keeps its power consumption small. The clock tree synthesis algorithm was applied to the ISPD 2009 and ISPD 2010 benchmark circuits and, compared with similar design methods, significantly reduced both clock skew and total power consumption. Clock tree synthesis is an important part of back-end physical design and contributes substantially to final timing closure and to the performance of the chip in operation.

Drawings

FIG. 1 is an overall flow chart of the present invention;

FIG. 2 shows the rectangular region formed by a newly generated merging segment and the merging segments of its two subtrees, wherein FIG. 2 (a) shows the region itself, ms_a and ms_b being the merging segments of the subtrees and ms_v the newly generated merging segment, and FIG. 2 (b) shows the case in which this rectangular region overlaps an obstacle;

FIG. 3 is a schematic diagram of a bi-directional breadth-first search algorithm of the present invention;

FIG. 4 is a schematic diagram of a buffer insertion according to the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention; all other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The low-power clock tree synthesis method based on a deferred-merge strategy provided by this scheme comprises the following steps:

Step 1 comprises the following steps: step 1.1, when searching for subtrees to merge (in the first round every subtree is a single flip-flop), constructing a Delaunay triangulation over all subtrees. Delaunay triangulation reduces the time complexity of finding nearest neighbors from O(n^2) to O(n log n);

Step 2 comprises the following steps: step 2.2, constructing the rectangular region bounded by the newly generated merging segment and the merging segments of the two subtrees;

Step 3 comprises the following steps: step 3.1, the delay and output transition time of a buffer depend on its input transition time and its load capacitance. In the present invention the input transition time (input slew) is a pre-specified value. To obtain reasonably accurate buffer delays and output transition times, a lookup table of buffer delay and output transition time is built using NGSPICE simulation;

step 3.3, changing the size of the buffer, and repeating the step 3.2;

The input data consists of the coordinates of all flip-flops, a buffer library and a wire library. The constraints are the maximum clock transition time and the maximum total capacitance. The final goal is to construct a buffered clock tree that satisfies the given constraints.

When selecting subtrees to merge, the capacitance required for the merge serves as the cost function; it is calculated with equations 1-3.

$\mathrm{cost}(i, j) = cL + C_{\mathrm{buf}}$ (Equation 1)

where $c$ is the capacitance per unit length, $r$ is the resistance per unit length, $D_1$ and $D_2$ are the propagation delays of the two subtrees, $C_1$ and $C_2$ are the load capacitances of the two subtrees, $d_{\min}$ is the minimum Manhattan distance between the two subtrees, $d_1$ is the Manhattan distance from the left subtree to the merging segment, and $C_{\mathrm{buf}}$ is the capacitance of the buffers required for the merge.
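Equations 2 and 3 are not reproduced in this text. As an illustration only, the sketch below uses the classical zero-skew merge under the Elmore delay model, which works with the same quantities defined above; it should not be read as the patent's exact equations. It assumes that the wire length in the cost is the total length of the two merge wires, and the function names `zero_skew_merge` and `merge_cost` are hypothetical.

```python
import math


def zero_skew_merge(d, D1, D2, C1, C2, r, c):
    """Wire lengths (d1, d2) from the merge point to subtrees 1 and 2.

    d is the Manhattan distance between the subtrees (d_min in the text).
    Balances D1 + r*d1*(c*d1/2 + C1) against D2 + r*d2*(c*d2/2 + C2).
    Assumes r > 0, c > 0 and a non-zero denominator.
    """
    x = (D2 - D1 + r * d * (C2 + c * d / 2.0)) / (r * (C1 + C2 + c * d))
    if 0.0 <= x <= d:
        return x, d - x  # balance point lies between the subtrees
    if x < 0.0:
        # Subtree 1 is too slow: tap at subtree 1, lengthen the wire to subtree 2.
        d2 = (math.sqrt((r * C2) ** 2 + 2.0 * r * c * (D1 - D2)) - r * C2) / (r * c)
        return 0.0, d2
    # Subtree 2 is too slow: tap at subtree 2, lengthen the wire to subtree 1.
    d1 = (math.sqrt((r * C1) ** 2 + 2.0 * r * c * (D2 - D1)) - r * C1) / (r * c)
    return d1, 0.0


def merge_cost(d, D1, D2, C1, C2, r, c, c_buf=0.0):
    """Capacitance cost: wire capacitance of both merge wires plus buffer capacitance."""
    d1, d2 = zero_skew_merge(d, D1, D2, C1, C2, r, c)
    return c * (d1 + d2) + c_buf


balanced = zero_skew_merge(10.0, 0.0, 0.0, 2.0, 2.0, 1.0, 1.0)
detour = zero_skew_merge(10.0, 100.0, 0.0, 2.0, 2.0, 1.0, 1.0)
```

In the second call the delay gap is too large to balance within the direct distance, so the wire to the faster subtree is lengthened beyond `d`; this extra length is the serpentine routing that a capacitance-based cost penalizes.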

If an inserted buffer would overlap an obstacle, the merging segment is regenerated.

The bidirectional breadth-first search algorithm divides the routing area into an m × n grid and maps the merging segments of the two subtrees onto the grid. Breadth-first searches are then run from both merging segments simultaneously, and each time a search expands outward by one ring the corresponding delay is added. The grid points where the two searches finally meet form the regenerated merging segment. If the delays of the two subtrees are unbalanced, the search starting from the subtree with the lower delay is expanded first until the delays of the two sides are equal, after which both sides search simultaneously.
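The search described above can be sketched on a small grid. The sketch makes several simplifying assumptions that are not in the patent: each subtree's merging segment is reduced to a single start cell, every ring adds the same fixed `step_delay`, blocked cells are marked with 1, and only the first meeting cell is returned rather than a full segment. The function name is hypothetical.

```python
from collections import deque


def bidirectional_bfs(grid, src_a, src_b, delay_a=0.0, delay_b=0.0, step_delay=1.0):
    """Return the first cell (x, y) reached by both wavefronts, or None.

    The side with the lower accumulated delay expands one ring at a time,
    so a faster subtree is allowed to travel farther before the fronts meet.
    """
    if src_a == src_b:
        return src_a
    rows, cols = len(grid), len(grid[0])
    seen = [{src_a}, {src_b}]
    frontier = [deque([src_a]), deque([src_b])]
    delay = [delay_a, delay_b]
    while frontier[0] and frontier[1]:
        side = 0 if delay[0] <= delay[1] else 1
        for _ in range(len(frontier[side])):  # expand exactly one ring
            x, y = frontier[side].popleft()
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if not (0 <= nx < cols and 0 <= ny < rows):
                    continue
                if grid[ny][nx] == 1 or (nx, ny) in seen[side]:
                    continue
                if (nx, ny) in seen[1 - side]:
                    return (nx, ny)  # the two wavefronts meet here
                seen[side].add((nx, ny))
                frontier[side].append((nx, ny))
        delay[side] += step_delay
    return None


corridor = [[0, 0, 0, 0, 0]]
meet_balanced = bidirectional_bfs(corridor, (0, 0), (4, 0))
meet_skewed = bidirectional_bfs(corridor, (0, 0), (4, 0), delay_a=0.0, delay_b=2.0)
walled = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
meet_blocked = bidirectional_bfs(walled, (0, 0), (2, 0))
```

With equal starting delays the fronts meet midway; when side B starts with more delay, side A expands alone until the delays match, so the meeting cell moves toward B.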

The buffer insertion spacing varies with the buffer size: a larger buffer has a stronger drive capability, so its spacing is larger. Because the subtrees driven by the first buffer may differ, the load seen by the first buffer may differ from case to case. The buffers inserted after the first one are equally spaced, because a buffer isolates the load capacitance on its far side, so each of these buffers sees the same load.
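The spacing search of steps 3.2 to 3.4 can be sketched as follows. The sketch rests on assumptions that are labeled here and in the comments: the wire slew is estimated in the PERI style as the root-sum-square of the input slew and the step-response slew, the step slew is taken as ln 9 times the Elmore delay, each buffer is described by a fixed output slew and an input capacitance instead of a lookup table, and total capacitance stands in for power. All names and numbers are hypothetical.

```python
import math

LN9 = math.log(9.0)


def wire_slew(slew_in, length, r, c, load_cap):
    """PERI-style slew at the end of a wire (assumed form, see lead-in)."""
    elmore = r * length * (c * length / 2.0 + load_cap)
    return math.sqrt(slew_in ** 2 + (LN9 * elmore) ** 2)


def max_spacing(slew_in, r, c, load_cap, slew_limit, start, shrink=0.9, min_len=1e-3):
    """Shrink the spacing from `start` until the slew limit is met; None if never."""
    length = start
    while length > min_len:
        if wire_slew(slew_in, length, r, c, load_cap) <= slew_limit:
            return length
        length *= shrink
    return None


def choose_buffer(buffers, total_len, r, c, slew_limit, start):
    """Try every buffer size and keep the feasible one with the least capacitance."""
    best = None
    for buf in buffers:
        # Assumption: each buffer drives a wire that ends at an identical buffer.
        spacing = max_spacing(buf["out_slew"], r, c, buf["in_cap"], slew_limit, start)
        if spacing is None:
            continue
        count = math.ceil(total_len / spacing)
        cap = count * buf["in_cap"] + c * total_len  # capacitance as a power proxy
        if best is None or cap < best[2]:
            best = (buf["name"], spacing, cap)
    return best


library = [
    {"name": "small", "out_slew": 5.0, "in_cap": 1.0},
    {"name": "large", "out_slew": 2.0, "in_cap": 2.0},
]
choice = choose_buffer(library, total_len=20.0, r=1.0, c=1.0, slew_limit=10.0, start=10.0)
```

A fuller implementation would take the buffer output slew from the simulated lookup table for the actual load rather than from a fixed per-buffer value.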

It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. A low-power clock tree synthesis method based on a deferred-merge strategy, characterized in that the method comprises the following steps:

step 1: to equalize the clock propagation delay from the root node to every leaf node, the clock tree is constructed with the deferred-merge embedding (DME) algorithm, wherein in a first stage the algorithm merges bottom-up to obtain all merging segments, that is, all possible positions of the merging points, and in a second stage, after the position of the root node has been determined, the exact position of each merging point is determined so that the interconnect length is shortest and the power consumption is lowest;

2. The low-power clock tree synthesis method based on a deferred-merge strategy as claimed in claim 1, characterized in that step 1 comprises the following steps:

step 1.1, when searching for subtrees to merge, constructing a Delaunay triangulation over all subtrees. Delaunay triangulation reduces the time complexity of finding nearest neighbors from O(n^2) to O(n log n);

3. The low-power clock tree synthesis method based on a deferred-merge strategy as claimed in claim 1, characterized in that step 2 comprises the following steps:

4. The low-power clock tree synthesis method based on a deferred-merge strategy as claimed in claim 1, characterized in that step 3 comprises the following steps:

step 3.1, the delay and output transition time of a buffer depend on its input transition time and its load capacitance; in the invention the input transition time is a pre-specified value, and to obtain reasonably accurate buffer delays and output transition times, a lookup table of buffer delay and output transition time is built using NGSPICE simulation;

step 3.3, changing the size of the buffer, and repeating the step 3.2;

5. The low-power clock tree synthesis method based on a deferred-merge strategy as claimed in claim 1, characterized in that, when selecting subtrees to merge, the capacitance required for the merge serves as the cost function and is calculated with equations 1-3.

$\mathrm{cost}(i, j) = cL + C_{\mathrm{buf}}$ (Equation 1)