CN116861842B - Implementation method and related device for adjustable segmented reverse clock tree - Google Patents
- Publication number
- CN116861842B (application CN202311131872.7A)
- Authority
- CN
- China
- Prior art keywords
- clock tree
- delay
- clock
- design
- stage
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/30—Circuit design
- G06F30/39—Circuit design at the physical level
- G06F30/396—Clock trees
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/30—Circuit design
- G06F30/32—Circuit design at the digital level
- G06F30/33—Design verification, e.g. functional simulation or model checking
- G06F30/3315—Design verification, e.g. functional simulation or model checking using static timing analysis [STA]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/30—Circuit design
- G06F30/39—Circuit design at the physical level
- G06F30/398—Design verification or optimisation, e.g. using design rule check [DRC], layout versus schematics [LVS] or finite element methods [FEM]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Computation (AREA)
- Geometry (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Design And Manufacture Of Integrated Circuits (AREA)
Abstract
The invention provides an implementation method and a related device for an adjustable segmented reverse clock tree, belonging to the technical field of chip design. The method comprises: analyzing the data flow structure of a core functional module and designing a first-stage reverse clock tree structure according to that data flow structure; determining the operating performance index and corresponding clock frequency of the chip under different PVT conditions, and the process library requirement on the minimum clock duty cycle for registers; calculating the maximum number of stages N of a segmented reverse clock tree using a preset reverse clock tree segmentation algorithm; segmenting the first-stage reverse clock tree structure according to N to obtain a plurality of segmented reverse clock tree structures, wherein the clock delay skew of each segmented reverse clock tree structure does not exceed one clock period; designing a second-stage clock tree structure; and performing place-and-route design, RC parameter extraction and static timing analysis on the two-stage clock tree structure so that it meets the PPA targets.
Description
Technical Field
The invention relates to the technical field of chip design, and in particular to an implementation method and a related device for an adjustable segmented reverse clock tree.
Background
The reverse clock tree is mostly used in designs with multi-stage pipeline registers (up to 128 or 256 stages). As shown in FIG. 1, the clock signal in a reverse clock tree structure propagates in the direction opposite to the data: the data signals flow from left to right, while the clock signal flows from right to left.
The sequential logic depth in a reverse clock tree structure is relatively large. To maintain the correct phase relationship, clock inverters are usually inserted into the clock tree in pairs, which also gives the clock tree a relatively deep chain of inverter levels. Because the data flow has many pipeline stages, and because in most ultra-low-voltage designs the rise delay (the propagation delay of a high-level signal) and the fall delay (the propagation delay of a low-level signal) of the logic cells in the reverse clock tree are not equal, the duty cycle of the clock signal changes as it propagates; when the distortion is severe, a duty cycle violation occurs. In addition, when the last stage of the data flow has to interact with other functional modules (such as asynchronous logic circuits or feedback comparison circuits), the clock delay difference between the two parts can far exceed one clock period, which seriously degrades chip performance.
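The duty cycle drift described above can be illustrated with a minimal sketch. It assumes a simplified linear model that is not part of the original text: the clock leaves the source with a 50% duty cycle, and every inverter pair changes the high-pulse width by the difference between its fall delay and its rise delay. The function names and all numeric values are hypothetical.

```python
def high_pulse_width_ps(period_ps, stages, rise_delay_ps, fall_delay_ps):
    """High-pulse width after `stages` inverter pairs, assuming 50% duty at the source."""
    # Simplified linear model (assumption): each inverter pair delays the rising
    # edge by rise_delay_ps and the falling edge by fall_delay_ps, so the high
    # pulse changes by (fall_delay_ps - rise_delay_ps) per pair.
    width = period_ps / 2 + stages * (fall_delay_ps - rise_delay_ps)
    return max(width, 0.0)


def first_violating_stage(period_ps, min_pulse_ps, rise_delay_ps, fall_delay_ps, max_stages):
    """Return the first stage whose high pulse is narrower than min_pulse_ps, or None."""
    for stage in range(1, max_stages + 1):
        if high_pulse_width_ps(period_ps, stage, rise_delay_ps, fall_delay_ps) < min_pulse_ps:
            return stage
    return None


# Hypothetical numbers: 1000 ps period, 2 ps rise/fall mismatch per inverter pair,
# 300 ps minimum pulse width required by the cell library.
print(high_pulse_width_ps(1000, 50, 32, 30))
print(first_violating_stage(1000, 300, 32, 30, 256))
```

Under these assumed numbers the high pulse narrows steadily with depth, which is why an unsegmented chain of 128 or 256 stages is prone to duty cycle violations. The sketch tracks only the high pulse; a real design would check the low pulse as well.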
Disclosure of Invention
The invention provides an implementation method and a related device for an adjustable segmented reverse clock tree. They make it possible to decide how to segment the reverse clock tree according to the project design, the clock period and the interface circuits, so that chip performance is preserved, less hold-time repair is needed in the circuit design, and the design requirement on the minimum clock duty cycle is met. The technical solution is as follows:
In a first aspect, an embodiment of the present invention provides a method for implementing an adjustable segmented reverse clock tree, including:
analyzing the data flow structure of a core functional module, and designing a first-stage reverse clock tree structure according to the data flow structure;
performing place-and-route design, RC parameter extraction and static timing analysis under various PVT conditions on the core functional module, and determining the operating performance index and corresponding clock frequency of the chip under different PVT conditions and the process library requirement on the minimum clock duty cycle for registers;
based on the operating performance indexes and corresponding clock frequencies of the chip under different PVT conditions and the process library requirement on the minimum clock duty cycle for registers, calculating the maximum number of stages N of the segmented reverse clock tree using a preset reverse clock tree segmentation algorithm;
segmenting the first-stage reverse clock tree structure according to the maximum number N to obtain a plurality of segmented reverse clock tree structures, wherein the overall clock delay skew of each segmented reverse clock tree structure does not exceed one clock period;
designing a second-stage clock tree structure;
performing place-and-route design, RC parameter extraction and static timing analysis on the two-stage clock tree structure, so that the two-stage clock tree structure meets the PPA targets;
wherein the preset reverse clock tree segmentation algorithm is:
N <= (T_clk_period + T_launch_delay) / (T_clkinv_pair_delay * 2);
wherein N is the number of clock tree stages, T_clk_period is the clock period, T_launch_delay is the transmit clock delay, and T_clkinv_pair_delay is the delay of a single clock inverter logic unit;
the preset reverse clock tree segmentation algorithm is obtained as follows:
T_clk_skew ≤ T_clk_period;
T_clk_skew = T_capture_clk_delay - T_launch_delay;
T_capture_clk_delay = N * 2 * T_clkinv_pair_delay;
wherein T_clk_skew is the clock delay skew, T_capture_clk_delay is the sampling clock delay, and T_launch_delay is the transmit clock delay.
Optionally, analyzing the data flow structure of the core functional module and designing the first-stage reverse clock tree structure according to the data flow structure includes:
the core functional module comprising multi-stage pipeline registers, comprehensively analyzing and sorting out the functional structure, logic structure, performance indexes and low-voltage process library for the data flow of the multi-stage pipeline registers, so as to determine the planning and design structure of the reverse clock tree;
according to the planning and design structure of the reverse clock tree, laying out the register groups corresponding to the multi-stage pipeline data flow structure in the order of the data flow;
according to the data flow structure and the physical positions of the corresponding register groups, developing a design script for the first-stage reverse clock tree structure, and designing and implementing the first-stage reverse clock tree structure according to the physical design, automatic place-and-route and routing rules.
Optionally, the second-stage clock tree structure includes: a reverse clock tree structure or a balanced tree structure.
In a second aspect, an embodiment of the present invention provides an implementation apparatus for an adjustable segmented reverse clock tree, including:
a first design module, configured to analyze the data flow structure of the core functional module and design a first-stage reverse clock tree structure according to the data flow structure;
a first analysis and determination module, configured to perform place-and-route design, RC parameter extraction and static timing analysis under various PVT conditions on the core functional module, and to determine the operating performance index and corresponding clock frequency of the chip under different PVT conditions and the process library requirement on the minimum clock duty cycle for registers;
a calculation module, configured to calculate the maximum number of stages N of the segmented reverse clock tree using a preset reverse clock tree segmentation algorithm, based on the operating performance indexes and corresponding clock frequencies of the chip under different PVT conditions and the process library requirement on the minimum clock duty cycle for registers;
a segmentation module, configured to segment the first-stage reverse clock tree structure according to the maximum number N to obtain a plurality of segmented reverse clock tree structures, wherein the overall clock delay skew of each segmented reverse clock tree structure does not exceed one clock period;
a second design module, configured to design a second-stage clock tree structure;
a second analysis and determination module, configured to perform place-and-route design, RC parameter extraction and static timing analysis on the two-stage clock tree structure, so that the two-stage clock tree structure meets the PPA targets;
wherein the preset reverse clock tree segmentation algorithm is:
N <= (T_clk_period + T_launch_delay) / (T_clkinv_pair_delay * 2);
wherein N is the number of clock tree stages, T_clk_period is the clock period, T_launch_delay is the transmit clock delay, and T_clkinv_pair_delay is the delay of a single clock inverter logic unit;
the preset reverse clock tree segmentation algorithm is obtained as follows:
T_clk_skew ≤ T_clk_period;
T_clk_skew = T_capture_clk_delay - T_launch_delay;
T_capture_clk_delay = N * 2 * T_clkinv_pair_delay;
wherein T_clk_skew is the clock delay skew, T_capture_clk_delay is the sampling clock delay, and T_launch_delay is the transmit clock delay.
In a third aspect, an embodiment of the present invention provides an electronic device, including: a transceiver, a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of implementing an adjustable segmented reverse clock tree as described in the first aspect when the computer program is executed.
In a fourth aspect, embodiments of the present invention provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of implementing an adjustable segmented reverse clock tree according to the first aspect.
The beneficial effects of the technical solution of the invention are as follows:
The implementation method and related device for an adjustable segmented reverse clock tree provided by the embodiments of the invention proceed as follows. First, the data flow structure of the core functional module is analyzed, and a first-stage reverse clock tree structure is designed according to the data flow structure. Place-and-route design, RC parameter extraction and static timing analysis under various PVT conditions are then performed on the core functional module to determine the operating performance index and corresponding clock frequency of the chip under different PVT conditions and the process library requirement on the minimum clock duty cycle for registers. Based on these results, the maximum number of stages N of the segmented reverse clock tree is calculated using a preset reverse clock tree segmentation algorithm. The first-stage reverse clock tree structure is then segmented according to the maximum number N to obtain a plurality of segmented reverse clock tree structures, wherein the overall clock delay skew of each segmented reverse clock tree structure does not exceed one clock period. A second-stage clock tree structure is then designed. Finally, place-and-route design, RC parameter extraction and static timing analysis are performed on the two-stage clock tree structure so that it meets the PPA targets (and comes as close as possible to optimal PPA).
The embodiments of the invention make it possible to decide how to segment the reverse clock tree according to the project design, the clock period and the interface circuits, so that the segmented reverse clock tree is designed without sacrificing chip performance, while hold-time repair in the circuit design is reduced and the design requirement on the minimum clock duty cycle is met.
Drawings
FIG. 1 is a schematic diagram of a prior art reverse clock tree structure;
FIG. 2 is a flow chart of a method for implementing an adjustable segmented reverse clock tree according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a segmented two-stage reverse clock tree structure according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a hybrid architecture design of a segmented reverse clock tree and a balanced clock tree in an embodiment of the present invention;
FIG. 5 is a diagram showing the setup timing margin and the corresponding chip performance frequency for different delay_counter values in static timing analysis according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an implementation apparatus for an adjustable segmented reverse clock tree according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages to be solved more apparent, the following detailed description will be given with reference to the accompanying drawings and specific embodiments. In the following description, specific details such as specific configurations and components are provided merely to facilitate a thorough understanding of embodiments of the invention. It will therefore be apparent to those skilled in the art that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the invention. In addition, descriptions of well-known functions and constructions are omitted for clarity and conciseness.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
In various embodiments of the present invention, it should be understood that the sequence numbers of the following processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
An embodiment of the invention provides an implementation method for an adjustable segmented reverse clock tree. For a multi-stage clock tree structure, the method obtains a plurality of segmented reverse clock tree structures through segmented design, such that the overall clock delay skew of each segmented reverse clock tree structure does not exceed one clock period. The segmented reverse clock tree structure allows the chip design to reach better performance and power targets at ultra-low voltage, makes violations of the process requirement on the clock duty cycle of the multi-stage pipeline registers easier to repair, and allows the design to meet the design targets of different applications. As shown in FIG. 2, the implementation method of the adjustable segmented reverse clock tree provided by the embodiment of the invention includes:
Step 101: analyzing the data flow structure of the core functional module, and designing a first-stage reverse clock tree structure according to the data flow structure.
At project start, the data flow of the core functional module must first be analyzed and sorted out; the core functional module comprises n stages of pipeline registers (n > 100). For the data flow of the multi-stage pipeline registers, the functional structure (e.g., high-performance computing power), the logic structure (e.g., a serial multi-stage pipeline structure), the performance indexes (e.g., the clock frequency must reach 1 GHz at the normal voltage of 0.72 V and 650 MHz near the threshold voltage at 0.32 V), the low-voltage process library (e.g., the sequential cell library of a near-threshold-voltage process library) and so on are comprehensively analyzed and sorted out in order to determine the planning and design structure of the reverse clock tree.
After the planning and design structure of the reverse clock tree is complete, the register groups corresponding to the multi-stage pipeline data flow structure are placed at fixed physical positions. Taking into account the data bus width, the physical design rules (for example, in an advanced 8 nm or 7 nm process a group of staggered physical well (OD) cells must be inserted every 30-40 µm to satisfy the physical design rules) and the automatic place-and-route design (which must also satisfy the design rules of the advanced process), the register groups corresponding to the data flow structure are laid out in the order of the data flow.
Finally, a design script for the first-stage reverse clock tree structure is developed according to the data flow structure and the physical positions of the corresponding register groups, and the first-stage reverse clock tree is designed and implemented according to the physical design, automatic place-and-route (APR) and routing rules.
Take as an example the practical case of a high-performance chip implemented in an xxx-8nm process:
In the functional module of the chip, the AIF (instruction encoding module) issues instructions, which are processed and passed on through COMP0/1/2/3 and WT_GEN and then delivered to the HPART0/1/2/3 operation modules for comparison operations; finally, the computed values are sent back to the AIF for comparison. A 256-stage data flow is designed from AIF to COMP/WT_GEN to HPART and back to AIF.
In the specific chip implementation, the data flow AIF -> COMP/WT_GEN -> HPART -> AIF and the arithmetic logic units between successive data stages are placed first. A design script for the first-stage reverse clock tree structure is then developed, and the first-stage reverse clock tree structure is implemented according to the physical design, automatic place-and-route and routing rules.
Step 102: performing place-and-route design, RC parameter extraction and static timing analysis under various PVT conditions on the core functional module, and determining the operating performance index and corresponding clock frequency of the chip under different PVT conditions and the process library requirement on the minimum clock duty cycle for registers.
It should be noted that performing place-and-route design, RC parameter extraction and static timing analysis (STA) under various PVT (Process, Voltage, Temperature) conditions on the core functional module is a conventional design flow in chip timing closure sign-off; an existing timing sign-off method is used and is not described in detail in the embodiments of the invention.
In the embodiment of the invention, the results of this place-and-route design, RC parameter extraction and static timing analysis are used to determine the operating performance index and corresponding clock frequency of the chip under different PVT conditions and the process library requirement on the minimum clock duty cycle for registers.
It should be noted that the different PVT conditions may be, for example, combinations of: the process condition (Process) ranging from Slow to Typical to Fast; the operating voltage (Voltage) ranging from 0.9 × V_typical to V_typical to 1.1 × V_typical, where different process conditions correspond to different operating voltage conditions; and the operating temperature (Temperature) at -40 °C, 0 °C and 125 °C. The chip performance achievable under different PVT conditions is not exactly the same.
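As a minimal sketch, the PVT conditions listed above can be enumerated as a full cross product. This is an assumption made for illustration: the text states that different process conditions correspond to different voltage conditions, so an actual sign-off flow would normally analyze only a selected subset of these combinations. The function name and dictionary keys are hypothetical; the 0.72 V typical voltage is the normal-voltage figure quoted for step 101.

```python
from itertools import product


def pvt_corners(v_typical):
    """Enumerate every process/voltage/temperature combination as a list of dicts."""
    processes = ["slow", "typical", "fast"]
    voltage_scales = [0.9, 1.0, 1.1]   # multiples of the typical voltage
    temperatures_c = [-40, 0, 125]     # degrees Celsius
    return [
        {"process": p, "voltage_v": round(s * v_typical, 3), "temperature_c": t}
        for p, s, t in product(processes, voltage_scales, temperatures_c)
    ]


corners = pvt_corners(0.72)
print(len(corners))
print(corners[0])
```

Each entry would then be handed to the static timing analysis flow as one analysis condition.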
The clock duty cycle requirement of a register (also called a data register) is a requirement imposed by the wafer foundry for timing sign-off of the register; if it cannot be met, the chip may malfunction.
Step 103: based on the operating performance indexes and corresponding clock frequencies of the chip under different PVT conditions and the process library requirement on the minimum clock duty cycle for registers, calculating the maximum number of stages N of the segmented reverse clock tree using a preset reverse clock tree segmentation algorithm.
In the embodiment of the invention, the preset reverse clock tree segmentation algorithm is used to calculate N, the maximum number of stages of each reverse clock tree segment.
The preset reverse clock tree segmentation algorithm may be:
N <= (T_clk_period + T_launch_delay) / (T_clkinv_pair_delay * 2);
where T_clk_period is the clock period, T_launch_delay is the transmit clock delay, and T_clkinv_pair_delay is the delay of a single clock inverter logic unit.
In the design of the embodiment of the invention, the overall clock delay skew of the reverse clock tree must not exceed one clock period, that is:
T_clk_skew ≤ T_clk_period (formula 1)
where T_clk_skew is the clock delay skew;
T_clk_skew = T_capture_clk_delay - T_launch_delay (formula 2)
where T_capture_clk_delay is the sampling clock delay and T_launch_delay is the transmit clock delay;
T_capture_clk_delay = N * 2 * T_clkinv_pair_delay (formula 3)
The reverse clock tree segmentation algorithm in the embodiment of the invention follows from formulas 1, 2 and 3: substituting formulas 2 and 3 into formula 1 gives N * 2 * T_clkinv_pair_delay - T_launch_delay ≤ T_clk_period, which rearranges to the bound on N given above. The segmentation algorithm is designed chiefly so that the overall clock delay skew does not exceed one clock period, which makes it easier to satisfy the clock duty cycle constraint of the registers (DFFs) and to reach the required chip performance.
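The bound on N can be evaluated with a few lines of code. The following is a minimal sketch that applies the inequality with integer picosecond delays; the function names and the numeric values are hypothetical and were chosen only so that the result is a round number.

```python
def max_segment_stages(clk_period_ps, launch_delay_ps, clkinv_pair_delay_ps):
    """Largest integer N with N <= (T_clk_period + T_launch_delay) / (T_clkinv_pair_delay * 2)."""
    if clkinv_pair_delay_ps <= 0:
        raise ValueError("clkinv_pair_delay_ps must be positive")
    # Integer floor division keeps the result on the safe side of the bound.
    return (clk_period_ps + launch_delay_ps) // (2 * clkinv_pair_delay_ps)


def clock_skew_ps(stages, launch_delay_ps, clkinv_pair_delay_ps):
    """T_clk_skew = N * 2 * T_clkinv_pair_delay - T_launch_delay (formulas 2 and 3)."""
    return stages * 2 * clkinv_pair_delay_ps - launch_delay_ps


# Hypothetical values: 1000 ps clock period, 200 ps transmit clock delay,
# 12 ps delay per clock inverter logic unit.
n = max_segment_stages(1000, 200, 12)
print(n)
print(clock_skew_ps(n, 200, 12))      # skew at the limit
print(clock_skew_ps(n + 1, 200, 12))  # one stage more breaks formula 1
```

With these assumed delays, the skew at N stages equals exactly one clock period, and adding one more stage exceeds it, which is the condition formula 1 rules out.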
Step 104: segmenting the first-stage reverse clock tree structure according to the maximum number N to obtain a plurality of segmented reverse clock tree structures, wherein the overall clock delay skew of each segmented reverse clock tree structure does not exceed one clock period.
After the maximum number N has been calculated, the first-stage reverse clock tree structure implemented in step 101 is segmented to obtain a plurality of segmented reverse clock tree structures, each with an overall clock delay skew of no more than one clock period.
For example, if the first-stage reverse clock tree structure implemented in step 101 has 200 stages and the reverse clock tree segmentation algorithm gives a maximum number of stages N of 50, the embodiment of the invention divides the 200-stage first-stage reverse clock tree structure into four 50-stage reverse clock tree structures. In a specific implementation, the clock tree of the 151st-stage (3N+1) register is designed as a reverse clock tree structure, while the clock tree of the 150th-stage (3N) register uses a forward clock tree structure or a single reverse clock tree structure. The reverse clock tree of the 149th-stage (3N-1) register starts afresh from the clock tree of the 150th-stage register. Likewise, the reverse clock tree is restarted for every group of 50 register stages. According to actual high-performance ultra-low-voltage chip design data, overall chip performance can be improved by 3%-5%, and clock duty cycle violations at register clock ports can be reduced by more than 90%.
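The partitioning in this example can be sketched as follows. The helper below only splits the stage indices into consecutive ranges of at most N stages; how the clock is wired at each segment boundary is not modeled, and the function name is hypothetical.

```python
def split_stages(total_stages, max_stages_per_segment):
    """Split stages 1..total_stages into consecutive (first, last) ranges of at most N stages."""
    if total_stages <= 0 or max_stages_per_segment <= 0:
        raise ValueError("both arguments must be positive")
    segments = []
    first = 1
    while first <= total_stages:
        last = min(first + max_stages_per_segment - 1, total_stages)
        segments.append((first, last))
        first = last + 1
    return segments


# The 200-stage example with N = 50 yields four 50-stage segments.
print(split_stages(200, 50))
# A 256-stage pipeline with the same N leaves a shorter final segment.
print(split_stages(256, 50))
```

When the total depth is not a multiple of N, the last segment is simply shorter, so it still satisfies the one-clock-period skew bound.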
Step 105: designing a second-stage clock tree structure.
The second-stage clock tree structure in the embodiment of the invention may be a reverse clock tree structure or a balanced tree structure. The second-stage clock tree structure may also take other forms, which the invention does not limit.
Step 106: performing place-and-route design, RC parameter extraction and static timing analysis on the two-stage clock tree structure, so that the two-stage clock tree structure meets the PPA targets.
PPA stands for Performance, Power and Area. Meeting the PPA targets means meeting at least the basic PPA targets corresponding to the performance of the chip module. In practice, the embodiment of the invention preferably strives for optimal PPA, under which the overall performance of the chip module can be improved by 3%-5% and clock duty cycle violations at register clock ports can be reduced by 80%-90%.
In conjunction with the illustration of FIG. 3, an embodiment of the present invention employs a 2-level inverted clock tree architecture. The first-stage reverse clock tree structure is realized by adopting the steps 101 to 104, then proper clock unit logic is selected according to the design rule of a process library, and a clock input interface from a clock source to an N+1-stage register group is directly built according to the physical distance and the driving capability of the clock unit logic; based on the algorithm, the clock design of the 2-level reverse clock tree corresponding to the register group sequence of the whole multi-level flow data stream is completed. And finally, optimizing time sequence, power consumption and area through the layout and wiring design of the module and RC parameter extraction, so as to achieve better performance of the module.
As shown in FIG. 4, an embodiment of the invention employs a hybrid architecture of a reverse clock tree and a balanced clock tree. The first-stage reverse clock tree structure is implemented with steps 101 to 104. A balanced tree is then used for the clock tree of the next-stage register (m×N+1, m = 1, 2, 3, …) following each segment of the data stream and for the registers of the other, non-multi-stage-pipeline data streams of the chip module, so that the module achieves a better performance index while the PPA (performance, power consumption and area) indexes of the chip are met.
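To make the m×N+1 indexing concrete, the following sketch lists the register stages that would receive a balanced-tree clock in the hybrid design. The function name `segment_entry_stages` and the 1-based stage numbering are assumptions made for illustration only.

```python
def segment_entry_stages(total_stages: int, n_max: int) -> list[int]:
    """Return the stages m*N + 1 (m = 1, 2, 3, ...) that fall inside the pipeline.

    Hypothetical helper: these are the first register stages after each
    N-stage segment of the reverse clock tree.
    """
    if total_stages < 1 or n_max < 1:
        raise ValueError("total_stages and n_max must be positive")
    return list(range(n_max + 1, total_stages + 1, n_max))


# With 200 pipeline stages and N = 50, three stages follow a full segment.
print(segment_entry_stages(200, 50))  # [51, 101, 151]
```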
The applicant further illustrates, by way of example, a method for implementing an adjustable segmented reverse clock tree provided by an embodiment of the present invention. The chip design for the xxx n+1nm process is exemplified:
(1) First, analyze the chip functional module and clarify its data flow and logical relationships, from AIF to COMP/WT_GEN to HPART to AIF;
(2) Sequentially arranging corresponding register groups according to the front-back logic relationship of the data stream, and then carrying out placement and time sequence optimization of other logic units;
(3) Clock tree synthesis and time sequence optimization are carried out in a back-end layout wiring tool;
(4) Global routing and power/timing optimization are carried out in the back-end layout wiring tool;
(5) Extracting RC parameters by using a resistance capacitance RC parameter extracting tool;
(6) Perform static timing analysis under each corner (timing analysis condition) in a static timing analysis tool to determine the actual operating frequency the chip can reach under the different corners. At the same time, a minimum pulse width (Min Pulse Width) analysis can be performed to confirm whether the design reports any Min Pulse Width violations. This check relies mainly on the timing cell library provided by the wafer foundry: the difference between the clock rising-edge delay and the clock falling-edge delay in the actual design is calculated and compared with the clock duty cycle requirement for the register in the cell library. If the difference is too large, it must be repaired in the design.
Fig. 5 shows the setup timing margin (setup slack) and the corresponding chip performance frequency under different delay corners in the static timing analysis.
(7) According to a preset reverse clock segmentation algorithm, calculating to obtain a maximum level N (for example, N is less than or equal to 50), and according to the maximum level N, segmenting the first-level reverse clock tree structure to obtain a plurality of segmented reverse clock tree structures, wherein the clock delay deviation of the whole reverse clock tree structure after segmented design is not more than one clock period;
(8) Re-planning and realizing the clock tree structure based on the reverse clock tree structure after each segment design;
(9) Floorplan, Placement (placement and optimization), Clock Tree Synthesis (clock tree synthesis and optimization), Routing (wire routing and timing/power optimization), RC Extraction and STA (static timing analysis and violation repair) are performed in the back-end layout and wiring tool.
According to the final time sequence optimization and the result of the violation repair, the overall chip performance can be improved by 5% compared with the clock tree structure before segmentation, and the minimum duty ratio violations (min pulse width violation) of the register can be reduced from 100+ to 5-10 and can be further repaired.
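The minimum pulse width check described in step (6) compares the rising-edge and falling-edge clock delays against the library requirement. A minimal sketch of that comparison follows. The class and function names, the 50% source duty cycle, and all numeric values are assumptions for illustration, and a real static timing analysis tool applies considerably more detailed models.

```python
from dataclasses import dataclass


@dataclass
class ClockPinTiming:
    """Clock arrival data for one register clock pin (all values in ns)."""
    name: str
    rise_delay: float  # delay of the rising clock edge from source to pin
    fall_delay: float  # delay of the falling clock edge from source to pin


def pulse_widths(pin: ClockPinTiming, clk_period: float, duty: float = 0.5):
    """Return (high_width, low_width) seen at the pin.

    A falling edge that is slower than the rising edge stretches the high
    pulse and shrinks the low pulse by the same amount, and vice versa.
    """
    high_at_source = clk_period * duty
    low_at_source = clk_period - high_at_source
    distortion = pin.fall_delay - pin.rise_delay
    return high_at_source + distortion, low_at_source - distortion


def min_pulse_width_violations(pins, clk_period, lib_min_width, duty=0.5):
    """List the pins whose narrower pulse is below the library requirement."""
    violations = []
    for pin in pins:
        high, low = pulse_widths(pin, clk_period, duty)
        if min(high, low) < lib_min_width:
            violations.append(pin.name)
    return violations


# Illustrative numbers only, not taken from the patent.
pins = [
    ClockPinTiming("reg_a", rise_delay=1.00, fall_delay=1.02),
    ClockPinTiming("reg_b", rise_delay=1.00, fall_delay=1.30),
]
print(min_pulse_width_violations(pins, clk_period=1.0, lib_min_width=0.3))  # ['reg_b']
```

Because the rise/fall delay difference accumulates along a long inverter chain, shortening the chain through segmentation is one way the violation count reported above could fall.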
Based on the implementation method of the adjustable segmented reverse clock tree provided by the foregoing embodiment of the present invention, the embodiment of the present invention further provides an implementation device of the adjustable segmented reverse clock tree, as shown in fig. 6, including:
the first design module 100 is configured to analyze a data flow structure of the core function module, and implement design of a first-stage reverse clock tree structure according to the data flow structure;
the first analysis determining module 200 is configured to perform layout and wiring design, RC parameter extraction, and static timing analysis under multiple PVT conditions on the core functional module, determine the working performance index and corresponding clock frequency of the chip under different PVT conditions, and process library requirements for the minimum duty cycle of the clock of the register;
the calculating module 300 is configured to calculate a maximum number N of the segmented reverse clock tree by adopting a preset reverse clock tree segmentation algorithm based on the working performance indexes and the corresponding clock frequencies of the chips under different PVT conditions and the process library requirements of the minimum duty ratio of the clock of the register;
the segmentation module 400 is configured to perform a segmentation design on the first-stage reverse clock tree structure according to the maximum number N, so as to obtain a plurality of reverse clock tree structures after segmentation design; wherein the overall clock delay deviation of the reverse clock tree structure after each segment design does not exceed one clock cycle;
a second design module 500 for designing a second level clock tree structure;
the second analysis determining module 600 is configured to perform layout design, RC parameter extraction, and static timing analysis on the structure of the two-stage clock tree, so that the structure of the two-stage clock tree meets PPA indexes.
Optionally, the preset reverse clock tree segmentation algorithm includes:
N <= (T_clk_period + T_launch_delay) / (T_clkinv_pair_delay * 2);
where N is the number of clock tree stages, T_clk_period is the clock period, T_launch_delay is the launch (transmit) clock delay, and T_clkinv_pair_delay is the delay of a single clock inverter logic unit.
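The bound above can be evaluated directly. The sketch below computes the largest integer N that satisfies it, and, as an assumption not stated in the source, takes the smallest such N across several PVT corners so that the segmentation holds in every corner. Function names, corner names and all delay values (in picoseconds) are illustrative.

```python
import math


def max_segment_stages(t_clk_period: float,
                       t_launch_delay: float,
                       t_clkinv_pair_delay: float) -> int:
    """Largest integer N with N <= (T_clk_period + T_launch_delay) / (2 * T_clkinv_pair_delay)."""
    if t_clkinv_pair_delay <= 0:
        raise ValueError("t_clkinv_pair_delay must be positive")
    return math.floor((t_clk_period + t_launch_delay) / (2 * t_clkinv_pair_delay))


def max_segment_stages_all_corners(corners: dict[str, dict[str, float]]) -> int:
    """Assumption: use the most restrictive (smallest) N over all corners."""
    return min(max_segment_stages(**timing) for timing in corners.values())


# Illustrative corner data in ps, not taken from the patent.
corners = {
    "slow": {"t_clk_period": 1000, "t_launch_delay": 200, "t_clkinv_pair_delay": 12},
    "fast": {"t_clk_period": 1000, "t_launch_delay": 150, "t_clkinv_pair_delay": 10},
}
print(max_segment_stages(**corners["slow"]))    # 50
print(max_segment_stages_all_corners(corners))  # 50
```

Taking the minimum is a conservative choice: a segment length that is safe in the slowest corner is also safe wherever the bound is looser.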
Optionally, the preset reverse clock tree segmentation algorithm is obtained by the following method:
T_clk_skew ≤ T_clk_period;
T_clk_skew = T_capture_clk_delay - T_launch_delay;
T_capture_clk_delay = N * 2 * T_clkinv_pair_delay;
where T_clk_skew is the clock delay skew, T_capture_clk_delay is the sampling (capture) clock delay, and T_launch_delay is the launch (transmit) clock delay.
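Substituting the second and third relations into the first gives the segmentation bound. The short derivation below only rearranges the three relations stated above and introduces no new quantities.

```latex
\begin{align*}
T_{\text{clk\_skew}} &= T_{\text{capture\_clk\_delay}} - T_{\text{launch\_delay}} \\
                     &= 2N \, T_{\text{clkinv\_pair\_delay}} - T_{\text{launch\_delay}}
                        \;\le\; T_{\text{clk\_period}} \\
\Longrightarrow \quad
N &\le \frac{T_{\text{clk\_period}} + T_{\text{launch\_delay}}}{2 \, T_{\text{clkinv\_pair\_delay}}}
\end{align*}
```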
Optionally, the first design module 100 is specifically configured to:
where the core functional module comprises multi-stage pipeline registers, perform a comprehensive analysis and sorting-out of the functional structure, logic structure, performance indexes and low-voltage process library for the data flow of the multi-stage pipeline registers, to determine the planning and design structure of the reverse clock tree;
according to the planning and design structure of the reverse clock tree, arrange the register groups corresponding to the multi-stage pipeline data flow structure in the order of the data flow;
according to the data flow structure and the physical positions of the corresponding register groups, develop a design script for the first-stage reverse clock tree structure, and design and implement the first-stage reverse clock tree structure according to the rules of physical and automatic layout and wiring, and of wire routing.
Optionally, the second level clock tree structure includes: a reverse clock tree structure or a balanced tree structure.
It should be noted that, the implementation device of the adjustable segmented reverse clock tree is a device corresponding to the implementation method of the adjustable segmented reverse clock tree in the foregoing embodiment, and all implementation means in the foregoing method embodiments are applicable to the embodiment of the implementation device of the adjustable segmented reverse clock tree, so that the same technical effects can be achieved.
As shown in fig. 7, an embodiment of the present invention further provides an electronic device, including:
a processor 1000; and a memory 1020 connected to the processor 1000 through a bus interface, the memory 1020 storing programs and data used by the processor 1000 in performing operations, the processor 1000 calling and executing the programs and data stored in the memory 1020.
Wherein the transceiver 1010 is coupled to the bus interface for receiving and transmitting data under the control of the processor 1000; the processor 1000 is configured to read the program in the memory 1020 to implement the following steps:
analyzing a data flow structure of a core functional module, and realizing the design of a first-stage reverse clock tree structure according to the data flow structure;
performing layout and wiring design, RC parameter extraction and static time sequence analysis under various PVT conditions on the core functional module, and determining the working performance index and corresponding clock frequency of a chip under different PVT conditions and the process library requirement of the minimum duty ratio of a clock aiming at a register;
based on the working performance indexes and the corresponding clock frequencies of the chips under different PVT conditions and the process library requirements of the minimum duty ratio of the clock of the register, calculating the maximum number N of the segmented reverse clock tree by adopting a preset reverse clock tree segmentation algorithm;
according to the maximum number N, carrying out sectional design on the first-stage reverse clock tree structure to obtain a plurality of reverse clock tree structures after sectional design; wherein the overall clock delay deviation of the reverse clock tree structure after each segment design does not exceed one clock cycle;
designing a second-stage clock tree structure;
and carrying out layout and wiring design, RC parameter extraction and static time sequence analysis on the structure of the two-stage clock tree, so that the structure of the two-stage clock tree meets PPA indexes.
In fig. 7, the bus architecture may comprise any number of interconnected buses and bridges, which link together various circuits, in particular one or more processors represented by the processor 1000 and memory represented by the memory 1020. The bus architecture may also link together various other circuits such as peripheral devices, voltage regulators and power management circuits, which are well known in the art and are therefore not described further herein. The bus interface provides an interface. The transceiver 1010 may be a plurality of elements, including a transmitter and a receiver, providing a means for communicating with various other apparatus over a transmission medium. For different terminals, the user interface 1030 may also be an interface capable of connecting to required external or internal devices, including but not limited to a keypad, display, speaker, microphone and joystick. The processor 1000 is responsible for managing the bus architecture and general processing, and the memory 1020 may store data used by the processor 1000 in performing operations.
Those skilled in the art will appreciate that all or part of the steps of implementing the above-described embodiments may be implemented by hardware, or may be implemented by instructing the relevant hardware by a computer program comprising instructions for performing some or all of the steps of the above-described methods; and the computer program may be stored in a readable storage medium, which may be any form of storage medium.
In addition, the embodiment of the present invention further provides a computer readable storage medium, on which a computer program is stored, where the program when executed by a processor implements the steps of the method in the foregoing embodiment, and the same technical effects can be achieved, so that repetition is avoided, and no further description is given here.
While the foregoing is directed to the preferred embodiments of the present invention, it will be appreciated by those skilled in the art that various modifications and adaptations can be made without departing from the principles of the present invention, and such modifications and adaptations are intended to be comprehended within the scope of the present invention.
Claims (6)
1. An implementation method of an adjustable segmented reverse clock tree is characterized by comprising the following steps:
analyzing a data flow structure of a core functional module, and realizing the design of a first-stage reverse clock tree structure according to the data flow structure;
performing layout and wiring design, RC parameter extraction and static time sequence analysis under various PVT conditions on the core functional module, and determining the working performance index and corresponding clock frequency of a chip under different PVT conditions and the process library requirement of the minimum duty ratio of a clock aiming at a register;
based on the working performance indexes and the corresponding clock frequencies of the chips under different PVT conditions and the process library requirements of the minimum duty ratio of the clock of the register, calculating the maximum number N of the segmented reverse clock tree by adopting a preset reverse clock tree segmentation algorithm;
according to the maximum number N, carrying out sectional design on the first-stage reverse clock tree structure to obtain a plurality of reverse clock tree structures after sectional design; wherein the overall clock delay deviation of the reverse clock tree structure after each segment design does not exceed one clock cycle;
designing a second-stage clock tree structure;
carrying out layout and wiring design, RC parameter extraction and static time sequence analysis on the structure of the two-stage clock tree, so that the structure of the two-stage clock tree meets PPA indexes;
wherein the preset reverse clock tree segmentation algorithm comprises the following steps:
N <= (T_clk_period + T_launch_delay) / (T_clkinv_pair_delay * 2);
wherein N is the maximum level of the reverse clock tree, T_clk_period is the clock period, T_launch_delay is the launch (transmit) clock delay, and T_clkinv_pair_delay is the delay of a single clock inverter logic unit;
the preset reverse clock tree segmentation algorithm is obtained by the following method:
T_clk_skew ≤ T_clk_period;
T_clk_skew = T_capture_clk_delay - T_launch_delay;
T_capture_clk_delay = N * 2 * T_clkinv_pair_delay;
wherein T_clk_skew is the clock delay skew, T_capture_clk_delay is the sampling (capture) clock delay, and T_launch_delay is the launch (transmit) clock delay.
2. The method for implementing the adjustable segmented reverse clock tree according to claim 1, wherein the analyzing the data flow structure of the core function module and implementing the design of the first-stage reverse clock tree structure according to the data flow structure comprises:
where the core functional module comprises multi-stage pipeline registers, performing a comprehensive analysis and sorting-out of the functional structure, logic structure, performance indexes and low-voltage process library for the data flow of the multi-stage pipeline registers, to determine the planning and design structure of the reverse clock tree;
according to the planning and design structure of the reverse clock tree, arranging the register groups corresponding to the multi-stage pipeline data flow structure in the order of the data flow;
according to the data flow structure and the physical positions of the corresponding register groups, developing a design script for the first-stage reverse clock tree structure, and designing and implementing the first-stage reverse clock tree structure according to the rules of physical and automatic layout and wiring, and of wire routing.
3. The method of claim 1, wherein the second stage clock tree structure comprises: a reverse clock tree structure or a balanced tree structure.
4. An implementation apparatus for an adjustable segmented reverse clock tree, comprising:
the first design module is used for analyzing the data flow structure of the core function module and realizing the design of a first-stage reverse clock tree structure according to the data flow structure;
the first analysis and determination module is used for carrying out layout and wiring design, RC parameter extraction and static time sequence analysis under various PVT conditions on the core functional module, and determining the working performance index and corresponding clock frequency of the chip under different PVT conditions and the process library requirement of the minimum duty ratio of the clock aiming at the register;
the calculation module is used for calculating the maximum number N of the segmented reverse clock tree by adopting a preset reverse clock tree segmentation algorithm based on the working performance indexes of the chips under different PVT conditions, the corresponding clock frequencies and the process library requirements of the minimum duty ratio of the clock of the register;
the segmentation module is used for carrying out segmentation design on the first-stage reverse clock tree structure according to the maximum number N to obtain a plurality of reverse clock tree structures after segmentation design; wherein the overall clock delay deviation of the reverse clock tree structure after each segment design does not exceed one clock cycle;
the second design module is used for designing a second-stage clock tree structure;
the second analysis determining module is used for carrying out layout and wiring design, RC parameter extraction and static time sequence analysis on the structure of the two-stage clock tree so that the structure of the two-stage clock tree meets PPA indexes;
wherein the preset reverse clock tree segmentation algorithm comprises the following steps:
N <= (T_clk_period + T_launch_delay) / (T_clkinv_pair_delay * 2);
wherein N is the maximum level of the reverse clock tree, T_clk_period is the clock period, T_launch_delay is the launch (transmit) clock delay, and T_clkinv_pair_delay is the delay of a single clock inverter logic unit;
the preset reverse clock tree segmentation algorithm is obtained by the following method:
T_clk_skew ≤ T_clk_period;
T_clk_skew = T_capture_clk_delay - T_launch_delay;
T_capture_clk_delay = N * 2 * T_clkinv_pair_delay;
wherein T_clk_skew is the clock delay skew, T_capture_clk_delay is the sampling (capture) clock delay, and T_launch_delay is the launch (transmit) clock delay.
5. An electronic device, comprising: a transceiver, a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of implementing an adjustable segmented reverse clock tree according to any one of claims 1 to 3 when the computer program is executed.
6. A computer readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, implements the steps of the method of implementing an adjustable segmented reverse clock tree according to any one of claims 1 to 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311131872.7A CN116861842B (en) | 2023-09-04 | 2023-09-04 | Implementation method and related device for adjustable segmented reverse clock tree |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116861842A (en) | 2023-10-10 |
CN116861842B (en) | 2023-12-19 |
Family
ID=88223822
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311131872.7A Active CN116861842B (en) | 2023-09-04 | 2023-09-04 | Implementation method and related device for adjustable segmented reverse clock tree |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116861842B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH11194848A (en) * | 1998-01-06 | 1999-07-21 | Matsushita Electric Ind Co Ltd | Clock delay adjusting device |
US6367060B1 (en) * | 1999-06-18 | 2002-04-02 | C. K. Cheng | Method and apparatus for clock tree solution synthesis based on design constraints |
CN101504680A (en) * | 2009-03-20 | 2009-08-12 | 东南大学 | Clock offset locality optimizing analysis method |
US8719743B1 (en) * | 2011-04-29 | 2014-05-06 | Cadence Design Systems, Inc. | Method and system for implementing clock tree prototyping |
CN105748103A (en) * | 2016-04-22 | 2016-07-13 | 深圳先进技术研究院 | Delayed excitation ultrasonic imaging method and device and delayed excitation system |
CN111651402A (en) * | 2020-07-16 | 2020-09-11 | 深圳比特微电子科技有限公司 | Clock tree, hash engine, computing chip, force plate and digital currency mining machine |
US11321514B1 (en) * | 2020-12-31 | 2022-05-03 | Cadence Design Systems, Inc. | Macro clock latency computation in multiple iteration clock tree synthesis |
CN116167329A (en) * | 2022-11-28 | 2023-05-26 | 龙芯中科技术股份有限公司 | Method and device for constructing clock tree in chip and electronic equipment |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8384436B2 (en) * | 2011-01-10 | 2013-02-26 | Taiwan Semiconductor Manufacturing Company, Ltd. | Clock-tree transformation in high-speed ASIC implementation |
US9607122B2 (en) * | 2014-01-30 | 2017-03-28 | Mentor Graphics Corporation | Timing driven clock tree synthesis |
Non-Patent Citations (3)
Title |
---|
Low Power Clock Tree Optimization by Clock Buffer/Inverter Reduction;Zhe Ge et al;2019 IEEE International Conference on Integrated Circuits, Technologies and Applications;第69-70页 * |
基于28NM工艺ASIC芯片的时钟树综合优化研究;汤勇;中国优秀硕士学位论文全文数据库信息科技辑;第 I135-228页 * |
基于40nm工艺MCU芯片的时钟树及时序优化分析与研究;臧涛;中国优秀硕士学位论文全文数据库信息科技辑;第I135-105页 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||