TW200428240A - A clock tree synthesis tool that simultaneously considers low clock skew and low power consumption


Info

Publication number
TW200428240A
TW200428240A TW93124824A TW93124824A TW200428240A TW 200428240 A TW200428240 A TW 200428240A TW 93124824 A TW93124824 A TW 93124824A TW 93124824 A TW93124824 A TW 93124824A TW 200428240 A TW200428240 A TW 200428240A
Authority
TW
Taiwan
Prior art keywords
buffer
clock
clock tree
power consumption
tree
Prior art date
Application number
TW93124824A
Other languages
Chinese (zh)
Other versions
TWI244015B (en)
Inventor
Wu-Shiung Feng
Ming-Hong Lai
jia-qi Zhu
Jau-Kai Chang
Original Assignee
Univ Chang Gung
Priority date
Filing date
Publication date
Application filed by Univ Chang Gung
Priority to TW93124824A
Publication of TW200428240A
Application granted
Publication of TWI244015B

Landscapes

  • Design And Manufacture Of Integrated Circuits (AREA)

Abstract

The invention proposes a method that applies buffer insertion, removal, swapping, and shifting to quickly determine the type and location of every buffer in a clock tree and, within the synthesis flow of the designed clock tree, to quickly determine the type and location of the skew buffers, so that the whole design meets the design rules of the library and the clock skew stays within its limit. The computation tool that quickly determines buffer types and positions takes three inputs: the clock tree structure to be processed, including the information on each wire and the buffer insertion locations; a component library giving characteristics such as input load, power consumption, clock delay, and output signal transition time; and an initial state configuration. The tool checks whether the clock tree structure is a feasible solution that complies with the design specification; if it is not, load balancing and buffer balancing are applied to obtain one. The tool then quickly selects buffer types to reduce the power consumption of the clock tree, and applies an optimization technique that incorporates simulated annealing to obtain the globally optimal buffer combination with minimum power consumption.

Description

IX. Description of the invention: [Technical field of the invention] The present invention relates to computer-aided design for synthesizing low-power clock tree circuits in high-speed very-large-scale integrated (VLSI) circuits. In particular, it relates to techniques that, given a buffer timing and power library, appropriately insert, remove, swap, and shift the buffers on a clock tree in order to reduce the tree's clock skew and power consumption.
[Prior art] In the design flow of high-speed digital VLSI circuits, clock frequency is commonly used as the measure of data-processing speed. While signals are being transmitted, signal integrity must be preserved as far as possible; that is, the following timing design rules must be met: clock delay and clock skew must be made as small as possible, and the design rules of the specific buffer library in use must be satisfied. Clock delay is the longest time the clock signal takes to travel from the clock source to the receivers of the synchronous system. Shortening the clock delay speeds up clock distribution and so allows a higher operating frequency. Clock skew is the difference in path delay from the clock source to any two receivers in the clock tree; if it is too large, the clock signals arriving at the receivers are not synchronized, which can cause signal distortion and logic malfunction. For a given buffer, the input signal transition time and the output load each have tolerable upper and lower limits; when either falls outside those limits the buffer behaves unpredictably, so every buffer and flip-flop on the clock tree must operate within the design rules. With the rapid advance of deep-submicron VLSI process technology, the delay of a signal on the interconnect has, for timing purposes, become far larger than the delay in the components themselves.
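The definitions of clock delay and clock skew given above reduce to two one-line computations once the arrival time at every receiver is known. The sketch below is only an illustration of those definitions; the function names and the arrival times are hypothetical and do not come from the patent.

```python
def clock_delay(arrival_times):
    """Clock delay: the longest time from the clock source to any receiver."""
    return max(arrival_times)


def clock_skew(arrival_times):
    """Clock skew: the largest delay difference between any two receivers."""
    return max(arrival_times) - min(arrival_times)


# Hypothetical arrival times (ns) at the clock pins of four flip-flops.
arrivals_ns = [1.20, 1.35, 1.28, 1.31]
print("clock delay:", clock_delay(arrivals_ns))
print("clock skew:", round(clock_skew(arrivals_ns), 3))
```

A skew limit from the timing rules would then be checked by comparing `clock_skew(...)` against that limit.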
In clock tree synthesis, conventional techniques reduce wire delay in two ways. The first is to make the clock tree wires as short as possible: the longer a wire, the larger its delay, and the more power is consumed by signal transitions on it. In VLSI layout design, shortening wire length is generally the first measure considered for reducing delay. The second is to insert buffers at suitable positions along the wires. This splits a wire into segments, each with a smaller delay, and the input capacitance of the buffer reduces the equivalent load of each segment, so the signal charges faster and its rise time shortens, which reduces wire delay. From the power standpoint, a shorter input transition time also lowers the energy consumed by the components. Figure 1 is an expanded view of a clock tree network. The input pad of the clock signal is called the root, and the clock pin of a flip-flop 16 is called a leaf. The route from the pad (root) to the clock pin of a flip-flop 16 (leaf), including the buffers and wires along the way, forms one complete path, and the accumulated buffer and wire delay is the path delay. The difference in path delay between the two paths marked in the figure (to B1 and F13) is one clock skew value. Inserting buffers 18 at suitable positions in the clock tree network reduces delay and, because choosing different buffer types changes the delay to each receiver, can also reduce the clock skew between paths. Figure 2 is the IC layout design flowchart of the prior art (US Patent 5,564,022).
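The path-delay accumulation described above (buffer delay plus wire delay, summed from root to leaf) can be sketched as a simple tree walk. This is a minimal sketch under stated assumptions: the `Node` class, the node names, and all delay values are hypothetical, and each delay is treated as a fixed number rather than a library look-up.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    delay: float = 0.0        # buffer delay (0 for the pad or a flip-flop pin)
    wire_delay: float = 0.0   # delay of the wire from the parent to this node
    children: list = field(default_factory=list)


def leaf_delays(node, accumulated=0.0):
    """Return {leaf name: path delay from the root to that leaf}."""
    # Delay accumulated so far, plus the incoming wire and this node's own delay.
    total = accumulated + node.wire_delay + node.delay
    if not node.children:
        return {node.name: total}
    result = {}
    for child in node.children:
        result.update(leaf_delays(child, total))
    return result


# Hypothetical tree: a pad driving two buffers, which drive three flip-flops.
root = Node("pad", children=[
    Node("buf_a", delay=0.30, wire_delay=0.10, children=[
        Node("ff1", wire_delay=0.05),
        Node("ff2", wire_delay=0.08),
    ]),
    Node("buf_b", delay=0.25, wire_delay=0.12, children=[
        Node("ff3", wire_delay=0.06),
    ]),
])

delays = leaf_delays(root)
skew = max(delays.values()) - min(delays.values())
print(delays, "skew:", round(skew, 3))
```

Swapping `buf_b` for a slower or faster type changes only the delays of the leaves below it, which is the lever the text describes for reducing skew between paths.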
Traditional layout design performs only placement 54 and routing 62. A logic synthesis tool generates a netlist describing the logic gates and their interconnections. Placement then puts the standard cells that represent the logic gates of the netlist at chosen positions on the chip; the distance between positions usually reflects how the components are connected in the netlist, the aim being to shorten the later routing. Routing 62 then makes the actual connections according to the placement result. Present-day flows add clock tree synthesis steps 56, 58, and 60 to the layout design.

Step 56 optimizes the synthesis and routing of the clock tree network, and therefore modifies part of the netlist produced by logic synthesis. Step 56 also includes an analysis by a hardware description processor that, for the given timing design rules, determines the best way to insert buffers into the netlist. The positions of the inserted buffers relative to the other components in the netlist serve as the reference for the physical design of buffer insertion in the next step.

The components of the original netlist were already placed during placement (step 54), so inserting buffers disturbs components that are already in position, and the placement of every component must be readjusted. In addition, because the inserted buffers split the wires that run from the clock source to the receivers, the physical routing of the clock tree network that was already completed must be revised. To keep the clock delay minimal, a buffer cannot be moved far from its originally chosen optimal position, so buffers are given placement priority over other components. Step 60 computes the path delay from the clock input to the clock pin of each flip-flop and checks whether the resulting clock tree network violates the timing design rules. If the rules are met, the flow proceeds to routing 62; otherwise it goes back and readjusts the placement of every component, buffers included, together with the physical routing of the clock network.
However, in the clock tree synthesis flow of Figure 2 every inserted buffer is of the same type. The flow does not take into account that, in real designs, the buffer timing library offers buffer types of several sizes to choose from, so the timing of the clock tree network cannot be controlled flexibly enough to meet the timing design rules. This is its main drawback. Choosing the type of each buffer on the clock tree appropriately adjusts the delay of each path, which improves the clock skew until it meets the timing rules and also reduces the dynamic power consumed in the clock tree. The buffer types in a standard-cell library usually differ in area and therefore in their ability to drive the next stage. In general, a buffer with a larger area can drive a larger load than a smaller one and shortens the signal rise time at the load. A larger buffer, however, also has a larger delay of its own and a higher power consumption. Finding the right compromise between clock delay and power consumption is one of the main concerns of the algorithm of the present invention.

When buffer types are selected, suppose there are n buffer types to choose from and m buffers to be inserted in the clock tree network. An exhaustive search must then examine all n^m buffer combinations to find the one that gives the minimum clock delay while meeting the clock skew limit. In general, when buffer insertion and buffer type selection are considered together, building the clock tree network requires repeated placement and circuit-level simulation according to the outcome of the timing checks, so the IC design flow of Figure 2 consumes a great deal of computation time. For a standard-cell VLSI design tool, it is therefore very important to be able to pick, during clock tree synthesis, a suitable set of buffer types from the given buffer timing library quickly, so as to reduce simulation time while meeting the clock skew rule and reaching the lowest power consumption.

Among conventional techniques, A. Vittal and M. Marek-Sadowska ("Low-Power Buffered Clock Tree Design," IEEE Trans. on CAD of Integrated Circuits and Systems, Vol. 16, No. 9, pp. 965-975, 1997) apply buffer insertion when constructing the clock tree to find the best trade-off between area and power consumption. J. L. Neves and E. G. Friedman ("Design Methodology for Synthesizing Clock Distribution Networks Exploiting Nonzero Localized Clock Skew," IEEE Trans. on VLSI Systems, Vol. 4, No. 2, pp. 286-291, 1996) apply buffer insertion to reduce the clock skew of the clock tree. S. Pullela, N. Menezes, and L. T. Pillage ("Low Power IC Clock Tree Design," Proceedings of the IEEE Custom Integrated Circuits Conference 1995, pp. 263-266) apply buffer insertion to reduce the differences between buffers and to reduce short-circuit current, avoiding extra current consumption. K. M. Carrig et al. ("A New Direction in ASIC High-Performance Clock Methodology," Proceedings of the IEEE Custom Integrated Circuits Conference 1998, pp. 593-596) apply buffer insertion and the removal of redundant wiring to reduce noise and power consumption. All of these algorithms, however, use a single buffer type, which does not match real circuit design flows, and none of them removes, swaps, or shifts buffers on the clock tree, so they cannot reach the buffer combination with the best timing and power for the clock tree wiring.

Several US patents address the same problem: one inserts buffers in clusters to equalize clock delay, one partitions buffers hierarchically to reduce clock skew, one balances the clock tree to reduce clock skew, US6367060 balances the buffers at each level to meet the design rules, and one selects the depth of the H-tree to reduce power consumption. These patents apply a single buffer insertion technique or a partitioning method to reach low power or low clock skew, but none of them proposes an effective algorithm that handles multiple buffer types and applies buffer removal, swapping, and shifting to reduce both the clock skew and the power consumption of the clock tree.
The present inventors previously filed a Republic of China patent application for a synthesis tool that quickly determines the buffer types on a clock tree while meeting the timing design rules. That tool, for the computer-aided design of clock tree circuits in high-speed VLSI, selects the type of each buffer on the clock tree so as to reach the minimum clock delay and meet the design rules, but its only buffer operation is swapping, and it performs no optimization analysis of the effect on power consumption. [Summary of the invention] The object of the present invention is to propose a method that inserts, removes, swaps, and shifts the buffers on a clock tree in order to quickly synthesize clock trees for low-power, high-speed VLSI circuits. The method fits into existing design flows, quickly determines buffer positions and types, reduces the power consumed on the clock tree, and keeps the clock skew within the timing design rules. In the clock tree synthesis flow, the step that decides buffer positions and types runs after the physical routing and buffer insertion steps of the traditional chip design flow. The inputs at that point are the netlist, the parameters of each wire segment of the clock tree, and the power and timing libraries for the various buffer types, from which the wire delay, buffer delay, and buffer power consumption of the clock tree are computed. The circuit synthesis tool proposed here, which quickly decides buffer placement and type, finally outputs an updated clock tree netlist that reaches the minimum power consumption and meets the clock skew design rule.
The proposed method achieves low-power clock tree synthesis because it can quickly adjust the position and type of each buffer. Its features are as follows: (1) it uses a complete data structure that records the predecessor and successor of each buffer, so that in every optimization step the clock delay and power consumption of each buffer can be computed in linear time; (2) under a strict clock skew design rule, it applies empirical rules to tune the types and positions of the buffers at each level of the clock tree, aiming for minimum power consumption while satisfying the strict clock skew limit; (3) it performs the optimization heuristically rather than by exhaustive search, which saves the time needed to find the best solution. Because the clock tree has a finite number of buffers but their placement is unconstrained, the invention first uses the heuristic to decide buffer placement, then applies simulated annealing (SA) to obtain a local optimal solution, and then applies load balancing to obtain the global optimal solution; (4) within the limits of the available equipment, the software tool developed here can handle high-speed VLSI circuits with a million logic gates.
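To make the simulated annealing step in feature (3) concrete, here is a minimal, generic sketch. It is not the patent's algorithm: it uses swap moves only (no insertion, removal, or shifting), assumes one buffer per path, and the library values, wire delays, skew limit, and penalty weight are all hypothetical numbers chosen for illustration.

```python
import math
import random

# Hypothetical library: (power, delay) for each buffer type.
LIBRARY = [(1.0, 0.50), (2.0, 0.40), (4.0, 0.30)]
WIRE_DELAYS = [0.10, 0.20, 0.30]   # hypothetical wire delay of each path
SKEW_LIMIT = 0.05
PENALTY = 100.0                    # cost per unit of skew beyond the limit


def cost(assignment):
    """Total buffer power plus a penalty when the skew limit is exceeded."""
    power = sum(LIBRARY[t][0] for t in assignment)
    delays = [w + LIBRARY[t][1] for w, t in zip(WIRE_DELAYS, assignment)]
    excess = max(0.0, (max(delays) - min(delays)) - SKEW_LIMIT)
    return power + PENALTY * excess


def anneal(initial, steps=2000, t_start=5.0, t_end=0.01, seed=0):
    """Return (best assignment seen, its cost) after `steps` swap moves."""
    rng = random.Random(seed)
    current, current_cost = list(initial), cost(initial)
    best, best_cost = list(current), current_cost
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        # Move: give one randomly chosen buffer a randomly chosen type.
        candidate = list(current)
        candidate[rng.randrange(len(candidate))] = rng.randrange(len(LIBRARY))
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Accept improvements always, worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
    return best, best_cost


start = [2, 2, 2]
best, best_cost = anneal(start)
print("start cost:", cost(start), "best:", best, "cost:", best_cost)
```

Folding the skew limit into the cost as a penalty is one common design choice; rejecting infeasible candidates outright is another, and the source does not say which the inventors use.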
In a specific embodiment, the low-power method for quickly determining buffer positions and types comprises the following steps: (1) input the clock tree netlist to be processed, containing the information on each wire and the buffer insertion positions, together with the buffer library and the flip-flop library, which supply the data used in the computation; (2) set the initial state; (3) check whether this structure meets the clock skew design rule and whether circuit operation would violate the limits of the component library; (4) if the rules are violated, perform buffer insertion, removal, swapping, and shifting to obtain a feasible solution for clock tree synthesis; (5) apply the heuristic algorithm together with simulated annealing to the feasible solution to find the optimal solution that has the minimum power consumption and meets the design rules. [Embodiments] As stated above, the object of the invention is an algorithmic tool that quickly determines the position and type of every buffer in the clock tree, so that the power consumed by the clock tree is minimized and the clock skew meets the timing design rule. Because the buffer power and timing libraries offer only a limited number of buffer types, the most direct way to solve this optimization problem is exhaustive search: swap the type of each buffer in the clock tree one by one and compare the power consumption and clock skew of every combination, finally picking the combination of buffer types that has the minimum power consumption and meets the clock skew rule. The main drawbacks of this approach are: (1) it wastes computation time; (2) it has no buffer insertion, removal, or shifting steps, so the program cannot easily find the best solution.
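The exhaustive baseline described above is easy to write down, which also makes its cost visible: with n types and m buffers it evaluates n^m assignments. The sketch below is a toy illustration only; it assumes one buffer per path, and the library values, wire delays, and skew limit are hypothetical.

```python
from itertools import product

# Hypothetical library: (power, delay) for each buffer type.
LIBRARY = [(1.0, 0.50), (2.0, 0.40), (4.0, 0.30)]


def exhaustive_best(path_wire_delays, skew_limit):
    """Try every type assignment; return (best assignment, its power, number tried)."""
    best, best_power, tried = None, float("inf"), 0
    n_types = len(LIBRARY)
    for assignment in product(range(n_types), repeat=len(path_wire_delays)):
        tried += 1
        delays = [w + LIBRARY[t][1] for w, t in zip(path_wire_delays, assignment)]
        if max(delays) - min(delays) > skew_limit:
            continue  # violates the skew rule, so not a feasible solution
        power = sum(LIBRARY[t][0] for t in assignment)
        if power < best_power:
            best, best_power = assignment, power
    return best, best_power, tried


result = exhaustive_best([0.10, 0.20, 0.30], skew_limit=0.05)
print(result)                     # 3 types and 3 buffers: 3**3 assignments tried
print("20 buffers:", 3 ** 20)     # the same search over 20 buffers
```

With only three buffers the loop is instant; the second print shows how quickly the count grows, which is the first drawback the text names.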

For these reasons, and in view of the complexity of the problem, the present invention develops an algorithm that quickly decides buffer insertion, removal, swapping, and shifting in the clock tree. It (1) takes an original clock tree design that violates the design rules and corrects it through buffer insertion, removal, swapping, and shifting until a feasible solution is obtained; and (2) applies a load-balancing heuristic algorithm to determine, quickly and effectively, the combination of buffer positions and types that gives low power, minimum clock delay, and compliance with the clock skew rule. Figure 3 is a block diagram of a system implementing a specific embodiment of the invention. The inputs are the clock tree netlist of the embodiment, the upper limit of the clock skew design rule, and the power and timing libraries for the buffers and flip-flops. The tool first checks whether the design rules and the clock skew limit are met. If they are violated, it adjusts the way the clock tree is synthesized until the tree conforms to the design rules; otherwise it proceeds to step 112 and swaps buffer types to reduce the power of the clock tree quickly. Because this swap risks violating the design rules, step 114 checks whether the clock tree still complies; if it does not, the tool reverts to the previous state, and if it does, step 118 runs the optimization algorithm that reduces power consumption. The low-power clock tree synthesis algorithm ends at step 120.

Power analysis of a specific library

During low-power clock tree synthesis, the power consumption characteristics of the buffers in a given design library supply different criteria for choosing among buffers.

Figure 4 plots, for a specific buffer library, the total power consumed by the clock tree: the operating power of the buffers, the operating power of the flip-flops, and the power consumed by signal transitions on the wires. A value of zero in the plot marks a clock tree structure that violates the design rules of the library and is not discussed further. When the input transition time or output load of a buffer does not fall on an entry of the library look-up table, the power values at the four surrounding table corners are read using the known input transition times and load capacitances, and polynomial interpolation is used to approximate the power consumption of the buffer. The invention uses an empirical equation whose four constants A, B, C, and D are solved from the four corner values; the actual input transition time and output load are then substituted to obtain the result. Among the buffer types in the library, one has the smallest input load, power consumption, and drive strength, another has the second smallest of each, and a third has the largest input load, power consumption, and drive strength.

Consider the clock tree of Figure 5, which has three levels, A, B, and C. Level A is the main driving buffer, of fixed type CLKBUFX8. Level B consists of variable buffers whose type and number can both change; these are the X-axis and Y-axis variables of Figure 4, respectively. Level C consists of the load flip-flops, of fixed type and fixed number; the load each presents can be estimated as the flip-flop input capacitance plus the average wire load. Buffer 152 is the main driving buffer of level A, and buffer 154 is one of the level B buffers. The level B buffers all load buffer 152, and the level C flip-flops are divided evenly among the level B buffers.
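The four-corner interpolation described above can be sketched as follows. The empirical equation itself is not legible in the source, so this sketch assumes the usual bilinear form P(x, y) = A + B·x + C·y + D·x·y, which has exactly four constants and is fixed by four corner values; x stands for input transition time and y for output load. The function names and the numbers are hypothetical.

```python
def fit_bilinear(x0, x1, y0, y1, p00, p10, p01, p11):
    """Solve A, B, C, D of P = A + B*x + C*y + D*x*y from four table corners.

    pij is the table value at (xi, yj); the corners form an axis-aligned
    rectangle with x0 != x1 and y0 != y1.
    """
    d = (p00 - p10 - p01 + p11) / ((x1 - x0) * (y1 - y0))
    b = (p10 - p00) / (x1 - x0) - d * y0
    c = (p01 - p00) / (y1 - y0) - d * x0
    a = p00 - b * x0 - c * y0 - d * x0 * y0
    return a, b, c, d


def evaluate(coeffs, x, y):
    """Evaluate the fitted surface at transition time x and load y."""
    a, b, c, d = coeffs
    return a + b * x + c * y + d * x * y


# Hypothetical table corners: transition times 0.1 and 0.3, loads 0.01 and 0.05.
coeffs = fit_bilinear(0.1, 0.3, 0.01, 0.05, 1.234, 1.642, 1.37, 1.81)
print(coeffs)
print(evaluate(coeffs, 0.2, 0.03))
```

The same routine serves for delay and output transition time, since the text says those are interpolated from their tables in the same way.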

The power plot shows that the power consumption forms a concave surface. When level B has few buffers, the larger load on each one raises the input transition time at the flip-flops, which increases the power consumed in the flip-flops. In designs that use more or larger buffers, the input transition time falls, but the power consumed by the buffers themselves rises slightly. A power analysis of this kind for a specific library allows the buffer selection algorithm of the invention to be adapted to design libraries with different characteristics.

The timing design rules all depend on the path delay of the signal from input to output, which is computed as follows. (1) On a complete path, buffer delay and wire delay are treated separately, and the wire is described by its parasitic RC data, so swapping the type of a buffer affects only the buffer delay and not the wire delay. Figure 7(a) shows the parasitic RC circuit of the wire between buffer 162 and buffer 164 together with its transfer function, and Figure 7(b) shows the electrical parameters of the wire in RSPF format. The π model there is an approximation by the first three terms of a Taylor expansion and can be regarded as the equivalent wire load of buffer 162; the total wire load is C1+C2. The pin-to-pin delay of the wire is represented by a simple model in which the wire delay is R2*C3. To reduce the complexity of the computation, the wire load is taken to be proportional to the physical wire length, and the buffer delay is computed from the load of the buffer while the effect of wire delay is ignored; that is, buffers loaded on the same driving buffer are assumed to receive their inputs at the same time. The buffer delay and the output signal transition time are read from the timing library table of each buffer.
As shown in Figure 6, polynomial interpolation over the table entries gives an approximate buffer delay value, and the output signal transition time is obtained in the same way. (2) Replacing a buffer with another type changes that buffer's input capacitance (pin capacitance). Seen from the output of the preceding buffer, the load has changed, so the buffer delay changes accordingly. In Figure 7, for example, changing the type of buffer 164 changes the equivalent load of buffer 162, so the delays of both buffer 162 and buffer 164 must be looked up again. (3) Shifting a buffer changes the wiring, which changes the load on the driving buffer and therefore its delay. In Figure 7, for example, changing the relative position of buffer 162 and buffer 164 in the physical design changes the equivalent load of buffer 162, so the delays of both buffers must again be recomputed by table lookup. (4) In a synchronous system the flip-flops are the load of the last buffer level. One buffer usually drives several flip-flops; in other words, the total number of flip-flops is larger than the number of last-level buffers.

Figure 8 is the detailed flowchart of the step, in the invention's low-power tool for quickly determining buffer positions and types in the clock tree, that checks whether the originally input clock tree meets the design rules. The module includes step 204, which adjusts the position of flip-flops in the clock tree, and step 210, which balances the clock tree load. After each adjustment the path delays must be recomputed and the design rules checked; steps 206 and 212 are both this verification procedure. Each module is described in detail below.

The verification procedure of steps 206 and 212 rests on two observations. (1) In the database, the power consumption and clock delay of a buffer are determined by two quantities: the input signal transition time and the output load. For the buffer one level ahead of a target buffer, changing the position or type of the target buffer therefore changes the output transition time of that preceding buffer. The database also specifies upper and lower bounds on the input signal transition time of each component (buffer or flip-flop). If the input transition time falls below the lower bound or rises above the upper bound, the behaviour of the component in operation can no longer be estimated correctly, which is called a design rule violation. (2) For the components driven by a target buffer, the database likewise specifies upper and lower bounds on the load. A load outside this range is also called a design rule violation.

In digital circuit design the clock tree is the longest net. After the clock signal enters the chip through the CLK pin, it is distributed through the clock tree to the sequential logic. The closer together the arrival times at the sequential elements, the better, so the design rules require the difference between the longest and the shortest delay to stay within a given range. This difference is called the clock skew, and the verification procedure must also check whether the clock skew of the circuit meets the design rule. Because this verification procedure is the most frequently used module of the invention, it must be made as efficient as possible to reduce program run time.
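The verification procedure described above can be illustrated with a minimal sketch. It assumes a two-dimensional timing table indexed by input transition time and output load, and it uses bilinear interpolation (a first-degree polynomial in each variable) in place of whatever polynomial order the actual tool uses. The function names, the dictionary keys, and the table layout are hypothetical and do not come from the patent.

```python
from bisect import bisect_right

def interp2(xs, ys, table, x, y):
    """Bilinear interpolation of table[i][j], sampled at (xs[i], ys[j])."""
    def bracket(axis, v):
        # Pick the table interval around v; values outside the axis range
        # are extrapolated from the nearest interval.
        i = min(max(bisect_right(axis, v) - 1, 0), len(axis) - 2)
        return i, (v - axis[i]) / (axis[i + 1] - axis[i])
    i, tx = bracket(xs, x)
    j, ty = bracket(ys, y)
    low = table[i][j] * (1 - ty) + table[i][j + 1] * ty
    high = table[i + 1][j] * (1 - ty) + table[i + 1][j + 1] * ty
    return low * (1 - tx) + high * tx

def rule_violations(slew_in, load, limits):
    """List the design rules that one buffer or flip-flop violates."""
    found = []
    if not limits["slew_min"] <= slew_in <= limits["slew_max"]:
        found.append("input transition time")
    if not limits["load_min"] <= load <= limits["load_max"]:
        found.append("output load")
    return found

def clock_skew(arrival_times):
    """Difference between the longest and the shortest clock delay."""
    return max(arrival_times) - min(arrival_times)
```

A tree would count as feasible in this sketch when `rule_violations` returns an empty list for every buffer and flip-flop and `clock_skew` over all flip-flop arrival times stays within the specified limit.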
In summary, in the search for a feasible solution, balancing the buffer loads of the clock tree not only lowers power consumption but also has a significant effect on buffer clock delay and output signal transition time. A reasonable load avoids design rule violations and keeps the input signal transition time of the next-level buffers within the design rules. In addition, to meet a strict clock skew limit, the loads of buffers on the same level of the clock tree are adjusted so that their clock delays converge, which benefits the clock skew of the whole tree.

Step 204 is the procedure that adjusts flip-flop positions to satisfy the clock skew limit. Under a strict clock skew limit, the clock delay of a single buffer level is already enough to violate the limit. If the original clock tree design contains a flip-flop that is not on the same level of the clock tree as the other flip-flops, the clock skew limit is therefore violated. To avoid this, the step reconnects such flip-flops at a new position in the clock tree. To keep buffer loads balanced, the choice of the buffer to which a flip-flop is reconnected takes into account both the load capacity of the preceding-level buffer and the load it already carries, so that the buffer delays satisfy the strict clock skew limit.
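As a rough illustration of the reconnection choice in step 204, the sketch below picks, among candidate last-level buffers, one that can still accept the flip-flop's input capacitance. Selecting the buffer with the most remaining capacity is an assumption made here for simplicity; the patent only states that load capacity and existing load are both considered. The function name and the data layout are hypothetical.

```python
def pick_buffer(buffers, ff_cap):
    """Choose a buffer for a flip-flop with input capacitance ff_cap.

    buffers maps a buffer name to (max_load, current_load).
    Returns the name with the most headroom left after attaching the
    flip-flop, or None if no buffer can take it without exceeding max_load.
    """
    best_name, best_room = None, -1.0
    for name, (max_load, current_load) in buffers.items():
        room = max_load - current_load - ff_cap
        if room >= 0 and room > best_room:
            best_name, best_room = name, room
    return best_name
```

After each reconnection the design rules and the clock skew would be checked again, as the flow of Figure 8 requires.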

Step 210 is the measure taken when the flip-flop adjustment step above still cannot satisfy the clock skew limit. Balancing the loads of the buffers on the clock tree reduces the clock delay from the clock input pin to the buffers and flip-flops, and it also lowers the power consumed in the buffers and flip-flops. As shown in Figure 9(a), buffer A (252) drives buffer B (254) and buffer C (256). The output load computed from the output of buffer A toward the next level is smaller than the maximum load that buffer A's type can bear, and the output signal transition time computed from buffer A's input transition time and output load is within the input transition time accepted by the types of buffer B and buffer C. Buffer modules A, B, and C of the clock tree are then said to have a feasible solution. Now consider the level below buffers B and C: buffer B drives buffers D, E, and F, and buffer C drives buffers G, H, I, J, K, and L. Assume that the physical routing lengths from each buffer to its next-level buffers are approximately equal and that the next-level buffers present the same input load. The output load of buffer B is then smaller than that of buffer C, so the clock delay through buffer B is smaller than through buffer C, and the clock delay at buffers D, E, and F differs from that at buffers G, H, I, J, K, and L.

Such a structure violates a strict clock skew design rule. To keep structures of this kind from producing excessive clock skew, this step applies a load-balancing strategy to bring the tree within the design rules. If two or more buffers exist on the same level of the clock tree, the buffer with the largest output load and the buffer with the smallest output load are chosen as the targets of correction. Suppose that among the next-level buffers connected to buffer C there is a buffer K whose load on buffer C, LOAD_CK, is greater than LOAD_BK, the load that would result from connecting buffer B to buffer K. The connection between buffer K and buffer C is then removed, and buffer K is reconnected to buffer B. This operation ensures that the output load added to buffer B is smaller than the output load removed from buffer C, so the loads of the two buffers tend toward balance. The result of the balancing is shown in Figure 9(b). The same criterion applies when the next-level components driven by buffer B and buffer C are flip-flops. The load-balancing adjustment is applied from the leaves of the clock tree toward the root, considering all buffers of one level together, and each adjustment is followed by the design rule verification step. If the rules are still not met, the flow returns to step 210 for further load balancing; if they are met, the procedure of adjusting the circuit to a feasible solution ends at step 214.

Fast search for the best solution

Figure 10 shows the detailed steps of the fast search for the best solution, a method that quickly lowers power consumption for a clock tree design that already meets the design rules and the clock skew limit. The first step takes as input a clock tree that originally met the design rules or one that has been adjusted to meet them. Removing unnecessary buffers is one feasible way to reduce power consumption, but to respect the clock skew limit only two kinds of buffers are candidates for removal in this step. (1) A single, unbranched buffer on the path shared by all buffers toward the root of the clock tree is a candidate. Removing it affects every flip-flop equally, so no clock skew violation is created. The removal must, however, take into account whether the preceding buffer has enough drive capability for the next-level components and whether a database design rule would be violated; if it would, the buffer must stay in the clock tree. (2) Buffers on the same level of the clock tree whose connections to the preceding and next levels are similar are also candidates. This operation involves more components, so a buffer is removed only after confirming that the loads and input signal transition times of the preceding-level and next-level components meet the database design rules and that the clock skew computed at the last-level flip-flops is within the limit. The clock tree that remains after the unnecessary buffers are removed in step 304 is the base structure for the optimization steps.

In step 306 the invention provides a fast way to determine the buffer types on the clock tree. Based on the flip-flops loaded on the last-level buffers, the size of the last-level buffers is varied step by step and the same type is applied to all buffers in the clock tree. Verification step 308 then checks whether the database design rules and the clock skew limit are met. If either is not met, step 310 moves the buffer type up by one size and the clock delay, input signal transition time, and output load of every buffer level are recomputed. When both the database design rules and the clock skew limit are met, the procedure for quickly determining the clock tree buffer types ends at step 312.
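The loop formed by steps 306 to 312 can be sketched as follows. This is only an outline under stated assumptions: buffer types are given as a list ordered from smallest to largest, and `is_feasible` is a hypothetical callback that applies one type to every buffer in the tree and runs the design rule and clock skew checks. Neither name appears in the patent.

```python
def quick_size(types, is_feasible, start=0):
    """Return the first buffer type, from index start upward, that is feasible.

    types is ordered from the smallest to the largest buffer.
    is_feasible(buffer_type) applies the type to the whole tree and checks
    the database design rules and the clock skew limit.
    Returns None when even the largest type violates the rules, in which
    case the caller would restore the initial buffer types.
    """
    for index in range(start, len(types)):
        if is_feasible(types[index]):
            return types[index]
    return None
```

Starting from the smallest acceptable size and stepping up only on a violation keeps the chosen buffers, and therefore the power they consume, as small as the rules allow.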
In Figure 3, the main flowchart of the low-power clock tree design proposed by the invention, step 114 verifies once more whether the buffers chosen by the fast type-determination procedure meet the design rules. One of the exit conditions of step 310 is that all buffers have been raised to the largest size and the design rules still cannot be met, so step 116 restores the buffer types to the state they had before entering that procedure. This ensures that the clock tree passed on to the final optimization step does meet the database design rules and the clock skew limit.

Figure 11 is the detailed flowchart of step 118, which applies a rule of thumb to adjust the buffer types one at a time. For the characteristics of the particular input database, a trade-off analysis between power consumption and clock delay selects the buffers made available to the buffer insertion and replacement steps. The flowchart opens with the start of the optimization algorithm. Step 404 balances the flip-flop loads. It works like step 210 but considers only the flip-flop with the largest clock delay and the flip-flop with the smallest clock delay instead of all flip-flops at once. This strategy shortens the run time, and balancing the buffer loads also lowers the overall power consumption. Step 406 is the verification procedure. To keep the preceding step from disturbing the rest of the clock tree and causing a design rule violation, a clock tree that violates the rules is restored to its previous state in step 408, which guarantees that the clock tree remains a feasible solution throughout the optimization.

The optimization itself applies the conventional simulated annealing method, introducing a temperature and a cooling parameter of 0.9 that lowers the temperature while the clock tree buffer combination with the best power consumption is sought. In this step, to reduce the effect of changing buffer types on the clock skew design rule, all buffers on the same level of the clock tree are given the same type. Besides limiting that effect, this strategy also lowers the computational complexity of the optimization. Step 412 checks whether the clock tree power consumption obtained by simulated annealing is lower than in the previous stage. If it is, the flow continues with the flip-flop balancing of step 404, so that the solution can move toward the global optimum. If it is not, step 414 restores the buffer combination of the previous best solution and the optimization ends at step 416. Figure 12 gives the pseudocode of the optimization process based on simulated annealing.

The low-clock-skew, low-power clock tree synthesis program proposed by the invention has two parts. It first changes a circuit that violates the design rules into a clock tree that meets them, and it then optimizes the power consumption and the clock skew of the compliant clock tree. The computational complexity of the algorithm is analysed as follows.

The feasibility check starts from the input pin and computes the input transition time at each buffer in turn, computes the load from the connection data, and then obtains the output transition time, power consumption, and related values by table lookup in the database. Let M be the number of buffers and N the number of flip-flops in the clock tree, where N is roughly tens of times M. If the complexity of one database lookup is $O(P)$, the complexity of the feasibility check is $O(P(M+N))$.

In the part that converts the tree into a feasible solution, the key step is balancing the tree structure: load balancing is carried out between the buffer with the largest delay and the buffer with the smallest delay. For a complete binary tree, the last level holds about half of all buffers, that is, M/2. In the worst case [...] permutations and combinations have to be adjusted, and the complexity is $O(P(M+N)M^2)$.

The optimization step uses simulated annealing to compute the power consumption. Buffer types are changed one level at a time, a complete binary tree with M buffers has about $\log_2 M$ levels (the tree height), and every replacement step must be checked against the design rules, so the complexity of this part is $O(P(M+N)\log_2 M)$. In addition, changing the size of a buffer requires checking that it does not overlap existing buffers and flip-flops, which would violate the rules, and this check has complexity $O(M+N)$ for each buffer change. The simulated annealing algorithm therefore requires $O(P(M+N)^2\log_2 M)$ in total. Combining these results, the worst-case complexity of the proposed clock tree synthesis algorithm is $O(P(M+N)M^2) + O(P(M+N)^2\log_2 M)$.
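The simulated annealing stage analysed above can be sketched in a few lines. This is a generic annealing loop rather than the pseudocode of Figure 12, which is not reproduced in the text. It keeps the two features the description does state: one buffer type per tree level and a cooling parameter of 0.9. The `power` and `feasible` callbacks, the starting temperature, the stopping temperature, and the number of moves per temperature are all assumptions.

```python
import math
import random

def anneal_levels(start, n_types, power, feasible,
                  t0=1.0, t_min=1e-3, cooling=0.9, moves=20, seed=0):
    """Search per-level buffer types for low power.

    start holds one buffer-type index per tree level and must be feasible.
    power(state) returns the estimated power of a state.
    feasible(state) runs the design rule and clock skew checks.
    """
    rng = random.Random(seed)
    state, cur = list(start), power(start)
    best, best_p = list(start), cur
    t = t0
    while t > t_min:
        for _ in range(moves):
            cand = list(state)
            # One move retypes every buffer of a single level at once.
            cand[rng.randrange(len(cand))] = rng.randrange(n_types)
            if not feasible(cand):
                continue  # never leave the feasible region
            p = power(cand)
            # Always accept improvements; accept worse moves with
            # Boltzmann probability exp((cur - p) / t).
            if p < cur or rng.random() < math.exp((cur - p) / t):
                state, cur = cand, p
                if p < best_p:
                    best, best_p = list(cand), p
        t *= cooling  # cooling parameter 0.9, as in the description
    return best, best_p
```

Because each move changes a whole level, the search space has one variable per level rather than one per buffer, which is the source of the $\log_2 M$ factor in the complexity estimate.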
The proposed clock tree synthesis algorithm, which considers power consumption and minimum clock skew together, uses buffer insertion, removal, replacement, and shifting operations. Because the clock tree of a very-large-scale integrated circuit may contain tens of thousands of buffers, the buffer type replacement step, for the sake of computational complexity, replaces all buffers on one level of the clock tree at the same time so that a strict clock skew limit can be met. The reduction of clock tree power consumption therefore reaches a near-optimal solution.

Following the algorithm flow of Figure 3, Table 1 summarizes the five embodiments used to test the invention: test case 1 contains 7 buffers and 123 flip-flops; test case 2 contains 30 buffers and 500 flip-flops; test case 3 contains 113 buffers and [...] flip-flops; test case 4 contains 251 buffers and [...] flip-flops; test case 5 contains [...] buffers and [...] flip-flops. The table also lists the power consumption and clock skew of the original clock tree netlist. Table 2 analyses each test case with the proposed algorithm, observing the change in power consumption while the clock skew is constrained. Table 2 shows that the algorithm lowers the power consumption of the circuit after the netlist is changed by [...]. Tightening the clock skew constraint raises the power consumption of the clock tree, but it remains below that of the original structure. Table 2 also shows that the added buffers may increase the maximum clock delay from the clock tree root to the leaves (the flip-flops); as long as the difference between the maximum and minimum clock delay satisfies the skew limit, this does not affect the operation of the circuit.
In short, the invention proposes a method that applies buffer insertion, removal, replacement, and shifting to quickly determine the type and position of every buffer on the clock tree. It can be combined with an established clock tree synthesis design flow to determine buffer types and positions quickly, so that the overall design meets the database design rules and the clock skew meets its constraint. The tool of the invention for quickly determining buffer types and positions in the clock tree comprises: input of the clock tree structure to be processed, containing the information of each wire and the buffer insertion positions; input of the specific component database, containing characteristics such as input load, power consumption, clock delay, and output signal transition time; initial state setting; determining whether the clock tree structure is a feasible solution that meets the design rules and, if not, applying load balancing and buffer balancing to obtain one; quickly determining the buffer types to lower the power consumption of the clock tree; and applying an optimization technique combined with simulated annealing to obtain the globally optimal buffer combination with minimum power consumption.

The foregoing only describes preferred embodiments of the invention and does not limit its scope. Equivalent changes and modifications made according to the claims of the invention retain its essence, do not depart from its spirit and scope, and should be regarded as further embodiments of the invention.

[Brief description of the drawings]
Figure 1 is an expanded view of a clock tree network.
Figure 2 is a flowchart of a basic IC design and layout flow.
Figure 3 is an input/output block diagram of the computation tool in a specific embodiment of the invention.
Figure 4 is a diagram of the power consumption analysis of connection structures for a specific buffer database.
Figure 5 is a diagram of buffer connection structures in the clock tree.
Figure 6 is a diagram of the database table lookup for buffer delay, output signal transition time, and power consumption.
Figure 7 [...]: (a) [...]; (b) [...].
Figure 8 is a flowchart of the steps for reaching a feasible solution.
Figure 9 is a diagram of the clock tree load-balancing step: (a) before balancing; (b) after balancing.
Figure 10 is a flowchart for quickly determining a low-power clock tree buffer combination.
Figure 11 is a flowchart of the low-power clock tree optimization algorithm based on simulated annealing.
Figure 12 is the pseudocode of the low-power clock tree optimization algorithm based on simulated annealing.

Claims (1)

200428240 功率消耗、時鐘延遲、輸出訊號轉換時間等特性;初始狀態設定;判斷時· ’谢木構疋否符合设計規範之可行解,若不存在則應用負載平衡及缓衝器-平衡之方法以得-可行解;快速決定緩衝器種類以降低時鐘樹之消耗功: 率;應用結合模擬退火法之最佳化技巧,求得最小功率消耗之缓衝器組合· 全域最佳解。 · 唯以上所述者,僅用以制本發作輕,當不能狀限制本發 明的範圍。即大凡依本發财請專纖騎做之解及修飾,仍將不200428240 Power consumption, clock delay, output signal conversion time, and other characteristics; initial state settings; when judging whether "Xie Mugou" meets the design solution's feasible solution, if it does not exist, load balancing and buffer-balance methods are used to obtain -Feasible solution; quickly determine the type of buffer to reduce the power consumption of the clock tree: rate; apply optimization techniques combined with simulated annealing method to find the minimum power consumption buffer combination and global best solution. · Only those mentioned above are used only to control the cost of the attack. When the situation cannot be limited, the scope of the present invention is limited. That is to say, the solution and modification of the professional fiber riders who have made a fortune in accordance with this book will still not 失本發狀要細在’林雜树婦和朗,_絲為本發明 的進一步實施狀況。 【圖式簡單說明】 圖一為時鐘樹網路展開圖。 圖二為一基本之1C設計佈局流程圖。 圖三為特定實施娜行本發明之演算卫輸端出方塊圖。 圖四為特定緩衝器資料庫中連線結構對功率消耗分析之示 圖五為時鐘樹中缓衝器連接結構之示意圖。 wπ ,六f缓衝器延遲值、輸出訊號轉換訊號及功顿耗的資料庫查 鼻不意圖。 一又口丁 # (b)^^ 〇 圖八為本發明貫施可行解步驟流程圖。 圖九為時鐘樹負載平衡步驟之示意圖,⑻為平衡前 圖十為快速決定低耗能時鐘樹緩衝器組合之流程圖。 '< 。 圖十二為以模擬退火法為基礎之低耗能時鐘樹最佳化演算法虛 =退火法為基礎之低耗能時鐘樹最佳化演算法流程圖。 擬碼 十、申請專利範圍 23 200428240 1· -種同步考慮辦鐘歪轉及低耗能之時鐘樹合缸具,其步驟包含: ㈣含錢蘭瓣數資訊與緩 衝斋時序及功率資料庫資訊,· (b)輸入時鐘歪曲率的時序規範上限值,· A又找I蠢以’計減號經由時鐘樹根輪人峨後,至時鐘樹各 個緩衝器及正反器之轉入轉換時間,以及各緩衝器之輸出負載值,判斷 該緩衝器組合是否符合特定資料庫之設計規範; (d)時鐘樹貞齡衡’酿敍特定㈣輕計概之情況,奴時鐘歪曲 率大於没計限制之情形; ⑹變更緩衝ϋ種類快速尋求可行解,以降低時鐘樹上之功率消耗; ⑦利用針對特定資料庫之經驗法則,應用以模擬退火法為基礎之最佳化過 ί +找Βτ鐘樹上緩衝器連接方式之組合,以得時鐘樹上功率消耗之全 域最佳解。 •如申凊專利範圍第1項所述之一種同步考慮低時鐘歪曲率及低耗能之時籲 鐘树合成工具’其設計規範驗言正器的組成包含: (a)針對缓衝ϋ之特性,判斷輸人訊號之轉換時暇否符合輯規範; ()針對、之躲,判斷緩衝器輸出貞載是否符合設計規範; 、’寸緩衝為之4寸性’應用輸入讯號之轉換時間及輸出負載,使用内差法 -十异緩衝器之時鐘延遲及輸出訊號轉換時間。 如申凊專觀目第1項所述之—翻步考慮低時鐘涵率及低耗能之時 24 、樹。成卫具’其時鐘樹負載平衝步驟的組成包含: (a) =斷"輸轉嶋,树鴨财处峨在時鐘樹 < P白層,並以考慮緩衝器負載平衡之目標為重新連接之考旦. 
(b) 時鐘樹最後一階芦 里, 9 &衝$所貞載之正反11 ’騎慮整斷鐘延遲最小 減少時鐘歪曲率’故平衡時鐘樹最後一階層各緩衝器之輪出負載; 才鐘樹n緩衝器,其時鐘延遲將嚴重影響整體之時鐘歪曲率, 解衡緩魅之輸”鶴有效解決時鐘歪曲率之_,料平衡負載 可改善輸出訊號轉換時間,進而降低整體時鐘樹之功率消耗。、 4. U請翻細第丨俩述__步考慮低時鐘歪轉及低耗能之時鐘 五、〇成工具,其快速尋求緩衝器組合之步驟包含: (a)針對時鐘樹中最後—階層緩衝器之負載,選擇同—種類之緩衝器; ⑼將時鐘樹中所有緩衝器變更絲種類緩衝器,判斷是否達反設計規範; ⑷測試财龍料緩_之義,選满合設計規範且功率消耗最小1 2之 組合為本快速尋求緩衝器組合步驟之解。 25 1 . 如申請專利範圍第丨項所述之步考慮低時鐘歪曲率及低耗能之時 鐘祕合成工具,其尋找全域最佳解之步驟包含: (a) 對於特定之資料庫,其緩衝器特性不盡相同,本發明先行分析資料庫中 元件之特性,以為判定緩衝器種類選擇之依據; 2 (b) 平衡時雜最後-階層之貞載正反^,可祕設計規範巾時鐘歪曲率之 限制之影響’並改善正反器輸入訊號之轉換時間,同時降低整體時鐘樹 200428240 之功率消耗; (C)應用習知技術模擬退火法尋找缓衝器之最佳組合,為考量嚴格之時鐘歪 曲率限制及降低演算法之運算複雜度,緩衝器組合之變化以階層為單 位,尋求功率消耗之局部最佳解; (d)再度改善時鐘樹最後一階層之負載正反器平衡,重新應用模擬退火法尋 求時鐘緩衝器之組合,尋求功率消耗之全域最佳解。 十一、圖式: 如次頁Loss of hair loss is detailed in 'Linza Shufu and Lang,' which is the status of further implementation of the present invention. [Schematic description] Figure 1 is an expanded view of the clock tree network. Figure 2 is a basic 1C design layout flowchart. Figure 3 is a block diagram of the output of a computing satellite that implements the present invention. Figure 4 shows the analysis of the power consumption of the connection structure in the specific buffer database. Figure 5 shows the connection structure of the buffer in the clock tree. wπ, a database of six f buffer delay values, output signal conversion signals, and power consumption is not intended to check the database.一 又 口 丁 # (b) ^^ 〇 FIG. 8 is a flowchart of steps for implementing feasible solutions of the present invention. Figure 9 is a schematic diagram of the clock tree load balancing steps. Figure 10 is a flowchart before quickly balancing low-energy clock tree buffer combinations. '<. Figure 12 is a flowchart of a low-energy consumption clock tree optimization algorithm based on the simulated annealing method. Draft code 10. 
Scope of patent application

1. A clock tree synthesis tool that simultaneously considers low clock skew and low power consumption, the steps of which include: (a) inputting the clock tree information [...] and the buffer timing and power database information; (b) inputting the upper limit of the clock skew timing specification; [...] traversing from the root of the clock tree to the buffers and flip-flops of the clock tree, [...] the time and the output load value of each buffer, and determining whether the buffer combination meets the design specifications of the specific database; quickly changing the buffer types to find a feasible solution that reduces the power consumption of the clock tree; and using rules of thumb for the specific database and applying an optimization based on simulated annealing to find the combination of buffers on the clock tree, so as to obtain the global optimal solution for the power consumption of the clock tree.

2. The clock tree synthesis tool that simultaneously considers low clock skew and low power consumption as described in item 1 of the scope of patent application, wherein the design specification checker includes: (a) [...] determining whether the transition time of the input signal meets the design specifications; (b) [...] determining whether the output load of the buffer meets the design specifications; and (c) for the buffer's input signal transition time and output load, using interpolation to calculate the clock delay and output signal transition time of the different buffers.

3. The clock tree synthesis tool that simultaneously considers low clock skew and low power consumption as described in item 1 of the scope of patent application, wherein the clock tree load balancing step consists of: (a) [...] at each level of the clock tree, with buffer load balancing as the primary consideration; (b) at the last level of the clock tree, [...] minimizing the difference in clock delay so as to reduce the clock skew, and therefore balancing the loads of the last-level buffers of the clock tree. The clock delay of the last-level buffers seriously affects the overall clock skew, so balancing their loads effectively addresses the clock skew problem. Balanced loads also improve the output signal transition time, thereby reducing the power consumption of the overall clock tree.

4. The clock tree synthesis tool that simultaneously considers low clock skew and low power consumption as described in item [...] of the scope of patent application, wherein the step of quickly finding a buffer combination includes: (a) selecting the same type of buffer for the loads of the last-level buffers in the clock tree; (b) changing all buffers in the clock tree to the same type of buffer and determining whether they meet the design specifications; (c) testing all [...] and selecting the combination that satisfies the design specifications with the minimum power consumption as the solution of the quick buffer combination step.

5. The clock tree synthesis tool that simultaneously considers low clock skew and low power consumption as described in item 1 of the scope of patent application, wherein the steps for finding the global optimal solution include: (a) because buffer characteristics differ from one specific database to another, first analyzing the characteristics of the components in the database as the basis for determining the buffer types; (b) balancing the last-level loads of the clock tree, which limits the clock skew and improves the transition time of the flip-flop input signals while reducing the power consumption of the overall clock tree; (c) applying the conventional simulated annealing method to find the best combination of buffers, where, in order to respect the strict clock skew limit and reduce the computational complexity of the algorithm, the buffer combination is changed level by level and a local optimal solution for power consumption is sought; (d) improving the balance of the flip-flop loads at the last level of the clock tree again, and re-applying the simulated annealing method to find the combination of clock buffers, so as to seek the global optimal solution for power consumption.
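The quick buffer selection and the level-based simulated annealing refinement described in the claims can be illustrated with a small sketch. It is not the patented implementation. The buffer library `LIBRARY`, its numeric values, the linear delay model (intrinsic delay plus drive resistance times load), the balanced tree with a fixed fanout, and the function names `evaluate`, `quick_start` and `anneal` are all hypothetical and chosen only for illustration.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class BufferType:
    name: str
    drive_res: float   # delay added per unit of output load (assumed linear model)
    intrinsic: float   # fixed delay through the buffer
    input_cap: float   # load this buffer presents to its driver
    max_load: float    # largest output load the library allows
    power: float       # power per buffer instance


# Hypothetical library; a real flow would read these from the component database.
LIBRARY = [
    BufferType("BUFX1", 4.0, 1.0, 1.0, 8.0, 1.0),
    BufferType("BUFX2", 2.0, 1.0, 2.0, 16.0, 2.2),
    BufferType("BUFX4", 1.0, 1.0, 4.0, 32.0, 4.8),
]


def evaluate(assign, sink_loads, fanout, skew_limit):
    """Return (feasible, skew, power) for one buffer type index per tree level."""
    last = len(assign) - 1
    power = 0.0
    for level, idx in enumerate(assign):
        buf = LIBRARY[idx]
        power += (fanout ** level) * buf.power
        if level < last:
            load = fanout * LIBRARY[assign[level + 1]].input_cap
        else:
            load = max(sink_loads)
        if load > buf.max_load:          # design-specification check on output load
            return False, math.inf, power
    leaf = LIBRARY[assign[last]]
    delays = [leaf.intrinsic + leaf.drive_res * c for c in sink_loads]
    skew = max(delays) - min(delays)     # upper levels are identical on every path
    return skew <= skew_limit, skew, power


def quick_start(levels, sink_loads, fanout, skew_limit):
    """Try every single-type assignment and keep the cheapest feasible one."""
    best = None
    for idx in range(len(LIBRARY)):
        assign = [idx] * levels
        ok, _, power = evaluate(assign, sink_loads, fanout, skew_limit)
        if ok and (best is None or power < best[1]):
            best = (assign, power)
    if best is None:
        raise ValueError("no uniform buffer assignment meets the specification")
    return best


def anneal(start, sink_loads, fanout, skew_limit,
           steps=500, temp=5.0, cooling=0.98, seed=0):
    """Level-based simulated annealing that only moves between feasible states."""
    rng = random.Random(seed)
    current = list(start)
    _, _, cur_power = evaluate(current, sink_loads, fanout, skew_limit)
    best, best_power = list(current), cur_power
    for _ in range(steps):
        cand = list(current)
        cand[rng.randrange(len(cand))] = rng.randrange(len(LIBRARY))
        ok, _, power = evaluate(cand, sink_loads, fanout, skew_limit)
        if ok:
            delta = power - cur_power
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current, cur_power = cand, power
                if power < best_power:
                    best, best_power = list(cand), power
        temp *= cooling
    return best, best_power


SINKS = [6.0, 7.0, 6.5, 7.5]   # flip-flop loads on the four last-level buffers
start, start_power = quick_start(3, SINKS, 2, 2.0)
best, best_power = anneal(start, SINKS, 2, 2.0)
```

In this toy setup only the largest buffer meets the skew limit at the last level, so the uniform starting point uses it everywhere. The annealing pass can then try smaller buffers on the upper levels, and the result it returns never consumes more power than the starting point because the best feasible state is tracked throughout.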

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW93124824A TWI244015B (en) 2004-08-18 2004-08-18 A clock tree synthesizing tool synchronously considering low clock skew and low power consumption

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW93124824A TWI244015B (en) 2004-08-18 2004-08-18 A clock tree synthesizing tool synchronously considering low clock skew and low power consumption

Publications (2)

Publication Number Publication Date
TW200428240A (en) 2004-12-16
TWI244015B (en) 2005-11-21

Family

ID=37154671

Family Applications (1)

Application Number Title Priority Date Filing Date
TW93124824A TWI244015B (en) 2004-08-18 2004-08-18 A clock tree synthesizing tool synchronously considering low clock skew and low power consumption

Country Status (1)

Country Link
TW (1) TWI244015B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105404352A (en) * 2014-09-11 2016-03-16 北京华大九天软件有限公司 Method for inspecting bottleneck in clock tree synthesis result to improve synthesis quality
CN105404352B (en) * 2014-09-11 2018-05-11 北京华大九天软件有限公司 Method for inspecting bottleneck in clock tree synthesis result to improve synthesis quality
CN105930591A (en) * 2016-04-26 2016-09-07 东南大学 Realization method for register clustering in clock tree synthesis
CN114117974A (en) * 2020-08-31 2022-03-01 深圳市中兴微电子技术有限公司 Chip clock driving unit external member and design method and chip
CN112464612A (en) * 2020-11-26 2021-03-09 海光信息技术股份有限公司 Clock winding method and device and clock tree
CN112464612B (en) * 2020-11-26 2023-01-24 海光信息技术股份有限公司 Clock winding method and device and clock tree

Also Published As

Publication number Publication date
TWI244015B (en) 2005-11-21

Similar Documents

Publication Publication Date Title
US9117044B2 (en) Hierarchical verification of clock domain crossings
US7694242B1 (en) System and method of replacing flip-flops with pulsed latches in circuit designs
CN115017846B (en) Interface-based time sequence repairing method, device and medium
US7917882B2 (en) Automated digital circuit design tool that reduces or eliminates adverse timing constraints due to an inherent clock signal skew, and applications thereof
Kahng et al. Timing margin recovery with flexible flip-flop timing model
Gibiluka et al. A bundled-data asynchronous circuit synthesis flow using a commercial EDA framework
US9672008B2 (en) Pausible bisynchronous FIFO
TW200428240A (en) A clock tree synthesizing tool synchronously considering low clock skew and low power consumption
Simatic et al. A practical framework for specification, verification, and design of self-timed pipelines
Beheshti-Shirazi et al. A reinforced learning solution for clock skew engineering to reduce peak current and IR drop
JP5444985B2 (en) Information processing device
Baddam et al. Divided backend duplication methodology for balanced dual rail routing
Chuang et al. Synthesis of PCHB-WCHB hybrid quasi-delay insensitive circuits
US9275179B2 (en) Single event upset mitigation for electronic design synthesis
Saito et al. A floorplan method for asynchronous circuits with bundled-data implementation on FPGAs
Garg Common path pessimism removal: An industry perspective: Special session: Common path pessimism removal
Yang et al. An improved mesh topology and its routing algorithm for NoC
Gangwar et al. Hardware/software co-design of a high-speed Othello solver
Turki et al. Partitioning constraints and signal routing approach for multi-fpga prototyping platform
Chakrabarti Clock tree skew minimization with structured routing
Lin et al. NBTI and leakage reduction using ILP-based approach
Gaurav et al. RTL to GDSII Implementation of RADIX-4 Booth Multiplier
Liu et al. Implementation of AES as a CMOS core
US10691861B2 (en) Integrated circuit design
Wang et al. High-level power estimation model for SOC with FPGA prototyping

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees