Background technology
Very large scale integrated circuits (hereinafter simply VLSI) are widely used in high-technology fields such as communications, computers, and networks. Abroad, VLSI design and manufacturing technology is mature. However, when designing VLSI chips with hundreds of thousands or millions of gates, the interconnect delays used during synthesis optimization are only estimated values, so the timing conflicts found after synthesis inevitably differ from those found after placement and routing. Making the two sets of timing results agree is a widely pursued technique and a central difficulty in resolving timing conflicts. Each company therefore guards its technique for resolving timing conflicts as a closely held core secret.
On April 19, 2000, Cadence announced that its PKS (Envisia™ Physically Knowledgeable Synthesis) physical synthesis product had won the 1999 invention grand prize announced by EDNS. In essence, the PKS method takes the delay of the interconnect in placement and routing into account already at synthesis time, which reduces the repeated iteration otherwise needed to optimize timing. To this end, synthesis optimization and placement and routing are built on a unified physical database, so that the timing after synthesis optimization agrees with the timing after placement and routing, and synthesis, placement, and routing are optimized simultaneously. Because the PKS method must bring the entire placement-and-routing environment into the front-end design, it requires enormous data processing and powerful hardware support.
The prior-art VLSI development flow is shown in Fig. 1, taking the design of a DSP (digital signal processor) of nearly 500,000 gates as an example. The conventional development flow comprises the following steps: global design; functional simulation; synthesis optimization using the synthesis library generated from the standard-cell timing; judging whether timing is satisfied, returning to the global design step if it is not, and, if it is, proceeding to the first static timing analysis and to gate-level simulation using the simulation library; judging whether the static timing analysis and gate-level simulation are correct; if correct, performing placement and routing using the placement-and-routing library; then performing the second static timing analysis and post-layout simulation and judging whether they are correct; if correct, generating the mask-making data format (GDSII) and carrying out the layout-versus-schematic consistency check (LVS), the design rule check (DRC), and the electrical rule check (ERC). Once these checks pass, chip manufacturing can begin.
The placement-and-routing step mentioned above is illustrated in Fig. 2, taking the flow of an existing deep-submicron silicon standard-cell placement-and-routing tool (Silicon Ensemble) as an example. It mainly comprises the following steps: performing placement by planning the global power supply, placing cells, and so on; generating the clock tree; checking static timing; when static timing is satisfied, performing placement optimization in the placement optimizer; and, after placement optimization, performing power routing, clock routing, and cell routing in turn, thereby forming the final routed layout of the full chip.
At present, the clock tree can be generated with a clock tree generator (CT-Gen, a product of Cadence Design Systems Inc.). Based on the placement data, this generator estimates wire delays and inserts buffers so as to reduce the difference in the times at which the clock reaches the sequential elements (clock skew). With this clock tree generation flow, clocks that drive unequal loads end up with different insertion delays. In addition, if the static timing check after clock tree generation shows that timing requirements are not met, the placement optimizer (Placement Based Optimization) is run immediately. This greatly increases the number of buffers added during optimization, and a single pass often fails to meet the timing requirements, so repeated iterations are needed.
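The effect of unequal loads on insertion delay can be illustrated with a toy model. The sketch below is not CT-Gen's algorithm. It assumes, purely for illustration, that each buffer drives at most `max_fanout` loads and adds a fixed `buffer_delay_ps`, so a clock with more sinks needs more buffer levels and ends up with a larger insertion delay. All names and numbers are hypothetical.

```python
def buffer_levels(num_sinks: int, max_fanout: int) -> int:
    """Buffer levels needed so that no buffer drives more than max_fanout loads."""
    if num_sinks < 1 or max_fanout < 2:
        raise ValueError("need num_sinks >= 1 and max_fanout >= 2")
    levels, reachable = 0, 1
    while reachable < num_sinks:
        reachable *= max_fanout  # each extra level multiplies the reachable sinks
        levels += 1
    return levels


def insertion_delay_ps(num_sinks: int, max_fanout: int = 16,
                       buffer_delay_ps: int = 150) -> int:
    """Toy insertion delay: one fixed buffer delay per level (wire delay ignored)."""
    return buffer_levels(num_sinks, max_fanout) * buffer_delay_ps


# Hypothetical sink counts for two clocks with unequal loads.
sinks_per_clock = {"clk_a": 4000, "clk_b": 200}
delays = {name: insertion_delay_ps(n) for name, n in sinks_per_clock.items()}
mismatch_ps = max(delays.values()) - min(delays.values())
print(delays, mismatch_ps)
```

In this toy model the two clocks differ by one buffer level, which is the kind of insertion-delay mismatch the paragraph above attributes to unequal loads.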
Moreover, PKS is a costly synthesis product, and its detailed technical content is not disclosed. Domestically, no similar reports or articles have yet been seen on resolving the inconsistency between the timing after synthesis and the timing after placement and routing.
Summary of the invention
The object of the invention is to provide a method of manufacturing a very large scale integrated circuit. In circuits of hundreds of thousands or even millions of gates, which may involve multiple clocks, each clock drives a different number of sequential elements, so the clocks have different insertion delays, and timing conflicts arise when the clocks pass through multiplexer circuits under different operating modes. The method is intended to eliminate these timing conflicts.
To achieve the above object, the method of manufacturing a very large scale integrated circuit according to the invention comprises the steps of global design, functional simulation, synthesis optimization, static timing check and gate-level simulation, placement and routing, pre-embedding a plurality of delay cells and a plurality of test circuits, checking the test circuits, adjusting timing, outputting the mask-making format, and manufacturing the chip. It is characterized in that the placement-and-routing step further comprises: a step of building a balanced clock tree, in which a plurality of clocks are converted into a smaller number of clocks, delay-time constraints are specified, and the structure of the clock tree is defined; a first static timing check step; a step of generating a data tree, in which, based on the paths in the chip that have conflicting timing, the root of the path with the maximum data delay time is determined, the delay-time constraints are specified, and the structure of the data tree is defined; and a second static timing check step. The pre-embedding step embeds a plurality of delay cells and a plurality of test circuits respectively. The checking step examines the test circuits, determines whether a timing conflict exists, and locates the macroblocks and library cells that have the timing conflict. The timing adjustment step inserts the delay cells embedded in the pre-embedding step into the macroblocks and library cells that have the timing conflict.
The manufacturing method of the invention further provides a method of generating a balanced clock tree, comprising: analyzing a plurality of clocks that have timing-delay conflicts; converting the insertion delays of the plurality of clocks into the insertion delay of a single clock; and defining the clock tree structure according to the delay-time constraints of the integrated circuit.
The data tree generation method of the invention resolves the mismatch between the generated clock tree and the associated data delays. As a result, the timing conflicts after synthesis optimization come close to agreeing with the timing conflicts after placement and routing, the repeated iterative optimization otherwise needed to optimize timing is avoided, and a one-pass optimization design flow is achieved. The method therefore differs from PKS: it does not need to bring the entire placement-and-routing environment into the front-end design, so it requires neither enormous data processing nor powerful hardware support, which reduces cost. The invention achieves its object using only a static timing analyzer (Pearl) and a deep-submicron placement-and-routing tool, together with the balanced clock tree generation program and the data tree generation program. It requires no large expenditure, speeds up the design process, saves time, optimizes area, and strengthens competitiveness.
Embodiments of the invention are described in detail below with reference to the accompanying drawings, so that the objects and advantages of the invention become clearer.
Embodiment
First, the VLSI development flow of the invention is described with reference to Fig. 3. Like the prior-art development flow shown in Fig. 1, the development flow of the invention includes the steps of: global design; functional simulation; synthesis optimization using the synthesis library; judging whether timing is satisfied, returning to the global design step if it is not, and, if it is, proceeding to the first static timing analysis and to gate-level simulation using the simulation library; judging whether the first static timing analysis and gate-level simulation are correct; if correct, performing placement and routing using the placement-and-routing library; then performing the second static timing analysis and post-layout simulation and judging whether they are correct; if correct, generating GDSII and carrying out LVS, DRC, and ERC; and manufacturing the chip. The differences lie in the following aspects.
In the placement-and-routing step, a pre-embedding step is added, in which a plurality of test circuits are embedded and various delay cells and buffers are embedded in advance. The test circuits include various memories, registers, counters, frequency dividers, flip-flops, and the like, with their test ports brought out. The delay cells include resistors, capacitors, buffers, delay lines, and the like.
Furthermore, after the chip has been manufactured and undergoes intermediate testing, if the timing is found to be incorrect or misaligned, the failing sequential element is found or located by checking the test circuits. That is, the feedback goes to the test circuits to locate the failing sequential element rather than back to the global design step. The failing sequential element can then be corrected by revising 2 to 3 masks and adding 4 to 5 process steps.
Furthermore, the placement-and-routing flow of the invention likewise includes: performing placement by planning the global power supply, placing cells, and so on; generating the clock tree; checking static timing; when static timing is satisfied, performing placement optimization in the placement optimizer; and, after placement optimization, performing power routing, clock routing, and cell routing in turn to form the final routed layout of the full chip. In addition, a step of generating the balanced clock tree is added after the placement step, and, after the static timing check step, steps of writing the data tree generation program, generating the data tree, and performing another static timing check are added.
In more detail, the main steps are as follows.
First, according to the logic function and scale of the CMOSFET integrated circuit, for example the SRAM in a DSP, the width and length of the various MOS transistors and the types and quantities of library cells are determined, the associated layout library, timing library, synthesis library, and simulation library are established, and the global design is carried out.
Here, the timing library contains the timing obtained from SPICE simulation of each cell and macroblock. The synthesis library contains the functions and timing libraries of the standard cells, I/O cells, and macroblocks used during logic synthesis optimization. The simulation library is the functional timing library obtained by converting the delay values of the timing library into a format that the simulator accepts.
Second, after the function of the circuit has been described in VHDL, synthesis optimization is performed to generate a Verilog netlist, and the various delay cells, buffers, and test circuits are pre-embedded. Placement and routing are then carried out with the deep-submicron standard-cell placement-and-routing tool, during which the balanced clock tree generation program is written and the clock tree is generated. A static timing check follows, which lists the paths that have timing conflicts. Then, modeled on the clock tree program, the corresponding data tree is generated, and after a further optimization routing can proceed, which closes the gap in timing conflicts between synthesis and placement and routing.
Next, the parasitic-parameter file (*.sdf) is generated and back-annotated into the Verilog netlist for post-layout simulation. After the design rule check (DRC), the electrical rule check (ERC), and the layout-versus-schematic consistency check (LVS), the GDSII mask-making format is output.
Then, using the masks obtained above, the silicon wafer undergoes processing and intermediate testing. If the test finds a hold-time conflict in a macroblock or library cell, the macroblock and library cell with the conflict can be located by means of the test circuits. Only 2 to 3 masks need to be changed to insert the pre-embedded delay cells into the macroblock and library cell, which adjusts the delay time and eliminates the timing conflict.
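The delay adjustment can be sketched numerically. The text gives no formula, so the sketch below assumes the usual hold check (data must arrive no earlier than the capturing clock edge plus the hold requirement) and a hypothetical fixed delay per pre-embedded delay cell. Times are in picoseconds, and all names and values are illustrative.

```python
def hold_slack_ps(data_arrival_ps: int, clock_arrival_ps: int, hold_ps: int) -> int:
    """Hold slack: negative means the data changes too soon after the clock edge."""
    return data_arrival_ps - (clock_arrival_ps + hold_ps)


def delay_cells_needed(slack_ps: int, cell_delay_ps: int) -> int:
    """Smallest number of delay cells that brings a negative hold slack to >= 0."""
    if cell_delay_ps <= 0:
        raise ValueError("cell_delay_ps must be positive")
    if slack_ps >= 0:
        return 0
    return -(slack_ps // cell_delay_ps)  # ceiling of (-slack / cell_delay)


# Hypothetical path: data arrives at 400 ps, clock at 500 ps, hold requirement 150 ps.
slack = hold_slack_ps(400, 500, 150)
cells = delay_cells_needed(slack, cell_delay_ps=100)
slack_after_fix = slack + cells * 100
print(slack, cells, slack_after_fix)
```

Delay added on the data path also reduces setup margin, so a real fix would recheck setup time after the delay cells are inserted.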
In the placement-and-routing flow of the invention, a "generate balanced clock tree" step is added before the clock tree generation step, which eliminates the problem of different clocks having different insertion delays. After the clock tree is generated, if the static timing check shows that timing requirements are not met, full-chip timing optimization is not performed immediately. Instead, a data tree program is written from the netlist of the circuits that have timing conflicts, and the data tree is realized by means of the "generate clock tree" function, which eliminates timing conflicts to the greatest extent. Full-chip optimization is carried out only afterwards, so that a small number of buffers suffices to meet the timing requirements and repeated iterative optimization is no longer needed.
The balanced clock tree generation program, shown in Fig. 5, comprises: analyzing the N clocks; converting the N clocks into M clocks; specifying the corresponding constraints, including the maximum delay time, the minimum delay time, the maximum clock input slope, and so on; and defining the structure of the clock tree, including the root, the leaves, and so on. This resolves the timing conflicts that arise when multiple clocks operate under different modes: the differences in insertion delay between clocks are eliminated, and the timing of the various parts of the integrated circuit is matched. Here N and M are positive integers with N > M. For example, with N = 5 and M = 1 or 2, the 5 clocks are converted into a tree structure of 1 or 2 clocks.
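The N-to-M conversion can be sketched as a small data structure. The text does not say how clocks are grouped, so in this minimal sketch the grouping is supplied by the designer as a hypothetical `merge_into` mapping. The class names, field names, and constraint values are likewise illustrative assumptions, not the program of Fig. 5.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Constraints:
    max_delay_ns: float
    min_delay_ns: float
    max_input_slope_ns: float


@dataclass
class ClockTree:
    root: str
    leaves: List[str]
    constraints: Constraints


def build_balanced_trees(clock_sinks: Dict[str, List[str]],
                         merge_into: Dict[str, str],
                         constraints: Constraints) -> List[ClockTree]:
    """Merge N source clocks into M tree roots that share one set of constraints."""
    merged: Dict[str, List[str]] = {}
    for clock, sinks in clock_sinks.items():
        merged.setdefault(merge_into[clock], []).extend(sinks)
    if len(merged) >= len(clock_sinks):
        raise ValueError("expected N > M")
    return [ClockTree(root, sorted(set(leaves)), constraints)
            for root, leaves in merged.items()]


# Hypothetical example with N = 5 clocks merged into M = 2 roots.
clock_sinks = {"clk1": ["ff1", "ff2"], "clk2": ["ff3"], "clk3": ["ff4"],
               "clk4": ["ff5", "ff6"], "clk5": ["ff7"]}
merge_into = {"clk1": "root_a", "clk2": "root_a", "clk3": "root_a",
              "clk4": "root_b", "clk5": "root_b"}
trees = build_balanced_trees(clock_sinks, merge_into,
                             Constraints(2.0, 1.5, 0.4))
for tree in trees:
    print(tree.root, tree.leaves)
```

Because every merged tree carries the same delay window, the sinks of formerly separate clocks are balanced against one insertion-delay target, which is the stated aim of the step.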
The data tree generation program, shown in Fig. 6, comprises the following steps: analyzing the paths in the chip that have conflicting timing; determining the root of the path with the maximum (that is, longest) data delay time and defining it as the root of the tree; specifying the corresponding constraints, for example the maximum delay time and the maximum clock input slope; and defining the structure of the data tree, including the root, the leaves, and so on. The data tree generation program is written accordingly, and the data tree structure is produced. After another static timing check, the flow proceeds to the placement optimization step.
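The root selection described above can be sketched as follows. The text does not define how leaves are chosen, so this sketch assumes that the root is the start point of the longest conflicting path and that the leaves are the end points of the conflicting paths that start there. The `TimingPath` and `DataTree` types and the sample paths are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TimingPath:
    start: str       # launching element of the data path
    end: str         # capturing element of the data path
    delay_ns: float  # data delay along the path


@dataclass
class DataTree:
    root: str
    leaves: List[str]
    max_delay_ns: float  # delay constraint chosen by the designer


def build_data_tree(conflicting: List[TimingPath], max_delay_ns: float) -> DataTree:
    """Root the tree at the start of the longest conflicting path."""
    if not conflicting:
        raise ValueError("no conflicting paths to build a tree from")
    longest = max(conflicting, key=lambda p: p.delay_ns)
    leaves = sorted({p.end for p in conflicting if p.start == longest.start})
    return DataTree(longest.start, leaves, max_delay_ns)


# Hypothetical list of paths reported by a static timing check.
paths = [TimingPath("regA", "regX", 3.2),
         TimingPath("regA", "regY", 4.1),
         TimingPath("regB", "regZ", 2.0)]
tree = build_data_tree(paths, max_delay_ns=3.5)
print(tree.root, tree.leaves)
```

The resulting root, leaves, and constraint correspond to the inputs a clock-tree-style generator would need, in line with the text's statement that the data tree is realized through the "generate clock tree" function.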
For example, consider the design of a DSP chip of about 500,000 gates that has four clock sources and must reach an operating frequency of 40 MHz. When the placement-and-routing flow is carried out with the deep-submicron standard-cell placement-and-routing tool, the four clock sources give rise to a large number of hold-time and setup-time conflicts after placement and routing. After the chip is optimized again, the row utilization is 82%, so the setup-time and hold-time conflicts cannot be eliminated.
With the flow of the invention, the timing conflicts caused by the multiple clocks are resolved by the balanced clock tree generation program and the data tree generation program, and the gap in timing conflicts between synthesis and placement and routing is closed, so the 40 MHz operating frequency can be fully reached after placement and routing. Moreover, after all setup-time and hold-time conflicts have been eliminated, the row utilization is only 76%. Compared with the 82% utilization above, this is a relative reduction of about 8% in cells, equivalent to roughly 40,000 gates.
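The figures in this paragraph can be checked with simple arithmetic. The text does not say which base the 8% is computed on. One reading that matches the stated numbers, offered here as an assumption, takes the 6-point drop in row utilization relative to the final 76% and applies the rounded percentage to the design of roughly 500,000 gates:

```latex
\frac{82\% - 76\%}{76\%} \approx 7.9\% \approx 8\%,
\qquad
500{,}000 \times 8\% = 40{,}000 \ \text{gates}
```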
The manufacturing method of the invention has been specifically disclosed above taking the manufacture of a DSP in CMOSFET integrated circuit technology as an example, but the invention is not limited to it. Obviously, after understanding the above description, a person of ordinary skill in the semiconductor field can readily make various modifications, substitutions, or refinements to the invention. For example, when the invention is applied to MOS integrated circuits, bipolar integrated circuits, and the like, the gap in timing conflicts between synthesis and placement and routing can likewise be closed by the balanced clock tree generation program and the data tree generation program, thereby optimizing the very large scale integrated circuit. Any such modification, substitution, or refinement should therefore not be regarded as departing from the concept of the invention or from the scope of protection defined by the claims.