CN113095033B

CN113095033B - Superconducting RSFQ circuit layout method for dual-clock architecture

Info

Publication number: CN113095033B
Application number: CN202110442343.3A
Authority: CN
Inventors: 黄俊英; 张阔中; 叶笑春; 张志敏; 范东睿
Original assignee: Institute of Computing Technology of CAS
Current assignee: Institute of Computing Technology of CAS
Priority date: 2021-04-23
Filing date: 2021-04-23
Publication date: 2023-07-21
Anticipated expiration: 2041-04-23
Also published as: CN113095033A

Abstract

A layout method for a superconducting RSFQ circuit with a dual-clock architecture is provided, where the total number of logic cells in the circuit excluding input IO and output IO is N and the aspect ratio of the chip on which the circuit is laid out is α. The layout method comprises: performing an initial layout of the N logic cells based on logic depth, including: calculating a reference height H₀ = ⌈√(N/α)⌉ of a layout column; arranging the logic cells in order starting from logic depth 1, such that the cells of each logic depth are arranged in increasing order in the vertical direction, the height of each column is not greater than H₀, and each different logic depth starts in a new column; sequentially merging the columns whose number of cells is less than H₀, such that the height of each merged column is not greater than H₀; and removing the empty columns and outputting the initial coordinates of the N logic cells on the chip and the columns in which each cell can be laid out; and perturbing and optimizing the initial layout based on a simulated annealing layout framework.

Description

Superconducting RSFQ circuit layout method for dual-clock architecture

Technical Field

The invention relates to the field of superconducting circuits, and in particular to a superconducting RSFQ circuit layout method for a dual-clock architecture.

Background

Superconducting single flux quantum (SFQ) circuit technology is listed by the ITRS as a very promising next-generation integrated circuit technology. The superconducting rapid single flux quantum (RSFQ) circuit is one type of SFQ circuit and offers ultra-high speed and ultra-low power consumption. Studies have shown that simple RSFQ circuits fabricated with sub-micron Josephson junction (JJ) technology can operate at frequencies up to 770 GHz, which is difficult to achieve with semiconductor integrated circuits. Moreover, under the same process conditions, both the logic gate delay and the per-bit operation power consumption of RSFQ circuits are two orders of magnitude lower than those of the corresponding semiconductor circuits.

The most basic device in an RSFQ circuit is a superconducting loop containing a JJ, which acts as the switching element. Unlike in CMOS circuits, the storage component of an RSFQ circuit is an inductance rather than a capacitance. The magnetic flux in a superconducting loop is quantized as Φ = nΦ₀, where n is an integer and Φ₀ = 2.07×10⁻¹⁵ Wb. Information is stored in the form of flux quanta and transmitted in the form of SFQ voltage pulses. The presence of a pulse indicates a logic "1" and its absence indicates a logic "0". Unlike in CMOS circuits, almost all logic cells in RSFQ logic circuits require a clock to drive the stored flux quantum to the output. An RSFQ logic gate can therefore be regarded as a one-stage pipeline and an RSFQ circuit as a gate-level pipeline, and the logic depth refers to the number of stages of clocked logic gates.

In order to fully exploit the ultra-high frequency (tens or hundreds of GHz) of RSFQ devices, researchers have proposed clock mechanisms suitable for RSFQ circuits, including the clock-following-data clock, the zero-skew clock, and the concurrent clock (clock-flow clock). The zero-skew clock is the clock mechanism used in semiconductor circuits, while the concurrent clock, in which the clock and the data flow in the same direction, is the clock mechanism that achieves the highest circuit frequency.

For an RSFQ logic gate to function correctly, the logic gates connected to all of its inputs should have the same logic depth, a constraint called path balancing. If the logic depths of the fan-in gates differ, a D flip-flop (DFF) should be inserted at the output of the fan-in gate with the smaller logic depth. The conventional RSFQ design method therefore ensures correct operation of the circuit by inserting a large number of DFFs. Recently, researchers have proposed a new architecture that implements RSFQ circuits using a fast and a slow clock signal, referred to as the dual-clock architecture; see in particular Chinese patent application publication CN112116094A. In this architecture, the flow of data is controlled by the two clocks, so correct operation of the RSFQ circuit can be ensured without inserting any path-balancing DFFs. Considering that the number of path-balancing DFFs inserted in a typical RSFQ circuit is several times the number of ordinary logic gates, this architecture can save a large amount of circuit area and power consumption.

On the one hand, although some research has been carried out on layout methods for dual-clock superconducting RSFQ circuits, that work uses the zero-skew clock of semiconductor circuits and does not consider the concurrent clock mechanism of RSFQ circuits, so the operating frequency of the circuit after layout is not high enough. On the other hand, conventional layout methods for superconducting RSFQ circuits (i.e., RSFQ circuits without a dual-clock architecture) do not take into account that, in the dual-clock architecture, the number of cells differs greatly from one logic depth to another, so the circuit area overhead after layout is large. Therefore, none of the existing layout methods is suitable for superconducting RSFQ circuits with a dual-clock architecture.

Disclosure of Invention

In view of the above drawbacks of the prior art, the present invention provides a layout method for a superconducting RSFQ circuit with a dual-clock architecture, in which the total number of logic cells in the circuit excluding input IO and output IO is N and the aspect ratio of the chip on which the circuit is laid out is α,

the layout method comprises the following steps:

performing initial layout on N logic units based on logic depth, including:

calculating a reference height H₀ = ⌈√(N/α)⌉ of a layout column;

arranging the logic cells in order starting from logic depth 1, such that the cells of each logic depth are arranged in increasing order in the vertical direction, the height of each column is not greater than H₀, and each different logic depth starts in a new column;

sequentially merging the columns whose number of cells is less than H₀, such that the height of each merged column is not greater than H₀; and

removing the empty columns and outputting the initial coordinates of the N logic units on the chip and the columns capable of being laid out; and

the initial layout is perturbed and optimized based on a simulated annealing layout framework.

Preferably, the step of perturbing and optimizing the initial layout based on the simulated annealing layout framework comprises:

calculating the cost of the initial layout;

disturbing the initial layout to generate a new layout solution;

and calculating the cost of the new layout solution, and updating the coordinate values of the N logic units by using the layout solution with lower cost until the layout solution with the minimum cost is obtained as the final circuit layout.

Preferably, the initial layout further comprises:

for each logic depth i, calculating the number of full-height columns C = ⌊blk_num[i]/H₀⌋ required by the cells of that logic depth, where blk_num[i] is the number of cells with logic depth i; arranging, in each of the C columns starting from the current column, H₀ not-yet-placed cells with logic depth i; updating the number of cells in each of these columns to H₀; and updating the current column to current column + C.

Preferably, the initial layout further comprises:

arranging the remaining blk_num[i] % H₀ unplaced cells with logic depth i in the current column, updating the number of cells in the current column to blk_num[i] % H₀, and updating the current column to current column + 1.

Preferably, the initial layout further comprises:

storing the column numbers of the columns whose number of cells is less than H₀ in an array; moving the cells of columns array[i+1] to array[i+j] into column array[i]; and setting the number of cells in columns array[i+1] to array[i+j] to 0, where, before merging, the sum of the numbers of cells in columns array[i], array[i+1], ..., array[i+j] ≤ H₀ < the sum of the numbers of cells in columns array[i], array[i+1], ..., array[i+j], array[i+j+1].

Preferably, the step of perturbing the initial layout further comprises:

the input IOs are arranged at the grid point positions on the left side of the chip, and the output IOs are arranged at the grid point positions on the right side of the chip.

Preferably, the step of perturbing the initial layout further comprises:

when a logic cell can be laid out in a plurality of columns, randomly selecting a column from among the columns that contain only cells of the same logic depth, then randomly selecting a grid point position in the selected column, exchanging the logic cell with the cell at that grid point, and determining the new coordinates.

Preferably, the step of perturbing the initial layout further comprises:

when the number of columns in which a given logic cell can be laid out is 1 and the number of cells with the same logic depth as that cell is greater than 1, exchanging, with probability P, the logic cell with another cell of the same logic depth, and determining the new coordinates, where 0 ≤ P ≤ 1.

Preferably, the step of perturbing the initial layout further comprises:

when the number of columns in which a given logic cell can be laid out is 1, the number of cells with the same logic depth as that cell is greater than 1, and the layout column contains two or more macro blocks that have the same number of cells but different logic depths, exchanging two such macro blocks as a whole with probability 1 − P, and determining the new coordinates of all cells in the macro blocks, where 0 ≤ P ≤ 1.

Preferably, the step of perturbing the initial layout further comprises:

when the number of columns in which a given logic cell can be laid out is 1 and the number of cells with the same logic depth as that cell is 1, moving the logic cell, if a vacant position exists in the layout column, to a randomly selected vacant position to obtain new coordinates.

The present invention also provides a computer readable storage medium having embodied thereon a computer program executable by a processor to perform the steps of a superconducting RSFQ circuit layout method for a dual-clock architecture as described above.

The present invention also provides an electronic device including: one or more processors; and a memory, wherein the memory is to store one or more executable instructions; the one or more processors are configured to implement the steps of one of the superconducting RSFQ circuit layout methods for a dual-clock architecture described above via execution of the one or more executable instructions.

The superconducting RSFQ circuit layout method for a dual-clock architecture of the present invention adopts the concurrent clock to raise the operating frequency of the circuit, and provides a logic-depth-based initial layout method and a layout perturbation method that satisfies the logic depth constraint in order to reduce the circuit area. Compared with existing superconducting RSFQ circuit layout methods, the method of the invention obtains a layout result with a smaller area while satisfying the logic depth constraint. Compared with layout under a zero-skew clock, the layout result of the invention takes the logic depth of the cells into account, so that the routing stage after layout can more easily meet the timing constraints under the concurrent clock, thereby raising the operating frequency of the circuit.

Drawings

FIG. 1A is a schematic diagram of a concurrent clock distribution network;

FIG. 1B is an example of two logic gates in the concurrent clock distribution network of FIG. 1A;

FIG. 2 is a schematic diagram of concurrent clock timing constraints;

FIG. 3 is a schematic diagram of a grid-based chip layout area in accordance with one embodiment of the invention;

FIG. 4 is a schematic diagram of a circuit gate level netlist of a 4-bit adder circuit according to one embodiment of the invention;

FIG. 5 is a schematic diagram of the layout result of the 4-bit adder circuit of FIG. 4 using a conventional concurrent clocked superconducting RSFQ layout algorithm;

FIG. 6 is a logic depth based dual clock superconducting RSFQ circuit initial layout algorithm flow chart of one embodiment of the present invention;

FIG. 7 is one example of an initial layout of the 4-bit adder circuit in FIG. 4;

FIG. 8 is a schematic diagram of a net bounding box with 10 end points;

FIG. 9 is a schematic diagram of the final layout of the 4-bit adder circuit of FIG. 4.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail by means of specific embodiments with reference to the accompanying drawings.

FIG. 1A is a schematic diagram of a concurrent clock distribution network, in which the clock and the data are transmitted to the logic gates through different signal networks. FIG. 1B is an example of two of the logic gates in the concurrent clock distribution network of FIG. 1A and illustrates the timing of the concurrent clock. As shown in FIG. 1B, gate 1 and gate 2 are RSFQ logic gates, and the black dots 101 and 102 are splitters (SPL for short). Data is transferred to gate 1, and after a clock pulse reaches gate 1, the data is output to gate 2. The clock pulse is transmitted to SPL 101. Taking the time at which SPL 101 outputs the clock pulse as 0, the clock pulse arrival time (Tclk) at gate 2 is equal to the sum of the delays of line 2, SPL 102 and line 3, and the data pulse arrival time (Tdata) at gate 2 is equal to the sum of the delays of line 1, gate 1 and line 4.

With concurrent clock timing, the data pulse arrives later than the clock pulse, i.e., Tdata > Tclk, and gate 2 processes and outputs the data pulse when the next clock pulse arrives at gate 2. Thus, transferring data through an RSFQ circuit with N stages of logic gates requires N clock cycles (N+1 clock pulses). In this way, data can be processed sequentially and continuously through the gate-level pipeline. In an RSFQ circuit, concurrent clock timing can achieve higher circuit frequencies.

FIG. 2 is a schematic diagram of concurrent clock timing constraints. In timing design with a concurrent clock, the timing constraints shown in FIG. 2 must be satisfied. Here t_c is the time at which the clock arrives at the logic gate, t_d is the time at which the data arrives at the logic gate, t_cycle is the clock period, t_hold is the hold time of the logic gate, and t_setup is the setup time of the logic gate. Each logic gate must satisfy the timing constraint t_c + t_hold < t_d < t_c + t_cycle − t_setup. In superconducting RSFQ circuit layout, the logic cells must therefore be placed and routed appropriately to meet this constraint. The more compact the layout and the shorter the total wire length while the timing constraint is satisfied, the smaller the circuit area. The superconducting RSFQ circuit layout method for a dual-clock architecture of the invention performs layout based on logic depth, so that the final layout result easily meets the concurrent clock timing constraint and a layout result with a smaller area can be obtained.
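The timing constraint above can be checked per gate with a one-line predicate. The following is a minimal sketch; the function name and the numeric values (in arbitrary time units) are illustrative assumptions and do not come from any process library.

```python
def meets_concurrent_timing(t_c, t_d, t_cycle, t_hold, t_setup):
    """Return True if t_c + t_hold < t_d < t_c + t_cycle - t_setup."""
    return t_c + t_hold < t_d < t_c + t_cycle - t_setup


# Hypothetical gate: clock arrives at 10, period 20, hold 2, setup 3.
ok = meets_concurrent_timing(t_c=10, t_d=15, t_cycle=20, t_hold=2, t_setup=3)
too_early = meets_concurrent_timing(t_c=10, t_d=11, t_cycle=20, t_hold=2, t_setup=3)
too_late = meets_concurrent_timing(t_c=10, t_d=28, t_cycle=20, t_hold=2, t_setup=3)
```

In this sketch the data must arrive inside the open window (12, 27): arriving at 11 violates the hold side, and arriving at 28 violates the setup side.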

Layout is the process of determining the physical location of a logic cell in a circuit on a chip, which generally includes two inputs: 1) The process library file is used for describing the shape, the size, the port position, the time sequence parameters and the like of the logic unit; 2) A circuit gate level netlist describing the connection relationships between logic cells in a circuit. The output of the layout is the specific coordinate location of the logic cells on the chip. The invention adopts a layout mode based on grid points, the logic units can only be placed at the positions of the grid points, and the areas between the grid points are used for wiring of the circuit.

FIG. 3 is a schematic diagram of a grid-based chip layout area in accordance with one embodiment of the invention. The layout in fig. 3 includes three layout columns, layout column 0, layout column 1, and layout column 2, each layout column including 3 grid points. The box in fig. 3 represents a lattice point for placement of RSFQ logic cells (i.e., RSFQ logic devices). The positions of the grid points are located by coordinates (x, y) in a planar rectangular coordinate system. For example, the lattice point position in the lower left corner is (0, 0), which means that the lattice point is at the lattice point position where x=0, and y=0.

FIG. 4 is a schematic diagram of the circuit gate-level netlist of a 4-bit adder according to one embodiment of the invention. The netlist includes the circuit inputs (cin, a0-a3, b0-b3), i.e., the input IOs; the circuit outputs (s0-s4), i.e., the output IOs; and the logic cells (g0-g19). The arrowed lines represent the connection relationships between the logic cells in the circuit. The 4-bit adder circuit in FIG. 4 employs a dual-clock architecture, so no DFF devices need to be inserted to achieve clock alignment. In the present invention, the logic depth (level) refers to the number of stages of clocked logic gates. The logic depth level(gi) of a logic cell gi is equal to the maximum logic depth among the cells g_si to which its inputs are connected, plus 1, i.e., level(gi) = 1 + max{level(g_si)}, where the logic depth of an input IO is defined to be 0. For example, all inputs of g0 are input IOs, i.e., max{level(g_s0)} = 0, so level(g0) = 1; as another example, g8 has two inputs, the first from an input IO and the second from g0, i.e., max{level(g_s8)} = 1, so level(g8) = 2; and so on. The 20 logic cells of the gate-level netlist in FIG. 4 are divided into 9 logic depths, level 1 through level 9, so the maximum logic depth is level 9. The number of logic cells at level 1 is 8, i.e., g0-g7, the number of logic cells at level 2 is 2, i.e., g8-g9, and so on. For clarity, every logic cell in FIG. 4 is drawn as a circle, although the circles may represent different RSFQ logic devices.
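The recursive definition level(gi) = 1 + max{level(g_si)} can be sketched as a memoized traversal of the netlist. This is a minimal sketch that assumes an acyclic netlist; the function name and the two-cell netlist fragment are hypothetical, and the specific input names wired to g0 are chosen only for illustration.

```python
def logic_levels(fanins, input_ios):
    """Compute the logic depth of every cell.

    fanins: dict mapping a cell name to the names driving its inputs.
    input_ios: names of the input IOs, whose logic depth is defined as 0.
    """
    levels = {name: 0 for name in input_ios}

    def level(cell):
        if cell not in levels:
            # level(g) = 1 + max level over all cells feeding g
            levels[cell] = 1 + max(level(src) for src in fanins[cell])
        return levels[cell]

    return {cell: level(cell) for cell in fanins}


# Hypothetical fragment: g0 is fed only by input IOs, g8 by an input IO and g0.
fragment = {"g0": ["a0", "b0"], "g8": ["cin", "g0"]}
depths = logic_levels(fragment, input_ios={"a0", "b0", "cin"})
```

For this fragment the sketch gives depth 1 for g0 and depth 2 for g8, matching the two examples in the paragraph above.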

FIG. 5 is a schematic diagram of the layout result of the 4-bit adder circuit of FIG. 4 obtained with a conventional concurrent-clock superconducting RSFQ layout algorithm. The connecting lines in the figure represent only the connection relations of the signals. As can be seen from FIG. 5, the conventional method places cells with the same logic depth in the same layout column. If it is applied directly to a 4-bit adder circuit with a dual-clock architecture, the circuit area is large and space is wasted.

The invention provides a superconducting RSFQ circuit layout method for a dual-clock architecture. Existing research in this direction adopts the zero-skew clock mechanism of semiconductor circuits, whereas the present invention adopts the concurrent clock. In addition, the invention allows cells with different logic depths to be placed in the same layout column, which reduces the area overhead of the circuit.

In the prior art, the simulated annealing layout framework proceeds as follows. First, an initial layout is generated and an initial temperature is calculated from it. The algorithm then enters the annealing stage, which consists of an inner loop and an outer loop. The inner loop perturbs the solution by exchanging logic cells a number of times at the same temperature, thereby obtaining new solutions; the outer loop checks whether the exit criterion of the algorithm is satisfied and, if not, updates the temperature according to the annealing schedule. On the basis of this simulated annealing layout framework, the invention provides a new initial layout method and a new perturbation method for layout solutions that are tailored to the characteristics of dual-clock superconducting circuits.
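The inner and outer loops of such a framework can be sketched generically. The sketch below is not the patent's implementation: the Metropolis acceptance rule, the geometric cooling, the fixed starting temperature `t0` (the text computes it from the initial layout) and all parameter values are assumptions, and the toy problem only stands in for a real layout.

```python
import math
import random


def anneal(initial, cost, perturb, t0=10.0, cooling=0.9,
           moves_per_temp=50, t_min=1e-3, seed=0):
    """Generic simulated annealing loop; returns the best solution seen."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    temperature = t0
    while temperature > t_min:              # outer loop: exit criterion
        for _ in range(moves_per_temp):     # inner loop: moves at one temperature
            candidate = perturb(current, rng)
            candidate_cost = cost(candidate)
            delta = candidate_cost - current_cost
            # Assumed Metropolis rule: always accept improvements, accept a
            # worse solution with probability exp(-delta / temperature).
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        temperature *= cooling              # assumed geometric annealing schedule
    return best, best_cost


# Toy stand-in for a layout: order numbers so that neighbours are close.
def line_cost(order):
    return sum(abs(a - b) for a, b in zip(order, order[1:]))


def swap_two(order, rng):
    i, j = rng.sample(range(len(order)), 2)
    new = list(order)
    new[i], new[j] = new[j], new[i]
    return tuple(new)


start = (3, 0, 4, 1, 5, 2)
best, best_cost = anneal(start, line_cost, swap_two)
```

Because the loop tracks the best solution seen so far, the returned cost can never exceed the cost of the starting solution, whatever the random moves turn out to be.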

The initial layout method of the dual-clock superconducting circuit based on the logic depth and the layout solution disturbance method meeting the constraint of the logic depth according to the present invention will be described in detail with reference to fig. 6, algorithm 1 and algorithm 2. Fig. 6 is a logic depth based dual-clock superconducting RSFQ circuit initial layout algorithm flow chart of one embodiment of the present invention, algorithm 1 describing the logic depth based dual-clock superconducting circuit initial layout of one embodiment of the present invention.

Algorithm 1 Dual clock superconducting Circuit initial layout based on logic depth

Input: the process library file, the circuit gate level netlist G (V, E), V represents the logic cells and E represents the signal connections between the cells.

Output: the initial coordinate positions of the logic cells on the chip, and the columns in which each logic cell can be laid out.

As shown in Algorithm 1 and FIG. 6, N denotes the total number of logic cells in the circuit excluding input IO and output IO, and H₀ denotes the reference height of the layout columns; in the initial layout of Algorithm 1, the height of each column does not exceed H₀. Assuming that the aspect ratio of the chip to be laid out is α, α × H₀ × H₀ = N, and thus H₀ = ⌈√(N/α)⌉. When the chip after layout is desired to be close to square, α = 1 and H₀ = ⌈√N⌉; Algorithm 1 is described using α = 1, H₀ = ⌈√N⌉ as an example. The logic depths of all logic cells are initialized; for example, the logic depth of each input IO cell is initialized to 0 and the logic depths of the other logic cells are initialized to −1 (step S100). current_column marks the current layout column and has an initial value of 0. blk_num_at_column[current_column] stores the number of cells already laid out in the current layout column, and its initial value is null. Starting from the input IO cells, the logic depth of each logic cell is calculated according to the connection relations of the circuit netlist, giving the maximum logic depth L_max over all logic cells, and the number of cells with logic depth i is stored in blk_num[i] (step S101 and the subroutine computer_block_logic_level(G) on line 2 of Algorithm 1). For example, in FIG. 4 the cells with logic depth 2 are g8 and g9, so blk_num[2] = 2.
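The reference height follows from the relation α × H₀ × H₀ = N. The sketch below reads it as H₀ = √(N/α) rounded up to an integer; the rounding is an assumption, chosen because it reproduces H₀ = 5 for the 4-bit adder example, and the function name is hypothetical.

```python
import math


def reference_height(n_cells, aspect_ratio=1.0):
    """Reference column height H0 from alpha * H0 * H0 = N, rounded up."""
    return math.ceil(math.sqrt(n_cells / aspect_ratio))


h0_adder = reference_height(20)          # 20 logic cells g0-g19, alpha = 1
h0_wide = reference_height(20, 5.0)      # a chip five times as wide as tall
```

With α = 1 the chip is close to square; a larger α trades column height for more columns.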

The columns in which the cells with logic depth i can be laid out are determined from the number of cells (blocks) with logic depth i. Starting from logic depth i = 1 and continuing until i = L_max, for each logic depth i the number of full-height columns required by the level-i blocks is calculated as C = ⌊blk_num[i]/H₀⌋, i.e., the number of blocks in each of these C columns is H₀ (step S103). Starting from the current layout column and continuing up to column current_column + C − 1, H₀ not-yet-placed blocks of level i are arranged in each column (step S105); these H₀ blocks are randomly selected from the level-i blocks that have not been laid out, and the block count of each column is updated, i.e., blk_num_at_column[j] += H₀. The current layout column is then updated to current_column + C. The number of remaining unplaced cells with logic depth i is blk_num[i] % H₀. These blk_num[i] % H₀ level-i blocks are placed in the current column, stacked upward in ascending order of y-coordinate, and the order among them is also random. The block count of the current column is updated, blk_num_at_column[current_column] += blk_num[i] % H₀, and the current layout column is then updated to current_column + 1 (step S107).
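The column-filling step can be sketched at the level of cell counts. This minimal sketch tracks only how many cells of which logic depth land in each column; the random choice of which cells go where is omitted, and the function name is hypothetical. Following the description, the current column advances by one after the remainder is placed even when the remainder is zero, so an empty column can appear.

```python
def initial_columns(blk_num, h0):
    """Return [(logic_depth, cells_in_column), ...] in column order.

    blk_num: dict mapping logic depth i to the number of cells at that depth.
    h0: reference column height.
    """
    columns = []
    for depth in sorted(blk_num):
        full, rest = divmod(blk_num[depth], h0)   # C and blk_num[i] % H0
        columns.extend((depth, h0) for _ in range(full))
        columns.append((depth, rest))             # current_column + 1, always
    return columns


# Cell counts per logic depth for the 4-bit adder of FIG. 4.
adder = {1: 8, 2: 2, 3: 1, 4: 2, 5: 1, 6: 2, 7: 1, 8: 2, 9: 1}
cols = initial_columns(adder, h0=5)
counts = [n for _, n in cols]
```

For the adder this yields ten columns, numbered 0 to 9, with the first two holding the eight level-1 cells.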

At this point, every logic depth whose number of cells is smaller than H₀ has been laid out in a separate column, and the blk_num[i] % H₀ remaining cells of each logic depth have also been laid out in separate columns. The number of cells in these columns is relatively small and may be much smaller than H₀. The columns therefore need to be merged, combining those that contain few cells.

As shown in lines 13-27 of Algorithm 1, starting from column 0 and continuing up to column current_column − 1, if the number of blocks in column col is less than H₀, the column number col is stored in the array array (steps S108 to S110). The array thus holds the numbers of all columns whose block count is less than H₀. Multiple columns are then merged: the blocks of columns array[i+1] to array[i+j] are moved into column array[i], such that the number of blocks in column array[i] does not exceed H₀ and is as close to H₀ as possible (step S112). Here, "as close to H₀ as possible" means that, before merging, the sum of the numbers of cells in columns array[i], array[i+1], ..., array[i+j] ≤ H₀ < the sum of the numbers of cells in columns array[i], array[i+1], ..., array[i+j], array[i+j+1]. After merging, the number of blocks in column array[i] is updated, the block counts of columns array[i+1] to array[i+j] are set to 0, and i = i + j + 1, until i = array.size() − 1. Then, starting from column 0 and continuing to the last column (i.e., current_column − 1), if the number of blocks in column col is not 0, the number S of columns between column 0 and column col whose block count is 0 is calculated, and S is subtracted from the x-coordinates of all blocks in column col; the x-coordinates of all blocks are thereby updated (steps S114-S116).
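The merging rule can be sketched as a greedy pass over the short columns. As a simplification, the sketch works on per-column cell counts only and does not move individual cells; the function name is hypothetical.

```python
def merge_short_columns(counts, h0):
    """Greedily merge consecutive short columns; return the new counts.

    counts: number of cells in each column before merging.
    h0: reference column height.
    """
    counts = list(counts)
    # Column numbers of all columns with fewer than h0 cells (the "array").
    short = [col for col, n in enumerate(counts) if n < h0]
    i = 0
    while i < len(short):
        target = short[i]
        j = i + 1
        # Absorb following short columns while the total stays within h0.
        while j < len(short) and counts[target] + counts[short[j]] <= h0:
            counts[target] += counts[short[j]]
            counts[short[j]] = 0
            j += 1
        i = j
    return counts


merged = merge_short_columns([5, 3, 2, 1, 2, 1, 2, 1, 2, 1], h0=5)
```

Applied to the ten columns of the 4-bit adder with H₀ = 5, the pass merges columns 1-2, 3-5 and 6-8 and leaves column 9 on its own.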

Finally, the initial coordinates on the chip of the logic cells other than input IO and output IO, together with the columns in which each logic cell can be laid out, are output. They are stored in map<block_id, t_block_inf>, where block_id stores the number of the logic cell and t_block_inf stores the name of the logic cell, the nets connected to it, its x-coordinate and y-coordinate, and the columns in which it can be laid out.
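The map<block_id, t_block_inf> record can be mirrored by a small data class. In this sketch the class name, the field names, the net name and the coordinate values are all hypothetical; only the list of stored items (name, connected nets, x, y, placeable columns) comes from the description.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BlockInfo:
    """Per-cell placement record, mirroring t_block_inf."""
    name: str                       # name of the logic cell
    nets: List[str]                 # nets connected to the cell
    x: int                          # column (x-coordinate) on the grid
    y: int                          # row (y-coordinate) within the column
    columns: List[int] = field(default_factory=list)  # placeable columns


# block_id -> record; the entry below uses made-up values.
placement: Dict[int, BlockInfo] = {
    8: BlockInfo(name="g8", nets=["net_g0_g8"], x=1, y=3, columns=[1]),
}
placement[8].y = 4   # a perturbation step would update coordinates in place
```

Keeping the placeable columns next to the coordinates lets the perturbation stage decide which kind of move applies to a cell without recomputing logic depths.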

The initial layout method of the superconducting RSFQ circuit for the dual-clock architecture of the present invention is described below using the adder of FIG. 4 as a specific example. In this example, N is 20 (the cells g0-g19) and, assuming α = 1, H₀ is 5. The initial layout process is as follows:

Step 1: when i = 1, blk_num[1] = 8 and C = 1, so 5 of the cells g0-g7 are randomly selected and arranged in column 0, at random positions within the column, giving current_column = 1; blk_num[1] % H₀ = 3, so the remaining 3 cells of g0-g7 are arranged in random order at positions y = 0, 1, 2 of column 1, after which current_column = 2. The columns in which g0-g7 can be laid out are column 0 and column 1.

Step 2: when i = 2, blk_num[2] = 2, C = 0 and blk_num[2] % H₀ = 2, so the 2 cells g8-g9 are arranged at positions y = 0, 1 of column 2, with the order of g8 and g9 chosen at random; current_column = 3 at this point.

Step 3: when i = 3, blk_num[3] = 1, C = 0 and blk_num[3] % H₀ = 1, so g10 is arranged at position y = 0 of column 3; current_column = 4 at this point.

Step 4: when i = 4, blk_num[4] = 2, C = 0 and blk_num[4] % H₀ = 2, so the cells g11-g12 are arranged in random order at positions y = 0, 1 of column 4; current_column = 5 at this point. Similarly, g13 is arranged at position y = 0 of column 5, g14 and g15 are arranged in random order at positions y = 0, 1 of column 6, g16 is arranged at position y = 0 of column 7, g17 and g18 are arranged in random order at positions y = 0, 1 of column 8, and g19 is arranged at position y = 0 of column 9.

Then the columns are merged. The number of cells in column 0 is 5, so no merging is needed. The number of cells in column 1 is 3 and in column 2 is 2, so columns 1 and 2 are merged into column 1 and the number of cells in column 2 is set to 0. Likewise, columns 3, 4 and 5 are merged into column 3 and the numbers of cells in columns 4 and 5 are set to 0, and columns 6, 7 and 8 are merged into column 6 and the numbers of cells in columns 7 and 8 are set to 0. Next, to remove the columns whose number of cells is 0, the x-coordinate of each cell is reduced by the number of empty columns that precede it, and the x-coordinates of all cells are updated. One example of the initial layout after merging is shown in FIG. 7; note that FIG. 7 is only one example, and the initial layout is not unique, because cells of the same logic depth are placed in random order during the layout process.
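The removal of empty columns amounts to shifting each non-empty column left by the number of empty columns before it. The sketch below computes that mapping for the merged adder columns; the function name is hypothetical.

```python
def compact_columns(counts):
    """Map each non-empty column number to its x-coordinate after compaction."""
    new_x = {}
    empty_before = 0
    for col, n in enumerate(counts):
        if n == 0:
            empty_before += 1               # one more empty column to skip
        else:
            new_x[col] = col - empty_before # subtract S from the x-coordinate
    return new_x


# Cell counts per column of the adder example after merging.
shift = compact_columns([5, 5, 0, 4, 0, 0, 5, 0, 0, 1])
```

In this example the ten original columns collapse to five, so g19, which was placed in column 9, ends up in column 4.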

In summary, the logic-depth-based initial layout method first determines the reference height of a layout column from the total number of logic cells in the circuit. It then determines the layout columns that can be allocated to the logic cells of each logic depth by comparing the number of logic cells of that depth with the reference height. If one layout column contains cells of several logic depths, the cells are arranged in the order of their logic depths, which facilitates routing under the concurrent clock. Finally, a position within the layout column is chosen at random for each cell, giving an initial layout that matches the gate-level pipeline characteristics of the superconducting circuit.

The cost of the initial layout is calculated from the initial coordinate positions. In the present invention, the cost is equal to the sum of the bounding-box wire lengths of all nets, i.e.,

cost = Σ_{i=1}^{num_nets} q(i) × [bb_x(i) + bb_y(i)]

where i is the net number, num_nets is the number of nets, and, for each net i, bb_x(i) and bb_y(i) are the horizontal and vertical spans of its bounding box, respectively. FIG. 8 shows a schematic diagram of the bounding box of a net with 10 end points. The dashed box contains 42 grid points, and the dark boxes mark the 10 end points. The horizontal span between the end points is bb_x(i) = X_max − X_min + 1 = 7 and the vertical span is bb_y(i) = Y_max − Y_min + 1 = 6. When a net has more than 3 end points, the bounding-box wire length model underestimates the length of the net, so a compensation factor q(i) is generally defined whose value depends on the number of end points of net i. The compensation factors are usually calculated in advance and stored in an array, so that the q(i) value for a given number of end points can be looked up. Although wire length is used as the optimization target in the present invention, the invention is not limited thereto, and other optimization targets such as delay or power consumption may be defined as needed in practical applications.
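A bounding-box wire length cost of this kind can be sketched as follows, assuming the cost is the sum over all nets of q(i) times the sum of the horizontal and vertical spans. The function name is hypothetical, and the default compensation factor of 1.0 for every net is a placeholder, since the actual q(i) values are not given in the text.

```python
def bounding_box_cost(nets, q=lambda n_endpoints: 1.0):
    """Sum of q(i) * (bb_x(i) + bb_y(i)) over all nets.

    nets: list of nets, each a list of (x, y) grid coordinates of its end points.
    q: compensation factor as a function of the number of end points
       (placeholder default; real values would come from a lookup table).
    """
    total = 0.0
    for endpoints in nets:
        xs = [x for x, _ in endpoints]
        ys = [y for _, y in endpoints]
        bb_x = max(xs) - min(xs) + 1    # horizontal span in grid points
        bb_y = max(ys) - min(ys) + 1    # vertical span in grid points
        total += q(len(endpoints)) * (bb_x + bb_y)
    return total


# One net spanning 7 grid points horizontally and 6 vertically, as in FIG. 8.
single = bounding_box_cost([[(0, 0), (6, 5), (3, 2)]])
```

Only the extreme end points determine the spans, which is why the model becomes optimistic for nets with many end points and a compensation factor is applied.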

After obtaining the cost of the initial layout, perturbing the layout solution by the algorithm 2 to obtain the layout solution with the minimum cost as the final layout.

Algorithm 2. Layout solution perturbation method satisfying the logic depth constraint

Input: the gate-level netlist G(V, E), where V represents the logic cells and E represents the signal connections between cells; the initial layout result.

Output: new layout solution S_new.

As shown in Algorithm 2, a new coordinate position is sought for every logic cell in the circuit. The total number of logic cells in the circuit, including input IO and output IO, is denoted by M. If the cell is an input IO, a grid point on the left side of the chip is selected; if it is an output IO, a grid point on the right side of the chip is selected (Algorithm 2, lines 1-6). Any other cell is handled according to the number of columns in which it can be laid out.

If the cell (block j) can be laid out in more than one column, those columns may contain only cells of the same logic depth, or may also contain cells of different logic depths. In this case, the invention perturbs the layout solution only over the columns that contain cells of a single logic depth: one such column is selected at random, a grid point in that column is selected at random, block j is exchanged with the block at that grid point, and their x_new and y_new coordinates are determined (Algorithm 2, line 8).

If block j can be laid out in only one column and more than one cell has the same logic depth as block j (Algorithm 2, line 9), a random number R between 0 and 1 is generated and compared with a preset probability P (0 ≤ P ≤ 1). When R < P, block j is exchanged with another block of the same logic depth, and their x_new and y_new coordinates are determined (Algorithm 2, line 10). When R ≥ P, if the layout column contains two or more macroblocks with equal numbers of blocks and different logic depths, two such macroblocks are exchanged as a whole, and the x_new and y_new coordinates of all blocks in them are determined (Algorithm 2, line 11). Here, a macroblock is a circuit block formed by several blocks that are located in the same layout column and have the same logic depth; for example, g11 and g12 in Fig. 7 may form one macroblock. In addition, several consecutive empty grid points may be treated as one macroblock and exchange positions with a macroblock of the same size in the same column. If block j is the only cell with its logic depth and the layout has a vacancy, block j is moved to a randomly selected vacancy to obtain its new coordinates x_new and y_new (Algorithm 2, line 12).

In Algorithm 2, the probability value P can be tuned to balance computation time against the convergence of the cost. Suppose that one column contains 2 cells of logic depth 1, 2 cells of logic depth 2, and 1 cell of logic depth 3. Then the 2 cells of logic depth 1 can be exchanged with each other, and the macroblock formed by the 2 cells of logic depth 1 can be exchanged as a whole with the macroblock formed by the 2 cells of logic depth 2. Whole-macroblock exchange reduces the cost faster but takes longer to compute, so the probability P decides between internal exchange and whole-macroblock exchange, and different values of P trade computation time against the convergence speed of the cost.
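The choice between internal exchange and whole-macroblock exchange within one column can be sketched as follows. This is a simplified illustration under stated assumptions: a column is modelled as a list of (cell name, logic depth) pairs, the names `macroblocks` and `perturb_column` are hypothetical, the cells to exchange are picked at random rather than starting from a specific block j, and empty grid points are not modelled.

```python
import random


def macroblocks(column):
    """Split a column into runs of equal logic depth.

    column: list of (cell_name, logic_depth) pairs, top to bottom.
    Returns (start, end, depth) triples with `end` exclusive.
    """
    runs, start = [], 0
    for k in range(1, len(column) + 1):
        if k == len(column) or column[k][1] != column[start][1]:
            runs.append((start, k, column[start][1]))
            start = k
    return runs


def perturb_column(column, p, rng):
    """Return a perturbed copy of `column`.

    If R < p, two cells of the same logic depth are exchanged (internal exchange);
    otherwise two equal-sized macroblocks of different depths are exchanged whole.
    The column is returned unchanged when no legal move exists.
    """
    new = list(column)
    runs = macroblocks(new)
    if rng.random() < p:
        candidates = [(s, e) for s, e, _ in runs if e - s >= 2]
        if candidates:
            s, e = rng.choice(candidates)
            a, b = rng.sample(range(s, e), 2)
            new[a], new[b] = new[b], new[a]
    else:
        pairs = [(r1, r2)
                 for i, r1 in enumerate(runs)
                 for r2 in runs[i + 1:]
                 if r1[1] - r1[0] == r2[1] - r2[0] and r1[2] != r2[2]]
        if pairs:
            (s1, e1, _), (s2, e2, _) = rng.choice(pairs)
            # Equal sizes, so the slice swap does not shift any indices.
            new[s1:e1], new[s2:e2] = new[s2:e2], new[s1:e1]
    return new
```

With the column from the example above (two cells of depth 1, two of depth 2, one of depth 3), `p = 0` always exchanges the depth-1 and depth-2 macroblocks, while `p = 1` always exchanges two cells inside one macroblock.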

Then, the cost of the new layout solution S_new is calculated according to equation (1). If the cost of the initial layout is lower, the coordinates of the logic cells obtained from the initial layout are kept unchanged; if the cost of the new layout solution S_new is lower, the coordinates of the logic cells are updated according to S_new. All placeable x and y coordinates are traversed in this way, and the layout solution with the minimum cost is taken as the final layout. Fig. 9 is a schematic diagram of the final layout of the 4-bit adder circuit of Fig. 4, which can be obtained from Fig. 7 by position exchanges.
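The keep-the-cheaper rule described in this paragraph can be sketched as a short loop. The names `improve`, `cost_fn` and `propose` are hypothetical placeholders for the cost calculation and the perturbation step, and the sketch shows only the comparison stated here; any temperature-based acceptance used by the simulated annealing framework is not modelled.

```python
def improve(layout, cost_fn, propose, steps):
    """Repeatedly perturb a layout and keep whichever solution is cheaper.

    layout: the initial layout solution (any object).
    cost_fn: function mapping a layout to its cost.
    propose: function mapping the current layout to a new candidate layout.
    steps: number of perturbations to try.
    Returns the cheapest layout found and its cost.
    """
    best, best_cost = layout, cost_fn(layout)
    for _ in range(steps):
        candidate = propose(best)
        candidate_cost = cost_fn(candidate)
        # Update only when the new solution is strictly cheaper.
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost
    return best, best_cost
```

In practice, `cost_fn` would be the wire-length cost of equation (1) and `propose` would apply one perturbation that respects the logic depth constraint.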

It should be noted that the above embodiment takes the column direction (i.e., the vertical direction) as an example when arranging the cells in the same order as their logic depths. Those of ordinary skill in the art will understand that the invention is not limited thereto, and that the cells may equally be arranged in the same order as their logic depths along the row direction (i.e., the horizontal direction). That is, in the layout process of the invention, columns may be replaced with rows and the vertical direction with the horizontal direction, in which case the final layout on the chip corresponds to the column-direction layout rotated 90 degrees clockwise. A person skilled in the art can choose the layout direction according to the actual application, and these choices and modifications are within the scope of the invention.

The layout solution perturbation method satisfying the logic depth constraint according to the invention constrains the circuit so that input cells can only be placed in a single line on the left side of the chip and output cells can only be placed in a single line on the right side of the chip, so that outputs arrive in the same clock cycle as far as possible. When the positions of the logic cells are optimized, each logic cell may only be placed in a layout column corresponding to its logic depth, and the total wire length is used as the optimization target, so that a layout with a better area is obtained while the logic depth constraint is satisfied. In the present invention, the wire length is used as the optimization target, but other optimization targets such as delay and power consumption may be defined as needed.

Compared with the existing superconducting RSFQ circuit layout method, the superconducting RSFQ circuit layout method for the double-clock architecture provided by the invention can obtain a layout result with a better area under the condition of meeting logic depth constraint, and reduce the area of the circuit; and the logic depth of the units is considered, so that wiring under concurrent clocks is easier to realize, and the working frequency of the circuit is improved.

The present invention also provides a computer readable storage medium having embodied thereon a computer program executable by a processor to perform the steps of the superconducting RSFQ circuit layout method for a dual-clock architecture described above.

The present invention also provides an electronic device including: one or more processors; and a memory, wherein the memory is to store one or more executable instructions; the one or more processors are configured to implement the steps of the superconducting RSFQ circuit layout method for a dual-clock architecture described above via execution of the one or more executable instructions.

Finally, it should be noted that the above examples are only for explaining the technical solution of the present invention and are not limiting. Although the invention has been described in detail with reference to examples, it will be understood by those of ordinary skill in the art that the specific examples described herein are intended to be illustrative only and are not to be construed as limiting the scope of the invention. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention as set forth in the appended claims.

Claims

1. A layout method for a superconducting RSFQ circuit of a dual-clock architecture, wherein the total number of logic units in the circuit, excluding input IO and output IO, is N, and the aspect ratio of the chip on which the circuit is laid out is α,

the layout method comprises the following steps:

performing initial layout on N logic units based on logic depth, including:

calculating a reference height H₀ of a layout column;

arranging the logic cells in order starting from a logic depth of 1, such that the cells of each logic depth are arranged in increasing order in the vertical direction, the height of each column is not greater than H₀, and each different logic depth starts in a new column;

disturbing and optimizing the initial layout based on a simulated annealing layout framework;

wherein the initial layout further comprises:

for each logic depth i, calculating the number of full-height columns required by the cells of that logic depth, $C = \lfloor blk\_num[i] / H_0 \rfloor$, where blk_num[i] is the number of cells with logic depth i; in each of the C columns starting from the current column, arranging H₀ not-yet-placed cells with logic depth i, updating the number of cells in each of these columns to H₀, and updating the current column to the current column + C;

arranging the remaining blk_num[i] % H₀ not-yet-placed cells with logic depth i in the current column, updating the number of cells in the current column to blk_num[i] % H₀, and updating the current column to the current column + 1;

storing the column numbers of the columns containing fewer than H₀ cells in an array, moving the cells of columns array[i+1] to array[i+j] into column array[i], and setting the number of cells in columns array[i+1] to array[i+j] to 0, where, before merging, the total number of cells in columns array[i], array[i+1], …, array[i+j] ≤ H₀ < the total number of cells in columns array[i], array[i+1], …, array[i+j], array[i+j+1].

2. A superconducting RSFQ circuit layout method for a dual-clock architecture according to claim 1 wherein said step of perturbing and optimizing said initial layout based on a simulated annealing layout framework comprises:

calculating the cost of the initial layout;

disturbing the initial layout to generate a new layout solution;

3. A superconducting RSFQ circuit layout method for a dual-clock architecture according to claim 2 wherein said step of perturbing said initial layout further comprises:

4. A superconducting RSFQ circuit layout method for a dual-clock architecture according to claim 3 wherein said step of perturbing said initial layout further comprises:

5. A superconducting RSFQ circuit layout method for a dual-clock architecture according to claim 3 wherein said step of perturbing said initial layout further comprises:

6. A superconducting RSFQ circuit layout method for a dual-clock architecture according to claim 3 wherein said step of perturbing said initial layout further comprises:

7. A superconducting RSFQ circuit layout method for a dual-clock architecture according to claim 3 wherein said step of perturbing said initial layout further comprises:

8. A computer readable storage medium having embodied thereon a computer program executable by a processor to perform the steps of the method of any of claims 1-7.

9. An electronic device, comprising: one or more processors; and a memory, wherein the memory is to store one or more executable instructions; the one or more processors are configured to implement the steps of the method of any one of claims 1-7 via execution of the one or more executable instructions.