CN115270698A - Chip global automatic layout method based on deep reinforcement learning - Google Patents

Chip global automatic layout method based on deep reinforcement learning Download PDF

Info

Publication number
CN115270698A
Authority
CN
China
Prior art keywords
layout
chip
macro
information
global automatic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210718626.0A
Other languages
Chinese (zh)
Inventor
陈学松
敖启缘
蔡述庭
张丽丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202210718626.0A priority Critical patent/CN115270698A/en
Publication of CN115270698A publication Critical patent/CN115270698A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/30 Circuit design
    • G06F 30/39 Circuit design at the physical level
    • G06F 30/398 Design verification or optimisation, e.g. using design rule check [DRC], layout versus schematics [LVS] or finite element methods [FEM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/30 Circuit design
    • G06F 30/39 Circuit design at the physical level
    • G06F 30/392 Floor-planning or layout, e.g. partitioning or placement
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Geometry (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Architecture (AREA)
  • Design And Manufacture Of Integrated Circuits (AREA)

Abstract

The invention discloses a chip global automatic layout method based on deep reinforcement learning, which can quickly place a very-large-scale integrated circuit chip, guarantees convergence of the layout result so that fast placement is achieved, and makes the wirelength, congestion, and area of the placement and routing near-optimal. In addition, a space-utilization method is provided within the chip global automatic layout method and is applied to both the local layout and the global automatic layout, so that the areas of the local layout and the global automatic layout are minimized. Furthermore, by applying an asynchronous training network structure and learning and training through this network structure, the correlation between the local layout and the global automatic layout becomes tighter, the layout result converges more easily, and reliable global automatic layout of the chip can be achieved.

Description

Chip global automatic layout method based on deep reinforcement learning
Technical Field
The invention relates to the technical field of electronic design automation, and in particular to a chip global automatic layout method based on deep reinforcement learning.
Background
Nowadays, with the rapid development of integrated circuits, the problems faced by electronic design automation (EDA) technology are increasingly complex, and the circuit scale and the amount of data to be processed keep growing. Whether EDA technology can advance fast enough to keep pace with the rapid progress of design and manufacturing processes has become a critical issue. Placement and routing is an important and time-consuming step in the physical design phase of an integrated circuit. First, the placement process involves a large number of iterations and optimizations, and the time it requires can significantly affect the integrated circuit design cycle. Second, in the physical design of an integrated circuit the steps are closely related, and the placement result affects routability as well as parameters of the routing stage such as running time, degree of congestion, and routing completion rate. In recent years, besides wirelength- and delay-driven placement, routability-driven placement algorithms have also received attention. Despite significant advances in placement algorithms over the past few decades, fast and efficient placement remains a challenging problem.
Global automatic layout is a long-standing challenge in chip design, requiring multi-objective optimization over increasingly complex circuits. To solve the chip placement problem, researchers have proposed solver-based approaches, including nonlinear optimizers and the more advanced quadratic methods developed after the rise of modern analytical techniques, and, more recently, electrostatics-based methods and related alternatives; these update cell locations in a gradient-based optimization scheme and can typically handle millions of standard cells by parallelizing on the CPU and using partitioning to reduce runtime. Google also presented the first end-to-end learning method for macro placement, which models chip placement as a sequential decision problem. A Japanese patent uses Q-learning for layout and wiring design, and a deep-learning-based routability-driven placement algorithm has also been proposed in the literature (DrPlace: deep learning based routability-driven placement algorithm [J]. Journal of Computer-Aided Design & Computer Graphics, 2021, 33(04): 624-631). Although previous work performs the heavy numerical computation of this very-large-scale optimization problem on the CPU, there is still room for improvement in both layout quality and layout speed.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a chip global automatic layout method based on deep reinforcement learning, applying deep reinforcement learning to very-large-scale integrated circuits so as to achieve rapid global automatic layout of a chip and obtain a near-optimal solution.
To achieve this purpose, the technical solution provided by the invention is as follows:
the chip global automatic layout method based on deep reinforcement learning comprises the following steps:
S1, inputting chip layout information;
S2, preprocessing the chip layout information, wherein the chip layout information comprises design rules;
S3, performing reinforcement learning on the local layout of the chip to obtain the optimal chip local layout information;
S4, judging whether the optimal chip local layout information obtained in the step S3 meets the design rules; if so, entering the step S5, and if not, returning to the step S3 to perform the reinforcement learning of the chip local layout again;
S5, performing deep reinforcement learning of the global automatic layout of the chip in combination with the optimal chip local layout information to obtain the optimal chip global automatic layout information;
S6, performing filling layout according to the optimal chip global automatic layout information obtained in the step S5 to obtain the optimal chip global automatic layout effect;
S7, judging whether the optimal chip global automatic layout effect obtained in the step S6 meets the design rules; if so, adopting the optimal chip global automatic layout information to perform the global automatic layout of the chip, and otherwise, returning to the step S5 to continue the deep reinforcement learning of the chip global automatic layout.
Further, the preprocessing of the chip layout information includes:
S2-1, grid preprocessing: setting the grid cells as squares and establishing a rectangular coordinate system with x as the horizontal axis and y as the vertical axis, wherein in the grid an edge is denoted by e, the routing capacity of an edge is denoted by c_e, and the center-point information of the i-th grid cell G is denoted G_i = {x_i, y_i, c_e_i}; and setting the number of grid cells;
S2-2, macro-cell preprocessing: regarding each macro cell as a rectangle, sorting the macro cells by size with a quick-sort algorithm, and forming the sorted results into an ordered sequence set used as the input set:
H = {S_i, i = 1, ..., N}
where S_i = (L_i, W_i, P_i) is a tuple representing the area of a macro cell together with its position information; L_i denotes the length of the macro cell, W_i the width of the macro cell, and P_i the position information of the macro cell, i.e., P_i = {x_i, y_i}; N denotes the total number of macro cells;
S2-3, standard-cell preprocessing: the standard cells are divided into two cell clusters:
1) standard cells attached to a macro cell H_i form an attached standard cell cluster, denoted B_i, with B_i = {b_i1, b_i2, ..., b_in};
2) standard cells not attached to any macro cell form a discrete standard cell cluster, denoted B, with B = {b_1, b_2, ..., b_n};
S2-4, setting the design rules.
Further, the reinforcement-learning local layout comprises:
S3-1, inputting into the layout area the macro-cell sequence H = {S_i, i = 1, ..., N} and its attached standard cell clusters B_i = {b_i1, b_i2, ..., b_in}, which are randomly dispersed in the form of aggregate clusters;
S3-2, for each macro cell S_i and its randomly placed attached standard cell cluster B_i, performing an initial layout with the electrostatic-system local layout model, so that the attached standard cell cluster B_i performs a dispersing movement and the macro cell S_i and the attached standard cell cluster B_i reach overall electrostatic balance, forming an initial local layout information sequence state S;
S3-3, extracting feature information from the initial layout information state S obtained after the initial layout of the electrostatic-system local layout model, denoting the feature information by φ(S), inputting φ(S) into an Actor-Critic reinforcement learning network, obtaining the optimal layout strategy through network training, outputting the optimal initial local layout according to the optimal layout strategy, and outputting the Actor network parameter θ and the Critic network parameter ω corresponding to the optimal strategy;
S3-4, normalizing the cell modules with rectangles to obtain macro-cell modules of length L_N, width W_N, and area S_N, and outputting the information sequence:
H_N = {S_N1, S_N2, ..., S_Nn}
where S_N = {L_N, W_N, P_N}; L_N is the length of the updated module, W_N the width of the updated module, and P_N the position information of the module.
Further, in the step S3-3, the specific setting and steps are as follows:
S3-3-1, Markov decision process:
1) State S: the initial local layout information sequence state formed by the electrostatic-system local layout model, including the macro-cell information S_i and its attached standard cell cluster B_i, their lengths and widths, and their position information in the grid;
2) Action set A: the set of possible actions taken by all standard cells;
3) Decay factor γ: γ is set to 1, indicating that all subsequent rewards carry the same weight as the current reward;
4) Exploration rate ε: value iteration is performed with the ε-greedy method, i.e., a small value of ε is set; with probability 1-ε the action currently believed to have the maximum action value is selected greedily, and with probability ε an action is selected at random from all m selectable actions; formulated as:
π(a|s) = 1 - ε + ε/m, if a = argmax_{a'} Q(s, a'); π(a|s) = ε/m, otherwise
where a denotes an action and s denotes a state;
S3-3-2, constraint setting;
S3-3-3, loss-function setting;
S3-3-4, updating the network parameters to obtain the Actor network parameter θ, the Critic network parameter ω, and the policy gradient estimate ∇_θ J(θ).
Further, the constraint setting includes:
1) Wirelength constraint:
The half-perimeter wirelength (HPWL) is adopted; it is the closest approximation to the Steiner tree and gives the lowest routing cost. The calculation formula is:
HPWL(i) = (max_{b∈i}{x_b} - min_{b∈i}{x_b}) + (max_{b∈i}{y_b} - min_{b∈i}{y_b})
where x_b and y_b denote the x and y coordinates of pin b of net i. The HPWL(i) are summed; to improve the convergence rate of the wirelength model and the accuracy of the index judgment, the total wirelength between the macro cells and the standard cells is normalized by a normalization factor q, giving the normalized total wirelength formula:
HPWL = (1/q) Σ_{i∈netlist} HPWL(i)
One of the goals is to make HPWL as small as possible; netlist denotes the set of nets;
2) Congestion constraint:
The maximum overflow is used as the congestion measure to evaluate whether the layout is routable; it is expressed as OF(e) = max(ω_e + b_e - c_e, 0). So that overflow at a grid boundary can easily be absorbed by the adjacent regions, guaranteeing the routability of the design, the following congestion evaluation formula is used:
congestion(e) = 100 × (ω_e + b_e)/c_e
where c_e is the maximum capacity of edge e, b_e is the routing congestion on edge e, and ω_e is the wiring occupancy on edge e; a congestion below 50% is considered routable, and the goal is to make the congestion degree as small as possible;
3) Density constraint: for the density constraint, the designed space-utilization function is applied in the local layout, specifically as follows:
According to the sorted macro cells, the restriction rules, and the space-utilization function F, macro cell S_1 and macro cell S_2 are combined, the space utilization F after combination is calculated, and the macro cells are merged when the space utilization reaches the preset requirement; the rules are set as follows:
The macro cell S_1 to be merged has: length L_1, width W_1, position P_1, and area S_1 = L_1 × W_1;
The macro cell S_2 to be merged has: length L_2, width W_2, position P_2, and area S_2 = L_2 × W_2;
They are combined into a new macro cell S_N with length L_N, width W_N, position P_N, and area S_N = L_N × W_N,
where L_N and W_N satisfy the rule:
max(L_N, W_N) ≤ min(L, W)
So that the policy network does not place macro cells at positions that would make the density exceed the target density maximum or cause macro overlap, the macro-cell layout satisfies the area constraint:
min S ≤ S_N ≤ max S
The space-utilization function is:
F = (S_1 + S_2)/(L_N × W_N)
where L is the length and W is the width of the layout region; the objective is to make the space utilization F as large as possible.
Further, the loss function setting includes:
1) Reward-function setting: the total wirelength, the congestion degree, and the waste rate are weighted and summed into a single-objective reward function, where the weighting factors λ_1 and λ_2 mainly balance the influence of the three indices; the reward function for policy-network optimization is:
R = -Wirelength - λ_1·Congestion + λ_2·F
s.t. min S ≤ S_N ≤ max S
where Wirelength denotes the total wirelength, Congestion denotes the total congestion degree, and F denotes the space utilization; λ_1 and λ_2 are the weights of the congestion degree and the space utilization respectively, with 0 ≤ λ_1 ≤ 1, 0 ≤ λ_2 ≤ 1, λ_1 + λ_2 = 1, and λ_1 > λ_2, so congestion is weighted more heavily than the waste rate; that is, routability of the wiring is guaranteed first, and area utilization is considered second;
2) Loss-function setting:
An optimization objective is set, namely the average value at each time step:
J(θ) = (1/T) Σ_{t=1}^{T} E_{π_θ}[R_t]
Taking the gradient of this equation with respect to θ gives:
∇_θ J(θ) = E_{π_θ}[∇_θ log π_θ(s, a) Q(s, a)]
The policy gradient to be optimized is estimated as:
∇_θ J(θ) ≈ (1/T) Σ_{t=1}^{T} ∇_θ log π_θ(s_t, a_t) δ_t
Here ∇_θ log π_θ(s, a) is the score function, which indicates the direction of the parameter update. The Softmax policy function is used, weighing the probability of an action by the linear combination of the features φ(s, a) describing state and action with the parameter θ, i.e.:
π_θ(s, a) = e^{φ(s,a)^T θ} / Σ_{a'} e^{φ(s,a')^T θ}
The score function obtained by derivation is:
∇_θ log π_θ(s, a) = φ(s, a) - E_{π_θ}[φ(s, ·)]
further, the step S3-3-4 includes:
Inputs: the number of iterations T, the state dimension n, the action set A, the step size α, the decay factor γ, the exploration rate ε, the Critic network structure, and the Actor network structure;
the updating process comprises the following steps:
A1. Randomly initialize the values Q corresponding to all states and actions; set i = 1;
A2. Initialize S as the first state of the current state sequence and obtain the feature vector φ(S);
A3. Use φ(S) as the input of the Actor network, output the action set A, obtain the new state S' based on the action set A, and receive the feedback R;
A4. Use φ(S) and φ(S') as inputs of the Critic network to obtain the value outputs V(S) and V(S');
A5. Compute the TD error δ = R + γV(S') - V(S);
A6. Convert between V and Q:
Q(S, A) = R + γV(S')
A7. Compute the policy gradient estimate:
∇_θ J(θ) = ∇_θ log π_θ(S, A) · δ
A8. Update the Critic network parameter ω using the mean-square-error loss function Σ(R + γV(S') - V(S, ω))²;
A9. Update the Actor network parameter θ:
θ ← θ + α ∇_θ log π_θ(S, A) · δ
A10. Judge whether i is smaller than the number of iterations T; if so, set i = i + 1 and return to step A2; otherwise output the latest Critic network parameter ω, Actor network parameter θ, and policy gradient estimate ∇_θ J(θ).
Further, the step S5 includes:
S5-1, designing a two-level network structure consisting of a public network and local networks; the public network contains the functions of the Actor network and the Critic network;
S5-2, computing the gradient estimate ∇_θ J(θ) of each local layout, accumulating and summing them, and inputting the Critic network parameter ω and the Actor network parameter θ of the optimal local layout into the public network;
S5-3, updating the public network with the obtained accumulated gradient estimate; if the accumulated gradient estimate converges during the update, outputting the corresponding optimal strategy, otherwise returning to step S5-2;
S5-4, laying out the updated macro modules H_N according to the optimal strategy, finally completing the global automatic layout; the updated module information sequence is H_NN, and the global automatic layout information H_NN is output.
Further, the step S6 includes:
S6-1, inputting the global automatic layout information H_NN into a force-directed solver;
S6-2, using the force-directed method to fill in the discrete standard cell cluster B = {b_1, b_2, ..., b_n}; under the continuous action of attraction and repulsion, the discrete standard cells b_i keep moving and approach equilibrium until no further relative displacement occurs, while the energy is continuously dissipated and finally approaches zero;
S6-3, outputting the optimal chip global automatic layout effect.
Compared with the prior art, the principles and advantages of this scheme are as follows:
The scheme can quickly place a very-large-scale integrated circuit chip, guarantees convergence of the layout result so that fast placement is achieved, and makes the wirelength, congestion, and area of the placement and routing near-optimal. In addition, the scheme provides a space-utilization method and applies it to both the local layout and the global automatic layout, so that the areas of the local layout and the global automatic layout are minimized.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic flow chart of a chip global automatic layout method based on deep reinforcement learning according to the present invention;
FIG. 2 is a schematic diagram of the initial layout of the electrostatic-system local layout model (U: macro cells and their attached cells scattered randomly in a local region; V: the initial layout information obtained by the local layout);
FIG. 3 is a schematic diagram of space utilization;
FIG. 4 is a local-layout diagram of the Actor-Critic neural network;
FIG. 5 is a diagram illustrating a hierarchical reinforcement learning global automatic layout structure;
FIG. 6 is a diagram of global automatic layout filling.
Detailed Description
The invention is further illustrated by the following specific examples:
as shown in fig. 1, the chip global automatic layout method based on deep reinforcement learning according to this embodiment includes the following steps:
S1, inputting chip layout information;
S2, preprocessing the chip layout information, including:
S2-1, grid preprocessing: setting the grid cells as squares and establishing a rectangular coordinate system with x as the horizontal axis and y as the vertical axis, wherein in the grid an edge is denoted by e, the routing capacity of an edge is denoted by c_e, and the center-point information of the i-th grid cell G is G_i = {x_i, y_i, c_e_i}; the number of grid cells is set to 2^n × 2^n, where n is a positive integer;
S2-2, macro-cell preprocessing: regarding each macro cell as a rectangle, sorting the macro cells by size with a quick-sort algorithm, and forming the sorted results into an ordered sequence set used as the input set:
H = {S_i, i = 1, ..., N}
where S_i = (L_i, W_i, P_i) is a tuple representing the area of a macro cell together with its position information; L_i denotes the length of the macro cell, W_i the width of the macro cell, and P_i the position information of the macro cell, i.e., P_i = {x_i, y_i}; N denotes the total number of macro cells;
S2-3, standard-cell preprocessing: the standard cells are divided into two cell clusters:
1) standard cells attached to a macro cell H_i form an attached standard cell cluster, denoted B_i, with B_i = {b_i1, b_i2, ..., b_in};
2) standard cells not attached to any macro cell form a discrete standard cell cluster, denoted B, with B = {b_1, b_2, ..., b_n};
S2-4, design rules:
1) The layout follows the principle of "big before small, difficult before easy", i.e., the macro-cell circuits and the core cells are laid out first;
2) The layout should satisfy the following requirements as far as possible: the total interconnect is as short as possible, and the critical signal lines are the shortest;
3) Density-priority principle: routing starts from the region with the densest and most complex connection relations;
4) The layout is optimized according to the criteria of uniform distribution, balanced center of gravity, and a tidy appearance.
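As an illustration of the preprocessing of steps S2-2 and S2-3, the following is a minimal Python sketch of sorting macro cells by size before layout. The class and function names are hypothetical, and Python's built-in sort is used in place of the quick-sort algorithm named above; both produce the same ordered sequence H.

```python
from dataclasses import dataclass

@dataclass
class Macro:
    length: float                 # L_i
    width: float                  # W_i
    pos: tuple = (0.0, 0.0)       # P_i = (x_i, y_i), assigned by the layout

    @property
    def area(self) -> float:      # S_i = L_i * W_i
        return self.length * self.width

def preprocess_macros(macros):
    """Return the macro cells sorted by area, largest first
    ('big before small'), forming the input sequence H."""
    return sorted(macros, key=lambda m: m.area, reverse=True)

H = preprocess_macros([Macro(4, 3), Macro(10, 8), Macro(2, 2)])
print([m.area for m in H])        # [80, 12, 4]
```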
S3, performing reinforcement learning on the local layout of the chip to obtain optimal local layout information of the chip;
In this step, the reinforcement-learning local layout includes:
S3-1, inputting into the layout area the macro-cell sequence H = {S_i, i = 1, ..., N} and its attached standard cell clusters B_i = {b_i1, b_i2, ..., b_in}, which are randomly dispersed in the form of aggregate clusters;
S3-2, for each macro cell S_i and its randomly placed attached standard cell cluster B_i, performing an initial layout with the electrostatic-system local layout model, so that the attached standard cell cluster B_i performs a dispersing movement and the macro cell S_i and the attached standard cell cluster B_i reach overall electrostatic balance, forming an initial local layout information sequence state S, as shown in fig. 2;
S3-3, extracting feature information from the initial layout information state S obtained after the initial layout of the electrostatic-system local layout model, denoting the feature information by φ(S), inputting φ(S) into an Actor-Critic reinforcement learning network, obtaining the optimal layout strategy through network training, outputting the optimal initial local layout according to the optimal layout strategy, and outputting the Actor network parameter θ and the Critic network parameter ω corresponding to the optimal strategy;
The specific settings and steps of this step are as follows:
S3-3-1, Markov decision process:
1) State S: the initial local layout information sequence state formed by the electrostatic-system local layout model, including the macro-cell information S_i and its attached standard cell cluster B_i, their lengths and widths, and their position information in the grid;
2) Action set A: the set of possible actions taken by all standard cells;
3) Decay factor γ: γ is set to 1, indicating that all subsequent rewards carry the same weight as the current reward;
4) Exploration rate ε: value iteration is performed with the ε-greedy method, i.e., a small value of ε is set; with probability 1-ε the action currently believed to have the maximum action value is selected greedily, and with probability ε an action is selected at random from all m selectable actions; formulated as:
π(a|s) = 1 - ε + ε/m, if a = argmax_{a'} Q(s, a'); π(a|s) = ε/m, otherwise
where a denotes an action and s denotes a state;
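The ε-greedy rule above is straightforward to implement. Below is a minimal Python sketch (function names are illustrative); given the current action-value estimates it selects the greedy action with probability 1-ε and otherwise samples uniformly from all m candidate actions, matching the formula in item 4).

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values, epsilon):
    """Select an action index from q_values (length m) by epsilon-greedy:
    with probability 1 - epsilon take the current argmax, otherwise
    pick uniformly at random among all m actions."""
    if rng.random() < 1.0 - epsilon:
        return int(np.argmax(q_values))
    return int(rng.integers(len(q_values)))

# Example: four candidate actions, small exploration rate
action = epsilon_greedy(np.array([0.1, 0.7, 0.3, 0.2]), epsilon=0.1)
```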
s3-3-2, constraint setting:
1) Wirelength constraint:
The half-perimeter wirelength (HPWL) is adopted; it is the closest approximation to the Steiner tree and gives the lowest routing cost. The calculation formula is:
HPWL(i) = (max_{b∈i}{x_b} - min_{b∈i}{x_b}) + (max_{b∈i}{y_b} - min_{b∈i}{y_b})
where x_b and y_b denote the x and y coordinates of pin b of net i. The HPWL(i) are summed; to improve the convergence rate of the wirelength model and the accuracy of the index judgment, the total wirelength between the macro cells and the standard cells is normalized by a normalization factor q, giving the normalized total wirelength formula:
HPWL = (1/q) Σ_{i∈netlist} HPWL(i)
One of the goals is to make HPWL as small as possible; netlist denotes the set of nets;
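As a concrete reading of the HPWL formulas, the following Python sketch (the helper names are hypothetical) computes the half-perimeter wirelength of one net from its pin coordinates and the normalized total over a netlist.

```python
def hpwl(pins):
    """Half-perimeter wirelength of one net; pins is a list of (x, y)."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def total_hpwl(netlist, q=1.0):
    """Normalized total wirelength HPWL = (1/q) * sum of per-net HPWL;
    netlist is a list of nets, each a list of pin coordinates."""
    return sum(hpwl(net) for net in netlist) / q

# Example: two nets on the grid coordinate system of step S2-1
nets = [[(0, 0), (3, 4)], [(1, 1), (2, 5), (4, 2)]]
print(total_hpwl(nets, q=2.0))    # (7 + 7) / 2 = 7.0
```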
2) Congestion constraint:
The maximum overflow is used as the congestion measure to evaluate whether the layout is routable; it is expressed as OF(e) = max(ω_e + b_e - c_e, 0). So that overflow at a grid boundary can easily be absorbed by the adjacent regions, guaranteeing the routability of the design, the following congestion evaluation formula is used:
congestion(e) = 100 × (ω_e + b_e)/c_e
where c_e is the maximum capacity of edge e, b_e is the routing congestion on edge e, and ω_e is the wiring occupancy on edge e; a congestion below 50% is considered routable, and the goal is to make the congestion degree as small as possible;
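The overflow and congestion measures above can be evaluated per grid edge as in the following sketch (a minimal illustration; the 50% threshold is the routability criterion stated above).

```python
def overflow(w_e, b_e, c_e):
    """Maximum-overflow measure OF(e) = max(w_e + b_e - c_e, 0)."""
    return max(w_e + b_e - c_e, 0.0)

def congestion(w_e, b_e, c_e):
    """Congestion in percent: 100 * (w_e + b_e) / c_e."""
    return 100.0 * (w_e + b_e) / c_e

def routable(edges):
    """edges: iterable of (w_e, b_e, c_e) triples, one per grid edge;
    the layout is treated as routable if every edge is below 50%."""
    return all(congestion(w, b, c) < 50.0 for w, b, c in edges)

print(routable([(1.0, 0.5, 4.0), (0.2, 0.1, 2.0)]))  # True
```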
3) Density constraint: for the density constraint, the space-utilization function designed in this embodiment is applied to the local layout and the global automatic layout; it is formed by splicing two macro cells together, as shown in fig. 3, and is specifically designed as follows:
According to the sorted macro cells, the restriction rules, and the space-utilization function F, macro cell S_1 and macro cell S_2 are combined, the space utilization F after combination is calculated, and the macro cells are merged when the space utilization reaches the preset requirement; the rules are set as follows:
The macro cell S_1 to be merged has: length L_1, width W_1, position P_1, and area S_1 = L_1 × W_1;
The macro cell S_2 to be merged has: length L_2, width W_2, position P_2, and area S_2 = L_2 × W_2;
They are combined into a new macro cell S_N with length L_N, width W_N, position P_N, and area S_N = L_N × W_N,
where L_N and W_N satisfy the rule:
max(L_N, W_N) ≤ min(L, W)
So that the policy network does not place macro cells at positions that would make the density exceed the target density maximum or cause macro overlap, the macro-cell layout satisfies the area constraint:
min S ≤ S_N ≤ max S
The space-utilization function is:
F = (S_1 + S_2)/(L_N × W_N)
where L is the length and W is the width of the layout region; the objective is to make the space utilization F as large as possible.
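A minimal sketch of this merge test follows. The side-by-side packing and the 0.9 utilization threshold are assumptions for illustration; the patent fixes neither the packing orientation nor the preset requirement.

```python
def try_merge(m1, m2, L, W, threshold=0.9):
    """m1, m2: (length, width) pairs. Merge when the combined space
    utilization F = (S1 + S2) / (L_N * W_N) reaches the preset
    requirement and max(L_N, W_N) <= min(L, W) holds."""
    (l1, w1), (l2, w2) = m1, m2
    L_N, W_N = l1 + l2, max(w1, w2)   # side-by-side packing (an assumption)
    if max(L_N, W_N) > min(L, W):
        return None                   # violates the size rule
    F = (l1 * w1 + l2 * w2) / (L_N * W_N)
    return (L_N, W_N) if F >= threshold else None

print(try_merge((4, 3), (4, 3), L=20, W=20))  # (8, 3): F = 24/24 = 1.0
```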
S3-3-3, setting a loss function:
1) Reward-function setting: the total wirelength, the congestion degree, and the waste rate are weighted and summed into a single-objective reward function, where the weighting factors λ_1 and λ_2 mainly balance the influence of the three indices; the reward function for policy-network optimization is:
R = -Wirelength - λ_1·Congestion + λ_2·F
s.t. min S ≤ S_N ≤ max S
where Wirelength denotes the total wirelength, Congestion denotes the total congestion degree, and F denotes the space utilization; λ_1 and λ_2 are the weights of the congestion degree and the space utilization respectively, with 0 ≤ λ_1 ≤ 1, 0 ≤ λ_2 ≤ 1, λ_1 + λ_2 = 1, and λ_1 > λ_2, so congestion is weighted more heavily than the waste rate; that is, routability of the wiring is guaranteed first, and area utilization is considered second;
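A minimal sketch of this reward follows; the particular weights λ_1 = 0.7 and λ_2 = 0.3 are assumptions chosen only to satisfy the stated conditions λ_1 + λ_2 = 1 and λ_1 > λ_2.

```python
def reward(wirelength, congestion_total, F, lam1=0.7, lam2=0.3):
    """R = -Wirelength - lam1 * Congestion + lam2 * F, with
    0 <= lam1, lam2 <= 1, lam1 + lam2 = 1 and lam1 > lam2, so that
    routability outweighs area utilization."""
    assert 0.0 <= lam1 <= 1.0 and 0.0 <= lam2 <= 1.0
    assert abs(lam1 + lam2 - 1.0) < 1e-9 and lam1 > lam2
    return -wirelength - lam1 * congestion_total + lam2 * F

print(reward(wirelength=7.0, congestion_total=30.0, F=0.9))  # -27.73
```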
2) Loss-function setting:
An optimization objective is set, namely the average value at each time step:
J(θ) = (1/T) Σ_{t=1}^{T} E_{π_θ}[R_t]
Taking the gradient of this equation with respect to θ gives:
∇_θ J(θ) = E_{π_θ}[∇_θ log π_θ(s, a) Q(s, a)]
The policy gradient to be optimized is estimated as:
∇_θ J(θ) ≈ (1/T) Σ_{t=1}^{T} ∇_θ log π_θ(s_t, a_t) δ_t
Here ∇_θ log π_θ(s, a) is the score function, which indicates the direction of the parameter update. The Softmax policy function is used, weighing the probability of an action by the linear combination of the features φ(s, a) describing state and action with the parameter θ, i.e.:
π_θ(s, a) = e^{φ(s,a)^T θ} / Σ_{a'} e^{φ(s,a')^T θ}
The score function obtained by derivation is:
∇_θ log π_θ(s, a) = φ(s, a) - E_{π_θ}[φ(s, ·)]
S3-3-4, updating the network parameters to obtain the Actor network parameter θ, the Critic network parameter ω, and the policy gradient estimate ∇_θ J(θ).
Specifically, two identical neural networks are used, as shown in fig. 4. The Critic uses a neural network to output the optimal value V_t and to compute the TD error δ; V_t and the TD error δ are passed to the Actor network. The Actor uses the optimal value V_t to iteratively update the parameter θ of the policy function, selects the next action A, and obtains the feedback R and the new state S_{t+1}. The Critic uses the feedback reward R and the new state S_{t+1} to update the parameter ω of its neural network, and then uses the new network parameters to help the Actor compute the optimal value V_t of the state. Through this loop, the Actor network parameter θ and the Critic network parameter ω are updated continuously until the policy gradient estimate ∇_θ J(θ) converges, after which the final Actor network parameter θ, Critic network parameter ω, and policy gradient estimate ∇_θ J(θ) are output.
The specific process is as follows:
Inputs: the number of iterations T, the state dimension n, the action set A, the step size α, the decay factor γ, the exploration rate ε, the Critic network structure, and the Actor network structure;
the updating process comprises the following steps:
A1. Randomly initialize the values Q corresponding to all states and actions; set i = 1;
A2. Initialize S as the first state of the current state sequence and obtain the feature vector φ(S);
A3. Use φ(S) as the input of the Actor network, output the action set A, obtain the new state S' based on the action set A, and receive the feedback R;
A4. Use φ(S) and φ(S') as inputs of the Critic network to obtain the value outputs V(S) and V(S');
A5. Compute the TD error δ = R + γV(S') - V(S);
A6. Convert between V and Q:
Q(S, A) = R + γV(S')
A7. Compute the policy gradient estimate:
∇_θ J(θ) = ∇_θ log π_θ(S, A) · δ
A8. Update the Critic network parameter ω using the mean-square-error loss function Σ(R + γV(S') - V(S, ω))²;
A9. Update the Actor network parameter θ:
θ ← θ + α ∇_θ log π_θ(S, A) · δ
A10. Judge whether i is smaller than the number of iterations T; if so, set i = i + 1 and return to step A2; otherwise output the latest Critic network parameter ω, Actor network parameter θ, and policy gradient estimate ∇_θ J(θ).
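The loop A1-A10 is sketched below in Python, with linear function approximation standing in for the Actor and Critic neural networks and a small random MDP standing in for the layout environment; all sizes, features, and transitions are illustrative assumptions, and γ is taken slightly below 1 here so the toy values stay bounded. The TD-error, Critic, and Actor updates follow steps A5, A8, and A9.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the layout environment: n_s states, m actions, d features.
n_s, m, d = 8, 4, 6
phi_sa = rng.normal(size=(n_s, m, d))   # actor features phi(S, A)
phi_s  = rng.normal(size=(n_s, d))      # critic features of the state S
P = rng.integers(n_s, size=(n_s, m))    # deterministic toy transitions
R = rng.normal(size=(n_s, m))           # toy feedback rewards

theta = np.zeros(d)                     # Actor network parameter
w     = np.zeros(d)                     # Critic network parameter
alpha, gamma, T = 0.01, 0.99, 2000      # step size, decay factor, iterations

def policy(s):
    logits = phi_sa[s] @ theta
    z = np.exp(logits - logits.max())
    return z / z.sum()

s = 0                                   # A2: initial state
for i in range(T):
    pi = policy(s)
    a = rng.choice(m, p=pi)             # A3: Actor proposes an action
    s2, r = int(P[s, a]), R[s, a]
    v, v2 = phi_s[s] @ w, phi_s[s2] @ w # A4: V(S) and V(S')
    delta = r + gamma * v2 - v          # A5: TD error
    w += alpha * delta * phi_s[s]       # A8: Critic (MSE-gradient) update
    score = phi_sa[s, a] - pi @ phi_sa[s]
    theta += alpha * delta * score      # A9: Actor update
    s = s2
```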
S3-4, normalizing the cell modules with rectangles to obtain macro-cell modules of length L_N, width W_N, and area S_N, and outputting the information sequence:
H_N = {S_N1, S_N2, ..., S_Nn}
where S_N = {L_N, W_N, P_N}; L_N is the length of the updated module, W_N the width of the updated module, and P_N the position information of the module.
S4, judging whether the optimal chip local layout information obtained in the step S3 meets the design rule, if so, entering the step S5, and if not, returning to the step S3 to perform reinforcement learning of the chip local layout again;
S5, performing deep reinforcement learning of the global automatic layout of the chip in combination with the optimal chip local layout information to obtain the optimal chip global automatic layout information; this specifically comprises the following steps:
S5-1, designing a two-level network structure consisting of a public network and local networks. The local network has i worker threads, and the specific network-structure design of a single thread is given in step S3; the public network contains the functions of the Actor network and the Critic network, and the overall neural network model is shown in fig. 5;
S5-2, computing the gradient estimate ∇_θ J(θ) of each local layout, accumulating and summing them, and inputting the Critic network parameter ω and the Actor network parameter θ of the optimal local layout into the public network;
S5-3, updating the public network with the obtained accumulated gradient estimate; if the accumulated gradient estimate converges during the update, outputting the corresponding optimal strategy, otherwise returning to step S5-2;
S5-4, laying out the updated macro modules H_N according to the optimal strategy, finally completing the global automatic layout; the updated module information sequence is H_NN, and the global automatic layout information H_NN is output.
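A minimal sketch of the public-network update of steps S5-2 and S5-3 follows; the learning rate, tolerance, and function names are illustrative assumptions. Each local (per-thread) layout contributes one policy-gradient estimate, and the public Actor parameters are updated with their accumulated sum until the estimate converges.

```python
import numpy as np

def update_public_network(theta_pub, local_grads, alpha=0.01, tol=1e-4):
    """Accumulate the per-thread gradient estimates (S5-2) and apply
    them to the public Actor parameters (S5-3); report convergence
    when the accumulated estimate is sufficiently small."""
    g = np.sum(local_grads, axis=0)       # accumulated gradient estimate
    theta_pub = theta_pub + alpha * g
    return theta_pub, bool(np.linalg.norm(g) < tol)

theta_pub = np.zeros(6)
converged = False
while not converged:
    # In the embodiment each worker thread runs the local layout of
    # step S3; small random vectors stand in for those gradients here.
    local_grads = [np.random.default_rng(k).normal(size=6) * 1e-5
                   for k in range(4)]
    theta_pub, converged = update_public_network(theta_pub, local_grads)
```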
S6, performing filling layout according to the optimal chip global automatic layout information obtained in the step S5 to obtain an optimal chip global automatic layout effect; the method specifically comprises the following steps:
S6-1, inputting the global automatic layout information H_NN into a force-directed solver, where H_NN contains the macro-cell module information such as position, wirelength, and congestion;
S6-2, using the force-directed method to fill in the discrete standard cell cluster B = {b_1, b_2, ..., b_n}; under the continuous action of attraction and repulsion, the discrete standard cells b_i keep moving and approach equilibrium until no further relative displacement occurs, while the energy is continuously dissipated and finally approaches zero;
S6-3, outputting the optimal chip global automatic layout effect, as shown in fig. 6;
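The force-directed filling of step S6-2 can be sketched as the damped iteration below (all constants and the specific attraction/repulsion forms are assumptions for illustration): cells are attracted toward the placed-macro region and repel one another, and the damping steadily dissipates energy until the displacements vanish.

```python
import numpy as np

def force_directed_fill(pos, fixed, iters=500,
                        k_attr=0.05, k_rep=0.5, damp=0.9):
    """Move discrete standard cells (pos: (n, 2) array) under attraction
    to the center of the fixed macros (fixed: (f, 2) array) and pairwise
    repulsion, damping the motion until near-equilibrium."""
    pos = pos.astype(float).copy()
    vel = np.zeros_like(pos)
    center = fixed.mean(axis=0)
    for _ in range(iters):
        force = k_attr * (center - pos)            # attraction
        diff = pos[:, None, :] - pos[None, :, :]   # pairwise repulsion
        dist2 = (diff ** 2).sum(-1) + 1e-6
        force += k_rep * (diff / dist2[..., None]).sum(axis=1)
        vel = damp * (vel + force)                 # energy dissipates
        pos += vel
        if np.abs(vel).max() < 1e-4:               # relative motion stops
            break
    return pos

cells = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
macros = np.array([[4.0, 4.0], [6.0, 6.0]])
print(force_directed_fill(cells, macros))
```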
S7, judging whether the optimal chip global automatic layout effect obtained in the step S6 meets the design rules; if so, adopting the optimal chip global automatic layout information to perform the global automatic layout of the chip, and otherwise, returning to the step S5 to continue the deep reinforcement learning of the chip global automatic layout.
The above-described embodiments are only preferred embodiments of the present invention, and the protection scope of the present invention is not limited thereby; all changes made according to the shape and principle of the present invention should be covered within the protection scope of the present invention.

Claims (9)

1. The chip global automatic layout method based on deep reinforcement learning is characterized by comprising the following steps:
S1, inputting chip layout information;
S2, preprocessing the chip layout information, wherein the chip layout information comprises design rules;
S3, performing reinforcement learning on the local layout of the chip to obtain the optimal chip local layout information;
S4, judging whether the optimal chip local layout information obtained in the step S3 meets the design rules; if so, entering the step S5, otherwise returning to the step S3 to perform the reinforcement learning of the chip local layout again;
S5, performing deep reinforcement learning of the global automatic layout of the chip in combination with the optimal chip local layout information to obtain the optimal chip global automatic layout information;
S6, performing filling layout according to the optimal chip global automatic layout information obtained in the step S5 to obtain the optimal chip global automatic layout effect;
S7, judging whether the optimal chip global automatic layout effect obtained in the step S6 meets the design rules; if so, adopting the optimal chip global automatic layout information to perform the global automatic layout of the chip, and otherwise, returning to the step S5 to continue the deep reinforcement learning of the chip global automatic layout.
2. The chip global automatic layout method based on deep reinforcement learning of claim 1, wherein the preprocessing of layout information comprises:
S2-1, grid preprocessing: setting the grid cells as squares and establishing a rectangular coordinate system with x as the horizontal axis and y as the vertical axis, wherein in the grid an edge is denoted by e, the routing capacity of an edge is denoted by c_e, and the center-point information of the i-th grid cell G is denoted G_i = {x_i, y_i, c_e_i}; and setting the number of grid cells;
S2-2, macro-cell preprocessing: regarding each macro cell as a rectangle, sorting the macro cells by size with a quick-sort algorithm, and forming the sorted results into an ordered sequence set used as the input set:
H = {S_i, i = 1, ..., N}
where S_i = (L_i, W_i, P_i) is a tuple representing the area of a macro cell together with its position information; L_i denotes the length of the macro cell, W_i the width of the macro cell, and P_i the position information of the macro cell, i.e., P_i = {x_i, y_i}; N denotes the total number of macro cells;
S2-3, standard-cell preprocessing: the standard cells are divided into two cell clusters:
1) standard cells attached to a macro cell H_i form an attached standard cell cluster, denoted B_i, with B_i = {b_i1, b_i2, ..., b_in};
2) standard cells not attached to any macro cell form a discrete standard cell cluster, denoted B, with B = {b_1, b_2, ..., b_n};
S2-4, setting the design rules.
3. The deep reinforcement learning-based chip global automatic layout method according to claim 2, wherein the reinforcement learning local layout comprises:
S3-1, inputting into the layout area the macro-cell sequence H = {S_i, i = 1, ..., N} and its attached standard cell clusters B_i = {b_i1, b_i2, ..., b_in}, which are randomly dispersed in the form of aggregate clusters;
S3-2, for each macro cell S_i and its randomly placed attached standard cell cluster B_i, performing an initial layout with the electrostatic-system local layout model, so that the attached standard cell cluster B_i performs a dispersing movement and the macro cell S_i and the attached standard cell cluster B_i reach overall electrostatic balance, forming an initial local layout information sequence state S;
S3-3, extracting feature information from the initial layout information state S obtained after the initial layout of the electrostatic-system local layout model, denoting the feature information by φ(S), inputting φ(S) into an Actor-Critic reinforcement learning network, obtaining the optimal layout strategy through network training, outputting the optimal initial local layout according to the optimal layout strategy, and outputting the Actor network parameter θ and the Critic network parameter ω corresponding to the optimal strategy;
S3-4, normalizing the cell modules with rectangles to obtain macro-cell modules of length L_N, width W_N, and area S_N, and outputting the information sequence:
H_N = {S_N1, S_N2, ..., S_Nn}
where S_N = {L_N, W_N, P_N}; L_N is the length of the updated module, W_N the width of the updated module, and P_N the position information of the module.
4. The deep reinforcement learning-based chip global automatic layout method according to claim 3, wherein in the step S3-3, the specific setting and steps are as follows:
S3-3-1, Markov decision process:
1) State S: the initial local layout information sequence state formed by the electrostatic-system local layout model, including the macro-cell information S_i and its attached standard cell cluster B_i, their lengths and widths, and their position information in the grid;
2) Action set A: the set of possible actions taken by all standard cells;
3) Decay factor γ: γ is set to 1, indicating that all subsequent rewards carry the same weight as the current reward;
4) Exploration rate ε: value iteration is performed with the ε-greedy method, i.e., a small value of ε is set; with probability 1-ε the action currently believed to have the maximum action value is selected greedily, and with probability ε an action is selected at random from all m selectable actions; formulated as:
π(a|s) = 1 - ε + ε/m, if a = argmax_{a'} Q(s, a'); π(a|s) = ε/m, otherwise
where a denotes an action and s denotes a state;
s3-3-2, constraint setting;
s3-3-3, setting a loss function;
S3-3-4, updating the network parameters to obtain the Actor network parameter θ, the Critic network parameter ω, and the policy gradient estimate ∇_θ J(θ).
5. The deep reinforcement learning-based chip global automatic layout method according to claim 4, wherein the constraint setting comprises:
1) Wirelength constraint:
The half-perimeter wirelength (HPWL) is adopted; it is the closest approximation to the Steiner tree and gives the lowest routing cost. The calculation formula is:
HPWL(i) = (max_{b∈i}{x_b} - min_{b∈i}{x_b}) + (max_{b∈i}{y_b} - min_{b∈i}{y_b})
where x_b and y_b denote the x and y coordinates of pin b of net i. The HPWL(i) are summed; to improve the convergence rate of the wirelength model and the accuracy of the index judgment, the total wirelength between the macro cells and the standard cells is normalized by a normalization factor q, giving the normalized total wirelength formula:
HPWL = (1/q) Σ_{i∈netlist} HPWL(i)
One of the goals is to make HPWL as small as possible; netlist denotes the set of nets;
2) Congestion constraint:
The maximum overflow is used as the congestion measure to evaluate whether the layout is routable; it is expressed as OF(e) = max(ω_e + b_e - c_e, 0). So that overflow at a grid boundary can easily be absorbed by the adjacent regions, guaranteeing the routability of the design, the following congestion evaluation formula is used:
congestion(e) = 100 × (ω_e + b_e)/c_e
where c_e is the maximum capacity of edge e, b_e is the routing congestion on edge e, and ω_e is the wiring occupancy on edge e; a congestion below 50% is considered routable, and the goal is to make the congestion degree as small as possible;
3) Density constraint: for the density constraint, the designed space-utilization function is applied in the local layout, specifically as follows:
According to the sorted macro cells, the restriction rules, and the space-utilization function F, macro cell S_1 and macro cell S_2 are combined, the space utilization F after combination is calculated, and the macro cells are merged when the space utilization reaches the preset requirement; the rules are set as follows:
The macro cell S_1 to be merged has: length L_1, width W_1, position P_1, and area S_1 = L_1 × W_1;
The macro cell S_2 to be merged has: length L_2, width W_2, position P_2, and area S_2 = L_2 × W_2;
They are combined into a new macro cell S_N with length L_N, width W_N, position P_N, and area S_N = L_N × W_N,
where L_N and W_N satisfy the rule:
max(L_N, W_N) ≤ min(L, W)
So that the policy network does not place macro cells at positions that would make the density exceed the target density maximum or cause macro overlap, the macro-cell layout satisfies the area constraint:
min S ≤ S_N ≤ max S
The space-utilization function is:
F = (S_1 + S_2)/(L_N × W_N)
where L is the length and W is the width of the layout region; the objective is to make the space utilization F as large as possible.
6. The deep reinforcement learning-based chip global automatic layout method according to claim 5, wherein the setting of the loss function comprises:
1) Reward-function setting: the total wirelength, the congestion degree, and the waste rate are weighted and summed into a single-objective reward function, where the weighting factors λ_1 and λ_2 mainly balance the influence of the three indices; the reward function for policy-network optimization is:
R = -Wirelength - λ_1·Congestion + λ_2·F
s.t. min S ≤ S_N ≤ max S
where Wirelength denotes the total wirelength, Congestion denotes the total congestion degree, and F denotes the space utilization; λ_1 and λ_2 are the weights of the congestion degree and the space utilization respectively, with 0 ≤ λ_1 ≤ 1, 0 ≤ λ_2 ≤ 1, λ_1 + λ_2 = 1, and λ_1 > λ_2, so congestion is weighted more heavily than the waste rate; that is, routability of the wiring is guaranteed first, and area utilization is considered second;
2) Loss-function setting:
An optimization objective is set, namely the average value at each time step:
J(θ) = (1/T) Σ_{t=1}^{T} E_{π_θ}[R_t]
Taking the gradient of this equation with respect to θ gives:
∇_θ J(θ) = E_{π_θ}[∇_θ log π_θ(s, a) Q(s, a)]
The policy gradient to be optimized is estimated as:
∇_θ J(θ) ≈ (1/T) Σ_{t=1}^{T} ∇_θ log π_θ(s_t, a_t) δ_t
Here ∇_θ log π_θ(s, a) is the score function, which indicates the direction of the parameter update. The Softmax policy function is used, weighing the probability of an action by the linear combination of the features φ(s, a) describing state and action with the parameter θ, i.e.:
π_θ(s, a) = e^{φ(s,a)^T θ} / Σ_{a'} e^{φ(s,a')^T θ}
The score function obtained by derivation is:
∇_θ log π_θ(s, a) = φ(s, a) - E_{π_θ}[φ(s, ·)]
7. the deep reinforcement learning-based chip global automatic layout method according to claim 6, wherein the step S3-3-4 comprises:
Inputs: the number of iterations T, the state dimension n, the action set A, the step size α, the decay factor γ, the exploration rate ε, the Critic network structure, and the Actor network structure;
the updating process comprises the following steps:
A1. Randomly initialize the values Q corresponding to all states and actions; set i = 1;
A2. Initialize S as the first state of the current state sequence and obtain the feature vector φ(S);
A3. Use φ(S) as the input of the Actor network, output the action set A, obtain the new state S' based on the action set A, and receive the feedback R;
A4. Use φ(S) and φ(S') as inputs of the Critic network to obtain the value outputs V(S) and V(S');
A5. Compute the TD error δ = R + γV(S') - V(S);
A6. Convert between V and Q:
Q(S, A) = R + γV(S')
A7. Compute the policy gradient estimate:
∇_θ J(θ) = ∇_θ log π_θ(S, A) · δ
A8. Update the Critic network parameter ω using the mean-square-error loss function Σ(R + γV(S') - V(S, ω))²;
A9. Update the Actor network parameter θ:
θ ← θ + α ∇_θ log π_θ(S, A) · δ
A10. Judge whether i is smaller than the number of iterations T; if so, set i = i + 1 and return to step A2; otherwise output the latest Critic network parameter ω, Actor network parameter θ, and policy gradient estimate ∇_θ J(θ).
8. The deep reinforcement learning-based chip global automatic layout method according to claim 7, wherein the step S5 comprises:
S5-1, designing a two-level network structure consisting of a public network and local networks; the public network contains the functions of the Actor network and the Critic network;
S5-2, computing the gradient estimate ∇_θ J(θ) of each local layout, accumulating and summing them, and inputting the Critic network parameter ω and the Actor network parameter θ of the optimal local layout into the public network;
S5-3, updating the public network with the obtained accumulated gradient estimate; if the accumulated gradient estimate converges during the update, outputting the corresponding optimal strategy, otherwise returning to step S5-2;
S5-4, laying out the updated macro modules H_N according to the optimal strategy, finally completing the global automatic layout; the updated module information sequence is H_NN, and the global automatic layout information H_NN is output.
9. The deep reinforcement learning-based chip global automatic layout method according to claim 8, wherein the step S6 comprises:
S6-1, inputting the global automatic layout information H_NN into a force-directed solver;
S6-2, using the force-directed method to fill in the discrete standard cell cluster B = {b_1, b_2, ..., b_n}; under the continuous action of attraction and repulsion, the discrete standard cells b_i keep moving and approach equilibrium until no further relative displacement occurs, while the energy is continuously dissipated and finally approaches zero;
S6-3, outputting the optimal chip global automatic layout effect.
CN202210718626.0A 2022-06-23 2022-06-23 Chip global automatic layout method based on deep reinforcement learning Pending CN115270698A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210718626.0A CN115270698A (en) 2022-06-23 2022-06-23 Chip global automatic layout method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210718626.0A CN115270698A (en) 2022-06-23 2022-06-23 Chip global automatic layout method based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN115270698A true CN115270698A (en) 2022-11-01

Family

ID=83762285

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210718626.0A Pending CN115270698A (en) 2022-06-23 2022-06-23 Chip global automatic layout method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN115270698A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116738923A (en) * 2023-04-04 2023-09-12 暨南大学 Chip layout optimization method based on reinforcement learning with constraint
CN116738923B (en) * 2023-04-04 2024-04-05 暨南大学 Chip layout optimization method based on reinforcement learning with constraint
CN116562218A (en) * 2023-05-05 2023-08-08 之江实验室 Method and system for realizing layout planning of rectangular macro-cells based on reinforcement learning
CN116562218B (en) * 2023-05-05 2024-02-20 之江实验室 Method and system for realizing layout planning of rectangular macro-cells based on reinforcement learning
CN116911245A (en) * 2023-07-31 2023-10-20 曲阜师范大学 Layout method, system, equipment and storage medium of integrated circuit
CN116911245B (en) * 2023-07-31 2024-03-08 曲阜师范大学 Layout method, system, equipment and storage medium of integrated circuit
CN117972812A (en) * 2024-03-26 2024-05-03 中国石油大学(华东) Engineering drawing layout optimization method, device, equipment and medium
CN117972812B (en) * 2024-03-26 2024-06-07 中国石油大学(华东) Engineering drawing layout optimization method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination