CN115270698A - Chip global automatic layout method based on deep reinforcement learning - Google Patents
- Publication number
- CN115270698A (application number CN202210718626.0A)
- Authority
- CN
- China
- Prior art keywords
- layout
- chip
- macro
- information
- global automatic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/30—Circuit design
- G06F30/39—Circuit design at the physical level
- G06F30/398—Design verification or optimisation, e.g. using design rule check [DRC], layout versus schematics [LVS] or finite element methods [FEM]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/30—Circuit design
- G06F30/39—Circuit design at the physical level
- G06F30/392—Floor-planning or layout, e.g. partitioning or placement
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a chip global automatic layout method based on deep reinforcement learning that can rapidly place a very-large-scale integrated (VLSI) circuit chip, guarantees convergence of the layout result so that placement completes quickly, and makes the wirelength, congestion and area of the placement-and-routing result near-optimal. In addition, a space-utilization method is provided within the global automatic layout method and applied to both local layout and global automatic layout, so that the areas of both are minimized. Furthermore, by applying an asynchronous training network structure and training through it, the coupling between local layout and global automatic layout becomes tighter, the layout result converges more easily, and reliable global automatic chip layout can be achieved.
Description
Technical Field
The invention relates to the technical field of electronic design automation, in particular to a chip global automatic layout method based on deep reinforcement learning.
Background
Nowadays, with the rapid development of integrated circuits, the problems faced by Electronic Design Automation (EDA) technology are increasingly complex, and the circuit scale and the amount of data to be processed keep growing. Whether EDA technology can develop fast enough to keep pace with the rapid advances in design and manufacturing processes has become a critical issue. Placement and routing is an important and time-consuming step in the physical design phase of an integrated circuit. First, the placement process involves a large number of iterations and optimizations, and the time required can significantly affect the integrated circuit design cycle. Second, the steps of integrated circuit physical design are closely related: the placement result affects routability and routing-stage parameters such as running time, degree of congestion and routing completion rate. In recent years, besides wirelength- and delay-driven algorithms, routability-driven placement algorithms have also received attention. Despite significant advances in placement algorithms over the past few decades, fast and efficient placement remains a challenging problem.
Global automatic layout is a long-standing challenge in chip design, requiring multi-objective optimization of increasingly complex circuits. To solve the chip layout problem, researchers have proposed solver-based approaches, including nonlinear optimizers, the more advanced quadratic methods developed after the rise of modern analytical techniques, and more recently electrostatics-based and related methods, which update cell locations in a gradient-optimization scheme and can typically handle millions of standard cells by parallelizing on the CPU and using partitioning to reduce runtime. Google also presented the first end-to-end learning method for macro placement, modeling chip placement as a sequential decision problem. A Japanese patent applies Q-learning to layout and routing design; a deep-learning-based routability-driven placement algorithm has also been proposed in the literature (DrPlace: deep-learning-based routability-driven placement algorithm [J]. Journal of Computer-Aided Design & Computer Graphics, 2021, 33(04): 624-631). Although previous work performed the heavy numerical computation of this very-large-scale optimization problem on the CPU, there remains room for improvement in both layout quality and layout speed.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a chip global automatic layout method based on deep reinforcement learning, with the goal of achieving rapid global automatic layout of a chip and obtaining a near-optimal solution by applying deep reinforcement learning to very-large-scale integrated circuits.
In order to achieve the purpose, the technical scheme provided by the invention is as follows:
the chip global automatic layout method based on deep reinforcement learning comprises the following steps:
s1, inputting chip layout information;
s2, preprocessing the chip layout information, wherein the chip layout information comprises design rules;
s3, performing reinforcement learning on the local layout of the chip to obtain optimal local layout information of the chip;
s4, judging whether the optimal chip local layout information obtained in the step S3 meets the design rule, if so, entering the step S5, and if not, returning to the step S3 to perform reinforcement learning of the chip local layout again;
s5, performing deep reinforcement learning of the global automatic layout of the chip by combining the optimal local layout information of the chip to obtain the optimal global automatic layout information of the chip;
s6, performing filling layout according to the optimal chip global automatic layout information obtained in the step S5 to obtain an optimal chip global automatic layout effect;
and S7, judging whether the optimal global automatic layout effect of the chip obtained in the step S6 meets the design rule, if so, adopting the optimal global automatic layout information of the chip to perform global automatic layout of the chip, and otherwise, returning to the step S5 to continue deep reinforcement learning of the global automatic layout of the chip.
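The S1-S7 flow above can be sketched as a simple control loop. This is an illustrative Python sketch only; the callables `preprocess`, `local_rl`, `global_rl`, `fill_layout` and `rules_ok` are hypothetical placeholders standing in for steps S2-S7, not names from the patent.

```python
# Illustrative sketch of the S1-S7 control flow; all callables are
# hypothetical placeholders supplied by the caller.

def global_placement(layout_info, preprocess, local_rl, global_rl,
                     fill_layout, rules_ok, max_rounds=10):
    """Run local RL until the design rules pass (S3/S4), then global RL
    plus fill layout until the rules pass again (S5-S7)."""
    data = preprocess(layout_info)                 # S2: preprocessing
    for _ in range(max_rounds):                    # S3/S4 loop
        local = local_rl(data)
        if rules_ok(local):
            break
    else:
        return None                                # rules never satisfied
    for _ in range(max_rounds):                    # S5-S7 loop
        filled = fill_layout(global_rl(local))     # S5 global RL, S6 fill
        if rules_ok(filled):                       # S7 design-rule check
            return filled
    return None
```

The two nested loops mirror the two design-rule checks (S4 and S7), each of which sends control back to the corresponding learning step on failure.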
Further, preprocessing the layout information includes:
S2-1, grid preprocessing: set each grid cell as a square and establish a rectangular coordinate system with x as the horizontal axis and y as the vertical axis; in the grid, an edge is denoted by e and the routing capacity of an edge by c_e, and the center-point information of the i-th cell G is recorded as G_i = {x_i, y_i, c_ei}; set the number of grid cells;
S2-2, macro cell preprocessing: regard each macro cell as a rectangle, sort the macro cells by size with a quicksort algorithm, and form the sorted results into a sequence set used as the input set:
H = {S_i, i = 1, ..., N}
where S_i = (L_i, W_i, P_i) is a tuple representing the area of the macro cell together with its position information: L_i is the length of the macro cell, W_i its width, and P_i its position information, i.e. P_i = {x_i, y_i}; N denotes the total number of macro cells;
S2-3, standard cell preprocessing: standard cells are divided into two cell clusters:
1) Standard cells attached to a macro cell H_i form an attached standard cell cluster, denoted B_i, with B_i = {b_i1, b_i2, ..., b_in};
2) Standard cells not attached to any macro cell form a discrete standard cell cluster, denoted B, with B = {b_1, b_2, ..., b_n};
S2-4, design rules.
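The grid and macro preprocessing of S2-1/S2-2 can be sketched as follows. This is a minimal illustration under assumptions: Python's built-in `sorted` (Timsort) stands in for the quicksort named in the text, the dict layout of a grid cell is invented for the example, and the 2^n × 2^n grid count follows the embodiment described later.

```python
# Minimal sketch of S2-1/S2-2 preprocessing; data layouts are assumptions.

def build_grid(n, capacity):
    """Build a 2^n x 2^n square grid; each cell G_i stores its centre
    (x_i, y_i) and the routing capacity c_e of its edges."""
    size = 2 ** n
    return [{"x": i + 0.5, "y": j + 0.5, "ce": capacity}
            for j in range(size) for i in range(size)]

def preprocess_macros(macros):
    """Sort macro cells S_i = (L_i, W_i, P_i) by area, largest first,
    forming the input sequence set H = {S_i, i = 1..N}."""
    return sorted(macros, key=lambda s: s[0] * s[1], reverse=True)
```

Sorting largest-first matches the "large before small" layout principle given in the embodiment's design rules.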
Further, the reinforcement learning of the local layout comprises:
S3-1, input into the layout area the macro cell sequence H = {S_i, i = 1, ..., N} and its attached standard cell clusters B_i = {b_i1, b_i2, ..., b_in}, randomly dispersed in the form of aggregate clusters;
S3-2, for each macro cell S_i and its randomly placed attached standard cell cluster B_i, perform an initial layout with the electrostatic-system local layout model so that the attached standard cell cluster B_i spreads out and macro cell S_i together with cluster B_i reaches overall electrostatic equilibrium, forming the initial local layout information sequence state S;
S3-3, extract feature information φ(S) from the initial layout information state S obtained after the initial layout of the electrostatic-system local layout model, input φ(S) into an Actor-Critic reinforcement learning network, obtain the optimal layout strategy through network training, output the optimal initial local layout according to that strategy, and output the Actor network parameter θ and the Critic network parameter ω corresponding to the optimal strategy;
S3-4, normalize the cell modules with rectangles to obtain macro cell modules of length L_N, width W_N and area S_N, and output the information sequence:
H_N = {S_N1, S_N2, ..., S_Nn}
where S_N = {L_N, W_N, P_N}: L_N is the updated module length, W_N the updated module width, and P_N the module position information.
Further, in the step S3-3, the specific setting and steps are as follows:
s3-3-1, markov decision:
1) State S: the initial local layout information sequence state formed by the electrostatic-system local layout model, including the macro cell information S_i and its attached standard cell cluster B_i, their lengths and widths, and their position information in the grid;
2) Action set A: the set of possible actions taken by all standard cells;
3) Attenuation factor γ: γ is set to 1, indicating that all subsequent states carry the same weight as the current reward;
4) Exploration rate ε: perform value iteration with the ε-greedy method, i.e. set a small ε value, greedily select the action currently believed to have the maximum action value with probability 1 − ε, and select an action uniformly at random among all m selectable actions with probability ε; formulated as:
where a represents an action and s represents a state;
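The ε-greedy rule described in item 4) can be sketched directly. A minimal Python sketch; the list-of-Q-values representation is an assumption for illustration.

```python
import random

def epsilon_greedy(q_values, eps):
    """epsilon-greedy selection: with probability 1 - eps take the action
    with the maximum current action value; with probability eps pick
    uniformly among all m selectable actions."""
    if random.random() < eps:
        return random.randrange(len(q_values))               # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit
```

With ε = 0 the rule is purely greedy; a small positive ε keeps occasional random exploration of the m actions.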
s3-3-2, constraint setting;
s3-3-3, setting a loss function;
S3-3-4, update the network parameters to obtain the Actor network parameter θ, the Critic network parameter ω and the policy gradient estimate
Further, the constraint setting includes:
1) Wirelength constraint:
The half-perimeter wirelength (HPWL) is adopted; it is the closest approximation to a Steiner tree and gives the lowest routing cost. The calculation formula is:
HPWL(i) = (max_{b∈i} x_b − min_{b∈i} x_b) + (max_{b∈i} y_b − min_{b∈i} y_b)
where x_b and y_b are the x and y coordinates of element b of net i; HPWL(i) is summed over all nets. To improve the convergence rate of the wirelength model and the accuracy of the index judgement, the sum of the total wirelengths between macro cells and standard cells is normalized by a normalization factor q; the normalized total wirelength formula is:
One of the goals is to make HPWL as small as possible; netlist denotes the set of nets;
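The HPWL formula above can be implemented in a few lines. A minimal sketch; representing a net as a list of (x, y) pin coordinates is an assumption for illustration.

```python
def hpwl(pins):
    """Half-perimeter wirelength of one net: the half perimeter of the
    bounding box of its pin coordinates (x_b, y_b)."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def total_hpwl(netlist, q=1.0):
    """Sum HPWL(i) over all nets in the netlist and normalize by the
    factor q, as in the normalized total-wirelength formula."""
    return sum(hpwl(net) for net in netlist) / q
```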
2) And (3) congestion constraint:
Evaluate whether the layout is routable using maximum overflow as the congestion measure, expressed as OF(e) = max(ω_e + b_e − c_e, 0); in order for overflow at a grid boundary to be easily absorbed by adjacent regions, ensuring routability of the design, the following congestion evaluation formula is used:
congestion(e) = 100 × (ω_e + b_e) / c_e
where c_e is the maximum capacity of edge e, b_e is the routing congestion on edge e and ω_e is the wiring occupancy on edge e; congestion below 50% is considered routable, and the goal is to make the congestion as small as possible;
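The two congestion measures above translate directly into code. A minimal sketch of the overflow and congestion formulas as stated, with scalar edge quantities.

```python
def overflow(w_e, b_e, c_e):
    """Maximum-overflow measure OF(e) = max(w_e + b_e - c_e, 0)."""
    return max(w_e + b_e - c_e, 0)

def congestion(w_e, b_e, c_e):
    """congestion(e) = 100 * (w_e + b_e) / c_e; an edge below 50 is
    treated as routable per the text."""
    return 100.0 * (w_e + b_e) / c_e
```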
3) Density constraint: for the density constraint, the designed space utilization function is applied in the local layout, specifically as follows:
According to the sorted macro cells, the constraint rules and the space utilization function F, macro cells S_1 and S_2 are combined, the space utilization F after combination is calculated, and the macro cells are merged when the utilization reaches the preset requirement. The rules are:
Macro cell S_1 to be merged: length L_1, width W_1, position P_1, area S_1 = L_1 × W_1;
Macro cell S_2 to be merged: length L_2, width W_2, position P_2, area S_2 = L_2 × W_2;
The merged new macro cell S_N: length L_N, width W_N, position P_N, area S_N = L_N × W_N;
where L_N and W_N satisfy the rule:
max(L_N, W_N) ≤ min(L, W)
So that the policy network does not place macro cells at positions that would push the density above the target maximum or cause macro overlap, the macro cell layout satisfies the following area constraint:
The space utilization function is:
where L is the length and W is the width; the objective is to make the space utilization F as large as possible.
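The merge rule above can be sketched as follows. Note the assumptions: the two macros are placed side by side along one axis, and since the text does not reproduce the exact formula of F, the sketch takes F as occupied area over bounding-rectangle area, which rewards tight packing as the text intends.

```python
def merge_macros(m1, m2, bound_l, bound_w):
    """Try to merge two macro cells (L, W) side by side into S_N.
    Returns (L_N, W_N, F), or None when the merged size violates
    max(L_N, W_N) <= min(L, W).  F is assumed here to be occupied area
    over bounding-rectangle area (the patent's exact F is not given)."""
    l1, w1 = m1
    l2, w2 = m2
    ln, wn = l1 + l2, max(w1, w2)       # bounding rectangle of the pair
    if max(ln, wn) > min(bound_l, bound_w):
        return None                      # size rule violated
    f = (l1 * w1 + l2 * w2) / (ln * wn)  # fraction of rectangle occupied
    return ln, wn, f
```

Two macros of equal width merge with F = 1 (no wasted area); mismatched widths lower F, so the preset utilization threshold rejects poor pairings.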
Further, the loss function setting includes:
1) Reward function setting: the total wirelength, the congestion and the waste rate are weighted and summed into a single-objective reward function, where the weighting factors λ_1 and λ_2 mainly balance the influence of the three indices. The reward function for policy network optimization is:
R = −Wirelength − λ_1 · Congestion + λ_2 · F
s.t. min S ≤ S_N ≤ max S
where Wirelength is the total wirelength, Congestion the total congestion and F the space utilization; λ_1 and λ_2 are the weights of congestion and space utilization respectively, with 0 ≤ λ_1 ≤ 1, 0 ≤ λ_2 ≤ 1, λ_1 + λ_2 = 1 and λ_1 > λ_2, i.e. congestion is weighted more heavily than the waste rate: routability of the wiring is ensured first and area utilization is considered second;
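The single-objective reward above is a one-line computation once the three indices are known. A minimal sketch that also checks the stated weight constraints.

```python
def reward(wirelength, congestion_total, f, lam1, lam2):
    """R = -Wirelength - lam1*Congestion + lam2*F, with the stated
    constraints 0 <= lam2 < lam1 <= 1 and lam1 + lam2 = 1 (congestion
    weighted above space utilization)."""
    assert 0 <= lam2 < lam1 <= 1 and abs(lam1 + lam2 - 1.0) < 1e-9
    return -wirelength - lam1 * congestion_total + lam2 * f
```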
2) Setting a loss function:
Set the optimization function target; the optimization target is set to the average value at each time step, i.e.
The gradient of this equation with respect to θ is as follows:
estimating the policy gradient to be optimized; the score function indicates the direction of parameter update. It uses the Softmax policy function, weighing the probability of an action by the linear combination of the feature vector φ(s, a), which describes state and action, with the parameter θ, namely:
The score function obtained by derivation is:
Further, the step S3-3-4 includes:
Input: number of iterations T, state dimension n, action set A, step size α, attenuation factor γ, exploration rate ε, the Critic network structure and the Actor network structure;
the updating process comprises the following steps:
A1, randomly initialize the values Q for all states and actions, and set i = 1;
A2, initialize S as the first state of the current state sequence and obtain the feature vector φ(S);
A3, use φ(S) as the input of the Actor network, output the action set A, obtain the new state S' based on A, and receive the feedback R;
A4, use φ(S) and φ(S') as inputs of the Critic network to obtain the Q-value outputs V(S) and V(S');
A5, calculate the TD error δ = R + γV(S') − V(S);
A8, use the mean-squared error loss function Σ(R + γV(S') − V(S, ω))² to update the Critic network parameter ω;
A10, judge whether i is smaller than the iteration count T; if so, set i = i + 1 and return to step A2; otherwise output the latest Critic network parameter ω, the Actor network parameter θ and the policy gradient estimate
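The TD-error and Critic-loss computations of steps A5 and A8 can be sketched numerically. A minimal illustration with scalar value estimates standing in for the Critic network outputs.

```python
def td_error(r, v_s, v_s_next, gamma=1.0):
    """Step A5: delta = R + gamma * V(S') - V(S)."""
    return r + gamma * v_s_next - v_s

def critic_loss(transitions, gamma=1.0):
    """Step A8: summed squared TD loss, (R + gamma*V(S') - V(S))^2 over
    (R, V(S), V(S')) transitions."""
    return sum((r + gamma * vn - v) ** 2 for r, v, vn in transitions)
```

With γ = 1, as set in the Markov-decision parameters above, every future reward counts fully in the TD target.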
Further, the step S5 includes:
S5-1, design a two-layer network structure consisting of a public network and local networks; the public network comprises the functions of an Actor network and a Critic network;
S5-2, compute the policy gradient estimate of each local layout, accumulate and sum them, and input the obtained Critic network parameter ω and the locally optimal Actor network parameter θ into the public network;
S5-3, update the public network with the obtained accumulated gradient estimate; if the accumulated gradient estimate converges during updating, output the corresponding optimal strategy, otherwise return to step S5-2;
S5-4, lay out the updated macro modules H_N according to the optimal strategy, finally completing the global automatic layout; the updated module information sequence is H_NN, and the global automatic layout information H_NN is output.
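The accumulate-then-update scheme of S5-2/S5-3 can be sketched with flat parameter vectors. A minimal illustration; plain gradient descent with a learning rate stands in for whatever optimizer the public network actually uses.

```python
def accumulate_gradients(local_grads):
    """S5-2: accumulate and sum the per-worker (local-layout)
    policy-gradient estimates into one gradient vector."""
    total = [0.0] * len(local_grads[0])
    for grad in local_grads:
        for i, g in enumerate(grad):
            total[i] += g
    return total

def update_public(params, grad_sum, lr=0.01):
    """S5-3: apply the accumulated gradient to the public network's
    parameters (gradient-descent step; lr is illustrative)."""
    return [p - lr * g for p, g in zip(params, grad_sum)]
```

This mirrors asynchronous Actor-Critic training: each local worker contributes a gradient, and only the public network's parameters are updated from the sum.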
Further, the step S6 includes:
S6-1, input the global automatic layout information H_NN into a force-directed resolver;
S6-2, use the force-directed method to fill in the discrete standard cell cluster B = {b_1, b_2, ..., b_n}; under the continuous action of attraction and repulsion, each discrete standard cell b_i keeps moving until it no longer undergoes relative displacement, continuously dissipating energy until the net force approaches zero and equilibrium is reached;
S6-3, output the optimal global automatic layout effect of the chip.
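The force-directed fill of S6-2 can be sketched as an iterative relaxation. This is a minimal sketch under assumptions: each discrete cell is attracted to an anchor point (e.g. its target region) by a spring force and repelled from other cells by an inverse-distance force; the constants k_a, k_r and step are illustrative, not from the patent.

```python
def force_directed_fill(cells, anchors, iters=300, k_a=0.1, k_r=0.5, step=0.1):
    """Move each discrete standard cell b_i under attraction toward its
    anchor and pairwise repulsion from other cells until it settles
    (relative displacement tends to zero)."""
    pts = [list(p) for p in cells]
    for _ in range(iters):
        for i, p in enumerate(pts):
            ax, ay = anchors[i]
            fx = k_a * (ax - p[0])           # spring attraction to anchor
            fy = k_a * (ay - p[1])
            for j, q in enumerate(pts):
                if j == i:
                    continue
                dx, dy = p[0] - q[0], p[1] - q[1]
                d2 = dx * dx + dy * dy + 1e-9
                fx += k_r * dx / d2          # inverse-distance repulsion
                fy += k_r * dy / d2
            p[0] += step * fx
            p[1] += step * fy
    return [tuple(p) for p in pts]
```

As the cells approach their anchors the net force shrinks toward zero, matching the equilibrium condition described in S6-2.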
Compared with the prior art, the principle and the advantages of the scheme are as follows:
the scheme can quickly arrange the ultra-large scale integrated circuit chip, can ensure the convergence of the arrangement result to realize quick arrangement, and leads the wire length, congestion and area of the arrangement wiring to be approximately optimal. In addition, the scheme also provides a method for space utilization rate, and the method is applied to local layout and global automatic layout, so that the areas of the local layout and the global automatic layout are minimized.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required for the embodiments or the prior-art descriptions are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic flow chart of a chip global automatic layout method based on deep reinforcement learning according to the present invention;
FIG. 2 is a schematic diagram of the initial layout of the electrostatic-system local layout model (U: macro cells and their attached cells randomly scattered locally; V: the initial layout information obtained by the local layout);
FIG. 3 is a schematic diagram of space utilization;
FIG. 4 is a schematic diagram of the Actor-Critic neural network for the local layout;
FIG. 5 is a diagram illustrating a hierarchical reinforcement learning global automatic layout structure;
FIG. 6 is a diagram of global automatic layout filling.
Detailed Description
The invention is further illustrated by the following specific examples:
As shown in FIG. 1, the chip global automatic layout method based on deep reinforcement learning of this embodiment comprises the following steps:
s1, inputting chip layout information;
s2, preprocessing the chip layout information, including:
S2-1, grid preprocessing: set each grid cell as a square and establish a rectangular coordinate system with x as the horizontal axis and y as the vertical axis; in the grid, an edge is denoted by e and the routing capacity of an edge by c_e, and the center-point information of the i-th cell G is G_i = {x_i, y_i, c_ei}; the number of grid cells is set to 2^n × 2^n, where n is a positive integer;
S2-2, macro cell preprocessing: regard each macro cell as a rectangle, sort the macro cells by size with a quicksort algorithm, and form the sorted results into a sequence set used as the input set:
H = {S_i, i = 1, ..., N}
where S_i = (L_i, W_i, P_i) is a tuple representing the area of the macro cell together with its position information: L_i is the length of the macro cell, W_i its width, and P_i its position information, i.e. P_i = {x_i, y_i}; N denotes the total number of macro cells;
S2-3, standard cell preprocessing: standard cells are divided into two cell clusters:
1) Standard cells attached to a macro cell H_i form an attached standard cell cluster, denoted B_i, with B_i = {b_i1, b_i2, ..., b_in};
2) Standard cells not attached to any macro cell form a discrete standard cell cluster, denoted B, with B = {b_1, b_2, ..., b_n};
S2-4, design rules:
1) Follow the layout principle of large before small and difficult before easy, i.e. macro cell circuits and core cells should be laid out first;
2) The layout should satisfy the following requirements as far as possible: the total interconnect length as short as possible, with the critical signal lines shortest;
3) Density-first principle: routing starts from the region with the densest and most complex connection relations;
4) The layout is optimized according to the standards of uniform distribution, balanced center of gravity and a tidy layout.
S3, performing reinforcement learning on the local layout of the chip to obtain optimal local layout information of the chip;
In this step, the reinforcement learning of the local layout includes:
S3-1, input into the layout area the macro cell sequence H = {S_i, i = 1, ..., N} and its attached standard cell clusters B_i = {b_i1, b_i2, ..., b_in}, randomly dispersed in the form of aggregate clusters;
S3-2, for each macro cell S_i and its randomly placed attached standard cell cluster B_i, perform an initial layout with the electrostatic-system local layout model so that the attached standard cell cluster B_i spreads out and macro cell S_i together with cluster B_i reaches overall electrostatic equilibrium, forming the initial local layout information sequence state S, as shown in FIG. 2;
S3-3, extract feature information φ(S) from the initial layout information state S obtained after the initial layout of the electrostatic-system local layout model, input φ(S) into an Actor-Critic reinforcement learning network, obtain the optimal layout strategy through network training, output the optimal initial local layout according to that strategy, and output the Actor network parameter θ and the Critic network parameter ω corresponding to the optimal strategy;
the specific setting and steps of the step are as follows:
S3-3-1, Markov decision:
1) State S: the initial local layout information sequence state formed by the electrostatic-system local layout model, including the macro cell information S_i and its attached standard cell cluster B_i, their lengths and widths, and their position information in the grid;
2) Action set A: the set of possible actions taken by all standard cells;
3) Attenuation factor γ: γ is set to 1, indicating that all subsequent states carry the same weight as the current reward;
4) Exploration rate ε: perform value iteration with the ε-greedy method, i.e. set a small ε value, greedily select the action currently believed to have the maximum action value with probability 1 − ε, and select an action uniformly at random among all m selectable actions with probability ε; formulated as:
where a represents an action and s represents a state;
s3-3-2, constraint setting:
1) Wirelength constraint:
The half-perimeter wirelength (HPWL) is adopted; it is the closest approximation to a Steiner tree and gives the lowest routing cost. The calculation formula is:
HPWL(i) = (max_{b∈i} x_b − min_{b∈i} x_b) + (max_{b∈i} y_b − min_{b∈i} y_b)
where x_b and y_b are the x and y coordinates of element b of net i; HPWL(i) is summed over all nets. To improve the convergence rate of the wirelength model and the accuracy of the index judgement, the sum of the total wirelengths between macro cells and standard cells is normalized by a normalization factor q; the normalized total wirelength formula is:
One of the goals is to make HPWL as small as possible; netlist denotes the set of nets;
2) Congestion constraint:
Evaluate whether the layout is routable using maximum overflow as the congestion measure, expressed as OF(e) = max(ω_e + b_e − c_e, 0); in order for overflow at a grid boundary to be easily absorbed by adjacent regions, ensuring routability of the design, the following congestion evaluation formula is used:
congestion(e) = 100 × (ω_e + b_e) / c_e
where c_e is the maximum capacity of edge e, b_e is the routing congestion on edge e and ω_e is the wiring occupancy on edge e; congestion below 50% is considered routable, and the goal is to make the congestion as small as possible;
3) Density constraint: for the density constraint, the space utilization function designed in this embodiment is applied to the local layout and the global automatic layout; the space utilization function is formed by splicing two macro cells, as shown in FIG. 3, and is specifically designed as follows:
According to the sorted macro cells, the constraint rules and the space utilization function F, macro cells S_1 and S_2 are combined, the space utilization F after combination is calculated, and the macro cells are merged when the utilization reaches the preset requirement. The rules are:
Macro cell S_1 to be merged: length L_1, width W_1, position P_1, area S_1 = L_1 × W_1;
Macro cell S_2 to be merged: length L_2, width W_2, position P_2, area S_2 = L_2 × W_2;
The merged new macro cell S_N: length L_N, width W_N, position P_N, area S_N = L_N × W_N;
where L_N and W_N satisfy the rule:
max(L_N, W_N) ≤ min(L, W)
So that the policy network does not place macro cells at positions that would push the density above the target maximum or cause macro overlap, the macro cell layout satisfies the following area constraint:
The space utilization function is:
where L is the length and W is the width; the objective is to make the space utilization F as large as possible.
S3-3-3, setting a loss function:
1) Reward function setting: the total wirelength, the congestion and the waste rate are weighted and summed into a single-objective reward function, where the weighting factors λ_1 and λ_2 mainly balance the influence of the three indices. The reward function for policy network optimization is:
R = −Wirelength − λ_1 · Congestion + λ_2 · F
s.t. min S ≤ S_N ≤ max S
where Wirelength is the total wirelength, Congestion the total congestion and F the space utilization; λ_1 and λ_2 are the weights of congestion and space utilization respectively, with 0 ≤ λ_1 ≤ 1, 0 ≤ λ_2 ≤ 1, λ_1 + λ_2 = 1 and λ_1 > λ_2, i.e. congestion is weighted more heavily than the waste rate: routability of the wiring is ensured first and area utilization is considered second;
2) Loss function setting:
Set the optimization function target; the optimization target is set to the average value at each time step, i.e.
The gradient of this equation with respect to θ is as follows:
estimating the policy gradient to be optimized; the score function indicates the direction of parameter update. It uses the Softmax policy function, weighing the probability of an action by the linear combination of the feature vector φ(s, a), which describes state and action, with the parameter θ, namely:
The score function obtained by derivation is:
S3-3-4, update the network parameters to obtain the Actor network parameter θ, the Critic network parameter ω and the policy gradient estimate
Specifically, two identical neural networks are used, as shown in FIG. 4. The Critic uses a neural network to output the optimal value V_t and to compute the TD error δ; V_t and δ are passed to the Actor network, and the Actor uses the optimal value V_t to iteratively update the policy-function parameter θ, selects a further action A, and obtains the feedback R and the new state S_{t+1}. The Critic uses the feedback reward R and the new state S_{t+1} to update the parameter ω of its neural network, and then uses the new network parameters to help the Actor compute the optimal state value V_t. The Actor network parameter θ and the Critic network parameter ω are updated continuously through this loop until the policy gradient estimate converges; the final Actor network parameter θ, Critic network parameter ω and policy gradient estimate are then output
The specific process is as follows:
Input: number of iterations T, state dimension n, action set A, step size α, attenuation factor γ, exploration rate ε, the Critic network structure and the Actor network structure;
the updating process comprises the following steps:
A1, randomly initializing the value Q corresponding to every state and action, and setting i = 1;
A2, initializing S as the first state of the current state sequence, and obtaining its feature vector phi(S);
A3, using phi(S) as the input of the Actor network, outputting the action set A, obtaining a new state S' based on the action set A, and obtaining the feedback reward R;
A4, using phi(S) and phi(S') respectively as inputs of the Critic network, obtaining the value outputs V(S) and V(S');
A5, calculating the TD error delta = R + gamma·V(S') − V(S);
A6, using the mean square error loss function Σ(R + gamma·V(S') − V(S, omega))^2 to update the Critic network parameter omega;
A7, judging whether i is smaller than the number of iterations T; if so, setting i = i + 1 and returning to step A2; otherwise, outputting the latest Critic network parameter omega, the Actor network parameter theta and the policy gradient estimate.
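As a concrete illustration, the update loop A1 to A7 can be sketched with a linear softmax policy (Actor) and a linear value function (Critic). The toy environment, the feature map phi and all dimensions below are illustrative assumptions, not part of the disclosed method:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3                  # state feature dimension, number of actions
theta = np.zeros((m, n))     # Actor: softmax policy parameters
omega = np.zeros(n)          # Critic: linear value-function parameters
alpha, gamma = 0.05, 0.9     # step size, decay factor

def phi(s):                  # feature vector phi(S) (identity map; an assumption)
    return s

def policy(s):               # softmax over linear scores theta_a · phi(s)
    z = theta @ phi(s)
    p = np.exp(z - z.max())
    return p / p.sum()

def step(s, a):              # toy transition: action nudges state, reward favours small states
    s2 = np.clip(s + 0.1 * (a - 1) + rng.normal(scale=0.05, size=n), -1.0, 1.0)
    return s2, -np.abs(s2).sum()

s = rng.uniform(-1, 1, n)
for t in range(200):                          # A1/A2: iterate from an initial state
    p = policy(s)
    a = rng.choice(m, p=p)                    # A3: Actor picks an action
    s2, r = step(s, a)
    v, v2 = omega @ phi(s), omega @ phi(s2)   # A4: value outputs V(S), V(S')
    delta = r + gamma * v2 - v                # A5: TD error
    omega += alpha * delta * phi(s)           # A6: Critic update (MSE gradient)
    grad_log = -p[:, None] * phi(s)[None, :]  # score function: phi(s,a) - E[phi(s,·)]
    grad_log[a] += phi(s)
    theta += alpha * delta * grad_log         # Actor update along the policy gradient
    s = s2
```

The score-function term matches the softmax policy used later in the description: the gradient of log pi for the chosen action is its feature vector minus the probability-weighted average of all action features.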
S3-4, normalizing the cell modules with rectangles to obtain macro cell modules of length L_N, width W_N and area S_N, and outputting the information sequence:
H_N = {S_N1, S_N2, ..., S_Nn}
wherein S_N = {L_N, W_N, P_N}; L_N is the length of the updated module, W_N is the width of the updated module, and P_N is the position information of the module.
S4, judging whether the optimal chip local layout information obtained in the step S3 meets the design rule, if so, entering the step S5, and if not, returning to the step S3 to perform reinforcement learning of the chip local layout again;
S5, performing deep reinforcement learning of the global automatic layout of the chip by combining the optimal local layout information of the chip to obtain the optimal global automatic layout information of the chip; the method specifically comprises the following steps:
S5-1, designing a two-layer network structure, namely a public network and local networks; the local networks comprise i worker threads, the specific network structure of a single thread being designed as in step S3; the public network comprises the functions of the Actor network and the Critic network, and the overall neural network model is shown in FIG. 5;
S5-2, calculating the policy gradient estimate of each local layout, accumulating and summing them, and inputting the accumulated estimate together with the Critic network parameter omega and the Actor network parameter theta of the optimal local layout into the public network;
S5-3, updating the public network with the accumulated gradient estimate; if the accumulated gradient estimate converges during the updating, outputting the corresponding optimal policy, otherwise returning to step S5-2;
S5-4, laying out the updated macro modules H_N according to the optimal policy to finally complete the global automatic layout; the updated module information sequence is H_NN, and the global automatic layout information H_NN is output.
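Steps S5-2 and S5-3 amount to an A3C-style accumulation of per-worker gradient estimates into a shared (public) network. The sketch below uses flat parameter vectors, and `local_gradients` is a hypothetical placeholder for a worker thread's policy gradient estimate, purely to illustrate the accumulate-then-update flow:

```python
import numpy as np

dim = 8                          # illustrative parameter dimension (assumption)
public_theta = np.zeros(dim)     # public (shared) Actor parameters
public_omega = np.zeros(dim)     # public (shared) Critic parameters
alpha = 0.01                     # public-network step size

def local_gradients(i, rng):
    """Stand-in for worker thread i's (Actor, Critic) gradient estimates."""
    return rng.normal(size=dim), rng.normal(size=dim)

rng = np.random.default_rng(1)
num_workers = 4
for it in range(100):
    g_theta = np.zeros(dim)
    g_omega = np.zeros(dim)
    for i in range(num_workers):          # S5-2: accumulate each local estimate
        gt, go = local_gradients(i, rng)
        g_theta += gt
        g_omega += go
    public_theta += alpha * g_theta       # S5-3: update the public network
    public_omega += alpha * g_omega
```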
S6, performing filling layout according to the optimal chip global automatic layout information obtained in the step S5 to obtain an optimal chip global automatic layout effect; the method specifically comprises the following steps:
S6-1, inputting the global automatic layout information H_NN into a force-directed solver, H_NN comprising the position, wire length, congestion and other information of the macro cell modules;
S6-2, using the force-directed method to fill in the discrete standard cell cluster B = {b_1, b_2, ..., b_n}; under the continuous action of attraction and repulsion, the discrete standard cells b_i keep moving and approach equilibrium until no relative displacement occurs any more, the energy being continuously dissipated and finally approaching zero;
S6-3, outputting the optimal chip global automatic layout effect, as shown in FIG. 6;
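The filling of step S6-2 can be sketched as a force-directed relaxation in which attraction acts along connections and repulsion acts between all cell pairs; the toy connectivity, the gains and the step size below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n_cells = 20
pos = rng.uniform(0.0, 10.0, size=(n_cells, 2))          # discrete standard cells b_i
# Toy ring connectivity between neighbouring cells (an assumption for illustration)
nets = [(i, (i + 1) % n_cells) for i in range(n_cells)]
k_attr, k_rep, eta = 0.05, 0.5, 0.05                     # attraction gain, repulsion gain, step size

for _ in range(300):
    force = np.zeros_like(pos)
    for i, j in nets:                    # attraction pulls connected cells together
        d = pos[j] - pos[i]
        force[i] += k_attr * d
        force[j] -= k_attr * d
    for i in range(n_cells):             # repulsion pushes nearby cells apart
        d = pos - pos[i]
        dist2 = (d ** 2).sum(axis=1) + 1e-6
        force[i] -= (k_rep * d / dist2[:, None]).sum(axis=0)
    step = eta * force
    pos += step
    if np.abs(step).max() < 1e-4:        # near equilibrium: no further relative displacement
        break
```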
S7, judging whether the optimal chip global automatic layout effect obtained in step S6 meets the design rule; if so, adopting the optimal chip global automatic layout information to perform the global automatic layout of the chip; otherwise, returning to step S5 to continue the deep reinforcement learning of the chip global automatic layout.
The above-described embodiments are only preferred embodiments of the present invention, and the scope of the present invention is not limited thereby; all changes made according to the shape and principle of the present invention shall be covered within the scope of the present invention.
Claims (9)
1. The chip global automatic layout method based on deep reinforcement learning is characterized by comprising the following steps:
S1, inputting chip layout information;
S2, preprocessing the chip layout information, wherein the chip layout information comprises design rules;
S3, performing reinforcement learning on the local layout of the chip to obtain optimal local layout information of the chip;
S4, judging whether the optimal chip local layout information obtained in the step S3 meets the design rule, if so, entering a step S5, otherwise, returning to the step S3 to perform reinforcement learning of the chip local layout again;
S5, performing deep reinforcement learning of the global automatic layout of the chip by combining the optimal local layout information of the chip to obtain the optimal global automatic layout information of the chip;
S6, performing filling layout according to the optimal chip global automatic layout information obtained in the step S5 to obtain an optimal chip global automatic layout effect;
S7, judging whether the optimal chip global automatic layout effect obtained in step S6 meets the design rule; if so, adopting the optimal chip global automatic layout information to perform the global automatic layout of the chip; otherwise, returning to step S5 to continue the deep reinforcement learning of the chip global automatic layout.
2. The chip global automatic layout method based on deep reinforcement learning of claim 1, wherein the preprocessing of layout information comprises:
S2-1, grid preprocessing: setting each grid cell as a square and establishing a rectangular coordinate system with horizontal axis x and vertical axis y; in the grid, an edge is denoted by e and the wiring capacity between the edges by c_e; the information of the centre point of the i-th grid cell G is denoted by G_i = {x_i, y_i, ce_i}; setting the number of grid cells;
S2-2, macro cell preprocessing: regarding each macro cell as a rectangle, sorting the macro cells by size with a quicksort algorithm, and taking the sorting result as the input set of a sorted sequence:
H = {S_i, i = 1, ..., N}
wherein S_i = (L_i, W_i, P_i) is a tuple representing the area of a macro cell with its position information; L_i denotes the length of the macro cell, W_i the width of the macro cell, and P_i the position information of the macro cell, i.e. P_i = {x_i, y_i}; N denotes the total number of macro cells;
S2-3, standard cell preprocessing: the standard cells are divided into two cell clusters:
1) standard cells attached to a macro cell H_i form the attached standard cell cluster, denoted B_i, i.e. B_i = {b_i1, b_i2, ..., b_in};
2) standard cells not attached to any macro cell form the discrete standard cell cluster, denoted B, i.e. B = {b_1, b_2, ..., b_n};
S2-4, setting design rules.
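The preprocessing of steps S2-1 and S2-2 can be sketched as follows; the macro dimensions and grid size are toy values, and Python's built-in sort (Timsort) stands in for the quicksort named in S2-2:

```python
from dataclasses import dataclass

@dataclass
class Macro:
    """A macro cell S_i = (L_i, W_i, P_i): rectangle with position information."""
    length: float
    width: float
    x: float
    y: float

    @property
    def area(self):
        return self.length * self.width

# Toy macro set (dimensions and positions are illustrative assumptions)
macros = [Macro(4, 2, 0, 0), Macro(1, 1, 3, 3), Macro(3, 3, 5, 1)]

# S2-2: sort macro cells by size; the built-in sort replaces the quicksort
# named in the patent for brevity.
H = sorted(macros, key=lambda s: s.area, reverse=True)

# S2-1: centre-point records G_i = {x_i, y_i, ce_i} for a square grid
grid_n, cap = 4, 10
G = [{"x": i % grid_n + 0.5, "y": i // grid_n + 0.5, "ce": cap}
     for i in range(grid_n * grid_n)]
```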
3. The deep reinforcement learning-based chip global automatic layout method according to claim 2, wherein the reinforcement learning local layout comprises:
S3-1, inputting into the layout area the macro cell sequence H = {S_i, i = 1, ..., N} and its attached standard cell clusters B_i = {b_i1, b_i2, ..., b_in}, randomly scattered in the form of aggregate clusters;
S3-2, for each macro cell S_i and its randomly placed attached standard cell cluster B_i, performing an initial layout using the electrostatic-system local layout model, so that the attached standard cell cluster B_i moves in a distributed manner and the macro cell S_i and the attached standard cell cluster B_i form an overall electrostatic equilibrium, yielding the initial local layout information sequence state S;
S3-3, extracting feature information phi(S) from the initial layout information state S obtained after the initial layout of the electrostatic-system local layout model, inputting phi(S) into the Actor-Critic reinforcement learning network, obtaining the optimal layout policy through network training, outputting the optimal initial local layout according to the optimal layout policy, and outputting the Actor network parameter theta and the Critic network parameter omega corresponding to the optimal policy;
S3-4, normalizing the cell modules with rectangles to obtain macro cell modules of length L_N, width W_N and area S_N, and outputting the information sequence:
H_N = {S_N1, S_N2, ..., S_Nn}
wherein S_N = {L_N, W_N, P_N}; L_N is the length of the updated module, W_N is the width of the updated module, and P_N is the position information of the module.
4. The deep reinforcement learning-based chip global automatic layout method according to claim 3, wherein in the step S3-3, the specific setting and steps are as follows:
S3-3-1, Markov decision:
1) State S: the initial local layout information sequence state formed by the electrostatic-system local layout model, comprising the macro cell information S_i and its attached standard cell cluster B_i, their lengths and widths, and their position information in the grid;
2) Action set A: a set of actions that all standard cells may take;
3) Decay factor gamma: setting gamma to 1, indicating that all subsequent states are weighted equally with the current reward;
4) Exploration rate epsilon: performing value iteration with the epsilon-greedy method, i.e. setting a small epsilon value, greedily selecting the action currently considered to have the maximum action value with probability 1 − epsilon, and randomly selecting among all m selectable actions with probability epsilon; formulated as:
pi(a|s) = 1 − epsilon + epsilon/m, if a = argmax_{a'} Q(s, a'); pi(a|s) = epsilon/m, otherwise
wherein a represents an action and s represents a state;
S3-3-2, constraint setting;
S3-3-3, loss function setting;
5. The deep reinforcement learning-based chip global automatic layout method according to claim 4, wherein the constraint setting comprises:
1) Wire length constraint:
The half-perimeter wire length (HPWL), which is closest to the Steiner tree and has the lowest routing cost, is adopted; the calculation formula is:
HPWL(i) = (max_{b∈i}{x_b} − min_{b∈i}{x_b}) + (max_{b∈i}{y_b} − min_{b∈i}{y_b})
wherein x_b and y_b denote the x and y coordinates of the cells b of net i, and the HPWL(i) are summed; to improve the convergence rate of the wire length model and the accuracy of the index judgement, the total wire length between the macro cells and the standard cells is normalized by a normalization factor q, giving the normalized total wire length formula:
Wirelength = (1/q) Σ_{i∈netlist} HPWL(i)
one of the objectives is to make the HPWL as small as possible; netlist denotes the set of nets;
2) Congestion constraint:
The maximum-overflow mode is used as the congestion measure to evaluate whether the layout is routable; the maximum overflow is expressed as: OF(e) = max(omega_e + b_e − c_e, 0); in order that overflow at a grid boundary can easily be absorbed by the adjacent area, ensuring the routability of the design, the following congestion evaluation formula is used:
congestion(e) = 100 × (omega_e + b_e)/c_e
wherein c_e is the maximum capacity of the edge e, b_e is the routing congestion on the edge e, and omega_e is the wiring occupation on the edge e; a congestion of less than 50% is considered routable, and the objective is to make the congestion as small as possible;
3) Density constraint: for the density constraint, a space utilization function is designed and applied in the local layout, specifically as follows:
according to the sorted macro cells, the set rules and the space utilization function F, the macro cell S_1 and the macro cell S_2 are combined, the space utilization F after combination is calculated, and the macro cells are merged when the space utilization reaches the preset requirement; the set rules are as follows:
the information of the macro cell S_1 to be merged is: length L_1, width W_1, position P_1, area S_1 = L_1 × W_1;
the information of the macro cell S_2 to be merged is: length L_2, width W_2, position P_2, area S_2 = L_2 × W_2;
they are combined into a new macro cell S_N: length L_N, width W_N, position P_N, area S_N = L_N × W_N;
wherein L_N and W_N satisfy the following rule:
max(L_N, W_N) ≤ min(L, W)
In order that the policy network does not place the macro cells in positions that would cause the density to exceed the maximum target density or cause macro overlap, the layout of the macro cells satisfies the following area constraint:
min S ≤ S_N ≤ max S
The space utilization function is:
F = (S_1 + S_2)/(L × W)
where L is the length and W is the width; the objective is to make the space utilization F as large as possible.
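The wire-length and congestion measures above can be sketched directly from the formulas; the function names and the toy net coordinates are illustrative assumptions:

```python
def hpwl(net):
    """Half-perimeter wire length HPWL(i) over the grid coordinates of net i."""
    xs = [x for x, _ in net]
    ys = [y for _, y in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def overflow(w_e, b_e, c_e):
    """Maximum-overflow measure OF(e) = max(w_e + b_e - c_e, 0)."""
    return max(w_e + b_e - c_e, 0)

def congestion(w_e, b_e, c_e):
    """congestion(e) = 100 * (w_e + b_e) / c_e; below 50 is treated as routable."""
    return 100.0 * (w_e + b_e) / c_e

# Toy net spanning three grid points (coordinates are illustrative)
net = [(0, 0), (3, 1), (1, 4)]
print(hpwl(net))                              # (3 - 0) + (4 - 0) = 7
print(congestion(w_e=3, b_e=1, c_e=10) < 50)  # routable under the 50% rule
```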
6. The deep reinforcement learning-based chip global automatic layout method according to claim 5, wherein the setting of the loss function comprises:
1) Reward function R setting: the total wire length, the congestion degree and the waste rate are weighted and summed into a single-objective reward function, wherein the weighting factors lambda_1 and lambda_2 are mainly used to balance the influence of the three indices; the reward function for policy network optimization is:
R = −Wirelength − lambda_1 · Congestion + lambda_2 · F
s.t. min S ≤ S_N ≤ max S
wherein Wirelength denotes the total wire length, Congestion denotes the total congestion degree, and F denotes the space utilization; lambda_1 and lambda_2 are the weights of the congestion degree and the space utilization respectively, with 0 ≤ lambda_1 ≤ 1, 0 ≤ lambda_2 ≤ 1, lambda_1 + lambda_2 = 1 and lambda_1 > lambda_2; the weight of congestion is higher than that of the waste rate, i.e. the routability of the wiring is guaranteed first while the utilization of the area is also considered;
2) Loss function setting:
the optimization objective is set to the average value at each time step, i.e.
J(theta) = Σ_s d^{pi_theta}(s) Σ_a pi_theta(s, a) R(s, a)
The gradient of this equation with respect to theta is:
∇_theta J(theta) = E_{pi_theta}[∇_theta log pi_theta(s, a) · Q^{pi_theta}(s, a)]
The policy gradient estimate is used for optimization; ∇_theta log pi_theta(s, a) is the score function, indicating the direction in which the parameters are updated; the Softmax policy function is used, and the probability of an action occurring is weighed by the linear combination of the features phi(s, a), describing state and action, with the parameters theta, namely:
pi_theta(s, a) = e^{phi(s,a)^T theta} / Σ_b e^{phi(s,b)^T theta}
The score function obtained by derivation is:
∇_theta log pi_theta(s, a) = phi(s, a) − E_{pi_theta}[phi(s, ·)]
7. the deep reinforcement learning-based chip global automatic layout method according to claim 6, wherein the step S3-3-4 comprises:
inputting the number of iterations T, the state dimension n, the action set A, the step size alpha, the decay factor gamma, the exploration rate epsilon, the Critic network structure and the Actor network structure;
the updating process comprises the following steps:
A1, randomly initializing the value Q corresponding to every state and action, and setting i = 1;
A2, initializing S as the first state of the current state sequence, and obtaining its feature vector phi(S);
A3, using phi(S) as the input of the Actor network, outputting the action set A, obtaining a new state S' based on the action set A, and obtaining the feedback reward R;
A4, using phi(S) and phi(S') respectively as inputs of the Critic network, obtaining the value outputs V(S) and V(S');
A5, calculating the TD error delta = R + gamma·V(S') − V(S);
A6, using the mean square error loss function Σ(R + gamma·V(S') − V(S, omega))^2 to update the Critic network parameter omega.
8. The deep reinforcement learning-based chip global automatic layout method according to claim 7, wherein the step S5 comprises:
S5-1, designing a two-layer network structure, namely a public network and local networks; the public network comprises the functions of the Actor network and the Critic network;
S5-2, calculating the policy gradient estimate of each local layout, accumulating and summing them, and inputting the accumulated estimate together with the Critic network parameter omega and the Actor network parameter theta of the optimal local layout into the public network;
S5-3, updating the public network with the accumulated gradient estimate; if the accumulated gradient estimate converges during the updating, outputting the corresponding optimal policy, otherwise returning to step S5-2;
S5-4, laying out the updated macro modules H_N according to the optimal policy to finally complete the global automatic layout; the updated module information sequence is H_NN, and the global automatic layout information H_NN is output.
9. The deep reinforcement learning-based chip global automatic layout method according to claim 8, wherein the step S6 comprises:
S6-1, inputting the global automatic layout information H_NN into a force-directed method solver;
S6-2, using the force-directed method to fill in the discrete standard cell cluster B = {b_1, b_2, ..., b_n}; under the continuous action of attraction and repulsion, the discrete standard cells b_i keep moving and approach equilibrium until no relative displacement occurs any more, the energy being continuously dissipated and finally approaching zero;
S6-3, outputting the optimal chip global automatic layout effect.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210718626.0A CN115270698A (en) | 2022-06-23 | 2022-06-23 | Chip global automatic layout method based on deep reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115270698A true CN115270698A (en) | 2022-11-01 |
Family
ID=83762285
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210718626.0A Pending CN115270698A (en) | 2022-06-23 | 2022-06-23 | Chip global automatic layout method based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115270698A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116562218A (en) * | 2023-05-05 | 2023-08-08 | 之江实验室 | Method and system for realizing layout planning of rectangular macro-cells based on reinforcement learning |
CN116738923A (en) * | 2023-04-04 | 2023-09-12 | 暨南大学 | Chip layout optimization method based on reinforcement learning with constraint |
CN116911245A (en) * | 2023-07-31 | 2023-10-20 | 曲阜师范大学 | Layout method, system, equipment and storage medium of integrated circuit |
CN117972812A (en) * | 2024-03-26 | 2024-05-03 | 中国石油大学(华东) | Engineering drawing layout optimization method, device, equipment and medium |
CN117972812B (en) * | 2024-03-26 | 2024-06-07 | 中国石油大学(华东) | Engineering drawing layout optimization method, device, equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||