CN115270698A

CN115270698A - Chip global automatic layout method based on deep reinforcement learning

Info

Publication number: CN115270698A
Application number: CN202210718626.0A
Authority: CN
Inventors: 陈学松; 敖启缘; 蔡述庭; 张丽丽
Original assignee: Guangdong University of Technology
Current assignee: Guangdong University of Technology
Priority date: 2022-06-23
Filing date: 2022-06-23
Publication date: 2022-11-01

Abstract

The invention discloses a chip global automatic layout method based on deep reinforcement learning, which can quickly lay out a super-large-scale integrated circuit chip, can ensure the convergence of a layout result to realize quick layout, and enables the line length, congestion and area of layout and wiring to be approximately optimal. In addition, a space utilization method is provided in the chip global automatic layout method, and the space utilization method is applied to local layout and global automatic layout, so that the areas of the local layout and the global automatic layout are minimized. And in addition, by applying an asynchronous training network structure and learning and training through the network structure, the correlation between the local layout and the global automatic layout is tighter, the layout result is easier to converge, and the reliability of the global automatic layout of the chip can be realized.

Description

Chip global automatic layout method based on deep reinforcement learning

Technical Field

The invention relates to the technical field of electronic design automation, in particular to a chip global automatic layout method based on deep reinforcement learning.

Background

Nowadays, with the rapid development of integrated circuits, the problems faced by Electronic Design Automation (EDA) technology are increasingly complex, and the circuit scale and the amount of data to be processed are increasing. The speed with which EDA technology can be developed to keep pace with the rapid advances in design and manufacturing processes has become a critical issue. Placing the wiring is an important and time consuming step in the physical design phase of the integrated circuit. First, the layout process involves a large number of iterations and optimizations, and the time required can significantly impact the integrated circuit design cycle time. Secondly, in the physical design of an integrated circuit, there is a close relationship between the steps, and the result of the layout affects the routability and the parameters of the routing process such as the running time, the degree of congestion and the routing rate. In recent years, in addition to being driven by line length and time delay, attention has been paid to a layout algorithm driven by routability. Despite significant advances in layout algorithms over the past few decades, fast and efficient layout remains a challenging problem.

Global automatic layout is a long-standing challenge in chip design, requiring multi-objective optimization of increasingly complex circuits. To solve the layout problem in the chip, the researchers have proposed parser-based solutions, including non-linear optimizers, as well as more advanced quadratic methods developed after the rise of modern analytical techniques, and more recently, electrostatic-based methods and alternative methods, which are based on updating the location of cells in a gradient optimization scheme, and can typically handle millions of standard cells by parallelization on the CPU using partitions to reduce runtime. Google also presented the first end-to-end learning method for macro placement that modeled chip placement as a sequential decision problem. The japanese happy village ministry patent uses Q learning to design the layout and wiring; deep learning based routability-driven placement algorithms are also proposed in the literature (Haui, chuiyi, zhouqiang, king Rui. DrPlace: deep learning based routability-driven placement algorithms [ J ]. Computer aided design and graphics newspapers, 2021,33 (04): 624-631). Although previous work performed the heavy numerical computation of the very large scale optimization problem on the CPU, there was room for optimization improvements in both layout quality and speed of layout.

Disclosure of Invention

The invention aims to overcome the defects of the prior art, provides a chip global automatic layout method based on deep reinforcement learning, and aims to realize rapid global automatic layout of a chip and obtain an approximately optimal solution by applying the deep reinforcement learning in a super-large-scale integrated circuit.

In order to achieve the purpose, the technical scheme provided by the invention is as follows:

the chip global automatic layout method based on deep reinforcement learning comprises the following steps:

s1, inputting chip layout information;

s2, preprocessing the chip layout information, wherein the chip layout information comprises design rules;

s3, performing reinforcement learning on the local layout of the chip to obtain optimal local layout information of the chip;

s4, judging whether the optimal chip local layout information obtained in the step S3 meets the design rule, if so, entering the step S5, and if not, returning to the step S3 to perform reinforcement learning of the chip local layout again;

s5, performing deep reinforcement learning of the global automatic layout of the chip by combining the optimal local layout information of the chip to obtain the optimal global automatic layout information of the chip;

s6, performing filling layout according to the optimal chip global automatic layout information obtained in the step S5 to obtain an optimal chip global automatic layout effect;

and S7, judging whether the optimal global automatic layout effect of the chip obtained in the step S6 meets the design rule, if so, adopting the optimal global automatic layout information of the chip to perform global automatic layout of the chip, and otherwise, returning to the step S5 to continue deep reinforcement learning of the global automatic layout of the chip.

Further, the layout information is preprocessed, including:

s2-1, grid preprocessing: setting the grid as square and establishing a rectangular coordinate system with x as the horizontal axis and y as the vertical axis, wherein in the grid, the edge is represented by e, and the edge is arranged in a way of being parallel to the edgeWiring capacity between c_eShowing that the center point information of the ith lattice G is marked as G_i＝{x_i，y_i，c_ei}, setting a grid number;

s2-2, macro unit pretreatment: each macro unit is regarded as a rectangle, the macro units are sorted according to the size by using a quick sorting algorithm, and a sorting sequence set is formed by the sorting result and serves as an input set:

H＝{S_i，i＝1，...，N}

wherein S is_i＝(L_i，W_i，P_i) Representing the area of the macro-cell with position information, L, for a tuple_iRepresents the length of the macro-cell, W_iWidth, P, of macro-cell_iIndicating location information of macro-cells, i.e. P_i＝{x_i，y_iN denotes the total number of macro-cells;

s2-3, standard unit pretreatment: standard cells are divided into two cell clusters:

1) Dependent macrocell H_iThe standard cell is an attached standard cell cluster marked as B_iThen B is_i＝{b_i1，，b_i2，...，b_in}；

2) If the standard cell not attached to the macro cell is a discrete standard cell cluster, denoted as B, then B = { B =₁，b₂，...，b_n}；

S2-4, designing rules.

Further, the reinforcement learning local layout comprises:

s3-1, inputting macro cell sequence H = { S ] into layout area_iI = 1.. Ang.N } and its attached standard cell cluster B_i＝{b_i1，b_i2，...，b_inAnd randomly dispersing in a form of aggregate clusters;

s3-2, for each macrocell S_iAnd randomly placed attached standard cell cluster B_iUsing the initial layout of the local layout model of the electrostatic system to make the attached standard cell cluster B_iPerforming a distributed movement so that macro-cells S_iAnd attached standard cell cluster B_iCarrying out overall electrostatic balance to form an initial local layout information sequence state S;

s3-3, extracting characteristic information from an initial layout information state S obtained after the initial layout of the static system local layout model, enabling the characteristic information to be phi (S), inputting the phi (S) into an Actor-Critic reinforcement learning network, obtaining an optimal layout strategy through network training, outputting an optimal initial local layout according to the optimal layout strategy, and outputting an Actor network parameter theta and a Critic parameter omega corresponding to the optimal strategy;

s3-4, normalizing the unit module by using a rectangle to obtain a macro unit module with the length of L_NWidth of W_NArea is S_NAnd the output information sequence is as follows:

H_N＝{S_N1，S_N2，..，S_Nn}

wherein S is_N＝{L_N，W_N，P_N}，L_NTo update the length of the module, W_NTo update the width, P, of a module_NIs the location information of the module.

Further, in the step S3-3, the specific setting and steps are as follows:

s3-3-1, markov decision:

1) And a state S: initial local layout information sequence state formed by the local layout model of the electrostatic system, including macro-cell information S_iAnd its attached standard cell cluster B_iLength and width of (a) and its location information in the grid;

2) Action set A: a set of possible actions taken by all standard cells;

3) Attenuation factor γ: setting gamma to 1, indicating that all subsequent states are consistent with the current reward;

4) The exploration rate belongs to the scope of: performing value iteration by using an element-greedy method, namely setting a smaller element value, greedily selecting the behavior which is considered to be the maximum behavior value at present by using a probability of 1-element, and randomly selecting the behavior from all m selectable behaviors by using the element probability; is formulated as:

wherein a represents an action and s represents a state;

s3-3-2, constraint setting;

s3-3-3, setting a loss function;

s3-3-4, updating the network parameters to obtain Actor network parameters theta, critic network parameters omega and strategy gradient estimation

Further, the constraint setting includes:

1) Wire length constraint:

the half-perimeter line length is adopted and is closest to a Steiner tree, the lowest cost of wiring is obtained, and the calculation formula is as follows:

HPWL(i)＝(max_b∈i{x_b}-min_b∈i{x_b})+(max_b∈i{y_b}-min_b∈i{y_b})

wherein x_bAnd y_bExpressing the x and y coordinates of the grid i, and summing HPWL (i), aiming at improving the convergence rate of a line length model and the accuracy of index judgment, normalizing the sum of the bus lengths between the macro unit and the standard unit by a normalization factor q, wherein the normalized bus length formula is as follows:

one of the goals is to make HPWL as small as possible, netlist denotes net;

2) And (3) congestion constraint:

evaluating whether the layout is distributable or not by taking a maximum overflow mode as congestion measurement; the maximum overflow mode is expressed as: OF (e) = max (ω)_e+b_e-c_e0); in order to make overflow of grid boundary easily absorbed by adjacent region and ensure routability of design, useThe following congestion evaluation formula:

congestion(e)＝100×(ω_e+b_e)/c_e

wherein c is_eMaximum capacity of edge e, b_eIs routing congestion on edge e, ω_eFor wiring occupation on the edge e, a congestion of less than 50% is considered to be routable, and the goal is to make the congestion degree smaller as better;

3) And (3) density constraint: for density constraint, the design space utilization function is applied in the local layout, and is specifically designed as follows:

according to the sorted macro unit and the restriction rule and the space utilization function F to the macro unit S₁And macro cell S₂Combining, calculating the space utilization rate F after combination, and merging the macro units when the space utilization rate reaches the preset requirement; the set rules are as follows:

macroblock unit S to be merged₁The information of (1) is: long L₁Width W₁In the position P₁Area S₁＝L₁×W₁；

Macroblock unit S to be merged₂The information of (b) is: long L₂Width W₂In the position P₂Area S₂＝L₂×W₂；

Combined into a new macroblock unit S_N: length of L_NWidth of W_NIn the position P_NArea S_N＝L_N×W_N；

Wherein L is_NAnd W_NThe following rules are satisfied:

max(L_N，W_N)≤min(L，W)

in order for the policy network not to place the macro-cells in locations that would cause the density to exceed the target density maximum or cause macro-overlap, the layout of the macro-cells satisfies the following area constraint:

the space utilization function is:

where L is the length and W is the width, the objective is to make the space utilization F as large as possible.

Further, the loss function setting includes:

1) Setting a reward function: the total line length, the congestion degree and the waste rate are weighted and summed to form a single-target reward function, wherein the weighting factor lambda₁And λ₂Mainly used for balancing the influence of three indexes, the reward function for policy network optimization is as follows:

R＝-Wirelength-λ₁Congestion+λ₂F

S.t.minS≤S_N≤maxS

wherein, wirelength represents the total line length, congestion represents the total Congestion degree, F represents the space utilization rate, and lambda represents₁And λ₂The weight of congestion degree and space utilization rate is respectively, lambda is more than or equal to 0₁≤1，0≤λ₂≤1，λ₁+λ₂=1 and λ₁＞λ₂The occupation weight of the congestion is higher than the weight of the loss rate, namely the routability of the wiring is firstly ensured, and the utilization rate of the area is considered;

2) Setting a loss function:

setting an optimized function target; setting the optimization target to the average value at each time step, i.e.

The gradient derived from θ for this equation is as follows:

estimating the strategy gradient to be optimized;

for the scoring function, the direction of parameter update is indicated, which uses the Softemax policy function, and the linear combination of the features phi (s, a) describing the state and behavior and the parameter theta is used to weigh the probability of a behavior occurring, namely:

the score function by derivation is:

further, the step S3-3-4 includes:

inputting iteration times T, a state dimension n, an action set A, a step length a, an attenuation factor gamma, an exploration rate E, a Critic network structure and an Actor network structure;

the updating process comprises the following steps:

a1, randomly initializing values Q corresponding to all states and actions, i =1;

a2, initializing S to be the first state of the current state sequence to obtain a feature vector phi (S);

a3, using phi (S) as an input in an Actor network, outputting an action set A, and obtaining a new state S' based on the action set A, and feeding back R;

a4, respectively using phi (S) and phi (S ') as inputs in a Critic network to obtain Q value outputs V (S) and V (S');

a5, calculating a TD error δ = R + γ V (S') -V (S);

a6, V and Q conversion:

a7, calculating strategy gradient estimation:

a8, using a mean square error loss function sigma (R + gamma V (S') -V (S, omega))²Updating parameters serving as Critic network parameters omega;

a9, updating an Actor network parameter theta:

a10, judging whether i is smaller than the iteration number T, if so, i = i +1, returning to the step A2, otherwise, outputting the latest Critic network parameter omega, the Actor network parameter theta and the strategy gradient estimation

Further, the step S5 includes:

s5-1, designing two-layer network structures which are a public network and a local network respectively; the public network comprises functions of an Actor network and a Critic network;

s5-2, calculating gradient estimation of each local layout

Accumulating and summing, and inputting the obtained Critic network parameter omega and the Actor network parameter theta with optimal local layout into a public network;

s5-3, updating the public network by using the obtained accumulated gradient estimation, and outputting a corresponding optimal strategy if the accumulated gradient estimation is converged in the updating process, otherwise, returning to the step S5-2;

s5-4, updating the macro module H through the optimal strategy_NLayout is carried out, the global automatic layout is finally completed, and the updated module information sequence is H_NNAnd outputs global automatic layout information H_NN。

Further, the step S6 includes:

s6-1, and carrying out global automatic layout on information H_NNInputting the data into a force guidance method resolver;

s6-2, using a force guiding method to guide the spare part type standard cell cluster B = { B = { (B) }₁，b₂，...，b_nFill up, make the discrete standard cell b by the continuous action of attraction and repulsion_iThe balance is approached after the movement is continuously carried out until the relative displacement does not occur any more, the energy is continuously consumed, and finally the balance is approached to zero;

and S6-3, outputting the optimal global automatic layout effect of the chip.

Compared with the prior art, the principle and the advantages of the scheme are as follows:

the scheme can quickly arrange the ultra-large scale integrated circuit chip, can ensure the convergence of the arrangement result to realize quick arrangement, and leads the wire length, congestion and area of the arrangement wiring to be approximately optimal. In addition, the scheme also provides a method for space utilization rate, and the method is applied to local layout and global automatic layout, so that the areas of the local layout and the global automatic layout are minimized.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the services required for the embodiments or the technical solutions in the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a schematic flow chart of a chip global automatic layout method based on deep reinforcement learning according to the present invention;

FIG. 2 is a schematic diagram of an initial layout of a local layout model of an electrostatic system (U is macros and their attachment units scattered locally at random; V is initial layout information obtained by local layout);

FIG. 3 is a schematic diagram of space utilization;

FIG. 4 is a partial layout diagram of an Actor-critical neural network;

FIG. 5 is a diagram illustrating a hierarchical reinforcement learning global automatic layout structure;

FIG. 6 is a diagram of global automatic layout filling.

Detailed Description

The invention is further illustrated by the following specific examples:

as shown in fig. 1, the chip global automatic layout method based on deep reinforcement learning according to this embodiment includes the following steps:

s1, inputting chip layout information;

s2, preprocessing the chip layout information, including:

s2-1, grid preprocessing: setting the grid as a square and establishing a rectangular coordinate system with the horizontal axis as x and the vertical axis as y, in the grid, the side is represented by e, and the wiring capacity between the sides is represented by c_eIndicating that the center point information of the ith cell G is G_i＝{x_i，y_i，c_ei}, setting the grid number to 2ⁿ×2ⁿN is a positive integer;

s2-2, macro unit pretreatment: and (2) regarding each macro unit as a rectangle, sorting the macro units according to the size by using a quick sorting algorithm, and forming a sorting sequence set by using the sorting result as an input set:

H＝{S_i，i＝1，...，N}

wherein S is_i＝(L_i，W_i，P_i) Representing the area of the macro-cell with position information, L, for a tuple_iDenotes the length of the macro-unit, W_iWidth, P, of macro-cell_iIndicating the location information of the macro-cell, i.e. P_i＝{x_i，y_iN represents the total number of macro-cells;

s2-3, standard unit pretreatment: standard cells are divided into two clusters of cells:

1) Attachment macrocell H_iThe standard cell is an attached standard cell cluster, and is marked as B_iThen B is_i＝{b_i1，b_i2，...，b_in}；

S2-4, design rule:

1) The method is based on the layout principle of big first, small first and difficult second, i.e. the macro cell circuit and the core cell should be laid out preferentially;

2) The layout should meet the following requirements as much as possible: the total connecting line is as short as possible, and the key signal line is shortest;

3) Density priority principle: routing is started from the area with the most dense and complex connection relation;

4) And the layout is optimized according to the standards of uniform distribution, balanced gravity center and beautiful layout.

in this step, the reinforcement learning local layout includes:

s3-1, inputting macro cell sequence H = { S ] into layout area_iI =1,.. Cndot.N } and its attached standard cell cluster B_i＝{b_i1，b_i2，...，b_inD, randomly dispersing in a form of aggregate clusters;

s3-2, for each macrocell S_iAnd randomly placed attached standard cell cluster B_iUsing the local layout model of the electrostatic system to perform initial layout to make the attached standard cell cluster B_iPerforming a distributed movement so that macro-cells S_iAnd attached standard cell cluster B_iOverall electrostatic balancing, forming an initial local layout information sequence state S, as shown in fig. 2;

the specific setting and steps of the step are as follows:

s3-3-1, markov decision:

1) And a state S: initial local layout information sequence state formed by the local layout model of the electrostatic system, including macro-cell information S_iAnd its attached standard cell cluster B_iLength and width of (a) and its position information in the grid;

2) Action set A: a set of possible actions taken by all standard cells;

4) The exploration rate belongs to the scope of: performing value iteration by using a epsilon-greedy method, namely setting a smaller epsilon value, greedy selecting the behavior which is considered as the maximum behavior value at present by using a probability of 1-epsilon, and randomly selecting the behavior from all m selectable behaviors by using the probability of epsilon; is formulated as:

wherein, a represents action, s represents state;

s3-3-2, constraint setting:

1) Wire length constraint:

adopting a half-perimeter line length which is closest to a Steiner tree, and the lowest cost of wiring, wherein the calculation formula is as follows:

HPWL(i)＝(max_b∈i{x_b}-min_b∈i{x_b})+(max_b∈i{y_b}-min_b∈i{y_b})

one of the goals is to make HPWL as small as possible, netlist denotes net;

2) And (3) congestion constraint:

evaluating whether the layout is distributable or not by taking a maximum overflow mode as congestion measurement; the maximum overflow mode is expressed as: OF (e) = max (ω)_e+b_e-c_e0); in order to make overflow of the grid boundary easily absorbed by the adjacent area and ensure routability of the design, the following congestion evaluation formula is used:

congestion(e)＝100×(ω_e+b_e)/c_e

wherein c is_eMaximum capacity of edge e, b_eIs routing congestion on edge e, omega_eThe wiring on the edge e occupies, the condition that the congestion is less than 50% is regarded as distributable, and the goal is to make the congestion degree smaller and better;

3) And (3) density constraint: for density constraint, the design space utilization function of this embodiment is applied to local layout and global automatic layout, and the design space utilization function is spliced by two macro-units, as shown in fig. 3, and the specific design is as follows:

macroblock unit S to be merged₁The information of (b) is: long L₁Width W₁In the position P₁Area S₁＝L₁×W₁；

Macroblock unit S to be merged₂The information of (1) is: long L₂Width W₂In the position P₂Area S₂＝L₂×W₂；

Combined into a new macroblock unit S_N: length of L_NIs wide and wideIs W_NIn the position P_NArea S_N＝L_N×W_N；

Wherein L is_NAnd W_NThe following rules are satisfied:

max(L_N，W_N)≤min(L，W)

in order for the policy network not to place the macro-cells in locations that would cause the density to exceed the target density maximum or cause macro overlap, the layout of the macro-cells satisfies the following area constraints:

the space utilization function is:

S3-3-3, setting a loss function:

R＝-Wirelength-λ₁Congestion+λ₂F

S.t.minS≤S_N≤maxS

2) Setting a loss function:

The gradient derived from θ for this equation is as follows:

estimating a strategy gradient for optimization;

the score function by derivation is:

Specifically, 2 identical neural networks were used, as shown in fig. 4. Wherein Critic uses a neural network to output an optimal value V_tAnd calculating TD error delta, V_tAnd the TD error delta is output to an Actor network, and the Actor utilizes V_tThe parameter theta of the strategy function is iteratively updated by the optimal value, further action A is selected, and feedback R and a new state S are obtained_t+1(ii) a Critic uses feedback reward R and new state S_t+1Parameters omega in the neural network are updated, and then Critic uses the new network parameters to help the Actor compute the optimal value V of the state_t. Continuously updating the Actor network parameter theta and the Critic network parameter omega through the above circulation until the strategy gradient is estimated

Converging, and then outputting final Actor network parameter theta, critic network parameter omega and strategy gradient estimation

The specific process is as follows:

inputting iteration times T, a state dimension n, an action set A, a step length alpha, an attenuation factor gamma, an exploration rate E, a Critic network structure and an Actor network structure;

the updating process comprises the following steps:

a3, using phi (S) as an input in the Actor network, outputting an action set A, and obtaining a new state S' based on the action set A, and feeding back R;

a5, calculating a TD error delta = R + gamma V (S') -V (S);

a6, V and Q conversion:

a7, calculating strategy gradient estimation:

a9, updating an Actor network parameter theta:

H_N＝{S_N1，S_N2，..，S_Nn}

s5, performing deep reinforcement learning of the global automatic layout of the chip by combining the optimal local layout information of the chip to obtain the optimal global automatic layout information of the chip; the method specifically comprises the following steps:

s5-1, designing a two-layer network structure which is a public network and a local network respectively. The local network has i working threads, and the specific design of the network structure of a single thread is shown in step S3; the public network comprises functions of an Actor network and a Critic network, and the overall neural network model is shown in figure 5;

s5-2, calculating gradient estimation of each local layout

S6, performing filling layout according to the optimal chip global automatic layout information obtained in the step S5 to obtain an optimal chip global automatic layout effect; the method specifically comprises the following steps:

s6-1, and arranging the global automatic layout information H_NNInput into a force-guided resolver, H_NNThe information of the macro unit module, such as the position, the line length, the congestion and the like is included;

s6-2, using a force guiding method to guide the spare part type standard cell cluster B = { B = { (B) }₁，b₂，...，b_nFill-in, make the discrete standard cells b continuously act by attraction and repulsion_iThe balance is approached after the movement is continuously carried out until the relative displacement does not occur any more, the energy is continuously consumed, and finally the balance is approached to zero;

s6-3, outputting an optimal chip global automatic layout effect, wherein the effect is shown in FIG. 6;

The above-mentioned embodiments are only preferred embodiments of the present invention, and the scope of the present invention is not limited thereby, and all changes made in the shape and principle of the present invention should be covered within the scope of the present invention.

Claims

1. The chip global automatic layout method based on deep reinforcement learning is characterized by comprising the following steps:

s1, inputting chip layout information;

s4, judging whether the optimal chip local layout information obtained in the step S3 meets the design rule, if so, entering a step S5, otherwise, returning to the step S3 to perform reinforcement learning of the chip local layout again;

2. The chip global automatic layout method based on deep reinforcement learning of claim 1, wherein the preprocessing of layout information comprises:

s2-1, grid preprocessing: setting a grid as a square and establishing a rectangular coordinate system, wherein the horizontal axis is x, the vertical axis is y, in the grid, sides are represented by e, wiring capacity between the sides is represented by ce, and the information of the central point of the ith grid G is represented by G_i＝{x_i，y_i，c_ei}, setSetting the number of grids;

H＝{S_i，i＝1，...，N}

wherein S is_i＝(L_i，W_i，P_i) Is a tuple, representing the area of a macro-cell with position information, L_iDenotes the length of the macro-unit, W_iWidth, P, of the macro-cell_iIndicating the location information of the macro-cell, i.e. P_i＝{x_i，y_iN represents the total number of macro-cells;

1) Attachment macrocell H_iThe standard cell is an attached standard cell cluster, and is marked as B_iThen B is_i＝{b_i1，b_i2，…，b_in}；

2) If the standard cell not attached to the macro cell is a discrete standard cell cluster, denoted as B, then B = { B =₁，b₂，...，b_n}：

S2-4, designing rules.

3. The deep reinforcement learning-based chip global automatic layout method according to claim 2, wherein the reinforcement learning local layout comprises:

s3-1, inputting macro cell sequence H = { S to layout area_iI =1,.. Cndot.N } and its attached standard cell cluster B_i＝{b_i1，b_i2，…，b_inAnd randomly dispersing in a form of aggregate clusters;

s3-2, for each macrocell S_iAnd randomly placed attached standard cell cluster B_iUsing the local layout model of the electrostatic system to perform initial layout to make the attached standard cell cluster B_iPerforming a distributed movement so that macro-cells S_iAnd attached standard cell cluster B_iOverall electrostatic equilibrium is formedInitial local layout information sequence state S;

s3-3, extracting characteristic information from an initial layout information state S obtained after the initial layout of the local layout model of the electrostatic system, enabling the characteristic information to be phi (S), inputting the phi (S) into an Actor-Critic reinforcement learning network, obtaining an optimal layout strategy through network training, outputting an optimal initial local layout according to the optimal layout strategy, and outputting an Actor network parameter theta and a Critic parameter omega corresponding to the optimal strategy;

H_N＝{S_N1，S_N2，..，S_Nn}

wherein S is_N＝{L_N，W_N，P_N}，L_NTo update the length of the module, W_NTo update the width of the module, P_NIs the location information of the module.

4. The deep reinforcement learning-based chip global automatic layout method according to claim 3, wherein in the step S3-3, the specific setting and steps are as follows:

s3-3-1, markov decision:

2) Action set A: a set of actions that all standard cells may take;

3) Attenuation factor γ: setting gamma to be 1, and indicating that all subsequent states are consistent with the current reward;

4) The exploration rate belongs to the group: performing value iteration by using an element-greedy method, namely setting a smaller element value, greedily selecting the behavior which is considered to be the maximum behavior value at present by using a probability of 1-element, and randomly selecting the behavior from all m selectable behaviors by using the element probability; is formulated as:

wherein a represents an action and s represents a state;

s3-3-2, constraint setting;

s3-3-3, setting a loss function;

s3-3-4, updating the network parameters to obtain the Actor network parameters theta, critic network parameters omega and strategy gradient estimation

5. The deep reinforcement learning-based chip global automatic layout method according to claim 4, wherein the constraint setting comprises:

1) Wire length constraint:

HPWL(i)＝(max_b∈i{x_b}-min_b∈i{x_b})+(max_b∈i{y_b}-min_b∈i{y_b})

one of the goals is to make HPWL as small as possible, netlist denotes net;

2) And (3) congestion constraint:

by using maximumThe overflow mode is used as congestion measurement to evaluate whether the layout can be distributed; the maximum overflow mode is expressed as: OF (e) = max (ω)_e+b_e-c_e0); in order to make overflow of the grid boundary easily absorbed by the adjacent area and ensure routability of the design, the following congestion evaluation formula is used:

congestion(e)＝100×(ω_e+b_e)/c_e

Wherein L is_NAnd W_NThe following rules are satisfied:

max(L_N，W_N)≤min(L，W)

the space utilization function is:

6. The deep reinforcement learning-based chip global automatic layout method according to claim 5, wherein the setting of the loss function comprises:

1) The reward function R sets: the total line length, the congestion degree and the waste rate are weighted and summed to form a single-target reward function, wherein the weighting factor lambda is₁And λ₂Mainly used for balancing the influence of three indexes, the reward function for policy network optimization is as follows

R＝-Wirelength-λ₁Congestion+λ₂F

S.t.minS≤S_N≤maxS

2) Setting a loss function:

The gradient derived for θ for this equation is as follows:

estimating a strategy gradient for optimization;

for the scoring function, the direction of updating the parameters is indicated, which uses the Softemax policy function, and the linear combination of the characteristics phi (s, a) describing the state and behavior and the parameters theta is used to weigh the probability of an action occurring, namely:

the score function by derivation is:

7. the deep reinforcement learning-based chip global automatic layout method according to claim 6, wherein the step S3-3-4 comprises:

the updating process comprises the following steps:

a5, calculating a TD error δ = R + γ V (S') -V (S);

a6, V and Q conversion:

a7, calculating strategy gradient estimation:

a9, updating an Actor network parameter theta:

8. The deep reinforcement learning-based chip global automatic layout method according to claim 7, wherein the step S5 comprises:

s5-1, designing two-layer network structures which are a public network and a local network respectively; the public network comprises the functions of an Actor network and a Critic network;

s5-2, calculating gradient estimation of each local layout

s5-4, updating the macro module H through the optimal strategy_NPerforming layout, and finally finishing the global automatic layout, wherein the updated module information sequence is H_NNAnd outputs global automatic layout information H_NN。

9. The deep reinforcement learning-based chip global automatic layout method according to claim 8, wherein the step S6 comprises:

s6-1, and arranging global automatic layout information H_NNInputting the data into a force guidance method resolver;

s6-2, using a force guiding method to enable the discrete standard unit cluster B = { B = { B }₁，b₂，...，b_nFill up, make the discrete standard cell b by the continuous action of attraction and repulsion_iAfter moving continuously, the movement tends to be balanced until relative displacement does not occur any more, energy is consumed continuously, and finally the movement tends to zero;

and S6-3, outputting the optimal global automatic layout effect of the chip.