CN114372438A - Chip macro-unit layout method and system based on lightweight deep reinforcement learning - Google Patents

Chip macro-unit layout method and system based on lightweight deep reinforcement learning

Info

Publication number
CN114372438A
Authority
CN
China
Prior art keywords
network
strategy
chip
sub
lightweight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210030064.0A
Other languages
Chinese (zh)
Other versions
CN114372438B (en)
Inventor
李珍妮
谢胜利
王名为
元荣
凌家城
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202210030064.0A priority Critical patent/CN114372438B/en
Publication of CN114372438A publication Critical patent/CN114372438A/en
Application granted granted Critical
Publication of CN114372438B publication Critical patent/CN114372438B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/30Circuit design
    • G06F30/39Circuit design at the physical level
    • G06F30/392Floor-planning or layout, e.g. partitioning or placement
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • G06F30/27Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/30Circuit design
    • G06F30/39Circuit design at the physical level
    • G06F30/398Design verification or optimisation, e.g. using design rule check [DRC], layout versus schematics [LVS] or finite element methods [FEM]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2115/00Details relating to the type of the circuit
    • G06F2115/06Structured ASICs
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Abstract

The invention relates to a chip macro-cell layout method and system based on lightweight deep reinforcement learning. The policy network is divided into several mutually independent sub-networks according to the channels, which provides a new multi-channel, multi-layer structured-pruning idea for making the policy network lightweight and offers a way for the policy network to process data in blocks in the future. By introducing a group ℓ1 regularizer into the objective function of the policy network, intra-group and inter-group sparsity constraints are imposed on the weight parameters of the sub-networks, and the sparsified policy network is pruned and compressed. This better eliminates the gradient computation caused by unimportant input data, solves the problem of redundant network weight parameters, reduces the waste of storage and computation resources during the chip macro-cell placement stage of the deep-reinforcement-learning-based chip layout method, lowers the hardware requirements of the macro-cell placement process, and promotes the renewal and development of hardware design.

Description

Chip macro-unit layout method and system based on lightweight deep reinforcement learning
Technical Field
The invention relates to the field of machine learning and the field of chip layout, in particular to a chip macro-unit layout method and system based on lightweight deep reinforcement learning.
Background
The chip, i.e., the carrier of an integrated circuit, requires four important processes to come into being: design, manufacture, packaging, and testing. Progress in chips has driven the rapid development of many fields such as new-energy vehicles, the Internet of Things, artificial intelligence, and edge computing. However, although China is a major scientific and technological country whose demand for chips ranks first in the world, the self-supply rate of domestically made chips is less than 10%. Vigorously developing domestic chips and achieving domestic substitution for most commercial chips would further promote the transformation and upgrading of China's manufacturing industry and is a necessary path for China to become a technological power. However, the current chip design process often takes years, and the most complicated and time-consuming stage is chip layout, i.e., mapping a netlist containing macro-cell and standard-cell information onto the chip canvas. The complexity of chip layout stems mainly from three aspects: the size of the netlist, the granularity of the grid onto which the chip is placed, and the prohibitive computational cost of evaluating the true target metrics (evaluation with industry-standard EDA tools takes several hours or even more than a day). Despite decades of research on the chip layout problem, experts still need weeks of iteration with existing chip layout tools to generate a layout solution that meets all design criteria.
Recently, Google proposed a chip layout method based on deep reinforcement learning, aiming to quickly map a netlist containing macro cells and standard cells onto a chip canvas while optimizing power, performance, and area (PPA) and observing constraints on placement density and routing congestion. Google treats chip layout as a reinforcement learning problem and optimizes it by training a deep reinforcement learning network. Experimental results show that, compared with the most advanced reference models, the method achieves better PPA on Google's TPU. More importantly, it can generate, within 6 hours, a chip layout that is superior or comparable to one designed by a professional human chip designer.
However, the chip layout environment is complex, and the chip layout method based on deep reinforcement learning needs to train a huge, redundant deconvolution network as the policy network to generate an optimal layout strategy for the chip macro cells. As a result, training the policy network and generating the chip macro-cell layout strategy occupy huge storage and computation resources, which places high demands on hardware devices.
Therefore, making the deep reinforcement learning network lightweight reduces the hardware requirements of the chip macro-cell placement stage in the chip layout method based on deep reinforcement learning, promotes the renewal and development of hardware design, and has broad application prospects in the field of artificial-intelligence chip layout.
Disclosure of Invention
The invention aims to provide a chip macro-cell layout method and system based on lightweight deep reinforcement learning, which use a lightweight deep reinforcement learning network to reduce the hardware requirements of the chip macro-cell placement process in the chip layout method based on deep reinforcement learning and to promote the renewal and development of hardware design.
In order to achieve the purpose, the invention provides the following scheme:
a chip macro-cell layout method based on lightweight deep reinforcement learning comprises the following steps:
generating a three-dimensional state space according to the macro-cell features and the netlist information of the chip; the netlist information of the chip comprises a netlist graph and netlist metadata;
training a lightweight deep reinforcement learning network; the lightweight deep reinforcement learning network comprises a lightweight policy network and a value network; the value network is used to guide the training of the lightweight policy network; the lightweight policy network comprises a plurality of sub-networks and is obtained by introducing a group ℓ1 regularizer and training a deconvolution network through pruning and compression operations;
taking the three-dimensional state space as input, and outputting an optimal layout strategy of the chip macro unit according to the trained lightweight deep reinforcement learning network;
and guiding the macro units to be mapped to the chip canvas one by one according to the optimal layout strategy.
A chip macro cell layout system based on lightweight deep reinforcement learning comprises:
the data acquisition module is used for generating a three-dimensional state space according to the macro-cell features and the netlist information of the chip; the netlist information of the chip comprises a netlist graph and netlist metadata;
the model training module is used for training a lightweight deep reinforcement learning network; the lightweight deep reinforcement learning network comprises a lightweight policy network and a value network; the value network is used to guide the training of the lightweight policy network; the lightweight policy network comprises a plurality of sub-networks and is obtained by introducing a group ℓ1 regularizer and training a deconvolution network through pruning and compression operations;
the strategy generation module is used for taking the three-dimensional state space as input and outputting an optimal layout strategy of the chip macro unit according to the trained lightweight deep reinforcement learning network;
and the mapping module is used for guiding the macro units to be mapped to the chip canvas one by one according to the optimal layout strategy.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention provides a chip macro-unit layout method and a chip macro-unit layout system based on lightweight deep reinforcement learning, wherein a strategy network is divided into a plurality of mutually independent sub-networks according to channels, so that a new idea of multi-channel multi-layer structured pruning is provided for the lightweight of the strategy network, and a method is provided for the strategy network to perform block processing on data in the future; by introducing groups in the objective function of the policy network
Figure BDA0003465999110000031
The regularizer performs sparse constraint in and among groups on weight parameters of the sub-network, and performs pruning compression on a sparse strategy network, so that gradient calculation caused by some unimportant input data can be better eliminated, the problem of network weight parameter redundancy is solved, waste of storage resources and calculation resources in a chip macro unit layout process in a chip layout method based on deep reinforcement learning is reduced, requirements of the chip macro unit layout process on hardware equipment are reduced, and the updating development of hardware design is promoted.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a chip macro-cell layout method based on lightweight deep reinforcement learning according to embodiment 1 of the present invention;
FIG. 2 is a structural view of an embedding layer in embodiment 1 of the present invention;
fig. 3 is a schematic diagram of a training process of a lightweight deep reinforcement learning network in embodiment 1 of the present invention;
fig. 4 is a diagram of a physical model structure of a second policy network in embodiment 1 of the present invention;
fig. 5 is a structural diagram of a chip macro-cell layout system based on lightweight deep reinforcement learning according to embodiment 2 of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a chip macro-unit layout method and a chip macro-unit layout system based on light-weight deep reinforcement learning, which reduce the requirements of a chip macro-unit layout process in the chip layout method based on deep reinforcement learning on hardware equipment by using a light-weight deep reinforcement learning network and promote the updating and development of hardware design.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Example 1:
a chip layout method based on deep reinforcement learning is provided by Google and specifically comprises the following two steps: firstly, a Value Network (Value Network) guides the training of a policy Network (policy Network), so that the policy Network gives the optimal layout policy of the current macro units, and then the trained policy Network guides all the macro units of a chip to be sequentially placed according to the size sequence; and secondly, after the layout of all macro cells is finished, finishing the layout of the standard cells by a force guiding method, thereby finishing the mapping from the netlist to the canvas of the chip. The method is the first placement layout of the chip with generalization capability, which can learn from the previous netlist layout and serve the new netlist layout, which enables the strategy network to generate the optimal layout strategy for the chip faster and better over time. However, the chip layout method based on deep reinforcement learning needs to train a huge redundant deconvolution network as a strategy network, which results in that the training of the strategy network and the generation of the chip macro-unit layout strategy occupy huge storage resources and calculation resources, and have high requirements on hardware devices.
To address this, referring to fig. 1, this embodiment provides a chip macro-cell layout method based on lightweight deep reinforcement learning, which uses a lightweight deep reinforcement learning network to reduce the hardware requirements of the chip macro-cell placement process and to promote the renewal and development of hardware design. The method comprises the following steps:
s1: generating a three-dimensional state space according to the macro unit characteristics and the network list information of the chip; the net list information of the chip includes a net list map and net list metadata.
A new neural network architecture is constructed as the embedding layer, encoding the netlist graph of the chip, the node features, and the information of the current macro to be placed to generate a three-dimensional state space, as shown in fig. 2 (a sketch in code follows the list below). This specifically comprises:
(1) inputting the macro-cell features and the netlist graph into a graph neural network and generating macro-cell embeddings and edge embeddings through graph convolution operations;
(2) inputting the netlist metadata into a fully connected network to obtain the netlist metadata embedding;
(3) averaging the edge embeddings to obtain the graph embedding;
(4) fusing the current macro-cell information with the macro-cell embedding to obtain the current macro-cell embedding;
(5) inputting the netlist metadata embedding, the graph embedding, and the current macro-cell embedding into the fully connected network to obtain the current three-dimensional state space S_t.
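For illustration, the five steps above can be sketched in PyTorch as follows; this is a hedged sketch rather than the patent's implementation, and the module names, dimensions, unbatched tensors, and the single graph-convolution round are assumptions:

```python
import math
import torch
import torch.nn as nn

class EmbeddingLayer(nn.Module):
    def __init__(self, macro_feat_dim, meta_dim, embed_dim=32, state_shape=(8, 8, 16)):
        super().__init__()
        self.node_fc = nn.Linear(macro_feat_dim, embed_dim)   # projects macro-cell features
        self.edge_fc = nn.Linear(2 * embed_dim, embed_dim)    # builds edge embeddings from endpoints
        self.meta_fc = nn.Sequential(nn.Linear(meta_dim, embed_dim), nn.ReLU())
        self.state_shape = state_shape
        self.fuse_fc = nn.Linear(3 * embed_dim, math.prod(state_shape))

    def forward(self, macro_feats, edges, metadata, current_idx):
        # (1) graph neural network: one neighbour-averaging round, then edge embeddings
        h = torch.relu(self.node_fc(macro_feats))
        agg = torch.zeros_like(h).index_add_(0, edges[1], h[edges[0]])
        deg = torch.zeros(h.size(0), 1).index_add_(0, edges[1], torch.ones(edges.size(1), 1))
        node_emb = torch.relu(h + agg / deg.clamp(min=1))                        # macro-cell embeddings
        edge_emb = torch.relu(self.edge_fc(
            torch.cat([node_emb[edges[0]], node_emb[edges[1]]], dim=-1)))        # edge embeddings
        meta_emb = self.meta_fc(metadata)                                         # (2) metadata embedding
        graph_emb = edge_emb.mean(dim=0)                                          # (3) graph embedding
        cur_emb = node_emb[current_idx]                                           # (4) current macro-cell embedding
        state = self.fuse_fc(torch.cat([meta_emb, graph_emb, cur_emb], dim=-1))   # (5) fuse everything
        return state.view(*self.state_shape)                                      # 3-D state space S_t
```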
S2: training a lightweight deep reinforcement learning network; the lightweight deep reinforcement learning network comprises a lightweight policy network and a value network; the value network is used to guide the training of the lightweight policy network; the lightweight policy network comprises a plurality of sub-networks and is obtained by introducing a group ℓ1 regularizer and training a deconvolution network through pruning and compression operations.
As shown in fig. 3, the specific training process of the lightweight deep reinforcement learning network includes:
(1) initializing a deconvolution network based on a reinforcement learning structure to obtain a first deep reinforcement learning network, wherein the first deep reinforcement learning network comprises a first policy network and a value network;
(2) performing multi-channel, multi-layer structured processing on the first policy network to obtain a second policy network;
(3) introducing a group ℓ1 regularizer into the objective function of the second policy network and imposing intra-group and inter-group sparsity constraints on the weight parameters of the sub-networks to obtain a sparsified policy network;
(4) pruning and compressing the sparsified policy network to obtain the lightweight policy network.
In order to make the specific processes of (1) to (4) more clearly understood by those skilled in the art, the following description is made specifically.
1. Constructing a first policy network for self-learning chip macro-cell layout based on reinforcement learning
In this embodiment, a deconvolution network is adopted as the first policy network, making full use of the relationship between adjacent elements in the input three-dimensional state matrix, so that from the input three-dimensional state space the first policy network outputs the optimal two-dimensional layout strategy for the chip macro cells. The deconvolution network consists of an input layer, deconvolution layers, and an output layer. Similar to a convolutional network, the input layer of the deconvolution network receives data through non-fully-connected links, the output layer produces data through full connection, and one or more deconvolution layers lie between the input layer and the output layer.
Assume that the first policy network takes an input data matrix Y and produces an output data matrix X. The input layer, the deconvolution layers, and the output layer each contain a preset number of channels (i.e., the convolution kernels used in each layer). The physical model of the first policy network is shown in FIG. 4, where a network layer is denoted L (L_k denoting the k-th network layer) and a channel is denoted θ. The input matrix Y (width 8, height 8, depth 16) stored in the input layer L_1 is passed to the first deconvolution layer L_2 through one-to-one, non-full links, then enters the second deconvolution layer L_3 through 4 channels, and then enters the output layer through 4 channels, where the output matrix X is obtained through a full-connection operation.
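A minimal sketch of such a deconvolution policy network follows; the two deconvolution layers, 4 channels, the 8×8×16 input, and the 32×32 placement grid are illustrative assumptions rather than the exact sizes of FIG. 4:

```python
import torch
import torch.nn as nn

class DeconvPolicyNetwork(nn.Module):
    def __init__(self, in_depth=16, channels=4, grid=32):
        super().__init__()
        self.deconv1 = nn.ConvTranspose2d(in_depth, channels, 4, stride=2, padding=1)  # 8x8 -> 16x16
        self.deconv2 = nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1)  # 16x16 -> 32x32
        self.out_fc = nn.Linear(channels * grid * grid, grid * grid)                    # fully connected output layer
        self.grid = grid

    def forward(self, state):                          # state: (batch, in_depth, 8, 8)
        h = torch.relu(self.deconv1(state))
        h = torch.relu(self.deconv2(h))
        logits = self.out_fc(h.flatten(1))             # full-connection output operation
        return logits.softmax(-1).view(-1, self.grid, self.grid)  # 2-D placement distribution
```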
2. Carrying out multi-channel and multi-layer structured preprocessing on the first policy network to obtain a second policy network
In the first policy network, the channel sizes of different network layers are not identical because the network layers themselves differ in size. The width and height of a channel can therefore be set empirically when the first policy network is constructed, but the depth of a channel must equal the depth of its corresponding network layer divided by the number of channels. Thus, unlike a fully connected network, in the first policy network each channel is connected only to part of the elements of its corresponding network layer, and each element of a network layer is connected to only one channel. That is, assuming the number of channels of the first policy network is 4, the square cross-section formed by the height and width of a network layer is divided into 4 equally sized blocks according to the number of channels; each block is deconvolved by its corresponding channel, the results are averaged through the activation function, and the averaged result forms the input of the next network layer. For example, as indicated by the dotted lines in FIG. 4, the first data matrix of the second deconvolution layer L_3 is obtained by deconvolving the first data matrix of each of the 4 blocks of the first deconvolution layer L_2 and averaging the results; similarly, the remaining 3 data matrices of L_3 are each obtained by deconvolving and averaging the 4 data matrices corresponding to the 4 blocks of L_2.
Because each element in the first policy network is connected to only one channel, this embodiment divides the first policy network into several mutually independent sub-networks according to the channels, obtaining the second policy network. Referring to FIG. 4, looking from the output layer of the first policy network back toward its input layer, the first policy network can be divided into 4 mutually independent sub-networks according to the number of channels. The inputs Y_1, Y_2, Y_3, Y_4 of the 4 sub-networks are determined by the last deconvolution layer of the first policy network. Specifically, in the first policy network of FIG. 4, the first data matrix input to the third-layer deconvolution layer L_3 is obtained by deconvolving and then averaging the corresponding first data matrices in the 4 channels of the second-layer deconvolution layer L_2. Since the neurons of the input layer L_1 correspond one-to-one with the neurons of the second-layer deconvolution layer L_2, these 4 data matrices form the input Y_1 of the first sub-network, and the inputs Y_2, Y_3, Y_4 of the remaining three sub-networks are obtained in the same way. The input data of the 4 sub-networks are therefore completely different but identical in size. The 4 groups of input data, different in content but identical in size, are deconvolved in the mutually independent sub-networks, which finally output X_1, X_2, X_3, X_4 at the output layer, each of the same size as X; the output X of the first policy network is recovered by averaging these four outputs.
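The block-wise deconvolve-and-average step described above can be sketched as follows; the sizes and the per-block kernels are illustrative assumptions:

```python
import torch
import torch.nn as nn

def structured_deconv_layer(x, kernels):
    """x: (batch, depth, H, W); kernels: one ConvTranspose2d per channel block."""
    channels = len(kernels)
    blocks = torch.chunk(x, channels, dim=1)            # split the layer depth into equal blocks
    outputs = [torch.relu(k(b)) for k, b in zip(kernels, blocks)]
    return torch.stack(outputs, dim=0).mean(dim=0)      # average the block outputs

# Example: 4 independent channel blocks, each fed one slice of the input layer
channels, depth = 4, 16
kernels = nn.ModuleList(
    nn.ConvTranspose2d(depth // channels, depth // channels, 4, stride=2, padding=1)
    for _ in range(channels))
y = torch.randn(1, depth, 8, 8)
next_layer_input = structured_deconv_layer(y, kernels)  # shape (1, 4, 16, 16)
```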
3. Introducing a group ℓ1 regularizer into the objective function of the second policy network to impose intra-group and inter-group sparsity constraints on the weight parameters of the sub-networks and obtain a sparsified policy network
(1) Constructing the value network objective function
The current state of the agent in the environment is S_t. After the agent performs action a_t in the current state, the environment gives a reward R for that action, with discount rate γ, and the agent transitions to the next state S_{t+1}, where it executes the next action a_{t+1}.
A value network function V(S, W) is constructed to approximate the value in state S_t, where W denotes the weight parameters of the value network. The temporal-difference error δ (TD-error) can then be expressed as:
δ = R + γV(S_{t+1}, W) - V(S_t, W)
The value network updates its parameters by minimizing the TD-error, so the objective function of the value network is obtained by taking the expectation of the squared TD-error, specifically:
J(W) = E[δ²]
where E(·) denotes expectation.
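In code, the TD-error δ and the value objective J(W) = E[δ²] reduce to a few lines (a sketch; value_net is assumed to be any module that maps a state to a scalar value):

```python
import torch

def value_loss(value_net, s_t, s_next, reward, gamma=0.99):
    """TD-error delta = R + gamma*V(S_{t+1}, W) - V(S_t, W); value objective J(W) = E[delta^2]."""
    v_t = value_net(s_t)                        # V(S_t, W)
    with torch.no_grad():
        v_next = value_net(s_next)              # V(S_{t+1}, W), treated as a fixed target
    delta = reward + gamma * v_next - v_t       # TD-error
    return delta.pow(2).mean(), delta.detach()  # loss to minimise, plus delta reused as the advantage
```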
(2) Constructing the objective function of the policy network
A second policy network function π(a_t|S_t) is constructed, where S_t denotes the current state of the agent in the environment and a_t denotes an action the agent may perform in the current state. In the chip layout method based on deep reinforcement learning, the Proximal Policy Optimization (PPO) algorithm is used to construct the objective function of the second policy network:
J(θ) = E[min(r_t(θ)·Â_t, clip(r_t(θ), 1-ε, 1+ε)·Â_t)]
where θ denotes the weight parameters of the policy network, r_t(θ) = π_θ(a_t|S_t)/π_θold(a_t|S_t) denotes the probability ratio between the new and old policy network functions, ε denotes the clipping range, and Â_t denotes the advantage function (the TD-error can be used instead).
To prune the policy network effectively, the weight parameters within and between the sub-network groups must be made sparse. To this end, this embodiment introduces a group ℓ1 regularizer, imposing intra-group and inter-group sparsity constraints on the weight parameters of the sub-networks of the second policy network. Because the policy network updates its parameters by maximizing J(θ), the sparse regularization term on θ is negated here, giving the objective function of the sparsified policy network:
J_s(θ) = J(θ) - α·Σ_{m=1}^{M} Σ_{n=1}^{N} ||θ_m^(n)||_1
where J_s(θ) denotes the objective function of the sparsified policy network; α > 0 denotes the regularization term parameter; ||·||_1 denotes the ℓ1 regularizer; θ_m^(n) denotes the weight parameter matrix of the n-th layer of the m-th sub-network; M denotes the total number of sub-networks; and N denotes the total number of network layers in a sub-network.
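In code, this sparsified objective sits on top of the PPO surrogate as a simple penalty term (a sketch; `subnetworks` is assumed to be the list of sub-network modules, and the value of α is illustrative):

```python
import torch

def sparsified_objective(ppo_obj, subnetworks, alpha=1e-4):
    """J_s(theta) = J(theta) - alpha * sum_m sum_n ||theta_m^(n)||_1."""
    penalty = sum(p.abs().sum()                  # l1 norm of each layer's weight matrix theta_m^(n)
                  for subnet in subnetworks      # m = 1..M sub-networks
                  for p in subnet.parameters())  # n = 1..N layers per sub-network
    return ppo_obj - alpha * penalty             # the objective is maximised, so the penalty is subtracted
```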
Although the group ℓ1 regularizer introduced into the objective function of the second policy network imposes intra-group and inter-group sparsity constraints on the weight parameters of the sub-networks, the optimization problem of the objective function is still a convex optimization problem, so the Adam algorithm is used directly to further process the objective function of the value network and the objective function of the sparsified policy network, realizing alternate updates of the weight parameters, specifically as follows:
(1) Optimization of the value network objective function
Since the objective function of the value network is a convex function, the Adam algorithm can be used to optimize it directly. First, the objective function is differentiated to obtain the gradient g_t(W) at the t-th iterative update; then g_t(W) is used to compute the first-order estimate m_t and the second-order estimate v_t:
m_t = β_1·m_{t-1} + (1-β_1)·g_t(W)
v_t = β_2·v_{t-1} + (1-β_2)·g_t(W)²
where β_1 and β_2 denote the decay coefficients of the first-order estimate m_t and the second-order estimate v_t, respectively, and m_{t-1} and v_{t-1} are the first- and second-order estimates at the (t-1)-th iterative update. From m_t and v_t, the bias-corrected estimates m̂_t and v̂_t are computed:
m̂_t = m_t / (1 - β_1^t)
v̂_t = v_t / (1 - β_2^t)
Further, the update formula of the value network is obtained:
W_{t+1} = W_t - α_W·m̂_t / (√v̂_t + ε)
where α_W denotes the learning rate controlling the step size and ε denotes a numerical stability parameter that prevents the denominator from being 0.
(2) Optimization of the objective function of the sparsified policy network
Since the optimization problem of the objective function of the sparsified policy network is still a convex optimization problem, the Adam algorithm can be used directly to update its weight parameters. Similarly, the objective function of the sparsified policy network is differentiated to obtain the gradient g_t(θ) at the t-th iterative update; g_t(θ) is then used to compute the first-order estimate m_t and the second-order estimate v_t and their bias corrections m̂_t and v̂_t, and the update formula of the policy network is obtained:
θ_{t+1} = θ_t + α_θ·m̂_t / (√v̂_t + ε)
where α_θ denotes the learning rate controlling the step size and ε denotes a numerical stability parameter that prevents the denominator from being zero.
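Both updates are the standard Adam recursion, which torch.optim.Adam already implements; the hand-written sketch below shows one step for a single parameter tensor (treating the policy update as gradient ascent via maximize=True is an interpretation, not a formula quoted from the patent):

```python
import torch

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, maximize=False):
    m = beta1 * m + (1 - beta1) * grad                       # first-order estimate m_t
    v = beta2 * v + (1 - beta2) * grad.pow(2)                # second-order estimate v_t
    m_hat = m / (1 - beta1 ** t)                             # bias-corrected m_t
    v_hat = v / (1 - beta2 ** t)                             # bias-corrected v_t
    step = lr * m_hat / (v_hat.sqrt() + eps)
    new_param = param + step if maximize else param - step   # ascent for the policy, descent for the value net
    return new_param, m, v
```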
4. Pruning and compressing the sparsified policy network to obtain the lightweight policy network
After the sparsified policy network is obtained, the policy network can be pruned, compressed, and fine-tuned to make it lightweight. In this embodiment, the threshold limiting the number of sub-networks is set to T_p and the pruning threshold for the weight parameters is set to T_θ; after a certain number of iterative updates, pruning of the policy network begins. Specifically, if the number of sub-networks is greater than T_p and the expected value E[θ_m] of the weight parameter matrix θ_m of the m-th sub-network satisfies
|E[θ_m]| < T_θ
the weight parameters of that sub-network are set to zero, completing the pruning operation on the sparsified policy network.
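A sketch of this pruning rule; representing each sub-network as a separate module and the concrete threshold values are assumptions:

```python
import torch

def prune_subnetworks(subnetworks, t_p, t_theta):
    """Zero out sub-network m when the sub-network count exceeds T_p and |E[theta_m]| < T_theta."""
    kept = []
    for subnet in subnetworks:
        theta_m = torch.cat([p.detach().flatten() for p in subnet.parameters()])
        if len(subnetworks) > t_p and theta_m.mean().abs() < t_theta:
            with torch.no_grad():
                for p in subnet.parameters():
                    p.zero_()                       # pruning: set the sub-network's weights to zero
        else:
            kept.append(subnet)                     # survivors form the non-redundant policy network
    return kept
```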
Then, the redundant weight parameters in the pruned policy network are removed to obtain a non-redundant policy network; the non-redundant policy network is compressed, regenerating a brand-new lightweight policy network. Finally, keeping the objective functions of the lightweight policy network and the value network unchanged, the current state space is input and the deep reinforcement learning network is fine-tuned until it converges again, yielding the final lightweight deep reinforcement learning network.
The value network guides the training of the lightweight policy network, specifically as follows:
The current three-dimensional state space S_t is input through a fully connected layer into the trained lightweight policy network, which generates a probability distribution over the available positions for the current macro cell (i.e., the action space a_t of the current macro cell); an action is randomly sampled from the action space a_t and executed, yielding the next three-dimensional state space S_{t+1}.
The current three-dimensional state space S_t and the next three-dimensional state space S_{t+1} are input into the value network to obtain the first value V(S_t, W) and the second value V(S_{t+1}, W) of the two state spaces; together with the reward R given by the external environment, the temporal-difference error (TD-error) is computed. The TD-error replaces the advantage function in the objective function of the lightweight policy network, the objective function of the sparsified policy network is constructed with the PPO algorithm, the gradient of the objective function is computed, and the weight parameters of the lightweight policy network are updated, thereby guiding the training of the lightweight policy network.
In addition, the TD-error can be used to construct the objective function of the value network, whose gradient is then computed to update the weight parameters of the value network.
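Putting the pieces together, one value-guided training step for the lightweight policy network might look as follows. This sketch reuses the value_loss and ppo_objective helpers sketched earlier; the env object with a step method, the optimizers, and the log-probability bookkeeping for the old policy are assumptions:

```python
import torch

def train_step(policy_net, value_net, policy_opt, value_opt, env, s_t, log_prob_old, gamma=0.99):
    probs = policy_net(s_t).flatten(1)                            # distribution over canvas positions (action space a_t)
    dist = torch.distributions.Categorical(probs)
    a_t = dist.sample()                                           # randomly sample and execute one action
    s_next, reward = env.step(a_t)                                # environment returns S_{t+1} and reward R
    loss_v, delta = value_loss(value_net, s_t, s_next, reward, gamma)
    value_opt.zero_grad(); loss_v.backward(); value_opt.step()    # update the value network with the TD-error
    obj = ppo_objective(dist.log_prob(a_t), log_prob_old, delta)  # TD-error stands in for the advantage
    policy_opt.zero_grad(); (-obj).backward(); policy_opt.step()  # maximise the PPO surrogate
    return s_next, dist.log_prob(a_t).detach()
```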
S3: taking the three-dimensional state space as input, and outputting an optimal layout strategy of the chip macro unit according to the trained lightweight deep reinforcement learning network;
Specifically, the current macro-cell information is changed macro by macro, changing the input of the lightweight policy network, so that the optimal layout strategy for all chip macro cells is obtained and the placement of the chip macro cells is guided.
S4: and guiding the macro units to be mapped to the chip canvas one by one according to the optimal layout strategy.
This embodiment constructs a new neural network architecture as the embedding layer of the policy-value network and encodes the netlist graph of the chip, the node features, and the information of the current macro to be placed to generate a three-dimensional state space. After the state space is obtained, the policy network is made lightweight by a multi-channel, multi-layer deconvolution-network pruning technique based on the group ℓ1 regularizer; the policy network and the value network are trained, the policy network outputting the probability distribution over available positions for the current macro cell and the value network outputting the reward estimate for the current macro-cell position. The three-dimensional state space is input into the trained lightweight deep reinforcement learning network, which outputs the optimal layout strategy for the chip macro cells while occupying fewer storage and computation resources, guiding the chip macro cells to be mapped onto the chip canvas one by one in order of size and reducing the computation required by the policy network to generate placement strategies for the chip macro cells.
In this embodiment, the policy network is divided into several mutually independent sub-networks according to the channels, which provides a new multi-channel, multi-layer structured-pruning idea for making the policy network lightweight and offers a way for the policy network to process data in blocks in the future. By introducing a group ℓ1 regularizer into the objective function of the policy network, intra-group and inter-group sparsity constraints are imposed on the weight parameters of the policy-network sub-networks, and the sparsified policy network is pruned and compressed, realizing a self-learning chip macro-cell layout method based on lightweight deep reinforcement learning that better eliminates the gradient computation caused by unimportant input data, solves the problem of redundant network weight parameters, and reduces the waste of storage and computation resources during the chip macro-cell placement stage of the deep-reinforcement-learning-based chip layout method.
Example 2
Referring to fig. 5, the embodiment provides a chip macro cell layout system based on lightweight deep reinforcement learning, including:
the data acquisition module M1, used for generating a three-dimensional state space according to the macro-cell features and the netlist information of the chip; the netlist information of the chip comprises a netlist graph and netlist metadata;
the model training module M2, used for training a lightweight deep reinforcement learning network; the lightweight deep reinforcement learning network comprises a lightweight policy network and a value network; the value network is used to guide the training of the lightweight policy network; the lightweight policy network comprises a plurality of sub-networks and is obtained by introducing a group ℓ1 regularizer and training a deconvolution network through pruning and compression operations;
the strategy generation module M3 is used for taking the three-dimensional state space as input and outputting the optimal layout strategy of the chip macro unit according to the trained lightweight deep reinforcement learning network;
and the mapping module M4 is used for guiding the macro units to be mapped onto the chip canvas one by one according to the optimal layout strategy.
The emphasis of each embodiment in the present specification is on the difference from the other embodiments, and the same and similar parts among the various embodiments may be referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (10)

1. A chip macro-cell layout method based on lightweight deep reinforcement learning, characterized by comprising the following steps:
generating a three-dimensional state space according to the macro-cell features and the netlist information of the chip, the netlist information of the chip comprising a netlist graph and netlist metadata;
training a lightweight deep reinforcement learning network, the lightweight deep reinforcement learning network comprising a lightweight policy network and a value network, the value network being used to guide the training of the lightweight policy network, and the lightweight policy network comprising a plurality of sub-networks and being obtained by introducing a group ℓ1 regularizer and training a deconvolution network through pruning and compression operations;
taking the three-dimensional state space as input and outputting the optimal layout strategy of the chip macro cells according to the trained lightweight deep reinforcement learning network;
and guiding the macro cells to be mapped onto the chip canvas one by one according to the optimal layout strategy.
2. The chip macro-cell layout method based on lightweight deep reinforcement learning according to claim 1, wherein generating the three-dimensional state space according to the macro-cell features and the netlist information of the chip specifically comprises:
inputting the macro-cell features and the netlist graph into a graph neural network and generating macro-cell embeddings and edge embeddings through graph convolution operations;
inputting the netlist metadata into a fully connected network to obtain the netlist metadata embedding;
averaging the edge embeddings to obtain the graph embedding;
fusing the current macro-cell information with the macro-cell embedding to obtain the current macro-cell embedding;
and inputting the netlist metadata embedding, the graph embedding, and the current macro-cell embedding into the fully connected network to obtain the current three-dimensional state space.
3. The chip macro-cell layout method based on lightweight deep reinforcement learning according to claim 1, wherein guiding the training of the lightweight policy network by using the value network specifically comprises:
inputting the current three-dimensional state space into the lightweight policy network to obtain the action space of the current macro cell;
randomly sampling an action from the current action space and executing it to obtain the next three-dimensional state space;
inputting the current three-dimensional state space and the next three-dimensional state space into the value network to obtain a first value and a second value;
obtaining a temporal-difference error according to the first value, the second value, and the reward given by the external environment for the current action;
substituting the temporal-difference error for the advantage function in the objective function of the lightweight policy network;
and guiding the training of the lightweight policy network according to the substituted objective function.
4. The chip macro-cell layout method based on lightweight deep reinforcement learning according to claim 1, wherein the lightweight policy network comprising a plurality of sub-networks and being obtained by introducing a group ℓ1 regularizer and training a deconvolution network through pruning and compression operations specifically comprises:
initializing a deconvolution network based on a reinforcement learning structure to obtain a first policy network;
performing multi-channel, multi-layer structured processing on the first policy network to obtain a second policy network;
introducing a group ℓ1 regularizer into the objective function of the second policy network and imposing intra-group and inter-group sparsity constraints on the weight parameters of the sub-networks to obtain a sparsified policy network;
and pruning and compressing the sparsified policy network to obtain the lightweight policy network.
5. The chip macro-cell layout method based on lightweight deep reinforcement learning according to claim 4, wherein
the first policy network comprises network layers including an input layer, an output layer, and at least one deconvolution layer located between the input layer and the output layer;
each network layer comprises a preset number of channels;
and the depth of each channel is the depth of the corresponding network layer divided by the number of channels.
6. The chip macro-cell layout method based on lightweight deep reinforcement learning according to claim 5, wherein performing multi-channel, multi-layer structured processing on the first policy network to obtain the second policy network specifically comprises:
dividing each network layer into a plurality of regions corresponding to the channels;
forming a plurality of mutually independent sub-networks from the corresponding regions of the network layers, wherein the input data of each sub-network is determined by the input data of the corresponding channel in the last deconvolution layer of the first policy network;
and taking the first policy network divided into the plurality of sub-networks as the second policy network.
7. The chip macro-cell layout method based on lightweight deep reinforcement learning according to claim 4, wherein introducing the group ℓ1 regularizer into the objective function of the second policy network and imposing intra-group and inter-group sparsity constraints on the weight parameters of the sub-networks to obtain the sparsified policy network specifically comprises:
J_s(θ) = J(θ) - α·Σ_{m=1}^{M} Σ_{n=1}^{N} ||θ_m^(n)||_1
wherein J_s(θ) represents the objective function of the sparsified policy network; J(θ) represents the objective function of the second policy network; α represents the regularization term parameter, α > 0; ||·||_1 represents the ℓ1 regularizer; θ_m^(n) represents the weight parameter matrix of the n-th layer of the m-th sub-network; M represents the total number of sub-networks; and N represents the total number of network layers in a sub-network.
8. The chip macro-cell layout method based on lightweight deep reinforcement learning according to claim 4, wherein after introducing the group ℓ1 regularizer into the objective function of the second policy network and imposing intra-group and inter-group sparsity constraints on the weight parameters of the sub-networks to obtain the sparsified policy network, the weight parameters of the value network and the weight parameters of the sparsified policy network are optimized by the Adam algorithm.
9. The chip macro-cell layout method based on lightweight deep reinforcement learning according to claim 4, wherein pruning and compressing the sparsified policy network to obtain the lightweight policy network specifically comprises:
setting a sub-network threshold and a pruning threshold for the weight parameters of the sub-networks;
comparing the number of sub-networks with the sub-network threshold, and comparing the weight parameters of the sub-networks with the pruning threshold;
if the number of sub-networks is greater than the sub-network threshold and the expected value of a sub-network's weight parameters is less than or equal to the pruning threshold, setting the current weight parameters of that sub-network to zero to complete the pruning operation on the sparsified policy network;
removing the redundant weight parameters in the pruned sparsified policy network to obtain a non-redundant policy network;
and compressing the non-redundant policy network and fine-tuning it until convergence to obtain the lightweight policy network.
10. A chip macro-cell layout system based on lightweight deep reinforcement learning, characterized by comprising:
a data acquisition module, configured to generate a three-dimensional state space according to the macro-cell features and the netlist information of the chip, the netlist information of the chip comprising a netlist graph and netlist metadata;
a model training module, configured to train a lightweight deep reinforcement learning network, the lightweight deep reinforcement learning network comprising a lightweight policy network and a value network, the value network being used to guide the training of the lightweight policy network, and the lightweight policy network comprising a plurality of sub-networks and being obtained by introducing a group ℓ1 regularizer and training a deconvolution network through pruning and compression operations;
a strategy generation module, configured to take the three-dimensional state space as input and output the optimal layout strategy of the chip macro cells according to the trained lightweight deep reinforcement learning network;
and a mapping module, configured to guide the macro cells to be mapped onto the chip canvas one by one according to the optimal layout strategy.
CN202210030064.0A 2022-01-12 2022-01-12 Chip macro-unit layout method and system based on lightweight deep reinforcement learning Active CN114372438B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210030064.0A CN114372438B (en) 2022-01-12 2022-01-12 Chip macro-unit layout method and system based on lightweight deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210030064.0A CN114372438B (en) 2022-01-12 2022-01-12 Chip macro-unit layout method and system based on lightweight deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN114372438A true CN114372438A (en) 2022-04-19
CN114372438B CN114372438B (en) 2023-04-07

Family

ID=81144202

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210030064.0A Active CN114372438B (en) 2022-01-12 2022-01-12 Chip macro-unit layout method and system based on lightweight deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN114372438B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106909728A (en) * 2017-02-21 2017-06-30 电子科技大学 A kind of FPGA interconnection resources configuration generating methods based on enhancing study
US20200067637A1 (en) * 2018-08-21 2020-02-27 The George Washington University Learning-based high-performance, energy-efficient, fault-tolerant on-chip communication design framework
CN111105035A (en) * 2019-12-24 2020-05-05 西安电子科技大学 Neural network pruning method based on combination of sparse learning and genetic algorithm
CN113505210A (en) * 2021-07-12 2021-10-15 广东工业大学 Medical question-answer generating system based on lightweight Actor-Critic generating type confrontation network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ANNA GOLDIE et al.: "Placement Optimization with Deep Reinforcement Learning" *
SHAO Weiping et al.: "Lightweight Convolutional Neural Network Design with MobileNet and YOLOv3" *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115270686A (en) * 2022-06-24 2022-11-01 无锡芯光互连技术研究院有限公司 Chip layout method based on graph neural network
CN115828831A (en) * 2023-02-14 2023-03-21 之江实验室 Multi-core chip operator placement strategy generation method based on deep reinforcement learning
CN116562218A (en) * 2023-05-05 2023-08-08 之江实验室 Method and system for realizing layout planning of rectangular macro-cells based on reinforcement learning
CN116562218B (en) * 2023-05-05 2024-02-20 之江实验室 Method and system for realizing layout planning of rectangular macro-cells based on reinforcement learning
CN117829085A (en) * 2024-03-04 2024-04-05 中国科学技术大学 Connection diagram generation method suitable for chip wiring
CN117829085B (en) * 2024-03-04 2024-05-17 中国科学技术大学 Connection diagram generation method suitable for chip wiring

Also Published As

Publication number Publication date
CN114372438B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN114372438B (en) Chip macro-unit layout method and system based on lightweight deep reinforcement learning
CN114896937A (en) Integrated circuit layout optimization method based on reinforcement learning
CN108573303A (en) It is a kind of that recovery policy is improved based on the complex network local failure for improving intensified learning certainly
CN112488208B (en) Method for acquiring remaining life of island pillar insulator
CN112508192B (en) Increment heap width learning system with degree of depth structure
CN111859790A (en) Intelligent design method for curve reinforcement structure layout based on image feature learning
CN113708969B (en) Collaborative embedding method of cloud data center virtual network based on deep reinforcement learning
CN113344174A (en) Efficient neural network structure searching method based on probability distribution
CN111242285A (en) Deep learning model training method, system, device and storage medium
Zhang et al. Memory-efficient hierarchical neural architecture search for image restoration
CN113128617B (en) Spark and ASPSO based parallelization K-means optimization method
CN112836823B (en) Convolutional neural network back propagation mapping method based on cyclic recombination and blocking
CN112651488A (en) Method for improving training efficiency of large-scale graph convolution neural network
CN114841098A (en) Deep reinforcement learning Beidou navigation chip design method based on sparse representation driving
CN111488981A (en) Method for selecting sparse threshold of depth network parameter based on Gaussian distribution estimation
CN113052810B (en) Small medical image focus segmentation method suitable for mobile application
CN106897292A (en) A kind of internet data clustering method and system
CN107273970B (en) Reconfigurable platform of convolutional neural network supporting online learning and construction method thereof
CN111160557B (en) Knowledge representation learning method based on double-agent reinforcement learning path search
CN116822617A (en) Contrast learning training method and system based on structural heavy parameterization
CN112270353B (en) Clustering method for multi-target group evolution software module
Zhang et al. Crescendonet: A simple deep convolutional neural network with ensemble behavior
CN116451586A (en) Self-adaptive DNN compression method based on balance weight sparsity and Group Lasso regularization
CN113011589B (en) Co-evolution-based hyperspectral image band selection method and system
EP4040342A1 (en) Deep neutral network structure learning and simplifying method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant