CN114372438A - Chip macro-unit layout method and system based on lightweight deep reinforcement learning - Google Patents
- Publication number
- CN114372438A CN114372438A CN202210030064.0A CN202210030064A CN114372438A CN 114372438 A CN114372438 A CN 114372438A CN 202210030064 A CN202210030064 A CN 202210030064A CN 114372438 A CN114372438 A CN 114372438A
- Authority
- CN
- China
- Prior art keywords
- network
- strategy
- chip
- sub
- lightweight
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/30—Circuit design
- G06F30/39—Circuit design at the physical level
- G06F30/392—Floor-planning or layout, e.g. partitioning or placement
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/30—Circuit design
- G06F30/39—Circuit design at the physical level
- G06F30/398—Design verification or optimisation, e.g. using design rule check [DRC], layout versus schematics [LVS] or finite element methods [FEM]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2115/00—Details relating to the type of the circuit
- G06F2115/06—Structured ASICs
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The invention relates to a chip macro-cell layout method and system based on lightweight deep reinforcement learning. The policy network is divided into a plurality of mutually independent sub-networks by channel, which provides a new idea of multi-channel, multi-layer structured pruning for lightweighting the policy network and a method for the policy network to process data in blocks in the future. By introducing a group regularizer into the objective function of the policy network, intra-group and inter-group sparsity constraints are imposed on the weight parameters of the sub-networks, and the sparsified policy network is pruned and compressed. This better eliminates the gradient computation caused by unimportant input data, solves the problem of redundant network weight parameters, reduces the waste of storage and computing resources during chip macro-cell layout in the deep-reinforcement-learning-based chip layout method, lowers the hardware requirements of the chip macro-cell layout process, and promotes the upgrading and development of hardware design.
Description
Technical Field
The invention relates to the field of machine learning and the field of chip layout, in particular to a chip macro-unit layout method and system based on lightweight deep reinforcement learning.
Background
The birth of a chip, i.e., a carrier of integrated circuits, requires four important processes: design, manufacture, packaging, and testing. Progress in chips has driven the rapid development of many fields such as new-energy vehicles, the Internet of Things, artificial intelligence, and edge computing. However, although China's demand for chips is the largest in the world, the self-supply rate of domestically made chips is less than 10%. Vigorously developing domestic chips and realizing domestic substitution for most commercial chips will further promote the transformation and upgrading of China's manufacturing industry, and is a necessary path for China to become a technological power. Yet current chip design processes often take years, with the most complicated and time-consuming phase being chip layout: mapping a netlist containing macro-cell and standard-cell information onto a chip canvas. The complexity of chip layout derives mainly from three aspects: the size of the netlist, the granularity of the grid onto which the chip is drawn, and the prohibitive computational cost of evaluating the true target metrics (evaluation with industry-standard EDA tools takes several hours or even more than a day). Despite decades of research on the chip layout problem, experts still need weeks of iteration with existing chip layout tools to generate a layout solution that meets all design criteria.
Recently, Google proposed a chip layout method based on deep reinforcement learning, aiming to quickly map a netlist containing macro cells and standard cells onto a chip canvas while optimizing power, performance, and area (PPA) and observing constraints on placement density and routing congestion. Google treats chip layout as a reinforcement learning problem and optimizes it by training a deep reinforcement learning network. Experimental results show that, compared with the most advanced baseline models, the method achieves better PPA on Google's TPU. More importantly, within 6 hours it can generate a chip layout superior or comparable to those designed by professional human chip designers.
However, the chip layout environment is complex, and the chip layout method based on deep reinforcement learning needs to train a huge redundant deconvolution network as a strategy network to generate an optimal layout strategy for the chip macro unit. This results in huge storage and computation resources occupied by the training of the policy network and the generation of the chip macro-cell layout policy, which puts high demands on hardware devices.
Therefore, lightweighting the deep reinforcement learning network reduces the hardware requirements of the chip macro-cell layout process in the deep-reinforcement-learning-based chip layout method, promotes the upgrading and development of hardware design, and has broad application prospects in the field of artificial-intelligence chip layout.
Disclosure of Invention
The invention aims to provide a chip macro-unit layout method and a chip macro-unit layout system based on light-weight deep reinforcement learning, which reduce the requirements of a chip macro-unit layout process in the chip layout method based on deep reinforcement learning on hardware equipment by using a light-weight deep reinforcement learning network and promote the updating and development of hardware design.
In order to achieve the purpose, the invention provides the following scheme:
a chip macro-cell layout method based on lightweight deep reinforcement learning comprises the following steps:
generating a three-dimensional state space according to the macro-cell features and the netlist information of the chip; the netlist information of the chip comprises a netlist graph and netlist metadata;
training a lightweight deep reinforcement learning network; the lightweight deep reinforcement learning network comprises a lightweight policy network and a value network; the value network is used for guiding the training of the lightweight policy network; the lightweight policy network comprises a plurality of sub-networks and is obtained by introducing a group regularizer and performing pruning and compression operations on a trained deconvolution network;
taking the three-dimensional state space as input, and outputting an optimal layout strategy of the chip macro unit according to the trained lightweight deep reinforcement learning network;
and guiding the macro units to be mapped to the chip canvas one by one according to the optimal layout strategy.
A chip macro cell layout system based on lightweight deep reinforcement learning comprises:
the data acquisition module is used for generating a three-dimensional state space according to the macro-cell features and the netlist information of the chip; the netlist information of the chip comprises a netlist graph and netlist metadata;
the model training module is used for training a lightweight deep reinforcement learning network; the lightweight deep reinforcement learning network comprises a lightweight policy network and a value network; the value network is used for guiding the training of the lightweight policy network; the lightweight policy network comprises a plurality of sub-networks and is obtained by introducing a group regularizer and performing pruning and compression operations on a trained deconvolution network;
the strategy generation module is used for taking the three-dimensional state space as input and outputting an optimal layout strategy of the chip macro unit according to the trained lightweight deep reinforcement learning network;
and the mapping module is used for guiding the macro units to be mapped to the chip canvas one by one according to the optimal layout strategy.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention provides a chip macro-unit layout method and a chip macro-unit layout system based on lightweight deep reinforcement learning, wherein a strategy network is divided into a plurality of mutually independent sub-networks according to channels, so that a new idea of multi-channel multi-layer structured pruning is provided for the lightweight of the strategy network, and a method is provided for the strategy network to perform block processing on data in the future; by introducing groups in the objective function of the policy networkThe regularizer performs sparse constraint in and among groups on weight parameters of the sub-network, and performs pruning compression on a sparse strategy network, so that gradient calculation caused by some unimportant input data can be better eliminated, the problem of network weight parameter redundancy is solved, waste of storage resources and calculation resources in a chip macro unit layout process in a chip layout method based on deep reinforcement learning is reduced, requirements of the chip macro unit layout process on hardware equipment are reduced, and the updating development of hardware design is promoted.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
Fig. 1 is a flowchart of a chip macro-cell layout method based on lightweight deep reinforcement learning according to embodiment 1 of the present invention;
FIG. 2 is a structural view of an embedding layer in embodiment 1 of the present invention;
fig. 3 is a schematic diagram of a training process of a lightweight deep reinforcement learning network in embodiment 1 of the present invention;
fig. 4 is a diagram of a physical model structure of a second policy network in embodiment 1 of the present invention;
fig. 5 is a structural diagram of a chip macro-cell layout system based on lightweight deep reinforcement learning according to embodiment 2 of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a chip macro-unit layout method and a chip macro-unit layout system based on light-weight deep reinforcement learning, which reduce the requirements of a chip macro-unit layout process in the chip layout method based on deep reinforcement learning on hardware equipment by using a light-weight deep reinforcement learning network and promote the updating and development of hardware design.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Example 1:
a chip layout method based on deep reinforcement learning is provided by Google and specifically comprises the following two steps: firstly, a Value Network (Value Network) guides the training of a policy Network (policy Network), so that the policy Network gives the optimal layout policy of the current macro units, and then the trained policy Network guides all the macro units of a chip to be sequentially placed according to the size sequence; and secondly, after the layout of all macro cells is finished, finishing the layout of the standard cells by a force guiding method, thereby finishing the mapping from the netlist to the canvas of the chip. The method is the first placement layout of the chip with generalization capability, which can learn from the previous netlist layout and serve the new netlist layout, which enables the strategy network to generate the optimal layout strategy for the chip faster and better over time. However, the chip layout method based on deep reinforcement learning needs to train a huge redundant deconvolution network as a strategy network, which results in that the training of the strategy network and the generation of the chip macro-unit layout strategy occupy huge storage resources and calculation resources, and have high requirements on hardware devices.
In contrast, referring to fig. 1, the embodiment provides a chip macro cell layout method based on lightweight deep reinforcement learning, so as to reduce the requirements of a chip macro cell layout process on hardware devices and promote the update and development of hardware design by using a lightweight deep reinforcement learning network. The method comprises the following steps:
s1: generating a three-dimensional state space according to the macro unit characteristics and the network list information of the chip; the net list information of the chip includes a net list map and net list metadata.
Construct a new neural network architecture as an embedding layer, and encode the chip's netlist graph, node features, and information about the current macro to be placed, generating a three-dimensional state space, as shown in fig. 2. This specifically comprises:
(1) inputting the macro-cell features and the netlist graph into a graph neural network, and generating macro-cell embeddings and edge embeddings through graph convolution operations;
(2) inputting the netlist metadata into a fully connected network to obtain the netlist metadata embedding;
(3) taking the mean of the edge embeddings to obtain the graph embedding;
(4) fusing the current macro-cell information with the macro-cell embedding to obtain the current macro-cell embedding;
(5) inputting the netlist metadata embedding, the graph embedding, and the current macro-cell embedding into the fully connected network to obtain the current three-dimensional state space S_t.
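As a concrete illustration of steps (1)–(5), the following sketch assembles a state vector from toy macro features, a netlist adjacency matrix, and metadata. All dimensions, weight matrices, and the simple mean-aggregation graph convolution are illustrative assumptions, not the patented architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def graph_conv(node_feats, adj, w):
    # one graph-convolution step: aggregate neighbours, then project
    return np.tanh(adj @ node_feats @ w)

n_macros, feat_dim, embed_dim = 5, 8, 16
macro_feats = rng.standard_normal((n_macros, feat_dim))       # macro-cell features
adj = (rng.random((n_macros, n_macros)) > 0.5).astype(float)  # netlist graph
np.fill_diagonal(adj, 1.0)                                    # keep self-loops

# (1) macro-cell embeddings and edge embeddings from the graph network
w_gc = rng.standard_normal((feat_dim, embed_dim))
macro_embed = graph_conv(macro_feats, adj, w_gc)
edge_embed = np.stack([(macro_embed[i] + macro_embed[j]) / 2
                       for i, j in zip(*np.nonzero(adj))])

# (2) netlist metadata through a fully connected layer
metadata = rng.standard_normal(4)
meta_embed = np.tanh(metadata @ rng.standard_normal((4, embed_dim)))

# (3) graph embedding = mean over the edge embeddings
graph_embed = edge_embed.mean(axis=0)

# (4) current-macro embedding: the row for the macro now being placed
current_embed = macro_embed[0]

# (5) fuse everything and project to the state S_t
fused = np.concatenate([meta_embed, graph_embed, current_embed])
state = np.tanh(fused @ rng.standard_normal((3 * embed_dim, 32)))
```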
S2: training a lightweight deep reinforcement learning network; the lightweight deep reinforcement learning network comprises a lightweight policy network and a value network; the value network is used for guiding the training of the lightweight policy network; the lightweight policy network comprises a plurality of sub-networks and is obtained by introducing a group regularizer and performing pruning and compression operations on a trained deconvolution network.
As shown in fig. 3, the specific training process of the lightweight deep reinforcement learning network includes:
(1) initializing a deconvolution network based on a reinforcement learning structure to obtain a first deep reinforcement learning network, wherein the first deep reinforcement learning network comprises a first strategy network and a value network;
(2) carrying out multi-channel, multi-layer structured preprocessing on the first policy network to obtain a second policy network;
(3) introducing a group regularizer into the objective function of the second policy network to impose intra-group and inter-group sparsity constraints on the weight parameters of the sub-networks, obtaining a sparsified policy network;
(4) pruning and compressing the sparsified policy network to obtain the lightweight policy network.
In order to make the specific processes of (1) to (4) more clearly understood by those skilled in the art, the following description is made specifically.
1. First strategy network for constructing self-learning chip macro-unit layout based on reinforcement learning
In this embodiment, a deconvolution network is adopted as the first policy network, making full use of the relationships between adjacent elements in the input three-dimensional state matrix, so that the first policy network maps the input three-dimensional state space to an optimal two-dimensional layout strategy for the chip macro cells. The deconvolution network consists of an input layer, deconvolution layers, and an output layer. Similar to a convolutional network, the input layer of the deconvolution network receives data through non-full (sparse) connections, the output layer produces data through full connections, and one or more deconvolution layers lie between the input layer and the output layer.
Assume the input data matrix of the first policy network is Y and the output data matrix is X. The input layer, the deconvolution layers, and the output layer each contain a preset number of channels (i.e., the convolution kernels used in each layer). The physical model of the first policy network is shown in FIG. 4, where a network layer is denoted L (L_k represents the k-th network layer) and a channel is denoted θ. The input matrix Y (width 8, height 8, depth 16) stored in the input layer L_1 enters the first deconvolution layer L_2 through one-to-one non-full connections, then enters the second deconvolution layer L_3 through 4 channels, then enters the output layer through 4 channels, and the output layer obtains the output matrix X through a full-connection operation.
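The deconvolution (transposed convolution) operation at the heart of these layers can be sketched minimally as follows; the 8×8 input matches the stated width and height of Y, while the 3×3 kernel and stride of 2 are illustrative assumptions:

```python
import numpy as np

def deconv2d(x, kernel, stride=2):
    # minimal single-channel 2-D transposed convolution: each input
    # element scatters a scaled copy of the kernel into the output
    h, w = x.shape
    k = kernel.shape[0]
    out = np.zeros((h * stride + k - stride, w * stride + k - stride))
    for i in range(h):
        for j in range(w):
            out[i * stride:i * stride + k,
                j * stride:j * stride + k] += x[i, j] * kernel
    return out

x = np.ones((8, 8))             # one 8x8 slice of the input matrix Y
kernel = np.ones((3, 3)) / 9.0  # illustrative 3x3 averaging kernel
y = deconv2d(x, kernel, stride=2)
```

With stride 2 and a 3×3 kernel, an 8×8 input upsamples to a 17×17 output, illustrating how deconvolution layers grow the spatial resolution toward the chip canvas.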
2. Carrying out multi-channel and multi-layer structured preprocessing on the first policy network to obtain a second policy network
In the first policy network, because the network layers have different sizes, the sizes of their corresponding channels are also inconsistent. The width and height of a channel can therefore be set empirically when constructing the first policy network, but the depth of a channel must equal the depth of its corresponding network layer. Thus, unlike a fully connected network, in the first policy network each channel is connected only to part of the elements of its corresponding network layer, and each element of a network layer is connected to only one channel. That is, assuming the number of channels of the first policy network is 4, and taking the square formed by the height and width of a network layer as a cross-section, the network layer is divided into 4 equally sized blocks according to the number of channels; a deconvolution operation is performed through the corresponding channel of each block, the results are averaged through the activation function, and the input of the next network layer is obtained. For example, as shown by the dashed marks in fig. 4, the first data matrix of the second deconvolution layer L_3 is obtained by performing a deconvolution operation on the first data matrix of each of the 4 blocks of the first deconvolution layer L_2 and then averaging. Similarly, the remaining 3 data matrices of the second deconvolution layer L_3 are obtained from the 4 data matrices corresponding to the 4 blocks of the first deconvolution layer L_2 by deconvolution and averaging.
Exploiting the property that each element in the first policy network is connected to only one channel, this embodiment divides the first policy network into a plurality of mutually independent sub-networks by channel to obtain the second policy network. Referring to fig. 4, looking from the output layer of the first policy network back to its input layer, the first policy network can be divided into 4 mutually independent sub-networks by the number of channels. The inputs Y_1, Y_2, Y_3, Y_4 of the 4 sub-networks are determined by the last deconvolution layer of the first policy network. Specifically, in the first policy network of fig. 4, the first data matrix input to the second deconvolution layer L_3 is obtained by performing deconvolution on the corresponding first data matrices in the 4 channels of the first deconvolution layer L_2 and averaging. Since the neurons of the input layer L_1 correspond one-to-one with the neurons of the first deconvolution layer L_2, these 4 data matrices form the input Y_1 of the first sub-network; the inputs Y_2, Y_3, Y_4 of the remaining three sub-networks are obtained in the same way. The input data of the 4 sub-networks are therefore completely different but identical in size. The 4 groups of input data are deconvolved on the mutually independent sub-networks, which finally output X_1, X_2, X_3, X_4 (each the same size as X) at the output layer, and the output X of the first policy network is recovered by averaging these four outputs.
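The split-and-average scheme can be sketched as follows, with simple linear layers standing in for the per-channel deconvolution stacks; all sizes and weights are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def sub_network(y, weights):
    # one independent sub-network: linear layers standing in for the
    # per-channel deconvolution operations
    h = y
    for w in weights:
        h = np.tanh(h @ w)
    return h

n_sub, dim = 4, 16
subnets = [[rng.standard_normal((dim, dim)) for _ in range(2)]
           for _ in range(n_sub)]

# completely different inputs Y_1..Y_4 of identical size
inputs = [rng.standard_normal(dim) for _ in range(n_sub)]

# run each sub-network independently, then recover X by averaging
outputs = [sub_network(y, w) for y, w in zip(inputs, subnets)]
x = np.mean(outputs, axis=0)
```

Because the sub-networks share no weights, each one can later be sparsified and pruned away without touching the others.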
3. Introducing a group regularizer into the objective function of the second policy network to impose intra-group and inter-group sparsity constraints on the weight parameters of the sub-networks, obtaining a sparsified policy network
(1) Constructing value network objective functions
The current state of the agent in the environment is S_t. The agent performs action a_t in the current state, the environment gives a reward R for this action, and the discount rate is γ. The agent transitions to the next state S_{t+1} and then performs the next action a_{t+1}.
Construct a value network function V(S, W) to approximate the value V of state S_t, where W denotes the weight parameters of the value network. The temporal-difference error δ (TD-error) can then be expressed as:

δ = R + γV(S_{t+1}, W) - V(S_t, W)

The value network updates its parameters by minimizing the TD-error, so the objective function of the value network is obtained as the expectation of the squared TD-error:

J(W) = E[δ²]

where E(·) denotes expectation.
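A minimal numerical sketch of the TD-error and the value objective, with the expectation estimated by a sample mean over toy transitions (the reward and value numbers are illustrative):

```python
import numpy as np

def td_error(reward, v_next, v_curr, gamma=0.99):
    # delta = R + gamma * V(S_{t+1}, W) - V(S_t, W)
    return reward + gamma * v_next - v_curr

def value_objective(deltas):
    # J(W) = E[delta^2], estimated by the sample mean
    return np.mean(np.square(deltas))

deltas = td_error(np.array([1.0, 0.0]),   # rewards R
                  np.array([0.5, 0.2]),   # V(S_{t+1}, W)
                  np.array([0.4, 0.3]))   # V(S_t, W)
loss = value_objective(deltas)
```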
(2) Constructing an objective function for a policy network
Construct a second policy network function π(a_t | S_t), where S_t denotes the current state of the agent in the environment and a_t denotes an action the agent may perform in that state. As in the deep-reinforcement-learning-based chip layout method, a Proximal Policy Optimization (PPO) algorithm is adopted to construct the objective function of the second policy network:

J(θ) = E[min(r_t(θ) Â_t, clip(r_t(θ), 1 - ε, 1 + ε) Â_t)]

where θ denotes the weight parameters of the policy network, r_t(θ) = π_θ(a_t|S_t) / π_θold(a_t|S_t) denotes the probability ratio between the new and old policy network functions, and Â_t denotes the advantage function (the TD-error can be used instead).
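A sketch of the clipped PPO surrogate, using the standard clipping form with an assumed clipping parameter of 0.2; the log-probability inputs and demo values are illustrative:

```python
import numpy as np

def ppo_objective(new_logp, old_logp, advantage, clip_eps=0.2):
    # clipped surrogate: E[min(r_t * A_t, clip(r_t, 1-eps, 1+eps) * A_t)]
    # with r_t = pi_new(a_t|S_t) / pi_old(a_t|S_t)
    ratio = np.exp(new_logp - old_logp)
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * advantage
    return np.mean(np.minimum(unclipped, clipped))

# a ratio of 1.5 with positive advantage is clipped at 1 + 0.2 = 1.2
demo = ppo_objective(np.log(np.array([1.5])), np.array([0.0]), np.array([1.0]))
```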
To prune the policy network effectively, the weight parameters must be sparsified within and between the sub-network groups. To this end, this embodiment introduces a group regularizer on the weight parameters of the sub-networks of the second policy network, imposing intra-group and inter-group sparsity constraints. Because the policy network updates its parameters by maximizing J(θ), the sparse regularization term on θ is negated here, giving the objective function of the sparsified policy network:
J_s(θ) = J(θ) - α Σ_{m=1}^{M} Σ_{n=1}^{N} ||θ_{m,n}||_1

where J_s(θ) denotes the objective function of the sparsified policy network; α > 0 denotes the regularization parameter; ||·||_1 denotes the L_1 regularizer; θ_{m,n} denotes the weight parameter matrix of the n-th layer of the m-th sub-network; M denotes the total number of sub-networks; and N denotes the total number of network layers in a sub-network.
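The group penalty can be sketched as follows, assuming a plain L_1 norm per layer matrix as the notation suggests; the α value and demo weights are illustrative:

```python
import numpy as np

def group_penalty(subnet_weights, alpha=1e-3):
    # alpha * sum over sub-networks m and layers n of ||theta_{m,n}||_1:
    # sparsity within and between the groups
    return alpha * sum(np.abs(w).sum()
                       for layers in subnet_weights for w in layers)

def sparse_policy_objective(ppo_value, subnet_weights, alpha=1e-3):
    # the maximised objective: PPO surrogate minus the group penalty
    return ppo_value - group_penalty(subnet_weights, alpha)

demo_weights = [[np.ones((2, 2))], [np.ones((2, 2))]]  # 2 sub-networks, 1 layer
penalty = group_penalty(demo_weights, alpha=0.1)       # 0.1 * (4 + 4) = 0.8
```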
Although the group regularizer introduced into the objective function of the second policy network imposes intra-group and inter-group sparsity constraints on the weight parameters of the sub-networks, the objective functions still have to be optimized. Since both optimization problems are convex, the Adam algorithm is used directly to further process the objective function of the value network and the objective function of the sparsified policy network, realizing alternating updates of the weight parameters, as follows:
(1) optimization of value network objective function
Since the objective function of the value network is a convex function, the Adam algorithm can be used to optimize it directly. First, the objective function is differentiated to obtain the gradient g_t(W) of the t-th iterative update; then g_t(W) is used to compute the first-order moment estimate m_t and the second-order moment estimate v_t:

m_t = β_1 m_{t-1} + (1 - β_1) g_t(W)

v_t = β_2 v_{t-1} + (1 - β_2) g_t(W)²

where β_1 and β_2 denote the decay coefficients of the first-order estimate m_t and the second-order estimate v_t respectively, and m_{t-1} and v_{t-1} are the first- and second-order estimates at the (t-1)-th iterative update. From m_t and v_t, the bias-corrected estimates m̂_t = m_t / (1 - β_1^t) and v̂_t = v_t / (1 - β_2^t) are computed.
This yields the update formula of the value network:

W_t = W_{t-1} - α_W · m̂_t / (√v̂_t + ε)

where α_W denotes the learning rate, which controls the step size; ε denotes a numerical-stability parameter that prevents the denominator from being 0.
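One Adam update step, written out exactly as in the formulas above (moment estimates, bias correction, parameter step); the demo scalar values are illustrative:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    # moment estimates, bias correction, then the parameter step
    # w_t = w_{t-1} - lr * m_hat / (sqrt(v_hat) + eps)
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# one scalar step from w = 0 with gradient 1: moves by almost exactly -lr,
# because the bias-corrected moments cancel the gradient scale on step 1
w, m, v = adam_step(0.0, 1.0, m=0.0, v=0.0, t=1, lr=0.01)
```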
(2) Optimization of objective function for sparsification policy network
Since the optimization problem of the objective function of the sparsified policy network is likewise a convex optimization problem, the Adam algorithm can be used directly to update its weight parameters. Similarly, the objective function of the sparsified policy network is differentiated to obtain the gradient g_t(θ) of the t-th iterative update; g_t(θ) is then used to compute the first-order estimate m_t and the second-order estimate v_t together with their bias corrections m̂_t and v̂_t, giving the update formula of the policy network:

θ_t = θ_{t-1} + α_θ · m̂_t / (√v̂_t + ε)

where α_θ denotes the learning rate, which controls the step size; ε denotes a numerical-stability parameter that prevents the denominator from being zero. (The plus sign reflects gradient ascent, since the policy objective is maximized.)
4. Pruning and compressing the sparse strategy network to obtain the lightweight strategy network
After the sparsified policy network is obtained, the policy network can be pruned, compressed, and fine-tuned, realizing its lightweighting. In this embodiment, the threshold limiting the number of sub-networks is set to T_p and the pruning threshold of the weight parameters to T_θ; after a certain number of iterative updates, pruning of the policy network begins. Specifically, if the number of sub-networks is greater than T_p and the expected value E[θ_m] of the weight parameter matrix θ_m of the m-th sub-network satisfies

|E[θ_m]| < T_θ

then the weight parameters of that sub-network are set to zero, completing the pruning operation on the sparsified policy network.
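A sketch of the pruning rule just stated: any sub-network whose mean weight magnitude falls below T_θ is zeroed, provided the number of sub-networks exceeds T_p. The threshold values and demo matrices are illustrative assumptions:

```python
import numpy as np

def prune_subnetworks(subnet_weights, t_p=2, t_theta=0.05):
    # zero out sub-network m when |E[theta_m]| < T_theta, provided the
    # number of sub-networks exceeds T_p
    pruned = []
    for theta_m in subnet_weights:
        if len(subnet_weights) > t_p and abs(np.mean(theta_m)) < t_theta:
            pruned.append(np.zeros_like(theta_m))
        else:
            pruned.append(theta_m)
    return pruned

nets = [np.full((2, 2), 0.5), np.full((2, 2), 0.01), np.full((2, 2), -0.6)]
pruned = prune_subnetworks(nets)  # only the middle sub-network is zeroed
```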
Then, the redundant weight parameters are removed from the pruned policy network to obtain a non-redundant policy network, which is compressed to regenerate a brand-new lightweight policy network. Finally, keeping the objective functions of the lightweight policy network and the value network unchanged, the current state space is input and the deep reinforcement learning network is fine-tuned until it converges again, yielding the final lightweight deep reinforcement learning network.
The value network guides the training of the lightweight policy network as follows:

The current three-dimensional state space S_t is input through a fully connected layer into the trained lightweight policy network, which generates a probability distribution over the available positions of the current macro cell (i.e., the action space a_t of the current macro cell); an action is randomly sampled from the action space a_t and executed, yielding the next three-dimensional state space S_{t+1}.

The current three-dimensional state space S_t and the next three-dimensional state space S_{t+1} are input into the value network to obtain the first value V(S_t, W) and the second value V(S_{t+1}, W) of the two state spaces; combined with the reward R given by the external environment, the temporal-difference error TD-error is computed. The TD-error then replaces the advantage function in the objective function of the lightweight policy network: the objective function of the sparsified policy network is constructed with the PPO algorithm, its gradient is computed, and the weight parameters of the lightweight policy network are updated, guiding the training of the lightweight policy network.
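The sampling step described above can be sketched as follows: a stand-in lightweight policy head turns the state vector into a probability distribution over grid positions, from which one action is sampled. The softmax head, dimensions, and weights are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def policy_probs(state, theta):
    # softmax head over the candidate grid positions (action space a_t)
    logits = state @ theta
    e = np.exp(logits - logits.max())
    return e / e.sum()

state_dim, n_positions = 8, 16
theta = 0.1 * rng.standard_normal((state_dim, n_positions))
s_t = rng.standard_normal(state_dim)

probs = policy_probs(s_t, theta)            # distribution over positions
action = rng.choice(n_positions, p=probs)   # randomly sample one position
```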
In addition, the TD-error can be used to construct the objective function of the value network; the gradient of that objective function is then computed and the weight parameters of the value network are updated.
S3: taking the three-dimensional state space as input, and outputting an optimal layout strategy of the chip macro unit according to the trained lightweight deep reinforcement learning network;
specifically, the current macro-unit information is updated one unit at a time so that the input to the lightweight strategy network changes accordingly, thereby obtaining the optimal layout strategy for all the chip macro units and guiding the layout of the chip macro units.
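A toy NumPy sketch of this one-macro-at-a-time loop, with a uniform stand-in for the strategy network and a hypothetical 4×4 canvas: each placement updates the state, and the updated state becomes the next input.

```python
import numpy as np

rng = np.random.default_rng(0)

def policy_probs(state):
    # Stand-in for the lightweight strategy network:
    # uniform probability over the still-free canvas cells.
    mask = state.ravel() == 0
    return mask / mask.sum()

def place_all(macros, grid_shape=(4, 4)):
    """Place macros one by one: each step feeds the updated state back
    into the policy and samples a grid position for the current macro."""
    state = np.zeros(grid_shape, dtype=int)
    layout = {}
    for m in macros:
        p = policy_probs(state)
        pos = rng.choice(p.size, p=p)
        layout[m] = divmod(pos, grid_shape[1])  # (row, col) on the canvas
        state.ravel()[pos] = 1                  # mark the cell occupied
    return layout
```

In the patented method the probabilities would come from the trained lightweight strategy network rather than a uniform mask, but the loop structure is the same.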
S4: and guiding the macro units to be mapped to the chip canvas one by one according to the optimal layout strategy.
This embodiment constructs a new neural network architecture as the embedding layer of the strategy-value network, and encodes the netlist graph of the chip, the node features, and the information of the current macro to be placed to generate a three-dimensional state space. After the state space is obtained, the strategy network is lightweighted with a multi-channel multi-layer deconvolution network pruning technique based on the group Lasso regularizer, and the strategy network and the value network are trained. The strategy network and the value network respectively output the probability distribution over the available positions of the current macro unit and the reward estimate for the current macro-unit position. The three-dimensional state space is input into the trained lightweight deep reinforcement learning network, which outputs the optimal layout strategy of the chip macro units while occupying fewer storage and computing resources, guides the chip macro units to be mapped onto the chip canvas one by one in order of size, and reduces the computation required by the strategy network to generate a placement strategy for the chip macro units.
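A simplified NumPy sketch of the embedding layer just described, under assumed shapes and a one-step mean-aggregation graph convolution (the real embedding layer is a trained graph neural network): node features and the netlist graph produce macro embeddings, the edge embeddings are averaged into a graph embedding, and the metadata, graph, and current-macro embeddings are fused into the state.

```python
import numpy as np

def graph_conv(node_feats, adj):
    # One graph-convolution step: average the features of each node's
    # neighbours (row-normalised adjacency; isolated nodes map to zero).
    deg = adj.sum(axis=1, keepdims=True)
    return (adj @ node_feats) / np.maximum(deg, 1)

def build_state(node_feats, adj, edge_embed, meta_embed, current_macro):
    """Fuse netlist-graph, metadata, and current-macro information into
    a single state vector (simplified stand-in for the embedding layer)."""
    macro_embed = graph_conv(node_feats, adj)   # macro-unit embeddings
    graph_embed = edge_embed.mean(axis=0)       # graph embedding = mean edge embedding
    cur = macro_embed[current_macro]            # current macro-unit embedding
    return np.concatenate([meta_embed, graph_embed, cur])
```

In the patent this fused vector is additionally passed through a fully connected network before being used as the three-dimensional state space.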
In this embodiment, the strategy network is divided into a plurality of mutually independent sub-networks by channel, which provides a new idea of multi-channel multi-layer structured pruning for lightweighting the strategy network and offers a method for future block-wise processing of data by the strategy network. By introducing a group Lasso regularizer into the objective function of the strategy network, intra-group and inter-group sparse constraints are imposed on the weight parameters of the strategy sub-networks, and the sparse strategy network is pruned and compressed. This realises a self-learning chip macro-unit layout method based on lightweight deep reinforcement learning, better eliminates the gradient computation incurred by unimportant input data, solves the problem of redundant network weight parameters, and reduces the waste of storage and computing resources during chip macro-unit layout in deep-reinforcement-learning-based chip layout methods.
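The group-structured sparsity penalty described here can be illustrated as follows, matching the form given in claim 7 (an L1 norm per layer weight matrix, summed over sub-networks and layers). The data layout (a list of sub-networks, each a list of per-layer matrices) and the α value are assumptions for demonstration.

```python
import numpy as np

def group_lasso_penalty(subnet_weights, alpha=1e-3):
    """Sparsity penalty alpha * sum_m sum_n ||W_m^n||_1, summed over the
    M sub-networks (groups) and their N layers; adding it to the training
    objective pushes whole sub-networks toward zero."""
    return alpha * sum(np.abs(w).sum()
                       for subnet in subnet_weights
                       for w in subnet)
```

During training this penalty would simply be added to the strategy network's PPO objective, so that gradient descent drives low-importance sub-network weights toward zero before pruning.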
Example 2
Referring to fig. 5, the embodiment provides a chip macro cell layout system based on lightweight deep reinforcement learning, including:
the data acquisition module M1 is used for generating a three-dimensional state space according to the macro unit features and the netlist information of the chip; the netlist information of the chip comprises a netlist graph and netlist metadata;
the model training module M2 is used for training a lightweight deep reinforcement learning network; the lightweight deep reinforcement learning network comprises a lightweight strategy network and a value network; the value network is used for guiding the lightweight strategy network to train; and the lightweight strategy network comprises a plurality of sub-networks and is obtained by introducing a group Lasso regularizer and training a deconvolution network through pruning operation and compression operation;
the strategy generation module M3 is used for taking the three-dimensional state space as input and outputting the optimal layout strategy of the chip macro unit according to the trained lightweight deep reinforcement learning network;
and the mapping module M4 is used for guiding the macro units to be mapped onto the chip canvas one by one according to the optimal layout strategy.
Each embodiment in this specification focuses on its differences from the other embodiments; for the same or similar parts, the embodiments may be referred to one another. Since the system disclosed in the embodiment corresponds to the method disclosed in the embodiment, its description is relatively brief, and the relevant points may be found in the description of the method.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and core concept of the invention. Meanwhile, a person skilled in the art may, according to the idea of the present invention, make changes to the specific embodiments and the scope of application. In view of the above, the contents of this specification should not be construed as limiting the invention.
Claims (10)
1. A chip macro-cell layout method based on lightweight deep reinforcement learning is characterized by comprising the following steps:
generating a three-dimensional state space according to the macro unit features and the netlist information of the chip; the netlist information of the chip comprises a netlist graph and netlist metadata;
training a lightweight deep reinforcement learning network; the lightweight deep reinforcement learning network comprises a lightweight strategy network and a value network; the value network is used for guiding the lightweight strategy network to train; and the lightweight strategy network comprises a plurality of sub-networks and is obtained by introducing a group Lasso regularizer and training a deconvolution network through pruning operation and compression operation;
taking the three-dimensional state space as input, and outputting an optimal layout strategy of the chip macro unit according to the trained lightweight deep reinforcement learning network;
and guiding the macro units to be mapped to the chip canvas one by one according to the optimal layout strategy.
2. The chip macro-cell layout method based on lightweight deep reinforcement learning according to claim 1, wherein the generating a three-dimensional state space according to the macro unit features and the netlist information of the chip specifically comprises:
inputting the macro unit features and the netlist graph into a graph neural network, and generating macro unit embeddings and edge embeddings through graph convolution operations;
inputting the netlist metadata into a fully connected network to obtain a netlist metadata embedding;
reducing the edge embeddings to their average value to obtain a graph embedding;
fusing the current macro unit information with the macro unit embeddings to obtain the current macro unit embedding;
and inputting the netlist metadata embedding, the graph embedding and the current macro unit embedding into the fully connected network to obtain the current three-dimensional state space.
3. The chip macro cell layout method based on light weight deep reinforcement learning according to claim 1, wherein the guiding the light weight strategy network to train by using the value network specifically comprises:
inputting the current three-dimensional state space into the lightweight strategy network to obtain an action space of the current macro unit;
randomly extracting an action from the current action space and executing the action to obtain a next three-dimensional state space;
inputting the current three-dimensional state space and the next three-dimensional state space into the value network to obtain a first value and a second value;
obtaining a time sequence difference error according to the first value, the second value and an incentive value of an external environment to the current action;
replacing the timing difference error with a merit function in an objective function of the lightweight policy network;
and guiding the lightweight strategy network training according to the replaced objective function.
4. The chip macro-cell layout method based on lightweight deep reinforcement learning according to claim 1, wherein the lightweight strategy network comprises a plurality of sub-networks and is obtained by introducing a group Lasso regularizer and training a deconvolution network through pruning operation and compression operation, specifically comprising the following steps:
initializing a deconvolution network based on a reinforcement learning structure to obtain a first strategy network;
carrying out multi-channel multi-layer structural processing on the first strategy network to obtain a second strategy network;
introducing a group Lasso regularizer into an objective function of the second strategy network, and performing intra-group and inter-group sparse constraints on the weight parameters of the sub-networks to obtain a sparse strategy network;
and pruning and compressing the sparse strategy network to obtain the lightweight strategy network.
5. The chip macro-cell layout method based on lightweight deep reinforcement learning according to claim 4, wherein:
the first policy network comprises a network layer comprising an input layer, an output layer, and at least one deconvolution layer located between the input layer and the output layer;
each network layer comprises a preset number of channels;
the depth of each channel is the depth of the corresponding network layer divided by the number of channels.
6. The chip macro cell layout method based on light-weight deep reinforcement learning according to claim 5, wherein the performing multi-channel multi-layer structurization processing on the first policy network to obtain a second policy network specifically comprises:
correspondingly dividing each network layer into a plurality of areas according to the channels;
forming a plurality of mutually independent sub-networks from the corresponding areas of each network layer; wherein the input data of each sub-network is determined by the input data of the corresponding channel in the previous deconvolution layer of the first strategy network;
and taking a first policy network divided into a plurality of sub-networks as the second policy network.
7. The chip macro-cell layout method based on lightweight deep reinforcement learning according to claim 4, wherein the introducing a group Lasso regularizer into the objective function of the second strategy network and performing intra-group and inter-group sparse constraints on the weight parameters of the sub-networks to obtain a sparse strategy network specifically comprises:
adding the group regularization term α Σ_{m=1}^{M} Σ_{n=1}^{N} ||W_m^n||_1 to the objective function of the second strategy network to obtain the objective function of the sparse strategy network; wherein α denotes the regularization term parameter, α > 0; ||·||_1 denotes the L1 regularizer; W_m^n represents the weight parameter matrix of the n-th layer of the m-th sub-network; M represents the total number of sub-networks; and N denotes the total number of network layers in each sub-network.
8. The chip macro-cell layout method based on lightweight deep reinforcement learning according to claim 4, wherein after the group Lasso regularizer is introduced into the objective function of the second strategy network and intra-group and inter-group sparse constraints are performed on the weight parameters of the sub-networks to obtain the sparse strategy network, the weight parameters of the value network and the weight parameters of the sparse strategy network are optimized through the Adam algorithm.
9. The chip macro-cell layout method based on light-weight deep reinforcement learning according to claim 4, wherein the pruning and compressing operations on the sparse strategy network to obtain the light-weight strategy network specifically include:
setting a sub-network threshold value and a pruning threshold value of the weight parameter of the sub-network;
comparing the number of the sub-networks with the sub-network threshold value, and the weight parameters of the sub-networks with the pruning threshold value;
if the number of the sub-networks is larger than the sub-network threshold value and the expected value of the weight parameters of a sub-network is smaller than or equal to the pruning threshold value, setting the current weight parameters of that sub-network to zero to complete the pruning operation of the sparse strategy network;
removing the redundant weight parameters in the pruned sparse strategy network to obtain a non-redundant strategy network;
compressing the non-redundant strategy network, and adjusting the non-redundant strategy network to be convergent to obtain the lightweight strategy network.
10. A chip macro cell layout system based on lightweight deep reinforcement learning is characterized by comprising:
the data acquisition module is used for generating a three-dimensional state space according to the macro unit features and the netlist information of the chip; the netlist information of the chip comprises a netlist graph and netlist metadata;
the model training module is used for training a lightweight deep reinforcement learning network; the lightweight deep reinforcement learning network comprises a lightweight strategy network and a value network; the value network is used for guiding the lightweight strategy network to train; and the lightweight strategy network comprises a plurality of sub-networks and is obtained by introducing a group Lasso regularizer and training a deconvolution network through pruning operation and compression operation;
the strategy generation module is used for taking the three-dimensional state space as input and outputting an optimal layout strategy of the chip macro unit according to the trained lightweight deep reinforcement learning network;
and the mapping module is used for guiding the macro units to be mapped to the chip canvas one by one according to the optimal layout strategy.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210030064.0A CN114372438B (en) | 2022-01-12 | 2022-01-12 | Chip macro-unit layout method and system based on lightweight deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114372438A true CN114372438A (en) | 2022-04-19 |
CN114372438B CN114372438B (en) | 2023-04-07 |
Family
ID=81144202
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210030064.0A Active CN114372438B (en) | 2022-01-12 | 2022-01-12 | Chip macro-unit layout method and system based on lightweight deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114372438B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106909728A (en) * | 2017-02-21 | 2017-06-30 | 电子科技大学 | FPGA interconnect resource configuration generation method based on reinforcement learning |
US20200067637A1 (en) * | 2018-08-21 | 2020-02-27 | The George Washington University | Learning-based high-performance, energy-efficient, fault-tolerant on-chip communication design framework |
CN111105035A (en) * | 2019-12-24 | 2020-05-05 | 西安电子科技大学 | Neural network pruning method based on combination of sparse learning and genetic algorithm |
CN113505210A (en) * | 2021-07-12 | 2021-10-15 | 广东工业大学 | Medical question-answer generating system based on lightweight Actor-Critic generating type confrontation network |
Non-Patent Citations (2)
Title |
---|
ANNA GOLDIE等: "Placement Optimization with Deep Reinforcement Learning" * |
SHAO Weiping et al.: "Lightweight Convolutional Neural Network Design Based on MobileNet and YOLOv3" * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115270686A (en) * | 2022-06-24 | 2022-11-01 | 无锡芯光互连技术研究院有限公司 | Chip layout method based on graph neural network |
CN115828831A (en) * | 2023-02-14 | 2023-03-21 | 之江实验室 | Multi-core chip operator placement strategy generation method based on deep reinforcement learning |
CN116562218A (en) * | 2023-05-05 | 2023-08-08 | 之江实验室 | Method and system for realizing layout planning of rectangular macro-cells based on reinforcement learning |
CN116562218B (en) * | 2023-05-05 | 2024-02-20 | 之江实验室 | Method and system for realizing layout planning of rectangular macro-cells based on reinforcement learning |
CN117829085A (en) * | 2024-03-04 | 2024-04-05 | 中国科学技术大学 | Connection diagram generation method suitable for chip wiring |
CN117829085B (en) * | 2024-03-04 | 2024-05-17 | 中国科学技术大学 | Connection diagram generation method suitable for chip wiring |
Also Published As
Publication number | Publication date |
---|---|
CN114372438B (en) | 2023-04-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114372438B (en) | Chip macro-unit layout method and system based on lightweight deep reinforcement learning | |
CN114896937A (en) | Integrated circuit layout optimization method based on reinforcement learning | |
CN108573303A (en) | It is a kind of that recovery policy is improved based on the complex network local failure for improving intensified learning certainly | |
CN112488208B (en) | Method for acquiring remaining life of island pillar insulator | |
CN112508192B (en) | Increment heap width learning system with degree of depth structure | |
CN111859790A (en) | Intelligent design method for curve reinforcement structure layout based on image feature learning | |
CN113708969B (en) | Collaborative embedding method of cloud data center virtual network based on deep reinforcement learning | |
CN113344174A (en) | Efficient neural network structure searching method based on probability distribution | |
CN111242285A (en) | Deep learning model training method, system, device and storage medium | |
Zhang et al. | Memory-efficient hierarchical neural architecture search for image restoration | |
CN113128617B (en) | Spark and ASPSO based parallelization K-means optimization method | |
CN112836823B (en) | Convolutional neural network back propagation mapping method based on cyclic recombination and blocking | |
CN112651488A (en) | Method for improving training efficiency of large-scale graph convolution neural network | |
CN114841098A (en) | Deep reinforcement learning Beidou navigation chip design method based on sparse representation driving | |
CN111488981A (en) | Method for selecting sparse threshold of depth network parameter based on Gaussian distribution estimation | |
CN113052810B (en) | Small medical image focus segmentation method suitable for mobile application | |
CN106897292A (en) | A kind of internet data clustering method and system | |
CN107273970B (en) | Reconfigurable platform of convolutional neural network supporting online learning and construction method thereof | |
CN111160557B (en) | Knowledge representation learning method based on double-agent reinforcement learning path search | |
CN116822617A (en) | Contrast learning training method and system based on structural heavy parameterization | |
CN112270353B (en) | Clustering method for multi-target group evolution software module | |
Zhang et al. | Crescendonet: A simple deep convolutional neural network with ensemble behavior | |
CN116451586A (en) | Self-adaptive DNN compression method based on balance weight sparsity and Group Lasso regularization | |
CN113011589B (en) | Co-evolution-based hyperspectral image band selection method and system | |
EP4040342A1 (en) | Deep neutral network structure learning and simplifying method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||