CN114841098B - Deep reinforcement learning Beidou navigation chip design method based on sparse representation drive - Google Patents

Deep reinforcement learning Beidou navigation chip design method based on sparse representation drive

Info

Publication number
CN114841098B
CN114841098B (application CN202210384663.2A)
Authority
CN
China
Prior art keywords
network
objective function
value
current
strategy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210384663.2A
Other languages
Chinese (zh)
Other versions
CN114841098A (en)
Inventor
唐建浩
李珍妮
郑少龙
谢胜利
元荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202210384663.2A priority Critical patent/CN114841098B/en
Publication of CN114841098A publication Critical patent/CN114841098A/en
Application granted granted Critical
Publication of CN114841098B publication Critical patent/CN114841098B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/30 Circuit design
    • G06F 30/32 Circuit design at the digital level
    • G06F 30/327 Logic synthesis; Behaviour synthesis, e.g. mapping logic, HDL to netlist, high-level language to RTL or netlist
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/30 Circuit design
    • G06F 30/39 Circuit design at the physical level
    • G06F 30/392 Floor-planning or layout, e.g. partitioning or placement
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Geometry (AREA)
  • Architecture (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a Beidou navigation chip design method based on sparse-representation-driven deep reinforcement learning, which comprises the following steps: obtaining a graph embedding, a current macro cell embedding and a netlist metadata embedding based on the macro cell features, netlist graph information and netlist metadata of the chip, and obtaining a three-dimensional state space through a second fully connected network; imposing a sparse constraint on the neurons of the last hidden layer of the value network with an ℓ1 regularizer to obtain a value network based on sparse representation; inputting the three-dimensional state space into the sparse-representation-based value network to obtain a value function; and inputting the three-dimensional state space into a strategy network and obtaining the optimal layout strategy of the Beidou navigation chip macro cells under the guidance of the value function. The value network based on sparse representation alleviates catastrophic interference in value network parameter learning and improves the accuracy and robustness of Beidou navigation chip design based on deep reinforcement learning.

Description

Deep reinforcement learning Beidou navigation chip design method based on sparse representation driving
Technical Field
The invention relates to the field of machine learning and chip design, in particular to a novel Beidou navigation chip design method based on sparse representation driven deep reinforcement learning.
Background
At present, the positioning chips used in the navigation products of domestic companies such as Xiaotiancai, Huashi and Xiaomi basically depend on imports, mainly from foreign enterprises such as u-blox and SONY. The current navigation chip design flow takes a long time, so navigation chips are developed very slowly. The chip layout stage is the most complex and time-consuming; its complexity mainly comes from three aspects: the size of the netlist graph, the grid granularity of the chip canvas, and the excessively high computational cost of evaluating the true target metrics. Although chip design has been studied for decades, existing chip layout tools still need weeks of iteration to generate a layout solution that meets design criteria in all respects. It is therefore very important to develop a new Beidou navigation chip design method that improves the accuracy of chip design and shortens the chip design cycle.
Deep reinforcement learning combines the decision-making capability of reinforcement learning with the perception capability of deep learning, shows excellent adaptability and learning capability, and can be used to solve complex perception and decision-making problems. Recently, Google proposed a chip placement method based on deep reinforcement learning whose goal is to quickly map a netlist containing macro cells and standard cells onto a chip canvas while optimizing power, performance and area (PPA) and respecting constraints on placement density and routing congestion. Chip design is regarded as a reinforcement learning problem, and a deep reinforcement learning network is trained to optimize the chip layout. The method comprises two steps: first, a value network guides the training of a policy network so that the policy network gives the optimal layout policy for the current macro cell, and the trained policy network then guides all the macro cells of the chip to be placed one after another in order of size; second, after all macro cells are laid out, the standard cells are laid out by a force-directed method, completing the mapping from the netlist to the chip canvas. Experimental results show that, compared with the most advanced baseline models, the method achieves better PPA on Google's TPU. More importantly, within 6 hours it can generate a chip layout that is superior or comparable to the designs of professional human chip designers.
However, value networks in deep reinforcement learning are often affected by catastrophic interference. Back-propagation on inputs from different states acts on the same neurons, so previously learned parameters are overwritten, the network forgets what it learned from earlier batches of data, the bias of the value-function approximation becomes large, and the accuracy with which the strategy network generates the layout strategy for the current chip macro cell is affected. Therefore, how to relieve catastrophic interference in value network parameter learning and improve the accuracy and robustness of Beidou navigation chip design based on deep reinforcement learning is an urgent problem in the field of artificial intelligence chip design.
Disclosure of Invention
The invention aims to provide a Beidou navigation chip design method based on sparse representation-driven deep reinforcement learning, which is used for solving the problem of catastrophic interference of value network parameter learning based on a value network of sparse representation and improving the accuracy and robustness of the Beidou navigation chip design based on deep reinforcement learning.
In order to achieve the purpose, the invention provides the following scheme:
a Beidou navigation chip design method based on sparse representation driven deep reinforcement learning comprises the following steps:
obtaining graph embedding and current macro unit embedding based on macro unit characteristics and netlist graph information of a chip;
obtaining netlist metadata embedding by passing netlist metadata of a chip through a first full-connection network;
embedding the graph, the current macro cell and the netlist metadata through the second fully connected network to obtain a three-dimensional state space;
neuron addition to last layer hidden layer of value network
Figure SMS_1
Carrying out sparse constraint on the regulons to obtain a value network based on sparse representation;
inputting the three-dimensional state space into the value network based on sparse representation to obtain a value function;
and inputting the three-dimensional state space into the strategy network and obtaining the optimal layout strategy of the Beidou navigation chip macro unit under the guidance of the cost function.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention provides a Beidou navigation chip design method based on sparse representation driven deep reinforcement learning, which comprises the following steps: obtaining graph embedding and current macro unit embedding based on macro unit characteristics and netlist graph information of a chip; obtaining netlist metadata embedding by passing netlist metadata of a chip through a first full-connection network; embedding the graph, the current macro cell and the netlist metadata through the second fully-connected network to obtain a three-dimensional state space; neuron addition to last hidden layer of value network
Figure SMS_2
Carrying out sparse constraint on the regulons to obtain a value network based on sparse representation; inputting the three-dimensional state space into the value network based on sparse representation to obtain a value function; and inputting the three-dimensional state space into the strategy network and obtaining the optimal layout strategy of the Beidou navigation chip macro unit under the guidance of the cost function. By carrying out sparse representation on the neurons of the last hidden layer of the value network, the problem of catastrophic interference in the learning process of the value function parameters in the chip design method based on deep reinforcement learning is solved, so that the deviation of approximate estimation of the value function is reduced, and the accuracy of the layout strategy output by the strategy network can be ensured.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart of a Beidou navigation chip design method based on sparse representation-driven deep reinforcement learning according to embodiment 1 of the present invention;
FIG. 2 is a specific application process of a value network and a policy network provided in embodiment 1 of the present invention;
FIG. 3 is a diagram of a physical model architecture of a sparse representation-based value network according to embodiment 1 of the present invention;
fig. 4 is a physical model diagram of a policy network provided in embodiment 1 of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A fully connected neural network is used as the value network; the network takes as input the three-dimensional state space corresponding to the macro units in the environment and outputs the value function of that state space. However, the value network is often affected by catastrophic interference, so that the bias of the value-function approximation becomes large. Alleviating catastrophic interference in value network parameter learning therefore improves the accuracy and robustness of chip layout, and has wide application prospects in the field of artificial-intelligence-based navigation chip design.
The invention aims to provide a Beidou navigation chip design method based on sparse representation-driven deep reinforcement learning, which is used for solving the problem of catastrophic interference of value network parameter learning based on a value network of sparse representation and improving the accuracy and robustness of the Beidou navigation chip design based on deep reinforcement learning.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Example 1
Firstly, a neural network architecture is constructed to encode the information of the netlist, with the aim of extracting information such as node types and the link properties between nodes in the netlist into a low-dimensional vector representation and generating an information-rich state space for the strategy network, as shown in FIG. 1. Then, a deconvolution network is initialized as the strategy network and a fully connected neural network is initialized as the value network. Both take the three-dimensional state space as input; the strategy network outputs a two-dimensional chip layout strategy, and the value network outputs the value function of the state space. The strategy network and the value network are trained to output, respectively, the probability distribution over the available positions of the current macro unit and the value estimate of the state space corresponding to the current macro unit. At the same time, while keeping the objective function of the strategy network unchanged, an ℓ1 regularizer is used to sparsely represent the neurons of the last hidden layer of the value network, constructing a brand-new objective function for the value network driven by sparse representation. Because the optimization problem of the value network objective function remains a convex optimization problem after the ℓ1 regularization term is added, the objective function is solved with a subgradient algorithm and the weight parameters of the value network are updated, yielding a value-function estimate with smaller bias, which further guides the training of the strategy network, relieves the influence of catastrophic interference on value network training, and improves the estimation accuracy and robustness of the value network. Finally, the chip macro-unit layout strategy output by the trained strategy network accurately and efficiently guides the layout of the Beidou navigation chip macro units, mapping the chip macro units one by one onto the chip canvas in order of size, as shown in FIG. 2.
As shown in FIG. 1, the present embodiment provides a Beidou navigation chip design method based on sparse representation-driven deep reinforcement learning, including:
S1: obtaining a graph embedding and a current macro unit embedding based on the macro unit characteristics and netlist graph information of the chip;
The chip comprises a plurality of macro units and standard units, and different macro units and standard units are composed of different basic circuits; the macro unit features describe the functional characteristics of the macro units, and the netlist graph describes the basic information of all the macro units of the chip. Obtaining the graph embedding and macro unit embedding based on the macro unit characteristics and netlist graph information of the chip specifically comprises:
inputting net list graph information of the navigation chip into a graph neural network;
performing graph convolution operation on macro unit features of the navigation chip and a netlist graph by using the graph neural network to generate edge embedding and macro unit embedding;
taking the mean of the edge embeddings to obtain the graph embedding;
and adding current macro unit information to the macro unit embedding to obtain the current macro unit embedding.
S2: obtaining a netlist metadata embedding by passing the netlist metadata of the chip through a first fully connected network;
S3: passing the graph embedding, the current macro unit embedding and the netlist metadata embedding through the second fully connected network to obtain a three-dimensional state space S_t;
The state space S_t is input into the trained strategy network through the fully connected layer, and the layout strategy a_t generated by the strategy network for the current macro unit is executed in the action space to obtain the next state space S_{t+1}.
The three-dimensional state space is generated as follows. First, the netlist graph information of the navigation chip is input into a graph neural network, which performs graph convolution on the macro unit features and the netlist graph to generate edge embeddings and macro unit embeddings. Second, the netlist metadata is passed through a fully connected network to obtain the netlist metadata embedding, the edge embeddings are reduced by their mean to obtain the graph embedding, and current macro unit information is added to the macro unit embeddings to obtain the current macro unit embedding. Finally, the netlist metadata embedding, the graph embedding and the current macro unit embedding are input together into a fully connected network to generate the three-dimensional state space used for training the value network and the strategy network, as shown in FIG. 1.
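For illustration only, the following PyTorch sketch shows one way such a state encoder could be wired; the single round of graph convolution, the module names and all dimensions (node feature size, metadata size, a 16 × 8 × 8 state) are assumptions and are not taken from the patent.

```python
import torch
import torch.nn as nn

class StateEncoder(nn.Module):
    """Illustrative encoder: netlist graph + metadata -> three-dimensional state space S_t."""
    def __init__(self, node_dim=16, meta_dim=10, embed_dim=32, state_shape=(16, 8, 8)):
        super().__init__()
        self.state_shape = state_shape
        self.node_fc = nn.Linear(node_dim, embed_dim)       # macro unit embedding
        self.edge_fc = nn.Linear(2 * embed_dim, embed_dim)  # edge embedding from the two endpoint macros
        self.meta_fc = nn.Linear(meta_dim, embed_dim)       # "first fully connected network" (metadata)
        flat = state_shape[0] * state_shape[1] * state_shape[2]
        self.fuse_fc = nn.Linear(3 * embed_dim, flat)       # "second fully connected network" (fusion)

    def forward(self, node_feats, edges, metadata, current_idx):
        node_emb = torch.relu(self.node_fc(node_feats))                    # macro unit embeddings
        src, dst = edges                                                   # index tensors of the netlist edges
        edge_emb = torch.relu(self.edge_fc(torch.cat([node_emb[src], node_emb[dst]], dim=-1)))
        graph_emb = edge_emb.mean(dim=0)                                   # mean-reduce edges -> graph embedding
        current_emb = node_emb[current_idx]                                # current macro unit embedding
        meta_emb = torch.relu(self.meta_fc(metadata))                      # netlist metadata embedding
        fused = torch.cat([graph_emb, current_emb, meta_emb], dim=-1)
        return self.fuse_fc(fused).view(self.state_shape)                  # S_t, here 16 x 8 x 8

# Example call with random data: 50 macro units, 120 netlist edges.
encoder = StateEncoder()
edges = (torch.randint(0, 50, (120,)), torch.randint(0, 50, (120,)))
S_t = encoder(torch.randn(50, 16), edges, torch.randn(10), current_idx=3)
```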
S4: imposing a sparse constraint on the neurons of the last hidden layer of the value network with an ℓ1 regularizer to obtain a value network based on sparse representation;
The state space S_t and the next state space S_{t+1} are input into the value network to obtain the values V(S_t, W) and V(S_{t+1}, W) of the two state spaces; together with the reward R given by the external environment, the temporal-difference error (TD-error) is calculated, and the objective function of the value network is constructed from the TD-error. Then an ℓ1 regularizer is added to the objective function of the value network to impose a sparse constraint on the neurons of the last hidden layer, and the weight parameters of the value network are updated by the subgradient descent method.
S5: inputting the three-dimensional state space into the value network based on sparse representation to obtain a value function;
the state of the intelligent agent in the t step in the environment is S t I.e. the current chip macro cell placement on the chip canvas, perform action a in the state t I.e. the collection of all locations in the space of the discrete canvas, and then obtain a reward R for the environment giving that action t Defined as the weighted sum of the wireless network and congestion. The agent transitions to the next state S t+1 Then, the next action a is executed t+1
Constructing a value network objective function:
constructing a value network function V (S, W) to approximate a state S t And V is the value of W, wherein W represents the weight parameter of the value network, and the discount rate of return is defined to be gamma. Then, the timing difference error δ (TD-error) can be expressed as:
δ=R t +γV(S t+1 ,W)-V(S t ,W)
the value network updates the network parameters by minimizing the TD-error, so the objective function of the value network is obtained by solving the expectation of the square of the TD-error, which is as follows:
f(W)=E[(R t +γV(S t+1 ,W)-V(S t ,W)) 2 ]
applying output y of neuron of last layer hidden layer of value network
Figure SMS_7
Regularization sub h (y) = lambda y | | non-calculation 1 Sparse constraint is performed, and an objective function is obtained as follows:
Figure SMS_8
wherein λ represents
Figure SMS_9
The regularization subparameter, E (-) represents the desired operation.
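As a concrete illustration of this objective, the sketch below exposes the last hidden layer y of a fully connected value network and adds the λ||y||_1 penalty to the squared TD-error; the flattened state dimension, hidden width and coefficient values are placeholders, not the patent's specification.

```python
import torch
import torch.nn as nn

class SparseValueNetwork(nn.Module):
    """Fully connected value network that returns both V(S, W) and the last hidden layer y."""
    def __init__(self, state_dim=1024, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.head = nn.Linear(hidden, 1)

    def forward(self, state):
        y = self.trunk(state)                    # y: activations of the last hidden layer
        return self.head(y).squeeze(-1), y

def value_objective(net, S_t, S_next, R_t, gamma=0.99, lam=1e-3):
    """f(W) = E[(R_t + gamma * V(S_{t+1}, W) - V(S_t, W))^2 + lam * ||y||_1]."""
    V_t, y = net(S_t)
    with torch.no_grad():                        # the bootstrap target is treated as a constant
        V_next, _ = net(S_next)
    td_error = R_t + gamma * V_next - V_t
    return (td_error.pow(2) + lam * y.abs().sum(dim=-1)).mean()

# Example: a batch of 32 flattened 16 x 8 x 8 states.
net = SparseValueNetwork()
loss = value_objective(net, torch.randn(32, 1024), torch.randn(32, 1024), torch.randn(32))
loss.backward()
```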
Before the three-dimensional state space is input into the sparse-representation-based value network, the method further comprises the following steps:
(1) Constructing the objective function of the value network based on sparse representation to obtain the value network objective function;
The expression of the value network objective function is:
f(W) = E[(R_t + γV(S_{t+1}, W) - V(S_t, W))^2 + λ||y||_1]
where W denotes the weight parameters of the value network; R_t denotes the reward value for performing action a_t in state S_t; γ denotes the discount rate; S_{t+1} and S_t denote the next layout state and the current layout state of the current macro unit on the chip canvas; V(·) denotes the value estimate; λ||y||_1 denotes the ℓ1 regularizer; λ denotes the ℓ1 regularization parameter; and E(·) denotes the expectation.
(2) Carrying out weight optimization on the value network objective function to obtain the optimized value network objective function.
Specifically, the weight optimization of the value network objective function includes:
1) Solving the current value network objective function by using a subgradient descent algorithm to obtain the current updated value weight parameter;
The formula for the current updated value weight parameter is:
W_i = W_{i-1} - α_W (∇f(W_{i-1}) + λ Σ_{j=1}^{K} sign(y_j) · ∂y_j/∂W_{i-1})
where y_j denotes the j-th neuron of the last hidden layer; K denotes the total number of neurons in the last hidden layer; ∂y_j/∂W_{i-1} denotes the derivative of y_j with respect to W_{i-1}; α_W denotes the learning rate; W_{i-1} denotes the weight value at the (i-1)-th iteration; ∇f(W_{i-1}) denotes the gradient of f(W_{i-1}) with respect to the weight parameter W_{i-1}; and sign(y_j) is the sign function, i.e., the subgradient of ||y||_1.
2) Substituting the current updated value weight parameter into the current value network objective function to obtain the current updated value network objective function;
3) Judging whether the relative error between the current value network objective function and the last value network objective function is smaller than a first preset value or not to obtain a first judgment result;
if the first judgment result is yes, the current updated value network objective function is the optimized value network objective function;
if the first judgment result is negative, judging whether the current iteration times are equal to the first maximum iteration times to obtain a second judgment result;
if the second judgment result is yes, the current updated value network objective function is the optimized value network objective function;
and if the second judgment result is negative, the currently updated value network objective function is made to be the current value network objective function, and the step of solving the current value network objective function by using a sub-gradient descent algorithm to obtain the current updated value weight parameter is returned.
Training yields a sparsely represented value network, as shown in FIG. 3, in which white squares represent activated (non-zero) neurons and grey squares represent inactive neurons; only a small portion of the neurons in the last hidden layer are activated, which generates a sparse representation and alleviates catastrophic interference in value network parameter learning. By adopting this sparse-representation-driven deep reinforcement learning method, the influence of catastrophic interference on the performance of the value network can be relieved, improving the accuracy and robustness of the value-function approximation.
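A minimal sketch of the optimization loop described above is given below; it reuses value_objective from the earlier sketch, relies on autograd supplying sign(y) as the subgradient of the ℓ1 term, and the learning rate, tolerance (the "first preset value") and iteration cap are illustrative.

```python
import torch

def optimize_value_network(net, S_t, S_next, R_t, lr=1e-3, tol=1e-4, max_iter=500):
    """Subgradient descent on the sparse value objective with the two stopping tests above:
    relative error below a preset value, or the maximum number of iterations reached."""
    prev = None
    for i in range(max_iter):
        net.zero_grad()
        f = value_objective(net, S_t, S_next, R_t)       # current value network objective
        f.backward()                                     # d|y|/dy = sign(y): a subgradient of ||y||_1
        with torch.no_grad():
            for W in net.parameters():
                W -= lr * W.grad                         # W_i = W_{i-1} - alpha_W * subgradient
        if prev is not None and abs(f.item() - prev) / (abs(prev) + 1e-12) < tol:
            break                                        # relative error below the preset value
        prev = f.item()
    return net
```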
The TD-error is then replaced by the advantage function, the objective function of the strategy network is constructed with the PPO algorithm, and the weight parameters of the strategy network are updated with the Adam algorithm. In this way, the influence of catastrophic interference on the performance of the value network can be relieved, and the accuracy and robustness of the value-function approximation are improved. Therefore, before step S6 the method further comprises:
(1) Constructing an objective function of the strategy network to obtain a strategy network objective function;
A strategy network function π(a_t|S_t) is constructed, where S_t denotes the information of the t-th state, including the current macro unit and the entire netlist, and a_t denotes the t-th action, i.e., the layout strategy for the current macro unit generated by the strategy network, comprising the set of locations on the discrete canvas space where the chip can be laid out and used to guide the layout of the current macro unit. In this Beidou navigation chip design method based on sparse deep reinforcement learning, the Proximal Policy Optimization (PPO) algorithm is adopted to construct the objective function of the strategy network:
obj(θ) = E[ min( r_t(θ) Â_t, clip(r_t(θ), 1-ε, 1+ε) Â_t ) ]
where θ denotes the weight parameters of the strategy network, r_t(θ) = π_θ(a_t|S_t) / π_old(a_t|S_t) denotes the probability ratio between the new and old strategy network functions, π_old(a_t|S_t) denotes the old strategy, Â_t denotes the advantage function (for which the TD-error can be substituted), and ε here denotes the PPO clipping coefficient.
(2) Updating the weights of the strategy network objective function to obtain the updated strategy network objective function.
The optimization problem of the strategy network objective function is a convex optimization problem, and the Adam algorithm can be used directly to update the weight parameters of the strategy network. Updating the weights of the strategy network objective function specifically includes:
1) Differentiating the current strategy objective function to obtain the currently updated gradient g_i(θ);
From the currently updated gradient g_i(θ), the first-order estimate m_i and the second-order estimate v_i are calculated:
m_i = β_1 m_{i-1} + (1 - β_1) g_i(θ)
v_i = β_2 v_{i-1} + (1 - β_2) g_i(θ)^2
where β_1 and β_2 denote the decay coefficients of the first-order estimate m_i and the second-order estimate v_i, respectively.
2) From the first-order estimate m_i and the second-order estimate v_i, the first-order bias correction m̂_i and the second-order bias correction v̂_i are obtained:
m̂_i = m_i / (1 - β_1^i)
v̂_i = v_i / (1 - β_2^i)
3) From the first-order bias correction m̂_i and the second-order bias correction v̂_i, the current strategy update weight parameter is obtained;
The calculation formula of the strategy network weight parameter is:
θ_i = θ_{i-1} + α_θ · m̂_i / (√(v̂_i) + ε)
where α_θ denotes the learning rate, which controls the step size; m̂_i and v̂_i denote the first-order and second-order bias corrections, respectively; and ε denotes the numerical stability parameter, used to prevent the denominator from being zero.
4) Substituting the current strategy updating weight parameter into the current strategy network objective function to obtain the current updated strategy network objective function;
5) Judging whether the relative error between the current strategy network objective function and the previous strategy network objective function is smaller than a second preset value or not to obtain a third judgment result;
if the third judgment result is yes, the current updated policy network objective function is the optimized policy network objective function;
if the third judgment result is negative, judging whether the current iteration times are equal to the second maximum iteration times to obtain a fourth judgment result;
if the fourth judgment result is yes, the current updated policy network objective function is the optimized policy network objective function;
if the fourth judgment result is negative, the strategy network objective function after the current update is made to be the current strategy network objective function, and the step of 'obtaining the current updated gradient by differentiating the current strategy objective function' is returned.
Judging the iteration cutoff involves two conditions: if the difference between two consecutive strategy network objective functions obj_i and obj_{i-1} becomes smaller than a preset small value, i.e., |obj_i - obj_{i-1}| falls below that value, the iteration stops; otherwise, iteration continues up to the maximum number of iteration steps.
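The sketch below illustrates the strategy-network side under the same caveats: GridPolicy is a toy stand-in for the deconvolution strategy network, the clipped PPO surrogate follows the standard formulation referred to above, and torch.optim.Adam supplies the moment estimates and bias corrections (m̂_i, v̂_i, ε) in place of hand-written updates.

```python
import copy
import torch
import torch.nn as nn
from torch.distributions import Categorical

class GridPolicy(nn.Module):
    """Toy stand-in for the strategy network: flattened state -> distribution over canvas cells."""
    def __init__(self, state_dim=1024, grid_cells=64):
        super().__init__()
        self.fc = nn.Linear(state_dim, grid_cells)

    def forward(self, S_t):
        return Categorical(logits=self.fc(S_t))

def ppo_objective(policy, policy_old, S_t, a_t, advantage, clip_eps=0.2):
    """Clipped PPO surrogate: E[min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t)]."""
    logp_new = policy(S_t).log_prob(a_t)
    with torch.no_grad():
        logp_old = policy_old(S_t).log_prob(a_t)
    ratio = torch.exp(logp_new - logp_old)                  # pi_theta(a_t|S_t) / pi_old(a_t|S_t)
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    return torch.min(ratio * advantage, clipped * advantage).mean()

def train_policy(policy, S_t, a_t, advantage, lr=3e-4, tol=1e-4, max_iter=200):
    """Maximise the PPO surrogate with Adam, stopping on relative error or max iterations."""
    policy_old = copy.deepcopy(policy)                      # frozen old strategy pi_old
    opt = torch.optim.Adam(policy.parameters(), lr=lr)      # Adam keeps m_i, v_i and their bias corrections
    prev = None
    for i in range(max_iter):
        obj = ppo_objective(policy, policy_old, S_t, a_t, advantage)
        opt.zero_grad()
        (-obj).backward()                                   # Adam minimises, so negate to ascend the surrogate
        opt.step()
        if prev is not None and abs(obj.item() - prev) / (abs(prev) + 1e-12) < tol:
            break
        prev = obj.item()
    return policy

# Example: 32 transitions; the advantage can be the TD-error from the value network.
policy = GridPolicy()
train_policy(policy, torch.randn(32, 1024), torch.randint(0, 64, (32,)), torch.randn(32))
```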
S6: inputting the three-dimensional state space into the strategy network and obtaining the optimal layout strategy of the Beidou navigation chip macro units under the guidance of the value function.
The layout of the chip macro units is completed one by one, and layout strategies are generated for the macro units in order of decreasing size so as to ensure sufficient space. The information of the current macro unit (predefined logic modules such as flip-flops, arithmetic logic units and hardware registers) is changed one by one to change the input of the strategy network, the optimal layout strategy of the Beidou navigation chip macro unit corresponding to the optimal value function in the current state is obtained, and the layout of the Beidou navigation chip macro units is thereby guided.
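A sketch of this placement loop is shown below; the list of macro areas, the state_fn callback that builds the flattened state for the next macro unit from what has already been placed, and the greedy masking of occupied cells are assumptions made for illustration, and the loop can be wired to the GridPolicy and StateEncoder sketches above.

```python
import torch

def place_macros(macro_areas, state_fn, policy, grid=(8, 8)):
    """Lay out macro units one by one, largest first, using the trained strategy network."""
    order = sorted(range(len(macro_areas)), key=lambda i: macro_areas[i], reverse=True)
    placements = {}                                         # macro index -> (row, col) on the canvas
    for idx in order:
        S_t = state_fn(placements, idx)                     # state for the current macro unit
        with torch.no_grad():
            probs = policy(S_t).probs.clone()               # layout strategy over canvas cells
        for r, c in placements.values():
            probs[r * grid[1] + c] = 0.0                    # mask cells that are already occupied
        cell = int(probs.argmax())                          # greedy choice among the remaining cells
        placements[idx] = divmod(cell, grid[1])
    return placements

# Example wiring with the earlier sketches (random data for illustration only):
# placements = place_macros([4.0, 2.5, 1.0], lambda p, i: torch.randn(1024), GridPolicy())
```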
A deconvolution network is adopted as the strategy network, so that the input three-dimensional state space is mapped by the strategy network to a two-dimensional chip layout strategy. The deconvolution network consists of an input layer, deconvolution layers, and an output layer. Similar to a convolutional network, the input layer of the deconvolution network receives data in a non-fully-connected manner and the output layer produces data in a fully connected manner, with one or more deconvolution layers performing deconvolution between the input layer and the output layer. Assume that the input matrix of the deconvolution network is Y and the output matrix is X, and that the number of channels (i.e., the convolution kernels used by each deconvolution layer) is 4; the physical model of the deconvolution network is shown in FIG. 4, in which a network layer is denoted L_k (the k-th network layer). The input layer stores the 8 × 8 × 16 input matrix Y (width 8, height 8, depth 16), which enters the first deconvolution layer through non-full connections, passes through 4 channels into the second deconvolution layer, passes through another 4 channels into the output layer, and the output layer obtains the output matrix X through a fully connected operation.
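The sketch below shows one plausible realization of such a deconvolution strategy network; only the 8 × 8 × 16 input, the 4 channels per deconvolution layer and the fully connected output layer come from the description above, while the kernel sizes, strides and the 32 × 32 output canvas are assumptions (FIG. 4 is not reproduced here).

```python
import torch
import torch.nn as nn

class DeconvPolicyNet(nn.Module):
    """Deconvolution strategy network: 16 x 8 x 8 state -> probabilities over a 2-D canvas."""
    def __init__(self, in_channels=16, channels=4, canvas=32):
        super().__init__()
        self.canvas = canvas
        # Two deconvolution layers with 4 channels each, upsampling 8x8 -> 16x16 -> 32x32.
        self.deconv1 = nn.ConvTranspose2d(in_channels, channels, kernel_size=4, stride=2, padding=1)
        self.deconv2 = nn.ConvTranspose2d(channels, channels, kernel_size=4, stride=2, padding=1)
        # Fully connected output layer producing one score per canvas cell.
        self.out_fc = nn.Linear(channels * canvas * canvas, canvas * canvas)

    def forward(self, state):                                # state: (N, 16, 8, 8)
        x = torch.relu(self.deconv1(state))
        x = torch.relu(self.deconv2(x))
        logits = self.out_fc(x.flatten(1))                   # (N, 32 * 32)
        probs = torch.softmax(logits, dim=-1)                # two-dimensional layout strategy
        return probs.view(-1, self.canvas, self.canvas)

# Example: one 16 x 8 x 8 state produces a 32 x 32 placement probability map.
net = DeconvPolicyNet()
placement_map = net(torch.randn(1, 16, 8, 8))
```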
In this embodiment, a new neural network architecture is constructed as the embedding layer of the strategy-value network, and the netlist graph node features and the information of the current macro unit to be placed are encoded to generate the three-dimensional state space. After the state space is obtained, it is input into the trained deep reinforcement learning network, which, with catastrophic interference alleviated, outputs a more accurate optimal layout strategy for the Beidou navigation chip macro units, thereby guiding the chip macro units to be mapped onto the chip canvas one by one in order of size.
The present embodiment has the following advantages: 1. The invention introduces an ℓ1 regularizer into the value network objective function and updates the weight parameters with a subgradient method, thereby realizing sparse-representation-driven deep reinforcement learning and applying it to the layout of the Beidou navigation chip; this better alleviates the influence of catastrophic interference in deep reinforcement learning and improves the accuracy and robustness of value network estimation. 2. The sparse-representation-driven deep reinforcement learning network replaces manual experience, shortens the development cycle of the navigation chip and reduces the chip development cost, and the proposed sparse-representation deep reinforcement learning algorithm can be applied to other, more complex chip design environments.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the foregoing, the description is not to be taken in a limiting sense.

Claims (6)

1. A Beidou navigation chip design method based on sparse representation driven deep reinforcement learning is characterized by comprising the following steps of:
obtaining a graph embedding and a current macro unit embedding based on the macro unit characteristics and netlist graph information of the chip;
obtaining a netlist metadata embedding by passing the netlist metadata of the chip through a first fully connected network;
passing the graph embedding, the current macro unit embedding and the netlist metadata embedding through a second fully connected network to obtain a three-dimensional state space;
imposing a sparse constraint on the neurons of the last hidden layer of the value network with an ℓ1 regularizer to obtain a value network based on sparse representation; the value network is a fully connected neural network;
inputting the three-dimensional state space into the value network based on sparse representation to obtain a value function;
inputting the three-dimensional state space into a strategy network and obtaining an optimal layout strategy of the Beidou navigation chip macro units under the guidance of the value function; the strategy network is a deconvolution network;
said inputting said three-dimensional state space into said sparse representation-based value network further comprises:
(1) Constructing an objective function of the value network based on sparse representation to obtain a value network objective function; the expression of the value network objective function is as follows:
f(W) = E[(R_t + γV(S_{t+1}, W) - V(S_t, W))^2 + λ||y||_1]
where W denotes the weight parameters of the value network; R_t denotes the reward value for taking action a_t in state S_t; γ denotes the discount rate; S_{t+1} and S_t denote the next layout state and the current layout state of the current macro unit on the chip canvas; V(·) denotes the value estimate; λ||y||_1 denotes the ℓ1 regularizer; λ denotes the ℓ1 regularization parameter; and E(·) denotes the expectation;
(2) Carrying out weight optimization on the value network objective function to obtain an optimized value network objective function; the method specifically comprises the following steps:
solving the current value network objective function by using a sub-gradient descent algorithm to obtain a current updated value weight parameter;
substituting the current updated value weight parameter into the current value network objective function to obtain the current updated value network objective function;
judging whether the relative error between the current value network objective function and the last value network objective function is smaller than a first preset value or not to obtain a first judgment result;
if the first judgment result is yes, the current updated value network objective function is an optimized value network objective function;
if the first judgment result is negative, judging whether the current iteration times are equal to the first maximum iteration times to obtain a second judgment result;
if the second judgment result is yes, the current updated value network objective function is the optimized value network objective function;
if the second judgment result is negative, the current updated value network objective function is made to be the current value network objective function, and the step of solving the current value network objective function by using a sub-gradient descent algorithm to obtain the current updated value weight parameter is returned.
2. The method of claim 1, wherein obtaining the graph embedding and macro unit embedding based on the macro unit characteristics and netlist graph information of the chip specifically comprises:
inputting the net list graph information of the navigation chip into a graph neural network;
performing graph convolution operation on the macro unit features of the navigation chip and the netlist graph by using the graph neural network to generate edge embedding and macro unit embedding;
taking the mean of the edge embeddings to obtain the graph embedding;
and adding current macro unit information to the macro unit embedding to obtain the current macro unit embedding.
3. The method of claim 1, wherein the current update value weight parameter is formulated as:
W_i = W_{i-1} - α_W (∇f(W_{i-1}) + λ Σ_{j=1}^{K} sign(y_j) · ∂y_j/∂W_{i-1})
where y_j denotes the j-th neuron of the last hidden layer; K denotes the total number of neurons in the last hidden layer; ∂y_j/∂W_{i-1} denotes the derivative of y_j with respect to W_{i-1}; α_W denotes the learning rate; W_{i-1} denotes the weight value at the (i-1)-th iteration; ∇f(W_{i-1}) denotes the gradient of f(W_{i-1}) with respect to the weight parameter W_{i-1}; and sign(y_j) is the sign function, i.e., the subgradient of ||y||_1.
4. The method of claim 1, wherein inputting the three-dimensional state space into the policy network further comprises:
constructing an objective function of the strategy network to obtain a strategy network objective function;
and updating the weight of the strategy network objective function to obtain the updated strategy network objective function.
5. The method according to claim 4, wherein the updating the weight of the policy network objective function specifically includes:
deriving the current strategy network objective function to obtain a current updated gradient;
calculating a first order estimate and a second order estimate from the current updated gradient;
obtaining a first order estimated bias correction and a second order estimated bias correction according to the first order estimation and the second order estimation;
obtaining a current strategy updating weight parameter according to the first-order estimation deviation correction and the second-order estimation deviation correction;
substituting the current strategy updating weight parameter into the current strategy network objective function to obtain the current updated strategy network objective function;
judging whether the relative error between the current strategy network objective function and the previous strategy network objective function is smaller than a second preset value or not to obtain a third judgment result;
if the third judgment result is yes, the current updated strategy network objective function is an optimized strategy network objective function;
if the third judgment result is negative, judging whether the current iteration times are equal to the second maximum iteration times to obtain a fourth judgment result;
if the fourth judgment result is yes, the current updated policy network objective function is the optimized policy network objective function;
if the fourth judgment result is negative, the current updated strategy network objective function is made to be the current strategy network objective function, and the step of 'obtaining the current updated gradient by differentiating the current strategy network objective function' is returned.
6. The method according to claim 5, wherein the expression of the current policy update weight parameter is:
θ_i = θ_{i-1} + α_θ · m̂_i / (√(v̂_i) + ε)
where α_θ denotes the learning rate; m̂_i and v̂_i denote the first-order and second-order bias corrections, respectively; and ε denotes the stability parameter.
CN202210384663.2A 2022-04-13 2022-04-13 Deep reinforcement learning Beidou navigation chip design method based on sparse representation drive Active CN114841098B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210384663.2A CN114841098B (en) 2022-04-13 2022-04-13 Deep reinforcement learning Beidou navigation chip design method based on sparse representation drive

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210384663.2A CN114841098B (en) 2022-04-13 2022-04-13 Deep reinforcement learning Beidou navigation chip design method based on sparse representation drive

Publications (2)

Publication Number Publication Date
CN114841098A CN114841098A (en) 2022-08-02
CN114841098B true CN114841098B (en) 2023-04-18

Family

ID=82563580

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210384663.2A Active CN114841098B (en) 2022-04-13 2022-04-13 Deep reinforcement learning Beidou navigation chip design method based on sparse representation drive

Country Status (1)

Country Link
CN (1) CN114841098B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116911245B (en) * 2023-07-31 2024-03-08 曲阜师范大学 Layout method, system, equipment and storage medium of integrated circuit

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111753468A (en) * 2020-06-28 2020-10-09 中国科学院自动化研究所 Elevator system self-learning optimal control method and system based on deep reinforcement learning
CN112019249A (en) * 2020-10-22 2020-12-01 中山大学 Intelligent reflecting surface regulation and control method and device based on deep reinforcement learning
CN112130570A (en) * 2020-09-27 2020-12-25 重庆大学 Blind guiding robot of optimal output feedback controller based on reinforcement learning
CN113554166A (en) * 2021-06-16 2021-10-26 中国人民解放军国防科技大学 Deep Q network reinforcement learning method and equipment for accelerating cognitive behavior model

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110059100B (en) * 2019-03-20 2022-02-22 广东工业大学 SQL sentence construction method based on actor-critic network
CN111204476B (en) * 2019-12-25 2021-10-29 上海航天控制技术研究所 Vision-touch fusion fine operation method based on reinforcement learning
CN113093727B (en) * 2021-03-08 2023-03-28 哈尔滨工业大学(深圳) Robot map-free navigation method based on deep security reinforcement learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111753468A (en) * 2020-06-28 2020-10-09 中国科学院自动化研究所 Elevator system self-learning optimal control method and system based on deep reinforcement learning
CN112130570A (en) * 2020-09-27 2020-12-25 重庆大学 Blind guiding robot of optimal output feedback controller based on reinforcement learning
CN112019249A (en) * 2020-10-22 2020-12-01 中山大学 Intelligent reflecting surface regulation and control method and device based on deep reinforcement learning
CN113554166A (en) * 2021-06-16 2021-10-26 中国人民解放军国防科技大学 Deep Q network reinforcement learning method and equipment for accelerating cognitive behavior model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yang Weiyi; Bai Chenjia; Cai Chao; Zhao Yingnan; Liu Peng. A survey of the sparse reward problem in deep reinforcement learning. Computer Science, 2020, (03): 190-199. *

Also Published As

Publication number Publication date
CN114841098A (en) 2022-08-02

Similar Documents

Publication Publication Date Title
CN113053115B (en) Traffic prediction method based on multi-scale graph convolution network model
CN109271683B (en) Building group automatic arrangement algorithm for sunlight constraint
CN106910337A (en) A kind of traffic flow forecasting method based on glowworm swarm algorithm Yu RBF neural
CN107705556A (en) A kind of traffic flow forecasting method combined based on SVMs and BP neural network
CN114372438B (en) Chip macro-unit layout method and system based on lightweight deep reinforcement learning
CN114841098B (en) Deep reinforcement learning Beidou navigation chip design method based on sparse representation drive
CN107578121A (en) Based on the power transformation engineering cost forecasting method for improving glowworm swarm algorithm optimization SVM
Whigham et al. Predicting chlorophyll-a in freshwater lakes by hybridising process-based models and genetic algorithms
CN115455899A (en) Analytic layout method based on graph neural network
CN115358136A (en) Structural rigidity optimization design method based on neural network
CN110766201A (en) Revenue prediction method, system, electronic device, computer-readable storage medium
CN116562218B (en) Method and system for realizing layout planning of rectangular macro-cells based on reinforcement learning
CN107491841A (en) Nonlinear optimization method and storage medium
Menzel et al. Application of free form deformation techniques in evolutionary design optimisation
US11790136B2 (en) Method for automating semiconductor design based on artificial intelligence
US11657206B1 (en) Method for semiconductor design based on artificial intelligence
CN116882539A (en) Water quality data prediction method based on improved Re-GCN model
CN110222847A (en) A kind of machine learning method and device
CN113592296B (en) Public policy decision method, device, electronic equipment and storage medium
CN109919294A (en) A kind of image enchancing method of glowworm swarm algorithm and cuckoo searching algorithm Parallel Fusion
CN109902870A (en) Electric grid investment prediction technique based on AdaBoost regression tree model
CN114707655A (en) Quantum line conversion method, quantum line conversion system, storage medium and electronic equipment
CN115169215A (en) Multi-objective optimization method and system considering nitrate pollution and seawater invasion process
CN106874998A (en) A kind of step matrix disassembling method certainly based on Pareto optimization
CN105512754A (en) Conjugate prior-based single-mode distribution estimation optimization method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant