CN116738923A - Chip layout optimization method based on reinforcement learning with constraint - Google Patents

Chip layout optimization method based on reinforcement learning with constraint

Info

Publication number
CN116738923A
CN116738923A
Authority
CN
China
Prior art keywords
constraint
soft
hard
layout
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310359245.2A
Other languages
Chinese (zh)
Other versions
CN116738923B (en)
Inventor
欧阳雅捷
刘晓翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinan University
Original Assignee
Jinan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinan University filed Critical Jinan University
Priority to CN202310359245.2A priority Critical patent/CN116738923B/en
Publication of CN116738923A publication Critical patent/CN116738923A/en
Application granted granted Critical
Publication of CN116738923B publication Critical patent/CN116738923B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/30Circuit design
    • G06F30/39Circuit design at the physical level
    • G06F30/392Floor-planning or layout, e.g. partitioning or placement
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • G06F30/27Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/30Circuit design
    • G06F30/32Circuit design at the digital level
    • G06F30/33Design verification, e.g. functional simulation or model checking
    • G06F30/3308Design verification, e.g. functional simulation or model checking using simulation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/30Circuit design
    • G06F30/36Circuit design at the analogue level
    • G06F30/367Design verification, e.g. using simulation, simulation program with integrated circuit emphasis [SPICE], direct methods or relaxation methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/006Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2111/00Details relating to CAD techniques
    • G06F2111/04Constraint-based CAD

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Geometry (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Architecture (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Design And Manufacture Of Integrated Circuits (AREA)

Abstract

The invention provides a chip layout optimization method based on reinforcement learning with constraint, which belongs to the field of integrated circuits and comprises the following steps: establishing a model based on a Markov decision process for the chip layout problem; distinguishing hard constraints and soft constraints for the chip design layout field; designing a reinforcement learning algorithm to handle hard constraints and soft constraints; designing a reward function to handle hard constraints and soft constraints respectively; training the agent with the constrained reinforcement learning algorithm, so that the agent finds a strategy that optimizes the soft constraints on the premise of satisfying the hard constraints; after the training of the agent is completed, applying the trained agent to the actual chip layout problem, and obtaining an optimized layout scheme through the action sequence executed by the agent. The invention adopts a constrained reinforcement learning algorithm and a targeted constraint processing mode, and can optimize the soft constraints on the premise of satisfying the hard constraints, thereby realizing a chip layout scheme with high performance and low power consumption.

Description

Chip layout optimization method based on reinforcement learning with constraint
Technical Field
The invention belongs to the field of integrated circuits, and particularly relates to a chip layout optimization method based on reinforcement learning with constraint.
Background
In modern integrated circuit design, chip layout is a critical step that directly affects chip performance, power consumption, and cost. The chip layout problem can be regarded as an optimization problem that requires optimizing multiple objectives under certain constraints. In chip layout there are two types of constraint, hard and soft. Hard constraints are conditions that must be met, such as routing rules and power supply requirements. Soft constraints are objectives that one wishes to optimize as far as possible, such as power consumption and performance. Violating a hard constraint can cause the chip to fail, while the degree to which the soft constraints are optimized determines the performance of the chip. Therefore, how to optimize the soft constraints on the premise of satisfying the hard constraints has become an important research direction.
Conventional chip layout optimization methods typically rely on heuristic algorithms and human experience. However, with the rapid development of integrated circuit technology, the complexity of chips keeps increasing, and conventional methods have difficulty coping with it. In recent years, reinforcement learning has received attention as a method of autonomous learning. However, existing reinforcement learning methods often have difficulty distinguishing between hard constraints and soft constraints, so they may violate hard constraints when solving chip layout problems, thereby affecting chip usability.
Disclosure of Invention
The invention aims to provide a chip layout optimization method based on reinforcement learning with constraint, which adopts a constrained reinforcement learning algorithm and a targeted constraint processing mode, and can optimize the soft constraints on the premise of satisfying the hard constraints, thereby realizing a chip layout scheme with high performance and low power consumption.
In order to achieve the above object, the present invention provides a chip layout optimization method based on constraint reinforcement learning, the method comprising:
s1: establishing a model based on a Markov decision process for the chip layout problem;
s2: aiming at the chip design layout field, distinguishing hard constraint and soft constraint;
s3: designing a reinforcement learning algorithm to process hard constraint and soft constraint;
s4: designing a reward function to respectively process hard constraint and soft constraint;
s5: training the intelligent agent by using a reinforcement learning algorithm with constraint, so that the intelligent agent finds a strategy for optimizing soft constraint on the premise of meeting hard constraint;
s6: after the training of the agent is completed, applying the trained agent to the actual chip layout problem, and obtaining an optimized layout scheme through the action sequence executed by the agent.
Further, establishing a model based on a Markov decision process for the chip layout problem, wherein the model comprises states, actions, state transition probabilities and rewarding functions;
the state is S, the current situation of the chip layout is represented, and the state is defined as a tuple;
the action is A, which represents the operation of the intelligent agent on the layout;
the state transition probability is P, which indicates the probability that the system will transition to a new state after executing a certain action in a given state;
the reward function is R and is used for evaluating the reward obtained by the agent after executing a certain action.
Further, the tuple includes placed element positions, an unplaced element list, a hard constraint state, and a soft constraint state;
the placed element positions represent the elements that have been placed on the chip and their position information;
the unplaced element list represents the elements that have not yet been placed on the chip;
the hard constraint state represents the connection relations between the placed elements in the current layout, as well as the distances and sizes between the elements;
the soft constraint state represents the performance indicators of the placed elements in the current layout.
Further, for the chip design layout field, hard constraints and soft constraints are distinguished. The hard constraints include, but are not limited to, space limitations, overlap limitations, connection limitations, power limitations, and thermal limitations; the soft constraints include, but are not limited to, power consumption optimization, delay optimization, space utilization optimization, line length optimization, and thermal profile optimization.
Further, in the process of hard constraint and soft constraint by designing a reinforcement learning algorithm, the reinforcement learning algorithm specifically comprises:
defining a feasibility function f (s, a) representing whether the action a taken in the state s satisfies the hard constraint; when the hard constraint is satisfied, f (s, a) =1; otherwise, f (s, a) =0;
the expected soft constraint rewards are maximized without violating the hard constraints, and therefore the objective function is expressed as:
J(π)=E_{s,a~π}[r(s,a)*f(s,a)]
where pi is the policy and r (s, a) represents the soft constraint reward obtained by taking action a in state s;
in order to optimize the objective function, the loss function is as follows:
L(π)=E_{s,a~π}[-r(s,a)*f(s,a)+λ*D_KL(π_old||π)]
wherein D_KL represents KL divergence, which is used for measuring the difference between the new strategy pi and the old strategy pi_old; lambda is a super parameter for balancing soft constraint rewards and policy update magnitudes;
the objective function is optimized by iteratively updating the strategy pi, in each iteration track data is first collected, then the strategy is updated using the loss function, in the updating process, it is ensured that the new strategy pi satisfies f (s, a) =1.
Further, the design reward function respectively processes hard constraint and soft constraint, specifically:
hard constraint processing: defining a state transition function T(s, a, s') representing the probability of transitioning to state s' after performing action a in state s; for state transitions that satisfy the hard constraints, the original transition probability is maintained, and if a state transition violates a hard constraint, its probability is set to 0 so as to prohibit that state transition;
soft constraint processing: a weight is assigned to each soft constraint, the weight is adjusted according to the specific requirements of the problem and the optimization target, and finally the weighted sum of the soft constraints is incorporated into the reward function.
Further, the state transition is expressed as:
T(s,a,s')=P(s'|s,a)*f(s,a,s')
wherein P (s ' |s, a) is the original state transition probability, f (s, a, s ') is an indication function, and the value is 1 when the state transition (s, a, s ') satisfies the hard constraint, otherwise, the value is 0;
the reward function is expressed as:
R(s,a,s')=r(s,a,s')+∑w_i*g_i(s,a,s')
where r(s, a, s') represents the original reward, w_i is the weight of the i-th soft constraint, and g_i(s, a, s') represents the contribution of the i-th soft constraint under the state transition (s, a, s').
Further, the reinforcement learning algorithm with constraint is used for training the intelligent agent, so that the intelligent agent finds a strategy for optimizing soft constraint on the premise of meeting hard constraint, and the strategy specifically comprises the following steps:
s5-1: ensuring that the hard constraint is satisfied;
s5-2: optimizing soft constraints;
s5-3: experience replay is adopted;
s5-4: using the target network;
s5-5: decaying the exploration rate.
Further, the reward function contains performance metrics of the layout, including power consumption and delay.
Further, after the training of the agent is completed, the trained agent is applied to the actual chip layout problem, and an optimized layout scheme is obtained through the action sequence executed by the agent, with the following specific steps:
s6-1: according to the current layout state, enabling the intelligent agent to execute actions;
s6-2: before each action is performed, checking whether the action would cause the hard constraint to be violated; if yes, skipping the action, and selecting the next action;
s6-3: updating the current layout scheme according to the action selected by the agent;
s6-4: after each time of updating the layout, calculating the satisfaction degree of soft constraint under the new layout;
s6-5: steps S6-1 to S6-4 are repeated until a preset number of optimizations or another termination condition is reached.
The beneficial technical effects of the invention are at least as follows:
(1) The hard constraint is incorporated into the calculation of the state transition probability, so that the intelligent agent is ensured to always meet the hard constraint, and the reliability of the layout scheme is improved. The optimization target of the soft constraint is embodied in the reward function, so that the intelligent agent can optimize the soft constraint in the learning process, and the performance of the layout scheme is improved. The hard constraint and the soft constraint are fully considered in the definition of the state and the action, so that the intelligent agent is helped to fully understand and master the characteristics of the layout problem in the learning process, and a better layout strategy is found.
(2) Hard constraints and soft constraints are handled separately in the reinforcement learning process. The hard constraint is guaranteed not to be violated by the adjustment of the state transition probability, while the soft constraint is optimized by the adjustment of the reward function. Such a design allows our algorithm to efficiently optimize soft constraints while following hard constraints.
(3) Chip layout optimization under the condition of considering hard constraint and soft constraint is realized. The resulting layout should be capable of exhibiting advantages in terms of hard and soft constraints.
(4) The hard constraint and the soft constraint are clearly distinguished and processed in a targeted manner, so that the reliability and the optimization degree of the chip layout scheme are improved. By adopting the reinforcement learning algorithm with the constraint, the soft constraint can be effectively optimized on the premise of meeting the hard constraint. Through autonomous learning, the intelligent agent can find out a strategy for optimizing the chip layout under the condition of no manual intervention, so that the design complexity and the labor cost are reduced.
Drawings
The invention will be further described with reference to the accompanying drawings. The embodiments do not constitute any limitation of the invention, and other drawings can be obtained by one of ordinary skill in the art from the following drawings without inventive effort.
FIG. 1 is a flow chart of a chip layout optimization method based on reinforcement learning with constraint.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
As shown in fig. 1, the method provided by the embodiment of the invention includes:
s1: Establishing a model based on a Markov decision process for the chip layout problem. The chip layout problem is modeled as a Markov Decision Process (MDP) that includes states, actions, transition probabilities, and a reward function. The state represents the current layout state, the action represents the operation performed on the layout, the transition probability represents the transition relations among the states, and the reward function is used to evaluate the quality of the layout. The specific definitions are as follows:
a) State (S): the state represents the current situation of the chip layout. The state is defined as a tuple (placed element positions, unplaced element list, hard constraint state, soft constraint state), where
Placed element positions: representing the elements that have been placed on the chip and their position information.
List of unplaced elements: representing the elements that have not yet been placed on the chip.
Hard constraint state: representing the connection relationships between the placed components in the current layout, as well as the distance, size, etc. constraints between the components. This information can be used to ensure that the layout always satisfies the hard constraints during state transitions.
Soft constraint state: representing performance metrics such as power consumption, delay, etc. of the placed elements in the current layout. This information can be used to evaluate the performance of the layout in optimizing the soft constraints.
b) Action (A): an action represents an operation that the agent can take, such as selecting an unplaced element and placing it at a certain position in the layout. The action set needs to be generated on the premise that the hard constraints are satisfied, so as to ensure that all selectable actions do not violate the hard constraints.
c) State transition probability (P): state transition probabilities describe the probability that a system will transition to a new state after performing some action in a given state. The inclusion of the hard constraint in the calculation of the state transition probability causes the state transition that violates the hard constraint to be disabled. This helps ensure that the agent always meets the hard constraints during the learning process.
d) Reward function (R): the reward function is used to evaluate the reward that the agent obtains after performing an action. In the present invention, the reward function mainly considers the optimization of the soft constraints. In order to embody the optimization targets of the soft constraints, performance indicators of the layout (such as power consumption, delay, etc.) are included in the reward function, so that the agent can optimize these indicators during the learning process.
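A minimal sketch of this MDP formulation is given below, assuming a simple coordinate-pair representation of placed elements; the field names and types are illustrative assumptions rather than the patented data structure.

```python
# Illustrative sketch of the MDP state and action from S1; names and types are assumptions.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class LayoutState:
    placed: Dict[str, Tuple[float, float]] = field(default_factory=dict)  # element id -> (x, y) position
    unplaced: List[str] = field(default_factory=list)                     # elements not yet on the chip
    hard_state: Dict[str, float] = field(default_factory=dict)            # e.g. clearances, overlaps, sizes
    soft_state: Dict[str, float] = field(default_factory=dict)            # e.g. power and delay estimates

@dataclass
class PlaceAction:
    element: str                       # which unplaced element to place
    position: Tuple[float, float]      # where to place it on the chip
```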
By definition, the present invention has the following advantages in handling hard and soft constraints:
the hard constraint is incorporated into the calculation of the state transition probability, so that the intelligent agent is ensured to always meet the hard constraint, and the reliability of the layout scheme is improved.
The optimization target of the soft constraint is embodied in the reward function, so that the intelligent agent can optimize the soft constraint in the learning process, and the performance of the layout scheme is improved.
The hard constraint and the soft constraint are fully considered in the definition of the state and the action, so that the intelligent agent is helped to fully understand and master the characteristics of the layout problem in the learning process, and a better layout strategy is found.
S2: for the chip design layout field, hard constraints and soft constraints are distinguished. For the chip design layout field, the hard constraint and the soft constraint are clearly distinguished. Hard constraints including routing rules, power supplies, etc., violating hard constraints can cause the chip to fail to function properly; soft constraints include power consumption, performance, etc., and optimizing soft constraints can improve the overall performance of the chip.
Some specific definitions in engineering are given below:
hard constraint: the following are some examples of hard constraints for the chip layout problem:
space limitations: the component must lie entirely within the chip boundary.
Overlap limit: there cannot be any spatial overlap between the two elements.
Connection restriction: the connections between all elements must meet predefined connection rules.
Power supply limitation: the power requirements of each element must be within a specified range.
Thermal limit: the temperature profile of the chip must meet design requirements.
Soft constraint: the following are some examples of soft constraints for the chip layout problem:
Power consumption optimization: the total power consumption of the chip is reduced.
Delay optimization: reducing the overall delay of the signal transmission path.
Space utilization optimization: the utilization rate of the chip space is improved, and the idle area of the layout is reduced.
Line length optimization: the overall length of the interconnect lines is reduced.
Optimizing heat distribution: the heat distribution inside the chip is improved, and the local hot spots are reduced.
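For illustration, two of the hard constraints above (space and overlap limitations) and one soft metric (wirelength) might be evaluated as in the sketch below; the geometry conventions, function names, and the half-perimeter wirelength proxy are assumptions for demonstration, not part of the claimed method.

```python
# Illustrative constraint checks on a rectangle-based layout; names and formulas are assumptions.
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height)

def inside_chip(rect: Rect, chip_w: float, chip_h: float) -> bool:
    """Space limitation: the element must lie entirely within the chip boundary."""
    x, y, w, h = rect
    return x >= 0.0 and y >= 0.0 and x + w <= chip_w and y + h <= chip_h

def overlaps(a: Rect, b: Rect) -> bool:
    """Overlap limitation: True if two elements share any area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def hard_constraints_ok(layout: Dict[str, Rect], chip_w: float, chip_h: float) -> bool:
    rects = list(layout.values())
    if any(not inside_chip(r, chip_w, chip_h) for r in rects):
        return False
    return not any(overlaps(rects[i], rects[j])
                   for i in range(len(rects)) for j in range(i + 1, len(rects)))

def total_wirelength(layout: Dict[str, Rect], nets: List[List[str]]) -> float:
    """Soft metric: half-perimeter wirelength over each net's element centers."""
    length = 0.0
    for net in nets:
        xs = [layout[e][0] + layout[e][2] / 2 for e in net]
        ys = [layout[e][1] + layout[e][3] / 2 for e in net]
        length += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return length
```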
S3: the design reinforcement learning algorithm handles hard constraints and soft constraints. An improved Constrained Policy Optimization (CPO) algorithm is designed that specifically handles both hard and soft constraints. Referred to as HS-CPO (Hard-Soft Constrained Policy Optimization).
The core idea of HS-CPO is to incorporate hard and soft constraints into the objective function and the loss function, respectively. Specifically, the hard constraint is represented as a feasibility function to measure whether the policy satisfies the hard constraint. Soft constraints are then included as part of the optimization objective in the loss function.
The key components of the HS-CPO algorithm are as follows:
feasibility function: a feasibility function f (s, a) is defined, indicating whether taking action a in state s satisfies the hard constraint. When the hard constraint is satisfied, f (s, a) =1; otherwise, f (s, a) =0.
To be more adaptive to the chip layout task, the hard constraints of the chip layout may be mapped to the state space and the action space and incorporated into the feasibility function.
Specifically, a deep neural network is used as a function approximator, and the state s and the action a are input to output a feasibility function value. By training the neural network, a feasibility function capable of effectively judging the hard constraint in the chip layout problem is obtained.
This straightforward binary definition can be problematic for chip layout, so the feasibility function f(s, a) is redefined according to the characteristics of the chip layout problem. In a chip layout task, the hard constraints may include minimum distances between elements, restrictions on hot-spot areas, etc. A targeted feasibility function is therefore designed, such as f(s, a) = 1 - exp(-Σc_i), where c_i is the penalty term for each hard constraint. In this way, the hard-constraint characteristics of the chip layout task can be captured better.
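A minimal sketch of such a penalty-shaped feasibility function follows, assuming rectangular elements and two illustrative penalty terms (minimum spacing and hot-spot exclusion); the concrete definitions of the c_i terms are assumptions for demonstration only.

```python
# Sketch of the penalty-shaped feasibility function f = 1 - exp(-sum of hard-constraint penalties);
# the penalty terms below (minimum spacing, hot-spot exclusion) are illustrative assumptions.
import math
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height)

def spacing_penalty(layout: Dict[str, Rect], min_dist: float) -> float:
    """Sum of how far each element pair falls short of the required minimum spacing."""
    penalty, rects = 0.0, list(layout.values())
    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            xi, yi, wi, hi = rects[i]
            xj, yj, wj, hj = rects[j]
            dx = max(0.0, xi - (xj + wj), xj - (xi + wi))   # horizontal gap (0 if overlapping)
            dy = max(0.0, yi - (yj + hj), yj - (yi + hi))   # vertical gap (0 if overlapping)
            penalty += max(0.0, min_dist - math.hypot(dx, dy))
    return penalty

def hotspot_penalty(layout: Dict[str, Rect], hotspots: List[Rect]) -> float:
    """Number of elements whose center falls inside a restricted hot-spot region."""
    count = 0
    for x, y, w, h in layout.values():
        cx, cy = x + w / 2, y + h / 2
        if any(hx <= cx <= hx + hw and hy <= cy <= hy + hh for hx, hy, hw, hh in hotspots):
            count += 1
    return float(count)

def feasibility(layout: Dict[str, Rect], min_dist: float, hotspots: List[Rect]) -> float:
    """Penalty-shaped form stated in the text: f = 1 - exp(-Σ c_i)."""
    total = spacing_penalty(layout, min_dist) + hotspot_penalty(layout, hotspots)
    return 1.0 - math.exp(-total)
```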
Objective function: in HS-CPO, our goal is to maximize the desired soft constraint rewards without violating the hard constraint. Thus, the objective function can be expressed as:
J(π)=E_{s,a~π}[r(s,a)*f(s,a)]
where pi is the policy and r (s, a) represents the soft constraint reward obtained by taking action a in state s.
Loss function: in order to optimize the objective function, the loss function is designed as follows:
L(π)=E_{s,a~π}[-r(s,a)*f(s,a)+λ*D_KL(π_old||π)]
wherein D_KL represents KL divergence, which is used for measuring the difference between the new strategy pi and the old strategy pi_old; lambda is a super parameter used to balance soft constraint rewards and policy update magnitudes.
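For illustration, the HS-CPO loss might be computed as in the following PyTorch sketch for a discrete action space; weighting the reward term by the log-probability of the taken action is an assumed policy-gradient surrogate for the expectation, and the tensor shapes and λ value are likewise assumptions.

```python
# PyTorch sketch of L(π) = E[-r(s,a)*f(s,a) + λ*D_KL(π_old || π)]; interfaces are assumptions.
import torch
import torch.nn.functional as F

def hs_cpo_loss(new_logits: torch.Tensor,    # current policy logits, shape (batch, num_actions)
                old_logits: torch.Tensor,    # frozen logits of π_old, same shape
                actions: torch.Tensor,       # actions taken, shape (batch,)
                soft_rewards: torch.Tensor,  # soft-constraint rewards r(s, a), shape (batch,)
                feasible: torch.Tensor,      # feasibility f(s, a) in {0, 1}, shape (batch,)
                lam: float = 0.1) -> torch.Tensor:
    new_log_probs = F.log_softmax(new_logits, dim=-1)
    old_log_probs = F.log_softmax(old_logits.detach(), dim=-1)
    # KL(π_old || π), averaged over the batch, penalizes large policy updates.
    kl = torch.sum(old_log_probs.exp() * (old_log_probs - new_log_probs), dim=-1).mean()
    # Reward term only counts transitions that satisfy the hard constraints (f = 1).
    taken_log_probs = new_log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    reward_term = (soft_rewards * feasible * taken_log_probs).mean()
    return -reward_term + lam * kl
```

Minimizing this loss pushes probability mass toward feasible, high-reward actions while the KL term keeps the new policy close to the old one.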
Algorithm iteration: HS-CPO optimizes the objective function by iteratively updating the strategy π. In each iteration, trajectory data are first collected, and then the strategy is updated with the loss function. During the update, the new policy is ensured to satisfy the hard constraints (i.e., f(s, a) = 1). Through HS-CPO, the chip layout problem can be solved effectively while handling both the hard and the soft constraints.
S4: the bonus function is designed to handle hard constraints and soft constraints, respectively. The hard constraint and the soft constraint are treated differently when designing the bonus function. For a hard constraint, it is translated into a portion of the state transition probability such that state transitions that violate the hard constraint are disabled. For soft constraints, they are incorporated into the reward function in order to optimize it during the learning process. The method comprises the following steps:
hard constraint processing: To incorporate the hard constraints into the state transition probabilities, a state transition function T(s, a, s') is defined that represents the probability of transitioning to state s' after performing action a in state s. For state transitions that satisfy the hard constraints, the original transition probabilities are maintained. However, if a state transition violates a hard constraint, its probability is set to 0 to prohibit that transition. Specifically, the state transition function may be defined as:
T(s,a,s')=P(s'|s,a)*f(s,a,s')
wherein P (s ' |s, a) is the original state transition probability, f (s, a, s ') is an indication function, and the value is 1 when the state transition (s, a, s ') satisfies the hard constraint, otherwise, is 0.
Soft constraint processing: To incorporate the soft constraints into the reward function, each soft constraint is first assigned a weight. The weights can be adjusted according to the specific requirements and optimization objectives of the problem. The weighted sum of the individual soft constraints is then incorporated into the reward function. Specifically, the reward function may be defined as:
R(s,a,s')=r(s,a,s')+∑w_i*g_i(s,a,s')
where r(s, a, s') represents the original reward, w_i is the weight of the i-th soft constraint, and g_i(s, a, s') represents the contribution of the i-th soft constraint under the state transition (s, a, s').
In this way, hard constraints and soft constraints can be handled separately in the reinforcement learning process. The hard constraint is guaranteed not to be violated by the adjustment of the state transition probability, while the soft constraint is optimized by the adjustment of the reward function. Such a design allows the algorithm to efficiently optimize the soft constraints while following the hard constraints.
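The two treatments can be summarized in a short sketch: transition probabilities are masked by the hard-constraint indicator, and soft constraints enter the reward as a weighted sum. The weights and the two soft-constraint terms below are assumed values for demonstration.

```python
# Sketch of the two treatments in S4; the weights w_i and soft-constraint terms g_i are assumed values.
from typing import Dict

def masked_transition(p_original: float, hard_ok: bool) -> float:
    """T(s, a, s') = P(s'|s, a) * f(s, a, s'): forbidden transitions get probability 0."""
    return p_original if hard_ok else 0.0

def shaped_reward(base_reward: float,
                  soft_terms: Dict[str, float],
                  weights: Dict[str, float]) -> float:
    """R(s, a, s') = r(s, a, s') + sum_i w_i * g_i(s, a, s')."""
    return base_reward + sum(weights[name] * value for name, value in soft_terms.items())

# Example: negative g_i values reward reductions in power and wirelength for this transition.
reward = shaped_reward(
    base_reward=1.0,
    soft_terms={"power": -0.3, "wirelength": -1.2},  # g_i(s, a, s') values
    weights={"power": 0.5, "wirelength": 0.2},       # w_i chosen per optimization goal
)
```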
S5: training the intelligent agent by using a reinforcement learning algorithm with constraint, so that the intelligent agent finds a strategy for optimizing soft constraint on the premise of meeting hard constraint. During the training process, the agent will learn how to maximize the reward function without violating the hard constraint, thereby achieving optimization of the soft constraint. The method comprises the following steps:
s5-1: Ensuring that the hard constraints are satisfied: during the training of the agent, it must be ensured that the hard constraints are not violated. To this end, every time an action is about to be performed, it is checked whether the action would result in a hard constraint being violated. If so, execution of the action is prohibited and a negative reward is given to penalize the agent. In this way, the agent learns to obey the hard constraints.
S5-2: optimizing soft constraints: on the premise of meeting the hard constraint, the intelligent agent is expected to find out the strategy for optimizing the soft constraint. Soft constraints can be incorporated into the reward function, enabling the intelligent agent to optimize the soft constraints during training by adjusting weights and contribution values. Specifically, each soft constraint may be assigned a weight based on the characteristics of the problem, and these weights may be added to the reward function multiplied by the contribution of the corresponding soft constraint.
S5-3: experience playback is used: in order to improve training effect of the intelligent agent, an experience playback technology is adopted. After each action is performed, state transitions (including state, action, rewards, and next state) are stored in an experience playback buffer. Then, a batch of experience is randomly extracted from the buffer for training. By doing so, the time correlation can be broken, and the learning effect of the intelligent agent is improved.
S5-4: using the target network: to stabilize the training process, a target network is employed. The target network is a network with the same structure as the main network, but its parameters are updated slowly during the training process. The target value may be calculated using a target network to reduce instability during training.
S5-5: attenuation exploration rate: in order to gradually turn the agent around during the training process to utilize the learned knowledge, the exploration rate may be gradually reduced during the training process. Thus, the agent initially explores the environment in large quantities, and as training proceeds, it is increasingly focused on learned strategies.
Only the first two steps are designed specifically for the hard and soft constraints; the remaining steps are common reinforcement learning techniques that improve the learning effect and stability of the agent.
Specifically, the training process of HS-CPO is as follows:
1. The policy network π and the value function network V are initialized, together with the target networks π_target and V_target. An initial value of the exploration rate ε and its decay rate are set.
2. For each training round:
a) Generating a trajectory τ: starting from the initial state s0, the following operations are performed until a terminal state is reached:
i. An action is selected randomly with probability ε, or according to the current policy π with probability 1-ε.
ii. Whether the selected action a satisfies the hard constraints is checked. If so, the action is performed; otherwise another action is selected and a negative reward is given.
iii. The soft constraint reward r(s, a) is calculated.
iv. The state transition quadruple (s, a, r, s') is stored in the experience replay buffer.
v. The current state is updated: s = s'.
b) A small batch of data of size N is randomly extracted from the experience replay buffer.
c) Updating the value function network V using the small batch of data:
i. A target value is calculated using the target network V_target: y = r + γ*V_target(s').
ii. The predicted value V(s) of the value function network is calculated.
iii. The mean square error loss is calculated: L_V = (V(s) - y)^2.
iv. The parameters of the value function network V are updated using gradient descent.
d) Updating the policy network π using the small batch of data:
i. The action probability ratio is calculated: ρ = π(a|s) / π_old(a|s).
ii. The policy parameters are updated by minimizing the HS-CPO loss L(π) described in S3, ensuring that the new policy satisfies f(s, a) = 1.
e) Adjusting the soft constraint weights: during training, the weights of the soft constraints in the reward function can be adjusted appropriately according to the degree of optimization of the different soft constraints, so as to achieve a more balanced optimization effect.
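A compact skeleton of this training loop is sketched below; the environment and network interfaces (reset, step, feasible_actions, best_action, update, soft_update_from) are hypothetical names introduced for illustration, not the patented implementation.

```python
# Hypothetical interfaces throughout; this is a sketch of the procedure above, not the patented code.
import random
from collections import deque

def train(env, policy, value_net, target_value_net, rounds=100,
          eps_start=1.0, eps_decay=0.99, gamma=0.99, batch_size=64):
    buffer = deque(maxlen=100_000)                     # experience replay buffer
    eps = eps_start
    for _ in range(rounds):
        state, done = env.reset(), False
        while not done:                                # a) generate a trajectory
            feasible = env.feasible_actions(state)     # only hard-constraint-feasible actions
            if random.random() < eps:
                action = random.choice(feasible)       # explore with probability ε
            else:
                action = policy.best_action(state, feasible)
            next_state, soft_reward, done = env.step(action)
            buffer.append((state, action, soft_reward, next_state))
            state = next_state
        if len(buffer) >= batch_size:
            batch = random.sample(buffer, batch_size)          # b) sample a small batch
            value_net.update(batch, target_value_net, gamma)   # c) y = r + gamma * V_target(s')
            policy.update(batch)                               # d) minimize the HS-CPO loss
            target_value_net.soft_update_from(value_net)       # slow target-network update
        eps *= eps_decay                               # decay the exploration rate
```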
S6: after the training of the agent is completed, the training method is applied to the actual chip layout problem, and an optimized layout scheme is obtained through the action sequence executed by the agent. After the agent training is completed, it is applied to the actual chip layout problem. Through the action sequence executed by the intelligent agent, an optimized layout scheme can be obtained, and the scheme meets the hard constraint and simultaneously shows superiority in the aspect of soft constraint. In the project, the optimization is performed according to the following steps:
s6-1: Executing the actions of the agent: in S1-S5, the problem modeling has been established, the hard and soft constraints have been defined, the reinforcement learning algorithm has been selected, and the agent training has been completed. The trained agent is now applied to the practical chip layout problem: according to the current layout state, the agent executes a series of actions to optimize the layout.
S6-2: hard constraint checking: before each action is performed, it is checked whether the action would result in a hard constraint being violated. If so, this action is skipped and the next action is selected. This ensures that the layout always meets the hard constraint requirements.
S6-3: updating the layout: and updating the current layout scheme according to the action selected by the agent. This may include moving the assembly, rotating the assembly, changing the wiring, etc.
S6-4: evaluating the soft constraint satisfaction degree: after each update of the layout, the degree of satisfaction of the soft constraint under the new layout is calculated. This may be achieved by calculating a bonus function that already contains soft constraint related information.
S6-5: iterative optimization: steps S6-1-S6-4 are repeated until a preset number of optimizations or other termination conditions are reached (e.g., layout quality reaches a desired goal). In the whole optimization process, the intelligent agent can optimize soft constraint as much as possible on the premise of meeting hard constraint according to the learned strategy.
Through this process, on the basis of S1-S5, chip layout optimization under the condition of considering hard constraint and soft constraint is achieved. The resulting layout should be capable of exhibiting advantages in terms of hard and soft constraints.
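The deployment stage of S6 can likewise be sketched as a short loop, reusing the same hypothetical environment interface as the training sketch above.

```python
# Sketch of the S6 deployment loop with the same hypothetical interfaces as above.
def optimize_layout(env, policy, max_steps=1000, target_quality=None):
    state = env.reset()
    for _ in range(max_steps):
        feasible = env.feasible_actions(state)            # S6-2: skip actions violating hard constraints
        if not feasible:
            break
        action = policy.best_action(state, feasible)      # S6-1: act from the current layout state
        state, _, done = env.step(action)                 # S6-3: update the layout
        quality = env.soft_constraint_score(state)        # S6-4: evaluate soft-constraint satisfaction
        if done or (target_quality is not None and quality >= target_quality):
            break                                         # S6-5: termination condition reached
    return env.current_layout(state)
```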
While embodiments of the invention have been shown and described, it will be understood by those skilled in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.

Claims (10)

1. A method for optimizing a chip layout based on reinforcement learning with constraints, the method comprising:
s1: establishing a model based on a Markov decision process for the chip layout problem;
s2: aiming at the chip design layout field, distinguishing hard constraint and soft constraint;
s3: designing a reinforcement learning algorithm to process hard constraint and soft constraint;
s4: designing a reward function to respectively process hard constraint and soft constraint;
s5: training the intelligent agent by using a reinforcement learning algorithm with constraint, so that the intelligent agent finds a strategy for optimizing soft constraint on the premise of meeting hard constraint;
s6: after the training of the agent is completed, applying the trained agent to the actual chip layout problem, and obtaining an optimized layout scheme through the action sequence executed by the agent.
2. The method for optimizing a chip layout based on reinforcement learning with constraint according to claim 1, wherein the model based on a markov decision process is built for the chip layout problem, and the model includes states, actions, state transition probabilities and rewarding functions;
the state is S, the current situation of the chip layout is represented, and the state is defined as a tuple;
the action is A, which represents the operation of the intelligent agent on the layout;
the state transition probability is P, which indicates the probability that the system will transition to a new state after executing a certain action in a given state;
the reward function is R and is used for evaluating the reward obtained by the agent after executing a certain action.
3. The method for optimizing a chip layout based on reinforcement learning with constraint according to claim 2, wherein the tuple comprises placed element positions, an unplaced element list, a hard constraint state, and a soft constraint state;
the placed element positions represent the elements that have been placed on the chip and their position information;
the unplaced element list represents the elements that have not yet been placed on the chip;
the hard constraint state represents the connection relations between the placed elements in the current layout, as well as the distances and sizes between the elements;
the soft constraint state represents the performance indicators of the placed elements in the current layout.
4. The chip layout optimization method based on constraint reinforcement learning of claim 1, wherein the hard constraints and the soft constraints are distinguished for the chip design layout field, the hard constraints including but not limited to space limitations, overlap limitations, connection limitations, power limitations, and thermal limitations; the soft constraints including but not limited to power consumption optimization, delay optimization, space utilization optimization, line length optimization, and thermal profile optimization.
5. The chip layout optimization method based on constraint reinforcement learning according to claim 1, wherein the reinforcement learning algorithm is designed to process hard constraints and soft constraints, and specifically comprises:
defining a feasibility function f (s, a) representing whether the action a taken in the state s satisfies the hard constraint; when the hard constraint is satisfied, f (s, a) =1; otherwise, f (s, a) =0;
the expected soft constraint rewards are maximized without violating the hard constraints, and therefore the objective function is expressed as:
J(π)=E_{s,a~π}[r(s,a)*f(s,a)]
where pi is the policy and r (s, a) represents the soft constraint reward obtained by taking action a in state s;
in order to optimize the objective function, the loss function is as follows:
L(π)=E_{s,a~π}[-r(s,a)*f(s,a)+λ*D_KL(π_old||π)]
wherein D_KL represents KL divergence, which is used for measuring the difference between the new strategy pi and the old strategy pi_old; lambda is a super parameter for balancing soft constraint rewards and policy update magnitudes;
the objective function is optimized by iteratively updating the strategy pi, in each iteration track data is first collected, then the strategy is updated using the loss function, in the updating process, it is ensured that the new strategy pi satisfies f (s, a) =1.
6. The chip layout optimization method based on constraint reinforcement learning according to claim 1, wherein the design reward function respectively processes hard constraints and soft constraints, specifically:
hard constraint processing: defining a state transition function T(s, a, s') representing the probability of transitioning to state s' after performing action a in state s; for state transitions that satisfy the hard constraints, the original transition probability is maintained, and if a state transition violates a hard constraint, its probability is set to 0 so as to prohibit that state transition;
soft constraint processing: a weight is assigned to each soft constraint, the weight is adjusted according to the specific requirements of the problem and the optimization target, and finally the weighted sum of the soft constraints is incorporated into the reward function.
7. The method for optimizing a chip layout based on constrained reinforcement learning of claim 6, wherein the state transition is expressed as:
T(s,a,s')=P(s'|s,a)*f(s,a,s')
wherein P (s ' |s, a) is the original state transition probability, f (s, a, s ') is an indication function, and the value is 1 when the state transition (s, a, s ') satisfies the hard constraint, otherwise, the value is 0;
the reward function is expressed as:
R(s,a,s')=r(s,a,s')+∑w_i*g_i(s,a,s')
where r(s, a, s') represents the original reward, w_i is the weight of the i-th soft constraint, and g_i(s, a, s') represents the contribution of the i-th soft constraint under the state transition (s, a, s').
8. The chip layout optimization method based on constraint reinforcement learning according to claim 1, wherein the training of the agent by using the constraint reinforcement learning algorithm enables the agent to find a strategy for optimizing soft constraints on the premise of meeting hard constraints, specifically:
s5-1: ensuring that the hard constraint is satisfied;
s5-2: optimizing soft constraints;
s5-3: experience replay is adopted;
s5-4: using the target network;
s5-5: decaying the exploration rate.
9. The method for optimizing a chip layout based on constrained reinforcement learning according to claim 2, wherein the reward function comprises performance metrics of the layout, the performance metrics including power consumption and delay.
10. The method for optimizing the chip layout based on the reinforcement learning with constraint according to claim 1, wherein after the training of the agent is completed, the trained agent is applied to the actual chip layout problem, and an optimized layout scheme is obtained through an action sequence executed by the agent, with the following specific steps:
s6-1: according to the current layout state, enabling the intelligent agent to execute actions;
s6-2: before each action is performed, checking whether the action would cause the hard constraint to be violated; if yes, skipping the action, and selecting the next action;
s6-3: updating the current layout scheme according to the action selected by the agent;
s6-4: after each time of updating the layout, calculating the satisfaction degree of soft constraint under the new layout;
s6-5: steps S6-1 to S6-4 are repeated until a preset number of optimizations or another termination condition is reached.
CN202310359245.2A 2023-04-04 2023-04-04 Chip layout optimization method based on reinforcement learning with constraint Active CN116738923B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310359245.2A CN116738923B (en) 2023-04-04 2023-04-04 Chip layout optimization method based on reinforcement learning with constraint

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310359245.2A CN116738923B (en) 2023-04-04 2023-04-04 Chip layout optimization method based on reinforcement learning with constraint

Publications (2)

Publication Number Publication Date
CN116738923A true CN116738923A (en) 2023-09-12
CN116738923B CN116738923B (en) 2024-04-05

Family

ID=87912185

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310359245.2A Active CN116738923B (en) 2023-04-04 2023-04-04 Chip layout optimization method based on reinforcement learning with constraint

Country Status (1)

Country Link
CN (1) CN116738923B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117828701A (en) * 2024-03-05 2024-04-05 中国石油大学(华东) Engineering drawing layout optimization method, system, equipment and medium
CN117972812A (en) * 2024-03-26 2024-05-03 中国石油大学(华东) Engineering drawing layout optimization method, device, equipment and medium
CN117972812B (en) * 2024-03-26 2024-06-07 中国石油大学(华东) Engineering drawing layout optimization method, device, equipment and medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111006693A (en) * 2019-12-12 2020-04-14 中国人民解放军陆军工程大学 Intelligent aircraft track planning system and method thereof
CN111144728A (en) * 2019-12-18 2020-05-12 东南大学 Deep reinforcement learning-based economic scheduling method for cogeneration system
US20200285204A1 (en) * 2019-03-04 2020-09-10 Fujitsu Limited Reinforcement learning method and reinforcement learning system
US20210247744A1 (en) * 2018-08-09 2021-08-12 Siemens Aktiengesellschaft Manufacturing process control using constrained reinforcement machine learning
US20220164657A1 (en) * 2020-11-25 2022-05-26 Chevron U.S.A. Inc. Deep reinforcement learning for field development planning optimization
CN115270698A (en) * 2022-06-23 2022-11-01 广东工业大学 Chip global automatic layout method based on deep reinforcement learning
CN115437406A (en) * 2022-09-16 2022-12-06 西安电子科技大学 Aircraft reentry tracking guidance method based on reinforcement learning algorithm

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210247744A1 (en) * 2018-08-09 2021-08-12 Siemens Aktiengesellschaft Manufacturing process control using constrained reinforcement machine learning
US20200285204A1 (en) * 2019-03-04 2020-09-10 Fujitsu Limited Reinforcement learning method and reinforcement learning system
CN111006693A (en) * 2019-12-12 2020-04-14 中国人民解放军陆军工程大学 Intelligent aircraft track planning system and method thereof
CN111144728A (en) * 2019-12-18 2020-05-12 东南大学 Deep reinforcement learning-based economic scheduling method for cogeneration system
US20220164657A1 (en) * 2020-11-25 2022-05-26 Chevron U.S.A. Inc. Deep reinforcement learning for field development planning optimization
CN115270698A (en) * 2022-06-23 2022-11-01 广东工业大学 Chip global automatic layout method based on deep reinforcement learning
CN115437406A (en) * 2022-09-16 2022-12-06 西安电子科技大学 Aircraft reentry tracking guidance method based on reinforcement learning algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Azalia Mirhoseini et al.: "Chip Placement with Deep Reinforcement Learning", arXiv, pages 1-15 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117828701A (en) * 2024-03-05 2024-04-05 中国石油大学(华东) Engineering drawing layout optimization method, system, equipment and medium
CN117828701B (en) * 2024-03-05 2024-05-24 中国石油大学(华东) Engineering drawing layout optimization method, system, equipment and medium
CN117972812A (en) * 2024-03-26 2024-05-03 中国石油大学(华东) Engineering drawing layout optimization method, device, equipment and medium
CN117972812B (en) * 2024-03-26 2024-06-07 中国石油大学(华东) Engineering drawing layout optimization method, device, equipment and medium

Also Published As

Publication number Publication date
CN116738923B (en) 2024-04-05

Similar Documents

Publication Publication Date Title
CN116738923B (en) Chip layout optimization method based on reinforcement learning with constraint
Liu et al. An adaptive online parameter control algorithm for particle swarm optimization based on reinforcement learning
Ghorbani et al. Particle swarm optimization with smart inertia factor for solving non‐convex economic load dispatch problems
CN113760553B (en) Mixed part cluster task scheduling method based on Monte Carlo tree search
Long et al. A self‐learning artificial bee colony algorithm based on reinforcement learning for a flexible job‐shop scheduling problem
CN115940294B (en) Multi-stage power grid real-time scheduling strategy adjustment method, system, equipment and storage medium
CN110490319B (en) Distributed deep reinforcement learning method based on fusion neural network parameters
CN115758981A (en) Layout planning method based on reinforcement learning and genetic algorithm
CN111768028A (en) GWLF model parameter adjusting method based on deep reinforcement learning
JP7137074B2 (en) Optimization calculation method, optimization calculation device, and optimization calculation program
CN115238599A (en) Energy-saving method for refrigerating system and model reinforcement learning training method and device
JP6975685B2 (en) Learning control method and computer system
CN114942799B (en) Workflow scheduling method based on reinforcement learning in cloud edge environment
CN117833263A (en) New energy power grid voltage control method and system based on DDPG
CN114861368A (en) Method for constructing railway longitudinal section design learning model based on near-end strategy
KR20220162096A (en) Deep Neural Network Structure for Inducing Rational Reinforcement Learning Agent Behavior
CN113919108A (en) Multi-population hierarchical assisted evolution-based reference source structure optimization method
CN113256128A (en) Task scheduling method for balancing resource usage by reinforcement learning in power internet of things
Morales Deep Reinforcement Learning
CN113128753A (en) Operation order intelligent generation method based on deep reinforcement learning
Halici Reinforcement learning in random neural networks for cascaded decisions
CN117669739B (en) Agent-based intelligent negotiation strategy optimization method and system
Itazuro et al. Design environment of reinforcement learning agents for intelligent multiagent system
CN116506352B (en) Network data continuing forwarding selection method based on centralized reinforcement learning
CN114741970B (en) Improved circuit parameter optimization method for depth deterministic strategy gradient algorithm

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant