CN117574834A - Chip layout design method, system, device and medium based on reinforcement learning - Google Patents

Chip layout design method, system, device and medium based on reinforcement learning

Info

Publication number
CN117574834A
CN117574834A
Authority
CN
China
Prior art keywords
congestion
layout
state
reinforcement learning
weighted sum
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311459320.9A
Other languages
Chinese (zh)
Inventor
胡建国
沈圣智
王雨禾
潘家锴
黄文俊
黄宇轩
丁颜玉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Development Research Institute Of Guangzhou Smart City
Sun Yat Sen University
Original Assignee
Development Research Institute Of Guangzhou Smart City
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Development Research Institute Of Guangzhou Smart City, Sun Yat Sen University filed Critical Development Research Institute Of Guangzhou Smart City
Priority to CN202311459320.9A priority Critical patent/CN117574834A/en
Publication of CN117574834A publication Critical patent/CN117574834A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/30Circuit design
    • G06F30/39Circuit design at the physical level
    • G06F30/392Floor-planning or layout, e.g. partitioning or placement
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/30Circuit design
    • G06F30/39Circuit design at the physical level
    • G06F30/394Routing
    • G06F30/3947Routing global
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/30Circuit design
    • G06F30/39Circuit design at the physical level
    • G06F30/398Design verification or optimisation, e.g. using design rule check [DRC], layout versus schematics [LVS] or finite element methods [FEM]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0499Feedforward networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/092Reinforcement learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Geometry (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Architecture (AREA)
  • Design And Manufacture Of Integrated Circuits (AREA)

Abstract

The application discloses a chip layout design method, system, device and storage medium based on reinforcement learning. The method comprises: obtaining a first state of a chip canvas; inputting the first state into a reinforcement learning layout model to obtain a second state, a first layout action, a first global wiring length and a weighted sum of first congestion and density; comparing the first global wiring length with a preset global wiring length, and comparing the weighted sum of the first congestion and density with a preset weighted sum of congestion and density; if the first global wiring length is greater than the preset global wiring length or the weighted sum of the first congestion and density is greater than the preset weighted sum, taking the currently obtained second state as a new first state and returning to the above steps, until the first global wiring length is less than the preset global wiring length and the weighted sum of the first congestion and density is less than the preset weighted sum, thereby obtaining the chip layout; the chip layout is obtained by performing layout through a plurality of layout actions. The method can save time and cost. The method and the device can be widely applied to the technical field of integrated circuits.

Description

Chip layout design method, system, device and medium based on reinforcement learning
Technical Field
The application relates to the technical field of integrated circuits, in particular to a chip layout design method, system, device and storage medium based on reinforcement learning.
Background
In the prior art, chip layout placement methods can be divided into three main types: partition-based methods, random-based methods, and optimization-based methods. The partition-based method recursively divides the chip canvas into regions, converting the placement problem on a large chip canvas into placement problems on small chip canvases and thereby reducing the complexity of the overall placement. However, partitioning methods often struggle to account for the quality of the global placement; low-quality early partitions lead to low-quality global placement solutions, and such methods are difficult to scale to modern large-scale integrated circuit placement.
The random-based methods developed mainly from the hill-climbing algorithm and further evolved into the simulated annealing algorithm, which can jump out of local optima and has become the mainstream random approach. The simulated annealing algorithm borrows the annealing concept from metallurgy and temporarily accepts worse solutions in order to escape local optima. Random-based methods achieve good results on small-scale circuits, but are too time-consuming for large-scale circuits.
Optimization-based methods have been proposed more recently; they convert the placement problem into an optimization problem and search for a weighted optimal solution over multiple objectives. For example, the force-directed algorithm used by Google for standard-cell placement models the circuit devices as a spring-like system and finds the optimal distances between devices through attraction and repulsion. Methods such as ePlace and RePlAce similarly model the circuit as an electrostatic system. Most of these methods run on CPUs, so their computation time cost is high. DREAMPlace also models the circuit as an electrostatic system, but accelerates the placement process with a GPU on top of the deep learning framework PyTorch, achieving a large speedup over the former methods without reducing placement quality. Accordingly, there still exist technical problems in the related art that need to be solved.
Disclosure of Invention
The object of the present application is to solve at least one of the technical problems existing in the prior art to a certain extent.
Therefore, an object of an embodiment of the present application is to provide a method, a system, an apparatus and a storage medium for designing a chip layout based on reinforcement learning, which can save time and cost.
In order to achieve the above technical purpose, the technical scheme adopted by the embodiment of the application comprises the following steps: a chip layout design method based on reinforcement learning, comprising: acquiring a first state of a chip canvas and the number of devices to be laid out on the chip canvas; inputting the first state into a reinforcement learning layout model to obtain a second state, a first layout action, a first global wiring length and a weighted sum of first congestion and density; taking the currently obtained second state as a new first state and returning to the step of inputting the first state into the reinforcement learning layout model to obtain the second state, the first layout action, the first global wiring length and the weighted sum of the first congestion and density, until the number of first layout actions is the same as the number of devices, thereby obtaining a plurality of layout actions, a plurality of first global wiring lengths and a plurality of weighted sums of first congestion and density; and determining the chip layout according to the plurality of layout actions, the plurality of first global wiring lengths and the plurality of weighted sums of first congestion and density.
In addition, according to the method for designing the chip layout based on reinforcement learning in the above embodiment of the present invention, the following additional technical features may be provided:
further, in an embodiment of the present application, the reinforcement learning layout model includes a first channel, a second channel, a third channel, a fourth channel, a fifth channel, and a sixth channel; the first channel and the second channel are used for generating a result that the layout is occupied; the third channel is used for generating a layout track; the fourth channel is used for generating the length of the device to be laid out currently, the fifth channel is used for generating the width of the device to be laid out currently, and the sixth channel is used for generating the number of the device to be laid out currently.
Further, in this embodiment of the present application, the reinforcement learning layout model includes a policy network and an environment, and the step of inputting the first state into the reinforcement learning layout model to obtain the second state, the first layout action, the first global wiring length and the weighted sum of the first congestion and density specifically includes: sending the first state into the reinforcement learning policy network to obtain a first action; and sending the first state and the first action into the environment, the environment changing the first state according to the first action to obtain the second state and generating the first global wiring length and the weighted sum of the first congestion and density.
Further, in this embodiment of the present application, the step of sending the first state and the first action into the environment to generate the first global wiring length and the weighted sum of the first congestion and density specifically includes: acquiring node data of each device where it connects to a net; and determining the first global wiring length and the weighted sum of the first congestion and density according to the node data.
Further, in this embodiment of the present application, the step of determining, according to the node data, a weighted sum of the first global routing length and the first congestion and density specifically includes:
inputting the node data into a calculation formula to obtain a weighted sum of a first global wiring length and a first congestion and density;
the calculation formula comprises:
s.t. Congestion(M_x, M_y, M_w, M_h) ≤ C_th and Overlap(M_x, M_y, M_w, M_h) = 0,
wherein P(i, j) is the node data, x is the coordinate on the x-axis, y is the coordinate on the y-axis, Congestion(M_x, M_y, M_w, M_h) is the congestion parameter, C_th is a preset congestion threshold, and Overlap(M_x, M_y, M_w, M_h) is the overlap parameter.
Further, in this embodiment of the present application, the policy network includes a first layer network, a second layer network, a third layer network, a fourth layer network and a full connection layer, and the step of sending the first state into the reinforcement learning policy network to obtain the first action specifically includes: sending the first state into the four-layer network to obtain a first tensor of 256x256x6; inputting the first tensor into the full connection layer to obtain a second tensor of 256x256x1; the second tensor is used for representing a plurality of action strategies; and taking the action strategy with the largest action probability as the first action.
Further, in the embodiment of the present application, the first layer network includes 16 convolution kernels, the second layer network includes 32 convolution kernels, the third layer network includes 64 convolution kernels, and the fourth layer network includes 6 convolution kernels.
On the other hand, the embodiment of the application also provides a chip layout design system based on reinforcement learning, which comprises:
the acquisition unit is used for acquiring a first state of the canvas of the chip;
the first processing unit is used for inputting the first state into the reinforcement learning layout model to obtain a second state, a first layout action, a first global wiring length and a weighted sum of first congestion and density, comparing the first global wiring length with a preset global wiring length, and comparing the weighted sum of the first congestion and density with a preset weighted sum of congestion and density;
the second processing unit is used for, if the first global wiring length is greater than the preset global wiring length or the weighted sum of the first congestion and density is greater than the preset weighted sum of congestion and density, taking the currently obtained second state as a new first state and returning to the step of inputting the first state into the reinforcement learning layout model to obtain the second state, the first layout action, the first global wiring length and the weighted sum of the first congestion and density, until the first global wiring length is smaller than the preset global wiring length and the weighted sum of the first congestion and density is smaller than the preset weighted sum of congestion and density, thereby obtaining the chip layout; the chip layout is obtained by performing layout through a plurality of layout actions.
On the other hand, the application also provides a chip layout design device based on reinforcement learning, which comprises:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the reinforcement learning-based chip layout design method described above.
Further, the present application provides a storage medium having stored therein processor-executable instructions, which when executed by a processor, are for performing a reinforcement learning based chip layout design method as set forth in any one of the above.
The advantages and benefits of the present application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the present application.
The method and the device can acquire a first state of the chip canvas; input the first state into a reinforcement learning layout model to obtain a second state, a first layout action, a first global wiring length and a weighted sum of first congestion and density; compare the first global wiring length with a preset global wiring length, and compare the weighted sum of the first congestion and density with a preset weighted sum of congestion and density; if the first global wiring length is greater than the preset global wiring length or the weighted sum of the first congestion and density is greater than the preset weighted sum, take the currently obtained second state as a new first state and return to the step of inputting the first state into the reinforcement learning layout model to obtain the second state, the first layout action, the first global wiring length and the weighted sum of the first congestion and density, until the first global wiring length is smaller than the preset global wiring length and the weighted sum of the first congestion and density is smaller than the preset weighted sum, thereby obtaining the chip layout; the chip layout is obtained by performing layout through a plurality of layout actions. Because the chip layout design is obtained quickly through the reinforcement learning model, the method and the device can be applied to the layout design of large-scale circuits and can save the time cost of layout.
Drawings
FIG. 1 is a schematic diagram illustrating steps of a method for designing a chip layout based on reinforcement learning according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of another reinforcement learning-based chip layout design flow in an embodiment of the invention;
FIG. 3 is a diagram of the multi-channel chip canvas representation for layout in accordance with one embodiment of the present invention;
FIG. 4 is a diagram of a multi-channel chip canvas update in accordance with one embodiment of the present invention;
FIG. 5 is a diagram illustrating a multi-channel chip canvas value change process in accordance with one embodiment of the present invention;
FIG. 6 is a flow chart illustrating the generation of layout actions according to an embodiment of the present invention;
FIG. 7 is a flowchart of updating network parameters of a PPO algorithm according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a chip layout design system based on reinforcement learning according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of a chip layout design device based on reinforcement learning in an embodiment of the invention.
Detailed Description
The following describes the principles and processes of the reinforcement learning-based chip layout design method, system, device and storage medium in the embodiments of the present invention in detail with reference to the accompanying drawings.
Referring to fig. 1, the invention relates to a chip layout design method based on reinforcement learning. The method may comprise steps S101-S104.
S101, acquiring a first state of a chip canvas and the number of devices required to be laid out by the chip canvas;
S102, inputting the first state into a reinforcement learning layout model to obtain a second state, a first layout action, a first global wiring length and a weighted sum of a first congestion and a first density;
S103, taking the currently obtained second state as a new first state, and returning to the step of inputting the first state into the reinforcement learning layout model to obtain the second state, the first layout action, the first global wiring length and the weighted sum of the first congestion and density, until the number of the first layout actions is the same as the number of the devices, thereby obtaining a plurality of layout actions, a plurality of first global wiring lengths and a plurality of weighted sums of first congestion and density;
S104, determining the chip layout according to the plurality of layout actions, the plurality of first global wiring lengths and the plurality of weighted sums of first congestion and density.
Further, in some embodiments of the present application, the reinforcement learning layout model includes a first channel, a second channel, a third channel, a fourth channel, a fifth channel, and a sixth channel; the first channel and the second channel are used for generating a result that the layout is occupied; the third channel is used for generating a layout track; the fourth channel is used for generating the length of the device to be laid out currently, the fifth channel is used for generating the width of the device to be laid out currently, and the sixth channel is used for generating the number of the device to be laid out currently.
Further, in some embodiments of the present application, the reinforcement learning layout model includes a policy network and an environment, and the step of inputting the first state into the reinforcement learning layout model to obtain a weighted sum of the second state, the first layout action, the first global wire length, and the first congestion and density may specifically include steps S201 to S202.
S201, sending the first state into the reinforcement learning policy network to obtain a first action;
S202, sending the first state and the first action into the environment; the environment changes the first state according to the first action to obtain the second state and generates the first global wiring length and the weighted sum of the first congestion and density.
Further, in some embodiments of the present application, the step of entering the first state and the first action into the environment, generating a weighted sum of the first global wire length and the first congestion and density may specifically include step S301-step S302.
S301, acquiring node data of each device where it connects to a net;
S302, determining the first global wiring length and the weighted sum of the first congestion and density according to the node data.
Further, in some embodiments of the present application, the step of determining the weighted sum of the first global wire length and the first congestion and density according to the node data specifically includes:
inputting the node data into a calculation formula to obtain a weighted sum of the first global wiring length and the first congestion and density;
the calculation formula comprises:
s.t. Congestion(M_x, M_y, M_w, M_h) ≤ C_th and Overlap(M_x, M_y, M_w, M_h) = 0,
wherein P(i, j) is the node data, x is the coordinate on the x-axis, y is the coordinate on the y-axis, Congestion(M_x, M_y, M_w, M_h) is the congestion parameter, C_th is a preset congestion threshold, and Overlap(M_x, M_y, M_w, M_h) is the overlap parameter.
Further, in some embodiments of the present application, the policy network includes a first layer network, a second layer network, a third layer network, a fourth layer network, and a full connection layer, and the step of sending the first state to the reinforcement learning policy network to obtain the first action may specifically include step S401 to step S402.
S401, sending the first state into the four-layer network to obtain a first tensor of 256x256x6;
S402, inputting the first tensor into the full connection layer to obtain a second tensor of 256x256x1; the second tensor is used for representing a plurality of action strategies; the action strategy with the largest action probability is taken as the first action.
Further, in some embodiments of the present application, the first layer network includes 16 convolution kernels, the second layer network includes 32 convolution kernels, the third layer network includes 64 convolution kernels, and the fourth layer network includes 6 convolution kernels.
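For illustration, the four-layer policy network described above could be sketched in PyTorch as follows. The kernel sizes, padding, activation functions and the exact wiring of the full connection layer are not specified in the text, so 3x3 convolutions and a per-position 1x1 convolution head are assumptions made only for this sketch.

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Sketch of the 16/32/64/6-kernel policy network; details beyond the kernel counts are assumed."""
    def __init__(self, in_channels=6):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1), nn.ReLU(),  # first layer: 16 kernels
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),           # second layer: 32 kernels
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),           # third layer: 64 kernels
            nn.Conv2d(64, 6, kernel_size=3, padding=1), nn.ReLU(),            # fourth layer: 6 kernels -> 256x256x6
        )
        self.head = nn.Conv2d(6, 1, kernel_size=1)  # "full connection" over channels -> 256x256x1

    def forward(self, state):                 # state: (B, 6, 256, 256) multi-channel chip canvas
        feats = self.backbone(state)          # (B, 6, 256, 256) first tensor
        logits = self.head(feats)             # (B, 1, 256, 256) second tensor of position scores
        logits = logits.flatten(1)            # one logit per lower-left corner position
        return torch.softmax(logits, dim=1)   # action strategy: probability over canvas positions
```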
The following describes the specific calculation principle of the present application with reference to the drawings:
First, the reinforcement learning layout process is described at a macroscopic level:
reinforcement learning layout can be considered as a markov decision process, divided into three parts: status, actions, and rewards.
The state carries the current information in the MDP (Markov decision process), and the current state is related only to the previous state. In the layout problem, the state of the system is the multi-layer chip canvas, where each layer is regarded as a two-dimensional spatial canvas similar to a Go board.
The action is the output of the reinforcement learning model. The input of the model is the state and the output is the action. In the layout problem, the specific meaning of the action is the lower-left corner coordinate of the device: the model outputs the action, i.e., the lower-left corner position of the device in the spatial canvas.
The reward plays a critical role in the Markov decision process framework: it guides the RL agent to learn how to generate optimal layout results and to acquire the explicit and implicit rules of the layout process. In the specific layout task, the goal of the reward function is to minimize global wire length, congestion and density. In this algorithm, the reward is not the sparse reward commonly used in today's reinforcement learning layout algorithms but a dense reward. The specific calculation method is given in the latter half of this description.
The overall design of the method is further elucidated below in connection with fig. 2-8.
Referring to fig. 2, first, it is necessary to know the flow of the reinforcement learning-based layout algorithm.
The layout space, i.e., the chip canvas, is empty in the initial state S0 and its features are reset accordingly. In each round, the reinforcement learning agent generates a corresponding action based on the current state Si. The action is then passed to the environment, which returns the appropriate reward. The chip canvas, i.e., the state, is updated at the same time; the updated state is sent to the reinforcement learning agent for the next round. In reinforcement learning-based layout, the reward is a weighted sum of wire length, congestion and density. Reinforcement learning places one device at a time until all devices are placed.
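The round-by-round interaction described above can be pictured as a simple placement loop, one device per step. The sketch below assumes a gym-style interface; names such as PlacementEnv-style `reset`/`step` methods and `num_devices` are hypothetical and only illustrate the state-action-reward cycle.

```python
def run_placement_episode(env, agent):
    """Minimal sketch of one placement episode; env and agent interfaces are assumptions."""
    state = env.reset()                       # S0: empty multi-channel chip canvas
    actions, total_reward = [], 0.0
    for _ in range(env.num_devices):          # one device is placed per round
        action = agent.act(state)             # lower-left corner for the current device
        next_state, reward, done, info = env.step(action)
        # reward: weighted sum of wire length, congestion and density for this step
        total_reward += reward
        actions.append(action)
        state = next_state
        if done:
            break
    return actions, total_reward
```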
As shown in FIG. 3, the multi-channel chip canvas representation for layout consists of 6 layers. Channel 0 represents the occupancy of the current layout, i.e., which positions are occupied by devices. Channel 1 represents the currently placeable positions, the opposite of channel 0, i.e., which positions are not yet occupied by devices. Channel 2 represents the trajectory of the layout process, recording how large each device is and where it was placed. Channel 3 is the length of the device currently to be laid out, and channel 4 is its width. Channel 5 represents the number of the current device, which the reinforcement learning agent uses for learning and for subsequently refining the channel-2 trajectory.
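For illustration, the six channels can be pictured as a 6x256x256 array. The sketch below shows one possible initialization, under the assumption stated in the description that each device's width, length and placement number are known at initialization; the `devices` list of (w, h, c) tuples is a hypothetical representation of that data.

```python
import numpy as np

def init_canvas(devices, size=256):
    """Illustrative initial state S0 for the 6-channel chip canvas; layout is empty."""
    S = np.zeros((6, size, size), dtype=np.int32)
    S[1, :, :] = 1            # channel 1: every position is initially placeable
    w, h, c = devices[0]      # parameters of the first device to be placed
    S[3, :, :] = w            # channel 3: length of the device to be laid out
    S[4, :, :] = h            # channel 4: width of the device to be laid out
    S[5, :, :] = c            # channel 5: placement number of the device
    return S
```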
The specific updating process of the multi-channel chip canvas is as follows:
Channel 0 represents the current layout situation. Assume a layout situation M; the action gives the lower-left corner (x, y) of the device to be placed, whose width is w and height is h. After the state and the action enter the environment, M(0, x:x+w, y:y+h) is set to 1, that is:
M(0, x:x+w, y:y+h) = 1
This is equivalent to placing the device at this position; the corresponding channel of the state S performs the same operation, i.e., S(0, x:x+w, y:y+h) is set to 1, as shown in FIG. 4:
S(0, x:x+w, y:y+h) = 1
Channel 1 represents the current location in the die canvas where the device can be placed. From a computational aspect, it can be expressed by the following formula:
S(1,…)=~S(0,…)
Channel 2 represents the trace of the layout process, recording how large each device is and where it was placed. The change in each step is similar to channel 0, except that the value written to channel 2 is the device placement number rather than 1, i.e.:
S(2, x:x+w, y:y+h) = c
Channel 3 and channel 4 represent the length and width of the current device, i.e., w and h, and channel 5 represents the number c of the device in the placement order, with the specific parameters of each device, including w and h and the number c, known during initialization. The updating mode is as follows:
S(3,…)=w
S(4,…)=h
S(5,…)=c
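Putting the channel update rules above together, a minimal NumPy sketch might look as follows; the canvas size and channel order follow the description, while the handling of the next device's parameters is an assumption made for illustration.

```python
import numpy as np

def update_canvas(S, x, y, w, h, c, next_device=None):
    """S: (6, 256, 256) canvas; (x, y) is the lower-left corner of the device just placed."""
    S[0, x:x + w, y:y + h] = 1      # channel 0: mark the occupied positions
    S[1] = 1 - S[0]                 # channel 1: placeable positions, the complement of channel 0
    S[2, x:x + w, y:y + h] = c      # channel 2: layout trajectory, stores the placement number
    if next_device is not None:     # channels 3-5 describe the next device to be placed
        nw, nh, nc = next_device
        S[3, :, :] = nw             # channel 3: length of the next device
        S[4, :, :] = nh             # channel 4: width of the next device
        S[5, :, :] = nc             # channel 5: placement number of the next device
    return S
```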
The multi-channel chip canvas, i.e., the state in the present reinforcement learning algorithm, is processed by a neural network that extracts this information.
FIG. 5 shows the multi-channel chip canvas value change process. Assume we start from state Si, which corresponds to the current chip canvas. The state Si is sent to the reinforcement learning policy network, which produces an action; the state Si and the action are then sent together to the environment Env, and the environment changes the state Si according to the action to obtain the state Si+1. In this process, channel 0 of the multi-channel chip canvas changes to the occupancy of the layout after the current device is placed. Channel 1 is the opposite of channel 0: positions occupied in the current layout are represented as 0 in channel 1, and unoccupied positions as 1. At the same time, channel 2 records the current device placement order (the number in channel 5), and channels 3, 4 and 5 are updated accordingly, with the updated values determined by the specific parameters of the next device.
Assume we are in state Si, generated by the environment as described above. In the policy network shown in FIG. 6, four convolution layers with 16, 32, 64 and 6 convolution kernels respectively are applied, producing a tensor of 256x256x6, where 256 is the size of the chip canvas and 6 corresponds to the 6 layers of the chip canvas. A full connection layer then produces a 256x256x1 action strategy. It should be noted that the action, i.e., placing the device, corresponds to channel 0 only; the remaining 5 channels serve to help the reinforcement learning model produce a better layout result. During model training, after the action strategy is obtained, an action is selected by random sampling; during the verification stage, the action with the highest probability is selected. We use the Proximal Policy Optimization (PPO) algorithm to update the network parameters.
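For reference, the clipped surrogate objective that PPO optimizes can be sketched as follows; the clipping ratio and the way advantages are estimated are standard PPO choices made for illustration, not values taken from this application.

```python
import torch

def ppo_policy_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped PPO surrogate loss; minimizing it maximizes the clipped objective."""
    ratio = torch.exp(new_log_probs - old_log_probs)          # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```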
Function of the multi-channel chip canvas: the network extracts not only the current layout situation (a single-layer chip canvas, which is what most reinforcement learning layout algorithms use today); five additional channels carry other feature information, such as the currently placeable positions, the past layout history, and the specific parameters of the device to be laid out, so that the network can take this information into account when generating actions, for example avoiding overlaps and producing shorter wire lengths. The reward is calculated as follows:
s.t. Congestion(M_x, M_y, M_w, M_h) ≤ C_th and Overlap(M_x, M_y, M_w, M_h) = 0,
wherein P(i, j) is the node data, x is the coordinate on the x-axis, y is the coordinate on the y-axis, Congestion(M_x, M_y, M_w, M_h) is the congestion parameter, C_th is a preset congestion threshold, and Overlap(M_x, M_y, M_w, M_h) is the overlap parameter. The specific calculation of the reward depends on wire length, congestion and overlap. First, congestion must be below the threshold and overlap must be 0; otherwise the reward is a very small value, such as -999. Once these two conditions are met, the reward is calculated from the nets. Each device has a node, i.e., a pin, denoted P, at which it connects to a net. The length of each net is the sum of the differences between the maximum and minimum values of the abscissa and the ordinate of its pins. In our specific reward calculation, the nets connected to the placed device are extracted, and the reward reflects how placing the device changes the length of those nets (the only possible effect is that the net length either stays the same or changes). We set a reference value and take its difference with the actual net change to obtain the reward at each time step. The goal of the reinforcement learning algorithm is to maximize the reward; better network parameters are obtained through PPO updates, and the specific update process is shown in FIG. 7 below.
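The reward computation described above (constraint check, per-net half-perimeter length, and the difference from a reference value) can be sketched as follows. Pin extraction, the congestion and overlap checks, and the reference value are placeholders, since their exact implementation is not given here.

```python
def net_hpwl(pins):
    """pins: list of (x, y) pin coordinates P on one net; half-perimeter wire length."""
    xs = [p[0] for p in pins]
    ys = [p[1] for p in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def step_reward(nets_before, nets_after, congestion, c_th, overlap, reference=0.0):
    """Dense per-step reward sketch; nets_* are the nets touched by the placed device."""
    if congestion > c_th or overlap != 0:
        return -999.0                          # constraint violated: very small reward
    delta = sum(net_hpwl(n) for n in nets_after) - sum(net_hpwl(n) for n in nets_before)
    return reference - delta                   # reference value minus the actual net-length change
```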
In addition, referring to FIG. 8, corresponding to the method of FIG. 1, a chip layout design system based on reinforcement learning is also provided in the embodiment of the present application. The system may include: an obtaining unit 1001, configured to obtain a first state of a chip canvas; a first processing unit 1002, configured to input the first state into a reinforcement learning layout model to obtain a second state, a first layout action, a first global wiring length and a weighted sum of first congestion and density, compare the first global wiring length with a preset global wiring length, and compare the weighted sum of the first congestion and density with a preset weighted sum of congestion and density; and a second processing unit 1003, configured to, if the first global wiring length is greater than the preset global wiring length or the weighted sum of the first congestion and density is greater than the preset weighted sum, take the currently obtained second state as a new first state and return to the step of inputting the first state into the reinforcement learning layout model to obtain the second state, the first layout action, the first global wiring length and the weighted sum of the first congestion and density, until the first global wiring length is less than the preset global wiring length and the weighted sum of the first congestion and density is less than the preset weighted sum, thereby obtaining the chip layout; the chip layout is obtained by performing layout through a plurality of layout actions.
It should be noted that the content in the embodiments of the above chip layout design method based on reinforcement learning is applicable to the embodiments of the chip layout design system based on reinforcement learning; the functions specifically implemented by the system embodiments are the same as those of the method embodiments, and the beneficial effects achieved are also the same as those achieved by the method embodiments.
Corresponding to the method of fig. 1, the embodiment of the present application further provides a chip layout design device based on reinforcement learning, and the specific structure of the chip layout design device may refer to fig. 9, including:
at least one processor;
at least one memory for storing at least one program;
and when the at least one program is executed by the at least one processor, the at least one processor realizes the reinforcement learning-based chip layout design method.
It should be noted that, the content in the above method embodiment is applicable to the embodiment of the present device, and the specific functions implemented by the embodiment of the present device are the same as those of the embodiment of the above method, and the achieved beneficial effects are the same as those of the embodiment of the above method.
Corresponding to the method of fig. 1, the embodiment of the present application further provides a storage medium having stored therein processor-executable instructions, which when executed by a processor, are for performing the reinforcement learning based chip layout design method.
It should be noted that the content in the above embodiments of the chip layout design method based on reinforcement learning is applicable to the storage medium embodiment; the functions specifically implemented by the storage medium embodiment are the same as those of the method embodiments, and the beneficial effects achieved are also the same as those achieved by the method embodiments.
In some alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flowcharts of this application are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed, and in which sub-operations described as part of a larger operation are performed independently.
Furthermore, while the present application is described in the context of functional modules, it should be appreciated that, unless otherwise indicated, one or more of the functions and/or features may be integrated in a single physical device and/or software module or one or more of the functions and/or features may be implemented in separate physical devices or software modules. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary to an understanding of the present application. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be apparent to those skilled in the art from consideration of their attributes, functions and internal relationships. Thus, those of ordinary skill in the art will be able to implement the present application as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative and are not intended to be limiting upon the scope of the application, which is to be defined by the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, including several programs for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable programs for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with a program execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the programs from the program execution system, apparatus, or device and execute the programs. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the program execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer readable medium may even be paper or other suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable program execution system. For example, if implemented in hardware, as in another embodiment, may be implemented using any one or combination of the following techniques, as is well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, programmable Gate Arrays (PGAs), field Programmable Gate Arrays (FPGAs), and the like.
In the foregoing description of the present specification, descriptions of the terms "one embodiment/example", "another embodiment/example", "certain embodiments/examples", and the like, are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present application have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the principles and spirit of the application, the scope of which is defined by the claims and their equivalents.
While the preferred embodiment of the present invention has been described in detail, the present invention is not limited to the embodiments described above, and various equivalent modifications and substitutions can be made by those skilled in the art without departing from the spirit of the present invention, and these equivalent modifications and substitutions are intended to be included in the scope of the present invention as defined in the appended claims.

Claims (10)

1. The chip layout design method based on reinforcement learning is characterized by comprising the following steps of:
acquiring a first state of a chip canvas and the number of devices required to be laid out by the chip canvas;
inputting the first state into a reinforcement learning layout model to obtain a second state, a first layout action, a first global wiring length and a weighted sum of a first congestion and a density;
taking the currently obtained second state as a new first state, and returning to the step of inputting the first state into the reinforcement learning layout model to obtain the second state, the first layout action, the first global wiring length and the weighted sum of the first congestion and density, until the number of the first layout actions is the same as the number of the devices, thereby obtaining a plurality of layout actions, a plurality of first global wiring lengths and a plurality of weighted sums of first congestion and density;
and determining the chip layout according to the plurality of layout actions, the plurality of first global wiring lengths and the plurality of weighted sums of first congestion and density.
2. The reinforcement learning-based chip layout design method according to claim 1, wherein the reinforcement learning layout model comprises a first channel, a second channel, a third channel, a fourth channel, a fifth channel and a sixth channel; the first channel and the second channel are used for generating a result that the layout is occupied; the third channel is used for generating a layout track; the fourth channel is used for generating the length of the device to be laid out currently, the fifth channel is used for generating the width of the device to be laid out currently, and the sixth channel is used for generating the number of the device to be laid out currently.
3. The method for designing a chip layout based on reinforcement learning according to claim 1, wherein the reinforcement learning layout model includes a policy network and an environment, and the step of inputting the first state into the reinforcement learning layout model to obtain the second state, the first layout action, the first global wiring length and the weighted sum of the first congestion and density specifically includes:
sending the first state into the reinforcement learning policy network to obtain a first action;
and sending the first state and the first action into the environment, the environment changing the first state according to the first action to obtain the second state and generating the first global wiring length and the weighted sum of the first congestion and density.
4. The method for designing a chip layout based on reinforcement learning according to claim 3, wherein the step of sending the first state and the first action into the environment to generate the first global wiring length and the weighted sum of the first congestion and density specifically includes:
acquiring node data of each device where it connects to a net;
and determining the first global wiring length and the weighted sum of the first congestion and density according to the node data.
5. The method for designing a chip layout based on reinforcement learning according to claim 4, wherein the step of determining the first global wiring length and the weighted sum of the first congestion and density according to the node data specifically includes:
inputting the node data into a calculation formula to obtain the first global wiring length and the weighted sum of the first congestion and density;
the calculation formula comprises:
s.t. Congestion(M_x, M_y, M_w, M_h) ≤ C_th and Overlap(M_x, M_y, M_w, M_h) = 0,
wherein P(i, j) is the node data, x is the coordinate on the x-axis, y is the coordinate on the y-axis, Congestion(M_x, M_y, M_w, M_h) is the congestion parameter, C_th is a preset congestion threshold, and Overlap(M_x, M_y, M_w, M_h) is the overlap parameter.
6. The method for designing a chip layout based on reinforcement learning according to claim 3, wherein the policy network comprises a first layer network, a second layer network, a third layer network, a fourth layer network and a full connection layer, and the step of sending the first state to the reinforcement learning policy network to obtain the first action specifically comprises:
sending the first state into the four-layer network to obtain a first tensor of 256x256x6;
inputting the first tensor into the full connection layer to obtain a second tensor of 256x256x1; the second tensor is used for representing a plurality of action strategies; the action strategy with the largest action probability is taken as the first action.
7. The reinforcement learning-based chip layout design method according to claim 6, wherein the first layer network comprises 16 convolution kernels, the second layer network comprises 32 convolution kernels, the third layer network comprises 64 convolution kernels, and the fourth layer network comprises 6 convolution kernels.
8. A reinforcement learning-based chip layout design system, comprising:
the acquisition unit is used for acquiring a first state of the canvas of the chip;
the first processing unit is used for inputting the first state into the reinforcement learning layout model to obtain a second state, a first layout action, a first global wiring length and a weighted sum of first congestion and density, comparing the first global wiring length with a preset global wiring length, and comparing the weighted sum of the first congestion and density with a preset weighted sum of congestion and density;
the second processing unit is used for, if the first global wiring length is greater than the preset global wiring length or the weighted sum of the first congestion and density is greater than the preset weighted sum of congestion and density, taking the currently obtained second state as a new first state and returning to the step of inputting the first state into the reinforcement learning layout model to obtain the second state, the first layout action, the first global wiring length and the weighted sum of the first congestion and density, until the first global wiring length is smaller than the preset global wiring length and the weighted sum of the first congestion and density is smaller than the preset weighted sum of congestion and density, thereby obtaining the chip layout; the chip layout is obtained by performing layout through a plurality of layout actions.
9. The chip layout design device based on reinforcement learning is characterized by comprising:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement a reinforcement learning based chip layout design method as claimed in any one of claims 1-7.
10. A storage medium having stored therein processor-executable instructions which, when executed by a processor, are for performing a reinforcement learning based chip layout design method as claimed in any one of claims 1-7.
CN202311459320.9A 2023-11-03 2023-11-03 Chip layout design method, system, device and medium based on reinforcement learning Pending CN117574834A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311459320.9A CN117574834A (en) 2023-11-03 2023-11-03 Chip layout design method, system, device and medium based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311459320.9A CN117574834A (en) 2023-11-03 2023-11-03 Chip layout design method, system, device and medium based on reinforcement learning

Publications (1)

Publication Number Publication Date
CN117574834A true CN117574834A (en) 2024-02-20

Family

ID=89892665

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311459320.9A Pending CN117574834A (en) 2023-11-03 2023-11-03 Chip layout design method, system, device and medium based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN117574834A (en)

Similar Documents

Publication Publication Date Title
US11100266B2 (en) Generating integrated circuit floorplans using neural networks
CN113902926A (en) General image target detection method and device based on self-attention mechanism
CN115315703A (en) Generating an integrated circuit layout using a neural network
CN111008040B (en) Cache device and cache method, computing device and computing method
CN113158608A (en) Processing method, device and equipment for determining parameters of analog circuit and storage medium
US20190138929A1 (en) System and method for automatic building of learning machines using learning machines
CN114139637B (en) Multi-agent information fusion method and device, electronic equipment and readable storage medium
US20210350230A1 (en) Data dividing method and processor for convolution operation
CN113609802A (en) Routing connections in reinforcement-based integrated circuits
CN111553242A (en) Training method and electronic device for generating countermeasure network for predicting driving behavior
CN117574834A (en) Chip layout design method, system, device and medium based on reinforcement learning
CN112274935A (en) AI model training method, use method, computer device and storage medium
KR101825880B1 (en) Input/output relationship based test case generation method for software component-based robot system and apparatus performing the same
CN113239693B (en) Training method, device, equipment and storage medium of intention recognition model
CN114758191A (en) Image identification method and device, electronic equipment and storage medium
CN110222190A (en) Data enhancement methods, system, equipment and computer readable storage medium
CN115099401B (en) Learning method, device and equipment of continuous learning framework based on world modeling
CN114115804B (en) Multiplier conversion method, system, equipment and medium
CN114581540B (en) Scene task processing method, device, equipment and computer readable storage medium
CN117764054B (en) Natural language understanding method and system based on automatic construction prompt engineering
CN113066486B (en) Data identification method, device, electronic equipment and computer readable storage medium
CN111033532A (en) Training method and system for generating countermeasure network, electronic device, and storage medium
CN115577102A (en) Method and system for classifying scenic spot comment emotions
CN118133767A (en) Circuit board automatic wiring method, device, equipment and computer storage medium
CN117313630A (en) Circuit board layout optimization method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination