CN117540693A

CN117540693A - Three-dimensional integrated circuit global layout system and method based on vision reinforcement learning

Info

Publication number: CN117540693A
Application number: CN202311242388.1A
Authority: CN
Inventors: 仝明磊; 徐樊丰
Original assignee: Shanghai Electric Power University
Current assignee: Shanghai Electric Power University
Priority date: 2023-09-25
Filing date: 2023-09-25
Publication date: 2024-02-09

Abstract

The invention provides a three-dimensional integrated circuit (3D-IC) global layout system based on visual reinforcement learning, comprising a preprocessing module, an environment module and a global layout network module connected in sequence. The preprocessing module is configured to parse the original netlist file of the three-dimensional integrated circuit to be laid out and extract its macro block information and chip information. The environment module is configured to interact with the global layout network module, according to the macro block information and chip information, to obtain a state space and a corresponding reward function. The global layout network module is configured to automatically assign macro block positions on the planes of the three-dimensional integrated circuit to be laid out according to the state space and the corresponding reward function. The invention also provides a corresponding three-dimensional integrated circuit global layout method based on visual reinforcement learning. The invention can automatically assign macro block positions while reducing the amount of computation, providing a fast and reasonable solution for the 3D-IC global layout task.

Description

Three-dimensional integrated circuit global layout system and method based on vision reinforcement learning

Technical Field

The invention relates to the technical field of integrated circuits, in particular to a three-dimensional integrated circuit global layout system and method based on vision reinforcement learning.

Background

In three-dimensional integrated circuits (3D-ICs), global layout is a critical problem: it determines how efficiently the different chip layers are arranged and connected, and it lets a designer place circuit elements in both the horizontal and vertical directions so that more elements fit within a limited chip area. Although automatic global layout of chips has progressed greatly over the past few decades, fully automated design planning remains very difficult. Currently, even the most advanced EDA tools require manual intervention and optimization by physical design engineers to produce a manufacturable global layout. Manufacturability of a global layout typically covers chip size, device density, wire length and similar aspects. As integration increases, a large number of intellectual property cores (IPs) are used in a chip to achieve modularization. Thus, the number of macro blocks (e.g., SRAMs) that the global layout must consider is gradually increasing, as are their size and complexity. These problems are particularly pronounced for 3D-IC designs.

Global layout is a complex combinatorial optimization problem whose goal is zero macro block overlap and reduced line length. There are two families of methods for this problem. The first is optimization-based. For example, Lin Y, Dhar S, Li W, et al. DREAMPlace: Deep learning toolkit-enabled GPU acceleration for modern VLSI placement [C]// Proceedings of the 56th Annual Design Automation Conference 2019. 2019: 1-6 builds elasto-mechanical models between chip elements that move them through different global layout spaces, ultimately forming an optimal global layout. Cheng C K, Kahng A B, Kang I, et al. RePlAce: Advancing solution quality and routability validation in global placement [J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2018, 38(9): 1717-1730 introduces an electrostatics-based global smooth density cost function and a nonlinear optimizer based on Nesterov's method. The second is learning-based. For example, Mirhoseini A, Goldie A, Yazgan M, et al. A graph placement methodology for fast chip design [J]. Nature, 2021, 594(7862): 207-212 places 15% of the macro blocks using reinforcement learning and the remainder using an optimization method. Cheng R, Yan J. On joint learning for solving placement and routing in chip design [J]. Advances in Neural Information Processing Systems, 2021, 34: 16508-16519 employs pure reinforcement learning but does not consider the actual size of the macro blocks, resulting in extremely high overlap rates.

In summary, the above methods currently have the following problems: (1) they are applicable only to 2D-ICs; (2) inference is slow, requires considerable computing resources, and cannot reach the optimal solution; (3) they ignore the actual size of the macro blocks, so the overlap rate is not 0%; post-correction may fail to find a solution, and any remaining overlap makes the layout infeasible in practice; (4) they ignore pin offsets: a macro block may have hundreds or thousands of pins, and macro blocks in different networks are connected through different pins, so coarse placement may increase line length.

Disclosure of Invention

The present invention is made to solve the above-mentioned problems, and an object of the present invention is to provide a system and a method for global layout of a three-dimensional integrated circuit based on visual reinforcement learning.

The invention provides a three-dimensional integrated circuit global layout system based on visual reinforcement learning, characterized in that the system comprises a preprocessing module, an environment module and a global layout network module connected in sequence; wherein the preprocessing module is configured to: parse the original netlist file of the three-dimensional integrated circuit to be laid out, and extract macro block information and chip information of the three-dimensional integrated circuit to be laid out; the environment module is configured to: interact with the global layout network module, according to the macro block information and the chip information of the three-dimensional integrated circuit to be laid out, to obtain a state space and a corresponding reward function; the global layout network module is configured to: automatically assign macro block positions on the planes of the three-dimensional integrated circuit to be laid out according to the state space and the corresponding reward function.

The three-dimensional integrated circuit global layout system based on visual reinforcement learning provided by the invention can also have the following characteristics: the macro block information comprises macro block positions, macro block sizes, connection relations among macro blocks and independent macro blocks of the three-dimensional integrated circuit to be laid out, and the chip information comprises pin network connection relations, pin offset and network numbers.

The three-dimensional integrated circuit global layout system based on visual reinforcement learning provided by the invention can also have the following characteristics: wherein the environment module is configured to: based on computer vision and a graph neural network, obtain current state information of the upper and lower planes of the three-dimensional integrated circuit to be laid out using a multimodal method, and represent the current state information as corresponding pixel maps.

The three-dimensional integrated circuit global layout system based on visual reinforcement learning provided by the invention can also have the following characteristics: the pixel maps comprise a macro block position map, a position mask map, a line length heat map and a pin heat map.

The three-dimensional integrated circuit global layout system based on visual reinforcement learning provided by the invention can also have the following characteristics: the global layout network module is a reinforcement learning network designed using the A2C algorithm and comprises a policy network and a value network connected to each other, wherein the policy network comprises a re-extraction neural network and a branch convolution network; the policy network is configured to: concatenate the macro block position map, the position mask map, the line length heat map and the pin heat map, and input the result into the branch convolution network to obtain a first output feature and a second output feature; at the same time, concatenate the position mask map, the line length heat map and the pin heat map, and input the result into the re-extraction neural network to obtain a third output feature; and concatenate the third output feature and the second output feature, pass the result through a convolution layer, and multiply it by the position mask map, the line length heat map and the pin heat map to obtain an action; the value network is configured to: input the netlist graph corresponding to the original netlist file and the macro block coordinates into a graph convolutional neural network, concatenate the output of the graph convolutional neural network with the first output feature, and output a value through a fully connected layer.

The invention also provides a three-dimensional integrated circuit global layout method based on visual reinforcement learning, characterized by comprising the following steps:

Step S1, providing the above three-dimensional integrated circuit global layout system based on visual reinforcement learning;

Step S2, inputting the original netlist file of the three-dimensional integrated circuit to be laid out into the preprocessing module for parsing, and extracting macro block information and chip information of the three-dimensional integrated circuit to be laid out;

Step S3, the environment module receives the macro block information and chip information of the three-dimensional integrated circuit to be laid out, obtains the current state space and the corresponding reward function, and performs one round of iteration on the global layout network module according to the current state space and the corresponding reward function;

Step S4, repeating step S3 with the aim of minimizing half-perimeter line length and achieving zero overlap, and updating the parameters of the policy network and the parameters of the value network in the global layout network module until an optimal strategy is obtained, thereby completing the global layout of the three-dimensional integrated circuit to be laid out.

The three-dimensional integrated circuit global layout method based on vision reinforcement learning provided by the invention can also have the following characteristics: wherein, the step S3 includes:

Step S31, let k=0; according to the macro block information and chip information of the three-dimensional integrated circuit to be laid out, the environment module obtains the current state $s_k$ and the current intrinsic reward function $r_{i,k}$;

Step S32, the agent formed by the policy network and the value network reads the current state $s_k$ and generates the current action $a_k$;

Step S33, the current action $a_k$ is returned to the environment module, which generates the next state $s_{k+1}$ and the next intrinsic reward function $r_{i,k+1}$;

Step S34, judging whether k+1 is equal to N-1; if so, proceeding to step S35; otherwise, letting k=k+1 and returning to step S32;

Step S35, placing the Nth macro block, the environment module generating the extrinsic reward function $r_e$; wherein N is the number of macro blocks of the three-dimensional integrated circuit to be laid out.

The three-dimensional integrated circuit global layout method based on visual reinforcement learning provided by the invention can also have the following characteristics: the state space includes a macro block position map, a position mask map, a line length heat map and a pin heat map, and can be understood as the macro block placement state, namely the positions, interconnection relations and pin connection relations of the t macro blocks already placed, the placeable positions of the next macro block, and the connection relations and pin connection relations between the next macro block and the placed macro blocks; wherein 1 < t < N.

The three-dimensional integrated circuit global layout method based on visual reinforcement learning provided by the invention can also have the following characteristics: wherein the reward functions include an intrinsic reward function and an extrinsic reward function; the parameters of the policy network and the parameters of the value network are updated using the intrinsic reward function, the extrinsic reward function and the value output by the value network.

Functions and Effects of the Invention

The three-dimensional integrated circuit global layout system and method of the invention provide a complete reinforcement learning network that, combined with computer vision, extracts global and local information in a reasonable way and sets up a suitable environment for the agent to interact with. Therefore, the system and method can automatically assign macro block positions while reducing the amount of computation, thereby completing the global layout and providing a fast and reasonable solution for the 3D-IC global layout task.

Drawings

FIG. 1 is a block diagram of a three-dimensional integrated circuit global layout system based on visual reinforcement learning in accordance with a first embodiment of the present invention;

FIG. 2 is a block diagram of a global placement network module in accordance with a first embodiment of the present invention;

FIG. 3 is a flow chart of a method for global layout of a three-dimensional integrated circuit based on visual reinforcement learning in a second embodiment of the invention;

FIG. 4 is the macro block position map of the state space in a second embodiment of the invention;

FIG. 5 is the position mask map of the state space in a second embodiment of the invention;

FIG. 6 is the line length heat map of the state space in a second embodiment of the invention;

FIG. 7 is the pin heat map of the state space in a second embodiment of the invention; and

FIG. 8 is a global layout effect diagram in a second embodiment of the invention.

Detailed Description

To make the technical means, inventive features, objectives and effects of the present invention easy to understand, the following embodiments describe the three-dimensional integrated circuit global layout system and method based on visual reinforcement learning in detail with reference to the accompanying drawings.

Example 1

The three-dimensional integrated circuit global layout system based on visual reinforcement learning provided by the embodiment of the invention, as shown in fig. 1, comprises a preprocessing module 1, an environment module 2 and a global layout network module 3 which are sequentially connected.

The preprocessing module 1 is configured to: parse the original netlist file of the three-dimensional integrated circuit to be laid out, and extract macro block information and chip information of the three-dimensional integrated circuit to be laid out. The macro block information comprises macro block positions, macro block sizes, connection relations among macro blocks and independent macro blocks of the three-dimensional integrated circuit to be laid out; the chip information comprises pin network connection relations, pin offsets and network numbers. A corresponding JSON file can be generated from this information to provide input data in a specified format.
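As an illustration of what such an input file might look like, the sketch below serializes macro block and chip information with Python's standard `json` module. The patent does not specify the file schema, so the function `build_layout_input` and every field name (`macros`, `nets`, `pins`, `offset`, and so on) are assumptions made for this example only.

```python
import json

# Hypothetical schema: all field names are illustrative, not taken from the patent.
def build_layout_input(macros, nets):
    """Bundle macro block sizes and pin network information into one dict."""
    return {
        "macros": [
            {"name": name, "width": w, "height": h}
            for name, (w, h) in macros.items()
        ],
        "nets": [
            {
                "net_id": net_id,
                # each pin: owning macro block plus pin offset from its lower left corner
                "pins": [{"macro": m, "offset": [dx, dy]} for m, dx, dy in pins],
            }
            for net_id, pins in enumerate(nets)
        ],
    }

macros = {"M1": (3, 2), "M2": (4, 4), "M3": (2, 4)}
nets = [[("M1", 1, 1), ("M3", 0, 2)], [("M2", 2, 0), ("M3", 1, 3)]]
layout_input = build_layout_input(macros, nets)
text = json.dumps(layout_input, indent=2)  # what would be written to the JSON file
```

Keeping pin offsets relative to the macro block's lower left corner matches the reference point used later for macro block positions, which is why this sketch stores them that way.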

The environment module 2 is configured to: interact with the global layout network module 3, according to the macro block information and chip information of the three-dimensional integrated circuit to be laid out, to obtain a state space and a corresponding reward function.

Specifically, based on computer vision and a graph neural network, this embodiment uses a multimodal method to obtain the current state information of the upper and lower planes of the three-dimensional integrated circuit to be laid out and represents it as corresponding pixel maps, so that the macro block information and the chip information are fully utilized.

In this embodiment, the entire layout process is performed on two 90×90 planes, a grid size that keeps the state space from becoming too large without being overly fine-grained. The environment module 2 therefore represents the current state information as seven 90×90 pixel maps: two macro block position maps, two position mask maps, two line length heat maps and one pin heat map. This better captures the diversity and complexity of the states, so that the agent in the global layout network module 3 can accurately learn the optimal strategy.
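A minimal sketch of how the seven maps could be held together as one state is given below. The channel names and the dictionary layout are assumptions for illustration; the patent only fixes the number of maps and their 90×90 size.

```python
GRID = 90  # plane resolution used in this embodiment

def empty_map():
    """One 90x90 pixel map filled with zeros."""
    return [[0.0] * GRID for _ in range(GRID)]

def initial_state():
    """Seven channels: per-layer maps for the two planes plus one shared pin map.

    The channel names are hypothetical labels chosen for this sketch.
    """
    names = [
        "position_upper", "position_lower",
        "mask_upper", "mask_lower",
        "line_length_upper", "line_length_lower",
        "pin",
    ]
    return {name: empty_map() for name in names}

state = initial_state()
```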

The reward functions include an extrinsic reward function and an intrinsic reward function, both set with the aim of minimizing half-perimeter line length (Half-Perimeter Wirelength, HPWL) and achieving zero overlap. The expression of the extrinsic reward function is $r_e = -\mathrm{HPWL}(N) - \lambda\,\mathrm{OR}(s_{real})$, and the expression of the intrinsic reward function is $r_i = -\mathrm{HPWL}(n_i)$.

The overlap rate OR is calculated by a formula that is not reproduced in this text; in it, N represents the number of macro blocks, $s_i$ represents the actual size of each macro block, and $s_{real}$ represents the total area of the plane occupied by the macro blocks after placement is complete.
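Because the overlap formula itself is missing from this text, the sketch below uses one definition that is consistent with the quantities named above, namely OR = 1 − s_real / Σ s_i. This definition is an assumption, not the patent's stated formula, and the function name `overlap_ratio` and the single-plane rectangle encoding are likewise illustrative.

```python
def overlap_ratio(rects):
    """rects: list of (x, y, w, h), with (x, y) the lower left corner in grid cells.

    Assumed definition: OR = 1 - s_real / sum(s_i), where s_real is the area
    actually covered on the plane and s_i is the area of macro block i.
    """
    covered = set()   # grid cells occupied by at least one macro block
    total_area = 0    # sum of the individual macro block areas
    for x, y, w, h in rects:
        total_area += w * h
        for cx in range(x, x + w):
            for cy in range(y, y + h):
                covered.add((cx, cy))
    return 1.0 - len(covered) / total_area
```

Under this assumed definition the ratio is 0 exactly when no two macro blocks share a cell, which matches the zero-overlap goal stated in the text.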

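The half-perimeter line length of a pin network is calculated as:

$$\mathrm{HPWL} = \left(\max_i x_i - \min_i x_i\right) + \left(\max_i y_i - \min_i y_i\right)$$

where $x_i$ and $y_i$ represent the coordinates of the pins within the pin network.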
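The standard definition of half-perimeter wirelength, the half perimeter of the bounding box of a net's pins, can be sketched as follows together with the extrinsic reward. Summing over all nets, the default weight `lam`, and the function names are assumptions of this sketch; the overlap value is passed in as a plain number.

```python
def hpwl(pins):
    """Half-perimeter wirelength of one net; pins is a list of (x, y) coordinates."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def total_hpwl(nets):
    """Sum of the per-net HPWL over all nets (assumed aggregation)."""
    return sum(hpwl(pins) for pins in nets)

def extrinsic_reward(nets, overlap, lam=1.0):
    """r_e = -HPWL - lambda * OR; lam is a hypothetical weight."""
    return -total_hpwl(nets) - lam * overlap

nets = [[(0, 0), (3, 4), (1, 1)], [(1, 1), (2, 1)]]
reward = extrinsic_reward(nets, overlap=0.0)
```

Both terms enter with a negative sign, so a shorter line length and a smaller overlap each raise the reward.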

The global layout network module 3 is configured to: automatically assign macro block positions on the planes of the three-dimensional integrated circuit to be laid out according to the state space provided by the environment module 2 and the corresponding reward function.

In this embodiment, the global layout network module 3 is a reinforcement learning network designed using the A2C algorithm. As shown in FIG. 2, the global layout network module 3 includes a policy network 31 and a value network 32 connected to each other; the policy network 31 outputs actions, and the value network 32 outputs the values corresponding to those actions. The policy network 31 includes a re-extraction neural network 311 and a branch convolution network 312. The re-extraction neural network 311 integrates shallow information into deep layers, and the branch convolution network 312 lets the policy network 31 and the value network 32 share some parameters.

Specifically, in the policy network 31, the macro block position map, the position mask map, the line length heat map and the pin heat map are concatenated and input to the branch convolution network 312 to obtain the first output feature $F_1$ and the second output feature $F_2$, where the first output feature $F_1$ serves as input to the value network 32. At the same time, the position mask map, the line length heat map and the pin heat map are concatenated and input into the re-extraction neural network 311 to obtain the third output feature $F_3$. This makes it easier for the network to learn fine-grained features and local information and to integrate them into higher-level feature representations. The third output feature $F_3$ and the second output feature $F_2$ are concatenated, passed through a 1×1 convolution layer, and then multiplied by the position mask map, the line length heat map and the pin heat map to obtain the action. The action in this embodiment is a macro block position: each pixel on the 90×90 pixel map represents a candidate position, and obtaining the action means selecting one pixel, namely the pixel on which the lower left corner of the macro block is placed. The action is thus essentially obtained by weighted fusion.
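The final multiplication step can be illustrated with a small sketch. Here `scores` stands in for the per-pixel output of the 1×1 convolution, and choosing the highest weighted pixel greedily is an assumption of this example; the text does not say whether the action is taken greedily or sampled from a distribution.

```python
def select_action(scores, mask, line_heat, pin_heat):
    """Return (row, col) of the best allowed pixel, or None if every pixel is masked.

    scores: per-pixel network output (placeholder for the 1x1 convolution output)
    mask: 1 where the macro block's lower left corner may be placed, else 0
    line_heat, pin_heat: per-pixel weights from the two kinds of heat map
    """
    best, best_pos = None, None
    for r, row in enumerate(scores):
        for c, s in enumerate(row):
            if mask[r][c] == 0:
                continue  # masked-out pixels can never be chosen
            weighted = s * line_heat[r][c] * pin_heat[r][c]
            if best is None or weighted > best:
                best, best_pos = weighted, (r, c)
    return best_pos

scores = [[0.9, 0.5], [0.4, 0.8]]
mask = [[0, 1], [1, 1]]
line_heat = [[1.0, 0.5], [1.0, 1.0]]
pin_heat = [[1.0, 1.0], [0.5, 1.0]]
action = select_action(scores, mask, line_heat, pin_heat)
```

In this toy input the pixel with the highest raw score is masked out, so the choice falls on the best remaining pixel after weighting.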

In the value network 32, the netlist graph corresponding to the original netlist file and the macro block coordinates are input into a graph convolutional neural network; the output of the graph convolutional neural network is concatenated with the first output feature $F_1$ output by the branch convolution network 312, and the value is finally output through a fully connected layer. The graph convolutional neural network is used to explore the physical meaning of the netlist, representing the type and connectivity information of the nodes as low-dimensional vectors.

Using the reward function and the value, the agent in the global layout network module 3 can learn the optimal strategy, yielding the optimized netlist information of the three-dimensional integrated circuit to be laid out, that is, the global layout strategy.

Portions of this embodiment that are not described in detail are known in the art.

Example 2

For convenience of expression, the same reference numerals are given to the same structures as those of the first embodiment, and the same description is omitted.

The method for global layout of a three-dimensional integrated circuit based on visual reinforcement learning according to an embodiment of the present invention, as shown in fig. 3, includes the following steps:

Step S1, providing the three-dimensional integrated circuit global layout system based on visual reinforcement learning according to the first embodiment.

Step S2, inputting the original netlist file of the three-dimensional integrated circuit to be laid out into the preprocessing module 1 for parsing, and extracting macro block information and chip information of the three-dimensional integrated circuit to be laid out. The macro block information comprises macro block positions, macro block sizes, connection relations among macro blocks and independent macro blocks of the three-dimensional integrated circuit to be laid out; the chip information comprises pin network connection relations, pin offsets and network numbers. A corresponding JSON file can be generated from this information to provide input data in a specified format.

In step S3, the environment module 2 receives the macro block information and the chip information of the three-dimensional integrated circuit to be laid out, obtains the current state space and the corresponding reward function, and performs a round of iteration on the global layout network module 3 according to the current state space and the corresponding reward function.

Specifically, step S3 includes:

Step S31, let k=0; according to the macro block information and chip information of the three-dimensional integrated circuit to be laid out, the environment module 2 obtains the current state $s_k$ and the current intrinsic reward function $r_{i,k}$.

The state space includes a macro block position map, a position mask map, a line length heat map and a pin heat map. It can also be understood as the macro block placement state, namely the positions of the t macro blocks already placed (1 < t < N, where N is the number of macro blocks of the three-dimensional integrated circuit to be laid out), the interconnection relations and pin connection relations of those t macro blocks, the placeable positions of the next macro block, and the connection relations and pin connection relations between the next macro block and the placed macro blocks.

Step S32, the agent formed by the policy network 31 and the value network 32 in the global layout network module 3 reads the current state $s_k$ and generates the current action $a_k$.

Step S33, the current action $a_k$ is returned to the environment module 2, which generates the next state $s_{k+1}$ and the next intrinsic reward function $r_{i,k+1}$.

Step S34, judging whether k+1 is equal to N-1; if so, proceeding to step S35; otherwise, letting k=k+1 and returning to step S32.

Step S35, the Nth macro block is placed and the environment module 2 generates the extrinsic reward function $r_e$.

For example, if the three-dimensional integrated circuit to be laid out has 500 macro blocks, the environment module 2 generates an intrinsic reward function as each of the first 499 macro blocks is placed and generates the extrinsic reward function when the 500th macro block is placed, at which point the global layout network has completed one round of iteration.
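The round described in steps S31 to S35 can be sketched as a simple loop. The callback names (`choose_action`, `place`, `intrinsic_reward`, `extrinsic_reward`) and the representation of the state as a list of chosen positions are simplifications assumed for this example; in the described system the state is the set of pixel maps and the agent is the policy and value networks.

```python
def run_episode(num_macros, choose_action, place, intrinsic_reward, extrinsic_reward):
    """One round of placing all macro blocks (simplified view of steps S31 to S35)."""
    placed = []    # stand-in for the placement state s_k
    rewards = []
    for k in range(num_macros):
        action = choose_action(placed, k)   # S32: agent reads s_k and emits a_k
        placed = place(placed, action)      # S33: environment produces s_{k+1}
        if k < num_macros - 1:
            rewards.append(intrinsic_reward(placed))  # dense reward per placement
        else:
            rewards.append(extrinsic_reward(placed))  # S35: reward for the full layout
    return placed, rewards

# Toy callbacks, purely illustrative.
placed, rewards = run_episode(
    4,
    choose_action=lambda state, k: (k, 0),
    place=lambda state, action: state + [action],
    intrinsic_reward=lambda state: -len(state),
    extrinsic_reward=lambda state: -100,
)
```

With N macro blocks the loop yields N − 1 intrinsic rewards followed by a single extrinsic reward, mirroring the 499 plus 1 split in the example above.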

Step S4, repeating step S3 with the aim of minimizing half-perimeter line length and achieving zero overlap, and updating the parameters of the policy network 31 and the parameters of the value network 32 in the global layout network module 3 until the optimal strategy is obtained, thereby completing the global layout of the three-dimensional integrated circuit to be laid out.

The policy gradient update formula of the policy network 31 is as follows:

$$\nabla_\theta J(\theta) = \mathbb{E}_{\pi}\left[\nabla_\theta \log \pi(a_k \mid s_k)\, A^{\pi}(s_k, a_k)\right]$$

$$A^{\pi}(s_k, a_k) = Q^{\pi}(s_k, a_k) - B(s_k)$$

where $J(\theta)$ represents the performance of the target policy, $\nabla_\theta J(\theta)$ represents the policy gradient, $\pi(a_k \mid s_k)$ represents the probability of selecting action $a_k$ in macro block placement state $s_k$, and $A^{\pi}(s_k, a_k)$ represents the advantage function relative to the baseline function $B(s_k)$.

For the value function update of the value network 32, the reward at each step is computed first; the TD error is then used to measure the difference between the current state value and the state value at the next time step, and the parameters of the value network 32 are updated accordingly. The formula is as follows:

$$\delta = r + \gamma V(s') - V(s)$$

where $r$ is the reward at the current time step, $\gamma$ is the discount factor, $V(s')$ is the state value at the next time step, and $V(s)$ is the state value at the current time step. That is, this embodiment uses the square of the TD error $\delta$ of each state $s$ to measure the error of the current value function $V(s)$ and updates the parameters of the value network with this error.
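The TD error and the squared-error objective can be written out numerically as below. The default discount factor, the `done` flag that drops the bootstrap term at the end of a round, and the averaging over states are assumptions of this sketch rather than details given in the text.

```python
def td_error(reward, value_s, value_next, gamma=0.99, done=False):
    """delta = r + gamma * V(s') - V(s).

    Assumption: V(s') is treated as 0 once the last macro block has been placed.
    """
    bootstrap = 0.0 if done else gamma * value_next
    return reward + bootstrap - value_s

def value_loss(deltas):
    """Mean squared TD error, the quantity reduced when updating the value network."""
    return sum(d * d for d in deltas) / len(deltas)

deltas = [
    td_error(-1.0, 2.0, 3.0, gamma=0.5),   # ordinary step
    td_error(-4.0, 1.0, 9.0, done=True),   # final step of a round
]
loss = value_loss(deltas)
```

In actor-critic methods the same TD error is commonly reused as an estimate of the advantage in the policy gradient, so one quantity can drive both updates.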

In this embodiment, the agent can learn the optimal strategy in about 50 rounds of iteration. That is, after the 500 macro blocks have been placed repeatedly for about 50 rounds, network training yields the optimal strategy with the shortest half-perimeter line length and zero overlap, and the final optimized netlist file is obtained.

The state space under the optimal strategy is shown in fig. 4 to 7, and the final global layout effect is shown in fig. 8.

Macro block position map: represents the placement of the macro blocks on the current two planes, with the lower left corner coordinate of each macro block taken as the reference point of its position.

Position mask map: to match the actual situation, macro block sizes are introduced so that overlap is avoided when macro blocks are placed, and a position mask is generated for each of the upper and lower layers. When the t-th macro block M3 is placed, the first t-1 macro blocks are all expanded toward the lower left according to the size (2, 4) of M3. For example, with macro block M1 of size (u, v) and macro block M2 of size (m, n), the lower left corner of the expanded macro block M1 is translated from $(x_i, y_i)$ to $(x_i - 1, y_i - 3)$ and its size expands to (u+1, v+3). The same expansion operation is performed before each macro block is placed. In FIG. 5, the white area is the set of available positions for macro block M3. Compared with the conventional approach of checking every pixel for overlap, this greatly reduces computational complexity.
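The expansion rule can be sketched on a single plane as follows: a placed block of size (w, h) at (x0, y0) forbids every lower left corner from (x0 − (nw − 1), y0 − (nh − 1)) up to its own upper right extent, where (nw, nh) is the size of the block about to be placed. The function name, the (x, y, w, h) encoding and the extra check that the new block stays inside the plane are assumptions of this example.

```python
def position_mask(placed, new_size, grid=90):
    """placed: list of (x, y, w, h); new_size: (w, h) of the macro block to place.

    Returns mask[y][x] == 1 where the new block's lower left corner may go.
    """
    nw, nh = new_size
    mask = [[1] * grid for _ in range(grid)]
    # Assumption: the new block must also lie completely inside the plane.
    for y in range(grid):
        for x in range(grid):
            if x + nw > grid or y + nh > grid:
                mask[y][x] = 0
    for x0, y0, w, h in placed:
        # Expand each placed block toward the lower left by (nw - 1, nh - 1).
        for y in range(max(0, y0 - (nh - 1)), min(grid, y0 + h)):
            for x in range(max(0, x0 - (nw - 1)), min(grid, x0 + w)):
                mask[y][x] = 0
    return mask

# A 3x2 block at (5, 5); the next block has size (2, 4) as in the M3 example.
mask = position_mask([(5, 5, 3, 2)], (2, 4), grid=12)
```

For the size (2, 4) this blocks the rectangle starting at (x0 − 1, y0 − 3) with size (w + 1, h + 3), the same translation and growth given for M1 in the text.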

Line length heat map: when the t-th macro block is placed, all macro blocks connected to it are first selected from the upper and lower layers; the center points of the n' connected macro blocks on the upper layer and of the m' connected macro blocks on the lower layer are then taken; finally, two line length heat maps are generated with a Gaussian kernel to guide the t-th macro block toward the placement position that minimizes line length. As shown in FIG. 6, M1, M2 and M3 are interconnected macro blocks. M1 and M2 are located on the upper layer, and a Gaussian kernel heat map is generated at their center points; M3 is located on the lower layer, and a Gaussian kernel heat map is generated at its own center point. A lighter color indicates a smaller increase in line length after the next macro block is added.
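A Gaussian kernel heat map around a set of center points can be sketched as below. The kernel width `sigma`, the choice to add the kernels of several centers together, and the function name are assumptions of this example; the text states only that a Gaussian kernel is applied at the center points.

```python
import math

def gaussian_heatmap(centers, grid=90, sigma=10.0):
    """centers: list of (cx, cy) center points of connected macro blocks on one layer.

    Returns heat[y][x]; values are highest at the centers and decay with distance.
    """
    heat = [[0.0] * grid for _ in range(grid)]
    for cx, cy in centers:
        for y in range(grid):
            for x in range(grid):
                d2 = (x - cx) ** 2 + (y - cy) ** 2
                heat[y][x] += math.exp(-d2 / (2.0 * sigma ** 2))
    return heat

heat = gaussian_heatmap([(2, 3)], grid=8, sigma=2.0)
```

Positions close to the connected macro blocks receive larger values, which is how such a map can steer the next macro block toward short connections.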

Pin heat map: the macro blocks used to generate the line length heat map are not necessarily connected to one another. Accurate information about the networks containing the pins of the t-th macro block is therefore also needed to refine the connection relations further and keep the pins of each network as close together as possible, thereby reducing line length. When the t-th macro block is placed, each network it belongs to must be determined; the network contains the information of the pins connected to the macro block. The macro blocks of the upper and lower layers are first mapped to the same layer, the shape of the network is then calculated from the pin offsets of the macro blocks already placed in that layer, and the Manhattan distance from each point to the network is then calculated to generate the pin heat map. As shown in FIG. 7, if the pin of the t-th macro block in this network is placed within the rectangular frame area, the increase in line length is 0. If the t-th macro block has p pins, the heat maps of the p networks are calculated and superimposed to generate the pin heat map for this macro block.
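The distance computation can be sketched by treating the shape of each network as the bounding box of its already placed pins. Adding a pin at Manhattan distance d from that box increases the net's half-perimeter line length by exactly d, and by 0 inside the box. The function names are hypothetical, and this sketch ignores the offset of the new pin relative to its macro block's lower left corner.

```python
def net_bbox(pins):
    """Bounding box (x0, y0, x1, y1) of the placed pins of one network."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return min(xs), min(ys), max(xs), max(ys)

def distance_to_bbox(x, y, bbox):
    """Manhattan distance from (x, y) to the box; 0 for points inside it."""
    x0, y0, x1, y1 = bbox
    dx = max(x0 - x, 0, x - x1)
    dy = max(y0 - y, 0, y - y1)
    return dx + dy

def pin_distance_map(nets, grid=90):
    """Superimpose the distance maps of all networks the new macro block belongs to."""
    boxes = [net_bbox(pins) for pins in nets]
    return [
        [sum(distance_to_bbox(x, y, b) for b in boxes) for x in range(grid)]
        for y in range(grid)
    ]

one_net = pin_distance_map([[(2, 2), (4, 5)]], grid=8)
two_nets = pin_distance_map([[(2, 2), (4, 5)], [(0, 0), (1, 1)]], grid=8)
```

Lower values mark better positions here; converting the distances into the colors of a heat map is left out of the sketch.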

Effects and effects of the examples

1. The embodiment of the invention designs a reinforcement learning network based on an A2C algorithm, and adopts a pure reinforcement learning method to realize the 3D-IC global layout end to end (input original netlist-obtain optimized netlist);

2. the embodiment of the invention designs a complete reinforcement learning network, reasonably extracts whole and local information, sets a proper environment to facilitate the interaction of an intelligent agent, and sets reasonable dense intrinsic rewards and extrinsic rewards;

3. the embodiment of the invention provides a method for realizing global layout of a 3D-IC by combining reinforcement learning with computer vision, four characteristic information (a macro block position diagram, a position mask diagram, a line length thermodynamic diagram and a pin thermodynamic diagram) are extracted for a macro block placement state, and macro block positions are automatically distributed on two layers of planes with the aim of minimizing HPWL and zero overlapping under the condition of considering the size of a real macro block;

4. the overlap rate of the global layout of the embodiment of the invention can reach 0%, the line length is reduced by 50%-80% compared with the current state-of-the-art method, and the network converges about 400% faster than the reinforcement learning method DeepPR;

5. the embodiment of the invention can be used for 3D-IC and 2D-IC.
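The two layout objectives named in item 3, minimum HPWL (half-perimeter wire length) and zero overlap, can be illustrated with a short Python sketch. The data layout is an assumption made for illustration only: each network is a list of (x, y) pin positions already mapped onto one plane, and each macro block is a tuple (x, y, w, h, layer) giving its lower-left corner, size and plane. The function names `hpwl` and `overlap_area` are hypothetical and are not taken from the source.

```python
def hpwl(nets):
    """Half-perimeter wire length: sum of bounding-box half perimeters of all networks."""
    total = 0
    for pins in nets:
        xs = [x for x, _ in pins]
        ys = [y for _, y in pins]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def overlap_area(a, b):
    """Overlapping area of two macro blocks given as (x, y, w, h, layer).

    Blocks on different planes never overlap (assumption: overlap is only
    checked within one plane of the 3D-IC).
    """
    ax, ay, aw, ah, a_layer = a
    bx, by, bw, bh, b_layer = b
    if a_layer != b_layer:
        return 0
    dx = min(ax + aw, bx + bw) - max(ax, bx)
    dy = min(ay + ah, by + bh) - max(ay, by)
    return max(dx, 0) * max(dy, 0)
```

A layout with zero overlap in this sense is one where `overlap_area` returns 0 for every pair of macro blocks; `hpwl` then serves as the quantity to be minimized.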

The above embodiments are preferred examples of the present invention, and are not intended to limit the scope of the present invention.

Claims

1. The three-dimensional integrated circuit global layout system based on visual reinforcement learning is characterized by comprising a preprocessing module, an environment module and a global layout network module which are sequentially connected;

wherein the preprocessing module is configured to: analyzing an original netlist file of the three-dimensional integrated circuit to be laid out, and extracting macro block information and chip information of the three-dimensional integrated circuit to be laid out;

the environment module is configured to: according to the macro block information and the chip information of the three-dimensional integrated circuit to be laid out, interacting with the global layout network module to obtain a state space and a corresponding reward function;

the global layout network module is configured to: automatically allocating the positions of the macro blocks on the plane of the three-dimensional integrated circuit to be laid out according to the state space and the corresponding reward function.

2. The vision-reinforcement-learning-based three-dimensional integrated circuit global layout system of claim 1, wherein:

the macro block information comprises macro block positions, macro block sizes, connection relations among macro blocks and independent macro blocks of the three-dimensional integrated circuit to be laid out, and the chip information comprises pin network connection relations, pin offset and network numbers.

3. The vision-reinforcement-learning-based three-dimensional integrated circuit global layout system of claim 1, wherein:

the environment module is configured to: obtain, based on computer vision and a graph neural network and by a multi-modal method, the current state information of the upper and lower planes of the three-dimensional integrated circuit to be laid out, and express the current state information as corresponding pixel maps.

4. The vision-reinforcement-learning-based three-dimensional integrated circuit global layout system of claim 3, wherein:

the pixel map comprises a macro block position map, a position mask map, a line length thermodynamic diagram and a pin thermodynamic diagram.

5. The vision-reinforcement-learning-based three-dimensional integrated circuit global layout system of claim 4, wherein:

the global layout network module is a reinforcement learning network designed by adopting an A2C algorithm, and comprises a policy network and a value network which are connected with each other, wherein the policy network comprises a re-extraction neural network and a branch convolution network;

wherein the policy network is configured to: splice the macro block position diagram, the position mask diagram, the line length thermodynamic diagram and the pin thermodynamic diagram, and input the spliced result into the branch convolution network to obtain a first output feature and a second output feature; at the same time, splice the position mask diagram, the line length thermodynamic diagram and the pin thermodynamic diagram, and input the spliced result into the re-extraction neural network to obtain a third output feature; and splice the third output feature with the second output feature, pass the result through a convolution layer, and multiply it by the position mask diagram, the line length thermodynamic diagram and the pin thermodynamic diagram to obtain an action;

the value network is configured to: input a netlist graph and the macro block coordinates corresponding to the original netlist file into a graph convolutional neural network, splice the output of the graph convolutional neural network with the first output feature, and output a value through a fully connected layer.

6. The three-dimensional integrated circuit global layout method based on vision reinforcement learning is characterized by comprising the following steps of:

step S1, providing a three-dimensional integrated circuit global layout system based on vision reinforcement learning as claimed in any one of claims 1 to 5;

7. The visual reinforcement learning-based three-dimensional integrated circuit global layout method according to claim 6, wherein:

wherein, the step S3 includes:

8. The visual reinforcement learning-based three-dimensional integrated circuit global layout method according to claim 7, wherein:

the state space comprises a macro block position diagram, a position mask diagram, a line length thermodynamic diagram and a pin thermodynamic diagram, which together represent the macro block placement state, namely the positions, interconnection relations and pin connection relations of the t macro blocks already placed, the positions at which the next macro block can be placed, and the connection relations and pin connection relations between the next macro block and the macro blocks already placed; wherein t is greater than 1 and less than N.

9. The visual reinforcement learning-based three-dimensional integrated circuit global layout method according to claim 6, wherein:

the reward function comprises an intrinsic reward function and an extrinsic reward function;

the parameters of the policy network and the parameters of the value network are updated by means of the intrinsic reward function, the extrinsic reward function and the value output by the value network.