CN111881641A

CN111881641A - Multi-process self-adaptive distribution multi-layer VLSI field coupling method

Info

Publication number: CN111881641A
Application number: CN202010515413.9A
Authority: CN
Inventors: 唐章宏; 邹军; 王芬; 黄承清; 汲亚飞
Original assignee: Beijing Wisechip Simulation Technology Co Ltd
Current assignee: Beijing Wisechip Simulation Technology Co Ltd
Priority date: 2020-06-08
Filing date: 2020-06-08
Publication date: 2020-11-03

Abstract

The application discloses a multi-process adaptively distributed field-circuit coupling method for multilayer very large scale integrated circuits. The method reduces the three-dimensional problem to two-dimensional problems and adopts a super-node technique, achieving fast and accurate field-circuit coupling for direct-current field analysis of a multilayer very large scale integrated circuit. The field-circuit coupling of the direct-current field of the multilayer integrated circuit can be carried out accurately and completely, and the calculation speed of the direct-current field of multilayer integrated circuits and chip packages is improved. In addition, coarse-grain parallelism is realized in the operation process, which greatly reduces inter-process communication and the waiting time caused by synchronization. A random dynamic allocation method for computing tasks is also adopted, which ensures that computing models of unequal complexity are distributed randomly and uniformly across the computing nodes and avoids the hard disk read-write bottleneck caused by virtual memory access when the peak memory is too high.

Description

Multi-process self-adaptive distribution multi-layer VLSI field coupling method

Technical Field

The application relates to the technical field of high-performance computation of direct-current fields of very large scale integrated circuits, and in particular to a multi-process adaptively distributed field-circuit coupling calculation method for multilayer very large scale integrated circuits.

Background

In a very large scale integrated circuit system there are various components, each with its own operating voltage. Typically, the voltage supplied to a component is allowed to fluctuate within about 5% of its operating voltage; outside this range the component may operate incorrectly. On the other hand, since the metal layers are not ideal conductors, a voltage drop arises as electric energy is transmitted through them. This voltage drop, together with the AC noise caused by switching of the components, can push the voltage actually reaching a component outside the allowable input range, resulting in system malfunction.

Another important factor that designers consider when laying out the power network of a multilayer very large scale integrated circuit is whether the current density on the board is excessive, since sustained excessive current can cause localized heating and even damage components. Calculating the current density distribution shows which locations exceed a given limit, so that the original design can be improved. For example, current between layers is carried by vias; when the current density of a via is found to be too high, additional identical vias may be added near it to reduce its current density.

Direct-current field analysis of multilayer VLSI mainly studies the distribution of voltage and current in the power supply network of the VLSI system. In addition, by placing a source at any two points, the impedance between the two points can be calculated, so that the voltage drop between them is obtained directly. To do so, a mathematical model of the integrated circuit system must be built to describe the internal DC field distribution, and the internal field distribution must then be computed numerically from the model. On this basis, the current distribution on each layer of the integrated circuit and the voltage drop or resistance between different ports are calculated.

However, in the course of implementation the inventors found that the prior art offers no efficient, high-precision parallel computing method for analyzing the direct-current electric field of multilayer very large scale integrated circuits, and so cannot meet the growing design requirements of very large scale integrated circuits and chip packages. Moreover, for the electric field-circuit coupling (field-circuit coupling for short) of the direct-current electric field of a multilayer integrated circuit, when the electric field and circuit equation sets are merged by scanning the super nodes and there are a plurality of external circuits, the numbers of the mutually coupled super nodes and electric field units are usually written one by one. This makes the field-circuit coupling inefficient and in turn lowers the efficiency of computing the direct-current electric field of multilayer very large scale integrated circuits.

In addition, a massive number of large-scale numerical calculations of the same type are required during implementation. In such calculations, different computation cases have different structures and therefore unequal computational complexity. An efficient parallel computing method is needed for this unequal workload, one that fully accounts for the differing complexity of the cases and raises parallel efficiency as far as possible.

Conventional parallel computing essentially parallelizes a single computation case. Parallelism is applied to the heavily looped parts of the calculation, and the parallel granularity is usually fine, so a large amount of data is exchanged among processes, which lowers parallel efficiency. Second, because different processes progress at different rates, a large amount of waiting is inevitable whenever data must be shared and synchronized, so overall parallel efficiency is low. Moreover, a considerable part of the calculation of a single case is sequential and data dependent, so that part cannot be parallelized when a single case is run in parallel, which also seriously lowers overall parallel efficiency.

Disclosure of Invention

(I) Object of the application

In view of this, the present application aims to achieve fast and accurate field-circuit coupling for direct-current field analysis of multilayer very large scale integrated circuits, to solve the problem that no efficient and high-precision parallel computing method currently exists for analyzing the direct-current electric field of such circuits, and thereby to improve the design capability for very large scale integrated circuits and chip packages. It further aims to minimize inter-process communication during parallel computation, to avoid the hard disk read-write bottleneck that arises when the peak memory of multi-process parallel computation exceeds the available physical memory, and to resolve the process waiting caused by the unequal complexity of different computation cases, thereby greatly improving parallel efficiency. To these ends, the application discloses the following technical scheme.

(II) Technical scheme

The application discloses a multi-process self-adaptive distribution multi-layer VLSI field coupling method, which comprises the following steps:

step 100, dividing the overall operation program that executes the field-circuit coupling operation process of the whole multilayer very large scale integrated circuit into a plurality of non-overlapping operation particles, wherein each operation particle is an operation program that executes all independent operations of the same type, and one independent operation executed by an operation particle is taken as one operation task;

step 200, acquiring the weighted CPU time of each operation particle and the total CPU time of the whole integrated circuit field-circuit coupling operation process, and determining parallel coarse grains according to the ratio of the weighted CPU time to the total CPU time;

step 300, simplifying the direct-current electric field three-dimensional model of the multilayer very large scale integrated circuit layout by using first parallel coarse grains to obtain a plurality of two-dimensional models, establishing the electric field equation set of the two-dimensional model corresponding to each first parallel coarse grain by a finite element analysis method, and finally combining the results of all the first parallel coarse grains to obtain the total sparse matrix of the electric field equation set;

step 400, analyzing the external circuit of the very large scale integrated circuit by a circuit super-node analysis method using second parallel coarse grains to obtain a symmetric positive definite external circuit equation set;

step 500, merging the electric field equation set and the external circuit equation set by scanning the super nodes using third parallel coarse grains, and establishing a symmetric positive definite equation set of electric field-circuit coupling; wherein,

in the process of executing the parallel coarse grains, the order of all operation tasks executed by the same parallel coarse grain is randomly shuffled to form a new operation task sequence, and all operation tasks executed by that parallel coarse grain are distributed to the processes according to the new operation task sequence to complete the parallel operation of the operation tasks.

In one possible implementation, the step 400 includes:

step 410, for each external circuit corresponding to the second parallel coarse grains, generating an external circuit of the integrated circuit not including the voltage source branch;

step 420, establishing a symmetric positive definite external circuit equation set for the external circuit of the integrated circuit without voltage source branches by the circuit super-node analysis method;

step 430, for the external circuit of the integrated circuit that includes voltage source branches, filling in the super-node voltage vector, the super-node current vector, the non-reference-node voltage vector, the mutual conductance matrix of the super nodes and the non-reference nodes, and the super-node admittance matrix, and generating an external circuit equation set in terms of the super-node voltage vector; wherein,

the external circuit equation set comprises the super-node voltage vector, the super-node current vector, the non-reference-node voltage vector, the mutual conductance matrix of the super nodes and the non-reference nodes, and the super-node admittance matrix;

step 440, collecting the processing results of the second parallel coarse grains, coupling them, and generating the total external circuit equation set in terms of the super-node voltage vector.

In one possible implementation, the step 410 includes:

step 411, defining all external circuit nodes as initial nodes, and setting every external circuit node as a super node, wherein each initial node has an initial number, and the initial node of each super node is set to be the node itself;

step 412, for every branch in the external circuit that contains a voltage source, merging the two super nodes at the ends of the branch into one super node, assigning the initial nodes of both super nodes to the merged super node, and deleting the super node that was absorbed, to form an updated external circuit;

step 413, determining whether the updated external circuit still includes a branch containing a voltage source; if so, executing step 412 again; if not, selecting, for each super node, one of its initial nodes as the reference node and treating its remaining initial nodes as non-reference nodes, wherein a super node that includes only one initial node has no non-reference node;

step 414, dividing all the initial nodes of the external circuit into reference nodes and non-reference nodes according to the super nodes, wherein each reference node corresponds to one super node, and renumbering the reference nodes and the super nodes to generate the external circuit of the integrated circuit without voltage source branches.
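Steps 411 to 414 can be illustrated with a minimal Python sketch. It is not the implementation of the application: the function name `build_super_nodes`, the union-find data structure, and the rule of taking the smallest initial number as the reference node are all assumptions made for illustration. A single union-find pass stands in for the repeated merge loop of steps 412 and 413.

```python
def build_super_nodes(nodes, voltage_branches):
    """Merge the end nodes of every voltage-source branch into super nodes.

    nodes: iterable of initial node numbers.
    voltage_branches: list of (node_a, node_b) pairs joined by an ideal voltage source.
    Returns {reference_node: [non_reference_nodes]}, one entry per super node.
    """
    # Step 411: every initial node starts as its own super node.
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the representative, shortening the path on the way.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    # Steps 412-413: merge across every voltage-source branch until none
    # joins two different super nodes.
    for a, b in voltage_branches:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Assumption: the smaller initial number becomes the reference node.
            parent[max(ra, rb)] = min(ra, rb)

    # Step 414: split the initial nodes into reference and non-reference nodes.
    super_nodes = {}
    for n in sorted(parent):
        ref = find(n)
        members = super_nodes.setdefault(ref, [])
        if ref != n:
            members.append(n)
    return super_nodes


result = build_super_nodes([1, 2, 3, 4, 5], [(1, 2), (2, 3)])
print(result)  # {1: [2, 3], 4: [], 5: []}
```

In this example nodes 1, 2 and 3 are chained by voltage sources and collapse into one super node with reference node 1, while nodes 4 and 5 remain single-node super nodes with no non-reference nodes.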

In a possible implementation, in step 430:

the super node voltage vector is a vector formed by the voltages of all super node reference nodes;

the super node current vector is a vector formed by the sum of all currents flowing into each super node;

the length of the non-reference-node voltage vector is the number of all non-reference nodes, and its i-th element $P_i$ is the potential of non-reference node i relative to its reference node, which is the sum of the voltages of all ideal voltage source branches on the path from non-reference node i to its reference node;

the rows of the mutual conductance matrix of the super nodes and the non-reference nodes correspond to the super nodes and its columns correspond to the non-reference nodes; the element $P_{ij}$ in row i and column j is either the mutual conductance of super node i and non-reference node j or the self-conductance of non-reference node j: if non-reference node j belongs to super node i, $P_{ij}$ is the self-conductance of non-reference node j and is positive; if non-reference node j does not belong to super node i, $P_{ij}$ is the mutual conductance of super node i and non-reference node j and is negative;

the rows and the columns of the super-node admittance matrix correspond to the super nodes; the i-th diagonal element $Pd_i$ is the self-conductance of the i-th super node, equal to the sum of the admittances of all branches connected to the i-th super node; each off-diagonal element in row i and column j is the mutual conductance of the i-th and j-th super nodes, equal to the negative of the sum of the admittances of all branches connecting the i-th and j-th super nodes.
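The assembly rule for the super-node admittance matrix can be sketched as follows. This is an illustrative sketch only: the function name `super_node_admittance`, the 0-based super node numbering, and the dense list-of-lists storage are assumptions, and branches whose two ends lie in the same super node are assumed to contribute nothing because their current leaves and re-enters the same super node. The mutual conductance matrix with the non-reference nodes and the vector of $P_i$ values are not covered here.

```python
def super_node_admittance(num_super, node_to_super, branches):
    """Assemble the super-node admittance matrix as a dense list of lists.

    num_super: number of super nodes, numbered 0 .. num_super - 1 (assumed).
    node_to_super: dict mapping each initial node to its super node number.
    branches: list of (node_a, node_b, conductance) for branches without voltage sources.
    """
    Y = [[0.0] * num_super for _ in range(num_super)]
    for a, b, g in branches:
        i, j = node_to_super[a], node_to_super[b]
        if i == j:
            continue  # assumed: a branch inside one super node adds nothing
        Y[i][i] += g  # self-conductance of super node i
        Y[j][j] += g  # self-conductance of super node j
        Y[i][j] -= g  # mutual conductance is the negative admittance sum
        Y[j][i] -= g  # keep the matrix symmetric
    return Y


# Hypothetical circuit: nodes 1-3 form super node 0, node 4 is super node 1,
# node 5 is super node 2.
mapping = {1: 0, 2: 0, 3: 0, 4: 1, 5: 2}
Y = super_node_admittance(3, mapping, [(3, 4, 2.0), (4, 5, 0.5), (1, 2, 1.0)])
print(Y)  # [[2.0, -2.0, 0.0], [-2.0, 2.5, -0.5], [0.0, -0.5, 0.5]]
```

The result is symmetric by construction, with positive diagonal entries and non-positive off-diagonal entries, which matches the sign pattern described for the matrix.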

In one possible implementation, the step 500 includes:

step 510, according to the numbers of the grid nodes independently generated by the first parallel coarse grains and the numbers of the external circuit super nodes, scanning all the super nodes using third parallel coarse grains, changing the numbers of the related grid nodes, and regenerating unified continuous node numbers after the scanning is finished;

step 520, combining the electric field equation set corresponding to the first parallel coarse grains with the external circuit equation set according to the unified continuous node numbers to form a symmetric positive definite unified equation set of field-circuit coupling.

In one possible embodiment, the step 510 includes:

step 511, placing the super node numbers first and the grid node numbers after them, wherein the number of each grid node is its initial number plus the number of super nodes;

step 512, for each node j included in super node i, scanning the grid nodes using third parallel coarse grains, wherein the nodes j include reference nodes and non-reference nodes, and when a grid node k is connected to node j, renumbering grid node k as j, so that the grid node becomes the reference node or non-reference node of the super node in which node j is located;

step 513, changing the number of the last grid node to k, and subtracting 1 from the number of grid nodes;

step 514, judging whether all the super nodes have been scanned, and when they have not, executing step 512 again until all the super nodes have been scanned.
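One possible reading of steps 511 to 514 is sketched below in Python. It is an interpretation, not the method of the application: the function name `renumber`, the 0-based numbering, and the simplification that each coupled grid node is mapped directly to a super node number are assumptions. The sketch shows how moving the last grid node into the vacated slot keeps the numbering continuous.

```python
def renumber(num_super, num_grid, coupling):
    """Produce unified continuous node numbers.

    num_super: number of super nodes, numbered 0 .. num_super - 1 (assumed).
    num_grid: number of grid nodes, with initial indices 0 .. num_grid - 1.
    coupling: dict {grid node index: super node number it is connected to}.
    Returns (number, total), where number[k] is the final number of grid node k.
    """
    # Step 511: grid node numbers follow the super node numbers.
    number = [num_super + k for k in range(num_grid)]
    owner = {number[k]: k for k in range(num_grid)}  # current number -> grid node
    last = num_super + num_grid - 1                  # highest number still in use

    # Steps 512-514: scan the coupled grid nodes.
    for k, j in coupling.items():
        freed = number[k]
        del owner[freed]
        number[k] = j              # grid node k takes the number of the node it joins
        if freed != last:
            moved = owner.pop(last)
            number[moved] = freed  # step 513: the last grid node fills the gap
            owner[freed] = moved
        last -= 1                  # one fewer independent grid node
    return number, last + 1


number, total = renumber(num_super=2, num_grid=5, coupling={1: 0, 3: 1})
print(number, total)  # [2, 0, 4, 1, 3] 5
```

In the example, grid nodes 1 and 3 are absorbed into super nodes 0 and 1, and the three remaining grid nodes end up with the continuous numbers 2, 3 and 4, so the unified system has 5 unknowns.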

In one possible embodiment, the step 520 includes:

step 521, filling unknown quantity voltage vectors corresponding to the second parallel coarse grains into the unified equation set, wherein the unknown quantity voltage vectors include voltages of super nodes serving as a front part and voltages of grid nodes serving as a rear part and not connected with nodes of the external circuit;

step 522, filling the sparse matrix corresponding to the first parallel coarse grains into the unified equation set, filling the super-node admittance matrix corresponding to the second parallel coarse grains into the sparse matrix, and filling the finite element stiffness matrix into the corresponding positions of the sparse matrix according to the renumbering of the grid nodes;

step 523, filling the right-hand-side source vector corresponding to the second parallel coarse grains into the unified equation set, filling the right-hand-side terms of the electric field equation set into the corresponding positions according to the grid node renumbering obtained by the third parallel coarse grains to form modified right-hand-side terms, placing the right-hand-side terms of the external circuit equation set corresponding to the second parallel coarse grains in front of the modified right-hand-side terms, and establishing a symmetric positive definite unified equation set of field-circuit coupling, wherein the positions of the right-hand-side terms of the external circuit equation set correspond to the node numbers of the external circuit.

In one possible implementation, the determining parallel coarse grains according to the ratio of the weighted CPU time to the total CPU time includes:

sorting the weighted CPU times of the operation particles in descending order and accumulating them in that order until the accumulated sum exceeds 90% of the total CPU time, and taking each operation particle included in the accumulated sum as a parallel coarse grain.

In a possible implementation, the randomly scrambling the sequence of all the operation tasks executed by the same parallel coarse grain to form a new operation task sequence includes:

letting the original sequence of the operation tasks be List0 = $\{m\}$, and correspondingly generating a random number sequence $\{R_m\}$, where $m = 1, 2, 3, \ldots, M$;

sorting the sequence $\{R_m\}$ in ascending order, the sorted sequence being $\{O_m\}$;

generating a new non-repeating operation task sequence List = $\{L_m\}$.
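The shuffle by sorting random numbers can be sketched in Python as below. The text does not spell out how $L_m$ is obtained from $\{O_m\}$; this sketch assumes that $L_m$ is the original task number whose random value ends up in position m after sorting. The function name `shuffled_task_list` and the `seed` parameter are hypothetical.

```python
import random


def shuffled_task_list(num_tasks, seed=None):
    """Return a non-repeating random ordering of tasks 1 .. num_tasks."""
    rng = random.Random(seed)
    # One random number R_m per task m.
    r = [rng.random() for _ in range(num_tasks)]
    # Sorting the task indices by R_m gives the positions of the sorted sequence O_m.
    order = sorted(range(num_tasks), key=lambda m: r[m])
    # Assumed definition of L_m: the task whose random number is the m-th smallest.
    return [m + 1 for m in order]


task_list = shuffled_task_list(10, seed=42)
print(task_list)
```

Because every task index appears exactly once in the sorted order, the resulting list is a permutation of the original sequence, so no task is repeated or lost.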

In one possible implementation, when an operation task in the parallel coarse grain is allocated to a process, a flag file is generated to indicate that the operation task has already been allocated; when another process applies for that operation task, it attempts to generate the flag file of the operation task, and if the flag file already exists, it automatically applies for the next operation task.
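The flag-file allocation can be sketched in Python as follows. The file naming scheme `task_<id>.flag`, the helper names `try_claim` and `next_task`, and the use of exclusive file creation are assumptions for illustration; the application only states that a flag file marks an allocated task. Exclusive creation with `os.O_CREAT | os.O_EXCL` is atomic on typical local file systems, but this may not hold on every network file system.

```python
import os
import tempfile


def try_claim(flag_dir, task_id):
    """Return True if this process created the flag file, False if it already existed."""
    path = os.path.join(flag_dir, f"task_{task_id}.flag")  # hypothetical naming scheme
    try:
        # O_EXCL makes the call fail if another process created the file first.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def next_task(flag_dir, task_order):
    """Walk the shuffled task order and claim the first task that is still free."""
    for task_id in task_order:
        if try_claim(flag_dir, task_id):
            return task_id
    return None  # every task has already been allocated


flag_dir = tempfile.mkdtemp()
first = next_task(flag_dir, [3, 1, 2])   # claims task 3
second = next_task(flag_dir, [3, 1, 2])  # task 3 is taken, so task 1 is claimed
print(first, second)  # 3 1
```

Each process only touches the file system when it needs a new task, so no messages are exchanged between processes and a process that finishes early simply claims the next free task.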

(III) Advantageous effects

With the multi-process self-adaptive distribution multi-layer VLSI field coupling method of the present application, the three-dimensional problem is reduced to a plurality of two-dimensional problems and the super-node technique is adopted, so that fast and accurate field-circuit coupling for direct-current field analysis of multilayer VLSI is achieved, the field-circuit coupling of the direct-current field of the multilayer integrated circuit can be carried out accurately and completely, and the design efficiency of multilayer integrated circuits and chip packages is improved. In addition, coarse-grain parallelism of a different kind is realized at each level of the calculation, which greatly reduces inter-process communication and the waiting time caused by synchronization. A random dynamic allocation method for computing tasks is also adopted, which ensures that computing models of unequal complexity are distributed randomly and uniformly across the computing nodes and avoids the hard disk read-write bottleneck caused by virtual memory access when the peak memory is too high.

Drawings

The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining and illustrating the present application and should not be construed as limiting the scope of the present application.

Fig. 1 is a schematic flow chart of an embodiment of the coarse-grain parallel method for field-circuit coupling calculation of a multi-layer very large scale integrated circuit disclosed in the present application.

Fig. 2 is a schematic diagram of establishing the electric field equation set of each two-dimensional model of a multi-layer VLSI layout using first parallel coarse grains as disclosed herein.

Fig. 3 is a schematic diagram of analyzing the external circuit of a VLSI circuit using second parallel coarse grains as disclosed herein.

Fig. 4 is a flow chart of establishing the symmetric positive definite matrix of electric field-circuit coupling using third parallel coarse grains as disclosed in the present application.

FIG. 5 is a diagram illustrating the numbering of external circuits and initial nodes in an embodiment of a circuit field coupling method.

FIG. 6 is a schematic diagram of the numbering of external circuits and supernodes in an embodiment of a circuit field coupling method.

FIG. 7 is a diagram of the external circuit-field domain and its initial node numbering in an embodiment of the circuit-field coupling method.

FIG. 8 is a diagram of the external circuit-field and its unified node numbering in an embodiment of a circuit-field coupling method.

FIG. 9 is a diagram illustrating the numbering of external circuit-field and their coupling nodes in an embodiment of a circuit-field coupling method.

Detailed Description

In order to make the implementation objects, technical solutions and advantages of the present application clearer, the technical solutions in the embodiments of the present application will be described in more detail below with reference to the drawings in the embodiments of the present application.

The embodiments of the multi-layer VLSI field coupling method for multi-process adaptive distribution disclosed in the present application are described in detail below with reference to FIGS. 1-9. As shown in fig. 1, the method disclosed in the present embodiment includes the following steps 100 to 600.

Step 100, dividing an overall operation program for executing an operation process of field coupling of the overall multilayer VLSI into a plurality of non-overlapping operation particles. The non-overlapping computing particles are serial implementation codes which cover the whole computing process. The operation grain is an operation program for executing all independent operations of the same type, and one independent operation executed by the operation grain is taken as an operation task.

Before parallel computing, the number of processes needs to be determined manually, and one process is taken as a main process.

The operation particles are defined according to the operational characteristics of the problem, which differ from industry to industry. For example, for the electromagnetic field distribution calculation of large scale integrated circuits, when field-circuit coupling is performed on a multilayer integrated circuit board of a given structure and its external circuit, the operational characteristics include: simplifying and aligning the polygons of the different layouts, meshing the polygons of the different layouts, identifying the field regions of the multilayer integrated circuit, symmetrizing the external circuit matrix, forming the sparse matrix for calculating the electromagnetic field distribution of the large scale integrated circuit, solving the large-scale sparse matrix, calculating the current, potential and power density distribution of each layer based on the solved field, and so on.

Specifically, suppose the whole field-circuit coupling operation program is divided into 4 operation particles, c1, c2, c3 and c4; by the definition of operation particles, these 4 operation particles can execute all the operation tasks of the whole operation process. If c1 executes 1 operation task, c2 executes 200 operation tasks, c3 executes 5 operation tasks, and c4 executes 500 operation tasks, then 706 operation tasks constitute the whole operation process, which requires only the 4 operation particles c1, c2, c3 and c4. The whole operation process is executed by these 4 operation particles, each of which includes at least 1 independent operation (operation task).

Step 200, acquiring the weighted CPU time of each operation particle and the total CPU time of the whole integrated circuit field-circuit coupling operation process, and determining parallel coarse grains according to the ratio of the weighted CPU time to the total CPU time. Specifically, the weighted CPU times of the operation particles are sorted in descending order and accumulated in that order until the accumulated sum exceeds 90% of the total CPU time; each operation particle included in the accumulated sum is taken as a parallel coarse grain.

A typical operation task is selected, according to the operational characteristics of the problem, from all the operation tasks to be executed by each operation particle. For the 4 typical operation tasks executed by the 4 operation particles c1, c2, c3 and c4, a serial calculation executing c1 to c4 in sequence is carried out, and the CPU time each of the 4 operation particles needs to complete its single typical operation task is measured from the result of this single serial run.

The weighted CPU time of an operation particle is calculated as $T_{weight,i} = U_{task,i} \cdot T_i$, where $T_{weight,i}$ is the weighted CPU time of the i-th operation particle, $T_i$ is the CPU time of a single operation of the i-th operation particle, and $U_{task,i}$ is the number of operation tasks to be executed by the i-th operation particle.

The total CPU time of the whole operation process is calculated as:

$$T = \sum_{i=1}^{n} T_{weight,i}$$

where $T$ is the total CPU time of the whole operation process, $n$ is the number of operation particles into which the whole operation program is divided, and $T_{weight,i}$ is the weighted CPU time of the i-th operation particle.

The operation particles are sorted by weighted CPU time. For example, if the weighted CPU time is 0.1 s for c1, 100 s for c2, 0.2 s for c3 and 150 s for c4, the sorting result is c4 > c2 > c3 > c1. The weighted CPU times of the 4 operation particles are then added from largest to smallest, that is, T(c4) + T(c2) + …, until the sum exceeds 90% of the total CPU time. If T(c4) + T(c2) exceeds 90% of the total CPU time, c4 and c2 are each taken as a parallel coarse grain; if T(c4) alone exceeds 90% of the total CPU time, only c4 is a parallel coarse grain.
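The selection rule can be checked with a short Python sketch. The function name `select_coarse_grains` and the per-task CPU times below are hypothetical; the per-task times were chosen only so that the weighted times match the example values of 0.1 s, 100 s, 0.2 s and 150 s, using the task counts 1, 200, 5 and 500 given earlier.

```python
def select_coarse_grains(single_time, task_count, threshold=0.9):
    """Pick the operation particles whose weighted CPU time covers the threshold share.

    single_time: dict {particle: CPU time of one typical task, in seconds}.
    task_count: dict {particle: number of tasks the particle executes}.
    """
    # Weighted CPU time: number of tasks times the time of one typical task.
    weighted = {name: task_count[name] * single_time[name] for name in single_time}
    total = sum(weighted.values())

    chosen, accumulated = [], 0.0
    # Accumulate from the largest weighted time downwards.
    for name in sorted(weighted, key=weighted.get, reverse=True):
        chosen.append(name)
        accumulated += weighted[name]
        if accumulated > threshold * total:
            break
    return chosen, total


# Hypothetical per-task times chosen to reproduce the weighted times in the example.
single_time = {"c1": 0.1, "c2": 0.5, "c3": 0.04, "c4": 0.3}
task_count = {"c1": 1, "c2": 200, "c3": 5, "c4": 500}
chosen, total = select_coarse_grains(single_time, task_count)
print(chosen)  # ['c4', 'c2']
```

With these numbers c4 alone accounts for roughly 60% of the total, so c2 must be added before the 90% threshold is crossed, and c1 and c3 are left to the main process.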

In the whole operation program for executing the whole operation process, c1 needs to be executed before parallel coarse grain c2 is parallelly calculated; before parallel coarse grain c4 parallel computation, c3 needs to be executed, wherein c1 and c3 are executed by adopting a main process.

Step 300, as shown in fig. 2, simplifying the direct current electric field three-dimensional model of the multilayer very large scale integrated circuit layout by using the first parallel coarse grains to obtain a two-dimensional model of the multilayer circuit direct current electric field, establishing an electric field equation set (i.e., a field solving equation set) of the two-dimensional model corresponding to each first parallel coarse grain by using a finite element analysis method, and finally combining all the first parallel coarse grains to obtain a total sparse matrix of the electric field equation set. In fig. 2, the different domains and their external circuits are coupled to each other through vias, because each domain and its external circuit formed by each layer of integrated circuit layout is not an isolated electrical connection, and all domains eventually form an integrated system.

However, solving the three-dimensional model of the multilayer integrated circuit direct-current electric field by a three-dimensional method requires enormous computational resources, and it is difficult to analyze a complex very large scale integrated circuit system in practice with existing computational resources. An analysis of the dimensions of a multilayer very large scale integrated circuit shows that the in-plane dimensions of the PCB or chip-package layers are much larger than the thickness of the layers and the spacing between them, so the direct-current electric field can be regarded as unchanged along the thickness direction of each layer. A problem that would otherwise require a three-dimensional equation can therefore be solved with an equation reduced to two dimensions, and in step 300 the three-dimensional electric field problem is simplified into two-dimensional problems.

In step 300, the three-dimensional model of the direct-current electric field of the multilayer integrated circuit means that the distributions of the conductivity σ and the potential u in the direct-current electric field model are both functions of the three-dimensional space coordinates (x, y, z), that is, $u = u(x, y, z)$ and $\sigma = \sigma(x, y, z)$. The function of the three-dimensional model satisfies the following equation:

and the following boundary conditions are satisfied:

wherein $J_n$ is the bulk current density of the external circuit.
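Equations (1) and (2) do not appear in this text. As a hedged reconstruction only, the standard steady-current conduction model that is consistent with the symbols defined here can be written as follows. The boundary symbols $\Gamma_D$ and $\Gamma_N$, the prescribed potential $\bar{u}$, the outward normal $n$, and the sign convention for $J_n$ are assumptions and are not taken from the source.

```latex
\nabla \cdot \bigl( \sigma(x, y, z)\, \nabla u(x, y, z) \bigr) = 0 \tag{1}

u \big|_{\Gamma_D} = \bar{u}, \qquad
\sigma \frac{\partial u}{\partial n} \Big|_{\Gamma_N} = J_n \tag{2}
```

The first line expresses conservation of steady current inside the conductor; the second fixes the potential on one part of the boundary and the normal current density on the rest.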

In step 300, the two-dimensional model of the direct-current electric field of the multilayer integrated circuit means that the distributions of the conductivity σ and the potential u in the direct-current electric field model are functions of the two-dimensional plane coordinates (x, y) only, that is, $u = u(x, y)$ and $\sigma = \sigma(x, y)$, so that the distribution is independent of z. The two-dimensional finite element functional corresponding to the two-dimensional model is as follows:

wherein h is the thickness of the metal layer, $\sigma^e$ is the conductivity of grid cell e, $u^e$ is the potential of grid cell e, and $S^e$ is the area of grid cell e. Taking the extremum of equation (3) yields the finite element stiffness matrix. $J_s$ is the surface current density generated by external excitation; it is an unknown quantity produced by the external circuit. Because the chip and the circuit board are usually driven by a voltage source through the external circuit, the external circuit and the field can be coupled through the access points of the circuit and solved jointly.
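Equation (3) does not appear in this text. A functional of the usual energy form that is consistent with the symbols h, $\sigma^e$, $u^e$, $S^e$ and $J_s$ is sketched below as an assumption; the summation over grid cells and the sign of the source term (with $J_s$ counted as current injected into the layer) are not confirmed by the source.

```latex
I(u) = \sum_{e} \left[ \frac{1}{2} \int_{S^e} h\, \sigma^e \left| \nabla u^e \right|^2 \mathrm{d}S
       - \int_{S^e} J_s\, u^e \, \mathrm{d}S \right] \tag{3}
```

Setting the derivative of such a functional with respect to each nodal potential to zero gives one linear equation per node, and the coefficients of those equations form the stiffness matrix.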

Step 400, analyzing the external circuit of the very large scale integrated circuit by the circuit super-node analysis method using second parallel coarse grains to obtain a symmetric positive definite external circuit equation set. The circuit super-node analysis method is a method of analyzing the external circuit equation set based on super nodes. A super node comprises a group of circuit nodes; the voltage between any two nodes of the group can be obtained directly from the ideal voltage sources contained in the super node, whereas the potential difference between any node outside the super node and any node inside it is unknown. Typically, the external circuit comprises a plurality of super nodes and can be divided into different circuit modules, each of which may be processed using a second parallel coarse grain, as shown in fig. 3. Because the different external circuit modules are likewise coupled to one another, after each module has been processed by the second parallel coarse grains, the processing results are collected and coupled.

In the prior art, a common node analysis method is used for analyzing a branch external circuit to obtain the following node voltage equation:

where G = A_Y G_E A_Y^T; A_Y and A_E are both basic incidence matrices, A_Y being associated with the branches without an ideal voltage source and A_E with the branches containing an ideal voltage source; J = A_Y(G_E u_g - i_g) is the equivalent node current source vector; G_E is the branch admittance matrix; i_E is the current vector of the ideal voltage source branches; u_g is the branch voltage source vector; i_g is the branch current source vector; u is the node voltage vector; and E is the voltage vector of the ideal voltage source branches.

In the field-circuit coupling process, directly coupling the node voltage equation (4) with the finite element equation has the following defects. First, some diagonal elements of the matrix in equation (4) are 0, so the coupled matrix is not positive definite, which increases the solution time. Second, directly incorporating the matrix of equation (4) into the finite element stiffness matrix makes the currents of the ideal voltage sources additional unknowns to be solved, which increases the number of unknowns and is inconsistent with the finite element method's use of node voltages as the unknowns.
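Assuming equation (4) has the standard modified nodal analysis block form, with which the symbol definitions above are consistent, both defects can be read off directly. The block layout below is an assumption for illustration, not a reproduction of the original equation:

```latex
\begin{bmatrix} G & A_E \\ A_E^{T} & 0 \end{bmatrix}
\begin{bmatrix} u \\ i_E \end{bmatrix}
=
\begin{bmatrix} J \\ E \end{bmatrix},
\qquad G = A_Y G_E A_Y^{T}, \quad J = A_Y \left( G_E u_g - i_g \right)
```

In this form the lower-right zero block supplies the zero diagonal elements, and the vector i_E appears among the unknowns alongside the node voltages u.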

Thus, a symmetric positive definite external circuit equation set is formed in step 400 using the circuit super-node analysis method.

Specifically, step 400 includes the following steps 410 to 440.

Step 410, for each external circuit corresponding to the second parallel coarse grains, generating an external circuit of the integrated circuit not including the voltage source branch. Specifically, step 410 includes the following steps 411 to 414.

In step 411, all external circuit nodes before the super node is formed are defined as initial nodes, and all external circuit nodes are set as super nodes. Wherein, each initial node has an initial number, and the initial node of the supernode is set as itself.

Step 412, searching all the branches containing the voltage source in the external circuit, merging the two super nodes of all the branches containing the voltage source in the external circuit into one super node, merging the initial nodes of the two super nodes to the merged super node, and deleting the non-merged super nodes to form an updated external circuit.

Step 413, determining whether the updated external circuit includes a branch of the voltage source, when the updated external circuit includes a branch of the voltage source, jumping to step 412 and executing step 412, and when the updated external circuit does not include a branch of the voltage source, selecting an initial node as a reference node for all super nodes, and using the rest of the initial nodes as non-reference nodes, wherein the super nodes including only one initial node do not have corresponding non-reference nodes.

In a super node, one of the nodes is selected as the reference node, such as nodes 1 and 2 in fig. 5, and the other nodes are non-reference nodes, such as nodes 1' and 2' in fig. 6.

Step 414, dividing all initial nodes of the external circuit into reference nodes and non-reference nodes according to the super nodes, wherein each reference node corresponds to one super node, and renumbering the reference nodes and the non-reference nodes to generate the external circuit of the integrated circuit without voltage source branches. Renumbering means that one node of each super node is selected as its reference node and the other nodes are its non-reference nodes; all reference nodes are numbered consecutively, the count being the number of super nodes, and all non-reference nodes are numbered consecutively, the count being the number of non-reference nodes.

The external circuit of fig. 5 contains three ideal voltage source branches; nodes (1,3) form a super node, and nodes (2,7) also form a super node. The super nodes are identified in the external circuit of fig. 6, where (1,1') corresponds to nodes (1,3) in fig. 5 and (2,2') corresponds to nodes (2,7) in fig. 5.
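Steps 411 to 414 amount to merging nodes that are joined by ideal voltage source branches. A minimal sketch using a union-find structure follows. The function name `build_supernodes`, the choice of the lowest-numbered node as the reference node, and the branch list are assumptions for illustration; only the node pairs (1,3) and (2,7) are taken from the text.

```python
def build_supernodes(nodes, voltage_branches):
    """Merge nodes joined by ideal voltage source branches (sketch of steps 411 to 414)."""
    # Step 411: every initial node starts as its own super node.
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    # Steps 412 and 413: merge the two super nodes of every voltage source branch.
    for a, b in voltage_branches:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Assumption: the lowest initial number becomes the reference node.
            parent[max(ra, rb)] = min(ra, rb)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)

    # Step 414: number reference nodes and non-reference nodes consecutively.
    super_no, nonref_no = {}, {}
    for ref in sorted(groups):
        super_no[ref] = len(super_no) + 1
        for n in sorted(groups[ref]):
            if n != ref:
                nonref_no[n] = len(nonref_no) + 1
    member_of = {n: super_no[find(n)] for n in nodes}
    return super_no, nonref_no, member_of


super_no, nonref_no, member_of = build_supernodes(range(1, 8), [(1, 3), (2, 7)])
```

With seven initial nodes this produces five super nodes and two non-reference nodes, where initial nodes 3 and 7 play the role of 1' and 2'.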

Step 420, establishing a symmetric positive definite external circuit equation set for the external circuit of the integrated circuit without voltage source branches by the circuit super-node analysis method. The external circuit equation set comprises a super-node voltage vector, a super-node current vector, a non-reference-node voltage vector, a mutual conductance matrix of the super nodes and non-reference nodes, and a super-node admittance matrix.

Step 430, filling the external circuit of the integrated circuit including the voltage source branch circuit with the super-node voltage vector, the super-node current vector, the voltage vector of the non-reference node, the mutual conductance matrix of the super-node and the non-reference node, and the super-node admittance matrix, and generating the external circuit equation set of the super-node voltage vector.

Specifically, in step 430:

the length of the voltage vector of the non-reference nodes is the number of all non-reference nodes, and its i-th element P_i is the potential of non-reference node i relative to its reference node, which is the sum of the voltages of all ideal voltage source branches on the path from non-reference node i to its reference node;

the rows of the mutual conductance matrix of the super nodes and non-reference nodes correspond to super nodes and its columns correspond to non-reference nodes; the element P_ij in row i and column j is either the mutual conductance of super node i and non-reference node j or the self-conductance of non-reference node j: if non-reference node j belongs to super node i, P_ij is the self-conductance of non-reference node j and its value is positive; if non-reference node j does not belong to super node i, P_ij is the mutual conductance of super node i and non-reference node j and its value is negative;

the rows and columns of the super-node admittance matrix correspond to super nodes; the i-th diagonal element Pd_i of the super-node admittance matrix is the self-conductance of the i-th super node, whose value is the sum of the admittances of all branches connected to the i-th super node; every off-diagonal element, in row i and column j (j ≠ i), is the mutual conductance of the i-th and j-th super nodes, whose value is the negative of the sum of the admittances of all branches connecting the i-th and j-th super nodes.

The super node voltage refers to the voltage of a reference point corresponding to the super node, and the self conductance, mutual conductance and current of the super node are the sum of the self conductance, mutual conductance and current of all nodes contained in the super node. After the formula (4) is rewritten by adopting a super-node method, the following external circuit equation is obtained:

G_sup U_sup = I_sup - G_mul U_nonref    equation (5);

In equation (5), U_sup is the voltage vector of the super nodes; G_sup is the admittance matrix of the super nodes, which is symmetric positive definite; I_sup is the current vector of the super nodes; U_nonref is the voltage vector of the non-reference nodes, of length n_nonref, where n_nonref is the number of all non-reference nodes; and G_mul is the mutual conductance matrix of the super nodes and non-reference nodes, of size n × n_nonref, where n is the number of super nodes.

Suppose all supernodes are numbered 1, 2, …, n and the non-reference nodes are numbered 1, 2, …, n_nonref. The matrices U_nonref and G_mul can then be formed according to the following rules. The i-th element of U_nonref is the potential of non-reference node i relative to its reference node, which is the algebraic sum of the voltages of all ideal voltage source branches on the path from the non-reference node to its reference node. The rows of G_mul correspond to supernodes and the columns to non-reference points; G_i,j is the mutual conductance of supernode i and non-reference point j: if non-reference point j belongs to supernode i, it is the self-conductance of non-reference point j and its value is positive; otherwise its value is negative.
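The filling rules for G_sup and G_mul can be sketched for a toy circuit as follows. The three-node circuit, the 0-based indexing, and the names `assemble_supernode_system` and `right_hand_side` are hypothetical; node 3 is assumed to sit 5 V above its reference node 1 through an ideal voltage source, and the ground connection is omitted, so the sketch only illustrates assembly, not solution.

```python
def assemble_supernode_system(branches, super_of, nonref_of, n_super, n_nonref):
    """Build G_sup and G_mul from admittance branches (a, b, g), 0-based indices."""
    G_sup = [[0.0] * n_super for _ in range(n_super)]
    G_mul = [[0.0] * n_nonref for _ in range(n_super)]
    for a, b, g in branches:
        sa, sb = super_of[a], super_of[b]
        if sa == sb:
            continue  # branch inside one super node cancels in its current balance
        G_sup[sa][sa] += g
        G_sup[sb][sb] += g
        G_sup[sa][sb] -= g
        G_sup[sb][sa] -= g
        for node, own, other in ((a, sa, sb), (b, sb, sa)):
            j = nonref_of.get(node)
            if j is not None:
                G_mul[own][j] += g    # self-conductance of non-reference node j
                G_mul[other][j] -= g  # mutual conductance with the other super node
    return G_sup, G_mul


def right_hand_side(I_sup, G_mul, U_nonref):
    """Right-hand side of equation (5): I_sup - G_mul U_nonref."""
    return [I_sup[i] - sum(G_mul[i][j] * U_nonref[j] for j in range(len(U_nonref)))
            for i in range(len(I_sup))]


# Super node 0 = {1, 3} (node 3 is non-reference), super node 1 = {2}.
super_of = {1: 0, 3: 0, 2: 1}
nonref_of = {3: 0}
branches = [(1, 2, 1.0), (3, 2, 2.0)]  # (node, node, admittance)
G_sup, G_mul = assemble_supernode_system(branches, super_of, nonref_of, 2, 1)
rhs = right_hand_side([0.0, 0.0], G_mul, [5.0])
```

The signs in G_mul follow the stated rule: positive where the non-reference node belongs to the super node of that row, negative otherwise.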

Before calculating the matrix, the supernodes and the non-reference points need to be numbered again. Taking the model shown in fig. 5 as an example, the non-reference points in the graph are labeled 1',2', the supernodes are renumbered to 1,2, …,5, and the renumbering results of the supernodes and the non-reference points are shown in fig. 6. In the figure, there are 2 super nodes and 2 non-reference points in the model, and the matrix corresponding to equation (5) is as follows:

the matrix (6) is:

the matrix (7) is:

the matrix (8) is:

the matrix (9) is:

This results in a symmetric positive definite external circuit equation set.

Step 440, collecting the processing results of the second parallel coarse grains and coupling them to finally obtain the overall symmetric positive definite equation set of the external circuit.

Step 500, as shown in fig. 4, combining the electric field equation set and the external circuit equation set independently generated by the first parallel coarse grains by using the third parallel coarse grains in a mode of scanning the super node, and coupling the external circuit with the contact point of the very large scale integrated circuit to establish the electric field-circuit coupled symmetrical positive definite equation set. And merging the equation sets by utilizing a third parallel coarse grain scanning super node, finishing scanning if all super nodes are scanned completely, and finally collecting the scanning results of all parallel coarse grains for merging.

In step 500, assume that the external circuit has N_C super nodes, the finite element mesh has N_D nodes, and N_CD nodes are common to both, that is, N_CD grid nodes are each connected to a circuit node. The total number of unknowns after merging is then N_C+N_D-N_CD. The nodes are renumbered before field-circuit coupling.

Specifically, step 500 includes the following steps 510 and 520.

Step 510, according to the numbers of the mesh nodes and the external circuit super nodes independently generated by the first parallel coarse grain, scanning all the super nodes and changing the numbers of the related mesh nodes by using the third parallel coarse grain, and regenerating the numbers of the unified continuous nodes after the scanning is finished.

Step 510 includes the following steps 511 to 513.

Step 511, numbering the super nodes first and the grid nodes after them, wherein the number of a grid node is its initial number plus the number of super nodes. For example, if the number of super nodes is m, the number of each grid node is its initial number plus m.

Step 512, for each node j included in super node i, where the nodes j include the reference node and the non-reference nodes, scanning the grid nodes using the third parallel coarse grains; when a grid node k is connected to node j, renumbering grid node k as j, so that the grid node coincides with the reference node or non-reference node of the super node.

Step 513, changing the number of the last grid node to k and reducing the number of grid nodes by 1.

Continuing with the above assumption as an example, since there are N_C super nodes, the super nodes are numbered first and the number of each grid node is increased by N_C from its original value. For a super node i, the grid nodes are searched according to all the nodes j (reference node and non-reference nodes) that it includes. If a grid node k is connected to circuit node j, grid node k is renumbered as j and the number of the last grid node is changed to k. After all circuit nodes have been scanned, the last grid node is numbered N_C+N_D-N_CD.

As shown in fig. 7, in which there are 16 mesh nodes, the external circuit employs the model shown in fig. 5, and mesh nodes 1, 3, 4 are connected to circuit nodes 6, 7, 5, respectively. Fig. 8 shows the result after the external circuit nodes have been converted to super nodes and the mesh nodes have been renumbered. To merge the circuit nodes and mesh nodes in fig. 8, mesh nodes 8, 9, 6 in fig. 8 are renumbered as 2', 4 and 5, respectively, in fig. 9, while mesh nodes 21, 20, 19 in fig. 8 are renumbered as 8, 9, 6, respectively, in fig. 9. Finally, the total number of unknowns after merging is 5 + 16 - 3 = 18, and the unknowns are the potentials of the super nodes in the circuit and the potentials of the grid nodes. Fig. 9 is a schematic diagram of the numbering of the external circuit, the field, and their coupled nodes in this embodiment, showing the final numbering of the nodes.
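The renumbering of steps 511 to 513 can be sketched as a "move the last grid node into the freed slot" procedure. The function name `couple_numbering` and the data layout are hypothetical; the connection list reproduces the example above, in which original mesh nodes 3, 4, 1 (numbered 8, 9, 6 after the offset) take the circuit numbers 2', 4 and 5.

```python
def couple_numbering(n_super, n_mesh, connections):
    """Return (mesh node -> unified number, number of the last grid node).

    connections -- list of (original mesh node, circuit node label) in scan order
    """
    # Step 511: mesh node k is first numbered k + n_super.
    number = {k: k + n_super for k in range(1, n_mesh + 1)}
    owner = {num: k for k, num in number.items()}  # number -> mesh node
    last = n_super + n_mesh
    for mesh_id, circuit_label in connections:
        freed = number[mesh_id]
        del owner[freed]
        number[mesh_id] = circuit_label      # step 512: take the circuit node's number
        if freed != last:                    # step 513: last grid node fills the gap
            moved = owner.pop(last)
            number[moved] = freed
            owner[freed] = moved
        last -= 1
    return number, last


number, last = couple_numbering(5, 16, [(3, "2'"), (4, 4), (1, 5)])
```

After the scan the highest number in use equals N_C+N_D-N_CD, and the grid nodes that were not connected to the circuit remain numbered consecutively.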

Step 520, combining the electric field equation set and the external circuit equation set corresponding to the first parallel coarse grains according to the unified consecutive node numbers to form a field-circuit coupled, symmetric positive definite unified equation set.

Step 520 includes the following steps 521-523.

Step 521, filling the unknown voltage vector corresponding to the second parallel coarse grains into the unified equation set, wherein the front part of the unknown voltage vector consists of the super-node voltages and the back part consists of the voltages of the grid nodes not connected to external circuit nodes.

Step 522, filling the sparse matrix corresponding to the first parallel coarse grains into the unified equation set, filling the super-node admittance matrix corresponding to the second parallel coarse grains into the sparse matrix, and filling the finite element stiffness matrix into the corresponding positions of the sparse matrix according to the renumbering of the grid nodes.

Step 523, filling the right-hand source vector corresponding to the second parallel coarse grains into the unified equation set: the right-hand side of the electric field equation set is filled into the corresponding positions according to the grid node renumbering obtained by the third parallel coarse grains to form the modified right-hand side, and the right-hand side of the external circuit equation set corresponding to the second parallel coarse grains is merged in front of the modified right-hand side, at positions corresponding to the external circuit node numbers, thereby establishing the field-circuit coupled, symmetric positive definite unified equation set.

According to the renumbered node numbers, the voltage vector may be constructed as follows:

The front part of the voltage vector consists of the circuit super-node voltages, and the back part consists of the voltages of the grid nodes that are not connected to external circuit nodes. Once the matrix G_sup has been formed, the numbers of the super nodes no longer change, so G_sup can be filled directly into the same positions of the sparse matrix. The finite element stiffness matrix obtained from the field equations, however, must be written into the corresponding positions of the sparse matrix according to the renumbering of the nodes. Because the new node numbering places the external circuit nodes first, the right-hand side of the finite element equation set is first filled into the corresponding positions to form the modified right-hand side, and the right-hand side of the external circuit is then merged directly in front of it, at positions corresponding to the external circuit node numbers. This completes the whole field-circuit coupling process.
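The matrix filling described above can be sketched with a dictionary-of-keys sparse matrix. Everything in this example is hypothetical: the name `assemble_unified`, the two-super-node circuit, the three-node chain mesh, and the node map in which mesh node 1 coincides with super node 2. Contributions from non-reference node offsets, which belong on the right-hand side, are omitted for brevity.

```python
def assemble_unified(G_sup, K_entries, number):
    """Merge G_sup and finite element entries into one sparse matrix.

    G_sup     -- dense n x n list for super nodes numbered 1..n
    K_entries -- {(p, q): value} stiffness entries in original mesh numbering
    number    -- original mesh node -> unified 1-based number
    """
    A = {}
    n = len(G_sup)
    # Super node numbers are unchanged, so G_sup goes into the same positions.
    for i in range(n):
        for j in range(n):
            if G_sup[i][j] != 0.0:
                A[(i + 1, j + 1)] = G_sup[i][j]
    # Stiffness entries are scattered according to the renumbered grid nodes;
    # entries landing on a shared node are summed.
    for (p, q), value in K_entries.items():
        key = (number[p], number[q])
        A[key] = A.get(key, 0.0) + value
    return A


G_sup = [[2.0, -1.0], [-1.0, 1.0]]
K_entries = {(1, 1): 1.0, (1, 2): -1.0, (2, 1): -1.0, (2, 2): 2.0,
             (2, 3): -1.0, (3, 2): -1.0, (3, 3): 1.0}
number = {1: 2, 2: 4, 3: 3}  # mesh node 1 shares unified number 2 with super node 2
A = assemble_unified(G_sup, K_entries, number)
```

Because both contributions are symmetric and share one numbering, the merged matrix stays symmetric.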

In the execution process from step 300 to step 500, parallel coarse grain execution operation is adopted. In the process of executing each parallel coarse grain, sequences of all operation tasks executed by the same parallel coarse grain are randomly disturbed to form a new operation task sequence, and all operation tasks executed by the parallel coarse grain are distributed to all processes according to the new operation task sequence to complete parallel operation of the operation tasks.

Specifically, the way of randomly scrambling the operation task sequence is as follows:

First, the sequence of operation tasks is set as List0 = {m}, and a random number sequence {Rm} is generated correspondingly, where m = 1, 2, 3, …, M. Then the sequence {Rm} is sorted in ascending order to give the sorted sequence {Om}. Finally, a new non-repeating operation task sequence List = {Lm} is generated, where Lm is the position of Om in {Rm}.

In this way the sequence List0 of all operation tasks in a parallel coarse grain is randomly shuffled to produce a new non-repeating task sequence List = {L1, L2, …, LM}, and the tasks are then allocated in that order, which is equivalent to allocating the original tasks at random. The benefit of this random allocation strategy is that the allocation order of all operation tasks is thoroughly scrambled, so that the sum of the peak memory occupied by the tasks running simultaneously on each computing node is determined by the number of processes and the average, rather than the maximum, of the peak memory occupied by all models (operation grains).
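The shuffling rule (generate {Rm}, sort it to {Om}, and take Lm as the position of Om in {Rm}) is an argsort of random numbers. A minimal sketch follows; the function name `shuffled_task_sequence` and the `seed` parameter are hypothetical additions for reproducibility.

```python
import random


def shuffled_task_sequence(M, seed=None):
    """Return a random permutation of the task numbers 1..M."""
    rng = random.Random(seed)
    R = [rng.random() for _ in range(M)]           # random sequence {Rm}
    # Sorting the (position, value) pairs by value gives {Om}; the stored
    # position of each sorted value is Lm, converted here to 1-based numbering.
    ordered = sorted(enumerate(R), key=lambda pair: pair[1])
    return [position + 1 for position, _ in ordered]


task_order = shuffled_task_sequence(10, seed=0)
```

Every task number appears exactly once in the result, so the new sequence is non-repeating by construction.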

And the main process distributes all the operation tasks required to be executed by the parallel coarse grains to all the processes including the main process according to the formed new calculation task sequence, and completes the parallel operation of all the operation tasks executed by the parallel coarse grains.

In addition, when an operation task in a parallel coarse grain is allocated to a process, a flag file indicating that the task has been allocated is generated. When another process applies for that task, it tries to generate the flag file of the task; if the flag file already exists, the process automatically applies for the next operation task.

In multi-process parallel operation, every process has an equal chance of being allocated a given operation task. If no measure is taken, several processes may be allocated the same task, wasting computing resources, so each operation task must be allocated to exactly one process. The simplest and most intuitive measure is to mark a task at the moment it is allocated, so that other processes are no longer allocated that task. However, in parallel operation the variables of each process are generally independent of one another, the operation tasks are asymmetric, and the running states of the processes differ, so a mark that one process records in a variable cannot be passed to the other processes immediately. An external, explicit marking method is therefore needed so that all processes can see the mark as soon as a task is marked. Accordingly, when an operation task in the parallel coarse grain is allocated to a process, a flag file for that task is generated immediately; when a process applies for an operation task, it tries to generate the flag file of that task, and if the flag file already exists, the task has already been allocated and the process automatically applies for the next task.

The specific steps for correctly allocating the operation tasks by means of flag files are as follows:

step A1, a process applies for allocation of the i-th operation task;

step A2, judging whether the flag file Fi of the i-th operation task exists; if so, jumping to step A8, and if not, jumping to step A3;

step A3, judging whether the flag file Fi is locked; if so, jumping to step A8, and if not, jumping to step A4;

step A4, locking the flag file Fi;

step A5, generating the flag file Fi;

step A6, unlocking the flag file Fi;

step A7, completing the operation of the i-th operation task;

step A8, judging whether all the operation tasks in the parallel coarse grain are completed; if not, setting i = i + 1 and returning to step A1, and if so, jumping to step A9;

step A9, all the operation tasks to be executed by the parallel coarse grain have been allocated to the processes, and the allocation for this parallel coarse grain ends; the procedure then returns to carry out the operation tasks that each of the other parallel coarse grains needs to perform.

The correct allocation of operation tasks by flag files described above may be implemented using a file marking technique. The file marking technique uses file locking and unlocking, which ensures that only one process at a time can read or write the flag file of a given operation task and prevents several processes from operating on the same file simultaneously and thereby repeating the same operation task. A file read-write lock allows high parallelism: several threads can hold the lock in read mode at the same time, but only one thread can hold it in write mode. The read-write lock has three states:

1. when the read-write lock is in a write-locked state, all threads attempting to lock the lock are blocked before the lock is unlocked;

2. when the read-write lock is in a read-locking state, all threads trying to lock it in a read mode can get access, but the threads locking it in a write mode will be blocked;

3. when the read-write lock is in the lock state of the read mode, if another thread tries to lock in the write mode, the read-write lock usually blocks the subsequent request of the read mode lock, so that the long-term occupation of the read mode lock can be avoided, and the long-term blocking of the waiting request of the write mode lock can be avoided.
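Steps A1 to A9 can be sketched as follows. Note the swapped-in technique: instead of the explicit lock, create, unlock sequence of steps A3 to A6, this sketch relies on atomic exclusive file creation (`os.O_CREAT | os.O_EXCL`), which lets exactly one process succeed in creating a given flag file. The function names, the flag file naming scheme, and the directory layout are hypothetical.

```python
import os
import tempfile


def try_claim(flag_dir, task_id):
    """Try to create the flag file of a task; return True if this process won it."""
    path = os.path.join(flag_dir, f"task_{task_id}.flag")
    try:
        # O_EXCL makes creation fail if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # already allocated to another process
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))
    return True


def run_tasks(flag_dir, task_order, work):
    """Walk the task sequence, running only the tasks this process claims."""
    done = []
    for task_id in task_order:
        if try_claim(flag_dir, task_id):
            work(task_id)
            done.append(task_id)
    return done


def demo():
    """Claim task 1 up front, then let a worker walk tasks 1 to 3."""
    with tempfile.TemporaryDirectory() as flag_dir:
        first = try_claim(flag_dir, 1)
        second = try_claim(flag_dir, 1)
        remaining = run_tasks(flag_dir, [1, 2, 3], lambda task_id: None)
    return first, second, remaining
```

Each worker process would call `run_tasks` with the same shuffled task sequence and a flag directory visible to all processes, so no inter-process messaging is needed to avoid duplicate work.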

In the execution of steps 300, 400, and 500, three parallel coarse grains are used. Suppose these are c2, c3, and c4, respectively, and that there are a preprocessing step executed before step 300 and a post-processing step (such as merging and sorting) executed after step 500, where the preprocessing step corresponds to the operation c1, which does not require parallel operation, and the post-processing step corresponds to the operation c5, which does not require parallel operation either. The main process executes c1, and then all processes execute in parallel all the operation tasks of the parallel coarse grain c2. The main process then moves on to the next parallel coarse grain c3; in the same way as for c2, all the operation tasks of c3 and then c4 are executed in parallel by all processes, which completes all the parallel coarse grains. Finally, c5 is executed and the whole field-circuit coupling program ends.

In the field-circuit coupling program, the memory required differs greatly between models; for example, the smallest peak memory required to run a model is about 8 GB, while the largest exceeds 20 GB. If each node of a cluster has 48 GB of memory and the cluster runs second-order finite element computations in parallel, 6 processes can be started simultaneously for the simplest model, but only 2 for the most complex model; otherwise the system uses part of the hard disk space as virtual memory for the program. The read-write speed of a current HDD is about 80 MB/s, whereas physical memory is more than one hundred times faster; for example, DDR3 1333 MHz server memory has a data transfer rate of 10.6 GB/s. This comparison shows that if too many processes are started for parallel operation, part of the hard disk is accessed as virtual memory during the computation, which slows the program down by more than a factor of one hundred. To avoid this, the maximum memory that each process may need during operation must be considered when processes are started, and the maximum number of processes per node is determined from that maximum memory. With an ordinary coarse-grain parallel scheme, at most 2 processes can be started on each node. Experimental results show that with this embodiment 4 processes can be started on each node, the memory utilization stays above 80% for long periods, and no hard disk space is used as virtual memory, so the memory usage peaks are essentially averaged and staggered in time.
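The process counts and the speed ratio quoted above follow from simple arithmetic, sketched below. The function name `max_processes` is hypothetical, 20 GB is used as a stand-in for "more than 20 GB", and 1 GB is taken as 1000 MB for the bandwidth comparison.

```python
def max_processes(node_memory_gb, peak_memory_gb):
    """Largest number of processes whose combined peak memory fits in one node."""
    return int(node_memory_gb // peak_memory_gb)


simplest = max_processes(48, 8)        # every process runs the lightest model
most_complex = max_processes(48, 20)   # every process runs the heaviest model

# Memory bandwidth (10.6 GB/s) relative to HDD throughput (80 MB/s).
speed_ratio = 10.6 * 1000 / 80
```

Sizing for the worst case caps each node at the `most_complex` value, which is the limit the random allocation strategy is meant to relax.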

The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A multi-layer VLSI field coupling method with multi-process adaptive distribution is characterized by comprising the following steps:

2. The method of claim 1, wherein the step 400 comprises:

step 430, filling an external circuit of the integrated circuit containing the voltage source branch with a super-node voltage vector, a super-node current vector, a non-reference node voltage vector, a mutual conductance matrix of the super-node and the non-reference node and a super-node admittance matrix, and generating an external circuit equation set of the super-node voltage vector, wherein the external circuit equation set contains the super-node voltage vector, the super-node current vector, the non-reference node voltage vector, the mutual conductance matrix of the super-node and the non-reference node and the super-node admittance matrix;

3. The method of claim 2, wherein the step 410 comprises:

4. A method according to claim 2 or 3, wherein in step 430:

the length of the voltage vector of the non-reference nodes is the number of all non-reference nodes, and its i-th element P_i is the potential of non-reference node i relative to its reference node, which is the sum of the voltages of all ideal voltage source branches on the path from non-reference node i to its reference node;

the rows and columns of the super-node admittance matrix correspond to super nodes; the i-th diagonal element Pd_i of the super-node admittance matrix is the self-conductance of the i-th super node, whose value is the sum of the admittances of all branches connected to the i-th super node; every off-diagonal element is the mutual conductance of the i-th and j-th super nodes, whose value is the negative of the sum of the admittances of all branches connecting the i-th and j-th super nodes.

5. The method of any of claims 1 to 4, wherein the step 500 comprises:

6. The method of claim 5, wherein said step 510 comprises:

7. The method of claim 5 or 6, wherein the step 520 comprises:

8. The method of any of claims 1 to 7, wherein said determining parallel coarse grains from a ratio of the weighted CPU time to the total CPU time comprises:

9. The method of claim 8, wherein said randomly shuffling a sequence of all arithmetic tasks performed by the same said parallel coarse grain to form a new sequence of arithmetic tasks comprises:

sorting the sequence {Rm} in ascending order, the sorted sequence being {Om};

a new non-repeating operation task sequence List is generated { Lm }.

10. The method of claim 8, wherein if an operation task in the parallel coarse grain is allocated to a process, generating a flag file for the operation task indicating that the operation task has been allocated; when applying for distributing a certain operation task, another process tries to generate a mark file of the operation task, and automatically applies for distributing the next operation task when the mark file exists.