CN113128149A - Power consumption-based netlist partitioning method for multi-die FPGA - Google Patents


Info

Publication number
CN113128149A
CN113128149A
Authority
CN
China
Prior art keywords
netlist
sub
fpga
module
modules
Prior art date
Legal status
Granted
Application number
CN202110428920.3A
Other languages
Chinese (zh)
Other versions
CN113128149B (en)
Inventor
杜学军
惠锋
李卿
王新晨
刘佩
Current Assignee
Wuxi Zhongwei Yixin Co Ltd
Original Assignee
Wuxi Zhongwei Yixin Co Ltd
Priority date
Filing date
Publication date
Application filed by Wuxi Zhongwei Yixin Co Ltd
Priority to CN202110428920.3A
Publication of CN113128149A
Application granted
Publication of CN113128149B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/30 Circuit design
    • G06F30/34 Circuit design for reconfigurable circuits, e.g. field programmable gate arrays [FPGA] or programmable logic devices [PLD]
    • G06F30/347 Physical level, e.g. placement or routing

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Design And Manufacture Of Integrated Circuits (AREA)

Abstract

The invention discloses a power consumption-based netlist partitioning method for a multi-die FPGA, relating to the technical field of FPGAs. The method first estimates the power consumption value of every instance module in the user input netlist to be partitioned, then assigns the high-power instance modules first, distributing them across the sub-netlists according to a power-balancing principle. The power consumption of the whole multi-die FPGA is thereby spread evenly across the FPGA dies, which effectively prevents an excessive local load on the FPGA's power supply and heat dissipation from disturbing normal operation; the resulting partition is more reasonable and the multi-die FPGA achieves better working performance.

Description

Power consumption-based netlist partitioning method for multi-die FPGA
Technical Field
The invention relates to the technical field of FPGAs (field programmable gate arrays), and in particular to a power consumption-based netlist partitioning method for a multi-die FPGA.
Background
An FPGA (Field Programmable Gate Array) is a hardware-programmable logic device. Besides the fields of mobile communication and data centers, it is widely used for prototype verification in integrated circuit design, where it efficiently verifies the correctness of circuit functions and accelerates circuit design.
As integrated circuits grow in scale and implement increasingly complex functions, the demand for programmable logic resources in FPGAs keeps rising. To avoid the higher processing difficulty and lower production yield that come with a larger chip area, multiple FPGA dies are interconnected using stacked silicon interconnect (SSI), CoWoS (Chip on Wafer on Substrate) or other technologies to form a multi-die FPGA, and the required circuit structure is implemented with the logic resources on the several FPGA dies.
This, however, challenges the placement and routing of the multi-die FPGA, so arranging a complex circuit reasonably over multiple dies to obtain better performance is a key issue in the multi-die FPGA design flow. Before placement and routing, the user input netlist corresponding to the multi-die FPGA must first be partitioned into several connected sub-netlists, each corresponding to one FPGA die; each FPGA die is then placed and routed according to its sub-netlist. The method used to partition the user input netlist therefore directly affects the placement and routing of the multi-die FPGA as well as its final performance. At present, partitioning of the user input netlist only considers whether the logic resources contained in an FPGA die match the requirements of a sub-netlist, that is, it only requires the corresponding FPGA die to satisfy the sub-netlist's logic resource demand, and this approach can hardly guarantee the performance of the multi-die FPGA.
Disclosure of Invention
In view of the above problems and technical requirements, the invention provides a power consumption-based netlist partitioning method for a multi-die FPGA. The technical solution of the invention is as follows:
A power consumption-based netlist partitioning method for a multi-die FPGA, the method comprising:
determining, according to the logic resources contained in each FPGA die of the multi-die FPGA, the types and maximum numbers of instance modules that the corresponding sub-netlist can accommodate;
acquiring a user input netlist and sequentially assigning the high-power instance modules, namely the instance modules whose estimated power value reaches a preset threshold, to corresponding target sub-netlists, wherein a target sub-netlist can still accommodate an instance module of the corresponding type and, after the high-power instance module is assigned to it, the difference between the largest and the smallest netlist power value among all sub-netlists is minimized; the netlist power value of a sub-netlist is the power consumed by the corresponding FPGA die when laid out with the instance modules already assigned to that sub-netlist;
assigning the remaining, not yet assigned instance modules of the user input netlist to the sub-netlists according to a preset assignment logic, based on the instance modules each sub-netlist already holds;
obtaining by partitioning a sub-netlist for each FPGA die, each sub-netlist comprising all the instance modules assigned to it and the nets connecting those instance modules;
wherein the number of logic resources on each FPGA die satisfies the logic resource requirement of the corresponding sub-netlist, the input signal connection point terminals on each FPGA die satisfy the number of input signals of the corresponding sub-netlist, the output signal connection point terminals on each FPGA die satisfy the number of output signals of the corresponding sub-netlist, and the FPGA dies connected to the IO pins of the multi-die FPGA satisfy the IO port requirements of the corresponding sub-netlists.
In a further technical solution, the method further comprises: assigning the IO ports in the user input netlist to pins of the multi-die FPGA as IO pins;
and, when a high-power instance module is assigned to its target sub-netlist, if at least two sub-netlists can accommodate an instance module of the corresponding type and each minimizes the power difference among all sub-netlists, taking the sub-netlist whose FPGA die is connected to an IO pin associated with the high-power instance module as the target sub-netlist of that module.
In a further technical solution, assigning the IO ports in the user input netlist to pins of the multi-die FPGA comprises:
assigning at least one IO port to a corresponding pin of the multi-die FPGA according to a user instruction, or in an arbitrary order, or according to an automatic IO placement algorithm.
In a further technical solution, the estimated power value of an instance module is its maximum power consumption over different working modes and/or working conditions.
In a further technical solution, after the high-power instance modules have been assigned, the instance modules of the user input netlist that remain unassigned comprise pre-configured special-function instance modules and basic-function instance modules; then:
when the unassigned instance modules are distributed to the sub-netlists according to the preset assignment logic, the special-function instance modules are assigned to the sub-netlists one by one in a predetermined processing order, and the remaining basic-function instance modules are assigned afterwards.
In a further technical solution, when each special-function instance module is assigned to a sub-netlist:
the special-function instance module is assigned to a sub-netlist that already contains an instance module connected to it;
if no sub-netlist contains an instance module connected to the special-function instance module, it is assigned to the sub-netlist that minimizes the power difference among all sub-netlists.
In a further technical solution, a sub-netlist is determined to contain an instance module connected to the special-function instance module when at least one instance module in that sub-netlist is directly connected to the special-function instance module, or indirectly connected to it through several other instance modules on the same data path.
In a further technical solution, assigning the remaining basic-function instance modules to the sub-netlists comprises, for each basic-function instance module:
computing, according to a first objective function, a score for assigning the basic-function instance module to each sub-netlist able to accommodate it, and assigning the module to the sub-netlist with the highest score; the first objective function is constructed from at least one chip performance parameter, and the better the chip performance parameter achieved by assigning the basic-function instance module to a sub-netlist, the higher the corresponding score.
In a further technical solution, the first objective functions corresponding to different basic-function instance modules are the same or different.
In a further technical solution, the method further comprises:
before the high-power instance modules are assigned, assigning predetermined specific instance modules to specific sub-netlists according to a user instruction.
In a further technical solution, the method further comprises:
after all instance modules have been assigned to the sub-netlists, obtaining an initial assignment result; while keeping the assignment of predetermined fixed instance modules unchanged, adjusting the assignment of the instance modules to be optimized according to an optimization target so as to optimize the initial assignment result, and finally obtaining by partitioning the sub-netlist corresponding to each FPGA die, wherein the instance modules to be optimized are the instance modules other than the fixed instance modules.
In a further technical solution, adjusting the assignment of the instance modules to be optimized according to the optimization target comprises:
computing, according to a second objective function, a score for assigning each instance module to be optimized to each sub-netlist able to accommodate it, the second objective function being constructed from at least one chip performance parameter corresponding to the optimization target, where the better the chip performance parameter achieved by assigning the module to a sub-netlist, the higher the score;
reassigning the instance module to be optimized with the highest score to its highest-scoring sub-netlist;
updating the scores of the other instance modules to be optimized according to the second objective function, and repeating the reassignment step until all instance modules to be optimized have been reassigned.
In a further technical solution, the optimization target comprises at least one of the post-layout power consumption of the multi-die FPGA, the timing margin, the number of signals crossing FPGA dies, and the clock parameters.
In a further technical solution, when the optimization target includes power consumption, it comprises keeping the maximum power of each FPGA die within a first predetermined range and/or minimizing the total power of all FPGA dies;
when the optimization target includes timing margin, it comprises keeping the timing margins of all first predetermined critical paths in the user input netlist within a second predetermined range;
when the optimization target includes the number of signals crossing FPGA dies, it comprises: minimizing the total number of signals crossing FPGA dies, and/or keeping the maximum number of die crossings of every second predetermined critical path in the user input netlist within a third predetermined range, and/or minimizing the total number of die crossings of all third predetermined critical paths in the user input netlist;
when the optimization target includes clock parameters, it comprises minimizing the number of FPGA dies spanned by the clock tree used to achieve the highest frequency.
In a further technical solution, at least one FPGA die of the multi-die FPGA has a number of logic resources different from the other FPGA dies, and/or at least one FPGA die has types of logic resources different from the other FPGA dies.
The beneficial technical effects of the invention are as follows:
During partitioning, the method first estimates the power value of every instance module in the user input netlist and then assigns the high-power instance modules first, distributing them across the sub-netlists according to a power-balancing principle. The power consumption of the whole multi-die FPGA is thereby spread evenly across the FPGA dies, effectively preventing an excessive local load on the FPGA's power supply and heat dissipation from disturbing normal operation; the resulting partition is more reasonable and the multi-die FPGA achieves better working performance.
Although some multi-FPGA systems (built from several FPGA chips) also involve netlist partitioning, the FPGA chips in such systems are connected through their IO, so the number of cross-partition signals is limited by the IO count of a single FPGA chip (at most about 2000), and the cross-partition-signal constraint is hard to satisfy when partitioning the netlist. The FPGA dies in a multi-die FPGA, by contrast, are connected through silicon stacking connection points, so the number of cross-partition signals is limited by the number of silicon stacking connection points of a single FPGA die, which is far larger than the IO count of an FPGA chip (generally about 40000). The cross-partition-signal constraint is therefore easy to satisfy when partitioning the netlist, leaving headroom to consider the remaining constraints, which is why the method disclosed in this application can be realized and practically applied in the multi-die FPGA scenario.
Drawings
FIG. 1 is a method flow diagram of one embodiment of a netlist partitioning method as disclosed herein.
Fig. 2 is a schematic diagram of a partial structure of a multi-die FPGA to which the present application is directed.
FIG. 3 is a method flow diagram of another embodiment of a netlist partitioning method as disclosed herein.
FIG. 4 is a method flow diagram of yet another embodiment of a netlist partitioning method as disclosed herein.
Detailed Description
The following further describes the embodiments of the present invention with reference to the drawings.
The application discloses a power consumption-based netlist partitioning method for a multi-die FPGA, which comprises the following steps (refer to FIG. 1):
Step S1: determining, according to the logic resources contained in each FPGA die of the multi-die FPGA, the types and maximum numbers of instance modules that the corresponding sub-netlist can accommodate.
Referring to FIG. 2, the partial structure of a multi-die FPGA mainly comprises, stacked from bottom to top, a substrate 1, a silicon connection layer 2 and several FPGA dies (FIG. 2 shows FPGA die 1 and FPGA die 2), the silicon connection layer 2 covering all the FPGA dies. Each FPGA die contains logic resources of several types and quantities, including at least one of IOB, LUT, REG, DSP, BRAM, Clock, CMT and GTP, and different FPGA dies contain the same or different types and/or quantities of logic resources. In the present application, at least one FPGA die of the multi-die FPGA has a number of logic resources different from the other FPGA dies, and/or at least one FPGA die has types of logic resources different from the other FPGA dies.
Each FPGA die also comprises several input signal connection points for receiving signals and several output signal connection points for sending signals. These connection points are implemented by silicon stacking connection points 3 arranged on the FPGA die, and the logic resources inside the FPGA die can be connected, directly or indirectly, to the input and/or output signal connection points for signal input and output. Different FPGA dies are connected at their input and/or output signal connection points by cross-die connection lines 4 in the silicon connection layer 2, forming a signal cascade structure between the FPGA dies. The silicon connection layer 2 is stacked on the substrate 1; specifically, micro bumps are grown on the side of the silicon connection layer 2 facing away from the FPGA dies, and the silicon connection layer 2 is attached to the substrate 1 through these micro bumps. The silicon connection layer 2 further contains through-silicon vias 5, through which the corresponding input and/or output signal connection points of an FPGA die are connected to the substrate 1 and then to the package pins connected to the substrate for signal extraction.
The netlist partitioning method partitions the user input netlist of the whole multi-die FPGA into several sub-netlists, one per FPGA die, so that each FPGA die can be placed and routed according to its sub-netlist and the routing of the whole multi-die FPGA completed. The user input netlist comprises several instance modules and the nets between them; common instance modules include GTP, lookup tables, registers, PCIE, EMAC, CMT, BRAM, DSP, IOB and the like, and the types of instance modules contained in the user input netlist do not exceed the types of logic resources in the multi-die FPGA. Each sub-netlist obtained by partitioning likewise comprises several instance modules and the nets between them; since the corresponding FPGA die must implement the function of its sub-netlist when it is placed and routed, the most basic requirement of netlist partitioning is that the FPGA die can satisfy the logic resource requirement of its sub-netlist. This mainly covers two aspects: the instance modules of the sub-netlist must be implemented with the logic resources inside the FPGA die, and the signal connections of the sub-netlist must be implemented with the input and/or output signal connection points on the FPGA die. In addition there is a third aspect: the IO ports of the sub-netlist must be implemented with the pins connected to the FPGA die.
Therefore, the application first determines the constraint conditions of the sub-netlist corresponding to each FPGA die, which must be respected throughout the subsequent partitioning: the number of logic resources on each FPGA die satisfies the logic resource requirement of the corresponding sub-netlist, the input signal connection point terminals on each FPGA die satisfy the number of input signals of the corresponding sub-netlist, the output signal connection point terminals on each FPGA die satisfy the number of output signals of the corresponding sub-netlist, and the FPGA dies connected to the IO pins of the multi-die FPGA satisfy the IO port requirements of the corresponding sub-netlists. The requirement that the logic resources on an FPGA die satisfy the needs of its sub-netlist mainly means the following: the application determines, from the logic resources contained in each FPGA die, the types and maximum numbers of instance modules that the corresponding sub-netlist can accommodate; the types of instance modules a sub-netlist can hold do not exceed the types of logic resources of its FPGA die, and the maximum number of each type is less than or equal to the number of logic resources of that type on the die. For example, it may be determined that a sub-netlist can contain at most one GTP, at most 2 PCIE and at most 3 EMAC. In practice the maximum number of each type of instance module may be derived from a configured module usage rate; for instance, with a usage rate of 80%, the maximum number of each type is 80% of the number of logic resources of that type on the corresponding FPGA die.
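As an illustration of step S1, the sketch below derives the admissible instance-module types and maximum counts of each sub-netlist from the logic resources of its die. It is a minimal sketch, assuming the resources are given as per-type counts; the resource dictionaries, the 80% usage rate and the never-below-one rounding are illustrative choices, not values taken from the patent.

```python
# Minimal sketch of step S1 (capacity of each sub-netlist), assuming each die's
# logic resources are given as a {resource_type: count} dictionary. Numbers are
# hypothetical; the rounding policy is a design choice.
def sub_netlist_capacity(die_resources, usage_rate=0.8):
    # Cap each type at usage_rate of the die's resources, but never below one
    # for a type the die actually provides.
    return {rtype: max(1, int(count * usage_rate))
            for rtype, count in die_resources.items() if count > 0}

die_resources = [
    {"GTP": 1, "PCIE": 2, "EMAC": 3, "LUT": 50000, "REG": 100000, "DSP": 600},
    {"PCIE": 1, "EMAC": 2, "LUT": 60000, "REG": 120000, "BRAM": 300},
]
capacities = [sub_netlist_capacity(r) for r in die_resources]
print(capacities[0]["GTP"], capacities[0]["PCIE"], capacities[0]["EMAC"])  # 1 1 2
```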
Step S2: acquiring the user input netlist, which targets the whole multi-die FPGA; its total logic resource requirement exceeds the number of logic resources on any single FPGA die but is less than or equal to the sum of the logic resources of all FPGA dies.
All high-power instance modules in the user input netlist are assigned first. A high-power instance module is an instance module whose estimated power value reaches a preset threshold; since power is usually not estimated for lookup tables and registers, the high-power instance modules are the instance modules, other than lookup tables and registers, whose estimated power value reaches the preset threshold. The estimated power value of each instance module can be obtained from empirical values, circuit simulation results or actual measurements. Some instance modules have several working modes and/or working conditions with different power consumption in each; for example, GTP has at least three modes: a GTP standalone mode, a mode in which GTP cooperates with PCIE, and a mode in which GTP cooperates with EMAC. The maximum power value of an instance module over its different working modes and/or working conditions is taken as its estimated power value.
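The power-estimation convention above can be captured by a small table lookup; this is a sketch under the assumption that per-mode power values are available from empirical data, simulation or measurement, and the wattages and threshold below are placeholders rather than figures from the patent.

```python
# Sketch: estimated power of an instance module = maximum over its working
# modes/conditions. All wattages and the threshold are hypothetical.
POWER_TABLE = {
    "GTP":  {"standalone": 0.9, "with_PCIE": 1.2, "with_EMAC": 1.1},
    "PCIE": {"gen2": 0.6, "gen3": 0.8},
    "BRAM": {"default": 0.05},
}
HIGH_POWER_THRESHOLD = 0.5  # preset threshold (illustrative)

def estimated_power(module_type):
    modes = POWER_TABLE.get(module_type, {"default": 0.0})
    return max(modes.values())

def is_high_power(module_type):
    # Lookup tables and registers are not power-estimated, hence never high-power.
    if module_type in ("LUT", "REG"):
        return False
    return estimated_power(module_type) >= HIGH_POWER_THRESHOLD
```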
When the high-power instance modules are assigned in sequence, the order is mainly one of the following two: (1) all instance modules of the user input netlist are read in sequence and the high-power instance modules are assigned in reading order, i.e. each high-power instance module is assigned as it is read in; (2) after all instance modules of the user input netlist have been read in, the high-power instance modules are sorted by a preset rule and assigned in that order. The preset rule is chosen as needed; for example, the high-power instance modules can be sorted by estimated power value from largest to smallest and assigned in that order.
When a high-power instance module is assigned, it is placed into its target sub-netlist, which is the one sub-netlist among all sub-netlists that satisfies both of the following conditions: (1) as described above, each sub-netlist can only accommodate specific types and numbers of instance modules; if a sub-netlist cannot hold the type of the high-power instance module, or already holds the maximum number of that type, the module can no longer be assigned to it, so the target sub-netlist must still be able to accommodate an instance module of the corresponding type, i.e. it can hold the module's type and has not yet reached the maximum number for that type. (2) After the high-power instance module is assigned to the target sub-netlist, the difference between the largest and the smallest netlist power value among all sub-netlists is minimized, where the netlist power value of a sub-netlist is the power consumed by the corresponding FPGA die when laid out with the instance modules already assigned to that sub-netlist; this power value may likewise be an empirical, simulated or measured value.
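A minimal sketch of this assignment rule follows, assuming per-sub-netlist capacity and usage counters as in the earlier sketch; the data structures and helper names are illustrative, not the patent's own implementation. Ties are broken by traversal order here, which corresponds to selection principle I below; the IO-pin refinement of principle II is sketched after that discussion.

```python
# Sketch of assigning one high-power instance module to its target sub-netlist:
# among sub-netlists that can still accommodate the module's type, pick the one
# that minimizes the difference between the largest and smallest netlist power
# values after the assignment.
def power_spread(netlist_power):
    return max(netlist_power) - min(netlist_power)

def assign_high_power_module(module_type, module_power,
                             capacities, used, netlist_power):
    best, best_spread = None, None
    for i, cap in enumerate(capacities):           # traversal in a preset order
        # Condition (1): the sub-netlist can still hold this module type.
        if used[i].get(module_type, 0) >= cap.get(module_type, 0):
            continue
        # Condition (2): max-min power difference after a trial assignment.
        trial = list(netlist_power)
        trial[i] += module_power
        spread = power_spread(trial)
        if best_spread is None or spread < best_spread:
            best, best_spread = i, spread          # first best wins on a tie
    if best is None:
        raise ValueError("no sub-netlist can accommodate this module type")
    used[best][module_type] = used[best].get(module_type, 0) + 1
    netlist_power[best] += module_power
    return best
```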
When a high-power instance module is assigned, at least two sub-netlists may both be able to accommodate an instance module of the corresponding type and both minimize the power difference among all sub-netlists, i.e. both satisfy the two conditions for being a target sub-netlist; one of them must then be chosen as the target sub-netlist of the high-power instance module. The application offers the following two selection principles; refer to the flowchart shown in FIG. 3:
Selection principle I: the actual assignment of each high-power instance module usually traverses the sub-netlists in a preset order. For each traversed sub-netlist it checks whether the sub-netlist can accommodate the high-power instance module and computes the power difference among the sub-netlists that would result from assigning the module to it, then moves on to the next sub-netlist. After all sub-netlists have been traversed, the sub-netlist satisfying both conditions is selected as the target sub-netlist of the high-power instance module. When at least two sub-netlists can accommodate an instance module of the corresponding type and minimize the power difference among all sub-netlists, the sub-netlist encountered earliest in the traversal order is selected as the target sub-netlist.
Selection principle II: this can be regarded as a refinement of principle I. The user input netlist contains IO ports, which must be implemented with pins of the multi-die FPGA; before the user input netlist is partitioned, its IO ports are first assigned to corresponding pins of the multi-die FPGA as IO pins. The IO ports are assigned as follows: at least one IO port is assigned to a corresponding pin of the multi-die FPGA according to a user instruction, or in an arbitrary order, or according to an automatic IO placement algorithm.
After the IO ports have been assigned, each assigned IO pin has, from the hardware structure of the multi-die FPGA, an inherent wiring relationship with certain FPGA dies; in circuit logic, the IO port that the IO pin implements is connected to certain instance modules of the user input netlist, so the IO pin can be regarded as associated with those instance modules. An IO pin associated with an instance module is defined as a pin assigned to an IO port that has a circuit connection to that instance module, where the connection may be direct or indirect through several other instance modules on the same data path.
Selection principle II consults the assigned IO pins and chooses the target sub-netlist by considering both the hardware association of the IO pins with the FPGA dies and their circuit-logic association with the instance modules, namely: when a high-power instance module is assigned to its target sub-netlist, if at least two sub-netlists can accommodate an instance module of the corresponding type and each minimizes the power difference among all sub-netlists, the sub-netlist whose FPGA die is connected to an IO pin associated with the high-power instance module is taken as the target sub-netlist; if several such sub-netlists exist, the one whose associated IO pin has the closest circuit connection to the module, and hence the strongest association, is preferred.
For example, assume GTP is connected to an IO port, that IO port is assigned to IO pin 1, FPGA die 1 is connected to IO pin 1 and FPGA die 2 is not. When GTP is assigned and it is determined that both sub-netlist 1 (corresponding to FPGA die 1) and sub-netlist 2 (corresponding to FPGA die 2) can accommodate an instance module of the corresponding type and both minimize the power difference among all sub-netlists, GTP is assigned to sub-netlist 1.
Note that, in actual operation, it is first checked whether at least one candidate sub-netlist corresponds to an FPGA die connected to an IO pin associated with the high-power instance module; if so, the selection is made with principle II based on the IO pins, and if none of the FPGA dies of the candidate sub-netlists is connected to such an IO pin, the selection continues with principle I in traversal order.
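The tie-break of selection principle II can be layered on top of the previous sketch as shown below; `associated_dies(module)` is an assumed helper returning the die indices wired to an IO pin associated with the module, and `candidates` are the sub-netlist indices, in traversal order, that tie on the two conditions above.

```python
# Sketch of selection principle II: prefer a candidate whose die is connected to
# an IO pin associated with the high-power module, else fall back to traversal
# order (principle I). `associated_dies` is a hypothetical helper.
def pick_target(candidates, module, associated_dies):
    preferred = [i for i in candidates if i in associated_dies(module)]
    return preferred[0] if preferred else candidates[0]
```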
With the above steps, all high-power instance modules are assigned. Under the assignment principle of this application they can be regarded as distributed over the sub-netlists in a power-balanced way, so the power consumption of the multi-die FPGA is spread evenly across the FPGA dies, effectively preventing an excessive local load on the FPGA's power supply and heat dissipation from disturbing normal operation.
The method can also handle a user's special requirements for certain instance modules, which is usually done before the high-power instance modules are assigned; optionally, the method therefore further comprises assigning predetermined specific instance modules to specific sub-netlists according to a user instruction before the high-power instance modules are assigned.
Step S3: according to the instance modules each sub-netlist already holds, assigning the instance modules of the user input netlist that are still unassigned to the sub-netlists according to a preset assignment logic. The application divides the remaining unassigned instance modules into special-function instance modules and basic-function instance modules and assigns them in turn: the special-function instance modules are assigned to the sub-netlists one by one in a predetermined processing order, and the remaining basic-function instance modules are assigned afterwards.
(1) Assignment of special-function instance modules.
A special-function instance module in this application is an instance module that implements some personalized function; among the instance modules not yet assigned, it mainly covers those other than lookup tables, registers, BRAM, DSP and IOB, commonly PCIE, EMAC and CMT.
As with the high-power instance modules, when the special-function instance modules are assigned in sequence, the order is mainly one of the following two: they are assigned in the order in which they appear in the user input netlist, or they are first sorted as needed and then assigned in the sorted order; the sorting can follow various principles, for example a certain performance parameter of the special-function instance modules. In practice the special-function instance modules may be assigned in at least two different orders in turn to find the best assignment.
Each special-function instance module is assigned according to its circuit connections to the already-assigned instance modules, namely: the special-function instance module is assigned to a sub-netlist that already contains an instance module connected to it and that can accommodate it, where the already-assigned instance modules connected to it include the assigned specific instance modules, the high-power instance modules and the special-function instance modules assigned earlier. The connection may be direct or indirect, so a sub-netlist is determined to contain an instance module connected to the special-function instance module when at least one of its instance modules is directly connected to the special-function instance module or indirectly connected to it through several other instance modules on the same data path. If at least two sub-netlists contain an instance module connected to the special-function instance module, the module is assigned to the sub-netlist containing the instance module with the closest connection to it.
If none of the sub-netlists able to accommodate the special-function instance module contains an instance module connected to it, the module is assigned to the sub-netlist that minimizes the power difference among all sub-netlists; the procedure is similar to assigning a high-power instance module to its target sub-netlist and is not repeated here.
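A sketch of this placement rule is given below; `connected`, `can_accommodate` and `power_balanced_choice` are assumed helpers standing in for the connectivity test, the capacity check and the power-balancing fallback described above.

```python
# Sketch of special-function module placement: prefer a sub-netlist that already
# holds a module connected (directly, or indirectly along the same data path) to
# the one being placed; otherwise fall back to the power-balanced choice.
def assign_special_module(module, sub_netlists, connected,
                          can_accommodate, power_balanced_choice):
    candidates = [i for i in range(len(sub_netlists)) if can_accommodate(i, module)]
    related = [i for i in candidates
               if any(connected(module, placed) for placed in sub_netlists[i])]
    if related:
        target = related[0]   # a closeness ranking could further order these
    else:
        target = power_balanced_choice(candidates, module)
    sub_netlists[target].append(module)
    return target
```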
(2) Assignment of basic-function instance modules.
The basic-function instance modules in this application are instance modules that implement basic functions, such as lookup tables, registers, BRAM, DSP and IOB. Each basic-function instance module is assigned as follows: a score is computed, according to a first objective function, for assigning the module to each sub-netlist able to accommodate it, and the module is assigned to the sub-netlist with the highest score. The first objective function is constructed from at least one chip performance parameter, and the better the chip performance parameter achieved by assigning the module to a sub-netlist, the higher the score. The chip performance parameters include at least one of the power consumption of a single FPGA die, the power consumption of the multi-die FPGA, the timing margin, the number of signals crossing FPGA dies and the clock parameters. For example, if the first objective function is built from the power consumption of a single FPGA die, then the lower the single-die power resulting from assigning the module to a sub-netlist, the higher the score, so the achieved effect is to assign the basic-function instance module to the sub-netlist that minimizes the single-die power. The first objective functions of different basic-function instance modules are the same or different.
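The scoring step can be sketched as below, using single-die power as the example chip performance parameter; any other parameter, or a weighted combination, could be substituted as the objective. The helper names and the reuse of the capacity and power bookkeeping from the earlier sketches are assumptions for illustration.

```python
# Sketch of basic-function module placement driven by a first objective function.
# Here a candidate sub-netlist scores higher when the resulting single-die power
# is lower; the objective is pluggable.
def score_single_die_power(netlist_power, die_index, module_power):
    return -(netlist_power[die_index] + module_power)  # lower power, higher score

def assign_basic_module(module_type, module_power, capacities, used,
                        netlist_power, objective=score_single_die_power):
    candidates = [i for i, cap in enumerate(capacities)
                  if used[i].get(module_type, 0) < cap.get(module_type, 0)]
    if not candidates:
        raise ValueError("no sub-netlist can accommodate this module type")
    best = max(candidates, key=lambda i: objective(netlist_power, i, module_power))
    used[best][module_type] = used[best].get(module_type, 0) + 1
    netlist_power[best] += module_power
    return best
```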
Note that for both special-function and basic-function instance modules, a module can only be assigned to a sub-netlist that can accommodate an instance module of the corresponding type; the meaning is the same as for the high-power instance modules and is not repeated here.
Step S4: obtaining by partitioning the sub-netlist corresponding to each FPGA die, each sub-netlist comprising all the instance modules assigned to it and the nets connecting them, and then placing and routing each FPGA die according to its sub-netlist to complete the placement and routing of the whole multi-die FPGA. Moreover, with the netlist partitioning method of this application, the number of logic resources on each FPGA die satisfies the logic resource requirement of the corresponding sub-netlist, the input signal connection point terminals on each FPGA die satisfy the number of input signals of the corresponding sub-netlist, the output signal connection point terminals on each FPGA die satisfy the number of output signals of the corresponding sub-netlist, and the IO pins of the multi-die FPGA satisfy the IO port requirements of the corresponding sub-netlists.
In this application, steps S1 to S3 complete the initial assignment of the user input netlist, producing an initial assignment result in which the various instance modules are assigned in the order specific instance modules, high-power instance modules, special-function instance modules and basic-function instance modules. The initial assignment result can be used directly as the final result to obtain the sub-netlist of each FPGA die, or a final result can be obtained after a secondary optimization of the initial assignment; the secondary optimization can resolve possible illegal situations in the initial assignment and also seek a better partitioning solution. In that case the method further comprises the following step between steps S3 and S4; refer to FIG. 4:
and step S5, obtaining an initial distribution result after distributing all the example modules to each sub netlist, adjusting the distribution result of the example module to be optimized according to an optimization target on the basis of keeping the distribution result of the preset fixed example module unchanged so as to optimize the initial distribution result, and finally segmenting to obtain the sub netlist corresponding to each FPGA bare chip.
The example module to be optimized is an example module except for a fixed example module, the fixed example module is usually specified by a user, and no matter whether one example module belongs to the above-mentioned specific example module, the high-energy-consumption example module, the special function example module or the basic function example module, the example module can be used as the fixed example module or the example module to be optimized in the secondary optimization process. In general, all of the high-power-consumption instance modules and all of the IOBs are used as fixed instance modules, and all of the other instance modules are used as to-be-optimized instance modules.
The method for adjusting the distribution result of the example module to be optimized according to the optimization target comprises the following steps: and calculating scores for distributing each example module to be optimized to each sub-netlist capable of accommodating the example module to be optimized according to the second objective function. The example module to be optimized with the highest corresponding score is redistributed to the sub-netlist with the highest corresponding score, if a plurality of example modules to be optimized are distributed to the corresponding sub-netlist with the highest corresponding score, the example module to be optimized with the top sorting order and the highest corresponding score is redistributed according to the reading order or the predefined sorting order of the example modules to be optimized; and if one to-be-optimized instance module corresponds to the highest score when being distributed to a plurality of sub netlists, the to-be-optimized instance module is redistributed to the sub netlist which is the top in the sequencing order and corresponds to the highest score according to the traversal order or the custom sequencing order of the sub netlists. And then updating the corresponding scores of other example modules to be optimized according to a second objective function, and executing the step of reallocating the example module to be optimized with the highest corresponding score to the sub-netlist with the highest corresponding score again until reallocating all the example modules to be optimized. In practical application, the above processes may be regarded as sequential optimization cycles, and the whole secondary optimization process includes a plurality of the above optimization cycles, and the secondary optimization process may be ended by defining the number of cycles.
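The loop structure of the secondary optimization can be sketched as follows. `second_objective(module, sub_netlist_index)` is an assumed callable built from the optimization targets, `can_accommodate` is the usual capacity check, and the pass limit is a design choice; none of these names come from the patent.

```python
# Sketch of the secondary optimization: fixed modules stay put; in each pass the
# movable module with the highest second-objective score is reassigned to its
# best-scoring sub-netlist, then the remaining scores are refreshed, until every
# movable module has been reassigned once.
def optimize(assignment, movable, can_accommodate, second_objective, max_passes=3):
    for _ in range(max_passes):                 # bounded number of optimization cycles
        pending = list(movable)
        while pending:
            best = None                         # (score, module, target sub-netlist)
            for m in pending:
                for i in range(len(assignment)):
                    if not can_accommodate(i, m):
                        continue
                    s = second_objective(m, i)
                    if best is None or s > best[0]:
                        best = (s, m, i)
            if best is None:
                break
            _, m, target = best
            for sn in assignment:               # remove m from its current sub-netlist
                if m in sn:
                    sn.remove(m)
            assignment[target].append(m)
            pending.remove(m)
    return assignment
```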
The second objective function is constructed from at least one chip performance parameter corresponding to the optimization target, and the better the chip performance parameter achieved by assigning the instance module to a sub-netlist, the higher the score. The second objective function usually differs from the first objective function used in the initial assignment, so that the assignment result can be optimized from the perspective of different chip performance parameters to reach a better solution.
The optimization target of the secondary optimization comprises at least one of the post-layout power consumption of the multi-die FPGA, the timing margin, the number of signals crossing FPGA dies and the clock parameters, as follows:
(a) When the optimization target includes power consumption, it comprises keeping the maximum power of each FPGA die within a first predetermined range and/or minimizing the total power of all FPGA dies.
(b) When the optimization target includes timing margin, it comprises keeping the timing margins of all first predetermined critical paths in the user input netlist within a second predetermined range.
(c) When the optimization target includes the number of signals crossing FPGA dies, it comprises: minimizing the total number of signals crossing FPGA dies, and/or keeping the maximum number of die crossings of every second predetermined critical path in the user input netlist within a third predetermined range, and/or minimizing the total number of die crossings of all third predetermined critical paths in the user input netlist (a counting sketch follows this list).
(d) When the optimization target includes clock parameters, it comprises minimizing the number of FPGA dies spanned by the clock tree used to achieve the highest frequency.
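For objective (c), a simple way to count cross-die signals is sketched below; nets are modelled as lists of instance modules and `die_of` maps a module to its assigned die index, both illustrative stand-ins for the real netlist data.

```python
# Sketch: a net contributes one crossing for every additional die its pins span.
def cross_die_signal_count(nets, die_of):
    total = 0
    for net in nets:
        dies = {die_of(m) for m in net}
        total += max(0, len(dies) - 1)
    return total
```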
The above is only a preferred embodiment of the present application, and the invention is not limited to it. Other modifications and variations directly derivable or suggested by those skilled in the art without departing from the spirit and concept of the invention are considered to fall within the scope of protection of the invention.

Claims (15)

1. A power consumption-based netlist partitioning method for a multi-die FPGA, the method comprising:
determining, according to the logic resources contained in each FPGA die of the multi-die FPGA, the types and maximum numbers of instance modules that the corresponding sub-netlist can accommodate;
acquiring a user input netlist and sequentially assigning the high-power instance modules, whose estimated power value reaches a preset threshold, to corresponding target sub-netlists, wherein a target sub-netlist can still accommodate an instance module of the corresponding type and, after the high-power instance module is assigned to it, the difference between the largest and the smallest netlist power value among all sub-netlists is minimized, the netlist power value of a sub-netlist being the power consumed by the corresponding FPGA die when laid out with the instance modules already assigned to that sub-netlist;
assigning the remaining unassigned instance modules of the user input netlist to the sub-netlists according to a preset assignment logic, based on the instance modules each sub-netlist already holds;
obtaining by partitioning a sub-netlist for each FPGA die, each sub-netlist comprising all the instance modules assigned to it and the nets connecting those instance modules;
wherein the number of logic resources on each FPGA die satisfies the logic resource requirement of the corresponding sub-netlist, the input signal connection point terminals on each FPGA die satisfy the number of input signals of the corresponding sub-netlist, the output signal connection point terminals on each FPGA die satisfy the number of output signals of the corresponding sub-netlist, and the FPGA dies connected to the IO pins of the multi-die FPGA satisfy the IO port requirements of the corresponding sub-netlists.
2. The method of claim 1, further comprising: assigning the IO ports in the user input netlist to pins of the multi-die FPGA as IO pins;
wherein, when a high-power instance module is assigned to its target sub-netlist, if at least two sub-netlists can accommodate an instance module of the corresponding type and each minimizes the power difference among all sub-netlists, the sub-netlist whose FPGA die is connected to an IO pin associated with the high-power instance module is taken as the target sub-netlist of that module.
3. The method of claim 2, wherein assigning the IO ports in the user input netlist to pins of the multi-die FPGA comprises:
assigning at least one IO port to a corresponding pin of the multi-die FPGA according to a user instruction, or in an arbitrary order, or according to an automatic IO placement algorithm.
4. The method of claim 1, wherein the estimated power value of an instance module is its maximum power consumption over different working modes and/or different working conditions.
5. The method of claim 1, wherein, after the high-power instance modules have been assigned, the unassigned instance modules in the user input netlist comprise pre-configured special-function instance modules and basic-function instance modules, and wherein:
when the unassigned instance modules are distributed to the sub-netlists according to the preset assignment logic, the special-function instance modules are assigned to the sub-netlists one by one in a predetermined processing order, and the remaining basic-function instance modules are assigned afterwards.
6. The method of claim 5, wherein, when each special-function instance module is assigned to a sub-netlist:
the special-function instance module is assigned to a sub-netlist that already contains an instance module connected to it;
if no sub-netlist contains an instance module connected to the special-function instance module, it is assigned to the sub-netlist that minimizes the power difference among all sub-netlists.
7. The method of claim 6, wherein a sub-netlist is determined to contain an instance module connected to the special-function instance module when at least one instance module in the sub-netlist is directly connected to the special-function instance module or indirectly connected to it through several other instance modules on the same data path.
8. The method of claim 5, wherein assigning the remaining basic-function instance modules to the sub-netlists comprises, for each basic-function instance module:
computing, according to a first objective function, a score for assigning the basic-function instance module to each sub-netlist able to accommodate it, and assigning the module to the sub-netlist with the highest score; the first objective function is constructed from at least one chip performance parameter, and the better the chip performance parameter achieved by assigning the basic-function instance module to a sub-netlist, the higher the corresponding score.
9. The method of claim 8, wherein the first objective functions corresponding to different basic-function instance modules are the same or different.
10. The method of claim 1, further comprising:
and before distributing the high-energy-consumption instance module, distributing the preset specific instance module to the specific sub netlist according to a user instruction.
11. The method of claim 1, further comprising:
and obtaining an initial distribution result after distributing all the example modules to each sub netlist, adjusting the distribution result of the example module to be optimized according to an optimization target on the basis of keeping the distribution result of the preset fixed example module unchanged so as to optimize the initial distribution result, and finally segmenting to obtain the sub netlist corresponding to each FPGA bare chip, wherein the example module to be optimized is an example module except the fixed example module.
12. The method of claim 11, wherein adjusting the assignment of the instance modules to be optimized according to the optimization objective comprises:
calculating, according to a second objective function, a score for assigning each instance module to be optimized to each sub-netlist able to accommodate it, wherein the second objective function is constructed from at least one chip performance parameter corresponding to the optimization objective, and the better the chip performance parameter achieved by assigning the instance module to be optimized to a sub-netlist, the higher the corresponding score;
reassigning the instance module to be optimized with the highest score to the sub-netlist giving that highest score;
and updating the scores of the remaining instance modules to be optimized according to the second objective function, and repeating the reassignment step until all instance modules to be optimized have been reassigned.
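The refinement of claim 12 can be read as a repeated best-move loop: score every movable module against every feasible sub-netlist with the second objective function, commit the single best move, rescore, and continue until every module to be optimized has been reassigned. The sketch below assumes each module records its current sub-netlist in a current_sub attribute and that fixed modules were filtered out beforehand; both are assumptions for the example.

def optimize_assignment(to_optimize, sub_netlists, objective2, can_accommodate):
    remaining = list(to_optimize)
    while remaining:
        # Best achievable (score, target sub-netlist) for each not-yet-moved module.
        best_choice = {}
        for module in remaining:
            options = [s for s in sub_netlists if can_accommodate(s, module)]
            if options:
                target = max(options, key=lambda s: objective2(module, s))
                best_choice[module] = (objective2(module, target), target)
        if not best_choice:
            break  # nothing can be moved any further

        # Commit the move with the globally highest score, then rescore the rest.
        module = max(best_choice, key=lambda m: best_choice[m][0])
        _, target = best_choice[module]
        if module.current_sub is not None and module in module.current_sub.modules:
            module.current_sub.modules.remove(module)
        target.modules.append(module)
        module.current_sub = target
        remaining.remove(module)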
13. The method of claim 11 or 12, wherein the optimization objective comprises at least one of: the power consumption of the multi-die FPGA after layout, timing margins, the number of signals crossing the FPGA dies, and clock parameters.
14. The method of claim 13, wherein:
when the optimization objective includes power consumption, the optimization objective includes keeping the maximum power consumption of each FPGA die within a first predetermined range, and/or minimizing the total power consumption of all FPGA dies;
when the optimization objective includes timing margins, the optimization objective includes keeping the timing margins of all first predetermined critical paths in the user input netlist within a second predetermined range;
when the optimization objective includes the number of signals crossing the FPGA dies, the optimization objective includes: minimizing the total number of signals crossing the FPGA dies, and/or keeping the maximum number of die crossings of every second predetermined critical path in the user input netlist within a third predetermined range, and/or minimizing the total number of die crossings of all third predetermined critical paths in the user input netlist;
when the optimization objective includes clock parameters, the optimization objective includes minimizing the number of FPGA dies spanned by the clock tree used to achieve the highest frequency.
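Claims 13 and 14 only name the metrics; how they are folded into the second objective function is left open. One plausible composition, purely an assumption for illustration, is a weighted sum in which power and die crossings act as penalties and timing margin as a reward; the est_* callables are hypothetical estimators of each metric for a candidate placement.

def make_second_objective(w_power=1.0, w_timing=1.0, w_cross=1.0,
                          est_power=None, est_margin=None, est_crossings=None):
    # Each est_*(module, sub) estimates its metric if `module` were placed in `sub`.
    def objective(module, sub):
        score = 0.0
        if est_power is not None:
            score -= w_power * est_power(module, sub)      # lower die power is better
        if est_margin is not None:
            score += w_timing * est_margin(module, sub)    # larger timing margin is better
        if est_crossings is not None:
            score -= w_cross * est_crossings(module, sub)  # fewer die crossings is better
        return score
    return objective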
15. The method of claim 1, wherein at least one FPGA die of the multi-die FPGA has a different number of logic resources from the other FPGA dies, and/or at least one FPGA die has a different type of logic resources from the other FPGA dies.
CN202110428920.3A 2021-04-21 2021-04-21 Power consumption-based netlist partitioning method for multi-die FPGA Active CN113128149B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110428920.3A CN113128149B (en) 2021-04-21 2021-04-21 Power consumption-based netlist partitioning method for multi-die FPGA

Publications (2)

Publication Number Publication Date
CN113128149A true CN113128149A (en) 2021-07-16
CN113128149B (en) 2022-02-18

Family

ID=76778436

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110428920.3A Active CN113128149B (en) 2021-04-21 2021-04-21 Power consumption-based netlist partitioning method for multi-die FPGA

Country Status (1)

Country Link
CN (1) CN113128149B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7671624B1 (en) * 2006-10-10 2010-03-02 Xilinx, Inc. Method to reduce configuration solution using masked-ROM
CN103106291A (en) * 2011-11-15 2013-05-15 中国科学院微电子研究所 Low-power-consumption FPGA based on Multi-Vt technology and matched EDA design method
US20200134108A1 (en) * 2018-10-26 2020-04-30 The Regents Of The University Of California Structural Matching for Fast Re-synthesis of Electronic Circuits
CN111142874A (en) * 2019-11-13 2020-05-12 广东高云半导体科技股份有限公司 Logic balance control method, device and system in FPGA logic synthesis
CN111753482A (en) * 2020-06-30 2020-10-09 无锡中微亿芯有限公司 Layout method of multi-die structure FPGA with automatic IO distribution
CN111753484A (en) * 2020-06-30 2020-10-09 无锡中微亿芯有限公司 Layout method of multi-die structure FPGA (field programmable Gate array) based on circuit performance
CN111950218A (en) * 2020-07-02 2020-11-17 深圳市兴森快捷电路科技股份有限公司 Circuit for realizing target tracking algorithm based on FPGA

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
VINOD PANGRACIOUS et al.: "Exploration Environment for 3D Heterogeneous Tree-based FPGA Architectures (3D HT-FPGA)", 2013 International Conference on Reconfigurable Computing and FPGAs (ReConFig) *
JING Chao: "A Load-Balancing, Low-Energy Scheduling Algorithm for Reconfigurable Systems", Computer Engineering and Applications *
SHU Yi et al.: "A Hybrid Static Power Optimization Method for Gate-Level Netlists", Journal of Electronics & Information Technology *
LI Tiejun et al.: "A Group-Based Timing-Driven Floorplanning Method", Computer Engineering and Science *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115859886A (en) * 2023-02-16 2023-03-28 山东启芯软件科技有限公司 High-efficiency low-coupling design splitting method based on multi-fanout logic
CN115859886B (en) * 2023-02-16 2023-05-09 山东启芯软件科技有限公司 Design splitting method for high-efficiency low coupling based on multiple fanout logic

Also Published As

Publication number Publication date
CN113128149B (en) 2022-02-18

Similar Documents

Publication Publication Date Title
US8683410B2 (en) Operational cycle assignment in a configurable IC
US9098664B2 (en) Integrated circuit optimization
TWI521370B (en) Method, system, and computer-readable storage medium of register clustering for clock network topology generation
JP5401376B2 (en) Method for designing semiconductor integrated circuit device
CN111742319B (en) Method for selecting wiring resource in multi-chip integrated circuit device
US11748548B2 (en) Hierarchical clock tree implementation
CN112183015B (en) Chip layout planning method for deep neural network
CN113128150A (en) Clock domain-based netlist partitioning method for multi-die FPGA
CN110096823B (en) Digital integrated circuit wiring method based on binary coding and terminal equipment
CN113128149B (en) Power consumption-based netlist partitioning method for multi-die FPGA
US10262096B2 (en) Component placement with repacking for programmable logic devices
CN112232016A (en) Efficient bus design iteration method, system and platform for SSD (solid State disk) master control chip
CN113919266B (en) Clock planning method and device for programmable device, electronic equipment and storage medium
JP2017123010A (en) Semiconductor design support device and semiconductor design support method
US20130055187A1 (en) Floorplan creation information generating method, floorplan creation information generating program, floorplan creation information generating device, floorplan optimizing method, floorplan optimizing program, and floorplan optimizing device
CN114707451A (en) Digital circuit layout planning method and device, electronic equipment and storage medium
US8196083B1 (en) Incremental placement and routing
US9355202B2 (en) Promoting efficient cell usage to boost QoR in automated design
CN113128152A (en) Netlist partitioning method of multi-die FPGA (field programmable Gate array) based on time sequence
US20130339913A1 (en) Semi-automated method of FPGA timing closure
US7107563B1 (en) Integrated circuit signal routing using resource cost assignment and costing data
CN114492282A (en) Through signal line layout processing method and device, chip and storage medium
US20150178436A1 (en) Clock assignments for programmable logic device
Singhar et al. Optimizing Mixed Size & Large Scale Block Placement Using Greedy Approach
CN117236251A (en) Method and system for automatically adjusting retention time margin of input signal of time sequence device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant