CN114090250A - EDA hardware acceleration method and system based on Banyan network and multi-FPGA structure - Google Patents


Info

Publication number
CN114090250A
CN114090250A
Authority
CN
China
Prior art keywords
data
eda
simulation
algorithm
fpgas
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111389032.1A
Other languages
Chinese (zh)
Inventor
郭东辉
沈云飞
马钦鸿
贺珊
Current Assignee
Xiamen University
Original Assignee
Xiamen University
Priority date
Filing date
Publication date
Application filed by Xiamen University
Priority to CN202111389032.1A
Publication of CN114090250A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F 9/5044 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering hardware capabilities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 Task transfer initiation or dispatching
    • G06F 9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system

Abstract

The invention provides an EDA hardware acceleration method and system based on a Banyan network and a multi-FPGA structure. EDA algorithm acceleration and simulation acceleration are combined in one system: when the EDA algorithm is accelerated, a top-level EDA algorithm is started to control the sending and receiving of data, and EDA simulation is accelerated according to the design-under-test structure created by the user and the simulation data the user supplies. A multi-channel SCE-MI interface performs software-hardware data cooperation, and a Banyan network implements data exchange among the multiple FPGAs. Finally, the accelerated data is returned for processing: the simulation data is compared with verification data to verify the simulation result, or the computation results are returned to the external EDA software. The method accelerates both the algorithm and the simulation in a software-hardware cooperative manner, combines EDA algorithm acceleration with simulation acceleration, and adopts a multi-channel PIPE-type SCE-MI standard protocol interface for generality; applying the Banyan network to multi-FPGA data exchange reduces data-exchange delay, giving the system a simple structure and efficient operation.

Description

EDA hardware acceleration method and system based on Banyan network and multi-FPGA structure
Technical Field
The invention relates to the technical field of EDA hardware acceleration, in particular to an EDA hardware acceleration method and system based on a Banyan network and a multi-FPGA structure.
Background
With the continuous development of the IC design industry and the steady progress of very-large-scale IC process technology, the scale and complexity of IC designs have grown multiplicatively, greatly increasing the computing power demanded of EDA software for simulation, synthesis, place-and-route, verification, and similar tasks. The computation performed by this EDA software tends to occupy a significant share of design time. The FPGA, a flexibly configurable device well suited to compute-intensive, high-speed operation, has clear advantages for large-scale data computation and circuit simulation, and a multi-FPGA system extends those advantages further.
In a multi-FPGA system, the form of the interconnection structure strongly affects data-transmission efficiency. Typical interconnect structures today are linear-array, mesh, crossbar-interconnect, and hybrid-interconnect types, which suffer variously from high delay, low efficiency, or complex implementation. The Banyan network structure adopted by the invention is a space-division switching network; it has been applied in parallel computers, ATM switches, and related fields, and offers a unique path property and a self-routing capability. Building a Banyan network among the FPGAs for data transmission yields good delay performance and effectively improves transmission efficiency.
When an FPGA performs hardware acceleration of an EDA algorithm, the host computer must send the data to be processed to the hardware side. The invention adopts a PCIE hardware interface: PCIE is a high-speed serial computer I/O bus using point-to-point, dual-simplex, high-bandwidth lanes, and the third generation reaches a transfer rate of 8 GT/s per lane. In addition, common hardware-acceleration systems have poor generality, requiring separate systems for algorithm acceleration and simulation acceleration. To improve generality, the invention combines the EDA algorithm-acceleration function with the simulation-acceleration function and adopts the SCE-MI standard. The SCE-MI standard defines an interface model that connects a behaviorally described software model with a design under test described in synthesizable HDL code, enabling communication between the two over a number of independent virtual channels.
Traditional EDA-algorithm hardware-acceleration systems suffer from poor portability and limited speedup. The invention combines EDA algorithm acceleration with simulation acceleration, improving overall generality. In addition, the multi-FPGA structure overcomes the limited resources of a single FPGA and fully exploits FPGA parallelism to improve the overall acceleration effect.
Common FPGA interconnection structures are mainly linear-array, mesh, crossbar-interconnect, and hybrid-interconnect types. The linear-array type is structurally simple but has large transmission delay and is unsuitable for a large multi-FPGA system. The mesh structure improves transmission capability somewhat but still shows large delay for cross-FPGA transfers. Crossbar and hybrid structures greatly improve transmission efficiency but occupy substantial resources when many FPGAs are used. The invention uses a Banyan network to realize the multi-FPGA structure, achieving efficient cross-FPGA transmission with low delay and few resources.
The aim of the invention is to build a highly general, highly efficient EDA algorithm-acceleration and simulation-acceleration system, exploiting the properties of the Banyan network to achieve efficient, low-delay cross-FPGA transmission.
Disclosure of Invention
The invention provides an EDA hardware acceleration method and system based on a Banyan network and a multi-FPGA structure, which aim to solve the defects of the prior art.
In one aspect, the invention provides an EDA hardware acceleration method based on a Banyan network and a multi-FPGA architecture, which comprises the following steps:
s1: a user selects an acceleration mode, the acceleration mode comprising an EDA algorithm acceleration mode and an EDA simulation acceleration mode;
s2: if the EDA algorithm acceleration mode is selected: starting a top-level EDA algorithm to control the sending and receiving of data, batching the data, and then packaging it over the SCE-MI channel;
if the EDA simulation acceleration mode is selected: packaging, over an SCE-MI channel, the simulation data supplied by the user according to the design-under-test structure the user has created;
s3: sending the packaged data through the PCIE driver and hardware to a PCIE core on an FPGA board, the PCIE core transferring the packaged data by DMA reads over the AXI transmission protocol, with subsequent management by SCE-MI, the FPGA board comprising a plurality of FPGAs;
s4: unpacking the packaged data based on SCE-MI, sending the unpacked data through the corresponding SCE-MI channel to the corresponding node of the Banyan network, and then delivering it, according to a cell-scheduling algorithm, to the receive channels of the FPGAs corresponding to that node;
s5: if the EDA algorithm acceleration mode is selected: accelerating the data by parallel computation across the plurality of FPGAs, then returning the accelerated data to the top-level EDA algorithm for processing;
if the EDA simulation acceleration mode is selected: performing simulation verification of the data against the design-under-test structure on the plurality of FPGAs to generate simulation data, and comparing the simulation data with standard verification data provided by the user to verify the simulation result.
The method combines EDA algorithm acceleration and simulation acceleration in one system, uses a Banyan network for data exchange among the multiple FPGAs, adopts a multi-channel SCE-MI interface for software-hardware data cooperation, and accelerates both the algorithm and the simulation in a software-hardware cooperative manner. Combining EDA algorithm acceleration with simulation acceleration and adopting a multi-channel PIPE-type SCE-MI standard protocol interface gives the method generality, while applying the Banyan network to multi-FPGA data exchange reduces data-exchange delay, so that the system is simple in structure and efficient in operation.
In a specific embodiment, the scheduling algorithm of the Banyan network uses an exhaustive calculation method to avoid blocking, specifically comprising: sorting all cell-transmission requests by descending priority, then checking and scheduling them in that order; if a lower-priority cell's transmission route would collide with that of a higher-priority cell, its current transmission is abandoned and a still-lower-priority cell is considered, until no idle route remains in the whole Banyan network;
buffers of equal priority in the Banyan network are selected at random; for each cell, the routes along which it can collide with other cells are determined, and the possible blocking outcomes are derived from those routes;
the possible blocking outcomes are computed in advance and stored in a BRAM on the FPGA board.
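The priority-ordered, check-and-skip scheduling loop described above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the `blocks` table of mutually colliding buffer pairs is assumed to be precomputed (and, in the patent's design, stored in on-board BRAM) and to be symmetric.

```python
def schedule_round(requests, blocks):
    """One scheduling round over a Banyan network.

    requests: list of (priority, input_port, output_port) cell requests.
    blocks:   dict mapping (in, out) -> set of (in, out) pairs whose
              routes would collide with it; assumed symmetric.
    Returns the list of cells granted transmission this round."""
    granted = []
    # examine cells in descending priority order
    for _, i, o in sorted(requests, key=lambda r: r[0], reverse=True):
        cell = (i, o)
        # grant only if the route collides with no already-granted cell
        if all(g not in blocks.get(cell, set()) for g in granted):
            granted.append(cell)
    return granted
```

A lower-priority request whose route collides with an already-granted cell is simply skipped for this round, matching the "abandon the current transmission" rule above.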
In a specific embodiment, clock synchronization among the plurality of FPGAs uses a clock tree, specifically: because an internal clock driving different registers reaches those registers after different delays (clock skew), a clock tree is used to balance the arrival times; likewise, a clock-tree scheme is used among the FPGAs so that their clocks remain synchronized. This embodiment solves the FPGA clock-skew problem.
In a specific embodiment, the PCIE data is received and transmitted on the FPGA board in a DMA manner, and is transmitted by using an AXI bus structure.
In a specific embodiment, the SCE-MI channel includes operations to unpack and pack data passing through the SCE-MI channel and to build a plurality of FIFOs to construct a transmission channel for the data passing through the SCE-MI channel.
In a specific embodiment, the data exchange in the multiple FPGAs is based on data transmission of a Banyan network, which specifically includes:
based on the Banyan network, the data sending ends of the plurality of FPGAs are connected to the input port of the Banyan network, the data receiving ends of the plurality of FPGAs are connected to the output port of the Banyan network, and the data of the plurality of FPGAs are controlled to be respectively sent to corresponding nodes of the Banyan network according to corresponding SCE-MI channels.
In a specific embodiment, the returning the accelerated data to the top-level EDA algorithm for processing specifically includes:
transmitting the data of the plurality of FPGAs back to the PCIE driver; the PCIE driver receives the returned data, unpacks it, and returns the unpacked data to the top-level EDA algorithm for processing.
In a specific embodiment, the enabling of the top-level EDA algorithm to control the sending and receiving of data specifically includes:
controlling data transmission with the EDA algorithm so as to generate a function interface schedulable by the EDA algorithm, the sending and receiving of data being controlled through that function interface; the function interface includes analyzing sparse-matrix operations in SPICE software, sending the matrix data to the transmission unit in an appropriate order, and controlling the batch size of each matrix-data transfer.
According to a second aspect of the present invention, a computer-readable storage medium is presented, having stored thereon a computer program which, when executed by a computer processor, implements the above-described method.
According to a third aspect of the present invention, an EDA hardware acceleration system based on Banyan network and multi-FPGA architecture is provided, the system comprising:
an acceleration mode selection unit: configured for the user to select an acceleration mode, the acceleration mode comprising an EDA algorithm acceleration mode and an EDA simulation acceleration mode;
a data transmitting/receiving unit: configured to, if the EDA algorithm acceleration mode is selected, enable a top-level EDA algorithm to control the sending and receiving of data, batch the data, and then package it over the SCE-MI channel;
and, if the EDA simulation acceleration mode is selected, package, over an SCE-MI channel, the simulation data supplied by the user according to the design-under-test structure the user has created;
a PCIE data transmission unit: configured to send the packaged data through the PCIE driver and hardware to a PCIE core on an FPGA board, the PCIE core transferring the packaged data by DMA reads over the AXI transmission protocol, with subsequent management by SCE-MI, the FPGA board comprising a plurality of FPGAs;
a Banyan network and FPGA data exchange unit: configured to unpack the packaged data based on SCE-MI, send the unpacked data through the corresponding SCE-MI channel to the corresponding node of the Banyan network, and then deliver it, according to a cell-scheduling algorithm, to the receive channels of the FPGAs corresponding to that node;
a data acceleration unit: configured to, if the EDA algorithm acceleration mode is selected, accelerate the data by parallel computation across the plurality of FPGAs and then return the accelerated data to the top-level EDA algorithm for processing;
and, if the EDA simulation acceleration mode is selected, perform simulation verification of the data against the design-under-test structure on the plurality of FPGAs to generate simulation data, and compare the simulation data with standard verification data provided by the user to verify the simulation result.
The system combines EDA algorithm acceleration and simulation acceleration: when the EDA algorithm is accelerated, a top-level EDA algorithm is started to control the sending and receiving of data, and EDA simulation is accelerated according to the design-under-test structure created by the user and the simulation data the user supplies. A multi-channel SCE-MI interface performs software-hardware data cooperation, and a Banyan network implements data exchange among the multiple FPGAs. Finally, the accelerated data is returned for processing, and the verification data is compared with the simulation data to obtain the simulation result. Accelerating the algorithm and the simulation in a software-hardware cooperative manner, combining EDA algorithm acceleration with simulation acceleration, and adopting a multi-channel PIPE-type SCE-MI standard protocol interface gives the system generality; applying the Banyan network to multi-FPGA data exchange reduces data-exchange delay, so the system is simple in structure and efficient in operation.
Drawings
The accompanying drawings are included to provide a further understanding of the embodiments and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments and together with the description serve to explain the principles of the invention. Other embodiments and many of the intended advantages of embodiments will be readily appreciated as they become better understood by reference to the following detailed description. Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a flow diagram of an EDA hardware acceleration method based on a Banyan network and multiple FPGA architectures according to an embodiment of the present invention;
FIG. 2 is an architecture diagram of an EDA algorithm acceleration mode of a specific embodiment of the present invention;
FIG. 3 is an architecture diagram of an EDA simulation acceleration mode of a specific embodiment of the present invention;
FIG. 4 is a diagram of a Banyan network architecture in accordance with a specific embodiment of the present invention;
FIG. 5 is a block diagram of an EDA hardware acceleration system based on a Banyan network and multiple FPGA architectures, according to an embodiment of the present invention.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 shows a flowchart of an EDA hardware acceleration method based on a Banyan network and a multi-FPGA structure according to an embodiment of the present invention. As shown in fig. 1, the method comprises the steps of:
s1: a user selects an acceleration mode, the acceleration mode comprising an EDA algorithm acceleration mode and an EDA simulation acceleration mode;
s2: if the EDA algorithm acceleration mode is selected: starting a top-level EDA algorithm to control the sending and receiving of data, batching the data, and then packaging it over the SCE-MI channel;
if the EDA simulation acceleration mode is selected: packaging, over an SCE-MI channel, the simulation data supplied by the user according to the design-under-test structure the user has created;
s3: sending the packaged data through the PCIE driver and hardware to a PCIE core on an FPGA board, the PCIE core transferring the packaged data by DMA reads over the AXI transmission protocol, with subsequent management by SCE-MI, the FPGA board comprising a plurality of FPGAs;
s4: unpacking the packaged data based on SCE-MI, sending the unpacked data through the corresponding SCE-MI channel to the corresponding node of the Banyan network, and then delivering it, according to a cell-scheduling algorithm, to the receive channels of the FPGAs corresponding to that node;
s5: if the EDA algorithm acceleration mode is selected: accelerating the data by parallel computation across the plurality of FPGAs, then returning the accelerated data to the top-level EDA algorithm for processing;
if the EDA simulation acceleration mode is selected: performing simulation verification of the data against the design-under-test structure on the plurality of FPGAs to generate simulation data, and comparing the simulation data with standard verification data provided by the user to verify the simulation result.
In a specific embodiment, the scheduling algorithm of the Banyan network uses an exhaustive calculation method to avoid blocking, specifically comprising: sorting all cell-transmission requests by descending priority, then checking and scheduling them in that order; if a lower-priority cell's transmission route would collide with that of a higher-priority cell, its current transmission is abandoned and a still-lower-priority cell is considered, until no idle route remains in the whole Banyan network;
buffers of equal priority in the Banyan network are selected at random; for each cell, the routes along which it can collide with other cells are determined, and the possible blocking outcomes are derived from those routes;
the possible blocking outcomes are computed in advance and stored in a BRAM on the FPGA board.
In a specific embodiment, clock synchronization among the plurality of FPGAs uses a clock tree, specifically: because an internal clock driving different registers reaches those registers after different delays (clock skew), a clock tree is used to balance the arrival times; likewise, a clock-tree scheme is used among the FPGAs so that their clocks remain synchronized. This embodiment solves the FPGA clock-skew problem.
In a specific embodiment, the PCIE data is received and transmitted on the FPGA board in a DMA manner, and is transmitted by using an AXI bus structure.
In a specific embodiment, the SCE-MI channel includes operations to unpack and pack data passing through the SCE-MI channel and to build a plurality of FIFOs to construct a transmission channel for the data passing through the SCE-MI channel.
In a specific embodiment, the data exchange in the multiple FPGAs is based on data transmission of a Banyan network, which specifically includes:
based on the Banyan network, the data sending ends of the plurality of FPGAs are connected to the input port of the Banyan network, the data receiving ends of the plurality of FPGAs are connected to the output port of the Banyan network, and the data of the plurality of FPGAs are controlled to be respectively sent to corresponding nodes of the Banyan network according to corresponding SCE-MI channels.
In a specific embodiment, the returning the accelerated data to the top-level EDA algorithm for processing specifically includes:
transmitting the data of the plurality of FPGAs back to the PCIE driver; the PCIE driver receives the returned data, unpacks it, and returns the unpacked data to the top-level EDA algorithm for processing.
In a specific embodiment, the enabling of the top-level EDA algorithm to control the sending and receiving of data specifically includes:
controlling data transmission with the EDA algorithm so as to generate a function interface schedulable by the EDA algorithm, the sending and receiving of data being controlled through that function interface; the function interface includes analyzing sparse-matrix operations in SPICE software, sending the matrix data to the transmission unit in an appropriate order, and controlling the batch size of each matrix-data transfer.
In a specific embodiment, the system built by the method comprises a software side and a hardware side: the software side runs on the host computer, the hardware side is realized by an interconnected multi-FPGA system, and the two communicate at high speed over a PCIE physical interface. The EDA hardware acceleration system of this embodiment is explained below with reference to figs. 2 and 3:
FIG. 2 shows an architecture diagram of an EDA algorithm acceleration mode of a specific embodiment of the present invention; FIG. 3 shows an architecture diagram of an EDA simulation acceleration mode of a specific embodiment of the present invention; the method comprises two modes of EDA algorithm acceleration and simulation acceleration, wherein the started internal modules are different in different modes;
the software side mainly comprises a top layer EDA algorithm control module, a simulation data generation and verification module, an SCE-MI channel module and a PCIE drive. The hardware side mainly comprises a PCIE data transceiving module, an SCE-MI management module, a Banyan network management module, a multi-FPGA clock control module and a test module;
in the algorithm acceleration mode, the top-level EDA algorithm control module is enabled; its main function is to generate a function interface schedulable by the EDA algorithm by controlling data transmission. For example, when analyzing sparse-matrix operations in SPICE software, it first sends the matrix data to the transmission module in an appropriate order, and second controls the size of each data transfer;
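As an illustration only (the function name and the (row, col, value) COO triple format are assumptions, not the patent's interface), ordering and batching sparse-matrix data before transmission might look like:

```python
def batch_sparse_matrix(entries, batch_size):
    """Order (row, col, value) triples row-major and split them into
    fixed-size batches; batch_size plays the role of the controlled
    'data transmission quantity' described above."""
    ordered = sorted(entries, key=lambda e: (e[0], e[1]))
    return [ordered[i:i + batch_size]
            for i in range(0, len(ordered), batch_size)]
```

Each batch can then be handed to the SCE-MI channel module for packaging, with `batch_size` tuned to the transfer granularity of the PCIE link.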
under a simulation acceleration mode, a simulation data generation and verification module is started, and the module is used for generating corresponding test data according to a tested simulation design and receiving verification return data;
the software-side SCE-MI channel module adopts the PIPE mode of the SCE-MI 2.0 protocol, packages the data passed down by the upper module according to the SCE-MI protocol format, and sends it to the PCIE driver module through multiple PIPE channels; conversely, data returned by the hardware must be unpacked and handed back to the upper processing module;
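The pack/unpack symmetry between the software and hardware sides can be sketched with a toy framing scheme. The (channel_id, word count) header below is a hypothetical format for illustration; it is NOT the SCE-MI 2.0 PIPE wire format.

```python
import struct

def pipe_pack(channel_id, words):
    """Frame a message for one PIPE channel: a little-endian header of
    (channel_id, word count) followed by 32-bit payload words.
    Illustrative only; the real SCE-MI 2.0 format differs."""
    frame = struct.pack('<HH', channel_id, len(words))
    for w in words:
        frame += struct.pack('<I', w)
    return frame

def pipe_unpack(frame):
    """Inverse of pipe_pack: recover the channel id and payload words."""
    channel_id, n = struct.unpack_from('<HH', frame, 0)
    words = list(struct.unpack_from('<%dI' % n, frame, 4))
    return channel_id, words
```

Whatever the concrete format, the hardware-side channel module must apply the exact inverse of the software-side packaging, which is why the two modules are described as corresponding channel by channel.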
the PCIE driver and the hardware-side PCIE transceiver module mainly package the upper module's data into the PCIE TLP packet format and transmit it to the FPGA board through the physical hardware; on the board the data is parsed by a PCIE transceiver IP and then sent and received in DMA mode over an AXI bus structure;
the hardware-side SCE-MI channel corresponds to the software-side channel; this module unpacks and packages data on the hardware side and builds multiple FIFOs to realize the data-transmission channels. It connects to the Banyan network management module, which steers each channel's SCE-MI data to the corresponding node of the network.
In a specific embodiment, the invention uses a clock tree to solve the FPGA clock-skew problem, i.e., the fact that when an internal clock drives different registers, the clock reaches them after different delays. Likewise, to keep the clocks of the multiple FPGAs synchronized, a clock-tree design is also employed among the FPGAs.
In a specific embodiment, the invention uses a Banyan network to exchange data among the multiple FPGAs. The network is composed of 2 × 2 switching units arranged in multiple stages to form an N-input, N-output switching structure. Each 2 × 2 switching unit has two states, straight-through and cross, and can be switched as needed. The data-sending ends of the FPGAs are connected to the input ports of the Banyan network, and their data-receiving ends to its output ports.
Fig. 4 shows the structure of a Banyan network according to a specific embodiment, here an 8 × 8 Banyan network. If FPGA1 needs to send data to FPGA7, the first switching unit of the first stage is set to the straight-through state, the second switching unit of the second stage to the cross state, and the fourth switching unit of the third stage to the cross state, opening a channel from FPGA1's sending end to FPGA7's receiving end and completing one cell transmission.
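The switch settings in this walkthrough follow from the Banyan network's self-routing property: each stage examines one bit of the destination address. A minimal sketch for an omega-style banyan-class network is given below; the exact wiring of fig. 4 may differ, so the intermediate wire positions here are illustrative, but the bit-per-stage routing rule is the same.

```python
def self_route(src, dst, n_bits=3):
    """Trace a cell from input src to output dst through an
    n_bits-stage omega (banyan-class) network. Each stage performs a
    perfect shuffle, then a 2x2 switch forwards the cell to the output
    named by the next destination-address bit (MSB first). Returns the
    wire occupied after each stage; the last entry is the output port
    reached."""
    mask = (1 << n_bits) - 1
    wires, p = [], src
    for s in range(n_bits):
        p = ((p << 1) | (p >> (n_bits - 1))) & mask   # perfect shuffle
        bit = (dst >> (n_bits - 1 - s)) & 1           # routing bit
        p = (p & ~1) | bit                            # switch output
        wires.append(p)
    return wires
```

Because each stage consumes one destination bit, the cell always emerges at the requested output after n_bits switch decisions, regardless of which input it entered on; this is the self-routing function the patent attributes to the Banyan network.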
Although the Banyan network is structurally simple and easy to expand, it suffers from internal blocking, so cells must be scheduled by a suitable algorithm. If all cells were transmitted simultaneously, severe blocking would result; only cells whose internal routes do not collide may be sent at the same time, and to avoid cell loss a buffer must be added at each input port. In a preferred embodiment, for an N × N Banyan structure, each input port's buffer is split into N queues, one per output port, giving N × N buffer queues in total.
In the preferred embodiment, because buffer length is limited, each buffer queue must be prioritized to prevent data loss and excessive delay. Specifically: a high-priority buffer may transmit first without regard to blocking, while a low-priority buffer must consider blocking against all cell transmissions already scheduled; if blocking would occur, the transmission is deferred for the current round. The length of each waiting queue and the waiting time of its head cell serve as the parameters for computing priority: the longer the queue and the longer the head cell has waited, the higher the buffer's transmission priority.
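The two priority parameters named above (waiting-queue length and head-of-line waiting time) can be combined in many ways; a simple weighted sum is sketched below. The linear form and the weights are assumptions, since the patent states only that both quantities increase the priority.

```python
def buffer_priority(queue_len, head_wait, w_len=1.0, w_wait=1.0):
    """Priority of one input-buffer queue: grows with the number of
    waiting cells and with how long the head cell has waited.
    The linear combination and the weights are illustrative
    assumptions, not the patent's formula."""
    return w_len * queue_len + w_wait * head_wait
```

The scheduler would recompute this value for every queue at the start of each round and serve queues in descending order of the result, breaking ties at random as described below.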
In the preferred embodiment, the invention adopts an exhaustive calculation method to avoid blocking. All cell transmission requests are sorted by priority and checked and scheduled in descending order; if a low-priority cell's transmission route conflicts with that of a higher-priority cell, its transmission is abandoned for the current round and the next-lower-priority cell is considered, until no idle route remains in the whole Banyan network. Buffers with the same priority are selected at random. Once the Banyan network structure is fixed, the set of routes with which each cell's route can conflict is also determined.
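The greedy, priority-ordered admission loop just described can be sketched as follows. The function and parameter names (`schedule`, `requests`, `blocks`) are illustrative; `blocks[(i, o)]` is assumed to be the precomputed set of (input, output) pairs whose routes conflict with route (i, o), as with the BRAM table described below.

```python
def schedule(requests, blocks):
    """Priority-ordered exhaustive scheduling sketch for one transmission round.

    `requests` is a list of (priority, (in_port, out_port)) entries and
    `blocks[(i, o)]` is the set of (in, out) pairs whose routes conflict with
    route (i, o). Higher-priority cells are admitted first; a cell is skipped
    for this round if its route conflicts with any already-admitted cell.
    """
    admitted = []
    for _, route in sorted(requests, key=lambda r: r[0], reverse=True):
        if all(route not in blocks.get(a, set())
               and a not in blocks.get(route, set())
               for a in admitted):
            admitted.append(route)
    return admitted
```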
The above method of avoiding blocking by exhaustive calculation is described below, taking the Banyan network structure shown in Fig. 4 as an example:
Based on its input and output ports, the set of buffers that block each cell transmission can be determined according to the following equation:

[input port][output port] = {blocking buffer 1, blocking buffer 2, …, blocking buffer n}.
In the structure shown in Fig. 4, the buffers that conflict with a cell from buffer [1][7] are:
[1][7]={[0][4],[0][5],[0][6],[0][7],[1][0],[1][1],[1][2],[1][3],[1][4],[1][5],[1][6],[2][6],[2][7],[3][6],[3][7],[4][7],[5][7],[6][7],[7][7]}
After the network structure is determined, all possible blocking results are computed in advance and stored in BRAM on the FPGA board. During cell scheduling, the scheduler performs the corresponding scheduling work by looking up and checking the matching blocking buffers.
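The precomputation step can be sketched by enumerating every (source, destination) route, listing the internal links it uses, and recording pairs of routes that share a link. The link model below is the textbook omega-network labelling (after stage k the cell sits on the line labelled by the first k destination bits followed by the remaining source bits); the wiring of the patent's Fig. 4 evidently differs, so the table this sketch produces need not match the [1][7] set listed above — it only illustrates how the BRAM table would be built. `path_links` and `blocking_table` are hypothetical names.

```python
from itertools import product

def path_links(src: int, dst: int, n_bits: int):
    """Links used by route (src, dst) in an n-stage omega-style network.

    After stage k (k = 1 .. n_bits) the cell occupies the line whose label is
    the first k destination bits followed by the remaining source bits.
    """
    s, d = format(src, f'0{n_bits}b'), format(dst, f'0{n_bits}b')
    return {d[:k] + s[k:] for k in range(1, n_bits + 1)}

def blocking_table(n_bits: int = 3):
    """For every (src, dst) route, precompute the set of conflicting routes.

    Two routes conflict exactly when their link sets intersect; the result
    would be stored in BRAM and consulted during cell scheduling.
    """
    n = 1 << n_bits
    links = {(s, d): path_links(s, d, n_bits)
             for s, d in product(range(n), repeat=2)}
    return {r: {q for q in links if q != r and links[q] & links[r]}
            for r in links}
```

Two routes aimed at the same output port always share the final link, so they always appear in each other's blocking sets, matching the intuition behind the table above.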
In a specific embodiment, the data transmission time of the Banyan network is incurred during the data exchange among the plurality of FPGAs.
FIG. 5 shows a block diagram of an EDA hardware acceleration system based on a Banyan network and a multi-FPGA structure, according to an embodiment of the present invention. The system comprises an acceleration mode selection unit 501, a data transceiving unit 502, a PCIE data transmission unit 503, a Banyan network and FPGA data exchange unit 504, and a data acceleration unit 505.
In a specific embodiment, the acceleration mode selection unit 501 is configured for a user to select an acceleration mode, where the acceleration mode includes an EDA algorithm acceleration mode and an EDA simulation acceleration mode;
the data transceiver unit 502 is configured to, if the EDA algorithm acceleration mode is selected: enabling a top-layer EDA algorithm to control the sending and receiving of data, carrying out batch processing on the data, and then carrying out packaging processing based on the SCE-MI channel;
if the EDA simulation acceleration mode is selected: according to a to-be-tested design structure designed by a user and simulation data input by the user, packaging the simulation data based on an SCE-MI channel;
the PCIE data transmission unit 503 is configured to send the encapsulated data to a PCIE core on an FPGA board through PCIE driver and hardware, where the PCIE core transmits the encapsulated data through an AXI transmission protocol in a DMA read manner, and then manages the encapsulated data by using SCE-MI, where the FPGA board includes a plurality of FPGAs;
the Banyan network and FPGA data switching unit 504 is configured to unpack the encapsulated data based on SCE-MI, send the unpacked data to a corresponding node of the Banyan network through a corresponding SCE-MI channel, and send the encapsulated data to receiving channels of multiple FPGAs corresponding to the node according to a cell scheduling algorithm;
the data acceleration unit 505 is configured to, if said EDA algorithm acceleration mode is selected: accelerating the packaged data based on parallel calculation of the plurality of FPGAs, and then returning the accelerated data to the top layer EDA algorithm for processing;
if the EDA simulation acceleration mode is selected: and after the encapsulated data and the design structure to be tested are subjected to simulation verification by utilizing the plurality of FPGAs, simulation data is generated, and the simulation data is compared with standard verification data provided by a user to verify a simulation result.
The system combines EDA algorithm acceleration and simulation acceleration in one system. When the EDA algorithm is accelerated, a top-layer EDA algorithm is enabled to control the sending and receiving of data; when EDA simulation is accelerated, acceleration proceeds according to the design under test created by the user and the simulation data the user supplies. A multichannel SCE-MI interface provides software-hardware data cooperation, and a Banyan network then implements data exchange among the multiple FPGAs. Finally, the accelerated data is returned for processing, and the verification data is compared with the simulation data to obtain the simulation result. By accelerating both the algorithm and the simulation through software-hardware cooperation, the system unifies EDA algorithm acceleration and simulation acceleration; the multichannel PIPE-type SCE-MI standard protocol interface gives it generality, and applying a Banyan network to multi-FPGA data exchange reduces the exchange latency, so the system is simple in structure and efficient in operation.
Embodiments of the invention also relate to a computer-readable storage medium having stored thereon a computer program which, when executed by a computer processor, implements the method above. The computer program comprises program code for performing the method shown in the flow chart. It should be noted that the computer-readable medium of the present application can be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two.
The method combines EDA algorithm acceleration and simulation acceleration into one system. When the EDA algorithm is accelerated, a top-layer EDA algorithm is enabled to control the sending and receiving of data; when EDA simulation is accelerated, acceleration proceeds according to the design under test created by the user and the simulation data the user supplies. A multichannel SCE-MI interface provides software-hardware data cooperation, and a Banyan network then implements data exchange among the multiple FPGAs. Finally, the accelerated data is returned for processing, and the verification data is compared with the simulation data to obtain the simulation result. By accelerating both the algorithm and the simulation through software-hardware cooperation, the method unifies EDA algorithm acceleration and simulation acceleration; the multichannel PIPE-type SCE-MI standard protocol interface gives it generality, and applying a Banyan network to multi-FPGA data exchange reduces the exchange latency, so the system is simple in structure and efficient in operation.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (10)

1. An EDA hardware acceleration method based on a Banyan network and a multi-FPGA structure is characterized by comprising the following steps:
s1: the method comprises the steps that a user selects an acceleration mode, wherein the acceleration mode comprises an EDA algorithm acceleration mode and an EDA simulation acceleration mode;
s2: if the EDA algorithm acceleration mode is selected: enabling a top-layer EDA algorithm to control the sending and receiving of data, carrying out batch processing on the data, and then carrying out packaging processing based on the SCE-MI channel;
if the EDA simulation acceleration mode is selected: according to a to-be-tested design structure designed by a user and simulation data input by the user, packaging the simulation data based on an SCE-MI channel;
s3: the method comprises the steps that packaged data are sent to a PCIE core on an FPGA board through a PCIE drive and hardware, the PCIE core transmits the packaged data through an AXI transmission protocol in a DMA reading mode, and then management is carried out through SCE-MI, wherein the FPGA board comprises a plurality of FPGAs;
s4: unpacking the packaged data based on SCE-MI, then sending the unpacked data to a corresponding node of a Banyan network through a corresponding SCE-MI channel, and sending the packaged data to receiving channels of a plurality of FPGAs corresponding to the node according to a cell scheduling algorithm;
s5: if the EDA algorithm acceleration mode is selected: accelerating the packaged data based on parallel calculation of the plurality of FPGAs, and then returning the accelerated data to the top layer EDA algorithm for processing;
if the EDA simulation acceleration mode is selected: and after the encapsulated data and the design structure to be tested are subjected to simulation verification by utilizing the plurality of FPGAs, simulation data is generated, and the simulation data is compared with standard verification data provided by a user to verify a simulation result.
2. The method of claim 1, wherein blocking is avoided by an exhaustive computation method in the scheduling algorithm of the Banyan network, specifically comprising: all cell transmissions are sorted, checked, and scheduled in descending order of priority; if a low-priority cell's transmission route conflicts with a high-priority cell, the current transmission is abandoned and a lower-priority cell is selected for transmission, until no idle line remains in the whole Banyan network;
selecting buffers with the same priority in the Banyan network at random, determining the routes on which each cell collides with other cells, and determining the possible blocking results according to those routes;
and computing the possible blocking results and storing them in BRAM on the FPGA board.
3. The method according to claim 1, wherein clock synchronization among the plurality of FPGAs is performed using a clock tree, the clock synchronization specifically comprising: when the internal clock drives different registers, a clock tree is adopted so that the time required for the clock to reach the different registers is balanced; meanwhile, the clock tree approach is also adopted among the plurality of FPGAs so that the clocks of the FPGAs are synchronized.
4. The method of claim 1, wherein the PCIE data is received and transmitted on the FPGA board by DMA and transmitted by using an AXI bus structure.
5. The method of claim 1, wherein the SCE-MI channel performs unpacking and packing of the data passing through it and builds multiple FIFOs to form transmission channels for that data.
6. The method according to claim 1, wherein the data exchange in the multiple FPGAs is based on data transmission of a Banyan network, and specifically comprises:
based on the Banyan network, the data sending ends of the plurality of FPGAs are connected to the input port of the Banyan network, the data receiving ends of the plurality of FPGAs are connected to the output port of the Banyan network, and the data of the plurality of FPGAs are controlled to be respectively sent to corresponding nodes of the Banyan network according to corresponding SCE-MI channels.
7. The method of claim 1, wherein the step of returning the accelerated data to the top-level EDA algorithm for processing comprises:
transmitting the data of the plurality of FPGAs back to the PCIE driver; the PCIE driver receives the returned data, unpacks it, and returns the unpacked data to the top-layer EDA algorithm for processing.
8. The method of claim 1, wherein enabling the top-level EDA algorithm controls the sending and receiving of data, specifically comprising:
controlling data transmission by using the EDA algorithm so as to generate a function interface that the EDA algorithm can schedule, and controlling the sending and receiving of data through that function interface, which includes analyzing the sparse matrix operations in SPICE software, transmitting the matrix data to the transmission unit in a suitable order, and controlling the amount of matrix data transmitted.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a computer processor, carries out the method of any one of claims 1 to 8.
10. An EDA hardware acceleration system based on a Banyan network and a multi-FPGA structure is characterized by comprising:
an acceleration mode selection unit: configured for a user to select an acceleration mode, the acceleration mode comprising an EDA algorithm acceleration mode and an EDA simulation acceleration mode;
a data transmitting/receiving unit: configured to, if the EDA algorithm acceleration mode is selected: enabling a top-layer EDA algorithm to control the sending and receiving of data, carrying out batch processing on the data, and then carrying out packaging processing based on the SCE-MI channel;
if the EDA simulation acceleration mode is selected: according to a design structure to be tested designed by a user and simulation data input by the user, packaging the simulation data based on an SCE-MI channel;
a PCIE data transmission unit: the method comprises the steps that configuration is carried out, packaged data are sent to a PCIE core on an FPGA board through a PCIE drive and hardware, the PCIE core transmits the packaged data through an AXI transmission protocol in a DMA reading mode, and management is carried out through SCE-MI, wherein the FPGA board comprises a plurality of FPGAs;
the Banyan network and FPGA data exchange unit: configured to unpack the packaged data based on SCE-MI, send the unpacked data to the corresponding nodes of the Banyan network through the corresponding SCE-MI channels, and then send the data to the receiving channels of the plurality of FPGAs corresponding to those nodes according to a cell scheduling algorithm;
a data acceleration unit: configured to, if the EDA algorithm acceleration mode is selected: accelerating the packaged data based on parallel calculation of the plurality of FPGAs, and then returning the accelerated data to the top layer EDA algorithm for processing;
if the EDA simulation acceleration mode is selected: and performing simulation verification on the packaged data and the design structure to be tested by utilizing the plurality of FPGAs, generating simulation data, and comparing the simulation data with standard verification data provided by a user to verify a simulation result.
CN202111389032.1A 2021-11-22 2021-11-22 EDA hardware acceleration method and system based on Banyan network and multi-FPGA structure Pending CN114090250A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111389032.1A CN114090250A (en) 2021-11-22 2021-11-22 EDA hardware acceleration method and system based on Banyan network and multi-FPGA structure

Publications (1)

Publication Number Publication Date
CN114090250A true CN114090250A (en) 2022-02-25

Family

ID=80302992

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111389032.1A Pending CN114090250A (en) 2021-11-22 2021-11-22 EDA hardware acceleration method and system based on Banyan network and multi-FPGA structure

Country Status (1)

Country Link
CN (1) CN114090250A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090150839A1 (en) * 2007-12-10 2009-06-11 Inpa Systems, Inc. Integrated prototyping system for validating an electronic system design
CN101499937A (en) * 2009-03-16 2009-08-05 盛科网络(苏州)有限公司 Software and hardware collaborative simulation verification system and method based on FPGA
CN109711071A (en) * 2018-12-29 2019-05-03 成都海光集成电路设计有限公司 A kind of server S oC software and hardware cooperating simulation accelerated method and system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIAO Yongbo; LI Ping; RUAN Aiwu; LI Wei; LI Wenchang; LI Hui: "Communication protocol design of an SoC hardware/software co-emulation system", Microelectronics, no. 02, 20 April 2010 (2010-04-20), pages 15 - 19 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114785746A (en) * 2022-04-19 2022-07-22 厦门大学 Banyan network for single-broadcast mixed transmission
CN114785746B (en) * 2022-04-19 2023-06-16 厦门大学 Banyan network for unicast and multicast hybrid transmission
CN114884903A (en) * 2022-04-29 2022-08-09 绿盟科技集团股份有限公司 Data processing method, field programmable gate array chip and network safety equipment
CN114884903B (en) * 2022-04-29 2023-06-02 绿盟科技集团股份有限公司 Data processing method, field programmable gate array chip and network security device
CN114679415A (en) * 2022-05-07 2022-06-28 厦门大学 Non-blocking banyan network meeting AXI5-Lite protocol standard
CN117290653A (en) * 2023-11-24 2023-12-26 巨霖科技(上海)有限公司 Matrix solving method and system based on EDA system
CN117290653B (en) * 2023-11-24 2024-02-20 巨霖科技(上海)有限公司 Matrix solving method and system based on EDA system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination