CN113032329B - Computing structure, hardware architecture and computing method based on reconfigurable memory chip - Google Patents


Info

Publication number: CN113032329B
Authority: CN (China)
Prior art keywords: reconfigurable, chip, core, speed serial, computing
Legal status: Active (granted)
Application number: CN202110555316.7A
Other languages: Chinese (zh)
Other versions: CN113032329A
Inventors: 耿云川, 陈巍, 陶嘉, 尚会滨, 江博, 李冰倩
Current and original assignee: Qianxin Semiconductor Technology Beijing Co., Ltd.
Application filed by: Qianxin Semiconductor Technology Beijing Co., Ltd.
Priority: CN202110555316.7A
Publication of application: CN113032329A
Publication of grant: CN113032329B

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 15/00 — Digital computers in general; Data processing equipment in general
    • G06F 15/76 — Architectures of general purpose stored program computers
    • G06F 15/78 — Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F 15/7867 — Architectures comprising a single central processing unit with reconfigurable architecture
    • G06F 15/7871 — Reconfiguration support, e.g. configuration loading, configuration switching, or hardware OS


Abstract

The invention provides a computing structure, a hardware architecture, and a computing method based on reconfigurable computing chips. A master reconfigurable computing chip and at least one slave reconfigurable computing chip are provided; the master chip carries at least one first high-speed serial transceiver, and each slave chip carries a second high-speed serial transceiver. The master chip and each slave chip are connected through the first and second high-speed serial transceivers, which realize data exchange and routing between the master and slave chips. The invention improves the communication speed between the master and slave reconfigurable memory chips, requires fewer device pins, and reduces board-space requirements, thereby easing PCB routing.

Description

Computing structure, hardware architecture and computing method based on reconfigurable memory chip
Technical Field
The invention relates to the technical field of computers, in particular to a computing structure, a hardware architecture and a computing method based on a reconfigurable memory chip.
Background
Reconfigurable memory chips contain abundant logic and interface resources and can realize high-speed data processing and transmission. A single reconfigurable memory chip, however, has limited resources, so several chips often need to work cooperatively, which requires real-time communication among them; the chip-to-chip data transmission speed is then often the key factor limiting system performance.
In a traditional scheme, reconfigurable memory chip-to-chip interconnection mainly adopts Low-Voltage Differential Signaling (LVDS). LVDS is generally used for parallel data transmission, at data rates of 155 Mbps, 622 Mbps, or 1.25 Gbps. Although this scheme is easy to implement at the physical layer, uses a low supply voltage, and effectively suppresses electromagnetic interference, its transmission speed cannot be pushed further: the traditional parallel transmission mode raises throughput by increasing the clock frequency and the data bit width, which causes difficult system placement and routing, skew between clock and data signals, difficult multi-endpoint interconnection, poor expandability, and similar problems.
Reconfigurable memory chip interconnection also uses the Serial Peripheral Interface (SPI). SPI signals are unidirectional and easy to electrically isolate, which matters in industrial products with strict anti-interference and safety requirements; the SPI bus also lacks any complex bus arbitration mechanism and is relatively robust. However, SPI has no addressing mechanism, so additional chip-select signals are needed; multi-slave support is poor, neither of its two topologies supports many slave devices well, and a system has only one master device. SPI therefore cannot efficiently support cooperative work among reconfigurable memory chips, nor meet the high-speed data transmission requirements of different fields.
Reconfigurable memory chip interconnection can also use a parallel bus. The interface is simple and needs only the common 3.3 V LVTTL level, but it consumes many I/O pins.
In summary, the conventional schemes all have defects to different degrees: conventional bus transmission cannot meet the requirements of transmission between chips, provides no standard interface, adds no intermediate-layer design, and cannot meet the requirements of high-speed, low-latency data transmission between chips (at wafer level or board level). Nor can the conventional schemes dynamically reconfigure a single reconfigurable computing chip or several of them. Meanwhile, the prior art offers no effective solution to the problem that other memories are accessed too slowly or cannot be accessed at all.
Disclosure of Invention
Aiming at the problems in the prior art, the embodiment of the invention provides a computing structure, a hardware architecture and a computing method based on a reconfigurable memory chip.
Specifically, the embodiment of the invention provides the following technical scheme:
in a first aspect, an embodiment of the present invention provides a computing structure based on a reconfigurable memory chip, including:
a master reconfigurable computing chip and at least one slave reconfigurable computing chip; the bottom layer of the hardware architecture of the master-slave reconfigurable memory chip system is flexible and can be chosen at the wafer level, the multi-chip package level, the single PCI Express board level, or the level between PCI Express boards. A single reconfigurable computing chip, or several of them, may also be dynamically reconfigured.
The master reconfigurable computing chip is provided with at least one first high-speed serial transceiver, and each slave reconfigurable computing chip is provided with a second high-speed serial transceiver;
the main reconfigurable computing chip and each slave reconfigurable computing chip are connected through the first high-speed serial transceiver and the second high-speed serial transceiver, and the first high-speed serial transceiver and the second high-speed serial transceiver are used for realizing data exchange and routing between the main reconfigurable computing chip and each slave reconfigurable computing chip.
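The master-slave link arrangement described above is essentially a star topology in which all slave-to-slave traffic passes through the master. A minimal illustrative Python model (the class names and routing behavior are assumptions for illustration, not the patent's implementation) is:

```python
# Illustrative model of the star topology described above: one master chip
# with N serial links, each link tied to one slave chip. The master
# forwards ("routes") data to any slave over its dedicated link.

class SlaveChip:
    def __init__(self, chip_id):
        self.chip_id = chip_id
        self.inbox = []          # data received over the serial link

    def receive(self, payload):
        self.inbox.append(payload)

class MasterChip:
    def __init__(self, num_links):
        # One serial link (transceiver pair) per slave chip.
        self.links = {i: SlaveChip(i) for i in range(num_links)}

    def route(self, dst_id, payload):
        # All traffic passes through the master, which picks the link.
        self.links[dst_id].receive(payload)

master = MasterChip(num_links=8)     # 1 master + 8 slaves, as in Fig. 2
master.route(3, "partial result")
print(master.links[3].inbox)         # ['partial result']
```

In real hardware each `route` call would be an Aurora frame over a GT transceiver pair; the dictionary of links stands in for the eight physical serial lanes.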
Furthermore, a block random access memory is provided as internal memory in the master reconfigurable computing chip and each slave reconfigurable computing chip;
each slave reconfigurable computing chip is connected to a double data rate (DDR) synchronous dynamic random access memory as external memory, and each external memory, together with the internal memories of the master and slave reconfigurable computing chips, serves as a storage unit of the computing structure;
wherein each of the external memories operates in parallel.
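One way the parallel external memories could be presented as a single storage unit is address interleaving across the DDR banks. The striping scheme below is an illustration of the concept, not something the patent specifies:

```python
# Sketch (assumed semantics): the per-slave DDR banks form one flat
# address space, with consecutive words striped across the banks so
# that sequential accesses hit different banks and proceed in parallel.

NUM_BANKS = 8          # one external DDR bank per slave chip
BANK_WORDS = 1024

banks = [[0] * BANK_WORDS for _ in range(NUM_BANKS)]

def write_word(addr, value):
    # Low address bits select the bank; high bits select the word.
    banks[addr % NUM_BANKS][addr // NUM_BANKS] = value

def read_word(addr):
    return banks[addr % NUM_BANKS][addr // NUM_BANKS]

for a in range(16):
    write_word(a, a * 10)
print(read_word(9))    # 90 -- stored in bank 1, word 1
```

With this layout a burst of eight consecutive words touches all eight banks once each, which is why parallel banks raise aggregate memory-access bandwidth.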
Furthermore, at least one first node is arranged on the master reconfigurable computing chip, and a second node is respectively arranged on each slave reconfigurable computing chip;
wherein the first node comprises: 2 first IP cores and 2 second IP cores, or 2 first IP cores and 1 second IP core; a first high-speed serial transceiver is encapsulated in the second IP core, and the first IP core is used to control the second IP core to invoke the first high-speed serial transceiver for high-speed serial communication with the second high-speed serial transceiver;
the second node comprises: 2 third IP cores and 2 fourth IP cores, or 2 third IP cores and 1 fourth IP core; a second high-speed serial transceiver is encapsulated in the fourth IP core, and the third IP core is used to control the fourth IP core to invoke the second high-speed serial transceiver for high-speed serial communication with the first high-speed serial transceiver.
Further, when the first node includes 2 first IP cores and 2 second IP cores, the first IP core is connected to the second IP core, and is configured to control the second IP core to call the first high-speed serial transceiver to perform high-speed serial communication with the second high-speed serial transceiver;
when the second node comprises 2 third IP cores and 2 fourth IP cores, the third IP core is connected to the fourth IP core, and is configured to control the fourth IP core to call the second high-speed serial transceiver to perform high-speed serial communication with the first high-speed serial transceiver.
Further, when the first node includes 2 first IP cores and 1 second IP core, the first IP core is connected to the second IP core through a first logic module, and is configured to control the second IP core to call the first high-speed serial transceiver to perform high-speed serial communication with the second high-speed serial transceiver;
when the second node comprises 2 third IP cores and 1 fourth IP core, the third IP core is connected to the fourth IP core through a second logic module, and is configured to control the fourth IP core to call the second high-speed serial transceiver to perform high-speed serial communication with the first high-speed serial transceiver.
Further, the first IP core and the third IP core are chip2chip IP cores, and the second IP core and the fourth IP core are Aurora IP cores;
the second IP core and the fourth IP core provide an Advanced eXtensible Interface (AXI4) for implementing user logic.
Furthermore, an AXI interconnect is arranged in the main reconfigurable computing chip, and routing is realized inside the main reconfigurable computing chip through the AXI interconnect.
Furthermore, a processor core for serial-port debugging is provided in the master reconfigurable memory chip, and the processor core carries an Ethernet interface for communication and a universal asynchronous receiver-transmitter (UART) interface for controlling the acceleration board.
In a second aspect, an embodiment of the present invention provides a hardware architecture based on a reconfigurable memory chip, including the computing structure based on reconfigurable computing chips described above, wherein the master reconfigurable computing chip and the slave reconfigurable computing chips of the computing structure are packaged in one wafer, across multiple wafers, on one PCI Express board, or on different PCI Express boards.
In a third aspect, an embodiment of the present invention provides a parallel computing method for a computing structure based on a reconfigurable memory chip, including:
acquiring data to be operated and processed and an operation algorithm;
and inputting the data to be operated and processed and the operation algorithm into the main reconfigurable computing chip so as to enable the main reconfigurable computing chip and the slave reconfigurable computing chip to jointly complete the operation process.
Further, the inputting the data to be operated and processed and the operation algorithm into the master reconfigurable computing chip to make the master reconfigurable computing chip and the slave reconfigurable computing chip jointly complete the operation process includes:
and inputting the data to be operated and processed and the operation algorithm into the main reconfigurable computing chip, so that the main reconfigurable computing chip exchanges and routes data with each slave reconfigurable computing chip by using each first high-speed serial transceiver and each second high-speed serial transceiver, and stores the data by using an internal memory in the main reconfigurable computing chip and each slave reconfigurable computing chip and an external memory connected with each slave reconfigurable computing chip.
According to the computing structure, hardware architecture, and computing method based on reconfigurable computing chips described above, a master reconfigurable computing chip and at least one slave reconfigurable computing chip are provided; the master chip carries at least one first high-speed serial transceiver, each slave chip carries a second high-speed serial transceiver, and the master chip and each slave chip are connected through the first and second high-speed serial transceivers, which realize data exchange and routing between them. High-speed interconnection between the master and slave reconfigurable computing chips is thus achieved through the high-speed serial transceivers, improving the communication speed between them; and because the transceivers use serial interfaces, fewer device pins are required, board-space requirements shrink, and printed circuit board (PCB) routing is eased.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic structural diagram of a computing architecture based on a reconfigurable memory chip according to a first embodiment of the present invention;
FIG. 2 is a schematic diagram of an interconnection architecture between reconfigurable computing chips according to a first embodiment of the present invention;
FIG. 3 is a schematic diagram of a configuration mode of a computing architecture based on a reconfigurable memory chip according to a first embodiment of the present invention;
FIG. 4 is a second schematic diagram of a configuration mode of a computing structure based on a reconfigurable memory chip according to a first embodiment of the present invention;
FIG. 5 is a schematic diagram of a hardware architecture of a reconfigurable computing chip at a wafer level or a multi-wafer level according to a second embodiment of the present invention;
FIG. 6 is a schematic diagram of a hardware architecture of a reconfigurable memory chip packaged at a PCI Express board level or a level between PCI Express boards according to a second embodiment of the present invention;
FIG. 7 is a schematic flowchart of a parallel computing method for the computing structure based on a reconfigurable memory chip according to a third embodiment of the present invention;
reference numerals:
101: a main reconfigurable memory chip; 102: a slave reconfigurable computing chip; 103: a first high-speed serial transceiver; 104: a second high-speed serial transceiver.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic structural diagram of a computing structure based on a reconfigurable memory chip according to a first embodiment of the present invention, and as shown in fig. 1, the computing structure based on a reconfigurable memory chip according to the first embodiment of the present invention includes: a master reconfigurable computing chip 101 and at least one slave reconfigurable computing chip 102;
at least one first high-speed serial transceiver 103 is arranged on the master reconfigurable computing chip 101, and a second high-speed serial transceiver 104 is respectively arranged on each slave reconfigurable computing chip 102;
the master reconfigurable computing chip 101 and each slave reconfigurable computing chip 102 are connected through the first high-speed serial transceiver 103 and the second high-speed serial transceiver 104, and the first high-speed serial transceiver 103 and the second high-speed serial transceiver 104 are used for realizing data exchange and routing between the master reconfigurable computing chip 101 and the slave reconfigurable computing chip 102.
In this embodiment, it should be noted that a reconfigurable computing chip is a semi-custom circuit built on a programmable logic device: through development tools, its internal circuit structure can be changed according to a specific design target to implement a predetermined algorithm or function. Its programmability gives it better flexibility than an application-specific integrated circuit, and it integrates more gate circuits than a plain programmable logic device, which better supports circuit design by developers. Reconfigurable memory chips contain abundant logic and interface resources and can realize high-speed data processing and transmission; however, a single chip has limited resources, so several chips often need to work cooperatively, which requires real-time communication among them, and the chip-to-chip data transmission speed is often the key factor limiting system performance.
In the traditional scheme, reconfigurable memory chip interconnection mainly adopts LVDS parallel data transmission, whose speed cannot be pushed further: raising throughput by increasing clock frequency and data bit width causes difficult placement and routing, clock/data skew, difficult multi-endpoint interconnection, and poor expandability, and thus cannot meet the high-speed data-transmission requirements of different fields.
Therefore, to relieve the limits on communication speed, bandwidth, and storage resources between reconfigurable computing chips, this embodiment interconnects one master reconfigurable computing chip with several slave reconfigurable computing chips at high speed, using high-speed serial transceivers to realize data exchange and routing among the slave chips so that they can perform accelerated computing. Specifically, the embodiment comprises a master reconfigurable computing chip 101 and at least one slave reconfigurable computing chip 102; the master chip 101 carries at least one first high-speed serial transceiver 103, each slave chip 102 carries a second high-speed serial transceiver 104, and the master chip 101 and each slave chip 102 are connected through the transceivers 103 and 104 for high-speed serial data transmission, realizing data exchange and routing between master and slave.
Here, HSSB denotes a High Speed Serial Bus, which includes gigabit transceivers (GTX), a technology for high-speed, real-time transmission. Current GTX line rates range from 1 Gbps to 12 Gbps, with effective payload rates of 0.8 Gbps to 10 Gbps. GTX has been applied to Fibre Channel (FC), PCIe, RapidIO, Serial ATA, gigabit Ethernet, 10-gigabit Ethernet, and the like. The GTX transceiver transmits data over differential signals, which gives it strong anti-interference capability, and uses a self-synchronization technique that embeds the clock in the data stream and recovers it at the receiver, so that clock delay and data delay are the same and sampling accuracy is ensured.
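The gap between line rate and effective payload quoted above is consistent with line coding overhead. Assuming 8b/10b coding (commonly used by GTX-based links; the patent does not name the coding scheme), 8 payload bits ride on every 10 line bits:

```python
# Back-of-envelope check relating line rate to effective payload under
# an assumed 8b/10b line coding (8 payload bits per 10 line bits).

def payload_gbps(line_rate_gbps, coding_efficiency=8 / 10):
    return line_rate_gbps * coding_efficiency

print(payload_gbps(1.0))             # 0.8 -- matches the 0.8 Gbps lower bound
print(round(payload_gbps(12.0), 2))  # 9.6 -- near the quoted 10 Gbps upper bound
```

Higher-rate Aurora variants use 64b/66b coding, whose ~97% efficiency would explain payloads closer to the 10 Gbps figure; either way, the payload is the line rate scaled by coding efficiency.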
Therefore, in this embodiment, the master and slave reconfigurable computing chips can cooperate to complete a single algorithm, rather than merely multithreading one algorithm, and the number of slave chips can be flexibly configured according to algorithm requirements. The structure thus supports large-scale, complex algorithms, applies to various algorithm-acceleration scenarios, and offers fast, stable data processing and transmission.
Fig. 2 is a schematic diagram of an interconnection architecture between reconfigurable computing chips provided in an embodiment of the present invention. As shown in Fig. 2, the computing structure comprises 1 master reconfigurable computing chip and 8 slave reconfigurable computing chips. In this example the master chip is a Xilinx 7-series device and each slave chip is a Xilinx Kintex-7 device. Eight Node modules are provided on the master chip, each encapsulating a first high-speed serial transceiver; a Node on each slave chip encapsulates a second high-speed serial transceiver, and each first transceiver is connected to the corresponding second transceiver to implement data exchange and routing between the master and slave chips.
According to the reconfigurable computing chip-based computing structure provided by the embodiment of the invention, a master reconfigurable computing chip and at least one slave reconfigurable computing chip are provided; the master chip carries at least one first high-speed serial transceiver, each slave chip carries a second high-speed serial transceiver, and the master chip and each slave chip are connected through the first and second high-speed serial transceivers to realize data exchange and routing between them. High-speed interconnection between the master and slave chips is thus achieved through the high-speed serial transceivers, which improves their communication speed; the serial interfaces also need fewer device pins, reducing board-space requirements and easing PCB routing.
A Controller is provided in the master reconfigurable memory chip and performs logic control over the master and slave reconfigurable memory chips. A processor core (ZYNQ PS) may be provided in the Controller, offering a serial port for debugging, an Ethernet interface (ETH) for communication, and a universal asynchronous receiver-transmitter interface (UART) for controlling the acceleration board.
Therefore, in the computing structure based on the reconfigurable memory chip provided by the embodiment of the invention, a processor core for serial-port debugging is provided in the master reconfigurable memory chip, and the processor core carries an Ethernet interface for communication and a UART interface for controlling the acceleration board, so that the master chip can receive externally transmitted signals and communicate with the slave reconfigurable memory chips for accelerated operation.
A cache is provided in the master reconfigurable memory chip; it is a small-capacity, high-speed memory placed between the Controller and main memory. It resolves the mismatch between data read and write speeds in the system and improves the system's data-processing efficiency.
Therefore, by the computing structure based on the reconfigurable memory chip provided by the embodiment of the invention, the problem of unmatched data reading and writing speeds in the system can be solved by arranging the Cache in the main reconfigurable memory chip, and the data processing efficiency of the system is improved, so that the data processing process in the main reconfigurable memory chip system is accelerated.
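The benefit of placing a small fast memory between the Controller and main memory can be shown with a minimal sketch (illustrative only, not the patent's cache design): repeated reads of the same address skip the slow main-memory access.

```python
# Minimal cache model: only a miss touches the slow main memory;
# a hit is served from the small fast cache.

class CachedMemory:
    def __init__(self, main_memory):
        self.main = main_memory      # slow, large backing store
        self.cache = {}              # small, fast cache
        self.misses = 0

    def read(self, addr):
        if addr not in self.cache:
            self.misses += 1         # only this path hits main memory
            self.cache[addr] = self.main[addr]
        return self.cache[addr]

mem = CachedMemory({0: 'a', 1: 'b'})
print(mem.read(0), mem.read(0), mem.misses)   # a a 1
```

A real cache would add eviction, write handling, and line granularity; the point here is only the hit/miss distinction that matches read and write speeds within the system.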
Based on the content of the above embodiments, in the present embodiment, an internal memory is provided in the master reconfigurable computing chip 101 and the slave reconfigurable computing chip 102;
each of the slave reconfigurable computing chips 102 is connected to an external memory, and each of the external memories is used as a storage unit of the computing configuration together with an internal memory provided in each of the master reconfigurable computing chip 101 and the slave reconfigurable computing chip 102;
wherein each of the external memories operates in parallel.
In this embodiment, an internal memory such as Block Random Access Memory (BRAM) is provided inside the master reconfigurable computing chip 101 and each slave reconfigurable computing chip 102, and a general external memory such as Double Data Rate (DDR) SDRAM or High Bandwidth Memory (HBM) is provided off-chip. Each slave chip 102 connects to its external memory through a MIG interface, and the external memories work in parallel; this parallelization increases memory-access bandwidth, which further accelerates algorithms with high parallelism and high bandwidth demands.
The MIG interface is an IP core provided by Xilinx, named MIG (Memory Interface Generator), which can provide interfaces for various memories such as DDR3 and DDR4. DDR memory transfers data on both the rising and falling edges of the clock to achieve a double data rate; viewed at the level of its interface signals and read/write procedures, however, driving those signals directly is too complicated. The MIG IP core solves this: the developer only needs to supply the write address and write data through the MIG interface, which simplifies development and use. The MIG interface consists of a Physical Layer, a Memory Controller, and a User Interface. The Physical Layer provides the high-speed interface to the external DDR SDRAM, covering data transmission, high-speed clock generation and recovery, and DDR SDRAM initialization and calibration. The Controller receives read/write commands from the User Interface and handles burst transactions, mainly translating User Interface commands into forms executed more efficiently on the DDR SDRAM and relaying transfers on the DDR SDRAM back to the User Interface. The User Interface exposes a simple application interface for the user.
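The three MIG layers just described can be pictured as a simple pass-through stack. The method names below are invented for illustration (the real IP core exposes hardware ports, not function calls); what the sketch shows is that the developer talks only to the user interface:

```python
# Conceptual model of the MIG layering: user interface -> memory
# controller -> physical layer -> DDR command on the pins.

class PhysicalLayer:
    def issue(self, cmd, addr, data):
        # Drives the actual DDR SDRAM signals.
        return f"DDR {cmd} @0x{addr:04x} = {data}"

class MemoryController:
    def __init__(self, phy):
        self.phy = phy

    def execute(self, cmd, addr, data):
        # In hardware this reorders and bursts commands for efficiency;
        # modeled as a pass-through here.
        return self.phy.issue(cmd, addr, data)

class UserInterface:
    def __init__(self, ctrl):
        self.ctrl = ctrl

    def write(self, addr, data):
        # The developer only supplies address and data, as the text notes.
        return self.ctrl.execute("WRITE", addr, data)

mig = UserInterface(MemoryController(PhysicalLayer()))
print(mig.write(0x2000, 0xAB))   # DDR WRITE @0x2000 = 171
```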
Meanwhile, in the embodiment, the in-chip of the master reconfigurable computing chip and the slave reconfigurable computing chip are interconnected, so that the logic of the multiple reconfigurable computing chips is realized, and the resource sharing of the internal memory and the external memory is realized through the interconnection, so that the available resources of the system are greatly improved, and compared with the reconfigurable computing chips with the same resources, the cost can be effectively reduced.
In addition, in the embodiment, the in-chip of the master reconfigurable Memory chip and the slave reconfigurable Memory chip are interconnected to realize the logic of a plurality of reconfigurable Memory chips, and the internal Memory and the external General Memory are shared by interconnection, so that more reconfigurable computing units can be used, the available resources of the system are greatly improved, and the integration of data storage and computing is realized. The capacity and the function of a high-end chip with high price and rich resources can be achieved or even exceeded by high-speed cascading of a plurality of low-end chips with limited resources and low price, so that the expenditure can be effectively saved and the cost can be reduced on the premise of ensuring the rich resources of a chip system. And single or multiple reconfigurable memory chips can realize dynamic reconfiguration according to the requirements of different projects, and the embodiment not only is the innovation of a pure hardware framework, but also has the innovation of a matched middle layer, so that the hardware framework of the multiple reconfigurable memory chips is transparent to upper-layer application. By providing a high-level extensible interface, the design of a plurality of reconfigurable memory chips is realized in one reconfigurable memory chip, so that the reconfigurable memory chip is simpler and more universal for developers, and is convenient for design transplantation. The master-slave reconfigurable storage and computation chip can work cooperatively and is flexibly configured in quantity, so that the realization of a large-scale and high-complexity algorithm acceleration scene is supported, the data processing and transmission speed is high, the performance is stable, and the method is suitable for the computing power enhancement and acceleration of cloud computing. 
The computing structure based on the reconfigurable memory chip parallelizes the general memory, improves computing power, and increases memory access bandwidth, so that algorithms with high-parallelism, high-bandwidth and low-latency requirements can be accelerated.
According to the computing structure based on the reconfigurable memory chip provided by the embodiment of the invention, each slave reconfigurable memory chip is respectively connected with an external memory, and resource sharing between the internal memory and the external memory is realized through the interconnection, so that the integration of data storage and computing is realized and the memory access bandwidth is increased, enabling acceleration of algorithms with high-parallelism and high-bandwidth requirements.
Based on the content of the foregoing embodiment, in this embodiment, at least one first node is disposed on the master reconfigurable computing chip 101, and a second node is disposed on each slave reconfigurable computing chip 102;
wherein, the first node comprises: 2 first IP cores and 2 second IP cores, or 2 first IP cores and 1 second IP core, where the second IP core is encapsulated with a first high-speed serial transceiver 103, and the first IP core is used to control the second IP core to call the first high-speed serial transceiver 103 to perform high-speed serial communication with the second high-speed serial transceiver 104;
the second node comprises: 2 third IP cores and 2 fourth IP cores, or 2 third IP cores and 1 fourth IP core, where the second high-speed serial transceiver 104 is encapsulated in the fourth IP core, and the third IP core is used for controlling the fourth IP core to call the second high-speed serial transceiver 104 to perform high-speed serial communication with the first high-speed serial transceiver 103.
In this embodiment, the master reconfigurable computing chip 101 is provided with at least one first node for exchanging and routing data with a second node of the slave reconfigurable computing chip 102; wherein the number of first nodes is the same as the number of slave reconfigurable computing chips 102. Each slave reconfigurable computing chip 102 is provided with a second node for exchanging and routing data with the first node of the master reconfigurable computing chip 101.
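The pairing described above — one first node on the master per slave chip, matched one-to-one with that slave's second node — can be sketched as follows. This is an illustrative model only; the class and field names are invented for the sketch and do not come from the patent.

```python
# Hypothetical sketch of the master/slave node topology: the master chip
# carries one first node per slave chip, and each slave carries exactly one
# second node paired with it for data exchange and routing.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    kind: str                        # "first" (master side) or "second" (slave side)
    peer: Optional["Node"] = None    # the node it exchanges/routes data with

@dataclass
class ComputeStructure:
    num_slaves: int
    first_nodes: List[Node] = field(default_factory=list)
    second_nodes: List[Node] = field(default_factory=list)

    def __post_init__(self):
        # one first node per slave chip, paired with that slave's second node
        for _ in range(self.num_slaves):
            f, s = Node("first"), Node("second")
            f.peer, s.peer = s, f
            self.first_nodes.append(f)
            self.second_nodes.append(s)

structure = ComputeStructure(num_slaves=3)
assert len(structure.first_nodes) == structure.num_slaves
assert all(n.peer.kind == "second" for n in structure.first_nodes)
```

The invariant checked at the end mirrors the text: the number of first nodes equals the number of slave chips, and each first node routes to exactly one second node.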
In this embodiment, the first node has 2 selectable schemes according to the configuration mode:
(1) 2 first IP cores + 2 second IP cores;
(2) 2 first IP cores + 1 second IP core.
In both schemes, the second IP core encapsulates the first high-speed serial transceiver 103, and the first IP core is used for controlling the second IP core to call the first high-speed serial transceiver 103 to perform high-speed serial communication with the second high-speed serial transceiver 104.
Similarly, the second node has 2 alternative schemes:
(1) 2 third IP cores + 2 fourth IP cores;
(2) 2 third IP cores + 1 fourth IP core.
In both schemes, the fourth IP core encapsulates the second high-speed serial transceiver 104, and the third IP core is used for controlling the fourth IP core to call the second high-speed serial transceiver 104 to perform high-speed serial communication with the first high-speed serial transceiver 103.
According to the computing structure based on the reconfigurable computing chip provided by the embodiment of the invention, at least one first node is arranged on the main reconfigurable computing chip, and each slave reconfigurable computing chip is respectively provided with a second node, so that high-speed serial communication between the main reconfigurable computing chip and the slave reconfigurable computing chip can be realized.
Based on the content of the foregoing embodiment, in this embodiment, when the first node includes 2 first IP cores and 2 second IP cores, the first IP core is connected to the second IP core, and is configured to control the second IP core to call the first high-speed serial transceiver 103 to perform high-speed serial communication with the second high-speed serial transceiver 104;
when the second node includes 2 third IP cores and 2 fourth IP cores, the third IP cores are connected to the fourth IP cores, and are configured to control the fourth IP cores to call the second high-speed serial transceiver 104 to perform high-speed serial communication with the first high-speed serial transceiver 103.
In this embodiment, when the first node includes 2 first IP cores and 2 second IP cores, the first IP core is connected to the second IP core. Fig. 3 is a schematic structural diagram of a first node according to an embodiment of the present invention, and as shown in fig. 3, the first node includes a first IP core (chip 2chip, abbreviated as C2C) and a second IP core (Aurora 64b/66b, abbreviated as Aurora), and two-way communication is implemented by calling two paths of C2C and Aurora inside the first node. The Aurora bottom layer calls a first high-speed serial transceiver for high-speed serial communication, the first high-speed serial transceiver comprising a pair of transmit/receive interfaces (Tx/Rx MGT I/O). Similarly, the second node comprises a third IP core and a fourth IP core, and the inside of the second node realizes bidirectional communication by calling the third IP core and the fourth IP core. The fourth IP core bottom layer calls a second high-speed serial transceiver for high-speed serial communication, wherein the second high-speed serial transceiver comprises a pair of transmitting/receiving interfaces (Tx/Rx MGT I/O).
The first IP core C2C and the second IP core Aurora realize full-duplex bidirectional communication. In full duplex, data transmission and data reception are independent of each other, so data can flow in both directions simultaneously without mutual interference. In this full-duplex transmission mode, a higher data transmission speed can be achieved inside the reconfigurable memory chip.
Aurora 64b/66b is a scalable, lightweight link-layer protocol for high-speed serial communication. It is typically used in applications that require low-cost, high-data-rate, scalable, flexible serial data channels. The Aurora protocol is open, scalable and small; it can be used for point-to-point serial data transmission while avoiding the resource inefficiency of other serial protocols, can be implemented in any silicon device/technology including reconfigurable computing chips, ASICs and ASSPs, and can use 1 or more high-speed serial lanes.
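The efficiency advantage of the 64b/66b line code can be checked with simple arithmetic: 64 payload bits are carried in 66 transmitted bits (about 3% overhead), versus 8 payload bits in 10 for the older 8b/10b code (25% overhead). The 10 Gb/s lane rate below is an assumed example figure, not taken from the patent.

```python
# Line-code efficiency comparison motivating Aurora 64b/66b over 8b/10b.
eff_64b66b = 64 / 66   # ~0.9697: 64 payload bits per 66 line bits
eff_8b10b = 8 / 10     # 0.80: 8 payload bits per 10 line bits

# Usable payload rate on a hypothetical 10 Gb/s serial lane:
lane_rate = 10e9
payload_64b66b = lane_rate * eff_64b66b   # ~9.7 Gb/s of payload
payload_8b10b = lane_rate * eff_8b10b     # 8.0 Gb/s of payload
```

On the same physical lane, the 64b/66b coding thus leaves roughly 1.7 Gb/s more usable bandwidth than 8b/10b.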
In the computing structure based on the reconfigurable memory chip provided by the embodiment of the invention, the corresponding IP cores are respectively arranged on the first node and the second node, so that the IP cores call the corresponding high-speed serial transceivers to carry out high-speed serial communication, and high-speed data transmission is realized.
Based on the content of the foregoing embodiment, in this embodiment, when the first node includes 2 first IP cores and 1 second IP core, the first IP core is connected to the second IP core through a first logic module, and is configured to control the second IP core to call the first high-speed serial transceiver 103 to perform high-speed serial communication with the second high-speed serial transceiver 104;
when the second node includes 2 third IP cores and 1 fourth IP core, the third IP core is connected to the fourth IP core through a second logic module, and is configured to control the fourth IP core to call the second high-speed serial transceiver 104 to perform high-speed serial communication with the first high-speed serial transceiver 103.
In this embodiment, when the first node includes 2 first IP cores and 1 second IP core, the first IP cores are connected to the second IP core through the first logic module. Fig. 4 is a schematic structural diagram of another first node according to an embodiment of the present invention; as shown in fig. 4, the first node includes the first IP cores (C2C) and a second IP core (Aurora), connected through the first logic module, and bidirectional communication is implemented inside the first node by calling the two C2C paths and Aurora. The Aurora bottom layer calls the first high-speed serial transceiver for high-speed serial communication; the first high-speed serial transceiver comprises a pair of transmit/receive interfaces (Tx/Rx MGT I/O). The first node shown in fig. 4 uses only 1 second IP core, so a level of arbitration logic must be added between Aurora and C2C to time-division multiplex the AXIS signals, thereby saving resources. Similarly, the second node includes 2 third IP cores connected to 1 fourth IP core through the second logic module, and bidirectional communication is implemented inside the second node by calling the third and fourth IP cores. The fourth IP core bottom layer calls the second high-speed serial transceiver for high-speed serial communication; the second high-speed serial transceiver comprises a pair of transmit/receive interfaces (Tx/Rx MGT I/O).
The first IP core is connected to the second IP core through the first logic module, and the first node realizes half-duplex bidirectional communication by calling the two paths of C2C and Aurora. The third IP core is connected to the fourth IP core through the second logic module, and the second node likewise realizes half-duplex bidirectional communication internally. Half-duplex communication means that data can be transmitted in both directions, but the channel allows only one direction at a time; to change the transmission direction, the first and second logic modules switch the channel over. By adding a level of arbitration logic in the first and second logic modules, the AXIS signals can be time-division multiplexed, effectively saving resources of the master and slave reconfigurable memory chips.
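The time-sharing idea behind the half-duplex scheme — two C2C channels arbitrating for one Aurora link, with only one owner at a time — can be sketched in a few lines. This is a behavioral illustration only, not the patent's arbitration RTL; the class and channel names are invented.

```python
# Illustrative model of the single-Aurora-core scheme: two C2C channels
# time-share one serial link, so only one channel drives it at a time.
class TimeSharedLink:
    def __init__(self):
        self.owner = None  # which C2C channel currently holds the link

    def request(self, channel):
        # grant the link if free, or if the requester already owns it
        if self.owner is None:
            self.owner = channel
            return True
        return self.owner == channel

    def release(self, channel):
        # only the current owner may free the link for the other direction
        if self.owner == channel:
            self.owner = None

link = TimeSharedLink()
assert link.request("c2c_0") is True    # first requester is granted
assert link.request("c2c_1") is False   # second must wait: half duplex
link.release("c2c_0")
assert link.request("c2c_1") is True    # direction switched over
```

The full-duplex scheme of the previous section corresponds to giving each channel its own link, trading the extra Aurora core for the removal of this arbitration step.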
In the computing structure based on the reconfigurable memory chip provided by the embodiment of the invention, the corresponding IP cores are respectively arranged on the first node and the second node, so that the IP cores call the corresponding high-speed serial transceivers to carry out high-speed serial communication, and high-speed data transmission is realized.
Based on the content of the above embodiment, in this embodiment, the first IP core and the third IP core are chip2chip IP cores; the second IP core and the fourth IP core are Aurora IP cores;
the second IP core and the fourth IP core are used for providing an advanced extensible interface AXI4 to realize user logic.
In this embodiment, the second IP core and the fourth IP core are configured to provide an Advanced eXtensible Interface (AXI4), so that upper-layer applications (external input signals) need not pay attention to the construction of the underlying hardware; the hardware framework of multiple reconfigurable computing chips is made transparent to upper-layer applications, and multiple reconfigurable computing chips are designed as if implemented in one reconfigurable computing chip, which is simpler and more general for developers and facilitates design migration. AXI4 is a bus interface protocol mainly oriented to high-performance, address-mapped communication; read operations between an AXI4 master and slave use independent read-address and read-data channels, and a burst read of up to 256 transfers can be executed from a single address phase.
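The value of the 256-beat burst can be made concrete with a back-of-envelope calculation. The 128-bit bus width below is an assumed example; the patent does not state a data width.

```python
# One AXI4 read burst: a single address phase can fetch up to 256 beats.
beats = 256                      # maximum AXI4 INCR burst length
bus_bytes = 128 // 8             # 16 bytes per beat, assuming a 128-bit bus
burst_bytes = beats * bus_bytes  # bytes moved per address issued
print(burst_bytes)               # 4096: one address fetches a full 4 KiB
```

Issuing one address per 4 KiB instead of one per beat is what lets the independent read-address and read-data channels keep the serial links busy with data rather than addressing.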
In the computing structure based on the reconfigurable memory chip provided by the embodiment of the invention, the second IP core and the fourth IP core provide the advanced extensible interface AXI4, so that the hardware architecture of the multi-chip reconfigurable memory chip is transparent to upper-layer application, simpler and more universal for developers, and convenient for design transplantation.
Based on the content of the foregoing embodiment, in this embodiment, the main reconfigurable computing chip 101 is provided with an AXI interconnect, and routing is implemented inside the main reconfigurable computing chip 101 through the AXI interconnect.
As shown in fig. 2, the main reconfigurable memory chip is provided with an AXI interconnect, so that routing can be implemented inside the main reconfigurable memory chip. Therefore, in the computing structure based on the reconfigurable computing chip provided by the embodiment of the invention, the AXI interconnect arranged in the master reconfigurable computing chip realizes internal routing and facilitates communication between the master and the slave reconfigurable computing chips.
Based on the content of the above embodiment, in this embodiment, a processor core for serial port debugging is disposed in the main reconfigurable memory chip 101, and an ethernet interface for communication and a universal asynchronous receiver/transmitter interface for controlling an acceleration board card are disposed on the processor core.
As shown in fig. 2, the main reconfigurable memory chip may be configured with a processor core ZYNQ PS capable of providing a serial port for debugging, an ethernet interface ETH capable of being used for communication, and a universal asynchronous receiver/transmitter interface UART for controlling an acceleration board. Therefore, in the computing structure based on the reconfigurable memory chip provided by the embodiment of the invention, the main reconfigurable memory chip is provided with the processor core for serial port debugging, and the processor core is provided with the Ethernet interface for communication and the universal asynchronous transceiver interface for controlling the acceleration board card, so that the main reconfigurable memory chip can receive signals transmitted from the outside so as to communicate with the slave reconfigurable memory chip for acceleration operation.
Therefore, the computing structure based on the reconfigurable memory chip provided by the embodiment has the advantages that through the high-speed interconnection of the reconfigurable memory chip, the innovation of a pure hardware framework and the innovation of a matched middle layer are included, the transparency of bottom-layer hardware to upper-layer application is realized, and the dynamic reconfiguration can be realized by a single or a plurality of reconfigurable memory chips, so that the resource sharing is realized, the memory access bandwidth is increased, and the final purpose of hardware acceleration with high parallelism, high bandwidth and low delay requirements is realized.
The bottom layer of the hardware architecture of the master-slave reconfigurable computing chip system is flexible and variable, and can be selected among wafer-level, multi-wafer-package-level and PCI Express board-level (same-board or cross-board) integration. This embodiment provides a high-speed serial transceiver standard interface for the high-speed interconnection of the master and slave reconfigurable memory chips, together with a standard protocol and a matching middle-layer design, so that the bottom-layer hardware is transparent to upper-layer applications. The master-slave reconfigurable high-speed interconnection realizes the sharing of external and internal memory resources, the memory can be called directly at high speed, the available resources of the system are greatly increased, and the integration of storage and computing is realized.
The master and slave reconfigurable memory chips can work cooperatively, their number can be configured flexibly, more reconfigurable computing units can be used, and the computing speed is improved. Large-scale, high-complexity algorithm acceleration scenarios are supported, with fast data processing and transmission and stable performance, suitable for computing-power enhancement and acceleration in cloud computing. The computing structure of the reconfigurable memory chip parallelizes the general memory, improves computing power, and increases memory access bandwidth, so that algorithms with high-parallelism, high-bandwidth and low-latency requirements can be accelerated. By providing an advanced extensible interface, this embodiment is simpler and more general for developers and facilitates design migration.
A hardware architecture based on a reconfigurable memory chip according to a second embodiment of the present invention includes: in the computing structure based on reconfigurable computing chips according to any of the embodiments, the master reconfigurable computing chip and the slave reconfigurable computing chip in the computing structure are packaged in one wafer, or in multiple wafers, or on one PCI Express board card, or on different PCI Express board cards.
As shown in fig. 5 and 6, the hardware architecture based on the reconfigurable computing chip can integrate and package the master-slave reconfigurable computing chip into one wafer, or package multiple wafers, or package the master-slave reconfigurable computing chip into the same PCI Express board card, or package the master-slave reconfigurable computing chip into different PCI Express board cards, and the master reconfigurable computing chip can control the slave reconfigurable computing chips on different PCI Express board cards. The hardware architecture bottom layer of the master-slave reconfigurable memory chip is flexible and variable, and can be flexibly selected on the hardware architecture bottom layer platforms at a wafer level, a multi-wafer packaging level, a PCI Express board card level and a PCI Express board card level according to actual project requirements. The multi-core packaging integration can increase the calculation scale and realize great improvement of performance.
A wafer is a silicon slice used for manufacturing silicon semiconductor circuits, and its raw material is silicon. High-purity polycrystalline silicon is melted, a silicon seed crystal is dipped into the melt and slowly pulled out, forming a cylindrical single-crystal silicon ingot. After the ingot is ground, polished and sliced, silicon wafers are obtained. At present, domestic wafer production lines are mainly 8-inch and 12-inch.
A PCI Express board is a hardware platform interconnected by the PCI Express bus; PCI Express was introduced to overcome the limitations of the PCI bus. The PCI bus works at 33 MHz with a 32-bit width, for a theoretical peak bandwidth of 132 MB/s; it uses a shared-bus topology in which the bus bandwidth is divided among multiple devices, and different devices communicate over the same bus. As devices developed, their bandwidth consumption grew until devices occupying large amounts of bandwidth could starve the others, making bandwidth on the PCI bus scarce. The most significant advantage of PCI Express over PCI is its point-to-point topology: PCI Express replaces the shared bus of PCI with a switch, and each device connects through a dedicated channel. Unlike devices on the PCI bus, which share bandwidth, PCI Express provides a dedicated data channel for each device. Data are packetized and transmitted serially over a pair of transmit and receive signals; each such pair is called a lane, with a unidirectional bandwidth of 250 MB/s. Multiple lanes may be combined into link widths of x1 ("single"), x2, x4, x8, x12 and x16, increasing the bandwidth of a slot up to a total throughput of 4 GB/s. Most PCs today are equipped with both PCI and PCI Express slots. Common PCI Express slot sizes are x1 and x16: the x1 slot is the typical general-purpose slot, and the x16 slot is for a graphics card or other high-performance device; x4 and x8 slots are usually found only in server-grade equipment.
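The lane-scaling described above is linear in the lane count, so the quoted link widths can be tabulated directly from the 250 MB/s per-lane figure:

```python
# Per-direction bandwidth for the PCI Express link widths named in the text,
# using the 250 MB/s-per-lane figure quoted above.
lane_mb_s = 250
for lanes in (1, 2, 4, 8, 12, 16):
    print(f"x{lanes}: {lanes * lane_mb_s} MB/s per direction")
```

An x16 slot thus reaches 16 × 250 = 4000 MB/s, i.e. about 4 GB/s, matching the total throughput the text cites.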
By high-speed cascading of flexibly configurable small chips with limited resources within a wafer, data transmission and transfer take place at the wafer level, so data signals travel from the main reconfigurable memory chip to the internal memory faster, data storage and computation are integrated, and, compared with one large chip, latency and power consumption are effectively reduced. Through the high-speed interconnection of resource-limited reconfigurable memory chips, the low-level hardware is encapsulated and abstracted, and the architecture is recombined into a high-capacity chip with shared resources.
Fig. 7 is a schematic flowchart of a parallel computing method based on a computing architecture of a reconfigurable computing chip according to a third embodiment of the present invention, and as shown in fig. 7, the parallel computing method based on a computing architecture of a reconfigurable computing chip according to the third embodiment of the present invention includes:
acquiring data to be operated and processed and an operation algorithm;
and inputting data to be operated and processed and an operation algorithm into the main reconfigurable computing chip so as to enable the main reconfigurable computing chip and the slave reconfigurable computing chip to jointly complete an operation process.
As shown in fig. 7, the master-slave reconfigurable computing chip system is first powered on and reset, configuration data is written into the reconfigurable computing chip devices, the internal logic and registers of the devices are initialized, and user mode is entered; the master reconfigurable computing chip then communicates through an external interface to obtain the data and the operation algorithm to be run. The data to be processed and the operation algorithm are input to the master reconfigurable memory chip, which uses the high-speed serial transceivers for data exchange and routing, and uses the internal memories in the master and slave reconfigurable memory chips together with the parallel general external memories for data storage. The master and slave reconfigurable computing chips jointly complete the operation process, realizing high-speed data processing and transmission.
Based on any of the above embodiments, inputting data to be run and processed and an operation algorithm to the master reconfigurable computing chip to enable the master reconfigurable computing chip and the slave reconfigurable computing chip to jointly complete an operation process, including:
and inputting data and operation algorithms which need to be operated and processed into the main reconfigurable computing chip, so that the main reconfigurable computing chip exchanges and routes data with each slave reconfigurable computing chip by using each first high-speed serial transceiver and each second high-speed serial transceiver, and stores the data by using an internal memory in the main reconfigurable computing chip and each slave reconfigurable computing chip and an external memory connected with each slave reconfigurable computing chip.
In this embodiment, based on the computing structure provided in the above embodiments, the master reconfigurable computing chip communicates through the ethernet interface to obtain the data and operation algorithm to be executed. After the data and the operation algorithm are input to the master reconfigurable computing chip, it exchanges and routes data with each slave reconfigurable computing chip through the first and second high-speed serial transceivers, so that the master and slave reconfigurable computing chips perform operations jointly, and data are stored using the internal memory (such as BRAM) in the master and each slave reconfigurable computing chip and the parallel general memory (including DDR and HBM memory) connected to each slave reconfigurable computing chip.
According to the parallel computing method of the computing structure based on the reconfigurable computing chip provided by the embodiment of the invention, the data to be processed and the operation algorithm are input to the master reconfigurable computing chip, so that it exchanges and routes data with each slave reconfigurable computing chip through the first and second high-speed serial transceivers, and stores data using the internal memories in the master and slave reconfigurable computing chips and the general memory (including DDR, HBM, etc.) connected to each slave reconfigurable computing chip; the master and slave reconfigurable computing chips thus jointly complete the operation process, realizing high-speed data processing and transmission.
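The scatter/compute/gather flow described above can be outlined in a few lines. This is a purely illustrative model: the function name, the round-robin partitioning, and the serial simulation of the slaves are all invented for the sketch and are not specified by the patent.

```python
# Hypothetical outline of the parallel computing flow: the master receives
# data plus an algorithm, scatters work to the slave chips over the serial
# links, and gathers the routed-back results.
def run_on_structure(data, algorithm, num_slaves):
    # master partitions the input across the slaves' external memories
    chunks = [data[i::num_slaves] for i in range(num_slaves)]
    # each slave applies the algorithm to its chunk (parallelism modelled serially)
    partials = [[algorithm(x) for x in chunk] for chunk in chunks]
    # master merges the results routed back over the transceivers
    return [y for part in partials for y in part]

out = run_on_structure([1, 2, 3, 4], lambda x: x * x, num_slaves=2)
assert sorted(out) == [1, 4, 9, 16]
```

The master's role here matches the text: it never computes the partials itself, it only distributes, coordinates, and merges.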
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (9)

1. A computing architecture based on a reconfigurable memory chip, comprising:
the master reconfigurable computing chip and the at least one slave reconfigurable computing chip;
the master reconfigurable computing chip is provided with at least one first high-speed serial transceiver, and each slave reconfigurable computing chip is provided with a second high-speed serial transceiver;
the main reconfigurable computing chip and each slave reconfigurable computing chip are connected through the first high-speed serial transceiver and the second high-speed serial transceiver, and the first high-speed serial transceiver and the second high-speed serial transceiver are used for realizing data exchange and routing between the main reconfigurable computing chip and each slave reconfigurable computing chip;
the master reconfigurable storage chip is provided with at least one first node, and each slave reconfigurable storage chip is provided with a second node; wherein, the first node comprises: the system comprises 2 first IP cores and 2 second IP cores, or 2 first IP cores and 1 second IP core, wherein the second IP core is packaged with a first high-speed serial transceiver, and the first IP core is used for controlling the second IP core to call the first high-speed serial transceiver to carry out high-speed serial communication with the second high-speed serial transceiver;
the second node comprises: the device comprises 2 third IP cores and 2 fourth IP cores, or 2 third IP cores and 1 fourth IP core, wherein a second high-speed serial transceiver is packaged in the fourth IP core, and the third IP core is used for controlling the fourth IP core to call the second high-speed serial transceiver to carry out high-speed serial communication with the first high-speed serial transceiver.
2. The computing architecture based on reconfigurable memory chip according to claim 1, characterized in that an internal memory is arranged in the master reconfigurable memory chip and the slave reconfigurable memory chip;
each slave reconfigurable computing chip is respectively connected with an external memory, and each external memory and an internal memory in the master reconfigurable computing chip and the slave reconfigurable computing chip are used as storage units of the computing structure;
wherein each of the external memories operates in parallel.
3. The computing architecture based on reconfigurable memory chip according to claim 1, wherein when the first node includes 2 first IP cores and 2 second IP cores, the first IP core is connected to the second IP core, and is configured to control the second IP core to call the first high-speed serial transceiver to perform high-speed serial communication with the second high-speed serial transceiver;
when the second node comprises 2 third IP cores and 2 fourth IP cores, the third IP core is connected to the fourth IP core, and is configured to control the fourth IP core to call the second high-speed serial transceiver to perform high-speed serial communication with the first high-speed serial transceiver.
4. The computing architecture based on the reconfigurable memory chip according to claim 1, wherein when the first node includes 2 first IP cores and 1 second IP core, the first IP core is connected to the second IP core through a first logic module, and is configured to control the second IP core to call the first high-speed serial transceiver to perform high-speed serial communication with the second high-speed serial transceiver;
when the second node comprises 2 third IP cores and 1 fourth IP core, the third IP core is connected to the fourth IP core through a second logic module, and is configured to control the fourth IP core to call the second high-speed serial transceiver to perform high-speed serial communication with the first high-speed serial transceiver.
5. The computing structure based on the reconfigurable memory computing chip according to any one of claims 1 to 4, wherein the first IP core and the third IP core are chip2chip IP cores; the second IP core and the fourth IP core are Aurora IP cores;
the second IP core and the fourth IP core are used for providing an advanced extensible interface AXI4 to realize user logic.
6. The computing structure based on the reconfigurable memory chip according to claim 1, wherein the master reconfigurable computing chip is provided with a processor core for serial-port debugging, and the processor core is provided with an Ethernet interface for communication and a universal asynchronous receiver-transmitter (UART) interface for controlling an accelerator board.
7. A hardware architecture based on a reconfigurable memory chip, comprising the computing structure based on the reconfigurable memory chip according to any one of claims 1 to 6, wherein the master reconfigurable computing chip and the slave reconfigurable computing chips in the computing structure are packaged in a single wafer, across multiple wafers, on a single PCI Express board, or on different PCI Express boards.
8. A parallel computing method based on the computing structure based on the reconfigurable memory chip according to any one of claims 1 to 6, comprising:
acquiring data to be processed and an operation algorithm; and
inputting the data to be processed and the operation algorithm into the master reconfigurable computing chip, so that the master reconfigurable computing chip and the slave reconfigurable computing chips jointly complete the operation.
9. The parallel computing method according to claim 8, wherein inputting the data to be processed and the operation algorithm into the master reconfigurable computing chip, so that the master reconfigurable computing chip and the slave reconfigurable computing chips jointly complete the operation, comprises:
inputting the data to be processed and the operation algorithm into the master reconfigurable computing chip, so that the master reconfigurable computing chip exchanges and routes data with each slave reconfigurable computing chip through the first high-speed serial transceivers and the second high-speed serial transceivers, and stores data in the internal memories of the master and slave reconfigurable computing chips and in the external memories connected to the slave reconfigurable computing chips.
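The parallel computing method of claims 8 and 9 — the master chip routing data to the slave chips, each computing against its own memory, then merging the results — can be illustrated with a minimal host-side sketch. The names (`master_compute`, `run_on_slave`) and the use of threads as stand-ins for the high-speed serial links and per-slave external memories are our assumptions; the patent specifies no software API.

```python
from concurrent.futures import ThreadPoolExecutor

def run_on_slave(chunk, op):
    """Stand-in for one slave chip applying the operation algorithm to its chunk."""
    return [op(x) for x in chunk]

def master_compute(data, op, n_slaves=4):
    """Master chip role: split the data, route chunks to slaves, merge the results."""
    # Scatter: slave i gets elements i, i+n_slaves, i+2*n_slaves, ...
    chunks = [data[i::n_slaves] for i in range(n_slaves)]
    with ThreadPoolExecutor(max_workers=n_slaves) as pool:
        partials = list(pool.map(lambda c: run_on_slave(c, op), chunks))
    # Gather: interleave the partial results back into the original order
    out = [None] * len(data)
    for i, part in enumerate(partials):
        out[i::n_slaves] = part
    return out

result = master_compute(list(range(8)), lambda x: x * x)
print(result)  # → [0, 1, 4, 9, 16, 25, 36, 49]
```

The round-robin scatter keeps the per-slave chunks balanced; in the claimed hardware the routing would instead traverse the chip2chip/Aurora serial links of claims 3 to 5.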
CN202110555316.7A 2021-05-21 2021-05-21 Computing structure, hardware architecture and computing method based on reconfigurable memory chip Active CN113032329B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110555316.7A CN113032329B (en) 2021-05-21 2021-05-21 Computing structure, hardware architecture and computing method based on reconfigurable memory chip

Publications (2)

Publication Number Publication Date
CN113032329A CN113032329A (en) 2021-06-25
CN113032329B (en) 2021-09-14

Family

ID=76455739

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110555316.7A Active CN113032329B (en) 2021-05-21 2021-05-21 Computing structure, hardware architecture and computing method based on reconfigurable memory chip

Country Status (1)

Country Link
CN (1) CN113032329B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103810142A (en) * 2014-03-06 2014-05-21 PLA Information Engineering University Reconfigurable system and construction method thereof
CN105589768A (en) * 2015-12-09 2016-05-18 Xi'an Aeronautics Computing Technique Research Institute, AVIC Self-healing fault-tolerant computer system
CN108628800A (en) * 2018-05-08 2018-10-09 Jinan Inspur High-Tech Investment and Development Co., Ltd. Dynamically reconfigurable intelligent computing cluster and configuration method thereof
CN111488114A (en) * 2019-01-28 2020-08-04 Beijing Lynxi Technology Co., Ltd. Reconfigurable processor architecture and computing device
CN112181895A (en) * 2020-09-02 2021-01-05 Shanghai Jiao Tong University Reconfigurable architecture, accelerator, circuit deployment and data flow calculation method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5794062A (en) * 1995-04-17 1998-08-11 Ricoh Company Ltd. System and method for dynamically reconfigurable computing using a processing unit having changeable internal hardware organization
CN101354694B (en) * 2007-07-26 2010-10-13 Shanghai Redneurons Co., Ltd. Ultra-scalable supercomputing system based on MPU architecture
US9465771B2 (en) * 2009-09-24 2016-10-11 Iii Holdings 2, Llc Server on a chip and node cards comprising one or more of same
CN102253921B (en) * 2011-06-14 2013-12-04 清华大学 Dynamic reconfigurable processor
US10956360B2 (en) * 2017-03-14 2021-03-23 Azurengine Technologies Zhuhai Inc. Static shared memory access with one piece of input data to be reused for successive execution of one instruction in a reconfigurable parallel processor


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Reconfigurable interconnect for next generation systems; Ingrid Verbauwhede et al.; SLIP '02: Proceedings of the 2002 International Workshop on System-Level Interconnect Prediction; 2002-04-06; pp. 71-74 *
System level interconnect design for network-on-chip using interconnect IPs; Jian Liu et al.; SLIP '03: Proceedings of the 2003 International Workshop on System-Level Interconnect Prediction; 2003-04-05; pp. 117-124 *
Research on reconfigurable signal preprocessing technology; Lu Yibai; China Master's Theses Full-text Database (Electronic Journal), Information Science and Technology; 2018-08-15; vol. 2018, no. 08; pp. 25-28 *

Also Published As

Publication number Publication date
CN113032329A (en) 2021-06-25

Similar Documents

Publication Publication Date Title
CN100461140C (en) Method and system for supporting multiple graphic processing unit
TW318222B (en)
US9323708B2 (en) Protocol translation method and bridge device for switched telecommunication and computing platforms
CN100481050C (en) Method and system for multiple GPU support
CN101449334A (en) Multi-port memory device having variable port speeds
KR20090104137A (en) Processor chip architecture having integrated high-speed packet switched serial interface
CN209149287U (en) Big data operation acceleration system
Markettos et al. Interconnect for commodity FPGA clusters: Standardized or customized?
CN106970894A (en) A kind of FPGA isomery accelerator cards based on Arria10
CN113489594B (en) PCIE real-time network card based on FPGA module
CN103106173A (en) Interconnection method among cores of multi-core processor
US20230178121A1 (en) High-bandwidth memory module architecture
CN109564562A (en) Big data operation acceleration system and chip
US20240086112A1 (en) Stacked Memory Device with Paired Channels
CN113032329B (en) Computing structure, hardware architecture and computing method based on reconfigurable memory chip
CN209560543U (en) Big data operation chip
US20090177832A1 (en) Parallel computer system and method for parallel processing of data
CN113970896A (en) Control device based on FPGA chip and electronic equipment
CN209543343U (en) Big data operation acceleration system
Klilou et al. Performance optimization of high-speed Interconnect Serial RapidIO for onboard processing
CN205091735U (en) Novel extension module based on QPI bus realizes extend system memory
CN103744817B (en) For Avalon bus to the communication Bridge equipment of Crossbar bus and communication conversion method thereof
US20230280907A1 (en) Computer System Having Multiple Computer Devices Each with Routing Logic and Memory Controller and Multiple Computer Devices Each with Processing Circuitry
US20230283547A1 (en) Computer System Having a Chip Configured for Memory Attachment and Routing
Chen et al. Design of SRAM Based Interface Module with DMA in Inductive-Coupling 3D Stacked IoT Chips

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant