WO2020113966A1 - High-performance fusion server architecture

Info

Publication number: WO2020113966A1
Application number: PCT/CN2019/096749
Authority: WO
Inventors: 姜凯; 于治楼; 郝虹; 李朋
Original assignee: 山东浪潮人工智能研究院有限公司
Priority date: 2018-12-03
Filing date: 2019-07-19
Publication date: 2020-06-11
Also published as: CN109558373B; CN109558373A

Abstract

The present invention particularly relates to a high-performance fusion server architecture. The high-performance fusion server architecture is a heterogeneous architecture using a general-purpose processor plus dual FPGA chips, efficiently integrating network, computing and storage, and comprising a general-purpose processor, an FPGA 1 chip, an FPGA 2 chip, a local memory, a memory array, a flash memory array, and an FPGA local memory. The FPGA 1 chip, the FPGA 2 chip, and the local memory are connected to the general-purpose processor; the memory array and the flash memory array are connected to the FPGA 1 chip; the FPGA local memory is connected to the FPGA 2 chip; and the FPGA 1 chip and the FPGA 2 chip are connected by means of a data bus. The present high-performance fusion server architecture is a heterogeneous architecture using a general-purpose processor plus dual FPGA chips, having high flexibility, low energy consumption, and high fault tolerance, and integrating computing, storage, and network, so that the efficiency of cloud applications is greatly improved.

Description

A high-performance fusion server architecture

Technical field

The invention relates to the technical field of servers, and in particular to a high-performance fusion server architecture.

Background art

With the rapid growth of Internet users and the rapid expansion of data volume, the demand for computing in data centers has also risen rapidly. Applications such as online prediction for deep learning, video transcoding for live broadcast, image compression and decompression, and HTTPS encryption place demands that far exceed the computing power of traditional CPUs.

Historically, benefiting from the continuous evolution of semiconductor technology, the throughput and system performance of computer architectures continued to increase, and processor performance doubled roughly every 18 months (the well-known "Moore's Law"), so that processor performance could meet the needs of application software. In recent years, however, semiconductor process improvement has approached its physical limits and circuits have become more and more complex. The development cost of each design runs to millions of dollars, and building production capacity for a new product takes billions of dollars. On March 24, 2016, Intel announced that it was officially suspending its "Tick-Tock" processor R&D model and that its future R&D cycle would change from two years to three years. At that point, Moore's Law had all but ceased to hold for Intel.

On the one hand, processor performance can no longer grow in line with Moore's Law; on the other hand, data growth demands computing performance that rises faster than Moore's Law would provide. The processor alone cannot meet the performance requirements of HPC (High Performance Computing) application software, which leaves a gap between demand and performance.

In response to this situation, engineers have proposed improving processing performance through hardware acceleration, that is, heterogeneous computing with dedicated coprocessors.

An FPGA (Field Programmable Gate Array) is a further development of programmable devices such as PAL, GAL, and CPLD. It serves as a semi-custom circuit in the field of application-specific integrated circuits (ASICs): it addresses the shortcomings of fully custom circuits and also overcomes the limited gate count of earlier programmable devices.

Compared with traditional general-purpose processors alone, heterogeneous computing on a CPU + FPGA reconfigurable architecture has many advantages, such as higher performance, greater flexibility, lower power consumption, and inherent fault tolerance, and it can greatly shorten the product development cycle. Using FPGA chips in place of GPUs (Graphics Processing Units) as accelerators for high-performance computing is expected to be the main direction of FPGA heterogeneous intelligent computing at this stage.

Based on the above situation, the present invention proposes a high-performance fusion server architecture.

Summary of the invention

In order to make up for the shortcomings of the prior art, the present invention provides a simple and efficient high-performance fusion server architecture.

The present invention is achieved through the following technical solutions:

A high-performance fusion server architecture, characterized in that it adopts a heterogeneous architecture of a general-purpose processor + dual FPGA chips to achieve efficient integration of network, computing and storage, and includes a general-purpose processor, an FPGA 1 chip, an FPGA 2 chip, a local memory, a memory array, a flash memory array and an FPGA local memory; the FPGA 1 chip, the FPGA 2 chip and the local memory are connected to the general-purpose processor, the memory array and the flash memory array are connected to the FPGA 1 chip, the FPGA local memory is connected to the FPGA 2 chip, and the FPGA 1 chip and the FPGA 2 chip are connected by a data bus.
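The connections listed above can be written down as a small undirected graph. The following is only an illustrative sketch: the component identifiers and the neighbors helper are hypothetical names chosen here, and only the links themselves come from the text.

```python
# Hypothetical identifiers; each pair is one connection named in the text.
LINKS = [
    ("general_purpose_processor", "fpga1"),
    ("general_purpose_processor", "fpga2"),
    ("general_purpose_processor", "local_memory"),
    ("fpga1", "memory_array"),
    ("fpga1", "flash_memory_array"),
    ("fpga2", "fpga_local_memory"),
    ("fpga1", "fpga2"),  # data bus between the two FPGA chips
]


def neighbors(component):
    """Return the set of components directly connected to `component`."""
    found = set()
    for a, b in LINKS:
        if a == component:
            found.add(b)
        elif b == component:
            found.add(a)
    return found


if __name__ == "__main__":
    for name in ("general_purpose_processor", "fpga1", "fpga2"):
        print(name, "->", sorted(neighbors(name)))
```

Listing the neighbors of each chip makes the division of roles visible: the storage arrays hang off the FPGA 1 chip only, while the FPGA local memory hangs off the FPGA 2 chip only.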

The FPGA 1 chip uses a high-speed memory interface to achieve high-speed interconnection with the general-purpose processor, extends a memory array interface and a flash memory interface to increase the high-speed storage space, and interconnects with the SRIO interface of the FPGA 2 chip; the FPGA 2 chip adopts a general heterogeneous architecture and is used to parse and offload network packets and to arbitrate data functions and transmission directions.

The FPGA 1 chip is used for storage expansion and acceleration. It has two DDR4 interfaces, one SRIO interface, one flash memory controller interface, an internal RAM logic module, and a storage control and arbitration logic module; the two DDR4 interfaces, the SRIO interface, the flash memory controller interface and the internal RAM logic module are all connected to the storage control and arbitration logic module.

The two DDR4 interfaces are used to connect the memory array and the general-purpose processor, respectively; the SRIO interface is used to implement data interconnection between the FPGA 1 chip and the FPGA 2 chip; and the flash memory controller interface is used to connect the flash memory array. The internal RAM logic module is used to store a data mapping table, and the storage control and arbitration logic module is responsible for classifying data instructions and determining whether data is read from or written to the memory array or the flash memory array.
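As a rough software analogy for the storage control and arbitration logic, the sketch below classifies an instruction as a read or a write and dispatches it to one of two backing stores, using a mapping table to remember where each item lives. The class name StorageArbiter, the dictionary-based stores, the string labels "memory" and "flash", and the default write target are all assumptions made for illustration; the source does not specify an instruction format.

```python
from dataclasses import dataclass, field


@dataclass
class StorageArbiter:
    """Toy model of the storage control and arbitration logic (hypothetical)."""

    mapping_table: dict = field(default_factory=dict)  # key -> "memory" or "flash"
    memory_array: dict = field(default_factory=dict)
    flash_array: dict = field(default_factory=dict)

    def _backend(self, location):
        # Select the backing store that corresponds to a mapping-table entry.
        return self.memory_array if location == "memory" else self.flash_array

    def execute(self, op, key, value=None, location="flash"):
        """Classify one data instruction and dispatch it to the right array."""
        if op == "write":
            previous = self.mapping_table.get(key)
            if previous is not None and previous != location:
                # Drop the stale copy when an item moves between arrays.
                self._backend(previous).pop(key, None)
            self._backend(location)[key] = value
            self.mapping_table[key] = location
            return None
        if op == "read":
            # Raises KeyError if the key was never written.
            return self._backend(self.mapping_table[key])[key]
        raise ValueError(f"unknown instruction: {op!r}")


if __name__ == "__main__":
    arbiter = StorageArbiter()
    arbiter.execute("write", "block-a", b"hot", location="memory")
    arbiter.execute("write", "block-b", b"cold")  # assumed default: flash
    print(arbiter.execute("read", "block-a"), arbiter.mapping_table)
```

In this sketch the caller chooses the target array on a write; in the architecture described here that choice would instead come from the mapping table's hot and cold information.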

The FPGA 2 chip is used as an intelligent network card. It has a network interface, a DDR4 interface, an SRIO interface, a PCIE interface, and a network packet offloading and arbitration logic module; the network interface, the SRIO interface and the PCIE interface are all connected to the network packet offloading and arbitration logic module.

The network interface is used for external data interconnection, the DDR4 interface is used to connect the FPGA local memory, the SRIO interface is used to implement data interconnection between the FPGA 2 chip and the FPGA 1 chip, and the PCIE interface is interconnected with the general-purpose processor. The network packet offloading and arbitration logic module is used to parse and offload network protocols and to arbitrate the data sending direction.

The data mapping table in the internal RAM logic module includes two parts: a data storage location and a data hot and cold table. The data storage location is either the memory array or the flash memory array. The data hot and cold table records how heavily each piece of data is used: hot data is stored in the memory array, and cold data is stored in the flash memory array. How hot or cold a piece of data is is evaluated from the number of times it is written and read per unit time, and the number of accesses used as the criterion is set according to the application.
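A minimal sketch of such a hot and cold table follows, assuming that reads and writes are counted together within one unit of time and that a single application-chosen threshold separates hot from cold. The class name HotColdTable, the hot_threshold parameter and the end_window method are hypothetical; the source only states that the criterion depends on the application.

```python
from collections import Counter


class HotColdTable:
    """Counts accesses per unit of time and suggests a storage location."""

    def __init__(self, hot_threshold):
        # Assumed criterion: accesses per window at or above this value are hot.
        self.hot_threshold = hot_threshold
        self.counts = Counter()

    def record_access(self, key):
        """Register one read or write of `key` in the current window."""
        self.counts[key] += 1

    def placement(self, key):
        """Return "memory" for hot data and "flash" for cold data."""
        return "memory" if self.counts[key] >= self.hot_threshold else "flash"

    def end_window(self):
        """Close the current unit of time: report placements, reset counters."""
        result = {key: self.placement(key) for key in self.counts}
        self.counts.clear()
        return result


if __name__ == "__main__":
    table = HotColdTable(hot_threshold=3)
    for key in ["a", "a", "a", "b"]:
        table.record_access(key)
    print(table.end_window())
```

Resetting the counters at the end of each window is one simple way to express "per unit time"; a hardware implementation might instead use decaying or sliding counters.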

The high-performance fusion server architecture receives external data through the network interface of the FPGA 2 chip. After network packet offloading, the external data passes through the arbitration logic, which determines whether the message is to be sent to the general-purpose processor or to the FPGA 1 chip. If the data is sent to the general-purpose processor, the general-purpose processor determines whether the data is sent out or written to the FPGA 1 chip. If the data is sent to the FPGA 1 chip, the data is analyzed by the storage control and arbitration logic module and is read from or written to the memory array or the flash memory array according to the instruction. The read and write strategy follows the data hot and cold table, which records how heavily the data is used: hot data is stored in the memory array and cold data in the flash memory array.

The beneficial effects of the present invention are as follows: the high-performance fusion server architecture adopts a heterogeneous architecture of a general-purpose processor + dual FPGA chips, which offers high flexibility, low energy consumption and strong fault tolerance, and integrates computing, storage and network. It greatly improves the efficiency of cloud applications, can meet the performance requirements of HPC application software, fills the gap between demand and performance, and is suitable for popularization and application.

Brief description of the drawings

FIG. 1 is a schematic diagram of the architecture of a high-performance fusion server of the present invention.

Detailed description

In order to make the technical problems to be solved, the technical solutions and the beneficial effects of the present invention clearer, the present invention is described in detail below in conjunction with the drawings and embodiments. It should be noted that the specific embodiments described here are only used to explain the present invention and are not intended to limit it.

This high-performance fusion server architecture uses a heterogeneous architecture of a general-purpose processor + dual FPGA chips to achieve efficient integration of network, computing and storage, and includes a general-purpose processor, an FPGA 1 chip, an FPGA 2 chip, a local memory, a memory array, a flash memory array and an FPGA local memory; the FPGA 1 chip, the FPGA 2 chip and the local memory are connected to the general-purpose processor, the memory array and the flash memory array are connected to the FPGA 1 chip, the FPGA local memory is connected to the FPGA 2 chip, and the FPGA 1 chip and the FPGA 2 chip are connected by a data bus.

The FPGA 1 chip uses a high-speed memory interface to achieve high-speed interconnection with the general-purpose processor, extends a memory array interface and a flash memory interface to increase the high-speed storage space, and interconnects with the SRIO interface of the FPGA 2 chip; the FPGA 2 chip adopts a general heterogeneous architecture and is used to parse and offload network packets and to arbitrate data functions and transmission directions.

The FPGA 2 chip serves as an intelligent network card. It is internally equipped with a network interface, a DDR4 interface, an SRIO interface, a PCIE interface, and a network packet offloading and arbitration logic module; the network interface, the SRIO interface and the PCIE interface are all connected to the network packet offloading and arbitration logic module.

The high-performance fusion server architecture receives external data through the network interface of the FPGA 2 chip. After network packet offloading, the external data passes through the arbitration logic, which determines whether the message is to be sent to the general-purpose processor or to the FPGA 1 chip. If the data is sent to the general-purpose processor, the general-purpose processor determines whether the data is sent out or written to the FPGA 1 chip. If the data is sent to the FPGA 1 chip, the data is analyzed by the storage control and arbitration logic module and is read from or written to the memory array or the flash memory array according to the instruction. The read and write strategy follows the data hot and cold table, which records how heavily the data is used: hot data is stored in the memory array and cold data in the flash memory array.
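The decision sequence in this data flow can be traced with a short sketch that returns the ordered hops for one incoming message. The hop labels and the two predicate arguments, needs_cpu and cpu_decides_store, are hypothetical stand-ins: the source does not say what criteria the arbitration logic or the general-purpose processor actually apply.

```python
def route_message(message, needs_cpu, cpu_decides_store):
    """Return the ordered list of hops one incoming message takes.

    needs_cpu(message) stands in for the FPGA 2 arbitration decision and
    cpu_decides_store(message) for the processor's decision; both are
    assumed callables supplied by the caller.
    """
    hops = ["fpga2.network_interface", "fpga2.offload", "fpga2.arbitration"]
    if needs_cpu(message):
        hops.append("general_purpose_processor")
        if cpu_decides_store(message):
            # The processor writes the data on to the FPGA 1 chip.
            hops.append("fpga1.storage_control_and_arbitration")
        else:
            hops.append("sent_out")
    else:
        # The FPGA 2 chip forwards the data straight to the FPGA 1 chip.
        hops.append("fpga1.storage_control_and_arbitration")
    return hops


if __name__ == "__main__":
    # Illustrative rule only: treat messages of kind "compute" as CPU-bound.
    is_compute = lambda m: m["kind"] == "compute"
    keep_result = lambda m: m.get("persist", False)
    print(route_message({"kind": "storage"}, is_compute, keep_result))
    print(route_message({"kind": "compute", "persist": True}, is_compute, keep_result))
```

The direct path from the FPGA 2 chip to the FPGA 1 chip is the case in which storage traffic bypasses the general-purpose processor entirely.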

The high-efficiency fusion server architecture uses a heterogeneous architecture with a general-purpose processor + dual FPGA chips. It has high flexibility, low energy consumption, and strong fault tolerance. It realizes the integration of computing, storage and network, which greatly improves the efficiency of cloud applications. It can meet the performance requirements of HPC application software, fill the gap between demand and performance, and is suitable for popularization and application.

Claims

1. A high-performance fusion server architecture, characterized in that it adopts a heterogeneous architecture of a general-purpose processor + dual FPGA chips to achieve efficient integration of network, computing and storage, and includes a general-purpose processor, an FPGA 1 chip, an FPGA 2 chip, a local memory, a memory array, a flash memory array and an FPGA local memory; the FPGA 1 chip, the FPGA 2 chip and the local memory are connected to the general-purpose processor, the memory array and the flash memory array are connected to the FPGA 1 chip, the FPGA local memory is connected to the FPGA 2 chip, and the FPGA 1 chip and the FPGA 2 chip are connected by a data bus.
2. The high-performance fusion server architecture according to claim 1, wherein the FPGA 1 chip uses a high-speed memory interface to achieve high-speed interconnection with the general-purpose processor, extends a memory array interface and a flash memory interface to increase the high-speed storage space, and interconnects with the SRIO interface of the FPGA 2 chip; the FPGA 2 chip adopts a general heterogeneous architecture and is used to parse and offload network packets and to arbitrate data functions and sending directions.
3. The high-performance fusion server architecture according to claim 2, wherein the FPGA 1 chip is used for storage expansion and acceleration and has two DDR4 interfaces, one SRIO interface, one flash memory controller interface, an internal RAM logic module, and a storage control and arbitration logic module; the two DDR4 interfaces, the SRIO interface, the flash memory controller interface and the internal RAM logic module are all connected to the storage control and arbitration logic module.
4. The high-performance fusion server architecture according to claim 3, wherein the two DDR4 interfaces are respectively used to connect the memory array and the general-purpose processor, the SRIO interface is used to implement data interconnection between the FPGA 1 chip and the FPGA 2 chip, the flash memory controller interface is used to connect the flash memory array, the internal RAM logic module is used to store a data mapping table, and the storage control and arbitration logic module is responsible for classifying data instructions and determining whether data is read from or written to the memory array or the flash memory array.
5. The high-performance fusion server architecture according to claim 2, wherein the FPGA 2 chip is used as an intelligent network card and has a network interface, a DDR4 interface, an SRIO interface, a PCIE interface, and a network packet offloading and arbitration logic module; the network interface, the SRIO interface and the PCIE interface are all connected to the network packet offloading and arbitration logic module.
6. The high-performance fusion server architecture according to claim 5, wherein the network interface is used for external data interconnection, the DDR4 interface is used to connect the FPGA local memory, the SRIO interface is used to implement data interconnection between the FPGA 2 chip and the FPGA 1 chip, the PCIE interface is interconnected with the general-purpose processor, and the network packet offloading and arbitration logic module is used to parse and offload network protocols and to arbitrate the data sending direction.
7. The high-performance fusion server architecture according to claim 4, wherein the data mapping table in the internal RAM logic module includes two parts, a data storage location and a data hot and cold table; the data storage location is the memory array or the flash memory array; the data hot and cold table records how heavily the data is used, with hot data stored in the memory array and cold data stored in the flash memory array; how hot or cold the data is is evaluated from the number of times the data is written and read per unit time, and the number of accesses used as the criterion is set according to the application.
8. The high-performance fusion server architecture according to claim 7, characterized in that: external data is received through the network interface of the FPGA 2 chip; after network packet offloading, the external data passes through the arbitration logic, which determines whether the message is to be sent to the general-purpose processor or to the FPGA 1 chip; if the data is sent to the general-purpose processor, the general-purpose processor determines whether the data is sent out or written to the FPGA 1 chip; if the data is sent to the FPGA 1 chip, the data is analyzed by the storage control and arbitration logic module and is read from or written to the memory array or the flash memory array according to the instruction; and the read and write strategy follows the data hot and cold table, which records how heavily the data is used, with hot data stored in the memory array and cold data stored in the flash memory array.