WO2020113966A1 - High-performance fusion server architecture - Google Patents

High-performance fusion server architecture

Info

Publication number
WO2020113966A1
WO2020113966A1 PCT/CN2019/096749 CN2019096749W WO2020113966A1 WO 2020113966 A1 WO2020113966 A1 WO 2020113966A1 CN 2019096749 W CN2019096749 W CN 2019096749W WO 2020113966 A1 WO2020113966 A1 WO 2020113966A1
Authority
WO
WIPO (PCT)
Prior art keywords
fpga
data
chip
interface
memory array
Application number
PCT/CN2019/096749
Other languages
French (fr)
Chinese (zh)
Inventor
姜凯
于治楼
郝虹
李朋
Original Assignee
山东浪潮人工智能研究院有限公司
Application filed by 山东浪潮人工智能研究院有限公司
Publication of WO2020113966A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7839 Architectures of general purpose stored program computers comprising a single central processing unit with memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F2015/761 Indexing scheme relating to architectures of general purpose stored programme computers
    • G06F2015/766 Flash EPROM
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)
  • Microcomputers (AREA)

Abstract

The present invention particularly relates to a high-performance fusion server architecture. The high-performance fusion server architecture is a heterogeneous architecture using a general-purpose processor plus dual FPGA chips, efficiently integrating network, computing and storage, and comprising a general-purpose processor, an FPGA 1 chip, an FPGA 2 chip, a local memory, a memory array, a flash memory array, and an FPGA local memory. The FPGA 1 chip, the FPGA 2 chip, and the local memory are connected to the general-purpose processor; the memory array and the flash memory array are connected to the FPGA 1 chip; the FPGA local memory is connected to the FPGA 2 chip; and the FPGA 1 chip and the FPGA 2 chip are connected by means of a data bus. The present high-performance fusion server architecture is a heterogeneous architecture using a general-purpose processor plus dual FPGA chips, having high flexibility, low energy consumption, and high fault tolerance, and integrating computing, storage, and network, so that the efficiency of cloud applications is greatly improved.

Description

A high-performance fusion server architecture
Technical field
The invention relates to the technical field of servers, and in particular to a high-performance fusion server architecture.
Background art
With the rapid growth in Internet users and the sharp expansion of data volume, data centers' demand for computing is also rising rapidly. The computing demands of applications such as deep-learning online prediction, video transcoding for live streaming, image compression and decompression, and HTTPS encryption already far exceed what traditional CPUs can deliver.
Historically, thanks to the continuous evolution of semiconductor technology, the throughput and system performance of computer architectures kept improving, and processor performance doubled every 18 months (the well-known "Moore's Law"), so that processor performance could meet the needs of application software. In recent years, however, improvements in semiconductor technology have reached physical limits and circuits have become increasingly complex: developing each design costs millions of dollars, and billions of dollars are needed to build production capacity for a new product. On March 24, 2016, Intel announced that it was formally retiring its "Tick-Tock" processor development model and that its future development cycle would shift from two years to three years. With that, Moore's Law has all but ceased to hold for Intel.
On the one hand, processor performance can no longer grow in line with Moore's Law; on the other hand, the computing performance required by data growth is rising faster than Moore's Law. Processors alone cannot meet the performance requirements of HPC (High Performance Compute) application software, leaving a gap between demand and performance.
To address this, practitioners have proposed improving processing performance through hardware acceleration, using heterogeneous computing with dedicated coprocessors.
An FPGA (Field Programmable Gate Array) is a further development of programmable devices such as PAL, GAL and CPLD. The FPGA emerged as a semi-custom circuit in the field of application-specific integrated circuits (ASICs); it both addresses the shortcomings of fully custom circuits and overcomes the limited gate count of earlier programmable devices.
Compared with traditional general-purpose processors, heterogeneous computing on a reconfigurable CPU + FPGA architecture has many advantages, for example higher performance, greater flexibility, lower power consumption, inherent fault tolerance, and a greatly shortened product development cycle. Using FPGA chips instead of GPUs (Graphics Processing Units) as the accelerators for future high-performance computing should be the main theme of FPGA heterogeneous intelligent computing at the current stage.
In view of the above, the present invention proposes a high-performance fusion server architecture.
Summary of the invention
To remedy the shortcomings of the prior art, the present invention provides a simple and efficient high-performance fusion server architecture.
The present invention is achieved through the following technical solution:
A high-performance fusion server architecture, characterized in that it adopts a heterogeneous architecture of a general-purpose processor plus dual FPGA chips to efficiently integrate network, computing and storage, and comprises a general-purpose processor, an FPGA 1 chip, an FPGA 2 chip, a local memory, a memory array, a flash memory array and an FPGA local memory; the FPGA 1 chip, the FPGA 2 chip and the local memory are all connected to the general-purpose processor, the memory array and the flash memory array are both connected to the FPGA 1 chip, the FPGA local memory is connected to the FPGA 2 chip, and the FPGA 1 chip and the FPGA 2 chip are connected to each other by a data bus.
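For illustration, the connections listed in the preceding paragraph can be written down as a small graph. This is only a sketch: the component names come from the text, while the adjacency-map representation and the helper names (`build_topology`, `neighbours`) are assumptions made for the example.

```python
# Illustrative sketch only: component names follow the text above; the
# graph representation itself is an assumption, not part of the patent.
from collections import defaultdict

CPU = "general-purpose processor"
FPGA1 = "FPGA 1 chip"
FPGA2 = "FPGA 2 chip"

# Each pair is one connection stated in the technical solution.
LINKS = [
    (FPGA1, CPU),
    (FPGA2, CPU),
    ("local memory", CPU),
    ("memory array", FPGA1),
    ("flash memory array", FPGA1),
    ("FPGA local memory", FPGA2),
    (FPGA1, FPGA2),  # the data bus between the two FPGA chips
]


def build_topology(links):
    """Return an undirected adjacency map: component -> set of neighbours."""
    topology = defaultdict(set)
    for a, b in links:
        topology[a].add(b)
        topology[b].add(a)
    return dict(topology)


TOPOLOGY = build_topology(LINKS)


def neighbours(component):
    """Components directly connected to `component`, sorted for display."""
    return sorted(TOPOLOGY[component])
```

Calling `neighbours(FPGA1)` lists the four components attached to the FPGA 1 chip, which makes it easy to see that the memory array and flash memory array reach the general-purpose processor only through that chip.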
The FPGA 1 chip uses a high-speed memory interface to achieve high-speed interconnection with the general-purpose processor, extends a memory array interface and a flash memory interface to enlarge the high-speed storage space, and interconnects with the FPGA 2 chip through an SRIO interface; the FPGA 2 chip adopts a general heterogeneous architecture and is used to parse and offload network packets and to arbitrate the function and sending direction of data.
The FPGA 1 chip is used for storage expansion and acceleration and is internally provided with two DDR4 interfaces, one SRIO interface, one flash memory controller interface, an internal RAM logic module, and a storage control and arbitration logic module; the two DDR4 interfaces, the SRIO interface, the flash memory controller interface and the internal RAM logic module are all connected to the storage control and arbitration logic module.
The two DDR4 interfaces are used to connect the memory array and the general-purpose processor respectively; the SRIO interface implements data interconnection between the FPGA 1 chip and the FPGA 2 chip; the flash memory controller interface connects the flash memory array; the internal RAM logic module stores a data mapping table; and the storage control and arbitration logic module classifies data instructions and determines whether data is read from or written to the memory array or the flash memory array.
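The classification step performed by the storage control and arbitration logic module might be sketched as follows. The text does not define an instruction format, so the dictionary layout (`"op"`, `"key"`), the enum names and the default placement of new data are all hypothetical choices made for this example.

```python
# Illustrative sketch only. The instruction layout and the default target
# for newly written data are assumptions; the text states only that
# instructions are classified and routed to one of the two arrays.
from enum import Enum


class Op(Enum):
    READ = "read"
    WRITE = "write"


class Target(Enum):
    MEMORY_ARRAY = "memory array"
    FLASH_ARRAY = "flash memory array"


def classify(instruction, mapping_table, default_target=Target.FLASH_ARRAY):
    """Classify one data instruction and pick the array it should go to.

    `mapping_table` stands in for the data mapping table held in the
    internal RAM logic module: it maps a data key to its storage location.
    """
    op = Op(instruction["op"])
    key = instruction["key"]
    if op is Op.READ:
        # Reads go wherever the mapping table says the data lives.
        target = mapping_table[key]
    else:
        # Writes keep an existing location, or take the assumed default.
        target = mapping_table.setdefault(key, default_target)
    return op, target
```

A read of a key that was never written raises `KeyError` here; real hardware would need a defined behaviour for that case, which the text does not specify.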
The FPGA 2 chip serves as an intelligent network card and is internally provided with a network interface, a DDR4 interface, an SRIO interface, a PCIE interface, and a network packet offload and arbitration logic module; the network interface, the SRIO interface and the PCIE interface are all connected to the network packet offload and arbitration logic module.
The network interface is used for external data interconnection, the DDR4 interface connects the FPGA local memory, the SRIO interface implements data interconnection between the FPGA 2 chip and the FPGA 1 chip, the PCIE interface interconnects with the general-purpose processor, and the network packet offload and arbitration logic module parses and offloads network protocols and arbitrates the data sending direction.
The data mapping table in the internal RAM logic module has two parts: the data storage location and the data hot/cold table. The data storage location is either the memory array or the flash memory array. The hot/cold table records how heavily data is used, so that hot data is stored in the memory array and cold data in the flash memory array. How hot or cold data is is evaluated by the number of times the data is read per unit time after a single write, and the read count is set according to the application.
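A minimal sketch of such a mapping table is given below. The text specifies only the two parts of the table and the reads-per-unit-time criterion; the class and field names, the explicit `end_window` call that closes one unit of time, the comparison against a threshold, and the choice to start newly written data as cold are all assumptions made for illustration.

```python
# Illustrative sketch only. The class, field names, window handling and
# threshold comparison are assumptions; the text specifies only that heat is
# judged by reads per unit time after a write, with an application-set count.
from dataclasses import dataclass

MEMORY_ARRAY = "memory array"
FLASH_ARRAY = "flash memory array"


@dataclass
class MappingEntry:
    location: str = FLASH_ARRAY  # data storage location (assumed cold at first)
    reads_in_window: int = 0     # hot/cold table: reads in the current window


class DataMappingTable:
    def __init__(self, hot_read_threshold):
        self.hot_read_threshold = hot_read_threshold  # set per application
        self.entries = {}

    def record_write(self, key):
        """A write starts a fresh read count for this piece of data."""
        self.entries[key] = MappingEntry()

    def record_read(self, key):
        self.entries[key].reads_in_window += 1

    def end_window(self):
        """Close one unit of time: re-place every entry, then reset counts."""
        for entry in self.entries.values():
            hot = entry.reads_in_window >= self.hot_read_threshold
            entry.location = MEMORY_ARRAY if hot else FLASH_ARRAY
            entry.reads_in_window = 0

    def location_of(self, key):
        return self.entries[key].location
```

With `hot_read_threshold=3`, data read three times in one window is placed in the memory array at the end of that window and falls back to the flash memory array after a window with fewer reads.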
The high-performance fusion server architecture receives external data through the network interface of the FPGA 2 chip. After network packet offload, the external data passes through the arbitration logic, which determines whether the packet is to be delivered to the general-purpose processor or to the FPGA 1 chip. If the data is delivered to the general-purpose processor, the processor processes it and then decides whether the data is sent out or written to the FPGA 1 chip. If the data is delivered to the FPGA 1 chip, the storage control and arbitration logic module parses the data and, according to the instruction, reads data from or writes data to the memory array or the flash memory array. The read or write policy follows the hot/cold table, which records how heavily data is used: hot data is stored in the memory array and cold data in the flash memory array.
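The routing decisions in this data flow can be sketched as a pair of small functions. The text does not say how the arbitration logic or the general-purpose processor reaches its decision, so the two boolean flags on `Packet` are hypothetical stand-ins for whatever criteria a real implementation would use.

```python
# Illustrative sketch only. The packet fields and the rules used to choose a
# destination are assumptions; the text does not say how arbitration decides.
from dataclasses import dataclass

CPU = "general-purpose processor"
FPGA1 = "FPGA 1 chip"
FPGA2 = "FPGA 2 chip"
NETWORK = "network"


@dataclass
class Packet:
    payload: bytes
    is_storage_command: bool        # hypothetical flag set by packet offload
    needs_persisting: bool = False  # hypothetical result of CPU processing


def fpga2_arbitrate(packet):
    """FPGA 2: after offload, pick the general-purpose processor or FPGA 1."""
    return FPGA1 if packet.is_storage_command else CPU


def cpu_decide(packet):
    """CPU: after processing, either send the data out or write it to FPGA 1."""
    return FPGA1 if packet.needs_persisting else NETWORK


def route(packet):
    """Return the list of hops a received packet takes through the server."""
    hops = [FPGA2, fpga2_arbitrate(packet)]
    if hops[-1] == CPU:
        hops.append(cpu_decide(packet))
    return hops
```

Once a packet reaches the FPGA 1 chip, placement in the memory array or flash memory array is governed by the hot/cold table, which this sketch deliberately leaves out.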
The beneficial effects of the present invention are as follows: the high-performance fusion server architecture adopts a heterogeneous architecture of a general-purpose processor plus dual FPGA chips, offering high flexibility, low energy consumption and strong fault tolerance; it integrates computing, storage and network, greatly improves the efficiency of cloud applications, can meet the performance requirements of HPC application software, fills the gap between demand and performance, and is suitable for wide adoption.
Brief description of the drawings
FIG. 1 is a schematic diagram of the high-performance fusion server architecture of the present invention.
Detailed description
To make the technical problems to be solved, the technical solutions and the beneficial effects of the present invention clearer, the present invention is described in detail below with reference to the drawings and embodiments. It should be noted that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.
The high-performance fusion server architecture adopts a heterogeneous architecture of a general-purpose processor plus dual FPGA chips to efficiently integrate network, computing and storage, and comprises a general-purpose processor, an FPGA 1 chip, an FPGA 2 chip, a local memory, a memory array, a flash memory array and an FPGA local memory; the FPGA 1 chip, the FPGA 2 chip and the local memory are all connected to the general-purpose processor, the memory array and the flash memory array are both connected to the FPGA 1 chip, the FPGA local memory is connected to the FPGA 2 chip, and the FPGA 1 chip and the FPGA 2 chip are connected to each other by a data bus.
The FPGA 1 chip uses a high-speed memory interface to achieve high-speed interconnection with the general-purpose processor, extends a memory array interface and a flash memory interface to enlarge the high-speed storage space, and interconnects with the FPGA 2 chip through an SRIO interface; the FPGA 2 chip adopts a general heterogeneous architecture and is used to parse and offload network packets and to arbitrate the function and sending direction of data.
The FPGA 1 chip is used for storage expansion and acceleration and is internally provided with two DDR4 interfaces, one SRIO interface, one flash memory controller interface, an internal RAM logic module, and a storage control and arbitration logic module; the two DDR4 interfaces, the SRIO interface, the flash memory controller interface and the internal RAM logic module are all connected to the storage control and arbitration logic module.
The two DDR4 interfaces are used to connect the memory array and the general-purpose processor respectively; the SRIO interface implements data interconnection between the FPGA 1 chip and the FPGA 2 chip; the flash memory controller interface connects the flash memory array; the internal RAM logic module stores a data mapping table; and the storage control and arbitration logic module classifies data instructions and determines whether data is read from or written to the memory array or the flash memory array.
The FPGA 2 chip serves as an intelligent network card and is internally provided with a network interface, a DDR4 interface, an SRIO interface, a PCIE interface, and a network packet offload and arbitration logic module; the network interface, the SRIO interface and the PCIE interface are all connected to the network packet offload and arbitration logic module.
The network interface is used for external data interconnection, the DDR4 interface connects the FPGA local memory, the SRIO interface implements data interconnection between the FPGA 2 chip and the FPGA 1 chip, the PCIE interface interconnects with the general-purpose processor, and the network packet offload and arbitration logic module parses and offloads network protocols and arbitrates the data sending direction.
The data mapping table in the internal RAM logic module has two parts: the data storage location and the data hot/cold table. The data storage location is either the memory array or the flash memory array. The hot/cold table records how heavily data is used, so that hot data is stored in the memory array and cold data in the flash memory array. How hot or cold data is is evaluated by the number of times the data is read per unit time after a single write, and the read count is set according to the application.
The high-performance fusion server architecture receives external data through the network interface of the FPGA 2 chip. After network packet offload, the external data passes through the arbitration logic, which determines whether the packet is to be delivered to the general-purpose processor or to the FPGA 1 chip. If the data is delivered to the general-purpose processor, the processor processes it and then decides whether the data is sent out or written to the FPGA 1 chip. If the data is delivered to the FPGA 1 chip, the storage control and arbitration logic module parses the data and, according to the instruction, reads data from or writes data to the memory array or the flash memory array. The read or write policy follows the hot/cold table, which records how heavily data is used: hot data is stored in the memory array and cold data in the flash memory array.
The high-performance fusion server architecture adopts a heterogeneous architecture of a general-purpose processor plus dual FPGA chips, offering high flexibility, low energy consumption and strong fault tolerance. It integrates computing, storage and network, greatly improves the efficiency of cloud applications, can meet the performance requirements of HPC application software, fills the gap between demand and performance, and is suitable for wide adoption.

Claims (8)

  1. 一种高效能融合服务器架构,其特征在于:采用通用处理器+双FPGA芯片的异构架构,将网络,计算和存储实现高效融合,包括通用处理器,FPGA 1芯片,FPGA 2芯片,本地内存,内存阵列,闪存阵列和FPGA本地内存;所述FPGA 1芯片,FPGA 2芯片与本地内存均连接到通用处理器,所述内存阵列和闪存阵列均连接到FPGA 1芯片,所述FPGA本地内存连接到FPGA 2芯片,所述FPGA 1芯片与FPGA 2芯片之间通过数据总线相连接。A high-performance fusion server architecture, which is characterized by: a heterogeneous architecture with a general-purpose processor + dual FPGA chips to achieve efficient integration of network, computing and storage, including a general-purpose processor, FPGA 1 chip, FPGA 2 chip, local memory , Memory array, flash memory array and FPGA local memory; the FPGA 1 chip, FPGA 2 chip and local memory are connected to a general-purpose processor, the memory array and flash memory array are connected to FPGA 1 chip, the FPGA local memory is connected To the FPGA 2 chip, the FPGA 1 chip and the FPGA 2 chip are connected by a data bus.
  2. The high-performance fusion server architecture according to claim 1, characterized in that: the FPGA 1 chip is interconnected with the general-purpose processor at high speed through a high-speed memory interface, provides an extended memory array interface and flash memory interface to enlarge the high-speed storage space, and is interconnected with the FPGA 2 chip through an SRIO interface; the FPGA 2 chip adopts a general heterogeneous architecture and is used to parse and offload network messages and to arbitrate the function and sending direction of the data.
  3. The high-performance fusion server architecture according to claim 2, characterized in that: the FPGA 1 chip is used for storage expansion and acceleration and internally provides two DDR4 interfaces, one SRIO interface, one flash memory controller interface, an internal RAM logic module, and a storage control and arbitration logic module; the two DDR4 interfaces, the SRIO interface, the flash memory controller interface, and the internal RAM logic module are all connected to the storage control and arbitration logic module.
  4. The high-performance fusion server architecture according to claim 3, characterized in that: the two DDR4 interfaces are used to connect the memory array and the general-purpose processor, respectively; the SRIO interface is used for data interconnection between the FPGA 1 chip and the FPGA 2 chip; the flash memory controller interface is used to connect the flash memory array; the internal RAM logic module is used to store the data mapping table; and the storage control and arbitration logic module is responsible for classifying data instructions and determining whether data is read from or written to the memory array or the flash memory array.
  5. The high-performance fusion server architecture according to claim 2, characterized in that: the FPGA 2 chip serves as an intelligent network card and internally provides a network interface, a DDR4 interface, an SRIO interface, a PCIE interface, and a network message offloading and arbitration logic module; the network interface, the SRIO interface, and the PCIE interface are all connected to the network message offloading and arbitration logic module.
  6. The high-performance fusion server architecture according to claim 5, characterized in that: the network interface is used for external data interconnection; the DDR4 interface is used to connect the FPGA local memory; the SRIO interface is used for data interconnection between the FPGA 2 chip and the FPGA 1 chip; the PCIE interface is interconnected with the general-purpose processor; and the network message offloading and arbitration logic module is used to parse and offload network protocols and to arbitrate the data sending direction.
  7. The high-performance fusion server architecture according to claim 4, characterized in that: the data mapping table in the internal RAM logic module comprises two parts, a data storage location and a hot/cold data table; the data storage location is either the memory array or the flash memory array; the hot/cold data table records how heavily each piece of data is used, so that hot data is stored in the memory array and cold data in the flash memory array; and how hot or cold data is is evaluated by the number of times the data is read per unit time after a single write, the read-count threshold being set according to the application.
  8. The high-performance fusion server architecture according to claim 7, characterized in that: external data is received through the network interface of the FPGA 2 chip; after network message offloading, the arbitration logic determines whether each message is to be sent to the general-purpose processor or to the FPGA 1 chip; if the data is sent to the general-purpose processor, the processor processes it and then decides whether to send the data out or to write it to the FPGA 1 chip; if the data is sent to the FPGA 1 chip, the storage control and arbitration logic module parses the data and, according to the instruction, reads data from or writes data to the memory array or the flash memory array; and the read/write policy follows the hot/cold data table, which records how heavily each piece of data is used, so that hot data is stored in the memory array and cold data in the flash memory array.
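
The data mapping table of claims 4 and 7 can be sketched in Python as follows. This is a hedged software illustration only: the class name `TieredStore`, the parameters `hot_reads` and `window`, the use of explicit timestamps, and the assumption that newly written data starts in the flash memory array are all hypothetical choices that the claims do not specify.

```python
class TieredStore:
    """Toy model of a mapping table: storage location plus hot/cold tracking."""

    def __init__(self, hot_reads: int, window: float):
        self.hot_reads = hot_reads  # application-set read-count threshold
        self.window = window        # length of the "unit time" in seconds
        self.tiers = {"memory": {}, "flash": {}}
        self.location = {}          # key -> "memory" or "flash"
        self.reads = {}             # key -> read timestamps since last write

    def write(self, key, value):
        old = self.location.get(key)
        if old is not None:
            del self.tiers[old][key]
        # Assumption: freshly written data has no read history, so it is cold.
        self.tiers["flash"][key] = value
        self.location[key] = "flash"
        self.reads[key] = []

    def read(self, key, now: float):
        value = self.tiers[self.location[key]][key]
        # Keep only reads that fall inside the current time window.
        recent = [t for t in self.reads[key] if now - t < self.window]
        recent.append(now)
        self.reads[key] = recent
        wanted = "memory" if len(recent) >= self.hot_reads else "flash"
        if wanted != self.location[key]:
            # Migrate the data between the memory array and the flash array.
            del self.tiers[self.location[key]][key]
            self.tiers[wanted][key] = value
            self.location[key] = wanted
        return value
```

With `hot_reads=3` and `window=10.0`, three reads within ten seconds of each other promote a key to the memory tier, and a later isolated read demotes it back to flash. A hardware implementation would more likely use fixed-width counters than timestamp lists; the list is used here only to keep the sketch short.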
PCT/CN2019/096749 2018-12-03 2019-07-19 High-performance fusion server architecture WO2020113966A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811465942.1A CN109558373B (en) 2018-12-03 2018-12-03 High-performance fusion server
CN201811465942.1 2018-12-03

Publications (1)

Publication Number Publication Date
WO2020113966A1 (en) 2020-06-11

Family

ID=65868661

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/096749 WO2020113966A1 (en) 2018-12-03 2019-07-19 High-performance fusion server architecture

Country Status (2)

Country Link
CN (1) CN109558373B (en)
WO (1) WO2020113966A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109558373B (en) * 2018-12-03 2022-03-01 山东浪潮科学研究院有限公司 High-performance fusion server
CN110177083B (en) * 2019-04-26 2021-07-06 创新先进技术有限公司 Network card, data sending/receiving method and equipment
US11082410B2 (en) 2019-04-26 2021-08-03 Advanced New Technologies Co., Ltd. Data transceiving operations and devices
CN110765064B (en) * 2019-10-18 2022-08-23 山东浪潮科学研究院有限公司 Edge-end image processing system and method of heterogeneous computing architecture
CN114968886B (en) * 2022-07-27 2022-10-25 中科亿海微电子科技(苏州)有限公司 Reconfigurable financial acceleration card based on double FPGA, reconfigurable method and acceleration method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106020425A (en) * 2016-05-27 2016-10-12 浪潮(北京)电子信息产业有限公司 FPGA heterogeneous acceleration calculating system
CN106250349A (en) * 2016-08-08 2016-12-21 浪潮(北京)电子信息产业有限公司 A kind of high energy efficiency heterogeneous computing system
CN207067982U (en) * 2017-08-08 2018-03-02 重庆跃途科技有限公司 A kind of isomery board based on FPGA
CN108696390A (en) * 2018-05-09 2018-10-23 济南浪潮高新科技投资发展有限公司 A kind of software-defined network safety equipment and method
CN108776649A (en) * 2018-06-11 2018-11-09 山东超越数控电子股份有限公司 One kind being based on CPU+FPGA heterogeneous computing systems and its accelerated method
CN109558373A (en) * 2018-12-03 2019-04-02 济南浪潮高新科技投资发展有限公司 A kind of high-effect converged services device framework

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7444454B2 (en) * 2004-05-11 2008-10-28 L-3 Communications Integrated Systems L.P. Systems and methods for interconnection of multiple FPGA devices
CN202662010U (en) * 2012-01-04 2013-01-09 青岛海信信芯科技有限公司 FPGA (Field Programmable Gate Array) interaction device, verification board and SOC (System On Chip) system
DE102012004844B4 (en) * 2012-03-13 2018-05-17 Phoenix Contact Gmbh & Co. Kg System of measured value monitoring and shutdown when measured value deviations occur
CN102902628B (en) * 2012-09-18 2016-06-01 记忆科技(深圳)有限公司 A kind of cold and hot data automatic separation method, system and flash memory realized based on flash memory
US9294097B1 (en) * 2013-11-15 2016-03-22 Scientific Concepts International Corporation Device array topology configuration and source code partitioning for device arrays
CN107346170A (en) * 2017-07-20 2017-11-14 郑州云海信息技术有限公司 A kind of FPGA Heterogeneous Computings acceleration system and method
CN207799667U (en) * 2018-01-30 2018-08-31 郑州云海信息技术有限公司 A kind of isomery mixing inner server framework based on BBU power supplys

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112363977A (en) * 2020-11-11 2021-02-12 北京大地信合信息技术有限公司 VPX single-board computer main board
CN114490023A (en) * 2021-12-20 2022-05-13 中国科学院高能物理研究所 High-energy physical calculable storage device based on ARM and FPGA
CN114490023B (en) * 2021-12-20 2024-05-07 中国科学院高能物理研究所 ARM and FPGA-based high-energy physical computable storage device

Also Published As

Publication number Publication date
CN109558373B (en) 2022-03-01
CN109558373A (en) 2019-04-02

Similar Documents

Publication Publication Date Title
WO2020113966A1 (en) High-performance fusion server architecture
US7477257B2 (en) Apparatus, system, and method for graphics memory hub
CN102710890B (en) Video processing on-chip system of double AHB (Advanced High Performance Bus) buses
CN104954795A (en) Image acquisition and transmission system based on JPEG2000
WO2016192217A1 (en) Apb bus bridge
CN105573959A (en) Computation and storage integrated distributed computer architecture
CN103106173A (en) Interconnection method among cores of multi-core processor
WO2014173231A1 (en) Memory access method and memory system
DE102021122233A1 (en) ACCELERATOR CONTROLLER HUB
CN111813736B (en) System on chip and signal processing method
CN107844433A (en) A kind of isomery mixing inner server framework
CN109062858A (en) A kind of FPGA accelerator card based on Xilinx XCVU37P chip
CN109801207B (en) CPU-FPGA collaborative image feature high-speed detection and matching system
CN109144927B (en) Multi-FPGA interconnection device
TW202131329A (en) Method for integrating processing-in-sensor and in-memory computing and system thereof
KR20210008776A (en) Semiconductor memory device and electronic system the same
Kim et al. Compression accelerator for hadoop appliance
US20210064530A1 (en) Coherent node controller
US11360701B1 (en) Memory and storage controller with integrated memory coherency interconnect
CN201444298U (en) Communication module between multi-core processor and second level caches
CN112740193B (en) Method for executing operation by big data operation acceleration system
TWI732523B (en) Storage device and method for manufacturing the same
CN202102433U (en) Device for expanding IO (input and output) bandwidth of dragon core CPU (central processing unit)
CN208255879U (en) System on chip framework based on distributed memory
US11704271B2 (en) Scalable system-in-package architectures

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 19893561; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 19893561; Country of ref document: EP; Kind code of ref document: A1