WO2021143135A1 - Remote data migration device and method based on an FPGA cloud platform - Google Patents

Remote data migration device and method based on an FPGA cloud platform

Info

Publication number
WO2021143135A1
WO2021143135A1 (PCT/CN2020/111006, CN2020111006W)
Authority
WO
WIPO (PCT)
Prior art keywords
fpga
data
acceleration
accelerated
card
Prior art date
Application number
PCT/CN2020/111006
Other languages
English (en)
French (fr)
Inventor
王江为
郝锐
阚宏伟
Original Assignee
苏州浪潮智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司 filed Critical 苏州浪潮智能科技有限公司
Priority to US17/792,265 (granted as US11868297B2)
Publication of WO2021143135A1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/40 Bus structure
    • G06F13/4004 Coupling between buses
    • G06F13/4022 Coupling between buses using switching circuits, e.g. switching matrix, connection or expansion network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/173 Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17306 Intercommunication techniques
    • G06F15/17331 Distributed shared memory [DSM], e.g. remote direct memory access [RDMA]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7867 Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/563 Data redirection of data network streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/22 Parsing or analysis of headers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2213/00 Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F2213/0026 PCI express

Definitions

  • The invention relates to the field of FPGA data migration applications, and in particular to a remote data migration device and method based on an FPGA cloud platform.
  • Cloud computing is an Internet-based computing model in which shared software and hardware resources and information are provided to computers and other devices on demand. Data on the cloud grows by roughly 30% per year, and the rapid development of AI also demands high-performance data computing; traditional CPUs alone cannot solve this computing-performance problem.
  • FPGAs are used for computing acceleration in data centers thanks to their high performance, low latency, flexibility, scalability, and low power consumption. At present, data-center operators such as Microsoft, Amazon, Baidu, Tencent, and Alibaba have all launched FPGA cloud platforms, using FPGAs as sharable cloud resources to accelerate computing. Multiple FPGA acceleration units can be networked into a computing resource pool to achieve distributed data acceleration.
  • A key technology for realizing a distributed FPGA cloud platform is how to move data between different FPGA acceleration units and how to improve the efficiency of that data migration.
  • FIG. 1 is a schematic diagram of an existing FPGA cloud platform network topology. The FPGA boards are connected to a switch through MAC network interfaces to form an FPGA resource pool.
  • An FPGA board can take the form of a PCIE accelerator card inside a server, or of multiple separate FPGA boards in a JBOF (Just a Bunch Of FPGAs, a pool consisting only of FPGA accelerator cards).
  • An FPGA cloud platform is usually used to accelerate algorithms that process very large volumes of data, such as AI algorithms, image processing, and gene sequencing. Such an acceleration algorithm may need to be completed jointly by multiple FPGA boards, with data exchanged among the boards.
  • RDMA is a modern high-performance network communication technology based on hardware acceleration. RoCE RDMA (RoCE, RDMA over Converged Ethernet) defines how to run RDMA over Ethernet and is currently the technology commonly used on FPGA clouds. RoCE RDMA transfers data directly from the memory of one computer to another computer without the intervention of either operating system.
  • FIG. 2 is a functional schematic diagram of existing RDMA technology, in which server A transfers data to server B. An application on server A executes an RDMA write request; without any involvement of kernel memory, the request is sent from the application running in user space to the buffer Memory of RDMA-capable FPGA board A. FPGA board A reads the data in its buffer Memory and transmits it over the network to the buffer Memory of FPGA board B, and FPGA board B then writes the data directly into the application buffer of server B.
  • The RDMA protocol standard is formulated by the IBTA (InfiniBand Trade Association, the body that sets the InfiniBand standards) and is used to realize data transmission between endpoints. For an FPGA to implement the RDMA function, it must follow the RDMA protocol standard, which is relatively complicated to implement and occupies considerable FPGA resources. Moreover, the RDMA standard defines the protocol for data transmission between two hosts but does not define a data transmission protocol between FPGA boards in a JBOF topology, so a method suitable for data migration between FPGA boards in a JBOF topology must be found.
  • Embodiments of the present invention provide a remote data migration device and method based on an FPGA cloud platform to solve the problem of accelerating and migrating data between FPGA boards in a JBOF topology under an FPGA cloud platform.
  • A first aspect of the present invention provides a remote data migration device based on an FPGA cloud platform, including a server, a switch, and a plurality of FPGA accelerator cards. The server transmits data to be accelerated to the FPGA accelerator cards through the switch, the FPGA accelerator cards are used to perform primary and/or secondary acceleration on the data, and the FPGA accelerator cards are used to migrate the accelerated data.
  • The FPGA accelerator card includes a SHELL and an acceleration unit FAU. The SHELL is used for the interface connection between the FPGA accelerator card and the switch and for migrating data on the FPGA accelerator card, and the acceleration unit FAU is used to perform primary and/or secondary acceleration on the data on the FPGA accelerator card.
  • The SHELL includes iRDMA, Memory, PCIE, and MAC; the Memory is connected to the iRDMA, and the iRDMA is connected to the PCIE and MAC. When data in the Memory on the FPGA accelerator card is accelerated by the acceleration unit FAU, the iRDMA is used to move data between the Memory on the FPGA accelerator card and the acceleration unit FAU; when data is migrated across multiple FPGA accelerator cards, the iRDMA moves data between the Memories on the multiple FPGA accelerator cards through the MAC interface.
  • The acceleration algorithms of the acceleration unit FAU include LZ77 and Huffman. The LZ77 algorithm performs first-stage compression on the data on the FPGA accelerator card to realize primary acceleration of the data, and the Huffman algorithm performs second-stage compression on the primarily accelerated data on the FPGA accelerator card to realize secondary acceleration of the data.
  • The iRDMA includes a bridge module Bridge, a message processing module pkt_pro, and a parsing module fau_parse. The message processing module pkt_pro parses and encapsulates the data migration read and write instruction messages received through the PCIE interface or the MAC interface, the parsing module fau_parse is used to parse the data migration read instruction messages initiated by the acceleration unit FAU, and the bridge module Bridge is used to convert the migration instructions parsed by the parsing module fau_parse and the message processing module pkt_pro into the interface timing for reading and writing the Memory.
  • A second aspect of the present invention provides a remote data migration method based on an FPGA cloud platform, including:
  • the data to be accelerated is transmitted from the server to the FPGA accelerator card through the switch;
  • the FPGA accelerator card performs primary and/or secondary acceleration on the data to be accelerated; and
  • the FPGA accelerator card migrates the accelerated data.
  • The FPGA accelerator card performing primary and/or secondary acceleration on the data to be accelerated is specifically as follows: the acceleration unit FAU of the FPGA accelerator card performs primary and/or secondary acceleration on the data to be accelerated.
  • The FPGA accelerator card migrating the accelerated data is specifically as follows: when data in the Memory on the FPGA accelerator card is accelerated by the acceleration unit FAU, the iRDMA of the FPGA accelerator card moves data between the Memory on the FPGA accelerator card and the acceleration unit FAU, and when the accelerated data is migrated across multiple FPGA accelerator cards, the iRDMA moves data between the Memories on the multiple FPGA accelerator cards through the MAC interface.
  • The acceleration algorithms of the acceleration unit FAU include LZ77 and Huffman. The LZ77 algorithm performs first-stage compression on the data on the FPGA accelerator card to realize primary acceleration of the data, and the Huffman algorithm performs second-stage compression on the primarily accelerated data on the FPGA accelerator card to realize secondary acceleration of the data.
  • The iRDMA includes a bridge module Bridge, a message processing module pkt_pro, and a parsing module fau_parse. The message processing module pkt_pro parses and encapsulates the data migration read and write instruction messages received through the PCIE interface or the MAC interface, the parsing module fau_parse is used to parse the data migration read instruction messages initiated by the acceleration unit FAU, and the bridge module Bridge is used to convert the migration instructions parsed by the parsing module fau_parse and the message processing module pkt_pro into the interface timing for reading and writing the Memory.
  • The remote data migration device and method based on an FPGA cloud platform provided by the present invention complete data migration between multiple FPGA accelerator cards through the read and write instructions defined by the iRDMA on the FPGA accelerator card, the acceleration unit FAU, and the MAC interface. The invention simplifies the RoCE RDMA protocol, can be used in a JBOF topology, and offers high transmission efficiency, which improves the competitiveness of the company's cloud platform products.
  • FIG. 1 is a schematic diagram of the existing FPGA cloud platform network topology referred to in the present invention.
  • FIG. 2 is a functional schematic diagram of the existing RDMA technology referred to in the present invention.
  • FIG. 3 is a structural block diagram of the device of the present invention.
  • FIG. 4 is a block diagram of the iRDMA module according to an embodiment of the present invention.
  • FIG. 5 is a flowchart of the method of the present invention.
  • As shown in FIG. 3, the device includes a server, a switch, and multiple FPGA (Field Programmable Gate Array) accelerator cards. The server transmits the data to be accelerated to the FPGA accelerator cards through the switch; the FPGA accelerator cards perform primary and/or secondary acceleration on the data and then migrate the accelerated data.
  • The FPGA accelerator card includes a SHELL (the FPGA shell unit, the static part of the FPGA project that users cannot change) and an acceleration unit FAU (FPGA Accelerator Unit, an application acceleration module that is dynamically reconfigurable). The SHELL is used for the interface connection between the FPGA accelerator card and the switch and for migrating data on the FPGA accelerator card, and the acceleration unit FAU is used to perform primary and/or secondary acceleration on the data on the FPGA accelerator card.
  • The SHELL implements the interface functions of the FPGA, including the PCIE DMA interface, the MAC interface, the Memory interface, and so on. The SHELL is the static part of the FPGA and cannot be changed by users.
  • The acceleration unit FAU is a user-reconfigurable acceleration unit: different users can load different acceleration applications, and different boards can also load different applications. For example, the FAU of board A can run a CNN acceleration algorithm while the FAU of board B runs a DNN acceleration algorithm, yet the static SHELL part of the two boards remains the same.
  • The SHELL includes iRDMA (the custom RDMA of the present invention, an RDMA designed for the FPGA cloud platform), Memory, PCIE, and MAC (Media Access Control, at layer 2 of the network). The Memory is connected to the iRDMA, and the iRDMA is connected to the PCIE and MAC. When the server migrates data to the FPGA accelerator card, the iRDMA moves data between the CPU Memory on the server and the Memory on the FPGA accelerator card through the PCIE interface; when data in the Memory on the FPGA accelerator card is accelerated by the acceleration unit FAU, the iRDMA moves data between the Memory on the FPGA accelerator card and the acceleration unit FAU; and when the accelerated data is migrated across multiple FPGA accelerator cards, the iRDMA moves data between the Memories on the multiple FPGA accelerator cards through the MAC interface.
  • The acceleration algorithms of the acceleration unit FAU include LZ77 and Huffman: the LZ77 algorithm performs first-stage compression on the data on the FPGA accelerator card to realize primary acceleration of the data, and the Huffman algorithm performs second-stage compression on the primarily accelerated data on the FPGA accelerator card to realize secondary acceleration of the data.
  • The acceleration algorithm combines the dictionary-based LZ77 algorithm and the statistical-redundancy-based Huffman algorithm, which can achieve a high compression ratio. The Huffman stage depends on the output of the LZ77 stage, and the two algorithms can be assigned to two FPGA accelerator cards respectively for accelerated processing.
  • The iRDMA includes a bridge module Bridge, a message processing module pkt_pro, and a parsing module fau_parse. The message processing module pkt_pro parses and encapsulates the data migration read and write instruction messages received through the PCIE interface or the MAC interface, the parsing module fau_parse is used to parse the data migration read instruction messages initiated by the acceleration unit FAU, and the bridge module Bridge is used to convert the migration instructions parsed by the parsing module fau_parse and the message processing module pkt_pro into the interface timing for reading and writing the Memory.
  • The message processing module pkt_proc receives the iRDMA read and write instruction messages input from the PCIE interface or the MAC interface and, after parsing and processing the messages, sends read and write instructions to the bridge module Bridge. The bridge module Bridge converts the read and write instructions into the read/write timing of the Memory interface to complete the write to, or read from, the Memory. If the instruction is an iRDMA read instruction, the bridge module Bridge reads the data from the Memory and sends it to the message processing module pkt_proc, and the message processing module pkt_proc completes the message encapsulation and related processing and sends the result to the PCIE interface or the MAC interface.
  • After completing the data acceleration processing, the acceleration unit FAU initiates an iRDMA_rd instruction, which is processed by the message processing module pkt_proc and forwarded to the bridge module Bridge as a read instruction. The bridge module Bridge converts the read instruction into Memory read timing; the data read from the Memory is processed by the bridge module Bridge and sent to the message processing module pkt_proc, and the message processing module pkt_proc encapsulates the message and sends it to the PCIE interface or the MAC interface for output.
  • The method of the present invention defines a simple and efficient custom way to complete data migration under a JBOF-topology FPGA cloud platform. The FPGA parses the custom iRDMA read and write instruction messages and automatically completes the data migration, and the acceleration unit FAU can also trigger instructions to perform data migration.
  • The method includes:
  • the data to be accelerated is transmitted from the server to the FPGA accelerator card through the switch;
  • the FPGA accelerator card performs primary and/or secondary acceleration on the data to be accelerated; and
  • the FPGA accelerator card migrates the accelerated data.
  • The data to be accelerated being transmitted from the server through the switch to the FPGA accelerator card is specifically: the data to be accelerated is transmitted from the server to the switch, and then from the switch to the PCIE interface of the FPGA accelerator card.
  • The FPGA accelerator card performing primary and/or secondary acceleration on the data to be accelerated is specifically: the acceleration unit FAU of the FPGA accelerator card performs primary and/or secondary acceleration on the data to be accelerated.
  • The FPGA accelerator card migrating the accelerated data is specifically: when data in the Memory on the FPGA accelerator card is accelerated by the acceleration unit FAU, the iRDMA of the FPGA accelerator card moves data between the Memory on the FPGA accelerator card and the acceleration unit FAU; when the accelerated data is migrated across multiple FPGA accelerator cards, the iRDMA moves data between the Memories on the multiple FPGA accelerator cards through the MAC interface.
  • The acceleration algorithms of the acceleration unit FAU include LZ77 and Huffman: the LZ77 algorithm performs first-stage compression on the data on the FPGA accelerator card to realize primary acceleration of the data, and the Huffman algorithm performs second-stage compression on the primarily accelerated data on the FPGA accelerator card to realize secondary acceleration of the data.
  • The iRDMA includes a bridge module Bridge, a message processing module pkt_pro, and a parsing module fau_parse. The message processing module pkt_pro parses and encapsulates the data migration read and write instruction messages received through the PCIE interface or the MAC interface, the parsing module fau_parse is used to parse the data migration read instruction messages initiated by the acceleration unit FAU, and the bridge module Bridge is used to convert the migration instructions parsed by the parsing module fau_parse and the message processing module pkt_pro into the interface timing for reading and writing the Memory.
  • The data to be accelerated is transmitted from the server through the switch to the PCIE interface of the first FPGA accelerator card.
  • The first iRDMA of the first FPGA accelerator card receives a write instruction and stores the data to be accelerated from the PCIE interface into the first Memory of the first FPGA accelerator card; the first iRDMA then receives a read instruction and reads the data to be accelerated from the first Memory out to the first acceleration unit FAU of the first FPGA accelerator card, and the first acceleration unit FAU performs first-stage compression on the data with the LZ77 algorithm, realizing the primary acceleration of the data.
  • After completing the primary acceleration, the first acceleration unit FAU sends an iRDMA read instruction; the first iRDMA of the first FPGA accelerator card receives the read instruction, reads the primarily accelerated data from the first Memory, encapsulates it into iRDMA write instructions, and transmits them to the second FPGA accelerator card through the first MAC interface of the first FPGA accelerator card.
  • The second FPGA accelerator card receives the iRDMA write instructions and stores the primarily accelerated data into the second Memory of the second FPGA accelerator card; the second iRDMA of the second FPGA accelerator card then receives a read instruction and reads the primarily accelerated data from the second Memory out to the second acceleration unit FAU of the second FPGA accelerator card, and the second acceleration unit FAU performs second-stage compression on the primarily accelerated data with the Huffman algorithm, realizing the secondary acceleration of the data.
  • The self-defined iRDMA data migration method of the present invention is applied to the JBOF network topology and realizes FPGA cloud platform data migration simply and efficiently. The iRDMA module uses about 15K of the FPGA's LUT (Look-Up Table, an important FPGA resource) resources, occupying about 1% of the LUT resources of a VU37P FPGA (a Xilinx FPGA with relatively abundant resources), whereas RoCE RDMA requires about 40K LUT resources, occupying about 3% of the LUT resources of the VU37P FPGA; the iRDMA therefore greatly simplifies the implementation of data migration.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Logic Circuits (AREA)
  • Advance Control (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

A remote data migration device and method based on an FPGA cloud platform. The device includes a server, a switch, and FPGA accelerator cards, the device including multiple FPGA accelerator cards; the server transmits data to be accelerated to the FPGA accelerator cards through the switch, the FPGA accelerator cards are used to perform primary and/or secondary acceleration on the data, and the FPGA accelerator cards are used to migrate the accelerated data. The method includes: the data to be accelerated is transmitted from the server to the FPGA accelerator card through the switch; the FPGA accelerator card performs primary and/or secondary acceleration on the data to be accelerated; and the FPGA accelerator card migrates the accelerated data. By means of the read and write instructions defined by the iRDMA on the FPGA accelerator card, the acceleration unit FAU, and the MAC interface, the present invention solves the problem of accelerating and migrating data between FPGA boards in a JBOF topology under an FPGA cloud platform.

Description

Remote data migration device and method based on an FPGA cloud platform
This application claims priority to the Chinese patent application filed with the China National Intellectual Property Administration on January 13, 2020, with application number 202010031268.7 and entitled "Remote data migration device and method based on an FPGA cloud platform", the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the technical field of FPGA data migration applications, and in particular to a remote data migration device and method based on an FPGA cloud platform.
Background Art
Cloud computing is an Internet-based computing model in which shared software and hardware resources and information can be provided to computers and other devices on demand. Data on the cloud grows by roughly 30% per year, and the rapid development of AI also brings requirements for high-performance data computing; traditional CPUs cannot solve this computing-performance problem. FPGAs are used for computing acceleration in data centers thanks to their advantages of high performance, low latency, flexibility, scalability, and low power consumption. At present, data-center operators such as Microsoft, Amazon, Baidu, Tencent, and Alibaba have all launched FPGA cloud platforms, using FPGAs as sharable cloud resources for computing acceleration. Multiple FPGA acceleration units can be networked into a computing resource pool to achieve distributed data acceleration. A key technology for realizing a distributed FPGA cloud platform is how to move data between different FPGA acceleration units and how to improve the efficiency of data migration.
As shown in FIG. 1, which is a schematic diagram of an existing FPGA cloud platform network topology, FPGA boards are connected to a switch through MAC network interfaces to form an FPGA resource pool. An FPGA board can take the form of a PCIE accelerator card in a server, or of multiple separate FPGA boards in a JBOF (Just a Bunch Of FPGAs, a pool consisting only of FPGA accelerator cards). An FPGA cloud platform is usually used to accelerate algorithms that process very large volumes of data, such as AI algorithms, image processing, and gene sequencing. Such an acceleration algorithm needs to be completed jointly by multiple FPGA boards, with data exchanged among the boards.
RDMA is a modern high-performance network communication technology based on hardware acceleration. RoCE RDMA (RoCE, RDMA over Converged Ethernet) defines how to run RDMA over Ethernet and is currently the technology commonly used on FPGA clouds. RoCE RDMA transfers data directly from the memory of one computer to another computer without the intervention of either operating system. As shown in FIG. 2, which is a functional schematic diagram of existing RDMA technology, server A transfers data to server B: an application on server A executes an RDMA write request; without any involvement of kernel memory, the RDMA write request is sent from the application running in user space to the buffer Memory of RDMA-capable FPGA board A; FPGA board A reads the data in its buffer Memory and transmits it over the network to the buffer Memory of FPGA board B, and FPGA board B then writes the data directly into the application buffer of server B.
The RDMA protocol standard is formulated by the IBTA (InfiniBand Trade Association, the body that sets the InfiniBand standards) and is used to realize data transmission between endpoints. For an FPGA to implement the RDMA function, it must follow the RDMA protocol standard, which is relatively complicated to implement and occupies considerable FPGA resources. Moreover, the RDMA standard defines the protocol for data transmission between two hosts but does not define a data transmission protocol between FPGA boards in a JBOF topology, so a method suitable for data migration between FPGA boards in a JBOF topology must be found.
Summary of the Invention
Embodiments of the present invention provide a remote data migration device and method based on an FPGA cloud platform, so as to solve the problem of accelerating and migrating data between FPGA boards in a JBOF topology under an FPGA cloud platform.
The embodiments of the present invention disclose the following technical solutions:
A first aspect of the present invention provides a remote data migration device based on an FPGA cloud platform, including a server, a switch, and FPGA accelerator cards, the device including multiple FPGA accelerator cards. The server transmits data to be accelerated to the FPGA accelerator cards through the switch, the FPGA accelerator cards are used to perform primary and/or secondary acceleration on the data, and the FPGA accelerator cards are used to migrate the accelerated data.
Further, the FPGA accelerator card includes a SHELL and an acceleration unit FAU. The SHELL is used for the interface connection between the FPGA accelerator card and the switch, the SHELL is used to migrate data on the FPGA accelerator card, and the acceleration unit FAU is used to perform primary and/or secondary acceleration on the data on the FPGA accelerator card.
Further, the SHELL includes iRDMA, Memory, PCIE, and MAC; the Memory is connected to the iRDMA, and the iRDMA is connected to the PCIE and MAC. When data in the Memory on the FPGA accelerator card is accelerated by the acceleration unit FAU, the iRDMA is used to move data between the Memory on the FPGA accelerator card and the acceleration unit FAU; when data is migrated across multiple FPGA accelerator cards, the iRDMA moves data between the Memories on the multiple FPGA accelerator cards through the MAC interface.
Further, the acceleration algorithms of the acceleration unit FAU include LZ77 and Huffman. The LZ77 algorithm performs first-stage compression on the data on the FPGA accelerator card to realize primary acceleration of the data, and the Huffman algorithm performs second-stage compression on the primarily accelerated data on the FPGA accelerator card to realize secondary acceleration of the data.
Further, the iRDMA includes a bridge module Bridge, a message processing module pkt_pro, and a parsing module fau_parse. The message processing module pkt_pro parses and encapsulates the data migration read and write instruction messages received through the PCIE interface or the MAC interface, the parsing module fau_parse is used to parse the data migration read instruction messages initiated by the acceleration unit FAU, and the bridge module Bridge is used to convert the migration instructions parsed by the parsing module fau_parse and the message processing module pkt_pro into the interface timing for reading and writing the Memory.
A second aspect of the present invention provides a remote data migration method based on an FPGA cloud platform, including:
the data to be accelerated is transmitted from the server to the FPGA accelerator card through the switch;
the FPGA accelerator card performs primary and/or secondary acceleration on the data to be accelerated; and
the FPGA accelerator card migrates the accelerated data.
Further, the FPGA accelerator card performing primary and/or secondary acceleration on the data to be accelerated is specifically:
the acceleration unit FAU of the FPGA accelerator card performs primary and/or secondary acceleration on the data to be accelerated.
Further, the FPGA accelerator card migrating the accelerated data is specifically:
when data in the Memory on the FPGA accelerator card is accelerated by the acceleration unit FAU, the iRDMA of the FPGA accelerator card is used to move data between the Memory on the FPGA accelerator card and the acceleration unit FAU; when the accelerated data is migrated across multiple FPGA accelerator cards, the iRDMA moves data between the Memories on the multiple FPGA accelerator cards through the MAC interface.
Further, the acceleration algorithms of the acceleration unit FAU include LZ77 and Huffman. The LZ77 algorithm performs first-stage compression on the data on the FPGA accelerator card to realize primary acceleration of the data, and the Huffman algorithm performs second-stage compression on the primarily accelerated data on the FPGA accelerator card to realize secondary acceleration of the data.
Further, the iRDMA includes a bridge module Bridge, a message processing module pkt_pro, and a parsing module fau_parse. The message processing module pkt_pro parses and encapsulates the data migration read and write instruction messages received through the PCIE interface or the MAC interface, the parsing module fau_parse is used to parse the data migration read instruction messages initiated by the acceleration unit FAU, and the bridge module Bridge is used to convert the migration instructions parsed by the parsing module fau_parse and the message processing module pkt_pro into the interface timing for reading and writing the Memory.
The effects described in this Summary are merely the effects of the embodiments, rather than all of the effects of the invention. One of the above technical solutions has the following advantages or beneficial effects:
The remote data migration device and method based on an FPGA cloud platform provided by the present invention complete data migration between multiple FPGA accelerator cards through the read and write instructions defined by the iRDMA on the FPGA accelerator card, the acceleration unit FAU, and the MAC interface. The present invention simplifies the RoCE RDMA protocol, can be used in a JBOF topology, and offers high transmission efficiency, which improves the competitiveness of the company's cloud platform products.
Brief Description of the Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention or in the prior art, the drawings required for the description of the embodiments or the prior art are briefly introduced below. Obviously, a person of ordinary skill in the art can also obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic diagram of the existing FPGA cloud platform network topology referred to in the present invention;
FIG. 2 is a functional schematic diagram of the existing RDMA technology referred to in the present invention;
FIG. 3 is a structural block diagram of the device of the present invention;
FIG. 4 is a block diagram of the iRDMA module according to an embodiment of the present invention;
FIG. 5 is a flowchart of the method of the present invention.
Detailed Description of the Embodiments
In order to clearly explain the technical features of this solution, the present invention is described in detail below through specific embodiments in conjunction with the accompanying drawings. The following disclosure provides many different embodiments or examples for implementing different structures of the present invention. To simplify the disclosure of the present invention, the components and arrangements of specific examples are described below. In addition, the present invention may repeat reference numerals and/or letters in different examples. Such repetition is for the purpose of simplicity and clarity and does not in itself indicate a relationship between the various embodiments and/or arrangements discussed. It should be noted that the components illustrated in the drawings are not necessarily drawn to scale. Descriptions of well-known components, processing techniques, and processes are omitted in the present invention so as not to limit the present invention unnecessarily.
As shown in FIG. 3, which is a structural block diagram of the device of the present invention, the device includes a server, a switch, and FPGA (Field Programmable Gate Array) accelerator cards, the device including multiple FPGA accelerator cards. The server transmits the data to be accelerated to the FPGA accelerator cards through the switch; the FPGA accelerator cards are used to perform primary and/or secondary acceleration on the data, and the FPGA accelerator cards are used to migrate the accelerated data.
The FPGA accelerator card includes a SHELL (the FPGA shell unit, the static part of the FPGA project that users cannot change) and an acceleration unit FAU (FPGA Accelerator Unit, an application acceleration module that is dynamically reconfigurable). The SHELL is used for the interface connection between the FPGA accelerator card and the switch, the SHELL is used to migrate data on the FPGA accelerator card, and the acceleration unit FAU is used to perform primary and/or secondary acceleration on the data on the FPGA accelerator card.
The SHELL implements the interface functions of the FPGA, including the PCIE DMA interface, the MAC interface, the Memory interface, and so on; the SHELL is the static part of the FPGA and cannot be changed by users. The acceleration unit FAU is a user-reconfigurable acceleration unit: different users can load different acceleration applications, and different boards can also load different applications. For example, the FAU of board A can run a CNN acceleration algorithm while the FAU of board B runs a DNN acceleration algorithm, but the static SHELL part of the two boards remains the same.
The SHELL includes iRDMA (the custom RDMA of the present invention, an RDMA designed for the FPGA cloud platform), Memory, PCIE, and MAC (Media Access Control, at layer 2 of the network); the Memory is connected to the iRDMA, and the iRDMA is connected to the PCIE and MAC. When the server migrates data to the FPGA accelerator card, the iRDMA moves data between the CPU Memory on the server and the Memory on the FPGA accelerator card through the PCIE interface; when data in the Memory on the FPGA accelerator card is accelerated by the acceleration unit FAU, the iRDMA is used to move data between the Memory on the FPGA accelerator card and the acceleration unit FAU; and when the accelerated data is migrated across multiple FPGA accelerator cards, the iRDMA moves data between the Memories on the multiple FPGA accelerator cards through the MAC interface.
The acceleration algorithms of the acceleration unit FAU include LZ77 and Huffman. The LZ77 algorithm performs first-stage compression on the data on the FPGA accelerator card to realize primary acceleration of the data, and the Huffman algorithm performs second-stage compression on the primarily accelerated data on the FPGA accelerator card to realize secondary acceleration of the data.
The acceleration algorithm combines the dictionary-based LZ77 algorithm and the statistical-redundancy-based Huffman algorithm, which can achieve a high compression ratio. The Huffman stage depends on the output of the LZ77 stage, and the two algorithms can be assigned to two FPGA accelerator cards respectively for accelerated processing.
As shown in FIG. 4, which is a block diagram of the iRDMA module according to an embodiment of the present invention, the iRDMA includes a bridge module Bridge, a message processing module pkt_pro, and a parsing module fau_parse. The message processing module pkt_pro parses and encapsulates the data migration read and write instruction messages received through the PCIE interface or the MAC interface, the parsing module fau_parse is used to parse the data migration read instruction messages initiated by the acceleration unit FAU, and the bridge module Bridge is used to convert the migration instructions parsed by the parsing module fau_parse and the message processing module pkt_pro into the interface timing for reading and writing the Memory.
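The text names the iRDMA read and write instruction messages but does not disclose their exact wire format. Purely as an illustration of what such an instruction might carry, the following C sketch defines a hypothetical message header and a builder for a write instruction; every field name, field width, and opcode value here is an assumption rather than part of the disclosed design.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical iRDMA opcodes; the text names read and write instructions
 * (e.g. iRDMA_rd) but does not define numeric codes. */
enum irdma_opcode {
    IRDMA_OP_WRITE = 0x1,   /* carries a payload to be stored in the target Memory */
    IRDMA_OP_READ  = 0x2    /* requests data from the target Memory */
};

/* Hypothetical fixed-size instruction header placed in front of the payload
 * inside a PCIE or MAC frame. */
struct irdma_hdr {
    uint8_t  opcode;        /* IRDMA_OP_WRITE or IRDMA_OP_READ */
    uint8_t  dst_card;      /* index of the target FPGA accelerator card */
    uint16_t reserved;
    uint64_t mem_addr;      /* target address in the card's Memory */
    uint32_t length;        /* payload length in bytes */
} __attribute__((packed));

/* Build a write instruction message: header followed by payload. */
static size_t irdma_build_write(uint8_t *buf, uint8_t dst_card, uint64_t mem_addr,
                                const void *payload, uint32_t len)
{
    struct irdma_hdr hdr = {
        .opcode   = IRDMA_OP_WRITE,
        .dst_card = dst_card,
        .mem_addr = mem_addr,
        .length   = len
    };
    memcpy(buf, &hdr, sizeof hdr);
    memcpy(buf + sizeof hdr, payload, len);
    return sizeof hdr + len;
}
```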
The message processing module pkt_proc receives the iRDMA read and write instruction messages input from the PCIE interface or the MAC interface and, after parsing and processing the messages, sends read and write instructions to the bridge module Bridge; the bridge module Bridge converts the read and write instructions into the timing of the Memory read/write interface to complete the write to, or read from, the Memory.
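To make the division of labor between pkt_proc and Bridge concrete, here is a minimal software model of that dispatch, reusing the hypothetical struct irdma_hdr from the sketch above: pkt_proc parses the instruction header and Bridge performs the actual Memory access. The real design is RTL inside the SHELL, so this is only an illustrative sketch, not the actual implementation.

```c
#include <stdint.h>
#include <string.h>
/* Uses enum irdma_opcode and struct irdma_hdr from the previous sketch. */

/* Simulated on-card Memory; in hardware this is the DDR behind the
 * Memory interface of the SHELL. */
static uint8_t card_memory[1 << 20];

/* Bridge: turn a parsed instruction into a Memory access (this models the
 * conversion of instructions into Memory read/write interface timing). */
static void bridge_write(uint64_t addr, const uint8_t *data, uint32_t len)
{
    memcpy(&card_memory[addr], data, len);
}

static void bridge_read(uint64_t addr, uint8_t *out, uint32_t len)
{
    memcpy(out, &card_memory[addr], len);
}

/* pkt_proc: parse an incoming iRDMA instruction message from the PCIE or MAC
 * interface and dispatch it to Bridge.  For a read instruction, the data read
 * back is handed back to pkt_proc, which would encapsulate it into a message
 * and send it out through the PCIE or MAC interface. */
static void pkt_proc_handle(const uint8_t *msg, uint8_t *resp, uint32_t *resp_len)
{
    struct irdma_hdr hdr;
    memcpy(&hdr, msg, sizeof hdr);

    if (hdr.opcode == IRDMA_OP_WRITE) {
        bridge_write(hdr.mem_addr, msg + sizeof hdr, hdr.length);
        *resp_len = 0;                  /* a write produces no response payload */
    } else if (hdr.opcode == IRDMA_OP_READ) {
        bridge_read(hdr.mem_addr, resp, hdr.length);
        *resp_len = hdr.length;         /* payload to be encapsulated by pkt_proc */
    }
}
```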
If the instruction is an iRDMA read instruction, the bridge module Bridge reads the data from the Memory and sends it to the message processing module pkt_proc, and the message processing module pkt_proc completes the message encapsulation and related processing and then sends the result to the PCIE interface or the MAC interface.
After completing the data acceleration processing, the acceleration unit FAU initiates an iRDMA_rd instruction, which is processed by the message processing module pkt_proc and sent to the bridge module Bridge as a read instruction. The bridge module Bridge converts the read instruction into Memory read timing; the data read from the Memory is processed by the bridge module Bridge and sent to the message processing module pkt_proc, and the message processing module pkt_proc encapsulates the message and sends it to the PCIE interface or the MAC interface for output.
The method of the present invention defines a simple and efficient custom way to complete data migration under a JBOF-topology FPGA cloud platform: the FPGA parses the custom iRDMA read and write instruction messages and automatically completes the data migration, and the acceleration unit FAU can also trigger instructions to perform data migration.
As shown in FIG. 5, which is a flowchart of the method of the present invention, the method includes:
the data to be accelerated is transmitted from the server to the FPGA accelerator card through the switch;
the FPGA accelerator card performs primary and/or secondary acceleration on the data to be accelerated; and
the FPGA accelerator card migrates the accelerated data.
The data to be accelerated being transmitted from the server through the switch to the FPGA accelerator card is specifically: the data to be accelerated is transmitted from the server to the switch, and then from the switch to the PCIE interface of the FPGA accelerator card.
The FPGA accelerator card performing primary and/or secondary acceleration on the data to be accelerated is specifically: the acceleration unit FAU of the FPGA accelerator card performs primary and/or secondary acceleration on the data to be accelerated.
The FPGA accelerator card migrating the accelerated data is specifically:
when data in the Memory on the FPGA accelerator card is accelerated by the acceleration unit FAU, the iRDMA of the FPGA accelerator card is used to move data between the Memory on the FPGA accelerator card and the acceleration unit FAU; when the accelerated data is migrated across multiple FPGA accelerator cards, the iRDMA moves data between the Memories on the multiple FPGA accelerator cards through the MAC interface.
The acceleration algorithms of the acceleration unit FAU include LZ77 and Huffman. The LZ77 algorithm performs first-stage compression on the data on the FPGA accelerator card to realize primary acceleration of the data, and the Huffman algorithm performs second-stage compression on the primarily accelerated data on the FPGA accelerator card to realize secondary acceleration of the data.
The iRDMA includes a bridge module Bridge, a message processing module pkt_pro, and a parsing module fau_parse. The message processing module pkt_pro parses and encapsulates the data migration read and write instruction messages received through the PCIE interface or the MAC interface, the parsing module fau_parse is used to parse the data migration read instruction messages initiated by the acceleration unit FAU, and the bridge module Bridge is used to convert the migration instructions parsed by the parsing module fau_parse and the message processing module pkt_pro into the interface timing for reading and writing the Memory.
The detailed working process of the method of the present invention is as follows:
the data to be accelerated is transmitted from the server through the switch to the PCIE interface of the first FPGA accelerator card;
the first iRDMA of the first FPGA accelerator card receives a write instruction and stores the data to be accelerated from the PCIE interface into the first Memory of the first FPGA accelerator card; the first iRDMA receives a read instruction and reads the data to be accelerated from the first Memory out to the first acceleration unit FAU of the first FPGA accelerator card, and the first acceleration unit FAU performs first-stage compression on the data to be accelerated by means of the LZ77 algorithm, realizing the first acceleration of the data;
after the first acceleration unit FAU completes the primary acceleration of the data, it sends an iRDMA read instruction; the first iRDMA of the first FPGA accelerator card receives the read instruction and reads the primarily accelerated data from the first Memory, and the first iRDMA of the first FPGA accelerator card encapsulates the primarily accelerated data into iRDMA write instructions and transmits them to the second FPGA accelerator card through the first MAC interface of the first FPGA accelerator card;
the second FPGA accelerator card receives the iRDMA write instructions and stores the primarily accelerated data into the second Memory of the second FPGA accelerator card; the second iRDMA of the second FPGA accelerator card receives a read instruction and reads the primarily accelerated data from the second Memory out to the second acceleration unit FAU of the second FPGA accelerator card, and the second acceleration unit FAU performs second-stage compression on the primarily accelerated data by means of the Huffman algorithm, realizing the second acceleration of the data.
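The whole two-card flow above can be summarized in a short control-flow sketch. None of the functions or constants below exist as a real driver API; each one is a hypothetical placeholder standing in for an iRDMA instruction or an FAU operation named in the text, used only to show the ordering of the steps.

```c
#include <stdint.h>

/* Hypothetical placeholders for the operations described in the text. */
enum { CARD1, CARD2 };
enum { MEM_IN = 0x000000, MEM_OUT = 0x100000 };

void     irdma_pcie_write(int card, uint64_t addr, const void *src, uint32_t len);
uint32_t fau_run_lz77(int card, uint64_t in_addr, uint64_t out_addr, uint32_t len);
void     irdma_mac_forward(int src_card, uint64_t src_addr,
                           int dst_card, uint64_t dst_addr, uint32_t len);
uint32_t fau_run_huffman(int card, uint64_t in_addr, uint64_t out_addr, uint32_t len);

uint32_t accelerate_two_stage(const void *src, uint32_t len)
{
    /* 1. Server -> switch -> PCIE interface of the first card: an iRDMA write
     *    instruction stores the raw data in the first Memory. */
    irdma_pcie_write(CARD1, MEM_IN, src, len);

    /* 2. An iRDMA read instruction feeds the data to the first FAU, which runs
     *    the LZ77 first-stage compression (primary acceleration). */
    uint32_t lz_len = fau_run_lz77(CARD1, MEM_IN, MEM_OUT, len);

    /* 3. The first FAU issues iRDMA_rd; the first iRDMA reads the primarily
     *    accelerated data from the first Memory, wraps it in iRDMA write
     *    instructions and sends it out of the first MAC interface to the
     *    second card, where it lands in the second Memory. */
    irdma_mac_forward(CARD1, MEM_OUT, CARD2, MEM_IN, lz_len);

    /* 4. The second iRDMA feeds the data to the second FAU, which runs the
     *    Huffman second-stage compression (secondary acceleration). */
    return fau_run_huffman(CARD2, MEM_IN, MEM_OUT, lz_len);
}
```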
The self-defined iRDMA data migration method of the present invention is applied to the JBOF network topology and realizes FPGA cloud platform data migration simply and efficiently. The iRDMA module uses about 15K of the FPGA's LUT (Look-Up Table, an important FPGA resource) resources, occupying about 1% of the LUT resources of a VU37P FPGA (VU37P, a Xilinx FPGA with relatively abundant resources), whereas RoCE RDMA requires about 40K LUT resources, occupying about 3% of the LUT resources of the VU37P FPGA; the iRDMA therefore greatly simplifies the implementation of data migration.
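As a rough sanity check on these utilization figures (assuming the commonly quoted figure of roughly 1.3 million LUTs for the VU37P device, which is an assumption not stated in the text):

$$\frac{15\,\mathrm{K}}{1300\,\mathrm{K}} \approx 1.2\% \qquad \frac{40\,\mathrm{K}}{1300\,\mathrm{K}} \approx 3.1\%$$

which is consistent with the approximately 1% and 3% stated above.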
The above are only preferred embodiments of the present invention. For a person of ordinary skill in the art, several improvements and refinements can also be made without departing from the principles of the present invention, and these improvements and refinements are also regarded as falling within the protection scope of the present invention.

Claims (10)

  1. A remote data migration device based on an FPGA cloud platform, characterized in that the device includes a server, a switch, and multiple FPGA accelerator cards, the server transmits data to be accelerated to the FPGA accelerator cards through the switch, the FPGA accelerator cards are used to perform primary and/or secondary acceleration on the data, and the FPGA accelerator cards are used to migrate the accelerated data.
  2. The remote data migration device based on an FPGA cloud platform according to claim 1, characterized in that the FPGA accelerator card includes a SHELL and an acceleration unit FAU, the SHELL is used for the interface connection between the FPGA accelerator card and the switch, the SHELL is used to migrate data on the FPGA accelerator card, and the acceleration unit FAU is used to perform primary and/or secondary acceleration on the data on the FPGA accelerator card.
  3. The remote data migration device based on an FPGA cloud platform according to claim 2, characterized in that the SHELL includes iRDMA, Memory, PCIE, and MAC, the Memory is connected to the iRDMA, and the iRDMA is connected to the PCIE and MAC; when data in the Memory on the FPGA accelerator card is accelerated by the acceleration unit FAU, the iRDMA is used to move data between the Memory on the FPGA accelerator card and the acceleration unit FAU, and when the accelerated data is migrated across multiple FPGA accelerator cards, the iRDMA moves data between the Memories on the multiple FPGA accelerator cards through the MAC interface.
  4. The remote data migration device based on an FPGA cloud platform according to claim 2, characterized in that the acceleration algorithms of the acceleration unit FAU include LZ77 and Huffman, the LZ77 algorithm performs first-stage compression on the data on the FPGA accelerator card to realize primary acceleration of the data, and the Huffman algorithm performs second-stage compression on the primarily accelerated data on the FPGA accelerator card to realize secondary acceleration of the data.
  5. The remote data migration device based on an FPGA cloud platform according to claim 3, characterized in that the iRDMA includes a bridge module Bridge, a message processing module pkt_pro, and a parsing module fau_parse, the message processing module pkt_pro parses and encapsulates the data migration read and write instruction messages received through the PCIE interface or the MAC interface, the parsing module fau_parse is used to parse the data migration read instruction messages initiated by the acceleration unit FAU, and the bridge module Bridge is used to convert the migration instructions parsed by the parsing module fau_parse and the message processing module pkt_pro into the interface timing for reading and writing the Memory.
  6. A remote data migration method based on an FPGA cloud platform, implemented on the basis of the device according to any one of claims 1 to 5, characterized in that the method includes:
    the data to be accelerated is transmitted from the server to the FPGA accelerator card through the switch;
    the FPGA accelerator card performs primary and/or secondary acceleration on the data to be accelerated; and
    the FPGA accelerator card migrates the accelerated data.
  7. The remote data migration method based on an FPGA cloud platform according to claim 6, characterized in that the FPGA accelerator card performing primary and/or secondary acceleration on the data to be accelerated is specifically:
    the acceleration unit FAU of the FPGA accelerator card performs primary and/or secondary acceleration on the data to be accelerated.
  8. The remote data migration method based on an FPGA cloud platform according to claim 6, characterized in that the FPGA accelerator card migrating the accelerated data is specifically:
    when data in the Memory on the FPGA accelerator card is accelerated by the acceleration unit FAU, the iRDMA of the FPGA accelerator card is used to move data between the Memory on the FPGA accelerator card and the acceleration unit FAU, and when the accelerated data is migrated across multiple FPGA accelerator cards, the iRDMA moves data between the Memories on the multiple FPGA accelerator cards through the MAC interface.
  9. The remote data migration method based on an FPGA cloud platform according to claim 7, characterized in that the acceleration algorithms of the acceleration unit FAU include LZ77 and Huffman, the LZ77 algorithm performs first-stage compression on the data on the FPGA accelerator card to realize primary acceleration of the data, and the Huffman algorithm performs second-stage compression on the primarily accelerated data on the FPGA accelerator card to realize secondary acceleration of the data.
  10. The remote data migration method based on an FPGA cloud platform according to claim 8, characterized in that the iRDMA includes a bridge module Bridge, a message processing module pkt_pro, and a parsing module fau_parse, the message processing module pkt_pro parses and encapsulates the data migration read and write instruction messages received through the PCIE interface or the MAC interface of the FPGA accelerator card, the parsing module fau_parse is used to parse the data migration read instruction messages initiated by the acceleration unit FAU, and the bridge module Bridge is used to convert the migration instructions parsed by the parsing module fau_parse and the message processing module pkt_pro into the interface timing for reading and writing the Memory.
PCT/CN2020/111006 2020-01-13 2020-08-25 Remote data migration device and method based on an FPGA cloud platform WO2021143135A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/792,265 US11868297B2 (en) 2020-01-13 2020-08-25 Far-end data migration device and method based on FPGA cloud platform

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010031268.7 2020-01-13
CN202010031268.7A CN111262917A (zh) 2020-01-13 2020-01-13 Remote data migration device and method based on an FPGA cloud platform

Publications (1)

Publication Number Publication Date
WO2021143135A1 true WO2021143135A1 (zh) 2021-07-22

Family

ID=70953978

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/111006 WO2021143135A1 (zh) 2020-01-13 2020-08-25 Remote data migration device and method based on an FPGA cloud platform

Country Status (3)

Country Link
US (1) US11868297B2 (zh)
CN (1) CN111262917A (zh)
WO (1) WO2021143135A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114756379A (zh) * 2022-05-20 2022-07-15 苏州浪潮智能科技有限公司 一种基于混合加速卡进行任务训练的方法及系统

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112087471A (zh) * 2020-09-27 2020-12-15 山东云海国创云计算装备产业创新中心有限公司 一种数据传输方法及fpga云平台
CN112416840B (zh) 2020-11-06 2023-05-26 浪潮(北京)电子信息产业有限公司 一种计算资源的远程映射方法、装置、设备及存储介质
CN112527714B (zh) * 2020-11-13 2023-03-28 苏州浪潮智能科技有限公司 一种服务器的peci信号互联方法、系统、设备以及介质
CN116684506B (zh) * 2023-08-02 2023-11-07 浪潮电子信息产业股份有限公司 数据处理方法、系统、电子设备及计算机可读存储介质

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105183683A (zh) * 2015-08-31 2015-12-23 浪潮(北京)电子信息产业有限公司 一种多fpga芯片加速卡
CN106250349A (zh) * 2016-08-08 2016-12-21 浪潮(北京)电子信息产业有限公司 一种高能效异构计算系统
CN106547720A (zh) * 2015-09-17 2017-03-29 张洪 一种基于fpga的服务器加速技术
CN106681793A (zh) * 2016-11-25 2017-05-17 同济大学 一种基于kvm的加速器虚拟化数据处理系统及方法
CN106778015A (zh) * 2016-12-29 2017-05-31 哈尔滨工业大学(威海) 一种基于云平台中fpga异构加速基因计算方法
WO2019165355A1 (en) * 2018-02-25 2019-08-29 Intel Corporation Technologies for nic port reduction with accelerated switching
CN110519090A (zh) * 2019-08-23 2019-11-29 苏州浪潮智能科技有限公司 一种fpga云平台的加速卡分配方法、系统及相关组件
CN110622145A (zh) * 2017-05-15 2019-12-27 莫列斯有限公司 可重新配置的服务器以及具有可重新配置的服务器的服务器机架

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103002046B (zh) * 2012-12-18 2015-07-08 无锡众志和达数据计算股份有限公司 多系统数据拷贝的rdma装置
CN103647807B (zh) * 2013-11-27 2017-12-15 华为技术有限公司 一种信息缓存方法、装置和通信设备
US9792154B2 (en) * 2015-04-17 2017-10-17 Microsoft Technology Licensing, Llc Data processing system having a hardware acceleration plane and a software plane
US10169065B1 (en) * 2016-06-29 2019-01-01 Altera Corporation Live migration of hardware accelerated applications
US10614028B2 (en) * 2017-09-14 2020-04-07 Microsoft Technology Licensing, Llc Network traffic routing in distributed computing systems
US11270201B2 (en) * 2017-12-29 2022-03-08 Intel Corporation Communication optimizations for distributed machine learning
CN114095427A (zh) * 2017-12-29 2022-02-25 西安华为技术有限公司 一种处理数据报文的方法和网卡
CN110647480B (zh) * 2018-06-26 2023-10-13 华为技术有限公司 数据处理方法、远程直接访存网卡和设备
CN109032982A (zh) * 2018-08-02 2018-12-18 郑州云海信息技术有限公司 一种数据处理方法、装置、设备、系统、fpga板卡及其组合
CN109857620A (zh) * 2019-03-06 2019-06-07 苏州浪潮智能科技有限公司 加速卡辅助功能管理系统、方法、装置及相关组件
CN110399719A (zh) * 2019-06-28 2019-11-01 苏州浪潮智能科技有限公司 Bit文件加载方法、装置、设备及计算机可读存储介质
CN110618956B (zh) * 2019-08-01 2021-06-29 苏州浪潮智能科技有限公司 一种bmc云平台资源池化方法与系统
US11228539B2 (en) * 2019-08-14 2022-01-18 Intel Corporation Technologies for managing disaggregated accelerator networks based on remote direct memory access

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105183683A (zh) * 2015-08-31 2015-12-23 浪潮(北京)电子信息产业有限公司 一种多fpga芯片加速卡
CN106547720A (zh) * 2015-09-17 2017-03-29 张洪 一种基于fpga的服务器加速技术
CN106250349A (zh) * 2016-08-08 2016-12-21 浪潮(北京)电子信息产业有限公司 一种高能效异构计算系统
CN106681793A (zh) * 2016-11-25 2017-05-17 同济大学 一种基于kvm的加速器虚拟化数据处理系统及方法
CN106778015A (zh) * 2016-12-29 2017-05-31 哈尔滨工业大学(威海) 一种基于云平台中fpga异构加速基因计算方法
CN110622145A (zh) * 2017-05-15 2019-12-27 莫列斯有限公司 可重新配置的服务器以及具有可重新配置的服务器的服务器机架
WO2019165355A1 (en) * 2018-02-25 2019-08-29 Intel Corporation Technologies for nic port reduction with accelerated switching
CN110519090A (zh) * 2019-08-23 2019-11-29 苏州浪潮智能科技有限公司 一种fpga云平台的加速卡分配方法、系统及相关组件

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114756379A (zh) * 2022-05-20 2022-07-15 苏州浪潮智能科技有限公司 一种基于混合加速卡进行任务训练的方法及系统
CN114756379B (zh) * 2022-05-20 2024-06-11 苏州浪潮智能科技有限公司 一种基于混合加速卡进行任务训练的方法及系统

Also Published As

Publication number Publication date
US11868297B2 (en) 2024-01-09
CN111262917A (zh) 2020-06-09
US20230045601A1 (en) 2023-02-09

Similar Documents

Publication Publication Date Title
WO2021143135A1 (zh) 一种基于fpga云平台的远端数据搬移装置和方法
US10275375B2 (en) Network interface controller with compression capabilities
JP7141902B2 (ja) ブリッジ装置、ブリッジ装置を用いたストレージ隣接演算方法
US10152441B2 (en) Host bus access by add-on devices via a network interface controller
US7937447B1 (en) Communication between computer systems over an input/output (I/O) bus
US20140032796A1 (en) Input/output processing
US9864717B2 (en) Input/output processing
CN112291293B (zh) 任务处理方法、相关设备及计算机存储介质
US10609125B2 (en) Method and system for transmitting communication data
WO2023155526A1 (zh) 一种数据流处理方法、存储控制节点及非易失性可读存储介质
WO2022032984A1 (zh) 一种mqtt协议仿真方法及仿真设备
WO2023000670A1 (zh) 数据写入方法、数据读取方法、装置、设备、系统及介质
CN116069711B (zh) 直接内存访问控制器、异构设备、内存访问方法及介质
CN114153778A (zh) 跨网络桥接
WO2024082944A1 (zh) 一种多处理器数据交互方法、装置、设备及存储介质
KR20240004315A (ko) Smartnic들 내의 네트워크 연결형 mpi 프로세싱 아키텍처
US20140025859A1 (en) Input/output processing
CN113691466B (zh) 一种数据的传输方法、智能网卡、计算设备及存储介质
CN117240935A (zh) 基于dpu的数据平面转发方法、装置、设备及介质
CN112929183A (zh) 智能网卡、报文传输方法、装置、设备及存储介质
WO2023030195A1 (zh) 缓存管理方法和装置、控制程序及控制器
CN116136790A (zh) 任务处理方法和装置
US20240223500A1 (en) Peripheral component interconnect express over fabric networks
CN117041147B (zh) 智能网卡设备、主机设备和方法及系统
WO2023231330A1 (zh) 一种池化平台的数据处理方法、装置、设备和介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20914632

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20914632

Country of ref document: EP

Kind code of ref document: A1
