WO2021135283A1 - Heterogeneous computing system and computing method therefor - Google Patents

Heterogeneous computing system and computing method therefor

Info

Publication number
WO2021135283A1
Authority
WO
WIPO (PCT)
Prior art keywords
accelerator card
root
level
local server
card
Prior art date
Application number
PCT/CN2020/110980
Other languages
French (fr)
Chinese (zh)
Inventor
许溢允
Original Assignee
苏州浪潮智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司 filed Critical 苏州浪潮智能科技有限公司
Publication of WO2021135283A1 publication Critical patent/WO2021135283A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 15/00: Digital computers in general; Data processing equipment in general
    • G06F 15/76: Architectures of general purpose stored program computers
    • G06F 15/78: Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F 15/7807: System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 13/00: Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/38: Information transfer, e.g. on bus
    • G06F 13/42: Bus transfer protocol, e.g. handshake; Synchronisation
    • G06F 13/4204: Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus
    • G06F 13/4221: Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus being an input/output bus, e.g. ISA bus, EISA bus, PCI bus, SCSI bus

Definitions

  • This application relates to the field of computer technology, and in particular to a heterogeneous computing system and a computing method thereof.
  • the purpose of this application is to provide a heterogeneous computing system that solves the problems that, owing to the tight coupling between current PHY boards and the CPU, the number of PHYs is limited and communication between PHY boards consumes server computing resources.
  • to this end, this application provides a heterogeneous computing system including a local server, a first root-level accelerator card, and a secondary accelerator card; the first root-level accelerator card is a PHY heterogeneous accelerator card directly connected to the local server through a PCIE module, and the secondary accelerator card is a PHY heterogeneous accelerator card directly or indirectly connected to the first root-level accelerator card through a MAC module;
  • the local server is used to send source operands to the first root-level accelerator card, and the first root-level accelerator card is used to allocate the source operands to the secondary accelerator cards for calculation and to feed the calculation result of each secondary accelerator card back to the local server.
  • the secondary accelerator card is arranged in a PHY enclosure (disk cabinet).
  • the first root-level accelerator card is connected, through a MAC module and via an Ethernet switch, to the secondary accelerator card in the PHY enclosure.
  • when the local server sends the source operand to the first root-level accelerator card, the local server recognizes the first root-level accelerator card through a software driver and, by configuring registers, controls the first root-level accelerator card to read the source operand locally.
  • when reading the source operand from the local server, the first root-level accelerator card determines the data volume of the source operand; if the data volume exceeds a preset threshold, the source operand is packetized according to the configuration registers on the local server side, and the data packet is sent to the secondary accelerator card.
  • after receiving the data packet, the secondary accelerator card unpacks it; if the data volume of the source operand obtained by unpacking still exceeds the preset threshold, the card calls an RTL-based interface to re-encapsulate the unpacked source operand and sends the resulting data packet to the secondary accelerator card connected to it.
  • the system may further include a second root-level accelerator card, which is a PHY heterogeneous accelerator card directly connected to the local server through a MAC module.
  • when the local server sends the source operand to the second root-level accelerator card, it generates an Ethernet data frame according to the source operand and the target network layer protocol and sends the frame to the second root-level accelerator card; the second root-level accelerator card processes or forwards the Ethernet data frame according to the target network layer protocol.
  • the system may further include a remote server connected to the secondary accelerator card through a MAC module.
  • this application also provides a heterogeneous computing method, which is implemented based on the above-mentioned heterogeneous computing system, and the method includes:
  • the local server sends the source operands to the first root-level accelerator card through the PCIE module;
  • the first root-level accelerator card allocates the source operands to the secondary accelerator cards through the MAC module;
  • each secondary accelerator card calculates its source operands to obtain a calculation result;
  • the first root-level accelerator card obtains the calculation result of each secondary accelerator card through the MAC module and feeds it back to the local server through the PCIE module.
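The four steps above form a scatter/gather flow, sketched minimally below. All class and method names are invented for illustration (the patent defines no API), and simple summation stands in for the PHY card's accelerated computation.

```python
# Minimal sketch of the four-step flow: PCIe in, MAC fan-out, compute,
# MAC gather, PCIe out. Names and the sum() workload are illustrative.

class SecondaryCard:
    def compute(self, operands):
        # Stand-in for the PHY card's accelerated calculation.
        return sum(operands)

class RootCard:
    def __init__(self, secondaries):
        self.secondaries = secondaries  # reachable over MAC, not PCIe

    def run(self, source_operands):
        # Step 2: allocate the operands round-robin across secondary cards.
        n = len(self.secondaries)
        chunks = [source_operands[i::n] for i in range(n)]
        # Step 3: each secondary card computes its share.
        # Step 4: gather the results over MAC for return over PCIe.
        return [card.compute(chunk)
                for card, chunk in zip(self.secondaries, chunks)]

root = RootCard([SecondaryCard(), SecondaryCard()])
print(root.run([1, 2, 3, 4, 5, 6]))  # step 1: server sends operands over PCIe
```

With two secondary cards, the round-robin split assigns [1, 3, 5] and [2, 4, 6], so the gathered results are [9, 12].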
  • a heterogeneous computing system provided by this application includes a local server, a first root-level accelerator card, and a secondary accelerator card.
  • the first root-level accelerator card is a PHY heterogeneous accelerator card directly connected to the local server through a PCIE module.
  • the secondary accelerator card is a PHY heterogeneous accelerator card directly or indirectly connected to the first root-level accelerator card through the MAC module.
  • the local server is used to send source operands to the first root-level accelerator card; the first root-level accelerator card then allocates the source operands to the secondary accelerator cards for calculation and feeds the calculation result of each secondary accelerator card back to the local server.
  • the first root-level accelerator card in the heterogeneous computing system is connected not only to the local server through the PCIE module but also to the secondary accelerator card through the MAC module; therefore, on the one hand, the PHY heterogeneous accelerator cards are no longer tightly coupled with the CPU and their number is no longer limited, and on the other hand, communication between PHY heterogeneous accelerator cards no longer needs to pass through the CPU, which reduces the resource occupancy of the local server.
  • this application also provides a heterogeneous computing method whose technical effects correspond to those of the above system and are not repeated here.
  • FIG. 1 is a first schematic architecture diagram of an embodiment of a heterogeneous computing system provided by this application;
  • FIG. 2 is a second schematic architecture diagram of an embodiment of a heterogeneous computing system provided by this application;
  • FIG. 3 is a third schematic architecture diagram of an embodiment of a heterogeneous computing system provided by this application;
  • FIG. 4 is an implementation flowchart of an embodiment of a heterogeneous computing method provided by this application.
  • with the spread of PHY boards in cloud data centers, PHY boards have begun to be deployed at scale; the current deployment generally adopts machine-card binding, that is, each PHY board is plugged through a PCIE slot directly into the standard bus interface of the local server.
  • when a user applies for a PHY instance, the user is generally assigned a virtual machine environment and accesses and uses the board from within that virtual machine.
  • this machine-card binding architecture tightly couples the server with the PHY boards, so the number of PHY boards is limited, and adding PHY boards requires adding matching servers; moreover, because there is no direct communication link between PHY boards, the architecture cannot meet the need for flexible service deployment, let alone form an effective distributed acceleration architecture.
  • to address these problems, this application provides a heterogeneous computing system in which PHY heterogeneous accelerator cards are no longer tightly coupled with the CPU and their number is no longer limited, and in which communication between PHY heterogeneous accelerator cards no longer needs to pass through the CPU, reducing the resource occupancy of the local server.
  • This embodiment includes: a local server, a first root-level accelerator card, and a secondary accelerator card.
  • the first root-level accelerator card is a PHY heterogeneous accelerator card directly connected to the local server through a PCIE module, and the secondary accelerator card is a PHY heterogeneous accelerator card directly or indirectly connected to the first root-level accelerator card through a MAC module;
  • the local server is used to send source operands to the first root-level accelerator card, and the first root-level accelerator card is used to allocate the source operands to the secondary accelerator cards for calculation and to feed the calculation result of each secondary accelerator card back to the local server.
  • the PHY heterogeneous accelerator card can use the high-speed computing power of the PHY to accelerate the calculation of the source operands sent by the CPU and return the results to the CPU, thereby providing the higher computing capability required by workloads such as video encoding and decoding, deep learning, scientific computing, and graphics processing.
  • each PHY heterogeneous accelerator card is identical; "first root-level accelerator card" and "secondary accelerator card" merely distinguish the two connection relationships shown in Figure 1: the former is directly connected to the local server through the PCIE module, and the latter is directly or indirectly connected to the first root-level accelerator card through the MAC module. In addition, this embodiment does not limit the number of first root-level accelerator cards or secondary accelerator cards.
  • this embodiment retains the machine-card binding form on the one hand and introduces the BOX OF PHY (PHY enclosure) mode on the other, as shown in FIG. 2.
  • the secondary accelerator cards are housed in the PHY enclosure, and the secondary accelerator cards are connected to one another through their MAC modules.
  • the PHY enclosure may include various types of heterogeneous accelerator cards, such as Intel chips, PHY manufacturer chips, and so on.
  • the first root-level accelerator card is connected, through a MAC module and via an Ethernet switch, to the secondary accelerator card in the PHY enclosure, so as to decouple the PHY from the CPU.
  • when the local server sends the source operand to the first root-level accelerator card, it recognizes the first root-level accelerator card through a software driver and, by configuring registers, controls the card to read the source operand locally.
  • when the first root-level accelerator card reads the source operand from the local server, it first determines the data volume of the source operand; if the data volume exceeds the preset threshold, the source operand is packetized according to the configuration registers on the local server side and the data packet is sent to the secondary accelerator card, whereas if the data volume does not exceed the preset threshold, the first root-level accelerator card completes the calculation of the source operand itself.
  • after receiving the data packet, the secondary accelerator card unpacks it; if the data volume of the source operand obtained by unpacking still exceeds the preset threshold, the card calls an RTL-based interface to re-encapsulate the unpacked source operand and sends the resulting data packet to the secondary accelerator card connected to it, whereas if the data volume does not exceed the preset threshold, the card calculates the source operand itself.
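The threshold rule described above (compute locally when the operand volume is small, packetize and push one level down when it is large) can be sketched as a recursive cascade. The threshold value, class names, and the use of plain recursion in place of the RTL-based interface are all assumptions made for illustration.

```python
# Sketch of the threshold-based cascade: a card computes small workloads
# itself and fans large ones out to the cards connected below it.
PRESET_THRESHOLD = 4  # illustrative; the patent leaves the value to configuration

class Card:
    def __init__(self, children=()):
        self.children = list(children)  # downstream cards reachable over MAC

    def handle(self, operands):
        # Small workload, or no downstream cards: compute the operands itself.
        if len(operands) <= PRESET_THRESHOLD or not self.children:
            return sum(operands)
        # Otherwise "re-encapsulate" and distribute to the next level
        # (an RTL-based interface in the patent; recursion in this sketch).
        n = len(self.children)
        parts = [operands[i::n] for i in range(n)]
        return sum(child.handle(part)
                   for child, part in zip(self.children, parts))

root = Card(children=[Card(), Card()])
print(root.handle(list(range(10))))  # exceeds the threshold, so distributed; 45
```

Because each level re-applies the same test, a deep enough tree of secondary cards keeps splitting the work until every card's share falls under the threshold or the leaves are reached.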
  • this embodiment further includes a remote server. As shown in FIG. 3, the remote server is connected to the secondary accelerator card through a MAC module.
  • when the data transmission path is remote server → secondary accelerator card, the remote server uses the secondary accelerator card over the network, and the interface form is MAC to MAC; in this scenario, a software driver packetizes according to the agreed format, and the PHY unpacks according to that format and matches the packet type.
  • when the remote server uses the secondary accelerator card for accelerated data distribution over the network, the interface form is MAC to MAC to MAC; in this scenario, a software driver packetizes according to the format, and the secondary accelerator card re-encapsulates the MAC packet and forwards it.
  • when the data transmission path is secondary accelerator card → remote server, the secondary accelerator card sends the calculation result to the remote server, and the interface form is MAC to MAC; in this scenario, the secondary accelerator card packetizes according to the format, and the software driver unpacks according to the format.
  • when the data transmission path is local server → first root-level accelerator card, the first root-level accelerator card is used locally or through virtual machine passthrough, and the interface form is PCIE; in this scenario, a software driver identifies the root-level card, the board is used by way of configuration registers, and the data is transmitted directly.
  • when the first root-level accelerator card is used locally or through virtual machine passthrough for data forwarding, the interface form is PCIE to MAC to MAC; in this scenario, a software driver identifies the root-level card and controls the first root-level accelerator card to forward data by configuring registers, and the first root-level accelerator card performs packet forwarding in the above format.
  • when the data transmission path is first root-level accelerator card → local server, the first root-level accelerator card returns the calculation result to the local server, and the interface form is PCIE; in this scenario, the first root-level accelerator card returns the result directly to the local server, and the software driver receives the data directly.
  • when the data transmission path is secondary accelerator card → first root-level accelerator card, the secondary accelerator card sends the calculation result to the first root-level accelerator card, and the interface form is MAC to MAC; in this scenario, the secondary accelerator card packetizes in the above format.
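The path/interface pairings enumerated above can be condensed into a lookup table. This is only a restatement of the text; the string labels are chosen for illustration and carry no meaning beyond this sketch.

```python
# Interface form for each data-transmission path described above.
INTERFACE_FORM = {
    ("remote server", "secondary card"): "MAC to MAC",
    ("secondary card", "remote server"): "MAC to MAC",
    ("local server", "root card"): "PCIE",
    ("root card", "local server"): "PCIE",
    ("secondary card", "root card"): "MAC to MAC",
}

def interface_for(src, dst):
    """Return the interface form for a (source, destination) hop."""
    return INTERFACE_FORM[(src, dst)]

print(interface_for("local server", "root card"))    # PCIE
print(interface_for("secondary card", "root card"))  # MAC to MAC
```

The table makes the pattern explicit: every hop that touches the local server host bus is PCIE, and every hop between cards or to a remote server is MAC to MAC.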
  • this embodiment may further include a second root-level accelerator card, which is a PHY heterogeneous accelerator card directly connected to the local server through a MAC module.
  • when the local server sends a source operand to the second root-level accelerator card, an Ethernet data frame is generated according to the source operand and the target network layer protocol and sent to the second root-level accelerator card; the second root-level accelerator card processes or forwards the Ethernet data frame according to the target network layer protocol.
  • the heterogeneous computing system provided in this embodiment includes a local server, a first root-level accelerator card, and a secondary accelerator card.
  • the first root-level accelerator card in the heterogeneous computing system is connected not only to the local server through a PCIE module but also to the secondary accelerator card through the MAC module.
  • applications that need to be accelerated can transmit data to the accelerator card in two ways: PCIE or MAC.
  • the PHY resources that a user can be allocated are no longer restricted by the host, so PHY resources can be allocated and deployed more flexibly and can connect seamlessly to the existing server cloud ecosystem.
  • the local server sends the source operand to the first root-level accelerator card through the PCIE module;
  • the first root-level accelerator card allocates the source operand to the secondary accelerator card through the MAC module;
  • the secondary accelerator card calculates the source operand to obtain a calculation result
  • the first root-level accelerator card obtains the calculation result of each secondary accelerator card through the MAC module, and feeds it back to the local server through the PCIE module.
  • the remote server sends the source operands to the secondary accelerator card through the MAC module;
  • the secondary accelerator card calculates the source operand to obtain the calculation result
  • the secondary accelerator card sends the calculation result to the remote server through the MAC module.
  • the sending of the source operand by the local server to the first root-level accelerator card includes:
  • the local server recognizes the first root-level accelerator card through a software driver, and controls the first root-level accelerator card to read the source operand locally by means of configuration registers.
  • the method includes:
  • the secondary accelerator card unpacks the data packet; if the data volume of the source operand obtained by unpacking exceeds the preset threshold, an RTL-based interface is called to re-encapsulate the unpacked source operand, and the data packet obtained by re-encapsulation is sent to the secondary accelerator card connected to it.
  • the local server generates an Ethernet data frame according to the source operand and the target network layer protocol, and sends the Ethernet data frame to the second root-level accelerator card;
  • the second root-level accelerator card processes or forwards the Ethernet data frame according to the target network layer protocol.
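The two steps above (the server wrapping the source operand in an Ethernet frame for a target network-layer protocol, and the card deciding to process or forward it) can be sketched with a standard Ethernet II header. The EtherType values and the process/forward rule here are illustrative assumptions; the patent does not fix a frame layout.

```python
import struct

# Sketch: build an Ethernet II frame carrying a source operand, then decide
# process-vs-forward from the EtherType field. EtherType 0x0800 (IPv4) is
# used as the example "target network layer protocol".

def build_frame(dst_mac, src_mac, ethertype, payload):
    # Ethernet II header: 6-byte destination MAC, 6-byte source MAC,
    # 2-byte EtherType, followed by the payload.
    return struct.pack("!6s6sH", dst_mac, src_mac, ethertype) + payload

def handle_frame(frame, supported=(0x0800,)):
    # The card processes protocols it understands and forwards the rest.
    ethertype = struct.unpack("!H", frame[12:14])[0]
    return "process" if ethertype in supported else "forward"

frame = build_frame(b"\xff" * 6, b"\x02" * 6, 0x0800, b"\x01\x02")
print(handle_frame(frame))  # process
```

A frame carrying an unsupported EtherType (for example 0x86DD, IPv6, in this sketch) would take the "forward" branch instead.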
  • the steps of the method or algorithm described in the embodiments disclosed in this document can be directly implemented by hardware, a software module executed by a processor, or a combination of the two.
  • the software module can reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

Provided are a heterogeneous computing system and a computing method therefor. The heterogeneous computing system comprises a local server, a first root-level acceleration card, and a secondary acceleration card, wherein the first root-level acceleration card is a PHY heterogeneous acceleration card which is directly connected to the local server via a PCIE module; and the secondary acceleration card is a PHY heterogeneous acceleration card which is directly or indirectly connected to the first root-level acceleration card via a MAC module. It can be seen that the first root-level acceleration card in the heterogeneous computing system is not only connected to the local server via the PCIE module, but is also connected to the secondary acceleration card via the MAC module, such that, on one hand, the PHY heterogeneous acceleration card is no longer tightly coupled with a CPU, and the number of the PHY heterogeneous acceleration cards is no longer limited; in addition, the communication between the PHY heterogeneous acceleration cards does not need to be performed via the CPU, thereby reducing the resource occupation rate of the local server.

Description

A heterogeneous computing system and computing method therefor
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on December 29, 2019, with application number 201911386453.1 and the invention title "A heterogeneous computing system and computing method therefor", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of computer technology, and in particular to a heterogeneous computing system and a computing method thereof.
Background
At present, domestic PHY cloud service vendors almost all adopt single-machine single-card or single-machine multi-card binding modes, that is, one card is inserted into a server, or multiple cards are inserted into one server. In this machine-card binding mode, the PHY is tightly coupled with the CPU: a user can access and use a PHY card only through the CPU on the host side; the PHY boards each user can be allocated are limited by the number of bound boards; and there is no direct data communication link between boards, so if boards need to communicate, the data must be forwarded by the CPU.
It can be seen that the tight coupling between current PHY boards and the CPU both limits the number of PHYs and makes communication between PHY boards consume server computing resources.
Summary of the Invention
The purpose of this application is to provide a heterogeneous computing system that solves the problems that, owing to the tight coupling between current PHY boards and the CPU, the number of PHYs is limited and communication between PHY boards consumes server computing resources.
To solve the above technical problems, this application provides a heterogeneous computing system including a local server, a first root-level accelerator card, and a secondary accelerator card, where the first root-level accelerator card is a PHY heterogeneous accelerator card directly connected to the local server through a PCIE module, and the secondary accelerator card is a PHY heterogeneous accelerator card directly or indirectly connected to the first root-level accelerator card through a MAC module;
wherein the local server is used to send source operands to the first root-level accelerator card, and the first root-level accelerator card is used to allocate the source operands to the secondary accelerator cards for calculation and to feed the calculation result of each secondary accelerator card back to the local server.
Preferably, the secondary accelerator card is arranged in a PHY enclosure.
Preferably, the first root-level accelerator card is connected to the secondary accelerator card in the PHY enclosure through a MAC module via an Ethernet switch.
Preferably, when the local server sends the source operand to the first root-level accelerator card, the local server recognizes the first root-level accelerator card through a software driver and, by configuring registers, controls the first root-level accelerator card to read the source operand locally.
Preferably, when reading the source operand from the local server, the first root-level accelerator card determines the data volume of the source operand; if the data volume exceeds a preset threshold, the source operand is packetized according to the configuration registers on the local server side, and the data packet is sent to the secondary accelerator card.
Preferably, after receiving the data packet, the secondary accelerator card unpacks it; if the data volume of the source operand obtained by unpacking still exceeds the preset threshold, the card calls an RTL-based interface to re-encapsulate the unpacked source operand and sends the resulting data packet to the secondary accelerator card connected to it.
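The packetize/unpack steps that carry operands between cards can be sketched with a toy length-prefixed payload format. The header layout here is entirely an assumption; the patent only requires that both ends agree on a configured format.

```python
import struct

# Toy encapsulation for operands carried in a MAC payload: a 4-byte
# big-endian count followed by 32-bit big-endian signed operands.
# The layout is illustrative, not the patent's configured format.

def packetize(operands):
    return struct.pack("!I", len(operands)) + b"".join(
        struct.pack("!i", x) for x in operands)

def unpack(payload):
    (count,) = struct.unpack("!I", payload[:4])
    return [struct.unpack("!i", payload[4 + 4 * i: 8 + 4 * i])[0]
            for i in range(count)]

pkt = packetize([10, -3, 7])
print(unpack(pkt))  # [10, -3, 7]
```

Re-encapsulation, as performed by a secondary card forwarding work downward, would simply be `packetize` applied again to a slice of the unpacked operands.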
Preferably, the system further includes a second root-level accelerator card, which is a PHY heterogeneous accelerator card directly connected to the local server through a MAC module.
Preferably, when the local server sends the source operand to the second root-level accelerator card, it generates an Ethernet data frame according to the source operand and the target network layer protocol and sends the Ethernet data frame to the second root-level accelerator card; the second root-level accelerator card processes or forwards the Ethernet data frame according to the target network layer protocol.
Preferably, the system further includes a remote server connected to the secondary accelerator card through a MAC module.
In addition, this application provides a heterogeneous computing method implemented on the heterogeneous computing system described above, the method including:
the local server sending source operands to the first root-level accelerator card through the PCIE module;
the first root-level accelerator card allocating the source operands to the secondary accelerator cards through the MAC module;
the secondary accelerator cards calculating the source operands to obtain calculation results;
the first root-level accelerator card obtaining the calculation result of each secondary accelerator card through the MAC module and feeding it back to the local server through the PCIE module.
The heterogeneous computing system provided by this application includes a local server, a first root-level accelerator card, and a secondary accelerator card, where the first root-level accelerator card is a PHY heterogeneous accelerator card directly connected to the local server through a PCIE module, and the secondary accelerator card is a PHY heterogeneous accelerator card directly or indirectly connected to the first root-level accelerator card through a MAC module. In the heterogeneous computing process, the local server sends source operands to the first root-level accelerator card; the first root-level accelerator card then allocates the source operands to the secondary accelerator cards for calculation and feeds the calculation result of each secondary accelerator card back to the local server. It can be seen that the first root-level accelerator card in this system is connected not only to the local server through the PCIE module but also to the secondary accelerator card through the MAC module. Therefore, on the one hand, the PHY heterogeneous accelerator cards are no longer tightly coupled with the CPU and their number is no longer limited; on the other hand, communication between PHY heterogeneous accelerator cards no longer needs to pass through the CPU, which reduces the resource occupancy of the local server.
In addition, this application also provides a heterogeneous computing method whose technical effects correspond to those of the above system and are not repeated here.
Description of the Drawings
In order to illustrate the technical solutions of the embodiments of this application or the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely some embodiments of this application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative work.
FIG. 1 is a first schematic architecture diagram of an embodiment of a heterogeneous computing system provided by this application;
FIG. 2 is a second schematic architecture diagram of an embodiment of a heterogeneous computing system provided by this application;
FIG. 3 is a third schematic architecture diagram of an embodiment of a heterogeneous computing system provided by this application;
FIG. 4 is an implementation flowchart of an embodiment of a heterogeneous computing method provided by this application.
具体实施方式Detailed ways
为了使本技术领域的人员更好地理解本申请方案,下面结合附图和具体实施方式对本申请作进一步的详细说明。显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。In order to enable those skilled in the art to better understand the solution of the application, the application will be further described in detail below with reference to the accompanying drawings and specific implementations. Obviously, the described embodiments are only a part of the embodiments of the present application, rather than all the embodiments. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application.
As PHY boards are applied in cloud data centers, they have begun to be deployed at large scale. The current deployment method is generally machine-card binding: each PHY board is plugged directly into a standard bus interface of the local server through a PCIE slot. When a user applies for a PHY instance, the user is generally assigned a virtual machine environment and accesses and uses the board from within the virtual machine. This machine-card-bound architecture tightly couples the server with the PHY boards and limits the number of PHY boards; adding PHY boards requires matching servers. Moreover, because there is no direct communication link between PHY boards, the architecture can neither meet the need for elastic service deployment nor form an effective distributed acceleration architecture.
To address the above problems, this application provides a heterogeneous computing system in which PHY heterogeneous accelerator cards are no longer tightly coupled with the CPU and their number is no longer limited, and in which communication between PHY heterogeneous accelerator cards no longer needs to pass through the CPU, reducing the resource occupancy of the local server.
An embodiment of the heterogeneous computing system provided by this application is introduced below. Referring to FIG. 1, this embodiment includes a local server, a first root-level accelerator card, and secondary accelerator cards, where the first root-level accelerator card is a PHY heterogeneous accelerator card connected directly to the local server through a PCIE module, and a secondary accelerator card is a PHY heterogeneous accelerator card connected directly or indirectly to the first root-level accelerator card through a MAC module;
wherein the local server is configured to send source operands to the first root-level accelerator card, and the first root-level accelerator card is configured to distribute the source operands to the secondary accelerator cards for computation and to feed the computation results of each secondary accelerator card back to the local server.
A PHY heterogeneous accelerator card can use the high-speed computing capability of the PHY to accelerate computation on the source operands sent by the CPU and return the results to the CPU, thereby providing high-performance computing for functions with demanding computational requirements such as video encoding/decoding, deep learning, scientific computing, and graphics processing.
It should be noted that in this embodiment the hardware structure and software framework of each PHY heterogeneous accelerator card are the same; "first root-level accelerator card" and "secondary accelerator card" merely distinguish two connection relationships, as shown in FIG. 1: connected directly to the local server through a PCIE module, or connected directly or indirectly to the first root-level accelerator card through a MAC module. In addition, this embodiment does not limit the numbers of first root-level accelerator cards and secondary accelerator cards.
As a specific implementation, this embodiment retains the machine-card-bound form on the one hand and introduces a BOX OF PHY (PHY enclosure) mode on the other, as shown in FIG. 2. The secondary accelerator cards are housed in the PHY enclosure and are connected to one another through their MAC modules. Specifically, the PHY enclosure may contain heterogeneous accelerator cards of various types, such as Intel chips and PHY-vendor chips. The first root-level accelerator card is connected through its MAC module, via an Ethernet switch, to the secondary accelerator cards in the PHY enclosure, thereby decoupling the tight coupling between the PHY and the CPU.
When the local server sends source operands to the first root-level accelerator card, the local server identifies the first root-level accelerator card through a software driver and controls it, by configuring registers, to read the source operands from the local server. When reading the source operands from the local server, the first root-level accelerator card first determines their data volume. If the data volume of the source operands exceeds a preset threshold, the card packetizes the source operands according to the configuration registers on the local-server side and sends the packets to the secondary accelerator cards; if the data volume does not exceed the preset threshold, the first root-level accelerator card computes on the source operands itself. After receiving a packet, a secondary accelerator card unpacks it. If the data volume of the unpacked source operands exceeds the preset threshold, the card calls an RTL-implemented interface to repacketize the unpacked source operands and sends the resulting packets to the secondary accelerator cards connected to it; if the data volume does not exceed the preset threshold, the card computes on the source operands.
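The threshold-based distribute-or-compute decision described above can be sketched as follows. This is a minimal illustration only: the function names and the threshold value are hypothetical, and in the actual system this logic runs in driver and RTL logic on the cards, not in software like this.

```python
THRESHOLD = 1024  # hypothetical preset threshold, in bytes

def handle_operands(operands, downstream_cards, compute):
    """Distribute operands downstream when they exceed the threshold,
    otherwise compute on the current card.

    `operands` is a list of byte counts standing in for source-operand
    chunks; `downstream_cards` lists the connected secondary cards.
    """
    total = sum(operands)
    if total <= THRESHOLD or not downstream_cards:
        # Small workload (or no downstream card): compute locally.
        return [("local", compute(operands))]
    # Large workload: packetize and fan out round-robin downstream.
    assignments = [(downstream_cards[i % len(downstream_cards)], op)
                   for i, op in enumerate(operands)]
    return [(card, compute([op])) for card, op in assignments]
```

A secondary card receiving a packet would apply the same rule recursively: unpack, test the threshold again, and either repacketize for its own downstream cards or compute.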
As a specific implementation, this embodiment further includes a remote server. As shown in FIG. 3, the remote server is connected to the secondary accelerator cards through a MAC module.
Several typical data transmission paths are described below:
When the data transmission path is remote server → secondary accelerator card, the remote server uses the secondary accelerator card over the network, with a MAC-to-MAC interface. In this scenario the software driver packetizes according to the format, and the PHY unpacks according to the format and matches the packet type.
When the data transmission path is remote server → secondary accelerator card → secondary accelerator card / first root-level accelerator card, the remote server uses a secondary accelerator card over the network to distribute acceleration data, with a MAC-to-MAC-to-MAC interface. In this scenario the software driver packetizes according to the format, and the secondary accelerator card re-encapsulates the MAC packet and forwards it.
When the data transmission path is secondary accelerator card → remote server, the secondary accelerator card sends the computation result to the remote server, with a MAC-to-MAC interface. In this scenario the secondary accelerator card packetizes according to the format, and the software driver unpacks according to the format.
When the data transmission path is local server → first root-level accelerator card, the first root-level accelerator card is used locally or through virtual machine passthrough, with a PCIE interface. In this scenario the software driver identifies the local board and uses it by configuring registers; data is transmitted directly.
When the data transmission path is local server → first root-level accelerator card → secondary accelerator card, the first root-level accelerator card is used locally or through virtual machine passthrough for data forwarding, with a PCIe-to-MAC-to-MAC interface. In this scenario the software driver identifies the local board and controls the first root-level accelerator card, by configuring registers, to forward the data, and the first root-level accelerator card packetizes and forwards according to the format described above.
When the data transmission path is first root-level accelerator card → local server, the first root-level accelerator card returns the computation result to the local server, with a PCIE interface. In this scenario the first root-level accelerator card returns the result directly to the local server, and the software driver receives the data directly.
When the data transmission path is secondary accelerator card → first root-level accelerator card, the secondary accelerator card sends the computation result to the first root-level accelerator card, with a MAC-to-MAC interface. In this scenario the secondary accelerator card packetizes according to the format described above.
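The paths above can be summarized in a small lookup table (a sketch for illustration only; the hop labels and table name are invented here, and the interface forms are taken directly from the description, not from any defined API):

```python
# Each key is a data transmission path (sequence of hops);
# each value is the interface form the description assigns to it.
TRANSMISSION_PATHS = {
    ("remote server", "secondary card"): "MAC to MAC",
    ("remote server", "secondary card", "next card"): "MAC to MAC to MAC",
    ("secondary card", "remote server"): "MAC to MAC",
    ("local server", "root card"): "PCIE",
    ("local server", "root card", "secondary card"): "PCIe to MAC to MAC",
    ("root card", "local server"): "PCIE",
    ("secondary card", "root card"): "MAC to MAC",
}

def interface_form(*hops):
    """Return the interface form for a given data transmission path."""
    return TRANSMISSION_PATHS[hops]
```

Note the pattern: every hop that crosses the network contributes a MAC stage, while the single hop between the local server and the first root-level accelerator card is plain PCIE.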
As a specific implementation, as shown in FIG. 3, this embodiment further includes a second root-level accelerator card, which is a PHY heterogeneous accelerator card connected directly to the local server through a MAC module.
Unlike with the first root-level accelerator card, in this embodiment, when the local server sends source operands to the second root-level accelerator card, it generates an Ethernet data frame according to the source operands and a target network-layer protocol and sends the Ethernet data frame to the second root-level accelerator card; the second root-level accelerator card then processes or forwards the Ethernet data frame according to the target network-layer protocol.
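Framing source operands for the MAC-attached path can be sketched with standard Ethernet II framing (an assumption; the patent does not specify the frame layout, and the MAC addresses and EtherType below are placeholders):

```python
import struct

def build_ethernet_frame(dst_mac, src_mac, ethertype, payload):
    """Build a minimal Ethernet II frame: 6-byte destination MAC,
    6-byte source MAC, 2-byte EtherType, then the payload."""
    header = struct.pack("!6s6sH", dst_mac, src_mac, ethertype)
    return header + payload

frame = build_ethernet_frame(
    b"\x02\x00\x00\x00\x00\x01",  # destination: second root-level card (made up)
    b"\x02\x00\x00\x00\x00\x02",  # source: local server NIC (made up)
    0x88B5,                       # EtherType reserved for local experimentation
    b"source-operands",           # source operands carried as the payload
)
```

The EtherType (standing in for the "target network layer protocol") is what lets the second root-level accelerator card decide whether to process the frame itself or forward it.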
The heterogeneous computing system provided in this embodiment includes a local server, a first root-level accelerator card, and secondary accelerator cards. The first root-level accelerator card is connected not only to the local server through a PCIE module but also to the secondary accelerator cards through a MAC module. Under this architecture, an application that needs acceleration can transmit its data to an accelerator card in either of two ways: PCIE or MAC. Moreover, the PHY resources a user can be allocated are no longer limited by the host, so PHY resources can be allocated and deployed more flexibly and connect seamlessly to the existing server cloud ecosystem.
An embodiment of the heterogeneous computing method provided by this application is described in detail below. Referring to FIG. 4, this embodiment is implemented on the heterogeneous computing system described above and includes:
S401: the local server sends source operands to the first root-level accelerator card through the PCIE module;
S402: the first root-level accelerator card distributes the source operands to the secondary accelerator cards through the MAC module;
S403: the secondary accelerator cards compute on the source operands to obtain computation results;
S404: the first root-level accelerator card obtains the computation result of each secondary accelerator card through the MAC module and feeds it back to the local server through the PCIE module.
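Steps S401-S404 amount to a scatter-compute-gather pipeline, sketched below for illustration only (the card class and the squaring kernel are hypothetical stand-ins, not part of the patented system):

```python
class SecondaryCard:
    """Stand-in for a secondary accelerator card reached over MAC."""
    def compute(self, operand):
        return operand * operand  # placeholder for the accelerated kernel

def heterogeneous_compute(operands, secondary_cards):
    # S401/S402: the root-level card scatters operands round-robin
    # to the secondary cards over MAC.
    assignments = [(secondary_cards[i % len(secondary_cards)], op)
                   for i, op in enumerate(operands)]
    # S403: each secondary card computes on its share.
    results = [card.compute(op) for card, op in assignments]
    # S404: the root-level card gathers the results and returns them
    # to the local server over PCIE.
    return results

print(heterogeneous_compute([1, 2, 3, 4], [SecondaryCard(), SecondaryCard()]))
# prints [1, 4, 9, 16]
```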
As a specific implementation, the method further includes:
the remote server sends source operands to a secondary accelerator card through the MAC module;
the secondary accelerator card computes on the source operands to obtain a computation result; and
the secondary accelerator card sends the computation result to the remote server through the MAC module.
As a specific implementation, the local server sending the source operands to the first root-level accelerator card includes:
the local server identifies the first root-level accelerator card through a software driver and controls it, by configuring registers, to read the source operands from the local server.
As a specific implementation, after the secondary accelerator card receives the data packet, the method includes:
the secondary accelerator card unpacks the data packet, and if the data volume of the unpacked source operands exceeds the preset threshold, calls an RTL-implemented interface to repacketize the unpacked source operands and sends the resulting packets to the secondary accelerator cards connected to it.
As a specific implementation, the method further includes:
the local server generates an Ethernet data frame according to the source operands and a target network-layer protocol and sends the Ethernet data frame to the second root-level accelerator card; and
the second root-level accelerator card processes or forwards the Ethernet data frame according to the target network-layer protocol.
The embodiments in this specification are described progressively; each embodiment focuses on its differences from the others, and for the same or similar parts the embodiments may be referred to one another. Since the apparatus disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is relatively brief; for relevant details, refer to the description of the method.
The steps of the methods or algorithms described in the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The solution provided by this application has been described in detail above. Specific examples are used herein to explain the principles and implementations of this application, and the description of the above embodiments is intended only to help understand the method of this application and its core idea. Meanwhile, those of ordinary skill in the art may, following the idea of this application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting this application.

Claims (10)

  1. A heterogeneous computing system, comprising: a local server, a first root-level accelerator card, and a secondary accelerator card, wherein the first root-level accelerator card is a PHY heterogeneous accelerator card connected directly to the local server through a PCIE module, and the secondary accelerator card is a PHY heterogeneous accelerator card connected directly or indirectly to the first root-level accelerator card through a MAC module;
    wherein the local server is configured to send source operands to the first root-level accelerator card, and the first root-level accelerator card is configured to distribute the source operands to the secondary accelerator card for computation and to feed the computation result of each secondary accelerator card back to the local server.
  2. The system according to claim 1, wherein the secondary accelerator card is arranged in a PHY enclosure.
  3. The system according to claim 2, wherein the first root-level accelerator card is connected through a MAC module, via an Ethernet switch, to the secondary accelerator card in the PHY enclosure.
  4. The system according to claim 1, wherein when the local server sends source operands to the first root-level accelerator card, the local server identifies the first root-level accelerator card through a software driver and controls the first root-level accelerator card, by configuring registers, to read the source operands from the local server.
  5. The system according to claim 4, wherein when reading the source operands from the local server, the first root-level accelerator card determines the data volume of the source operands, and if the data volume of the source operands exceeds a preset threshold, packetizes the source operands according to the configuration registers on the local-server side and sends the data packet to the secondary accelerator card.
  6. The system according to claim 5, wherein after receiving the data packet, the secondary accelerator card unpacks the data packet, and if the data volume of the unpacked source operands exceeds the preset threshold, calls an RTL-implemented interface to repacketize the unpacked source operands and sends the resulting data packet to the secondary accelerator card connected to it.
  7. The system according to any one of claims 1-6, further comprising a second root-level accelerator card, wherein the second root-level accelerator card is a PHY heterogeneous accelerator card connected directly to the local server through a MAC module.
  8. The system according to claim 7, wherein when the local server sends source operands to the second root-level accelerator card, the local server generates an Ethernet data frame according to the source operands and a target network-layer protocol and sends the Ethernet data frame to the second root-level accelerator card; and the second root-level accelerator card processes or forwards the Ethernet data frame according to the target network-layer protocol.
  9. The system according to claim 1, further comprising: a remote server, wherein the remote server is connected to the secondary accelerator card through a MAC module.
  10. A heterogeneous computing method, implemented on the heterogeneous computing system according to any one of claims 1-9, the method comprising:
    sending, by a local server, source operands to a first root-level accelerator card through a PCIE module;
    distributing, by the first root-level accelerator card, the source operands to a secondary accelerator card through a MAC module;
    computing, by the secondary accelerator card, on the source operands to obtain a computation result; and
    obtaining, by the first root-level accelerator card, the computation result of each secondary accelerator card through the MAC module, and feeding it back to the local server through the PCIE module.
PCT/CN2020/110980 2019-12-29 2020-08-25 Heterogeneous computing system and computing method therefor WO2021135283A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911386453.1A CN111143276A (en) 2019-12-29 2019-12-29 Heterogeneous computing system and computing method thereof
CN201911386453.1 2019-12-29

Publications (1)

Publication Number Publication Date
WO2021135283A1 true WO2021135283A1 (en) 2021-07-08

Family

ID=70521404

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/110980 WO2021135283A1 (en) 2019-12-29 2020-08-25 Heterogeneous computing system and computing method therefor

Country Status (2)

Country Link
CN (1) CN111143276A (en)
WO (1) WO2021135283A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111143276A (en) * 2019-12-29 2020-05-12 苏州浪潮智能科技有限公司 Heterogeneous computing system and computing method thereof
CN112416840B (en) 2020-11-06 2023-05-26 浪潮(北京)电子信息产业有限公司 Remote mapping method, device, equipment and storage medium for computing resources

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6275501B1 (en) * 1998-04-21 2001-08-14 Hewlett-Packard Company Media access controller capable of connecting to a serial physical layer device and a media independent interface (MII) physical layer device
CN108563595A (en) * 2018-04-17 2018-09-21 上海固高欧辰智能科技有限公司 A kind of system and method for remote transmission usb data
CN109117398A (en) * 2018-07-18 2019-01-01 维沃移动通信有限公司 A kind of sensor control and terminal
CN111143276A (en) * 2019-12-29 2020-05-12 苏州浪潮智能科技有限公司 Heterogeneous computing system and computing method thereof

Also Published As

Publication number Publication date
CN111143276A (en) 2020-05-12


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20910245

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20910245

Country of ref document: EP

Kind code of ref document: A1