WO2023093043A1 - 一种数据处理方法、装置及介质 - Google Patents

一种数据处理方法、装置及介质 (A data processing method, apparatus and medium)

Info

Publication number
WO2023093043A1
WO2023093043A1 (PCT/CN2022/102531)
Authority
WO
WIPO (PCT)
Prior art keywords
fpga accelerator
accelerator card
calculation
target
result data
Prior art date
Application number
PCT/CN2022/102531
Other languages
English (en)
French (fr)
Inventor
刘钧锴
阚宏伟
王彦伟
张翔宇
韩海跃
Original Assignee
浪潮电子信息产业股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浪潮电子信息产业股份有限公司
Publication of WO2023093043A1

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005: Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027: Allocation of resources, e.g. of the central processing unit [CPU] to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00: Digital computers in general; Data processing equipment in general
    • G06F15/76: Architectures of general purpose stored program computers
    • G06F15/78: Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7867: Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture

Definitions

  • the present application relates to the technical field of FPGA cloud platform, in particular to a data processing method, device and medium.
  • FPGA: Field Programmable Gate Array.
  • The inventor realized that, under the management of a cloud platform, the logical resources of a single FPGA accelerator card are limited. When a complex computing task cannot be completed by one FPGA accelerator card, the task has to be divided into multiple computing steps, each step is assigned to one FPGA accelerator card for calculation, and the final result is returned to the host after the multiple FPGA accelerator cards have computed in sequence. In this arrangement, the data transmission between the FPGA accelerator cards and the switching between calculation steps are all completed by software running on the host, so multi-card distributed computing has a much larger delay and lower computing efficiency than single-card computing.
  • the present application discloses a data processing method, including:
  • when a first target FPGA accelerator card obtains a calculation start command sent by a target host connected to it, calculating the data to be processed to obtain intermediate result data;
  • sending, by the first target FPGA accelerator card according to its own configuration information, the intermediate result data and the calculation type information of the next calculation to the next FPGA accelerator card, so that the next FPGA accelerator card calculates the intermediate result data to obtain new intermediate result data and, according to its own configuration information, sends the new intermediate result data and the calculation type information of the next calculation to the next FPGA accelerator card, until the last second target FPGA accelerator card participating in the calculation completes its calculation and the final result data is obtained;
  • the final result data is returned to the first target FPGA accelerator card by the second target FPGA accelerator card.
  • the final result data is sent to the target host through the first target FPGA accelerator card, so as to complete the distributed calculation on the data to be processed.
  • In one embodiment, before the first target FPGA accelerator card obtains the calculation start command sent by the target host connected to it, the method further includes:
  • acquiring, through the target host, the configuration information of all FPGA accelerator cards participating in the calculation, and configuring the configuration information corresponding to the first target FPGA accelerator card to the first target FPGA accelerator card; and communicating, through the target host, with the other hosts and sending each of them its corresponding configuration information, so that the other hosts configure the corresponding configuration information to the FPGA accelerator cards connected to them;
  • wherein the configuration information of every FPGA accelerator card other than the second target FPGA accelerator card includes a preset address mapping relationship, the network address information of the next FPGA accelerator card participating in the calculation, and the calculation type information of the next calculation; the preset address mapping relationship is the mapping between the physical address range in which the intermediate result data is stored in the card's own memory and the physical address range in the memory of the next FPGA accelerator card participating in the calculation; and the configuration information of the second target FPGA accelerator card includes the network address information of the first target FPGA accelerator card, the physical address range in which the final result data is stored in its own memory, and the physical address in the memory of the target host at which the final result data is to be stored.
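  • Purely as an illustration of the configuration information described above, the register block that each card's BSP might hold can be sketched in C as follows; the structure, field names and widths are assumptions made here for readability and are not taken from the application.
      #include <stdint.h>

      /* Hypothetical BSP configuration for one FPGA accelerator card; field names and
         widths are illustrative assumptions. */
      struct bsp_card_config {
          int      is_last_card;          /* 0: intermediate card, 1: second target (last) card        */

          /* Cards other than the second target card: forwarding configuration. */
          uint64_t interm_local_base;     /* physical address range in this card's memory in which     */
          uint64_t interm_local_size;     /*   the intermediate result data is stored                  */
          uint64_t interm_next_base;      /* mapped physical base in the next card's memory            */
          uint8_t  next_card_mac[6];      /* network address of the next participating card            */
          uint32_t next_calc_type;        /* calculation type information of the next calculation      */

          /* Second target (last) card: result return configuration. */
          uint8_t  first_card_mac[6];     /* network address of the first target FPGA accelerator card */
          uint64_t final_local_base;      /* physical address range of the final result data in this   */
          uint64_t final_local_size;      /*   card's memory                                           */
          uint64_t final_host_phys;       /* physical address in the memory of the target host         */
      };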
  • In one embodiment, configuring the configuration information corresponding to the first target FPGA accelerator card to the first target FPGA accelerator card includes configuring it into the internal registers of the first target FPGA accelerator card; correspondingly, the other hosts configure the corresponding configuration information into the internal registers of the FPGA accelerator cards connected to them.
  • calculation is performed on the data to be processed to obtain intermediate result data, including:
  • the kernel of the first target FPGA accelerator card is called to calculate the data to be processed to obtain intermediate result data, so that the kernel writes the intermediate result data into the internal memory of the first target FPGA accelerator card.
  • In one embodiment, the method further includes: when the kernel writes data to the memory and it is detected, according to the preset address mapping relationship, that the current write address falls within the physical address range in which the intermediate result data is stored in the card's own memory, triggering the step of sending, by the first target FPGA accelerator card according to its own configuration information, the intermediate result data and the calculation type information of the next calculation to the next FPGA accelerator card.
  • In one embodiment, sending, by the first target FPGA accelerator card according to its own configuration information, the intermediate result data and the calculation type information of the next calculation to the next FPGA accelerator card, so that the next FPGA accelerator card calculates the intermediate result data and obtains new intermediate result data, includes: converting the intermediate result data into data packets through the first target FPGA accelerator card, and adding the calculation type information of the next calculation to the last data packet of the intermediate result data according to its own configuration information; and sending the data packets to the next FPGA accelerator card, so that when the next FPGA accelerator card receives the last data packet, it generates a kernel call command according to the calculation type information in the last data packet and uses the kernel call command to call its own kernel to perform the corresponding calculation on the intermediate result data, obtaining new intermediate result data.
  • In one embodiment, returning the final result data to the first target FPGA accelerator card through the second target FPGA accelerator card includes: detecting, through the second target FPGA accelerator card, the interrupt signal sent to PCI-E after the kernel calculation is completed; and, when the interrupt signal is detected, sending the final result data to the first target FPGA accelerator card.
  • In addition, the present application discloses a data processing device applied to an FPGA cloud platform, including multiple FPGA accelerator cards participating in distributed computing and hosts respectively connected to the multiple FPGA accelerator cards, the multiple FPGA accelerator cards including a first target FPGA accelerator card and a second target FPGA accelerator card, wherein:
  • the first target FPGA accelerator card is configured to calculate the data to be processed and obtain intermediate result data when it obtains the calculation start command sent by the target host connected to it, and to send, according to its own configuration information, the intermediate result data and the calculation type information of the next calculation to the next FPGA accelerator card, so that the next FPGA accelerator card calculates the intermediate result data, obtains new intermediate result data, and sends the new intermediate result data and the calculation type information of the next calculation to the next FPGA accelerator card according to its own configuration information, until the last second target FPGA accelerator card participating in the calculation completes its calculation and the final result data is obtained;
  • the second target FPGA accelerator card is used to return the final result data to the first target FPGA accelerator card.
  • the first target FPGA accelerator card is used to send the final result data to the target host to complete the distributed calculation on the data to be processed.
  • In one embodiment, the target host is further configured to obtain the configuration information of all FPGA accelerator cards participating in the calculation, configure the configuration information corresponding to the first target FPGA accelerator card to the first target FPGA accelerator card, and communicate with the other hosts to send each of them its corresponding configuration information, so that the other hosts configure the corresponding configuration information to the FPGA accelerator cards connected to them;
  • wherein the configuration information of every FPGA accelerator card other than the second target FPGA accelerator card includes a preset address mapping relationship, the network address information of the next FPGA accelerator card participating in the calculation, and the calculation type information of the next calculation; the preset address mapping relationship is the mapping between the physical address range in which the intermediate result data is stored in the card's own memory and the physical address range in the memory of the next FPGA accelerator card participating in the calculation; and the configuration information of the second target FPGA accelerator card includes the network address information of the first target FPGA accelerator card, the physical address range in which the final result data is stored in its own memory, and the physical address in the memory of the target host at which the final result data is to be stored.
  • Further, an embodiment of the present application discloses one or more non-volatile computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to execute the steps of any one of the data processing methods described above.
  • Finally, an embodiment of the present application also discloses a computer device, including a memory and one or more processors, where computer-readable instructions are stored in the memory and, when executed by the one or more processors, cause the one or more processors to execute the steps of any one of the data processing methods described above.
  • FIG. 1 is a flowchart of a data processing method provided by the present application according to one or more embodiments
  • FIG. 2 is a schematic structural diagram of a specific FPGA cloud platform distributed computing host and accelerator card provided by the present application according to one or more embodiments;
  • FIG. 3 is a schematic structural diagram of a static area of an FPGA accelerator card provided by the present application according to one or more embodiments;
  • FIG. 4 is a schematic structural diagram of a specific FPGA accelerator card provided by the present application according to one or more embodiments
  • FIG. 5 is a schematic structural diagram of a specific FPGA accelerator card provided by the present application according to one or more embodiments
  • FIG. 6 is an implementation architecture diagram of a specific data processing solution provided by the present application according to one or more embodiments.
  • Fig. 7 is a schematic structural diagram of a data processing device provided by the present application according to one or more embodiments.
  • Fig. 8 is a schematic diagram of the internal structure of a computer device provided by the present application according to one or more embodiments.
  • Fig. 9 is a schematic diagram of an internal structure of a computer device according to one or more embodiments of the present application.
  • the embodiment of the present application discloses a data processing method, which is described by taking the method applied to computer equipment as an example, including:
  • Step S11 When the first target FPGA accelerator card obtains the calculation start command sent by the target host connected to itself, it calculates the data to be processed and obtains intermediate result data.
  • In a specific implementation, before the first target FPGA accelerator card obtains the calculation start command sent by the target host connected to it, the method further includes: acquiring, through the target host, the configuration information of all FPGA accelerator cards participating in the calculation, and configuring the configuration information corresponding to the first target FPGA accelerator card to the first target FPGA accelerator card; and communicating, through the target host, with the other hosts and sending each of them its corresponding configuration information, so that the other hosts configure the corresponding configuration information to the FPGA accelerator cards connected to them.
  • The configuration information of every FPGA accelerator card other than the second target FPGA accelerator card includes a preset address mapping relationship, the network address information of the next FPGA accelerator card participating in the calculation, and the calculation type information of the next calculation; the preset address mapping relationship is the mapping between the physical address range in which the intermediate result data is stored in the card's own memory and the physical address range in the memory of the next FPGA accelerator card participating in the calculation. The configuration information of the second target FPGA accelerator card includes the network address information of the first target FPGA accelerator card, the physical address range in which the final result data is stored in its own memory, and the physical address in the memory of the target host at which the final result data is to be stored.
  • Further, in a specific implementation, the embodiment of the present application may configure the configuration information corresponding to the first target FPGA accelerator card into the internal registers of the first target FPGA accelerator card, and the other hosts configure the corresponding configuration information into the internal registers of the FPGA accelerator cards connected to them.
  • the internal register is the internal register in the BSP (Board Support Package, board-level support package).
  • That is, before the calculation starts, each FPGA accelerator card participating in the distributed computing can be configured through the target host connected to the first FPGA accelerator card participating in the distributed computing.
  • In a specific implementation, the target host configures the configuration information corresponding to the first target FPGA accelerator card into the internal registers of the first target FPGA accelerator card through the PCI-E (peripheral component interconnect express, a high-speed serial computer expansion bus standard) bus, and communicates with the other hosts over the network, sending each of them its corresponding configuration information, so that the other hosts configure the corresponding configuration information into the FPGA accelerator cards connected to them through the PCI-E bus.
  • After the configuration is completed, the target host sends a calculation start command to the first target FPGA accelerator card.
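  • A minimal host-side sketch of this configuration and start sequence is given below, assuming the BSP registers are exposed through a memory-mapped PCI-E BAR (a common arrangement); the register offsets, the sysfs resource path and the start register are illustrative assumptions rather than details of the application, and error handling plus the network exchange with the other hosts are omitted.
      #include <fcntl.h>
      #include <stdint.h>
      #include <sys/mman.h>
      #include <unistd.h>

      /* Illustrative register offsets inside the BSP BAR (assumed, not from the application). */
      #define REG_INTERM_LOCAL_BASE  0x0000u
      #define REG_INTERM_NEXT_BASE   0x0008u
      #define REG_NEXT_CALC_TYPE     0x0010u
      #define REG_KERNEL_START       0x0100u

      /* Write one configuration word into the card's BSP registers over PCI-E. */
      static void bsp_write64(volatile uint8_t *bar, uint32_t off, uint64_t val)
      {
          *(volatile uint64_t *)(bar + off) = val;
      }

      int main(void)
      {
          /* Map BAR0 of the first target FPGA accelerator card (device path is an assumption). */
          int fd = open("/sys/bus/pci/devices/0000:3b:00.0/resource0", O_RDWR | O_SYNC);
          volatile uint8_t *bar = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

          /* 1. Configure the first card: where its intermediate results live locally, where
                they map to in the next card's memory, and the next calculation type. */
          bsp_write64(bar, REG_INTERM_LOCAL_BASE, 0x40000000ULL);
          bsp_write64(bar, REG_INTERM_NEXT_BASE,  0x40000000ULL);
          bsp_write64(bar, REG_NEXT_CALC_TYPE,    2 /* calculation type of step 2 */);

          /* 2. The configuration for the other cards would be sent to the other hosts over
                the network here (socket code omitted for brevity). */

          /* 3. After the configuration is completed, send the calculation start command. */
          bsp_write64(bar, REG_KERNEL_START, 1);

          munmap((void *)bar, 0x1000);
          close(fd);
          return 0;
      }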
  • the kernel of the first target FPGA accelerator card is called to calculate the data to be processed to obtain intermediate result data, so that the kernel writes the intermediate result data into the internal memory of the first target FPGA accelerator card.
  • the next FPGA accelerator card also calls its own kernel to calculate the data to be processed, and obtain the corresponding intermediate result data.
  • Step S12 Send the intermediate result data and the calculation type information of the next calculation to the next FPGA accelerator card through the first target FPGA accelerator card according to its own configuration information, so that the next FPGA accelerator card calculates the intermediate result data to obtain new intermediate result data and, according to its own configuration information, sends the new intermediate result data and the calculation type information of the next calculation to the next FPGA accelerator card, until the last second target FPGA accelerator card participating in the calculation completes its calculation and the final result data is obtained.
  • In a specific implementation, when the kernel writes data to the memory, the embodiment of the present application detects, according to the preset address mapping relationship, whether the current write address is within the physical address range in which the intermediate result data is stored in the card's own memory; if so, it triggers the step of sending, by the first target FPGA accelerator card according to its own configuration information, the intermediate result data and the calculation type information of the next calculation to the next FPGA accelerator card.
  • the intermediate result data is converted into a data packet through the first target FPGA accelerator card, and the calculation type information for the next calculation is added to the last data packet of the intermediate result data according to its own configuration information; the data packet is sent to the next FPGA acceleration card, so that when the next FPGA accelerator card receives the last data packet, it generates a kernel call command according to the calculation type information in the last data packet, and uses the kernel call command to call its own kernel to perform corresponding calculations on the intermediate result data, and obtains new intermediate result data.
  • In a specific implementation, the kernel writes the intermediate result data into the memory of the first target FPGA accelerator card, and the BSP in the first target FPGA accelerator card issues an RDMA (Remote Direct Memory Access) command to the MAC (Media Access Control) module of the card. According to the configuration information, the MAC module converts the intermediate result data in the local memory of the accelerator card into RDMA data packets and transmits them to the memory of the next accelerator card; the header of the last data packet of the intermediate result data carries the calculation type information of the next calculation. After the next accelerator card receives the last data packet of the intermediate result data, it calls its kernel to perform the corresponding calculation according to the calculation type information. While the kernel of the next accelerator card generates its own intermediate result data, the card automatically issues an RDMA write command to transmit the data to the accelerator card after it, and so on until the last accelerator card of the calculation. After the kernel of the last accelerator card completes its calculation, the calculation result is fed back to the memory of the target host according to the configuration information in the BSP.
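  • For illustration, the packet format implied by this description might be sketched as follows; the field names, the last-packet flag and the fixed payload size are assumptions introduced here for readability (header packing is also ignored), since the application does not specify an on-wire layout.
      #include <stdint.h>

      #define PKT_FLAG_LAST 0x1u        /* set on the last packet of the intermediate result data */

      /* Hypothetical header of one RDMA write packet exchanged between accelerator cards. */
      struct rdma_pkt_hdr {
          uint8_t  dst_mac[6];          /* network address of the receiving accelerator card       */
          uint8_t  src_mac[6];
          uint64_t dst_phys_addr;       /* physical address in the receiving card's memory         */
          uint32_t length;              /* payload bytes carried by this packet                    */
          uint32_t flags;               /* PKT_FLAG_LAST on the final packet                       */
          uint32_t next_calc_type;      /* valid only when PKT_FLAG_LAST is set: calculation type  */
      };                                /*   information for the next calculation                  */

      struct rdma_pkt {
          struct rdma_pkt_hdr hdr;
          uint8_t payload[4096];        /* fragment of intermediate result data (size assumed)     */
      };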
  • Step S13 Return the final result data to the first target FPGA accelerator card through the second target FPGA accelerator card.
  • the embodiment of the present application uses the second target FPGA accelerator card to detect the interrupt signal sent to PCIE after the kernel calculation is completed; when the interrupt signal is detected, the final result data is sent to the first target FPGA accelerator card.
  • Step S14 Send the final result data to the target host through the first target FPGA accelerator card, so as to complete the distributed calculation for the data to be processed.
  • That is, according to its configuration information, namely the network address information of the first target FPGA accelerator card, the physical address range in which the final result data is stored in its own memory and the physical address in the memory of the target host, the second target FPGA accelerator card sends the final result data to the target host.
  • Referring to FIG. 2, FIG. 2 is a schematic structural diagram of a specific FPGA cloud platform distributed computing host and accelerator card arrangement provided by an embodiment of the present application. Under the management of the cloud platform, a complex computing task is assigned to one or several FPGAs in the FPGA resource pool for acceleration.
  • the accelerator cards in the resource pool are connected to the server through PCI-E, and the data transmission between the accelerator cards is carried out through Ethernet.
  • 3 accelerator cards and 3 hosts are taken as an example, including host 1, FPGA accelerator card 1, host 2, FPGA accelerator card 2, host 3, FPGA accelerator card 3.
  • The FPGA accelerator card adopts a general architecture that supports OpenCL programming and is divided into two parts: the static area (BSP) and the computing unit (kernel). Referring to FIG. 3, FIG. 3 is a schematic structural diagram of the static area of an FPGA accelerator card provided by an embodiment of the present application.
  • the static area includes modules such as the PCI-E module connected to the host CPU unit, the network data processing module (MAC) connected to the network, and the memory controller (DDR_controller).
  • the host starts computing by calling the kernel through PCI-E, and obtains the completion information of the computing.
  • the host can send and receive information with other hosts on the network through the PCI-E and MAC modules, and can also initiate RDMA write commands to the MAC through PCI-E.
  • The MAC module converts the memory data of the local accelerator card into RDMA packets and transmits them to the memory of other accelerator cards on the Ethernet.
  • Kernel is a computing unit developed by users, which can be written in OpenCL (Open Computing Language) or traditional RTL (register transfer language) language. The Kernel can read and write the FPGA accelerator card memory through the memory controller in the BSP.
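  • As a purely illustrative example of such a user-developed computing unit, a trivial kernel written in OpenCL C (a C dialect) could look like the sketch below; the kernel name, its arguments and the element-wise operation are assumptions standing in for whatever the real calculation step is.
      /* Hypothetical step-1 kernel (single work-item style): reads its input from the
         card memory through the BSP memory controller and writes the intermediate
         result back to card memory. */
      __kernel void step1_scale(__global const float *input,
                                __global float       *intermediate,
                                const unsigned int    n)
      {
          for (unsigned int i = 0; i < n; i++) {
              /* Writes to `intermediate` land in the physical address range that the
                 BSP memory detection module is configured to watch. */
              intermediate[i] = input[i] * 2.0f;
          }
      }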
  • It should be pointed out that, in the prior art, a complex computing task is divided into two or more computing steps, each step is assigned to one FPGA accelerator card, and the final result is returned to the host after the cards have computed in sequence. Taking two computing steps as an example: the first host sends an instruction through PCI-E to make the first FPGA accelerator card start calculating; when the kernel finishes, it sends an interrupt signal to the first host through PCI-E; after obtaining the completion information of the first card, the first host sends an RDMA write command to the MAC through PCI-E to transfer the intermediate result data from the memory of the first accelerator card to the memory of the second accelerator card; after the first host confirms that the data transmission is completed, it notifies the second host to perform the next calculation step; the second host sends an instruction through PCI-E to make the second accelerator card start calculating; when that kernel finishes, it sends an interrupt signal to the second host through PCI-E, and the second host sends a message to notify the first host that the calculation is finished. It can be seen from this distributed computing process that the data transmission between the cards and the switching between calculation steps are all completed by software running on the hosts, so the delay is very large. The solution proposed in this application can greatly reduce the delay of distributed computing on the FPGA cloud platform without changing the computing unit (kernel).
  • the embodiment of the present application provides a specific structure diagram of an FPGA accelerator card.
  • The embodiment of the present application is realized through a memory detection module and a command merging module in the BSP.
  • The memory detection module is located between the kernel and the memory controller and transparently passes through the kernel's memory read and write operations. It contains a memory mapping table which records the mapping between the physical addresses at which the intermediate result data is stored in the memory of this card and the physical addresses in the memory of the next accelerator card, as well as the calculation type information of the next calculation and the network address information of the next accelerator card. When the kernel writes data into the accelerator card memory, the memory detection module compares the write address with the register settings describing the physical address range in which the intermediate result data of this card is stored; if the write address falls within that range, the data written by the kernel is judged to be intermediate result data, and the physical address in the memory of the next accelerator card and the network address of the next accelerator card are obtained by looking up the memory mapping table. The memory detection module then issues an RDMA write command to the MAC module, and the MAC module reads the intermediate result data from the memory of this card, assembles it into RDMA network data packets and sends them to the next accelerator card. When the memory detection module detects the last piece of intermediate result data written by the kernel, it issues to the MAC an RDMA write command carrying the calculation type of the next calculation, and the last intermediate result data packet sent by the MAC carries the calculation type information of the next calculation in its packet header.
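  • The behaviour just described can be summarised with the following C model of the memory detection module; it is a software sketch of hardware behaviour under stated assumptions, and the type and function names are introduced here rather than taken from the application.
      #include <stdbool.h>
      #include <stdint.h>
      #include <stdio.h>

      /* One entry of the memory mapping table held inside the memory detection module
         (names and layout are assumptions). */
      struct map_entry {
          uint64_t local_base, local_size;  /* range in this card's memory holding intermediate results */
          uint64_t remote_base;             /* mapped base address in the next card's memory            */
          uint8_t  next_card_mac[6];        /* network address of the next accelerator card             */
          uint32_t next_calc_type;          /* calculation type information of the next calculation     */
      };

      /* Stand-in for the MAC module: just trace the RDMA write command it would receive. */
      static void mac_rdma_write(const uint8_t dst_mac[6], uint64_t local_addr,
                                 uint64_t remote_addr, uint32_t len,
                                 bool last, uint32_t calc_type)
      {
          printf("RDMA write %u bytes: local 0x%llx -> remote 0x%llx%s calc_type=%u\n",
                 (unsigned)len, (unsigned long long)local_addr,
                 (unsigned long long)remote_addr, last ? " [last]" : "", (unsigned)calc_type);
          (void)dst_mac;
      }

      /* Invoked for every kernel write on its way to the memory controller; the write
         itself is passed through unchanged (not modelled here). */
      void memory_detect_on_kernel_write(const struct map_entry *m, uint64_t addr,
                                         uint32_t len, bool last_intermediate_word)
      {
          /* Compare the write address with the configured intermediate-result range. */
          if (addr < m->local_base || addr + len > m->local_base + m->local_size)
              return;                                   /* ordinary data: pass through only */

          /* Look up the physical address in the next card's memory and ask the MAC
             module to ship the data there as an RDMA write; the command for the last
             piece of data carries the calculation type of the next step. */
          uint64_t remote = m->remote_base + (addr - m->local_base);
          mac_rdma_write(m->next_card_mac, addr, remote, len, last_intermediate_word,
                         last_intermediate_word ? m->next_calc_type : 0);
      }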
  • The command merging module is located between the PCI-E bus and the kernel, and PCI-E bus operations are transparently passed through the command merging module to the kernel. The command merging module can parse the RDMA data packets received by the MAC module to determine whether the last packet of the intermediate result data has arrived and to obtain the calculation type of the next calculation. When the last packet of the intermediate result data arrives, the calculation type information it carries is converted into a PCI-E bus write-register command that calls the kernel to start calculating, and this command is sent to the kernel so that the kernel starts the calculation. The command merging module also detects the interrupt signal sent to PCI-E after the kernel calculation is completed; when the command merging module belongs to the last accelerator card in the calculation process and has been configured with the physical address in the target host memory where the calculation result is to be stored and with the network address information of the first target FPGA accelerator card, it converts the kernel-completion interrupt signal into an RDMA write command and sends it to the MAC module, and the MAC module sends the calculation result over the network to the memory of the first host.
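  • Under the same caveat, a C model of the command merging module is sketched below; it only mirrors the behaviour described in the text, and every name in it is an assumption.
      #include <stdbool.h>
      #include <stdint.h>
      #include <stdio.h>

      /* Configuration seen by the command merging module of the last card in the chain
         (names are assumptions). */
      struct merge_cfg {
          bool     is_last_card;
          uint32_t result_len;            /* size of the final result data                     */
          uint64_t result_local_addr;     /* where it sits in this card's memory               */
          uint64_t result_host_phys;      /* physical address in the target host's memory      */
          uint8_t  first_card_mac[6];     /* network address of the first target FPGA card     */
      };

      /* Stand-ins for the two interfaces the module drives. */
      static void kernel_start_reg_write(uint32_t calc_type)
      {
          printf("PCI-E write-register command: start kernel, calc_type=%u\n", (unsigned)calc_type);
      }
      static void mac_rdma_write_result(const struct merge_cfg *c)
      {
          printf("RDMA write: %u result bytes from 0x%llx to host phys 0x%llx via first card\n",
                 (unsigned)c->result_len, (unsigned long long)c->result_local_addr,
                 (unsigned long long)c->result_host_phys);
      }

      /* Called when the MAC module hands over a received RDMA packet. */
      void merge_on_rdma_packet(bool is_last_packet, uint32_t next_calc_type)
      {
          if (is_last_packet)                         /* last packet of the intermediate data: */
              kernel_start_reg_write(next_calc_type); /* start the local kernel                */
      }

      /* Called when the kernel raises its completion interrupt towards PCI-E. */
      void merge_on_kernel_interrupt(const struct merge_cfg *c)
      {
          if (c->is_last_card)            /* last card: return the final result over the network */
              mac_rdma_write_result(c);   /* instead of interrupting the local host              */
      }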
  • the multi-step distributed computing does not depend on the scheduling of the host software, and realizes the functions of automatic transmission of intermediate result data, automatic calculation of the next step, and automatic return of results.
  • the FPGA cloud platform can distribute complex large-scale calculations without greatly increasing the delay of calculations.
  • The data processing solution provided by the present application is described below by taking two-step distributed computing as an example. Referring to FIG. 5, FIG. 5 is a schematic structural diagram of a specific FPGA accelerator card provided by an embodiment of this application; the FPGA accelerator card used is the Inspur f10a accelerator card.
  • The FPGA on this accelerator card is an Intel Arria 10 device; the FPGA is connected to two 10G Ethernet optical ports and two 4 GB SDRAM memories, and the card is connected to the CPU of the server through PCI-E.
  • FIG. 6 is an implementation architecture diagram of a specific data processing solution provided by the embodiment of the present application.
  • the two calculation steps are respectively completed by two FPGA accelerator cards connected by the network.
  • Two FPGA accelerator cards are respectively connected to the host through PCI-E.
  • First, the first host sets the BSP registers of the first FPGA accelerator card through PCI-E, specifying the physical address range in which the intermediate result data produced by the first calculation step is stored in the memory of this accelerator card, the network address of the second host and the physical address range at which the intermediate result data is to be stored in the memory of the second FPGA accelerator card, and the calculation type information of the second calculation step.
  • The first host transmits the configuration information to the second host through the network, and the second host configures the BSP registers of the second FPGA accelerator card through PCI-E, specifying the network address of the first FPGA accelerator card and the physical addresses at which the final result data is stored in this card and in the memory of the first host.
  • The first host calls the kernel of the first FPGA accelerator card through PCI-E to start the calculation; the kernel writes the calculation result into the memory of the card; the memory detection module in the BSP detects the kernel's write operations to the card memory, judges that the write address is within the configured physical address range for storing the intermediate result data, obtains the physical address of the intermediate result data in the memory of the second FPGA accelerator card by looking up the mapping table, and sends an RDMA write command to the MAC module.
  • the MAC module forms an RDMA network data packet from the intermediate result data in the memory of the card and sends it to the MAC module of the second FPGA accelerator card, and the MAC module of the second FPGA accelerator card writes the intermediate result data in the RDMA data packet into the corresponding memory physical address in the second FPGA accelerator card.
  • When the memory detection module in the BSP of the first FPGA accelerator card detects the last piece of intermediate result data written by the kernel, it sends an RDMA write command carrying the calculation type information of the next step to the MAC module, and the MAC module sends out the last intermediate result data packet carrying the calculation type information of the next step.
  • After the last packet of the intermediate result data arrives at the MAC of the second FPGA accelerator card, the command merging module detects the arrival of the last packet, obtains the calculation type information of the next step, converts this information into a PCI-E bus write-register command and sends it to the kernel.
  • the second accelerator card kernel starts to calculate, and after the calculation is completed, the kernel sends an interrupt signal.
  • the command merging module converts the kernel calculation completion interrupt signal into an RDMA write command and sends it to the MAC module.
  • The MAC module converts the final result data into RDMA packets and sends them to the MAC module of the first FPGA accelerator card; the MAC module of the first FPGA accelerator card sends the final result data into the memory of the first host through PCI-E; the software on the first host polls the calculation result buffer area in the first host's memory to obtain the final result data, and the distributed calculation is completed.
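  • The final polling step on the first host can be as simple as the sketch below; the completion flag written after the result data is an assumption made here, since the application only states that the host software polls the result buffer in its memory.
      #include <stdint.h>
      #include <unistd.h>

      /* Result buffer that the first card writes into host memory over PCI-E; the
         trailing "done" flag is an illustrative assumption. */
      struct result_buf {
          uint8_t           data[4096];
          volatile uint32_t done;         /* written last, after the result data itself */
      };

      /* Poll the calculation result buffer until the final result data has arrived. */
      const uint8_t *wait_for_result(struct result_buf *buf)
      {
          while (!buf->done)
              usleep(100);                /* simple polling, as described in the text  */
          return buf->data;
      }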
  • It can be seen that, by configuring each FPGA accelerator card participating in the distributed computing, the embodiment of the present application realizes automatic transmission of the intermediate result data, automatic calculation by the accelerator cards corresponding to the intermediate calculation steps, and automatic return of the final result data, so that the host software does not participate in the distributed computing process; this reduces the calculation delay when multiple FPGA accelerator cards perform distributed computing and thereby improves computing efficiency.
  • Referring to FIG. 7, an embodiment of the present application provides a data processing device applied to an FPGA cloud platform, including multiple FPGA accelerator cards participating in distributed computing and hosts respectively connected to the multiple FPGA accelerator cards, the multiple FPGA accelerator cards including a first target FPGA accelerator card 11 and a second target FPGA accelerator card 12, wherein:
  • the first target FPGA accelerator card 11 is configured to calculate the data to be processed and obtain intermediate result data when it obtains the calculation start command sent by the target host connected to it, and to send, according to its own configuration information, the intermediate result data and the calculation type information of the next calculation to the next FPGA accelerator card, so that the next FPGA accelerator card calculates the intermediate result data, obtains new intermediate result data, and sends the new intermediate result data and the calculation type information of the next calculation to the next FPGA accelerator card according to its own configuration information, until the last second target FPGA accelerator card 12 participating in the calculation completes its calculation and the final result data is obtained;
  • the second target FPGA accelerator card 12 is used to return the final result data to the first target FPGA accelerator card 11;
  • the first target FPGA accelerator card 11 is used to send the final result data to the target host to complete the distributed calculation on the data to be processed.
  • It can be seen that, by configuring each FPGA accelerator card participating in the distributed computing, the embodiment of the present application realizes automatic transmission of the intermediate result data, automatic calculation by the accelerator cards corresponding to the intermediate calculation steps, and automatic return of the final result data, so that the host software does not participate in the distributed computing process; this reduces the calculation delay when multiple FPGA accelerator cards perform distributed computing and thereby improves computing efficiency.
  • In a specific implementation, the target host is further configured to obtain the configuration information of all FPGA accelerator cards participating in the calculation, configure the configuration information corresponding to the first target FPGA accelerator card to the first target FPGA accelerator card, and communicate with the other hosts to send each of them its corresponding configuration information, so that the other hosts configure the corresponding configuration information to the FPGA accelerator cards connected to them;
  • wherein the configuration information of every FPGA accelerator card other than the second target FPGA accelerator card includes a preset address mapping relationship, the network address information of the next FPGA accelerator card participating in the calculation, and the calculation type information of the next calculation; the preset address mapping relationship is the mapping between the physical address range in which the intermediate result data is stored in the card's own memory and the physical address range in the memory of the next FPGA accelerator card participating in the calculation; and the configuration information of the second target FPGA accelerator card includes the network address information of the first target FPGA accelerator card, the physical address range in which the final result data is stored in its own memory, and the physical address in the memory of the target host at which the final result data is to be stored.
  • Moreover, in a specific implementation, the target host configures the configuration information corresponding to the first target FPGA accelerator card into the internal registers of the first target FPGA accelerator card, and the other hosts configure the corresponding configuration information into the internal registers of the FPGA accelerator cards connected to them.
  • the first target FPGA accelerator card invokes its own kernel to calculate the data to be processed to obtain intermediate result data, so that the kernel writes the intermediate result data into the memory of the first target FPGA accelerator card.
  • Further, when the kernel writes data to the memory, the first target FPGA accelerator card detects, according to the preset address mapping relationship, whether the current write address is within the physical address range in which the intermediate result data is stored in its own memory; if so, it triggers the step of sending, by the first target FPGA accelerator card according to its own configuration information, the intermediate result data and the calculation type information of the next calculation to the next FPGA accelerator card.
  • the first target FPGA accelerator card converts the intermediate result data into a data packet, and adds the calculation type information of the next step in the last data packet of the intermediate result data according to its own configuration information; sends the data packet to the next FPGA Acceleration card, so that when the next FPGA accelerator card receives the last data packet, it generates a kernel call command according to the calculation type information in the last data packet, and uses the kernel call command to call its own kernel to perform corresponding calculations on the intermediate result data. Get new intermediate result data.
  • the second target FPGA accelerator card detects the interrupt signal sent to the PCIE after the kernel calculation is completed; when the interrupt signal is detected, the final result data is sent to the first target FPGA accelerator card.
  • Further, an embodiment of the present application also discloses a non-volatile computer-readable storage medium in which computer-readable instructions are stored; when the computer-readable instructions are executed by one or more processors, the steps of the data processing method in any one of the foregoing embodiments can be realized.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure may be as shown in FIG. 8 .
  • the computer device includes a processor, memory and a network interface connected by a system bus. Wherein, the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system and computer readable instructions.
  • the internal memory provides an environment for the execution of the operating system and computer readable instructions in the non-volatile storage medium.
  • the network interface of the computer device is used to communicate with an external terminal via a network connection. When the computer readable instructions are executed by the processor, a data processing method is realized.
  • a computer device is provided.
  • the computer device may be a terminal, and its internal structure may be as shown in FIG. 9 .
  • the computer device includes a processor, a memory, a network interface, a display screen and an input device connected through a system bus. Wherein, the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system and computer readable instructions.
  • the internal memory provides an environment for the execution of the operating system and computer readable instructions in the non-volatile storage medium.
  • the network interface of the computer device is used to communicate with an external terminal via a network connection. When the computer readable instructions are executed by the processor, a data processing method is realized.
  • the display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen
  • the input device of the computer device may be a touch layer covered on the display screen, or a button, a trackball or a touch pad provided on the casing of the computer device , and can also be an external keyboard, touchpad, or mouse.
  • Each embodiment in this specification is described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments can be referred to each other. As for the device disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, its description is relatively simple, and for relevant details reference may be made to the description of the method part.
  • Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Abstract

A data processing method, apparatus and medium, including: when a first target FPGA accelerator card obtains a calculation start command sent by a target host connected to it, calculating the data to be processed to obtain intermediate result data; sending, according to its own configuration information, the intermediate result data and the calculation type information of the next calculation to the next FPGA accelerator card; the next FPGA accelerator card calculating the intermediate result data to obtain new intermediate result data and sending the new intermediate result data and the calculation type information of the next calculation to the next FPGA accelerator card, until the last second target FPGA accelerator card participating in the calculation completes its calculation and the final result data is obtained; returning the final result data to the first target FPGA accelerator card through the second target FPGA accelerator card; and sending the final result data to the target host through the first target FPGA accelerator card, so as to complete the distributed calculation of the data to be processed.

Description

一种数据处理方法、装置及介质
相关申请的交叉引用
本申请要求于2021年11月26日提交中国专利局,申请号为202111425760.3,申请名称为“一种数据处理方法、装置及介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及FPGA云平台技术领域,特别涉及一种数据处理方法、装置及介质。
背景技术
随着FPGA(即Field Programmable Gate Array,现场可编程与门阵列)处理能力的不断增强,越来越多的数据中心开始使用FPGA进行加速,以提高计算能力和灵活性。为了管理这些数量和种类越来越多的FPGA加速卡,FPGA云平台应用而生,以期解决当前FPGA加速卡部署、维护和管理难的问题。
目前,发明人意识到,在云平台的管理下,由于单块FPGA加速卡逻辑资源有限,在复杂的计算任务通过一块FPGA加速卡无法实现时,需要将复杂的计算任务分为多个计算步骤,每个步骤分配给一块FPGA加速卡计算,多个FPGA加速卡按顺序计算完成后,返回主机最终结果。其中,多块FPGA加速卡间的数据传输和计算步骤之间的切换都由主机运行的软件完成,这样,多卡的分布式计算相对于单卡计算延迟会很大,计算效率低。
发明内容
第一方面,本申请公开了一种数据处理方法,包括:
在第一目标FPGA加速卡获取到与自身连接的目标主机发送的计算开始命令时,对待处理数据进行计算,得到中间结果数据;
通过第一目标FPGA加速卡根据自身的配置信息将中间结果数据以及下一步计算的计算类型信息发送至下一个FPGA加速卡,以便下一个FPGA加速卡对中间结果数据进行计算,得到新的中间结果数据,并根据自身的配置信息将新的中间结果数据以及下一 步计算的计算类型信息发送至下一个FPGA加速卡,直到最后一个参与计算的第二目标FPGA加速卡计算完成,得到最终结果数据;
通过第二目标FPGA加速卡将最终结果数据返回至第一目标FPGA加速卡;和
通过第一目标FPGA加速卡将最终结果数据发送至目标主机,以完成针对待处理数据的分布式计算。
在其中一个实施例中,在第一目标FPGA加速卡获取到与自身连接的目标主机发送的计算开始命令之前,还包括:
通过目标主机获取参与计算的全部FPGA加速卡的配置信息,并将第一目标FPGA加速卡对应的配置信息配置至第一目标FPGA加速卡;和
通过目标主机与其他主机通信,分别向其他主机发送其他主机各自对应的配置信息,以便其他主机将相应的配置信息配置至与自身连接的FPGA加速卡;
其中,全部FPGA加速卡中的非第二目标FPGA加速卡的配置信息均包括预设地址映射关系、下一个参与计算的FPGA加速卡的网络地址信息、下一步计算的计算类型信息,并且,预设地址映射关系为中间结果数据在自身的内存存储物理地址范围以及下一个参与计算的FPGA加速卡的内存存储物理地址范围之间的映射关系;第二目标FPGA加速卡的配置信息包括第一目标FPGA加速卡的网络地址信息,最终结果数据在自身的内存存储物理地址范围以及在目标主机的内存存储物理地址。
在其中一个实施例中,将第一目标FPGA加速卡对应的配置信息配置至第一目标FPGA加速卡,包括:
将第一目标FPGA加速卡对应的配置信息配置至第一目标FPGA加速卡的内部寄存器。
在其中一个实施例中,其他主机将相应的配置信息配置至与自身连接的FPGA加速卡,包括:
其他主机将相应的配置信息配置至与自身连接的FPGA加速卡的内部寄存器。
在其中一个实施例中,对待处理数据进行计算,得到中间结果数据,包括:
调用第一目标FPGA加速卡自身的kernel对待处理数据进行计算,得到中间结果数据,以便该kernel将中间结果数据写入第一目标FPGA加速卡的内存。
在其中一个实施例中,还包括:
在kernel向内存进行数据写入,且根据预设映射关系检测当前写入地址在中间结果数据在自身的内存存储物理地址范围内时,触发通过第一目标FPGA加速卡根据自身的 配置信息将中间结果数据以及下一步计算的计算类型信息发送至下一个FPGA加速卡的步骤。
在其中一个实施例中,通过第一目标FPGA加速卡根据自身的配置信息将中间结果数据以及下一步计算的计算类型信息发送至下一个FPGA加速卡,以便下一个FPGA加速卡对中间结果数据进行计算,得到新的中间结果数据,包括:
通过第一目标FPGA加速卡将中间结果数据转为数据包,并根据自身的配置信息在中间结果数据的最后一个数据包中添加下一步计算的计算类型信息;和
将数据包发送至下一个FPGA加速卡,以便下一个FPGA加速卡接收到最后一个数据包时,根据最后一个数据包中的计算类型信息生成kernel调用命令,并利用kernel调用命令调用自身的kernel对中间结果数据进行相应的计算,得到新的中间结果数据。
在其中一个实施例中,通过第二目标FPGA加速卡将最终结果数据返回至第一目标FPGA加速卡,包括:
通过第二目标FPGA加速卡检测kernel计算完成后发给PCIE的中断信号;和
在检测到中断信号时,将最终结果数据发送至第一目标FPGA加速卡。
第二方面,本申请公开了数据处理装置,应用于FPGA云平台,包括参与分布式计算的多个FPGA加速卡,以及分别与多个FPGA加速卡连接的主机,多个FPGA加速卡中包括第一目标FPGA加速卡、第二目标FPGA加速卡,其中,
第一目标FPGA加速卡,用于当获取到与自身连接的目标主机发送的计算开始命令,则对待处理数据进行计算,得到中间结果数据;根据自身的配置信息将中间结果数据以及下一步计算的计算类型信息发送至下一个FPGA加速卡,以便下一个FPGA加速卡对中间结果数据进行计算,得到新的中间结果数据,并根据自身的配置信息将新的中间结果数据以及下一步计算的计算类型信息发送至下一个FPGA加速卡,直到最后一个参与计算的第二目标FPGA加速卡计算完成,得到最终结果数据;
第二目标FPGA加速卡,用于将最终结果数据返回至第一目标FPGA加速卡;和
第一目标FPGA加速卡,用于将最终结果数据发送至目标主机,以完成针对待处理数据的分布式计算。
在其中一个实施例中,目标主机,还用于获取参与计算的全部FPGA加速卡的配置信息,并将第一目标FPGA加速卡对应的配置信息配置至第一目标FPGA加速卡;与其他主机通信,分别向其他主机发送其他主机各自对应的配置信息,以便其他主机将相应的配置信息配置至与自身连接的FPGA加速卡;
其中,全部FPGA加速卡中的非第二目标FPGA加速卡的配置信息均包括预设地址映射关系、下一个参与计算的FPGA加速卡的网络地址信息、下一步计算的计算类型信息,并且,预设地址映射关系为中间结果数据在自身的内存存储物理地址范围以及下一个参与计算的FPGA加速卡的内存存储物理地址范围之间的映射关系;第二目标FPGA加速卡的配置信息包括第一目标FPGA加速卡的网络地址信息,最终结果数据在自身的内存存储物理地址范围以及在目标主机的内存存储物理地址。
第三方面,本申请实施例公开了一个或多个存储有计算机可读指令的非易失性计算机可读存储介质,上述计算机可读指令被一个或多个处理器执行时,使得上述一个或多个处理器执行上述任意一项数据处理方法的步骤。
最后,本申请实施例还公开了一种计算机设备,包括存储器及一个或多个处理器,存储器中储存有计算机可读指令,上述计算机可读指令被上述一个或多个处理器执行时,使得上述一个或多个处理器执行上述任意一项数据处理方法的步骤。
本申请的一个或多个实施例的细节在下面的附图和描述中提出。本申请的其它特征和优点将从说明书、附图以及权利要求书变得明显。
附图说明
为了更清楚地说明本申请实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据提供的附图获得其他的附图。
图1为本申请根据一个或多个实施例提供的一种数据处理方法流程图;
图2为本申请根据一个或多个实施例提供的一种具体的FGPA云平台分布式计算主机和加速卡的结构示意图;
图3为本申请根据一个或多个实施例提供的一种FPGA加速卡静态区结构示意图;
图4为本申请根据一个或多个实施例提供的一种具体的FPGA加速卡结构示意图;
图5为本申请根据一个或多个实施例提供的一种具体的FPGA加速卡结构示意图;
图6为本申请根据一个或多个实施例提供的一种具体的数据处理方案实施架构图;
图7为本申请根据一个或多个实施例提供的一种数据处理装置结构示意图;
图8为本申请根据一个或多个实施例提供的一种计算机设备的内部结构示意图;
图9为本申请根据一个或多个实施例提供的一种计算机设备的内部结构示意图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
目前,在云平台的管理下,由于单块FPGA加速卡逻辑资源有限,在复杂的计算任务通过一块FPGA加速卡无法实现时,需要将复杂的计算任务分为多个计算步骤,每个步骤分配给一块FPGA加速卡计算,多个FPGA加速卡按顺序计算完成后,返回主机最终结果。其中,多块FPGA加速卡间的数据传输和计算步骤之间的切换都由主机运行的软件完成,这样,多卡的分布式计算相对于单卡计算延迟会很大,计算效率低。为此本申请实施例提供了一种数据处理方案,能够降低多块FPGA加速卡进行分布式计算时的计算延迟,从而提升计算效率。
参见图1所示,本申请实施例公开了一种数据处理方法,以该方法应用于计算机设备为例进行说明,包括:
步骤S11:当第一目标FPGA加速卡获取到与自身连接的目标主机发送的计算开始命令,则对待处理数据进行计算,得到中间结果数据。
在具体的实施方式中,在第一目标FPGA加速卡获取到与自身连接的目标主机发送的计算开始命令之前,还包括:通过目标主机获取参与计算的全部FPGA加速卡的配置信息,并将第一目标FPGA加速卡对应的配置信息配置至第一目标FPGA加速卡;通过目标主机与其他主机通信,分别向其他主机发送其他主机各自对应的配置信息,以便其他主机将相应的配置信息配置至与自身连接的FPGA加速卡;
其中,全部FPGA加速卡中的非第二目标FPGA加速卡的配置信息均包括预设地址映射关系、下一个参与计算的FPGA加速卡的网络地址信息、下一步计算的计算类型信息,并且,预设地址映射关系为中间结果数据在自身的内存存储物理地址范围以及下一个参与计算的FPGA加速卡的内存存储物理地址范围之间的映射关系;第二目标FPGA加速卡的配置信息包括第一目标FPGA加速卡的网络地址信息,最终结果数据在自身的内存存储物理地址范围以及在目标主机的内存存储物理地址。
进一步的,在具体的实施方式中,本申请实施例可以将第一目标FPGA加速卡对应的配置信息配置至第一目标FPGA加速卡的内部寄存器;其他主机将相应的配置信息配置至与自身连接的FPGA加速卡的内部寄存器。其中,内部寄存器为BSP(即Board Support Package,板级支持包)中的内部寄存器。
也即,本申请实施例在计算开始之前,可以通过第一个参与分布式计算FPGA加速卡连接的目标主机对各个参与分布式计算的FPGA加速卡进行配置,在具体的实施方式中,目标主机通过PCI-E(即peripheral component interconnect express,一种高速串行计算机扩展总线标准)总线将第一目标FPGA加速卡对应的配置信息配置至第一目标FPGA加速卡的内部寄存器,通过网络与其他主机通信,分别向其他主机发送其他主机各自对应的配置信息,以便其他主机通过PCI-E总线将相应的配置信息配置至与自身连接的FPGA加速卡。配置完成后,目标主机向第一目标FPGA加速卡发送开始计计算命令。
并且,本申请实施例中,调用第一目标FPGA加速卡自身的kernel对待处理数据进行计算,得到中间结果数据,以便该kernel将中间结果数据写入第一目标FPGA加速卡的内存。同理,下一FPGA加速卡同样调用自身的kernel对待处理数据进行计算,得到相应的中间结果数据。
步骤S12:通过第一目标FPGA加速卡根据自身的配置信息将中间结果数据以及下一步计算的计算类型信息发送至下一个FPGA加速卡,以便下一个FPGA加速卡对中间结果数据进行计算,得到新的中间结果数据,并根据自身的配置信息将新的中间结果数据以及下一步计算的计算类型信息发送至下一个FPGA加速卡,直到最后一个参与计算的第二目标FPGA加速卡计算完成,得到最终结果数据。
在具体的实施方式中,本申请实施例在kernel向内存进行数据写入时,根据预设映射关系检测当前写入地址是否在中间结果数据在自身的内存存储物理地址范围内;若是,则触发通过第一目标FPGA加速卡根据自身的配置信息将中间结果数据以及下一步计算的计算类型信息发送至下一个FPGA加速卡的步骤。
通过第一目标FPGA加速卡将中间结果数据转为数据包,并根据自身的配置信息在中间结果数据的最后一个数据包中添加下一步计算的计算类型信息;将数据包发送至下一个FPGA加速卡,以便下一个FPGA加速卡接收到最后一个数据包时,根据最后一个数据包中的计算类型信息生成kernel调用命令,并利用kernel调用命令调用自身的kernel对中间结果数据进行相应的计算,得到新的中间结果数据。
在具体的实施方式中,kernel将中间结果数据写入第一目标FPGA加速卡的内存,第一目标FPGA加速卡中的BSP向本卡的MAC(即Media Access Control,媒体介入控制层)模块发起RDMA(即Remote Direct Memory Access,远程直接数据存取)命令,MAC模块根据配置信息将加速卡本地内存中的中间结果数据转换成RDMA数据包传输到下一加速卡内存,在发送中间结果数据的最后一个数据包中,数据包头部带有下一步计算的计算类型信息,下一加速卡接收到中间结果数据的最后一个数据包后,根据计算 类型信息调用kernel进行相应的计算,下一加速卡kernel计算产生中间结果数据的同时,自动发起RDMA写命令传输给下一个加速卡,以此类推,直到计算的最后一个加速卡。最后一块加速卡kernel计算完成后将计算结果根据BSP中的配置信息,反馈给目标主机内存。
步骤S13:通过第二目标FPGA加速卡将最终结果数据返回至第一目标FPGA加速卡。
在具体的实施方式中,本申请实施例通过第二目标FPGA加速卡检测kernel计算完成后发给PCIE的中断信号;当检测到中断信号,则将最终结果数据发送至第一目标FPGA加速卡。
步骤S14:通过第一目标FPGA加速卡将最终结果数据发送至目标主机,以完成针对待处理数据的分布式计算。
也即,第二目标FPGA加速卡根据配置信息,即第一目标FPGA加速卡的网络地址信息,以及最终结果数据在自身的内存存储物理地址范围以及在目标主机的内存存储物理地址,将最终结果数据发送至目标主机。
参见图2所示,图2为本申请实施例提供的一种具体的FGPA云平台分布式计算主机和加速卡的结构示意图。在云平台的管理下,将复杂的计算任务分配给FPGA资源池中的某一个或者某几个FPGA中进行加速。资源池内的加速卡通过PCI-E与服务器连接,加速卡之间通过以太网进行数据传输。图2中以3个加速卡和3个主机为例,包括主机1、FPGA加速卡1,主机2、FPGA加速卡2,主机3、FPGA加速卡3。FPGA加速卡内部采用支持OpenCL编程的通用架构,分为静态区(BSP)和计算单元(kernel)两个部分。参见图3所示,图3为本申请实施例提供的一种FPGA加速卡静态区结构示意图。静态区包括与主机CPU单元连接的PCI-E模块、与网络连接的网络数据处理模块(MAC)、内存控制器(DDR_controller)等模块。主机通过PCI-E调用kernel开始计算,并获得计算完成信息。主机可以通过PCI-E和MAC模块与网络上的其他主机收发信息,也可以通过PCI-E向MAC发起RDMA写命令,MAC模块将本地加速卡内存数据转化为RDMA数据包传输给以太网上的其他加速卡内存。Kernel是由用户开发的计算单元,可以用OpenCL(即Open Computing Language,开放运算语言)编写,也可以用传统RTL(即register transfer language,寄存器传递语言)语言开发。Kernel可以通过BSP中的内存控制器读写FPGA加速卡内存。
需要指出的是,现有技术中,将复杂计算任务分为2个或多个计算步骤,每个步骤分配给一块FPGA加速卡计算,多个FPGA加速卡按顺序计算完成后,返回主机最终结果。 以2个计算步骤为例,第一主机通过PCI-E发送指令使第一FPGA加速卡开始计算,kernel计算完成通过PCI-E发送中断信号给第一主机,第一主机得到第一块FPGA加速卡计算完成信息后,通过PCI-E向MAC发送RDMA写命令,将第一加速卡内存中的中间结果数据传输给第二加速卡内存,第一主机确认数据传输完成后,通知第二主机进行下一步计算,第二主机通过PCI-E发送指令使第二加速卡开始计算,kernel计算完成通过PCI-E发送中断信号给第二主机,第二主机发送消息通知第一主机计算结束。从前述分布式计算过程可以看出,多块卡间数据传输和计算步骤之间切换都由主机运行的软件完成,延迟会很大。本申请提出的方案,在不改变计算单元(kernel)的前提下,可以大幅降低FPGA云平台分布式计算的延迟。
参见图4所示,本申请实施例提供了一种具体的FPGA加速卡结构示意图。本申请实施例通过BSP中的内存检测模块以及命令合并模块实现。
内存检测模块处于kernel和内存控制器之间,可以透传kernel读写内存操作。内部包含内存映射表,记录中间结果数据在本卡内存存储物理地址和下一加速卡存储物理地址的映射关系,以及下一步计算类型信息和下一加速卡网络地址信息。当kernel将数据写入加速卡内存时,内存检测模块将写地址和本卡中间结果数据的内存存储物理地址的寄存器设置对比,数据写地址属于中间结果数据在本卡的内存存储物理地址范围以内,则判定kernel写入的数据为中间结果数据;通过查内存映射表得到存入下一加速卡内存的物理地址和加速卡网络地址信息。内存检测模块向MAC模块发出RDMA写命令,MAC模块从本卡内存读取中间结果数据,组成RDMA网络数据包发送到下一加速卡。内存检测模块检测到kernel写入中间结果数据的最后一个数据时,向MAC发出带有下一步计算类型的RDMA写命令,MAC发出的最后一个中间结果数据包,数据包头部带有下一步计算类型信息。
命令合并模块处于PCI-E总线和kernel之间,PCI-E总线操作可以通过命令合并模块透传到kernel。命令合并模块可以解析MAC模块接收的RDMA数据包,得到中间结果数据的最后一包数据是否到来信息和下一步计算类型。当中间结果数据的最后一包数据到来时,将其中包含的计算类型信息转化为调用kernel开始计算的PCI-E总线写寄存器命令,发送给kernel,使kernel开始计算。命令合并模块会检测kernel计算完成后发给PCI-E的中断信号,当命令合并模块属于计算过程的最后一块加速卡,并被设置目标主机内存存储计算结果的物理地址和第一目标FPGA加速卡的网络地址信息时,将kernel计算完成的中断信号转换为RDMA写命令发给MAC模块,MAC模块将计算结果通过网络发送至第一主机的内存。
这样,在不改变FPGA加速卡计算单元设计的前提下,使多步骤分布式计算不依赖主机软件的调度,实现了自动传输中间结果数据和自动进行下一步计算以及自动返回结果的功能。在不增加开发工作量的情况下,使FPGA云平台可以分布式进行复杂的大型计算,而不大幅增加计算的延迟。
下面以两步分布式计算为例,阐述本申请提供的数据处理方案:
参见图5所示,图5为本申请实施例提供的一种具体的FPGA加速卡结构示意图,使用的FPGA加速卡为浪潮f10a加速卡。本加速卡的FPGA为intel的arria10器件,与FPGA连接的有两个10G以太网光口,以及两个4GB的SDRAM作为存储器,可以通过PCI-E连接服务器的CPU。
参见图6所示,图6为本申请实施例提供的一种具体的数据处理方案实施架构图。计算的两个步骤分别由网络连接的两个FPGA加速卡完成。两块FPGA加速卡分别通过PCI-E与主机连接。首先第一主机通过PCI-E设置第一FPGA加速卡的BSP寄存器,确定第一步计算产生的中间结果数据在本加速卡内存存储物理地址范围、第二主机网络地址和中间结果数据在第二主机内存中的物理地址范围,以及第二步计算类型信息。第一主机通过网络将配置信息传递给第二主机,第二主机通过PCI-E配置第二FPGA加速卡的BSP寄存器,确定第一FPGA加速卡网络地址和最终结果数据在本卡以及第一主机内存中的存储物理地址。第一主机通过PCI-E调用第一FPGA加速卡的kernel开始计算,kernel将计算结果写入本卡内存,BSP中的内存检测模块检测kernel写本卡内存操作,并判断出写地址在设置的中间结果数据的存储物理地址范围之内,通过查表得到中间结果数据在第二FPGA加速卡的内存物理地址,向MAC模块发送RDMA写命令。MAC模块根据RDMA写命令,将本卡内存中的中间结果数据组成RDMA网络数据包发送到第二FPGA加速卡的MAC模块,第二FPGA加速卡的MAC模块将RDMA数据包中的中间结果数据写入第二FPGA加速卡中相应的内存物理地址中。当第一FPGA加速卡的BSP中的内存检测模块检测kernel写入中间结果数据的最后一个数据时,向MAC模块发送带有下一步计算类型信息的RDMA写命令,MAC模块发出带有下一步计算类型信息的最后一个中间结果数据包。当中间结果数据的最后一包到达第二FPGA加速卡MAC后,命令合并模块检测到中间结果最后一包到达并且得到下一步计算类型信息,将此信息转化为PCI-E总线写寄存器命令发送给kernel。第二块加速卡kernel开始计算,计算完成后,kernel发出中断信号。命令合并模块将kernel计算完成中断信号,转换为RDMA写命令发送给MAC模块。MAC模块将最终结果数据转化为RDMA数据包发送给第一FPGA加速卡的MAC模块,第一FPGA加速卡的MAC模块通过PCI-E将最终结果数据发送至第一主机内存中,第一主机软件轮询第一主机内存的计算结 果缓存区,得到最终结果数据,分布式计算完成。
可见,本申请实施例通过对参与分布式计算的各个FPGA加速卡进行配置,实现中间结果数据的自动传输、以及中间计算步骤对应的加速卡的自动计算以及最终结果数据的自动返回,避免了主机软件参与分布式计算过程,能够降低多块FPGA加速卡进行分布式计算时的计算延迟,从而提升计算效率。
参见图7所示,本申请实施例提供了一种数据处理装置,应用于FPGA云平台,包括参与分布式计算的多个FPGA加速卡,以及分别与多个FPGA加速卡连接的主机,多个FPGA加速卡中包括第一目标FPGA加速卡11、第二目标FPGA加速卡12,其中,
第一目标FPGA加速卡11,用于当获取到与自身连接的目标主机发送的计算开始命令,则对待处理数据进行计算,得到中间结果数据;根据自身的配置信息将中间结果数据以及下一步计算的计算类型信息发送至下一个FPGA加速卡,以便下一个FPGA加速卡对中间结果数据进行计算,得到新的中间结果数据,并根据自身的配置信息将新的中间结果数据以及下一步计算的计算类型信息发送至下一个FPGA加速卡,直到最后一个参与计算的第二目标FPGA加速卡12计算完成,得到最终结果数据;
第二目标FPGA加速卡12,用于将最终结果数据返回至第一目标FPGA加速卡11;
第一目标FPGA加速卡11,用于将最终结果数据发送至目标主机,以完成针对待处理数据的分布式计算。
可见,本申请实施例通过对参与分布式计算的各个FPGA加速卡进行配置,实现中间结果数据的自动传输、以及中间计算步骤对应的加速卡的自动计算以及最终结果数据的自动返回,避免了主机软件参与分布式计算过程,能够降低多块FPGA加速卡进行分布式计算时的计算延迟,从而提升计算效率。
在具体的实施方式中,目标主机,还用于获取参与计算的全部FPGA加速卡的配置信息,并将第一目标FPGA加速卡对应的配置信息配置至第一目标FPGA加速卡;与其他主机通信,分别向其他主机发送其他主机各自对应的配置信息,以便其他主机将相应的配置信息配置至与自身连接的FPGA加速卡;
其中,全部FPGA加速卡中的非第二目标FPGA加速卡的配置信息均包括预设地址映射关系、下一个参与计算的FPGA加速卡的网络地址信息、下一步计算的计算类型信息,并且,预设地址映射关系为中间结果数据在自身的内存存储物理地址范围以及下一个参与计算的FPGA加速卡的内存存储物理地址范围之间的映射关系;第二目标FPGA加速卡的配置信息包括第一目标FPGA加速卡的网络地址信息,最终结果数据在自身的内存存储物理地址范围以及在目标主机的内存存储物理地址。
并且,在具体的实施方式中,目标主机将第一目标FPGA加速卡对应的配置信息配置至第一目标FPGA加速卡的内部寄存器;其他主机将相应的配置信息配置至与自身连接的FPGA加速卡的内部寄存器。
第一目标FPGA加速卡调用自身的kernel对待处理数据进行计算,得到中间结果数据,以便该kernel将中间结果数据写入第一目标FPGA加速卡的内存。
进一步的,在kernel向内存进行数据写入时,第一目标FPGA加速卡根据预设映射关系检测当前写入地址是否在中间结果数据在自身的内存存储物理地址范围内;若是,则触发通过第一目标FPGA加速卡根据自身的配置信息将中间结果数据以及下一步计算的计算类型信息发送至下一个FPGA加速卡的步骤。
并且,第一目标FPGA加速卡将中间结果数据转为数据包,并根据自身的配置信息在中间结果数据的最后一个数据包中添加下一步计算的计算类型信息;将数据包发送至下一个FPGA加速卡,以便下一个FPGA加速卡接收到最后一个数据包时,根据最后一个数据包中的计算类型信息生成kernel调用命令,并利用kernel调用命令调用自身的kernel对中间结果数据进行相应的计算,得到新的中间结果数据。
第二目标FPGA加速卡检测kernel计算完成后发给PCIE的中断信号;当检测到中断信号,则将最终结果数据发送至第一目标FPGA加速卡。
进一步的,本申请实施例还公开了一种非易失性计算机可读存储介质,该非易失性计算机可读存储介质中存储有计算机可读指令,该计算机可读指令被一个或多个处理器执行时可实现上述任意一个实施例的数据处理方法的步骤。
关于上述数据处理方法的具体过程可以参考前述实施例中公开的相应内容,在此不再进行赘述。
在一个实施例中,提供了一种计算机设备,该计算机设备可以是服务器,其内部结构图可以如图8所示。该计算机设备包括通过系统总线连接的处理器、存储器和网络接口。其中,该计算机设备的处理器用于提供计算和控制能力。该计算机设备的存储器包括非易失性存储介质、内存储器。该非易失性存储介质存储有操作系统和计算机可读指令。该内存储器为非易失性存储介质中的操作系统和计算机可读指令的运行提供环境。该计算机设备的网络接口用于与外部的终端通过网络连接通信。该计算机可读指令被处理器执行时以实现一种数据处理方法。
在一个实施例中,提供了一种计算机设备,该计算机设备可以是终端,其内部结构图可以如图9所示。该计算机设备包括通过系统总线连接的处理器、存储器、网络接口、显示屏和输入装置。其中,该计算机设备的处理器用于提供计算和控制能力。该计算机 设备的存储器包括非易失性存储介质、内存储器。该非易失性存储介质存储有操作系统和计算机可读指令。该内存储器为非易失性存储介质中的操作系统和计算机可读指令的运行提供环境。该计算机设备的网络接口用于与外部的终端通过网络连接通信。该计算机可读指令被处理器执行时以实现一种数据处理方法。该计算机设备的显示屏可以是液晶显示屏或者电子墨水显示屏,该计算机设备的输入装置可以是显示屏上覆盖的触摸层,也可以是计算机设备外壳上设置的按键、轨迹球或触控板,还可以是外接的键盘、触控板或鼠标等。
本说明书中各个实施例采用递进的方式描述,每个实施例重点说明的都是与其它实施例的不同之处,各个实施例之间相同或相似部分互相参见即可。对于实施例公开的装置而言,由于其与实施例公开的方法相对应,所以描述的比较简单,相关之处参见方法部分说明即可。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机可读指令来指令相关的硬件来完成,上述的计算机可读指令可存储于一非易失性计算机可读取存储介质中,该计算机可读指令在执行时,可包括如上述各方法的实施例的流程。其中,本申请所提供的各实施例中所使用的对存储器、存储、数据库或其它介质的任何引用,均可包括非易失性和/或易失性存储器。非易失性存储器可包括只读存储器(ROM)、可编程ROM(PROM)、电可编程ROM(EPROM)、电可擦除可编程ROM(EEPROM)或闪存。易失性存储器可包括随机存取存储器(RAM)或者外部高速缓冲存储器。作为说明而非局限,RAM以多种形式可得,诸如静态RAM(SRAM)、动态RAM(DRAM)、同步DRAM(SDRAM)、双数据率SDRAM(DDRSDRAM)、增强型SDRAM(ESDRAM)、同步链路(Synchlink)DRAM(SLDRAM)、存储器总线(Rambus)直接RAM(RDRAM)、直接存储器总线动态RAM(DRDRAM)、以及存储器总线动态RAM(RDRAM)等。
以上实施例的各技术特征可以进行任意的组合,为使描述简洁,未对上述实施例中的各个技术特征所有可能的组合都进行描述,然而,只要这些技术特征的组合不存在矛盾,都应当认为是本说明书记载的范围。
以上上述实施例仅表达了本申请的几种实施方式,其描述较为具体和详细,但并不能因此而理解为对发明专利范围的限制。应当指出的是,对于本领域的普通技术人员来说,在不脱离本申请构思的前提下,还可以做出若干变形和改进,这些都属于本申请的保护范围。因此,本申请专利的保护范围应以所附权利要求为准。

Claims (11)

  1. 一种数据处理方法,其特征在于,应用于FPGA云平台,包括:
    在第一目标FPGA加速卡获取到与自身连接的目标主机发送的计算开始命令时,对待处理数据进行计算,得到中间结果数据;
    通过所述第一目标FPGA加速卡根据自身的配置信息将所述中间结果数据以及下一步计算的计算类型信息发送至下一个FPGA加速卡,以便下一个FPGA加速卡对所述中间结果数据进行计算,得到新的中间结果数据,并根据自身的配置信息将所述新的中间结果数据以及下一步计算的计算类型信息发送至下一个FPGA加速卡,直到最后一个参与计算的第二目标FPGA加速卡计算完成,得到最终结果数据;
    通过所述第二目标FPGA加速卡将所述最终结果数据返回至所述第一目标FPGA加速卡;和
    通过第一目标FPGA加速卡将所述最终结果数据发送至所述目标主机,以完成针对所述待处理数据的分布式计算。
  2. 根据权利要求1所述的数据处理方法,其特征在于,所述在第一目标FPGA加速卡获取到与自身连接的目标主机发送的计算开始命令时之前,还包括:
    通过所述目标主机获取参与计算的全部FPGA加速卡的配置信息,并将所述第一目标FPGA加速卡对应的配置信息配置至所述第一目标FPGA加速卡;和
    通过所述目标主机与其他主机通信,分别向所述其他主机发送所述其他主机各自对应的配置信息,以便所述其他主机将相应的配置信息配置至与自身连接的FPGA加速卡;
    其中,所述全部FPGA加速卡中的非第二目标FPGA加速卡的配置信息均包括预设地址映射关系、下一个参与计算的FPGA加速卡的网络地址信息、下一步计算的计算类型信息,并且,所述预设地址映射关系为中间结果数据在自身的内存存储物理地址范围以及下一个参与计算的FPGA加速卡的内存存储物理地址范围之间的映射关系;所述第二目标FPGA加速卡的配置信息包括所述第一目标FPGA加速卡的网络地址信息,最终结果数据在自身的内存存储物理地址范围以及在所述目标主机的内存存储物理地址。
  3. 根据权利要求2所述的数据处理方法,其特征在于,所述将所述第一目标FPGA加速卡对应的配置信息配置至所述第一目标FPGA加速卡,包括:
    将所述第一目标FPGA加速卡对应的配置信息配置至所述第一目标FPGA加速卡的内部寄存器;
    所述其他主机将相应的配置信息配置至与自身连接的FPGA加速卡,包括:
    所述其他主机将相应的配置信息配置至与自身连接的FPGA加速卡的内部寄存器。
  4. 根据权利要求2所述的数据处理方法,其特征在于,所述对待处理数据进行计算,得到中间结果数据,包括:
    调用所述第一目标FPGA加速卡自身的kernel对待处理数据进行计算,得到中间结果数据,以便该kernel将所述中间结果数据写入所述第一目标FPGA加速卡的内存。
  5. 根据权利要求4所述的数据处理方法,其特征在于,还包括:
    在kernel向所述内存进行数据写入,且根据所述预设映射关系检测当前写入地址在所述中间结果数据在自身的内存存储物理地址范围内时,触发所述通过所述第一目标FPGA加速卡根据自身的配置信息将所述中间结果数据以及下一步计算的计算类型信息发送至下一个FPGA加速卡的步骤。
  6. 根据权利要求1至5任一项所述的数据处理方法,其特征在于,所述通过所述第一目标FPGA加速卡根据自身的配置信息将所述中间结果数据以及下一步计算的计算类型信息发送至下一个FPGA加速卡,以便下一个FPGA加速卡对所述中间结果数据进行计算,得到新的中间结果数据,包括:
    通过所述第一目标FPGA加速卡将所述中间结果数据转为数据包,并根据自身的配置信息在所述中间结果数据的最后一个数据包中添加下一步计算的计算类型信息;和
    将所述数据包发送至下一个FPGA加速卡,以便下一个FPGA加速卡接收到最后一个数据包时,根据最后一个数据包中的计算类型信息生成kernel调用命令,并利用所述kernel调用命令调用自身的kernel对所述中间结果数据进行相应的计算,得到新的中间结果数据。
  7. 根据权利要求1至6任一项所述的数据处理方法,其特征在于,所述通过所述第二目标FPGA加速卡将所述最终结果数据返回至所述第一目标FPGA加速卡,包括:
    通过所述第二目标FPGA加速卡检测kernel计算完成后发给PCIE的中断信号;和
    在检测到所述中断信号时,将所述最终结果数据发送至所述第一目标FPGA加速卡。
  8. 一种数据处理装置,其特征在于,应用于FPGA云平台,包括参与分布式计算的多个FPGA加速卡,以及分别与所述多个FPGA加速卡连接的主机,多个FPGA加速卡中包括第一目标FPGA加速卡、第二目标FPGA加速卡,其中,
    所述第一目标FPGA加速卡,用于当获取到与自身连接的目标主机发送的计算开始命令,则对待处理数据进行计算,得到中间结果数据;根据自身的配置信息将所述中间结果数据以及下一步计算的计算类型信息发送至下一个FPGA加速卡,以便下一个FPGA 加速卡对所述中间结果数据进行计算,得到新的中间结果数据,并根据自身的配置信息将所述新的中间结果数据以及下一步计算的计算类型信息发送至下一个FPGA加速卡,直到最后一个参与计算的第二目标FPGA加速卡计算完成,得到最终结果数据;
    所述第二目标FPGA加速卡,用于将所述最终结果数据返回至所述第一目标FPGA加速卡;和
    所述第一目标FPGA加速卡,用于将所述最终结果数据发送至所述目标主机,以完成针对所述待处理数据的分布式计算。
  9. 根据权利要求8所述的数据处理装置,其特征在于,
    所述目标主机,还用于获取参与计算的全部FPGA加速卡的配置信息,并将所述第一目标FPGA加速卡对应的配置信息配置至所述第一目标FPGA加速卡;与其他主机通信,分别向所述其他主机发送所述其他主机各自对应的配置信息,以便所述其他主机将相应的配置信息配置至与自身连接的FPGA加速卡;
    其中,所述全部FPGA加速卡中的非第二目标FPGA加速卡的配置信息均包括预设地址映射关系、下一个参与计算的FPGA加速卡的网络地址信息、下一步计算的计算类型信息,并且,所述预设地址映射关系为中间结果数据在自身的内存存储物理地址范围以及下一个参与计算的FPGA加速卡的内存存储物理地址范围之间的映射关系;所述第二目标FPGA加速卡的配置信息包括所述第一目标FPGA加速卡的网络地址信息,最终结果数据在自身的内存存储物理地址范围以及在所述目标主机的内存存储物理地址。
  10. 一个或多个存储有计算机可读指令的非易失性计算机可读存储介质,其特征在于,所述计算机可读指令被一个或多个处理器执行时,使得所述一个或多个处理器执行如权利要求1-7任意一项所述的方法的步骤。
  11. 一种计算机设备,其特征在于,包括存储器及一个或多个处理器,所述存储器中储存有计算机可读指令,所述计算机可读指令被所述一个或多个处理器执行时,使得所述一个或多个处理器执行如权利要求1-7任意一项所述的方法的步骤。
PCT/CN2022/102531 2021-11-26 2022-06-29 一种数据处理方法、装置及介质 WO2023093043A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111425760.3 2021-11-26
CN202111425760.3A CN114138481A (zh) 2021-11-26 2021-11-26 一种数据处理方法、装置及介质

Publications (1)

Publication Number Publication Date
WO2023093043A1 true WO2023093043A1 (zh) 2023-06-01

Family

ID=80388853

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/102531 WO2023093043A1 (zh) 2021-11-26 2022-06-29 一种数据处理方法、装置及介质

Country Status (2)

Country Link
CN (1) CN114138481A (zh)
WO (1) WO2023093043A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114138481A (zh) * 2021-11-26 2022-03-04 浪潮电子信息产业股份有限公司 一种数据处理方法、装置及介质
CN114513545B (zh) * 2022-04-19 2022-07-12 苏州浪潮智能科技有限公司 请求处理方法、装置、设备及介质

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190245911A1 (en) * 2017-01-08 2019-08-08 International Business Machines Corporation Address space management with respect to a coherent accelerator processor interface architecture
CN108776649A (zh) * 2018-06-11 2018-11-09 山东超越数控电子股份有限公司 一种基于cpu+fpga异构计算系统及其加速方法
CN111090611A (zh) * 2018-10-24 2020-05-01 上海雪湖信息科技有限公司 一种基于fpga的小型异构分布式计算系统
CN110069441A (zh) * 2019-03-21 2019-07-30 中国科学院计算技术研究所 一种用于流计算的fpga网络及流计算系统与方法
CN109976912A (zh) * 2019-03-27 2019-07-05 湖南理工学院 一种基于fpga的分布式计算的实现方法及系统
CN111324558A (zh) * 2020-02-05 2020-06-23 苏州浪潮智能科技有限公司 数据处理方法、装置、分布式数据流编程框架及相关组件
CN111736966A (zh) * 2020-05-11 2020-10-02 深圳先进技术研究院 基于多板fpga异构系统的任务部署方法及设备
CN111625368A (zh) * 2020-05-22 2020-09-04 中国科学院空天信息创新研究院 一种分布式计算系统、方法及电子设备
CN112187966A (zh) * 2020-09-17 2021-01-05 浪潮(北京)电子信息产业有限公司 一种加速卡及其mac地址生成方法、装置和存储介质
CN112087471A (zh) * 2020-09-27 2020-12-15 山东云海国创云计算装备产业创新中心有限公司 一种数据传输方法及fpga云平台
CN114138481A (zh) * 2021-11-26 2022-03-04 浪潮电子信息产业股份有限公司 一种数据处理方法、装置及介质

Also Published As

Publication number Publication date
CN114138481A (zh) 2022-03-04

Similar Documents

Publication Publication Date Title
WO2023093043A1 (zh) 一种数据处理方法、装置及介质
EP3706394B1 (en) Writes to multiple memory destinations
US11025544B2 (en) Network interface for data transport in heterogeneous computing environments
US11481346B2 (en) Method and apparatus for implementing data transmission, electronic device, and computer-readable storage medium
US9485310B1 (en) Multi-core storage processor assigning other cores to process requests of core-affined streams
WO2022156370A1 (zh) 一种基于fpga的dma设备及dma数据搬移方法
EP3358463A1 (en) Method, device and system for implementing hardware acceleration processing
US10346342B1 (en) Uniform memory access architecture
WO2019233322A1 (zh) 资源池的管理方法、装置、资源池控制单元和通信设备
WO2023103296A1 (zh) 一种写数据高速缓存的方法、系统、设备和存储介质
US11403250B2 (en) Operation accelerator, switch, task scheduling method, and processing system
WO2018076882A1 (zh) 存储设备的操作方法及物理服务器
US20150268985A1 (en) Low Latency Data Delivery
Shim et al. Design and implementation of initial OpenSHMEM on PCIe NTB based cloud computing
US20230342086A1 (en) Data processing apparatus and method, and related device
US10397140B2 (en) Multi-processor computing systems
US8176304B2 (en) Mechanism for performing function level reset in an I/O device
KR20150090621A (ko) 스토리지 장치 및 데이터 처리 방법
WO2022199357A1 (zh) 数据处理方法及装置、电子设备、计算机可读存储介质
US20230153174A1 (en) Device selection for workload execution
EP4227789A1 (en) Method for order-preserving execution of write requests and network device
US10938875B2 (en) Multi-processor/endpoint data duplicating system
WO2024037239A1 (zh) 一种加速器调度方法及相关装置
WO2022193108A1 (zh) 一种集成芯片及数据搬运方法
CN116383127B (zh) 节点间通信方法、装置、电子设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22897139

Country of ref document: EP

Kind code of ref document: A1