WO2023231330A1 - Data processing method, apparatus, device and medium for a pooling platform - Google Patents
- Publication number
- WO2023231330A1 (PCT application PCT/CN2022/134802)
- Authority
- WO (WIPO, PCT)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
Definitions
- This application relates to the field of distributed application technology, and in particular to a data processing method, device, equipment and non-volatile readable storage medium for a pooled platform.
- FPGA: Field Programmable Gate Array
- In the traditional approach, an FPGA accelerator card can be deployed in one of two forms: as a co-processor of the host server, or as an FPGA BOX (Field Programmable Gate Array Box), i.e., machine-card decoupling, where there is no server and the accelerator card exists as an independent acceleration unit. FPGA accelerator cards exchange data through transmission protocols.
- the acceleration of the FPGA pooling platform involves two aspects, including acceleration within the FPGA accelerator card and acceleration of data transmission between FPGA accelerator cards.
- the FPGA accelerator card logic consists of three parts: the Kernel accelerated computing unit, which can be dynamically reconfigured according to the application; the Memory unit, used to store data; and the PCIe (PCI Express physical-layer) interface or MAC (Media Access Control layer) interface, used to connect to peripherals.
- the data acceleration process of the FPGA pooling platform includes: the application data to be accelerated is transferred from the host server to the Memory unit of the FPGA accelerator card through the PCIe interface; the host configures the Kernel accelerated computing unit to fetch data from the Memory unit for accelerated computing; the host or the Kernel accelerated computing unit configures the DMA IP (Direct Memory Access soft core), and the calculation results are transmitted back to the host through the PCIe interface or to other FPGA accelerator cards on the pooling platform through the MAC interface.
- DMA IP: Direct Memory Access Intellectual Property, a direct memory access soft core
- RDMA: Remote Direct Memory Access
- the acceleration method of the FPGA pooling platform separates calculation and transmission.
- the host or remote Kernel accelerated computing unit configures the local Kernel accelerated computing unit to accelerate the calculation, and then the local Kernel or host configures the RDMA IP to initiate RDMA data movement.
- multiple configuration processes are required, which increases the overall processing delay and weakens the acceleration advantages of the FPGA pool platform.
- the purpose of the embodiments of this application is to provide a data processing method, device, equipment and non-volatile readable storage medium for a pooling platform, which can reduce the processing delay of the pooling platform.
- embodiments of the present application provide a data processing method for a pooling platform, including: adding configuration information to the custom fields of the transmission protocol based on application acceleration requirements, wherein the configuration information includes the operation identifier, address information and calculation information that match the application acceleration requirements;
- the application data is processed according to the operation identifier and calculation information, and the processed application data is transmitted to the board pointed to by the address information; the operation ends when processing of the application data on the different boards in the pooling platform is completed.
- when the application acceleration requirement corresponds to multiple Kernel modules on the FPGA board, and at least one Kernel module corresponds to multi-instruction calculation, the calculation information includes operation sequence instructions and instruction addresses, where each instruction address points to an instruction required by the application acceleration requirement;
- Processing of application data based on operation identification and calculation information includes:
- Kernel modules follow the operation sequence instructions and sequentially call the instructions pointed to by the instruction addresses to process the application data.
- the calculation information includes the instructions required for the application acceleration requirement
- Processing of application data based on operation identification and calculation information includes:
- Kernel modules process application data according to their corresponding instructions.
- the calculation information when the application acceleration requirement corresponds to a Kernel module for internal calculation on the FPGA board, the calculation information includes an instruction address; wherein the instruction address points to the internal calculation instruction required by the application acceleration requirement;
- Processing of application data based on operation identification and calculation information includes:
- the Kernel module calls internal computing instructions to process application data based on the instruction address.
- the address information includes the target board ID, a read-write identifier determined based on the calculation information and the remote direct data access operation identifier, and the transfer length of the remote direct data access operation.
- the address information includes the target board ID.
- the configuration information also includes a packet sequence number
- packet loss prompt information carrying the missing sequence number is fed back to the host server.
- Embodiments of the present application also provide a data processing device for a pooled platform, including an adding unit, a receiving unit, a processing unit and a transmission unit;
- the adding unit is used to add configuration information to the custom fields of the transmission protocol based on application acceleration requirements; wherein the configuration information includes the operation identifier, address information and calculation information that match the application acceleration requirements;
- the receiving unit is used to receive application data transmitted by the host server
- Processing unit used to process application data based on operation identification and calculation information
- the transmission unit is used to transmit the processed application data to the board pointed to by the address information; the operation ends when processing of the application data on the different boards in the pooling platform is completed.
- when the application acceleration requirement corresponds to multiple Kernel modules on the FPGA board, and at least one Kernel module corresponds to multi-instruction calculation, the calculation information includes operation sequence instructions and instruction addresses, where each instruction address points to an instruction required by the application acceleration requirement;
- the processing unit is used to sequentially call the instructions pointed to by the instruction addresses to process the application data according to the operation sequence instructions of multiple Kernel modules.
- the calculation information includes the instructions required for the application acceleration requirement
- a processing unit is used to process application data by multiple Kernel modules according to their corresponding instructions.
- the calculation information when the application acceleration requirement corresponds to a Kernel module for internal calculation on the FPGA board, the calculation information includes an instruction address; wherein the instruction address points to the internal calculation instruction required by the application acceleration requirement;
- the processing unit is used to call the Kernel module to execute the internal computing instructions according to the instruction address to process the application data.
- the address information includes the target board ID, a read-write identifier determined based on the calculation information and the remote direct data access operation identifier, and the transfer length of the remote direct data access operation.
- the address information includes the target board ID.
- the configuration information also includes a packet sequence number; the device also includes a judgment unit and a feedback unit;
- the feedback unit is used to feed back packet loss prompt information carrying the missing sequence number to the host server when the processed application data does not match the packet sequence number.
- An embodiment of the present application also provides an electronic device, including:
- Memory used to store computer programs
- the processor is configured to execute a computer program to implement the steps of the data processing method of the above-mentioned pooling platform.
- Embodiments of the present application also provide a non-volatile readable storage medium on which a computer program is stored; when the computer program is executed by the processor, the steps of the data processing method of the above-mentioned pooling platform are implemented.
- configuration information is added to the custom fields of the transmission protocol; the configuration information may include operation identification, address information and calculation information that match the application acceleration requirements.
- the operation identifier is used to indicate the type of operation that needs to be performed
- the address information is used to indicate the board that processes the application data
- the calculation information is used to indicate the specific operation that needs to be performed on the application data.
- the application data can be processed directly based on the configuration information in the transmission protocol, reducing the number of configuration interactions between boards, thereby reducing latency and improving the heterogeneous acceleration performance of the pooled platform.
- the original protocol fields are simplified, thereby simplifying the internal processing logic and further improving processing performance.
- Figure 1 is a flow chart of a data processing method of a pooling platform provided by an embodiment of the present application
- Figure 2 is a schematic structural diagram of a pooling platform provided by an embodiment of the present application.
- Figure 3 is a schematic structural diagram of a pooling platform for application data processing based on two FPGA accelerator cards provided by the embodiment of the present application;
- Figure 4 is a schematic structural diagram of a data processing device of a pooling platform provided by an embodiment of the present application.
- Figure 5 is a structural diagram of an electronic device provided by an embodiment of the present application.
- Figure 6 is a schematic structural diagram of a non-volatile readable storage medium disclosed in this application.
- Figure 1 is a flow chart of a data processing method of a pooling platform provided by an embodiment of the present application. The method includes:
- the configuration information may include operation identification, address information and calculation information that match application acceleration requirements.
- the operation identifier is used to indicate the type of operation that needs to be performed, the address information is used to indicate the board that processes the application data, and the calculation information is used to indicate the specific operation that needs to be performed on the application data.
- the transmission protocol may adopt the RDMA_Enhance transmission protocol (an enhanced remote direct memory access transmission protocol).
- the format of the RDMA_Enhance transmission protocol is shown in Table 1.
- Eth L2Header, IP Header and UDP Header are standard Ethernet header fields
- RDMA Enhance is a custom field
- Payload represents the message load
- ICRC and FCS correspond to redundancy detection and frame verification respectively.
- custom fields can be set based on actual application acceleration requirements. For a commonly used custom field format, see Table 2.
- opcode is the operation identifier, which may include an RDMA operation identifier (remote direct data access) and a Stream operation identifier.
- dqp represents the target board ID.
- cal_code represents customized calculation information.
- psn represents the packet sequence number, which is used to verify the integrity of the data.
- addr represents the read-write identifier defined based on the operation identifier and calculation information.
- len represents the transmission length during RDMA operation.
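The field list above can be sketched as a packed header. This is an illustrative sketch only: the patent does not specify field widths, so the byte sizes below (and the helper names) are assumptions for demonstration.

```python
import struct

# Hypothetical byte layout for the RDMA_Enhance custom field (field names
# from Table 2; the widths are illustrative assumptions, not from the patent):
#   opcode   (1 B) - operation identifier (RDMA vs. Stream)
#   dqp      (2 B) - target board ID
#   cal_code (2 B) - customized calculation information
#   psn      (4 B) - packet sequence number for integrity checking
#   addr     (8 B) - read-write identifier derived from opcode and cal_code
#   len      (4 B) - transfer length for the RDMA operation
RDMA_ENHANCE_FMT = "!BHHIQI"  # network byte order, 21 bytes total

def pack_rdma_enhance(opcode, dqp, cal_code, psn, addr, length):
    """Serialize the custom-field values into the on-wire byte string."""
    return struct.pack(RDMA_ENHANCE_FMT, opcode, dqp, cal_code, psn, addr, length)

def unpack_rdma_enhance(raw):
    """Parse the custom field back into its six values."""
    return struct.unpack(RDMA_ENHANCE_FMT, raw)

header = pack_rdma_enhance(opcode=0x01, dqp=2, cal_code=0x0003,
                           psn=42, addr=0x1000, length=4096)
assert unpack_rdma_enhance(header) == (0x01, 2, 0x0003, 42, 0x1000, 4096)
```

Because every field needed for computation and transmission rides in this one header, a receiving board can act on a packet without a separate configuration round-trip, which is the latency saving the application claims.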
- the custom fields can be divided into bytes, and the configuration information can be set for the divided bytes, to ensure that each board in the pooling platform can complete the processing of the application data based on the configuration information.
- S102 Receive application data transmitted by the host server.
- the pooling platform can include multiple FPGA boards.
- the PCIe DMA module in the FPGA board can interact with the host server.
- the host server can transmit application data to the PCIe DMA module of the FPGA board.
- S103: Process the application data according to the operation identifier and calculation information, and transmit the processed application data to the board pointed to by the address information; the operation ends when processing of the application data on the different boards in the pooling platform is completed.
- the FPGA board contains a Kernel module that processes application data. Based on different application acceleration requirements, the number of FPGA boards required to be called and the Kernel modules involved in each FPGA board are different.
- the calculation information may include instructions required for application acceleration requirements.
- the calculation information can include the instruction address, and the instruction address can be used to point to the instructions required by the application acceleration requirement.
- the calculation information may also include operation sequence instructions used to indicate the operation sequence of multiple Kernel modules.
- the calculation information may include operation sequence instructions and instruction addresses, where each instruction address points to an instruction required by the application acceleration requirement.
- the process of the FPGA board processing the application data based on the operation identification and calculation information may include multiple Kernel modules following the operation sequence instructions and sequentially calling the instructions pointed to by the instruction addresses to process the application data.
- the calculation information may include instructions required for application acceleration requirements.
- the process of the FPGA board processing the application data based on the operation identification and calculation information may include multiple Kernel modules processing the application data according to their corresponding instructions.
- the calculation information may include an instruction address; where the instruction address points to the internal calculation instruction required by the application acceleration requirement.
- the process of the FPGA board processing application data based on the operation identifier and calculation information may include the Kernel module calling internal calculation instructions based on the instruction address to process the application data.
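The three forms of calculation information described above can be illustrated with a small dispatcher. This is a software toy, not the FPGA logic: the instruction addresses, the instruction-memory mapping, and the tag strings are all hypothetical stand-ins.

```python
# Toy "instruction memory": address -> callable operating on the data.
# Real Kernel modules are hardware units; these just tag the data.
INSTRUCTION_MEM = {
    0x10: lambda d: d + "|decompressed",
    0x20: lambda d: d + "|computed",
    0x30: lambda d: d + "|encrypted",
}

def process(data, cal_info):
    """Dispatch on the three forms of calculation information:
    1) operation-sequence instructions + instruction addresses
       (multiple Kernels, at least one with multi-instruction calculation);
    2) inline instructions, one per Kernel module;
    3) a single instruction address for one Kernel's internal calculation.
    """
    if "sequence" in cal_info:        # form 1: ordered instruction addresses
        for addr in cal_info["sequence"]:
            data = INSTRUCTION_MEM[addr](data)
    elif "instructions" in cal_info:  # form 2: inline per-Kernel instructions
        for fn in cal_info["instructions"]:
            data = fn(data)
    else:                             # form 3: single internal-calculation address
        data = INSTRUCTION_MEM[cal_info["addr"]](data)
    return data

assert process("app", {"sequence": [0x10, 0x20, 0x30]}) == \
    "app|decompressed|computed|encrypted"
assert process("app", {"addr": 0x20}) == "app|computed"
```

The point of the three forms is that the cal_code field alone tells the board whether to follow an ordered sequence, run per-Kernel instructions, or fetch one internal instruction, with no further host configuration.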
- the types of operations performed by the FPGA board can include RDMA operations and Stream operations. Therefore, the operation identifier may include an RDMA operation identifier and a Stream operation identifier.
- the address information may include the target board ID (Identity Document, identity identifier), a read-write identifier determined based on the calculation information and the RDMA operation identifier, and the transmission length of the RDMA operation.
- the address information may include only the target board ID.
- the FPGA board is used to accelerate the processing of application data, so the FPGA board can be called an FPGA accelerator card.
- FIG. 2 is a schematic structural diagram of a pooling platform provided by an embodiment of the present application.
- Figure 2 takes three FPGA accelerator cards as an example.
- the leftmost FPGA accelerator card can be used as a co-processor of the host server.
- the two FPGA accelerator cards on the right exist as independent acceleration units in the form of FPGA BOX.
- Each FPGA accelerator card can include PCIe DMA module, Memory module, DMA module, Stream module, MAC module and at least one Kernel module.
- the arrows in Figure 2 are used to indicate the flow of application data. Data interaction between different FPGA accelerator cards can be achieved through the exchange unit.
- RDMA_Enhance is marked between each FPGA accelerator card and the switching unit to indicate data interaction between different FPGA accelerator cards according to the RDMA_Enhance transmission protocol.
- the two FPGA accelerator cards can be called FPGA accelerator card 1 and FPGA accelerator card 2 respectively; the Kernel accelerated computing units of the two FPGA accelerator cards are programmed as follows: FPGA accelerator card 1 uses three Kernel modules, which implement the decompression, internal calculation and encryption functions; FPGA accelerator card 2 uses two Kernel modules, which implement the decryption and internal calculation functions respectively.
- the process of adding configuration information to the custom fields of the transmission protocol may include the host server configuring the register and configuring the local FPGA accelerator card 1 based on the RDMA_Enhance protocol.
- the configuration information is as follows:
- opcode_1: the PCIe DMA module inputs data, the Stream module outputs data;
- cal_code_1: 3 Kernel modules in sequential calculation mode;
- addr_1: Memory read address;
- len_1: the length of data the Kernel module reads from the Memory module;
- opcode_2: the Stream module inputs data, the DMA module outputs data, where the destination Memory is the remote host memory;
- cal_code_2: 2 Kernel modules obtain instruction sets from the storage unit for processing;
- addr_2: Memory write address;
- len_2: the length of data the Kernel module writes to the Memory module.
- FIG. 3 is a schematic structural diagram of a pooling platform for application data processing based on two FPGA accelerator cards provided by the embodiment of the present application.
- the two FPGA accelerator cards can be called FPGA accelerator card 1 and FPGA accelerator card 2 respectively. The Kernel unit in FPGA accelerator card 1 contains three Kernel modules, namely Kernel1, Kernel2 and Kernel3; the Kernel unit in FPGA accelerator card 2 contains two Kernel modules, namely Kernel1 and Kernel2. It should be noted that Kernel1 of FPGA accelerator card 1 and Kernel1 of FPGA accelerator card 2 perform different operations, and Kernel2 of FPGA accelerator card 1 and Kernel2 of FPGA accelerator card 2 perform different operations.
- the labels between different modules in Figure 3 are used to indicate the processing sequence of application data.
- the application data processing flow includes the following steps: (1) the compressed application data is stored from the host server to Memory through the PCIe DMA module on FPGA accelerator card 1; (2) Kernel1 detects that the internal DMA controller sets the completion signal to 1 and starts reading data from Memory; (3) Kernel1 starts the decompression calculation and transmits the calculation results to Kernel2; (4) Kernel2 starts the first-stage calculation of the custom algorithm model and, after completion, transmits the results to Kernel3; (5) Kernel3 starts the encryption calculation and sends the calculation results to the target board, FPGA accelerator card 2, through Stream mode based on the RDMA_Enhance protocol; (6) the target board receives the RDMA_Enhance protocol message, parses it, extracts the relevant fields and sends them to the logic module and the Kernel module, while the data part of the message is sent to Kernel1; (7) Kernel1 performs the decryption calculation and sends the result to Kernel2.
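The two-card flow just described can be traced with toy stand-ins. The functions below are hypothetical tags, not the real Kernel implementations; they only make the step ordering visible.

```python
# Toy stand-ins for the Kernel modules in steps (1)-(7); real Kernels are
# hardware compute units, these just tag the data so the order is visible.
def decompress(d):    return d + ">decompressed"          # card 1, Kernel1
def model_stage_1(d): return d + ">stage1"                # card 1, Kernel2
def encrypt(d):       return d + ">encrypted"             # card 1, Kernel3
def decrypt(d):       return d.replace(">encrypted", "")  # card 2, Kernel1
def model_stage_2(d): return d + ">stage2"                # card 2, Kernel2

def fpga_card_1(data):
    """Steps (2)-(5): decompress, first-stage model, encrypt, then send."""
    return encrypt(model_stage_1(decompress(data)))

def fpga_card_2(payload):
    """Steps (6)-(7): decrypt the received payload, run second-stage model."""
    return model_stage_2(decrypt(payload))

result = fpga_card_2(fpga_card_1("raw"))
assert result == "raw>decompressed>stage1>stage2"
```

In the patented scheme the hand-off between `fpga_card_1` and `fpga_card_2` is a single RDMA_Enhance message whose header already carries the next board's configuration, which is why no intermediate host configuration step appears between steps (5) and (6).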
- the existing technology requires five configuration operations when performing accelerated computing tasks on the application data, including: (1) after the data storage of process 1 is completed, a configuration that triggers the Kernel calculation needs to be initiated; (2) before process 5 starts, a configuration is required to trigger data migration; (3) after the operation of process 6 is completed, a configuration is required to trigger the Kernel calculation, which can be done by configuring the Ethernet packet; (4) before process 8, after Kernel2 completes the calculation, a configuration is required to trigger data storage; (5) before process 9, a configuration is required to trigger data migration.
- the existing technology requires five configuration operations to complete the application data processing in the example of this application.
- the embodiment of this application uses the custom RDMA_Enhance protocol to simplify the content of the RDMA protocol: before processing the application data, configuration information only needs to be added to the custom field based on the application acceleration requirements, that is, the processing of the application data can be completed through one configuration, which simplifies the internal processing logic and effectively improves the efficiency of application data processing.
- configuration information is added to the custom fields of the transmission protocol; the configuration information may include operation identification, address information and calculation information that match the application acceleration requirements.
- the operation identifier is used to indicate the type of operation that needs to be performed
- the address information is used to indicate the board that processes the application data
- the calculation information is used to indicate the specific operation that needs to be performed on the application data.
- the application data can be processed directly based on the configuration information in the transmission protocol, reducing the number of configuration interactions between boards, thereby reducing latency and improving the heterogeneous acceleration performance of the pooled platform.
- the original protocol fields are simplified, thereby simplifying the internal processing logic and further improving processing performance.
- in order to implement packet loss detection on the application data, a packet sequence number can be set in the configuration information. After the FPGA board transmits the processed application data to the board pointed to by the address information, it can determine whether the processed application data matches the packet sequence number; when the processed application data does not match the packet sequence number, packet loss prompt information carrying the missing sequence number can be fed back to the host server.
- Figure 4 is a schematic structural diagram of a data processing device of a pooling platform provided by an embodiment of the present application, including an adding unit 41, a receiving unit 42, a processing unit 43 and a transmission unit 44;
- Adding unit 41 is used to add configuration information to the custom fields of the transmission protocol based on application acceleration requirements; wherein the configuration information includes operation identification, address information and calculation information that match the application acceleration requirements;
- the receiving unit 42 is used to receive application data transmitted by the host server;
- the processing unit 43 is used to process application data according to the operation identification and calculation information
- the transmission unit 44 is used to transmit the processed application data to the board pointed to by the address information; the operation ends when processing of the application data on the different boards in the pooling platform is completed.
- the operation identifier is used to indicate the type of operation that needs to be performed, the address information is used to indicate the board that processes the application data, and the calculation information is used to indicate the specific operation that needs to be performed on the application data.
- the custom fields can be divided into bytes, and the configuration information can be set for the divided bytes, to ensure that each board in the pooling platform can complete the processing of the application data based on the configuration information.
- when the application acceleration requirement corresponds to multiple Kernel modules on the FPGA board, and at least one Kernel module corresponds to multi-instruction calculation, the calculation information includes operation sequence instructions and instruction addresses, where each instruction address points to an instruction required by the application acceleration requirement;
- the processing unit is used to sequentially call the instructions pointed to by the instruction addresses to process the application data according to the operation sequence instructions of multiple Kernel modules.
- the calculation information includes the instructions required for the application acceleration requirement
- a processing unit is used to process application data by multiple Kernel modules according to their corresponding instructions.
- the calculation information when the application acceleration requirement corresponds to a Kernel module for internal calculation on the FPGA board, the calculation information includes an instruction address; wherein the instruction address points to the internal calculation instruction required by the application acceleration requirement;
- the processing unit is used to call the Kernel module to execute the internal computing instructions according to the instruction address to process the application data.
- the address information includes the target board ID, a read-write identifier determined based on the calculation information and the remote direct data access operation identifier, and the transmission length of the remote direct data access operation.
- the address information includes the target board ID.
- the configuration information also includes a packet sequence number; the device also includes a judgment unit and a feedback unit;
- the feedback unit is used to feed back packet loss prompt information carrying the missing sequence number to the host server when the processed application data does not match the packet sequence number.
- configuration information is added to the custom fields of the transmission protocol; the configuration information may include operation identification, address information and calculation information that match the application acceleration requirements.
- the operation identifier is used to indicate the type of operation that needs to be performed
- the address information is used to indicate the board that processes the application data
- the calculation information is used to indicate the specific operation that needs to be performed on the application data.
- the application data can be processed directly based on the configuration information in the transmission protocol, reducing the number of configuration interactions between boards, thereby reducing latency and improving the heterogeneous acceleration performance of the pooled platform.
- the original protocol fields are simplified, thereby simplifying the internal processing logic and further improving processing performance.
- Figure 5 is a structural diagram of an electronic device provided by an embodiment of the present application. As shown in Figure 5, the electronic device includes: a memory 20 for storing computer programs;
- the processor 21 is configured to implement the steps of the data processing method of the pooling platform in the above embodiment when executing the computer program.
- Electronic devices provided in this embodiment may include, but are not limited to, smartphones, tablets, laptops, or desktop computers.
- the processor 21 may include one or more processing cores, such as a 4-core processor, an 8-core processor, etc.
- the processor 21 can adopt at least one hardware form among DSP (Digital Signal Processing, digital signal processing), FPGA (Field-Programmable Gate Array, field programmable gate array), and PLA (Programmable Logic Array, programmable logic array).
- the processor 21 may also include a main processor and a co-processor.
- the main processor is a processor used to process data in the awake state, also called the CPU (Central Processing Unit); the co-processor is a low-power processor used to process data in standby mode.
- the processor 21 may be integrated with a GPU (Graphics Processing Unit, image processor), and the GPU is responsible for rendering and drawing the content that needs to be displayed on the display screen.
- the processor 21 may also include an AI (Artificial Intelligence, artificial intelligence) processor, which is used to process computing operations related to machine learning.
- Memory 20 may include one or more non-volatile readable storage media, which may be non-transitory.
- the memory 20 may also include high-speed random access memory, and non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices.
- the memory 20 is at least used to store the following computer program 201. After the computer program is loaded and executed by the processor 21, the relevant steps of the data processing method of the pooling platform disclosed in any of the foregoing embodiments can be implemented.
- the resources stored in the memory 20 may also include the operating system 202, data 203, etc., and the storage method may be short-term storage or permanent storage.
- the operating system 202 may include Windows, Unix, Linux, etc.
- Data 203 may include but is not limited to configuration information and the like.
- the electronic device may also include a display screen 22 , an input-output interface 23 , a communication interface 24 , a power supply 25 and a communication bus 26 .
- the structure shown in FIG. 5 does not constitute a limitation on the electronic device, which may include more or fewer components than shown in the figure.
- if the data processing method of the pooling platform in the above embodiments is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
- based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and performs all or part of the steps of the methods of the various embodiments of this application.
- the aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, a magnetic disk, or an optical disc.
- the embodiment of the present application also provides a non-volatile readable storage medium.
- the non-volatile readable storage medium 30 stores a computer program 31 which, when executed by a processor, implements the steps of the data processing method of the above-mentioned pooling platform.
- each functional module of the non-volatile readable storage medium in the embodiment of the present application can be specifically implemented according to the method in the above method embodiment.
- for the specific implementation process, reference can be made to the relevant description of the above method embodiment, which will not be repeated here.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Theoretical Computer Science (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Advance Control (AREA)
Abstract
Embodiments of the present application disclose a data processing method, apparatus, device, and medium for a pooling platform. Based on an application acceleration requirement, configuration information is added to the custom fields of a transmission protocol; the configuration information includes an operation identifier, address information, and calculation information matching the application acceleration requirement. Application data transmitted by a host server is processed according to the operation identifier and the calculation information, and the processed application data is transmitted to the board indicated by the address information, until processing of the application data on the different boards of the pooling platform is complete, at which point the operation ends. By adding configuration information for processing the application data to the transmission protocol, the application data can be processed directly according to the configuration information, reducing the number of configuration interactions between boards, lowering latency, and improving the heterogeneous acceleration performance of the pooling platform. Moreover, by setting the configuration information in the custom fields of the transmission protocol, the original protocol fields are simplified, further improving processing performance.
Description
Cross-reference to related applications
This application claims priority to Chinese patent application No. 202210609570.5, entitled "Data processing method, apparatus, device and medium for a pooling platform", filed with the China National Intellectual Property Administration on May 31, 2022, the entire contents of which are incorporated herein by reference.
This application relates to the field of distributed application technology, and in particular to a data processing method, apparatus, device, and non-volatile readable storage medium for a pooling platform.
In an FPGA (Field Programmable Gate Array) pooling platform, a large number of FPGA accelerator cards form an acceleration resource pool for the accelerated processing of distributed applications. An FPGA accelerator card can be deployed in one of two forms: as a coprocessor of a host server, or as an FPGA BOX, i.e. decoupled from any server, where only the accelerator card exists as an independent acceleration unit. FPGA accelerator cards exchange data with each other via a transmission protocol.
Acceleration on an FPGA pooling platform involves two aspects: acceleration within an FPGA accelerator card, and acceleration of data transmission between FPGA accelerator cards. The logic of an FPGA accelerator card consists of three parts: a Kernel acceleration computing unit that can be dynamically reconfigured per application, a Memory unit for storing data, and a PCIe (Peripheral Component Interconnect Express) interface or MAC (Media Access Control) interface for connecting to peripherals.
The data acceleration flow of the FPGA pooling platform includes: the application to be accelerated is transferred from the host server to the Memory unit of the FPGA accelerator card via the PCIe interface; the host configures the Kernel acceleration computing unit to fetch data from the Memory unit and perform accelerated computation; the host or the Kernel acceleration computing unit configures the DMA IP (Direct Memory Access Intellectual Property core) to send the computation result back to the host via the PCIe interface or to other FPGA accelerator cards in the pooling platform via the MAC interface.
Currently, data transmission between FPGA accelerator cards is usually implemented via RDMA (Remote Direct Memory Access). However, in this acceleration method, computation and transmission are separated: the host or a remote Kernel acceleration computing unit configures the local Kernel acceleration computing unit to perform the accelerated computation, and then the local Kernel or the host configures the RDMA IP to initiate the RDMA data transfer. Completing the acceleration of one application on the FPGA pooling platform therefore requires multiple configuration steps, which increases the overall processing latency and weakens the acceleration advantage of the FPGA pooling platform.
It can be seen that how to reduce the processing latency of the pooling platform is a problem to be solved by those skilled in the art.
Summary of the Invention
The purpose of the embodiments of the present application is to provide a data processing method, apparatus, device, and non-volatile readable storage medium for a pooling platform, which can reduce the processing latency of the pooling platform.
To solve the above technical problem, an embodiment of the present application provides a data processing method for a pooling platform, including:
adding configuration information to the custom fields of a transmission protocol based on an application acceleration requirement, where the configuration information includes an operation identifier, address information, and calculation information matching the application acceleration requirement;
receiving application data transmitted by a host server;
processing the application data according to the operation identifier and the calculation information, and transmitting the processed application data to the board indicated by the address information, until processing of the application data on the different boards of the pooling platform is complete, at which point the operation ends.
In some embodiments, when the application acceleration requirement corresponds to multiple Kernel modules on an FPGA board and at least one Kernel module corresponds to multi-instruction computation, the calculation information includes operation-sequence instructions and an instruction address, where the instruction address points to the instructions required by the application acceleration requirement;
processing the application data according to the operation identifier and the calculation information includes:
the multiple Kernel modules, following the operation-sequence instructions, successively invoking the instructions pointed to by the instruction address to process the application data.
In some embodiments, when the application acceleration requirement corresponds to multiple Kernel modules on an FPGA board and each Kernel module corresponds to single-instruction computation, the calculation information includes the instructions required by the application acceleration requirement;
processing the application data according to the operation identifier and the calculation information includes:
the multiple Kernel modules processing the application data according to their respective instructions.
In some embodiments, when the application acceleration requirement corresponds to a single Kernel module on an FPGA board used for internal computation, the calculation information includes an instruction address, where the instruction address points to the internal computation instructions required by the application acceleration requirement;
processing the application data according to the operation identifier and the calculation information includes:
the Kernel module invoking the internal computation instructions according to the instruction address to process the application data.
In some embodiments, when the operation identifier is a remote direct memory access operation identifier, the address information includes a target board ID, a read/write identifier determined from the calculation information and the remote direct memory access operation identifier, and the transmission length of the remote direct memory access operation.
In some embodiments, when the operation identifier is a stream operation identifier, the address information includes a target board ID.
In some embodiments, the configuration information further includes a packet sequence number;
after transmitting the processed application data to the board indicated by the address information, the method further includes:
determining whether the processed application data matches the packet sequence number;
when the processed application data does not match the packet sequence number, feeding back to the host server packet-loss prompt information carrying the missing sequence number.
An embodiment of the present application further provides a data processing apparatus for a pooling platform, including an adding unit, a receiving unit, a processing unit, and a transmission unit;
the adding unit is configured to add configuration information to the custom fields of a transmission protocol based on an application acceleration requirement, where the configuration information includes an operation identifier, address information, and calculation information matching the application acceleration requirement;
the receiving unit is configured to receive application data transmitted by a host server;
the processing unit is configured to process the application data according to the operation identifier and the calculation information;
the transmission unit is configured to transmit the processed application data to the board indicated by the address information, until processing of the application data on the different boards of the pooling platform is complete, at which point the operation ends.
In some embodiments, when the application acceleration requirement corresponds to multiple Kernel modules on an FPGA board and at least one Kernel module corresponds to multi-instruction computation, the calculation information includes operation-sequence instructions and an instruction address, where the instruction address points to the instructions required by the application acceleration requirement;
the processing unit is configured to have the multiple Kernel modules, following the operation-sequence instructions, successively invoke the instructions pointed to by the instruction address to process the application data.
In some embodiments, when the application acceleration requirement corresponds to multiple Kernel modules on an FPGA board and each Kernel module corresponds to single-instruction computation, the calculation information includes the instructions required by the application acceleration requirement;
the processing unit is configured to have the multiple Kernel modules process the application data according to their respective instructions.
In some embodiments, when the application acceleration requirement corresponds to a single Kernel module on an FPGA board used for internal computation, the calculation information includes an instruction address, where the instruction address points to the internal computation instructions required by the application acceleration requirement;
the processing unit is configured to have the Kernel module invoke the internal computation instructions according to the instruction address to process the application data.
In some embodiments, when the operation identifier is a remote direct memory access operation identifier, the address information includes a target board ID, a read/write identifier determined from the calculation information and the remote direct memory access operation identifier, and the transmission length of the remote direct memory access operation.
In some embodiments, when the operation identifier is a stream operation identifier, the address information includes a target board ID.
In some embodiments, the configuration information further includes a packet sequence number; the apparatus further includes a determination unit and a feedback unit;
the determination unit is configured to determine whether the processed application data matches the packet sequence number;
the feedback unit is configured to, when the processed application data does not match the packet sequence number, feed back to the host server packet-loss prompt information carrying the missing sequence number.
An embodiment of the present application further provides an electronic device, including:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of the data processing method of the pooling platform described above.
An embodiment of the present application further provides a non-volatile readable storage medium storing a computer program that, when executed by a processor, implements the steps of the data processing method of the pooling platform described above.
As can be seen from the above technical solution, configuration information is added to the custom fields of the transmission protocol based on the application acceleration requirement; the configuration information may include an operation identifier, address information, and calculation information matching the application acceleration requirement. The operation identifier indicates the type of operation to be performed, the address information indicates the board that processes the application data, and the calculation information indicates the specific operation to be performed on the application data. Application data transmitted by the host server is received; the application data is processed according to the operation identifier and the calculation information, and the processed application data is transmitted to the board indicated by the address information, until processing of the application data on the different boards of the pooling platform is complete, at which point the operation ends. In this technical solution, because configuration information for processing the application data is carried in the transmission protocol, once the application data is received it can be processed directly according to the configuration information in the transmission protocol, reducing the number of configuration interactions between boards, thereby lowering latency and improving the heterogeneous acceleration performance of the pooling platform. Moreover, by setting the configuration information in the custom fields of the transmission protocol according to the actual application acceleration requirement, the original protocol fields are simplified, which simplifies the internal processing logic and further improves processing performance.
To more clearly illustrate the embodiments of the present application, the drawings required in the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Figure 1 is a flowchart of a data processing method for a pooling platform provided by an embodiment of the present application;
Figure 2 is a schematic structural diagram of a pooling platform provided by an embodiment of the present application;
Figure 3 is a schematic structural diagram of a pooling platform that processes application data using two FPGA accelerator cards, provided by an embodiment of the present application;
Figure 4 is a schematic structural diagram of a data processing apparatus for a pooling platform provided by an embodiment of the present application;
Figure 5 is a structural diagram of an electronic device provided by an embodiment of the present application;
Figure 6 is a schematic structural diagram of a non-volatile readable storage medium disclosed in the present application.
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
The terms "include" and "have" in the specification, claims, and drawings of the present application, and any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but may include steps or units that are not listed.
To help those skilled in the art better understand the solution of the present application, the present application is further described in detail below with reference to the drawings and specific embodiments.
Next, a data processing method for a pooling platform provided by an embodiment of the present application is introduced in detail. Figure 1 is a flowchart of the method, which includes:
S101: Based on an application acceleration requirement, add configuration information to the custom fields of a transmission protocol.
The configuration information may include an operation identifier, address information, and calculation information matching the application acceleration requirement.
The operation identifier indicates the type of operation to be performed, the address information indicates the board that processes the application data, and the calculation information indicates the specific operation to be performed on the application data.
In this embodiment of the application, the transmission protocol may be the RDMA_Enhance transmission protocol (an enhanced remote direct memory access transmission protocol). The format of the RDMA_Enhance transmission protocol is shown in Table 1.
Table 1
Here, Eth L2 Header, IP Header, and UDP Header are standard Ethernet header fields, RDMA Enhance is the custom field, Payload is the message payload, and ICRC and FCS correspond to the redundancy check and the frame check, respectively.
The format of the custom field can be set according to the actual application acceleration requirement; a commonly used format is shown in Table 2.
Table 2
Here, opcode is the operation identifier, which may include an RDMA operation identifier (remote direct memory access) and a Stream operation identifier. dqp is the target board ID. cal_code is the custom calculation information. psn is the packet sequence number, used to verify data integrity. addr is the read/write identifier defined from the operation identifier and the calculation information. len is the transmission length of an RDMA operation.
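As an illustration, the Table 2 layout can be modeled in software. The patent does not specify field widths, so the byte sizes, the opcode values, and the helper names in this sketch are all assumptions, not part of the disclosed protocol:

```python
import struct

# Hypothetical byte layout for the RDMA_Enhance custom field of Table 2.
# The widths (1-byte opcode, 2-byte dqp, 1-byte cal_code, 4-byte psn,
# 8-byte addr, 4-byte len) and the opcode values are illustrative only.
HEADER_FMT = "!BHBIQI"  # network byte order, 20 bytes in total

OPCODE_RDMA = 0x01    # remote direct memory access operation
OPCODE_STREAM = 0x02  # stream (card-to-card) operation

def pack_custom_field(opcode, dqp, cal_code, psn, addr, length):
    """Serialize the six configuration fields into header bytes."""
    return struct.pack(HEADER_FMT, opcode, dqp, cal_code, psn, addr, length)

def unpack_custom_field(data):
    """Parse header bytes back into the six configuration fields."""
    return struct.unpack(HEADER_FMT, data)

hdr = pack_custom_field(OPCODE_RDMA, dqp=2, cal_code=0x03, psn=1,
                        addr=0x1000, length=4096)
assert unpack_custom_field(hdr) == (OPCODE_RDMA, 2, 0x03, 1, 0x1000, 4096)
```

A board receiving such a packet would first unpack these fields and then act on them without any further configuration round-trip, which is the core of the single-configuration scheme described below.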
Different application acceleration requirements invoke different boards, and the operations each board performs also differ. Therefore, for the current application acceleration requirement, the custom field can be divided into bytes and the configuration information set in the divided bytes, ensuring that each board in the pooling platform can complete the processing of the application data by relying on the configuration information.
S102: Receive application data transmitted by the host server.
The pooling platform may include multiple FPGA boards. The PCIe DMA module in an FPGA board implements the interaction with the host server; in practice, the host server can transmit the application data to the PCIe DMA module of an FPGA board.
S103: Process the application data according to the operation identifier and the calculation information, and transmit the processed application data to the board indicated by the address information, until processing of the application data on the different boards of the pooling platform is complete, at which point the operation ends.
An FPGA board contains Kernel modules that process the application data. Depending on the application acceleration requirement, the number of FPGA boards to be invoked and the Kernel modules involved on each board differ.
In practice, different application acceleration requirements correspond to different calculation information. For scenarios in which the application acceleration requirement can be met by a simple algorithm, the calculation information may include the instructions required by the application acceleration requirement.
For scenarios in which a complex algorithm is needed, it is often necessary to invoke a Kernel module capable of internal computation, or multiple Kernel modules, to complete the processing of the application data; the calculation information may therefore include an instruction address pointing to the instructions required by the application acceleration requirement. When multiple Kernel modules are invoked, the calculation information may further include operation-sequence instructions indicating the order in which the Kernel modules operate.
Taking as an example the case in which the application acceleration requirement corresponds to multiple Kernel modules on an FPGA board and at least one Kernel module corresponds to multi-instruction computation, the calculation information may include operation-sequence instructions and an instruction address, where the instruction address points to the instructions required by the application acceleration requirement.
The process of the FPGA board processing the application data according to the operation identifier and the calculation information may include the multiple Kernel modules, following the operation-sequence instructions, successively invoking the instructions pointed to by the instruction address to process the application data.
Taking as an example the case in which the application acceleration requirement corresponds to multiple Kernel modules on an FPGA board and each Kernel module corresponds to single-instruction computation, the calculation information may include the instructions required by the application acceleration requirement.
The process of the FPGA board processing the application data according to the operation identifier and the calculation information may include the multiple Kernel modules processing the application data according to their respective instructions.
Taking as an example the case in which the application acceleration requirement corresponds to a single Kernel module on an FPGA board used for internal computation, the calculation information may include an instruction address, where the instruction address points to the internal computation instructions required by the application acceleration requirement.
The process of the FPGA board processing the application data according to the operation identifier and the calculation information may include the Kernel module invoking the internal computation instructions according to the instruction address to process the application data.
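The three ways the calculation information can drive the Kernel modules can be sketched in software as follows; the instruction store, the string-based "operations", and all function names here are invented stand-ins for illustration, not part of the disclosed design:

```python
# Invented instruction store: instruction address -> operation on the data.
INSTRUCTION_STORE = {
    0x10: lambda d: d + "|decompress",
    0x11: lambda d: d + "|compute",
}

def multi_kernel_sequenced(data, order, instr_addrs):
    """Multiple Kernels, multi-instruction computation: follow the
    operation-sequence instructions and invoke the instruction that each
    address points to, in order."""
    for idx in order:
        data = INSTRUCTION_STORE[instr_addrs[idx]](data)
    return data

def multi_kernel_single(data, instructions):
    """Multiple Kernels, each carrying its own single instruction."""
    for op in instructions:
        data = op(data)
    return data

def internal_kernel(data, instr_addr):
    """One internal-computation Kernel: fetch the instruction by address."""
    return INSTRUCTION_STORE[instr_addr](data)

assert multi_kernel_sequenced("d", [0, 1], [0x10, 0x11]) == "d|decompress|compute"
assert internal_kernel("d", 0x11) == "d|compute"
```

The distinction mirrors the text: simple requirements carry the instructions themselves, while complex requirements carry only addresses into a larger instruction store plus, when several Kernels cooperate, an explicit ordering.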
In practice, the operation types performed by an FPGA board may include RDMA operations and Stream operations, so the operation identifier may include an RDMA operation identifier and a Stream operation identifier.
When the operation identifier is an RDMA operation identifier, the address information may include a target board ID, a read/write identifier determined from the calculation information and the RDMA operation identifier, and the transmission length of the RDMA operation.
A Stream operation transfers application data between different FPGA boards, so when the operation identifier is a Stream operation identifier, the address information may include only the target board ID.
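A minimal sketch of how a receiving board might interpret the address information for the two operation types; the field names follow Table 2, while the class and function names are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class RdmaAddress:    # RDMA operation: full addressing is needed
    dqp: int          # target board ID
    rw_flag: str      # read/write identifier derived from cal_code + opcode
    length: int       # transmission length of the RDMA operation

@dataclass
class StreamAddress:  # Stream operation: only the target board is needed
    dqp: int

def build_address_info(opcode, dqp, rw_flag=None, length=None):
    """Assemble the address information matching the operation identifier."""
    if opcode == "RDMA":
        return RdmaAddress(dqp, rw_flag, length)
    if opcode == "STREAM":
        return StreamAddress(dqp)
    raise ValueError(f"unknown opcode {opcode!r}")

assert build_address_info("STREAM", 2) == StreamAddress(dqp=2)
```

The asymmetry is the point of the design: a Stream packet only needs to name its destination card, while an RDMA packet must additionally say whether to read or write and how much data to move.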
In the pooling platform, FPGA boards implement the accelerated processing of the application data, so an FPGA board may also be called an FPGA accelerator card.
Figure 2 is a schematic structural diagram of a pooling platform provided by an embodiment of the present application, using three FPGA accelerator cards as an example. The leftmost FPGA accelerator card serves as a coprocessor of the host server, while the two FPGA accelerator cards on the right exist as independent acceleration units in FPGA BOX form. Each FPGA accelerator card may include a PCIe DMA module, a Memory module, a DMA module, a Stream module, a MAC module, and at least one Kernel module. The arrows in Figure 2 indicate the flow of application data. Different FPGA accelerator cards exchange data through a switch unit. In Figure 2, RDMA_Enhance is marked between each FPGA accelerator card and the switch unit, indicating that data is exchanged between FPGA accelerator cards according to the RDMA_Enhance transmission protocol.
Taking as an example an application acceleration requirement that processes application data using two FPGA accelerator cards, referred to as FPGA accelerator card 1 and FPGA accelerator card 2, the Kernel acceleration computing units of both cards are programmed. FPGA accelerator card 1 uses three Kernel modules, implementing decompression, internal computation, and encryption, respectively. FPGA accelerator card 2 uses two Kernel modules, implementing decryption and internal computation, respectively.
The process of adding configuration information to the custom fields of the transmission protocol may include the host server, through register configuration, configuring the local FPGA accelerator card 1 based on the RDMA_Enhance protocol as follows:
opcode_1: the PCIe DMA module inputs data and the Stream module outputs data;
cal_code_1: sequential computation mode across the 3 Kernel modules;
dqp_1: FPGA board 2;
addr_1: Memory read address;
len_1: length of the data the Kernel module reads from the Memory module.
At the same time, the configuration information of the RDMA_Enhance protocol for the remote FPGA accelerator card 2 is configured in the local FPGA accelerator card 1 as follows:
opcode_2: the Stream module inputs data and the DMA module outputs data, where the destination Memory is the remote host's memory;
cal_code_2: the 2 Kernel modules fetch instruction sets from the storage unit for processing;
dqp_2: FPGA board 1;
addr_2: Memory write address;
len_2: length of the data the Kernel module writes to the Memory module.
Figure 3 is a schematic structural diagram of a pooling platform that processes application data using two FPGA accelerator cards, referred to as FPGA accelerator card 1 and FPGA accelerator card 2. The Kernel unit of FPGA accelerator card 1 contains three Kernel modules: Kernel1, Kernel2, and Kernel3. The Kernel unit of FPGA accelerator card 2 contains two Kernel modules: Kernel1 and Kernel2. Note that Kernel1 of FPGA accelerator card 1 and Kernel1 of FPGA accelerator card 2 perform different operations, as do Kernel2 of card 1 and Kernel2 of card 2.
The numbers between modules in Figure 3 indicate the processing order of the application data. The processing flow includes the following steps: (1) the compressed application data is stored from the host server into Memory via the PCIe DMA module on FPGA accelerator card 1; (2) Kernel1 detects that the internal DMA controller has set the completion signal to 1 and begins reading data from Memory; (3) Kernel1 performs the decompression computation and passes the result to Kernel2; (4) Kernel2 performs the first stage of the custom algorithm model and passes the result to Kernel3; (5) Kernel3 performs the encryption computation and sends the result to the target board, FPGA accelerator card 2, in Stream mode based on the RDMA_Enhance protocol; (6) the target board receives the RDMA_Enhance protocol packet, parses it, sends the extracted fields to the logic module and the Kernel modules, and sends the data portion of the packet to Kernel1; (7) Kernel1 performs the decryption computation and sends the result to Kernel2; (8) Kernel2 reads computation instructions from the storage unit, performs the second stage of the custom algorithm model, and stores the result in Memory; (9) after storing the data in Memory, Kernel2 sets the DMA internal register Memory_wr_done to 1; the DMA then fetches the data from Memory, organizes it into RDMA_Enhance protocol data, and transmits it to the host server of the target board, FPGA accelerator card 1, completing the acceleration task.
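The nine-step, two-card flow above can be mocked in software to show the data path; `zlib` stands in for the real decompression Kernel, a byte-wise XOR for the encryption and decryption Kernels, and the two "compute" stages are arbitrary placeholders, so none of these operations are the actual Kernel implementations:

```python
import zlib

KEY = 0x5A  # toy XOR key; not real cryptography

def xor_crypt(data: bytes) -> bytes:
    """Byte-wise XOR; applying it twice restores the original data."""
    return bytes(b ^ KEY for b in data)

def card1(compressed: bytes) -> bytes:
    data = zlib.decompress(compressed)  # Kernel1: decompress
    data = data.upper()                 # Kernel2: stage-1 compute (placeholder)
    return xor_crypt(data)              # Kernel3: encrypt, then Stream out

def card2(packet: bytes) -> bytes:
    data = xor_crypt(packet)            # Kernel1: decrypt
    return data[::-1]                   # Kernel2: stage-2 compute (placeholder)

result = card2(card1(zlib.compress(b"app data")))
assert result == b"ATAD PPA"
```

The chaining of `card1` into `card2` is the whole point: once the configuration information is in the packet header, each card knows what to do with the payload without a separate configuration step in between.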
When performing this accelerated computation task, the prior art requires five configuration operations: (1) after the data storage of step 1 is complete, a configuration is needed to trigger Kernel computation; (2) before step 5, a configuration is needed to trigger the data movement; (3) after step 6, a configuration (for example by way of an Ethernet packet) is needed to trigger Kernel computation; (4) before step 8, after Kernel2 completes its computation, a configuration is needed to trigger data storage; (5) before step 9, a configuration is needed to trigger the data movement.
The prior art thus requires five configuration operations to complete the application data processing in this example, whereas the embodiment of the present application uses the custom RDMA_Enhance protocol to simplify the RDMA protocol content: it is only necessary to add configuration information to the custom fields of the transmission protocol based on the application acceleration requirement before processing the application data. The entire processing of the application data is completed with a single configuration, which simplifies the internal processing logic and effectively improves the processing efficiency of the application data.
In this embodiment of the application, to implement packet-loss detection for the application data, a packet sequence number can be set in the configuration information. After transmitting the processed application data to the board indicated by the address information, the FPGA board can determine whether the processed application data matches the packet sequence number; when the processed application data does not match the packet sequence number, packet-loss prompt information carrying the missing sequence number can be fed back to the host server.
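The packet-sequence-number check described here can be sketched as follows; the function name and the representation of the feedback information as a plain list of missing numbers are assumptions for the sketch:

```python
def check_psn(received_psns, expected_start=0):
    """Return the missing sequence numbers; packet-loss prompt information
    carrying these numbers would be fed back to the host server."""
    missing = []
    expected = expected_start
    for psn in sorted(received_psns):
        while expected < psn:
            missing.append(expected)  # gap in the sequence: a packet was lost
            expected += 1
        expected = psn + 1
    return missing

assert check_psn([0, 1, 3, 4]) == [2]  # packet 2 was lost
assert check_psn([0, 1, 2]) == []      # nothing missing
```

Because the psn travels inside the same custom field as the rest of the configuration information, the receiving board can run this check without any extra control traffic.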
Figure 4 is a schematic structural diagram of a data processing apparatus for a pooling platform provided by an embodiment of the present application, including an adding unit 41, a receiving unit 42, a processing unit 43, and a transmission unit 44;
the adding unit 41 is configured to add configuration information to the custom fields of a transmission protocol based on an application acceleration requirement, where the configuration information includes an operation identifier, address information, and calculation information matching the application acceleration requirement;
the receiving unit 42 is configured to receive application data transmitted by a host server;
the processing unit 43 is configured to process the application data according to the operation identifier and the calculation information;
the transmission unit 44 is configured to transmit the processed application data to the board indicated by the address information, until processing of the application data on the different boards of the pooling platform is complete, at which point the operation ends.
The operation identifier indicates the type of operation to be performed, the address information indicates the board that processes the application data, and the calculation information indicates the specific operation to be performed on the application data.
Different application acceleration requirements invoke different boards, and the operations each board performs also differ. Therefore, for the current application acceleration requirement, the custom field can be divided into bytes and the configuration information set in the divided bytes, ensuring that each board in the pooling platform can complete the processing of the application data by relying on the configuration information.
In some embodiments, when the application acceleration requirement corresponds to multiple Kernel modules on an FPGA board and at least one Kernel module corresponds to multi-instruction computation, the calculation information includes operation-sequence instructions and an instruction address, where the instruction address points to the instructions required by the application acceleration requirement;
the processing unit is configured to have the multiple Kernel modules, following the operation-sequence instructions, successively invoke the instructions pointed to by the instruction address to process the application data.
In some embodiments, when the application acceleration requirement corresponds to multiple Kernel modules on an FPGA board and each Kernel module corresponds to single-instruction computation, the calculation information includes the instructions required by the application acceleration requirement;
the processing unit is configured to have the multiple Kernel modules process the application data according to their respective instructions.
In some embodiments, when the application acceleration requirement corresponds to a single Kernel module on an FPGA board used for internal computation, the calculation information includes an instruction address, where the instruction address points to the internal computation instructions required by the application acceleration requirement;
the processing unit is configured to have the Kernel module invoke the internal computation instructions according to the instruction address to process the application data.
In some embodiments, when the operation identifier is a remote direct memory access operation identifier, the address information includes a target board ID, a read/write identifier determined from the calculation information and the remote direct memory access operation identifier, and the transmission length of the remote direct memory access operation.
In some embodiments, when the operation identifier is a stream operation identifier, the address information includes a target board ID.
In some embodiments, the configuration information further includes a packet sequence number; the apparatus further includes a determination unit and a feedback unit;
the determination unit is configured to determine whether the processed application data matches the packet sequence number;
the feedback unit is configured to, when the processed application data does not match the packet sequence number, feed back to the host server packet-loss prompt information carrying the missing sequence number.
For descriptions of the features in the embodiment corresponding to Figure 4, refer to the related descriptions of the embodiment corresponding to Figure 1; they are not repeated here.
Figure 5 is a structural diagram of an electronic device provided by an embodiment of the present application. As shown in Figure 5, the electronic device includes: a memory 20 for storing a computer program;
a processor 21 for implementing the steps of the data processing method of the pooling platform in the above embodiments when executing the computer program.
The electronic device provided in this embodiment may include, but is not limited to, a smartphone, a tablet, a laptop, or a desktop computer.
The processor 21 may include one or more processing cores, such as a 4-core or 8-core processor. The processor 21 may be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 21 may also include a main processor and a coprocessor: the main processor, also called the CPU (Central Processing Unit), processes data in the awake state; the coprocessor is a low-power processor that processes data in the standby state.
In some embodiments, the processor 21 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 21 may further include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
The memory 20 may include one or more non-volatile readable storage media, which may be non-transitory. The memory 20 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash storage devices. In this embodiment, the memory 20 is at least used to store the following computer program 201, which, after being loaded and executed by the processor 21, can implement the relevant steps of the data processing method of the pooling platform disclosed in any of the foregoing embodiments. The resources stored in the memory 20 may also include an operating system 202 and data 203, stored either temporarily or permanently. The operating system 202 may include Windows, Unix, Linux, etc. The data 203 may include, but is not limited to, configuration information.
In some embodiments, the electronic device may further include a display screen 22, an input/output interface 23, a communication interface 24, a power supply 25, and a communication bus 26.
Those skilled in the art will understand that the structure shown in Figure 5 does not constitute a limitation on the electronic device, which may include more or fewer components than shown.
It can be understood that if the data processing method of the pooling platform in the above embodiments is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and performs all or part of the steps of the methods of the various embodiments of the present application. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), an electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, a magnetic disk, or an optical disc.
Further, as shown in Figure 6, an embodiment of the present application also provides a non-volatile readable storage medium 30 on which a computer program 31 is stored; when executed by a processor, the computer program 31 implements the steps of the data processing method of the pooling platform described above.
The functions of the functional modules of the non-volatile readable storage medium in the embodiments of the present application can be implemented according to the methods in the above method embodiments; for the specific implementation process, refer to the related descriptions of the above method embodiments, which are not repeated here.
The data processing method, apparatus, device, and non-volatile readable storage medium for a pooling platform provided by the embodiments of the present application have been introduced in detail above. The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments can be referred to each other. Since the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for relevant details, refer to the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above in general terms according to function. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present application.
The data processing method, apparatus, device, and non-volatile readable storage medium for a pooling platform provided by the present application have been introduced in detail above. Specific examples are used herein to explain the principles and implementations of the present application; the description of the above embodiments is only intended to help understand the method of the present application and its core idea. It should be noted that those of ordinary skill in the art can make several improvements and modifications to the present application without departing from its principles, and these improvements and modifications also fall within the protection scope of the claims of the present application.
Claims (20)
- A data processing method for a pooling platform, comprising: adding configuration information to custom fields of a transmission protocol based on an application acceleration requirement, wherein the configuration information comprises an operation identifier, address information, and calculation information matching the application acceleration requirement; receiving application data transmitted by a host server; and processing the application data according to the operation identifier and the calculation information, and transmitting the processed application data to a board indicated by the address information, until processing of the application data on different boards of the pooling platform is complete, at which point the operation ends.
- The data processing method for a pooling platform according to claim 1, wherein, when the application acceleration requirement corresponds to multiple Kernel modules on an FPGA board and at least one Kernel module corresponds to multi-instruction computation, the calculation information comprises operation-sequence instructions and an instruction address, the instruction address pointing to instructions required by the application acceleration requirement; and the processing the application data according to the operation identifier and the calculation information comprises: the multiple Kernel modules, following the operation-sequence instructions, successively invoking the instructions pointed to by the instruction address to process the application data.
- The data processing method for a pooling platform according to claim 1, wherein, when the application acceleration requirement corresponds to multiple Kernel modules on an FPGA board and each Kernel module corresponds to single-instruction computation, the calculation information comprises the instructions required by the application acceleration requirement; and the processing the application data according to the operation identifier and the calculation information comprises: the multiple Kernel modules processing the application data according to their respective instructions.
- The data processing method for a pooling platform according to claim 1, wherein, when the application acceleration requirement corresponds to a single Kernel module on an FPGA board used for internal computation, the calculation information comprises an instruction address, the instruction address pointing to internal computation instructions required by the application acceleration requirement; and the processing the application data according to the operation identifier and the calculation information comprises: the Kernel module invoking the internal computation instructions according to the instruction address to process the application data.
- The data processing method for a pooling platform according to claim 3 or 4, wherein the processing the application data according to the operation identifier and the calculation information comprises: the Kernel module performing decompression processing, internal computation processing, and encryption processing on the application data according to the operation identifier and the calculation information.
- The data processing method for a pooling platform according to claim 1, wherein, when the operation identifier is a remote direct memory access operation identifier, the address information comprises a target board ID, a read/write identifier determined from the calculation information and the remote direct memory access operation identifier, and a transmission length of the remote direct memory access operation.
- The data processing method for a pooling platform according to claim 1, wherein, when the operation identifier is a stream operation identifier, the address information comprises a target board ID.
- The data processing method for a pooling platform according to claim 7, wherein the processing the application data according to the operation identifier and the calculation information and transmitting the processed application data to the board indicated by the address information comprises: processing the application data according to the stream operation identifier and the calculation information, the stream operation identifier being an identifier used for transmitting application data between different FPGA boards; and transmitting the processed application data to the target board ID indicated by the address information.
- The data processing method for a pooling platform according to any one of claims 1 to 6, wherein the configuration information further comprises a packet sequence number; and after the transmitting the processed application data to the board indicated by the address information, the method further comprises: determining whether the processed application data matches the packet sequence number; and when the processed application data does not match the packet sequence number, feeding back to the host server packet-loss prompt information carrying a missing sequence number.
- The data processing method for a pooling platform according to claim 1, wherein the adding configuration information to custom fields of a transmission protocol based on an application acceleration requirement comprises: dividing the custom fields of the transmission protocol into bytes based on the application acceleration requirement, and adding the configuration information to the byte-divided custom fields.
- The data processing method for a pooling platform according to claim 10, wherein the pooling platform comprises multiple FPGA boards, the host server comprises a switch unit, and the adding configuration information to custom fields of a transmission protocol based on an application acceleration requirement comprises: obtaining an FPGA accelerator card from the multiple FPGA boards, the FPGA accelerator card being a board used for accelerated processing of the application data; marking a remote direct memory access identifier between the FPGA accelerator card and the switch unit, the remote direct memory access identifier corresponding to the RDMA_Enhance transmission protocol; and adding the configuration information according to the custom fields of the RDMA_Enhance transmission protocol based on the application acceleration requirement.
- The data processing method for a pooling platform according to claim 11, wherein the host server comprises a register, the FPGA accelerator card comprises a local FPGA accelerator card, and the adding the configuration information according to the custom fields of the RDMA_Enhance transmission protocol based on the application acceleration requirement comprises: based on the application acceleration requirement, controlling the register to add the configuration information in the local FPGA accelerator card according to the custom fields of the RDMA_Enhance transmission protocol.
- The data processing method for a pooling platform according to claim 12, wherein the FPGA accelerator card further comprises a remote FPGA accelerator card, and the adding the configuration information according to the custom fields of the RDMA_Enhance transmission protocol based on the application acceleration requirement comprises: based on the application acceleration requirement, controlling the register to configure, in the local FPGA accelerator card, the RDMA_Enhance protocol configuration information of the remote FPGA accelerator card.
- The data processing method for a pooling platform according to claim 11, wherein the FPGA accelerator card comprises a PCIe DMA module, and the receiving application data transmitted by a host server comprises: receiving, through the PCIe DMA module, the application data transmitted by the host server, the PCIe DMA module being used for data interaction with the host server.
- The data processing method for a pooling platform according to claim 14, wherein the FPGA accelerator card further comprises a memory module, and the receiving application data transmitted by a host server comprises: storing compressed application data from the host server into the memory module through the PCIe DMA module.
- The data processing method for a pooling platform according to claim 15, wherein the FPGA accelerator card further comprises at least one Kernel module, and the processing the application data according to the operation identifier and the calculation information comprises: the Kernel module reading the compressed application data from the memory module and processing the compressed application data according to the operation identifier and the calculation information.
- The data processing method for a pooling platform according to claim 11, wherein the host server comprises a coprocessor, and the processing the application data according to the operation identifier and the calculation information comprises: using the FPGA accelerator card as the coprocessor; and in a standby state, controlling the coprocessor to process the application data according to the operation identifier and the calculation information.
- A data processing apparatus for a pooling platform, comprising an adding unit, a receiving unit, a processing unit, and a transmission unit, wherein the adding unit is configured to add configuration information to custom fields of a transmission protocol based on an application acceleration requirement, the configuration information comprising an operation identifier, address information, and calculation information matching the application acceleration requirement; the receiving unit is configured to receive application data transmitted by a host server; the processing unit is configured to process the application data according to the operation identifier and the calculation information; and the transmission unit is configured to transmit the processed application data to a board indicated by the address information, until processing of the application data on different boards of the pooling platform is complete, at which point the operation ends.
- An electronic device, comprising: a memory for storing a computer program; and a processor for executing the computer program to implement the steps of the data processing method for a pooling platform according to any one of claims 1 to 17.
- A non-volatile readable storage medium, storing a computer program that, when executed by a processor, implements the steps of the data processing method for a pooling platform according to any one of claims 1 to 17.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210609570.5 | 2022-05-31 | ||
CN202210609570 | 2022-05-31 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023231330A1 true WO2023231330A1 (zh) | 2023-12-07 |
Family
ID=89026847
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/134802 WO2023231330A1 (zh) | 2022-05-31 | 2022-11-28 | 一种池化平台的数据处理方法、装置、设备和介质 |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2023231330A1 (zh) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170109290A1 (en) * | 2015-10-16 | 2017-04-20 | International Business Machines Corporation | Method to share a coherent accelerator context inside the kernel |
CN108776648A (zh) * | 2018-05-28 | 2018-11-09 | 郑州云海信息技术有限公司 | 数据传输方法、系统及fpga异构加速卡和存储介质 |
CN112241323A (zh) * | 2020-10-23 | 2021-01-19 | 浪潮(北京)电子信息产业有限公司 | 基于fpga服务器的计算资源发现及管理方法、系统 |
CN113900982A (zh) * | 2021-12-09 | 2022-01-07 | 苏州浪潮智能科技有限公司 | 一种分布式异构加速平台通信方法、系统、设备及介质 |
CN114003392A (zh) * | 2021-12-28 | 2022-02-01 | 苏州浪潮智能科技有限公司 | 一种数据加速计算方法及相关装置 |
CN115237500A (zh) * | 2022-07-29 | 2022-10-25 | 浪潮(北京)电子信息产业有限公司 | 一种池化平台的数据处理方法、装置、设备和介质 |
-
2022
- 2022-11-28 WO PCT/CN2022/134802 patent/WO2023231330A1/zh unknown
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170109290A1 (en) * | 2015-10-16 | 2017-04-20 | International Business Machines Corporation | Method to share a coherent accelerator context inside the kernel |
CN108776648A (zh) * | 2018-05-28 | 2018-11-09 | 郑州云海信息技术有限公司 | 数据传输方法、系统及fpga异构加速卡和存储介质 |
CN112241323A (zh) * | 2020-10-23 | 2021-01-19 | 浪潮(北京)电子信息产业有限公司 | 基于fpga服务器的计算资源发现及管理方法、系统 |
CN113900982A (zh) * | 2021-12-09 | 2022-01-07 | 苏州浪潮智能科技有限公司 | 一种分布式异构加速平台通信方法、系统、设备及介质 |
CN114003392A (zh) * | 2021-12-28 | 2022-02-01 | 苏州浪潮智能科技有限公司 | 一种数据加速计算方法及相关装置 |
CN115237500A (zh) * | 2022-07-29 | 2022-10-25 | 浪潮(北京)电子信息产业有限公司 | 一种池化平台的数据处理方法、装置、设备和介质 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11599490B1 (en) | Packet queueing for network device | |
US10331595B2 (en) | Collaborative hardware interaction by multiple entities using a shared queue | |
US9311110B2 (en) | Techniques to initialize from a remotely accessible storage device | |
US11956156B2 (en) | Dynamic offline end-to-end packet processing based on traffic class | |
US9280297B1 (en) | Transactional memory that supports a put with low priority ring command | |
US9678866B1 (en) | Transactional memory that supports put and get ring commands | |
WO2016115831A1 (zh) | 一种虚拟机容错的方法、装置及系统 | |
WO2021244194A1 (zh) | 寄存器的读写方法、芯片、子系统、寄存器组及终端 | |
US10409744B1 (en) | Low-latency wake-up in a peripheral device | |
US20150032835A1 (en) | Iwarp send with immediate data operations | |
US11868297B2 (en) | Far-end data migration device and method based on FPGA cloud platform | |
WO2020143237A1 (zh) | 一种dma控制器和异构加速系统 | |
US20240333766A1 (en) | Method and system for processing full-stack network card task based on fpga | |
WO2022032984A1 (zh) | 一种mqtt协议仿真方法及仿真设备 | |
CN115237500A (zh) | 一种池化平台的数据处理方法、装置、设备和介质 | |
US20230205715A1 (en) | Acceleration framework to chain ipu asic blocks | |
WO2022032990A1 (zh) | 一种命令信息传输方法、系统、装置及可读存储介质 | |
WO2024051122A1 (zh) | 一种PCIe中断处理方法、装置、设备及非易失性可读存储介质 | |
US10489322B2 (en) | Apparatus and method to improve performance in DMA transfer of data | |
CN113472523A (zh) | 用户态协议栈报文处理优化方法、系统、装置及存储介质 | |
WO2023231330A1 (zh) | 一种池化平台的数据处理方法、装置、设备和介质 | |
US9342313B2 (en) | Transactional memory that supports a get from one of a set of rings command | |
US20230153153A1 (en) | Task processing method and apparatus | |
JP6954535B2 (ja) | 通信装置 | |
US12056072B1 (en) | Low latency memory notification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22944631 Country of ref document: EP Kind code of ref document: A1 |