WO2023093065A1 - Data transmission method, computing device and computing system - Google Patents

Data transmission method, computing device and computing system

Info

Publication number
WO2023093065A1
Authority
WO
WIPO (PCT)
Prior art keywords
command
message
data
processor
computing node
Prior art date
Application number
PCT/CN2022/104116
Other languages
English (en)
French (fr)
Inventor
李思聪
勾文进
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2023093065A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/54 Interprogram communication
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 45/00 Routing or path finding of packets in data switching networks
    • H04L 45/74 Address processing for routing
    • H04L 45/745 Address table lookup; Address filtering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/10 Protocols in which an application is distributed across nodes in the network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/50 Network services
    • H04L 67/60 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources

Definitions

  • the present application relates to the field of computer technology, and in particular to a data transmission method, computing equipment and a computing system.
  • the inter-process communication includes inter-core communication within a single computing node in the HPC cluster and network communication between computing nodes.
  • both inter-core communication and network communication need to occupy computing resources, thereby reducing the efficiency of computing nodes in performing tasks.
  • the embodiment of the present application aims to provide a data transmission solution, which reduces the computing resources occupied during inter-process communication.
  • the first aspect of the present application provides a first computing device, including a first processor, at least one second processor, and a first network device, where the first network device is used to connect the first computing device to a second computing device; the at least one second processor is used to run a first process and a second process; and the first processor is used to: receive a first command sent by the first process, where the first command is used to transmit data to a target process; when it is determined according to the first command that the target process is the second process, execute the first command to send the data to the second process; and when it is determined according to the first command that the target process is a third process located on the second computing device, transmit the data to the third process through the first network device.
  • in this way, the computing resources of the second processor are saved and efficiency is improved.
  • the first process is used to call the message passing interface (MPI) to generate a command group and send the command group to the first processor, where the first command is a command in the command group.
  • when the length of the first data is less than a preset value, the first data is carried in the first command, and the first processor is specifically configured to: generate a packet including the first data, and send the packet to the second process.
  • in this way, the computing resources of the second processor are saved and efficiency is improved.
  • the first processor is specifically configured to: store the first command in a preset queue to wait for matching with a second command sent by the second process, where the second command is used for receiving the first data from the first process, and send the first data to the second process after the first command and the second command are successfully matched.
  • in this way, the computing resources of the second processor are saved and efficiency is improved.
  • when the length of the first data is greater than or equal to a preset value, the computing device further includes a memory access device, and the first processor is specifically configured to: generate a message according to the first command and send the message to the memory access device; the memory access device is configured to transmit the first data to the second process according to the message.
  • in this way, the computing resources of the second processor are saved and efficiency is improved.
  • the first processor is specifically configured to: generate a message according to the first command and send the message to the first network device; the first network device is used to transmit the first data to the third process through the second computing device according to the message.
  • in this way, the computing resources of the second processor are saved and efficiency is improved.
  • the command group includes a third command, the third command is used to receive second data from a fourth process, and the first processor is further configured to: when the fourth process is the second process, receive the second data from the second process; when the fourth process is the third process, receive the second data from the third process through the first network device; and when the third command indicates not to process the second data, execute the third command to send the second data to the first process.
  • in this way, the computing resources of the second processor are saved and efficiency is improved.
  • when the length of the second data is less than a preset value, the first processor is further configured to: when the third command indicates to process the second data, process the second data according to the third command to obtain third data, and send the third data to the first process.
  • in this way, the computing resources of the second processor are saved and efficiency is improved.
  • when the length of the second data is greater than or equal to a preset value, the first processor is further configured to: when the third command indicates to process the second data, instruct the memory access device to perform the following operations: process the second data according to the third command to obtain third data, and transmit the third data to the first process.
  • in this way, the computing resources of the second processor are saved and efficiency is improved.
  • the second aspect of the present application provides a data transmission method, the method is executed by a first computing node, the first computing node includes a first processor, at least one second processor, and a first network device, the first network device is used to connect the first computing node to a second computing node, the at least one second processor runs a first process and a second process, and the method includes: the first processor receives a first command sent by the first process, where the first command is used to transmit first data to a target process; when the first processor determines according to the first command that the target process is the second process, it executes the first command to send the first data to the second process; and when the first processor determines according to the first command that the target process is a third process located on the second computing node, it transmits the first data to the third process through the first network device.
  • the first process calls a message passing interface MPI to generate a command group, and sends the command group to the first processor, and the first command is a command in the command group.
  • the command includes a command type and a descriptor, the command type includes any of the following types: an inline type, a local direct memory access (LDMA) type, and a remote direct memory access (RDMA) type, and the format of the descriptor corresponds to the command type.
  • when the length of the first data is less than a preset value, the first data is carried in the first command, and the sending the first data to the second process includes: the first processor generates a message including the first data according to the first command, and sends the message to the second process.
  • the sending the first data to the second process includes: the first processor stores the first command in a preset queue to wait for matching with a second command sent by the second process, where the second command is used to receive the first data from the first process; after the first processor successfully matches the first command and the second command, it sends the first data to the second process.
  • when the length of the first data is greater than or equal to a preset value, the computing node further includes a memory access device, and the sending the first data to the second process includes: the first processor generates a message according to the first command and sends the message to the memory access device; the memory access device transmits the first data to the second process according to the message.
  • the first processor transmitting the first data to the third process through the first network device includes: the first processor generates a message according to the first command and sends the message to the first network device; the first network device instructs the second computing node to transmit the first data to the third process according to the message.
  • the command group includes a third command, the third command is used to receive second data from a fourth process, and the method further includes: when the fourth process is the second process, the first processor receives the second data from the second process; when the fourth process is the third process, the first processor receives the second data from the third process through the first network device; and in the case where the third command indicates not to process the second data, the first processor sends the second data to the first process according to the third command.
  • when the length of the second data is less than a preset value, the method further includes: in the case that the third command indicates to process the second data, the first processor processes the second data according to the third command to obtain third data, and sends the third data to the first process.
  • when the length of the second data is greater than or equal to a preset value, the method further includes: when the third command indicates to process the second data, the first processor instructs the memory access device to perform the following operations: process the second data according to the third command to obtain third data, and transmit the third data to the first process.
  • the third aspect of the present application provides a chip, including a processing unit and an interface, where the interface is used to receive a command sent by a first process in a first computing node, the command is used to transmit data to a target process, and the first computing node includes the chip; the interface is further configured to: when the processing unit determines according to the command that the target process is a second process in the first computing node, send the data to the second process; and when it is determined according to the command that the target process is a third process in a second computing node, transmit the data to the third process through a first network device in the first computing node.
  • the fourth aspect of the present application provides a computing system, including a first computing node and a second computing node, where the first computing node includes a first processor, a second processor, and a first network device; the first network device is connected to the second computing node; the second processor is used to run a first process and a second process; the second computing node is used to run a third process; and the first processor is used to: receive a command sent by the first process, where the command is used to transmit data to a target process; when it is determined according to the command that the target process is the second process, send the data to the second process; and when it is determined according to the command that the target process is the third process, transmit the data to the third process through the first network device.
  • FIG. 1 is a schematic diagram of a computing cluster provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a structure of a computing node provided by an embodiment of the present application
  • FIG. 3 is a flowchart of a data transmission method provided by an embodiment of the present application.
  • Fig. 4 is a schematic diagram of interprocess communication corresponding to the MPI-Bcast interface in an embodiment
  • Fig. 5 is the flow chart of the large packet message inter-process communication method corresponding to the MPI-Bcast interface provided by the embodiment of the present application;
  • FIG. 6 is a flowchart of a data transmission method provided by an embodiment of the present application.
  • Fig. 7 is a schematic diagram of inter-process communication in an embodiment
  • Fig. 8 is a schematic diagram of inter-process communication corresponding to the MPI-Reduce interface in an embodiment
  • FIG. 9 is a flow chart of a small packet message inter-process communication method corresponding to the MPI-Reduce interface provided by the embodiment of the present application.
  • Fig. 10 is a schematic diagram of inter-process communication corresponding to the MPI-Reduce interface in an embodiment
  • FIG. 11 is a flow chart of a large packet message inter-process communication method corresponding to the MPI-Reduce interface provided by the embodiment of the present application;
  • FIG. 12 is a structural diagram of a chip provided by an embodiment of the present application.
  • FIG. 1 is a schematic diagram of a computing cluster provided by an embodiment of the present application.
  • the computing cluster includes multiple computing nodes, such as computing node C0, computing node C1, computing node Cm, and so on.
  • Computing nodes are, for example, application servers, distributed file system servers, and the like.
  • the computing cluster is, for example, a high performance computing (HPC) cluster or an artificial intelligence (AI) computing cluster.
  • each node runs multiple processes in parallel; for example, multiple processes P0, P1, ..., Pi are running in parallel in the computing node C0, multiple processes Pi+1, Pi+2, ..., Pj are running in parallel in the computing node C1, and multiple processes Pj+1, Pj+2, ..., Pn are running in parallel in the computing node Cm.
  • each process can perform inter-process communication with other processes in its own node, or can perform inter-process communication with processes of other nodes.
  • for example, the process P0 in the computing node C0 performs inter-process communication with the process Pi of its own node and performs network communication with the process Pi+1 in the computing node C1.
  • the computing node C0 runs an application program (Application, APP) to generate computing tasks, and the computing tasks can be executed by multiple processes.
  • the application program can be, for example, a weather forecast program, a molecular dynamics simulation program, etc. that require a large amount of calculation.
  • Each process can implement inter-process communication through the message-passing interface (Message-Passing Interface, MPI) provided by the operating system to the application.
  • MPI includes bilateral (point-to-point) message passing interfaces, collective communication interfaces, and unilateral communication interfaces.
  • the bilateral communication interface includes, for example, a sending (Send) interface and a receiving (Receive, Recv) interface.
  • the incoming parameters of the Send interface include, for example, the current storage address information (such as the memory address) of the message, the identification of the process receiving the message, the message identification, etc.
  • the incoming parameters of the Send interface may also include a communication space (Communicator) identifier.
  • the communication space is preset by an application in the computing device, and includes a group of processes that can communicate with each other, and the communication space identifier is the group identifier of the process group.
  • the incoming parameters of the Recv interface include, for example, the address information (such as a memory address) used to store the received message, the identifier of the process sending the message, and the message identifier.
  • the incoming parameters of the Recv interface may also include a communication space identifier, etc.
  • the process P0 can send a message to the process Pk by calling the Send interface, and the process Pk can be a process in the computing node C0, or can be a process in other computing nodes.
  • the process P0 provides incoming parameters to the Send interface, and the incoming parameters may include the address Addr1 for storing the message, the identifier of the process Pk receiving the message, and the message identifier.
  • when the CPU in the computing node C0 executes the Send interface, it constructs a corresponding message according to whether the process Pk is an intra-node process, and sends the message to the process Pk; the message includes information such as the identifier of the process P0 sending the message, the identifier of the process Pk receiving the message, and the message identifier.
  • the process Pk can call the Recv interface to receive the message sent by the process P0.
  • the process Pk provides incoming parameters of the Recv interface, which may include the address Addr2 for storing the message, the identifier of the process P0 sending the message, the message identifier, and the like.
  • assuming that the process Pk is a process in the computing node C1, when the CPU in the computing node C1 executes the Recv interface, it determines, among the multiple received messages, the message that matches the call of the Recv interface, that is, the message in which the identifier of the process P0 sending the message, the identifier of the process Pk receiving the message, and the message identifier all match the Recv interface call; it then obtains the message content from that message and stores it in the address Addr2.
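  • To make the Send/Recv parameters above concrete, the following is a minimal sketch using the standard MPI C bindings; the buffer, destination rank, tag and communicator values are illustrative assumptions, not values taken from this document.

```c
/* Minimal point-to-point example with the standard MPI C bindings.
 * The buffer address, destination rank, message tag and communicator play the
 * roles of the storage address, receiving-process identifier, message
 * identifier and communication space identifier described above. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank;
    char msg[64] = "hello from P0";

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* destination rank 1, tag 42, communicator MPI_COMM_WORLD */
        MPI_Send(msg, sizeof(msg), MPI_CHAR, 1, 42, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* source rank 0 and tag 42 are the fields the receive call is
         * matched against, as described for the Recv interface above */
        MPI_Recv(msg, sizeof(msg), MPI_CHAR, 0, 42, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received: %s\n", msg);
    }

    MPI_Finalize();
    return 0;
}
```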
  • the collective communication interface specifically includes multiple types of interfaces, such as one-to-many, many-to-one, many-to-many, and so on.
  • the one-to-many type interface is used to send a message generated or obtained by one process to all processes in the communication space
  • the one-to-many type interface includes, for example, MPI-Bcast interface, MPI-Scatter interface and other interfaces.
  • the many-to-one interface is used to enable a process to receive messages generated or acquired by all processes in the communication space.
  • the many-to-one interface includes, for example, MPI-Gather interface, MPI-Reduce interface and other interfaces.
  • the many-to-many interface is used to enable all processes in the communication space to receive the messages generated or obtained by the processes respectively, and the many-to-many interface includes, for example, MPI-Allgather interface, MPI-Allreduce interface and other interfaces.
  • the MPI-Bcast interface is a one-to-many collective communication interface.
  • the MPI-Bcast interface is used to send messages from one process to all processes in the communication space where the process resides.
  • the incoming parameters of the MPI-Bcast interface include, for example, current storage address information of the message to be sent, message identifier, root process (root process) identifier, and communication space identifier and other parameters.
  • the address information includes, for example, a first address and a message length.
  • the root process refers to the process corresponding to "one" in a one-to-many type interface or the process corresponding to "one" in a many-to-one type interface; specifically, the process P0 in FIG. 1 can call the MPI-Bcast interface to send a message to the processes in the communication space.
  • the CPU in the computing node C0 can execute the MPI-Bcast interface, construct a corresponding message according to whether the process Pk is an intra-node process, and send the message to each process Pk, where the process Pk can be any one of the processes P0 to Pj.
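  • As a hedged illustration of the one-to-many call described above, the sketch below uses the standard MPI C bindings; the element count and root rank are arbitrary assumptions for the example.

```c
/* Rank 0 plays the role of the root process P0: after MPI_Bcast returns,
 * every process in the communicator holds the root's copy of buf. */
#include <mpi.h>

int main(int argc, char **argv) {
    int rank;
    double buf[1024];          /* message to be broadcast (assumed size) */
    const int root = 0;        /* root process identifier */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == root) {
        /* only the root fills the buffer; its address and length correspond
         * to the "current storage address information" parameter above */
        for (int i = 0; i < 1024; i++) buf[i] = (double)i;
    }
    MPI_Bcast(buf, 1024, MPI_DOUBLE, root, MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
```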
  • the MPI-Reduce interface is a many-to-one collective communication interface.
  • the MPI-Reduce interface is used to make a process receive multiple messages from multiple processes, sum the multiple messages and its own messages, and then send the message sum to all processes in the communication space.
  • the incoming parameters of the MPI-Reduce interface include, for example, address information for storing messages, message identifiers, root process identifiers, and communication space identifiers.
  • the process P0 in FIG. 1 can call the MPI-Reduce interface to obtain multiple messages from multiple processes (such as P0 to Pj) in the communication space, sum the multiple messages, and send the message sum to P1 to Pj respectively.
  • the CPU in the computing node C0 can execute the MPI-Reduce interface, match the messages received from the processes P1 to Pj with the calls to the Recv interface in the computing node C0, sum the multiple received messages and the message of the process P0, generate multiple messages to be sent to the processes P1 to Pj, and send these messages to the processes P1 to Pj respectively, so as to send the message sum to each process.
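  • For reference, a minimal sketch of the many-to-one sum in the standard MPI C bindings is shown below; note that MPI_Reduce leaves the sum at the root only, while the behaviour described above, where the sum is also delivered to every process, corresponds to MPI_Allreduce. Buffer sizes are illustrative assumptions.

```c
#include <mpi.h>

int main(int argc, char **argv) {
    int rank;
    double local[4], sum[4];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (int i = 0; i < 4; i++)
        local[i] = rank + i;                 /* each process's own message */

    /* many-to-one: the element-wise sum of all buffers arrives at root 0 */
    MPI_Reduce(local, sum, 4, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    /* variant in which every process obtains the sum */
    MPI_Allreduce(local, sum, 4, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
```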
  • the embodiment of the present application provides an inter-process communication scheme in which a message processing unit (Message Process Unit, MPU) is provided in the computing node to perform inter-process communication.
  • FIG. 2 is a schematic diagram of a structure of a computing node provided by an embodiment of the present application.
  • the computing node C0 may include a CPU 21 , an MPU 22 , a memory access device 23 and a network device 24 .
  • the CPU 21 includes multiple processor cores (cores), such as core cr0, core cr1, and so on.
  • a computing node may include multiple CPUs, each CPU may include multiple processor cores, and the multiple processor cores in the multiple CPUs may run multiple processes in parallel.
  • the MPU 22 may include an application specific integrated circuit (Application Specific Integrated Circuit, ASIC) chip or a field programmable gate array (Field Programmable Gate Array, FPGA) chip, and executes inter-process communication through the operation of the ASIC chip or FPGA chip.
  • the MPU 22 may include a microcontroller unit (Microcontroller Unit, MCU) and a storage unit, and the code stored in the storage unit is executed by the MCU to perform inter-process communication; the parts used for storing data in the MPU 22 are hereinafter collectively referred to as the storage unit.
  • the MPU 22 may include a scheduling unit 221 and a bilateral communication unit 222 , which will be described in detail below.
  • the memory access device 23 shown in FIG. 2 can write data to or read data from the memory, and in some implementations, the memory access device 23 can also have simple computing capabilities.
  • the memory access device 23 may be a local direct memory access (Local Direct Memory Access, LDMA) device.
  • the LDMA device is a hardware control circuit for accessing data in memory; it is a special processor for direct data transmission, which can send a bus request signal to the CPU to take over control of the bus, enter the DMA operation mode, and send address information to address the memory, so as to write data to or read data from the memory.
  • the network device shown in FIG. 2 is, for example, a remote direct memory access (Remote Direct Memory Access, RDMA) device, which may specifically include an RDMA network interface controller (RDMA-aware Network Interface Controller, RNIC) and the like.
  • RDMA is a technology for direct remote memory access, which can directly and quickly migrate data from one computing node to the memory of another remote computing node, reducing the consumption of CPU involved in the data transmission process.
  • the MPU22 is connected to the CPU21, the memory access device 23 and the network device 24 through a bus.
  • after a certain process (or application) in the computing node C0 invokes any of the above MPIs, the CPU 21 generates a command group (Command Group) according to the MPI and sends the command group to the MPU 22, so that inter-process communication is performed through the MPU 22.
  • for example, after the process P0 in the computing node C0 invokes the Bcast interface, the CPU 21 generates a plurality of commands for sending messages (message sending commands) according to the Bcast interface, and sends the plurality of message sending commands to the MPU 22, so as to respectively send data to multiple other processes in the communication space.
  • the scheduling unit 221 in the MPU22 can be used to sequentially execute the commands in the command group, and perform data transmission with the CPU21, the memory access device 23 or the network device 24.
  • the bilateral communication unit 222 can be used to match the command for receiving a message (message receiving command) received by the MPU 22 with the information of the message to be received, so as to complete the reception of the message.
  • FIG. 3 is a flow chart of a data transmission method provided by an embodiment of the present application.
  • the method may be executed by computing node C0, for example, and the method includes:
  • Step S301: the process P0 calls MPI to generate a command group, and the commands in the command group are used to transmit data to the target process;
  • Step S302: the process P0 sends the generated command group to the MPU 22;
  • Step S303: when the MPU 22 determines according to the first command in the command group that the target process is the process P1, it executes the first command to send data to the process P1;
  • Step S304: when the MPU 22 determines according to the second command in the command group that the target process is the process P2 located in the computing node C1, it transmits the data to the process P2 through the network device 24 (shown schematically as RDMA in the figure).
  • in step S301, the process P0 invokes MPI to generate a command group.
  • an MPU adaptation program module (hereinafter referred to as the MPU adaptation module) can be installed in the computing node C0, so as to generate commands to be executed by the MPU.
  • the process P0 in the computing node C0 corresponds to the core cr0, and the process P0 can call a certain MPI (such as the MPI-Bcast interface) according to the MPI provided to the application in the computing node C0.
  • after the core cr0 executes the interface call, it can run the MPU adaptation module, thereby generating a command group corresponding to the MPI, which will be executed by the MPU.
  • the arrangement sequence of the multiple commands in the command group indicates the execution sequence of each command.
  • the commands in the command group can have a data structure as shown in Table 1:
  • the command type may be any type in RDMA, LDMA or inline.
  • the RDMA type indicates that the command is used for information exchange with the RDMA device, and its descriptor corresponds to the format of the RDMA message.
  • the MPU adaptation module can determine whether the target process is a process in another node according to the incoming parameters of the process P0 to MPI; when it is determined that the target process is a process in another node, the command type of the command corresponding to the target process is set to the RDMA type.
  • the descriptor may include a sending message descriptor (Send element, SE) and a receiving message descriptor (Receive element, RE).
  • the sending message descriptor in the RDMA type command may include the current storage address of the message to be sent, the process ID of the sending process, the ID of the computing node receiving the message, the process ID of the receiving process, the message ID, and the like, and the receiving message descriptor may include the storage address used to store the received message, the identifier of the computing node sending the message, the process identifier of the sending process, the process identifier of the receiving process, the message identifier, and the like.
  • the LDMA type indicates that the command is used for information exchange with the LDMA device, and its descriptor corresponds to the format of the LDMA message.
  • the MPU adaptation module can determine whether the target process is a process in another node according to the incoming parameters of the process P0 to MPI; when it is determined that the target process is a process in the computing node C0 and the size of the target message is greater than or equal to the preset value (that is, in the case of a large packet message), the command type of the command corresponding to the target process is set to the LDMA type.
  • the sending message descriptor in the LDMA type command may include the current storage address of the message to be sent, the process ID of the sending process, the process ID of the receiving process, the message ID, and the like, and the receiving message descriptor may include the storage address used to store the received message, the process ID of the sending process, the process ID of the receiving process, the message ID, and the like.
  • the inline type indicates that the command is used for information exchange with the CPU core corresponding to another process of this node.
  • the MPU adaptation module can determine whether the target process is a process in another node according to the incoming parameters of the process P0 to MPI; when it is determined that the target process is a process in the computing node C0 and the size of the target message is less than the preset value (that is, a small packet message), the command type of the command corresponding to the target process is set to the inline type.
  • the descriptor in the command of the inline type directly includes the message to be sent to another process.
  • the operation type includes a common type (Common) and a calculation type (Calculate, Calc).
  • the normal type indicates that no computation is performed on the message indicated by the descriptor
  • the calculation type indicates that calculation is performed on the message indicated by the descriptor.
  • the operation type field in the command may further include a subfield, which is used to indicate the specific calculation to be performed, such as addition calculation, transposition processing, and the like.
  • when the command type is the LDMA type or the RDMA type, the operation type is the calculation type, and the message in the command is a large packet message, the command indicates that the LDMA device performs the calculation processing.
  • when the command type is the RDMA type, the operation type is the calculation type, and the message in the command is a small packet message, the command indicates that the MPU performs the calculation processing.
  • RSVD is a reserved field.
  • the MPU adaptation module generates the command group provided to the MPU as described above, uses a unified protocol for intra-node communication and inter-node communication, and indicates whether the command is used for intra-node communication or inter-node communication through the command type; therefore, the MPU does not need to switch protocols according to the node where the communication target process is located.
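  • To visualize the command layout described for Table 1, the following is a hypothetical C sketch; the patent does not specify field names, widths or encodings, so everything below is an assumption made purely for illustration.

```c
/* Hypothetical layout of one command in a command group, following the fields
 * described above: command type, operation type (with a calculation subtype),
 * descriptor, and a reserved field. */
#include <stdint.h>

enum cmd_type  { CMD_INLINE, CMD_LDMA, CMD_RDMA };      /* command type   */
enum op_type   { OP_COMMON, OP_CALC };                   /* operation type */
enum calc_type { CALC_NONE, CALC_SUM, CALC_TRANSPOSE };  /* Calc subfield  */

struct send_element {            /* SE: send-message descriptor            */
    uint32_t src_process;        /* identifier of the sending process      */
    uint32_t dst_node;           /* receiving computing node (RDMA only)   */
    uint32_t dst_process;        /* identifier of the receiving process    */
    uint32_t msg_id;             /* message identifier                     */
    uint64_t addr;               /* current storage address (LDMA/RDMA) or
                                    inline payload location                */
    uint32_t len;                /* message length                         */
};

struct mpu_command {
    uint8_t  cmd_type;           /* inline / LDMA / RDMA                   */
    uint8_t  op_type;            /* Common or Calc                         */
    uint8_t  calc_type;          /* which calculation, when op_type == Calc */
    uint8_t  rsvd;               /* reserved field                         */
    struct send_element se;      /* descriptor (an RE has a similar shape) */
};
```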
  • Fig. 4 is a schematic diagram of inter-process communication corresponding to the MPI-Bcast interface in an embodiment.
  • the process P0 in the computing node C0 can send a message A1 to the process P1 and the process P2 by calling the MPI-Bcast interface, where the process P1 is also a process in the computing node C0, and the process P2 is a process in the computing node C1.
  • the incoming parameters of the MPI-Bcast interface include, for example, the current storage address addr1 of the message A1 to be sent, the root process ID, the message ID, and the communication space ID.
  • the process P0 provides the above-mentioned incoming parameters to the MPI-Bcast interface when calling the MPI-Bcast interface.
  • the root process ID is the ID of the process P0, and the communication space ID is the group ID of the process group including the process P0, the process P1 and the process P2.
  • the core cr0 corresponding to the process P0 executes the MPU adaptation module after executing the call of the MPI-Bcast interface by the process P0, and first determines whether the message is a small packet message or a large packet message according to the message length in the address information; assuming that the message in this example is a small packet message, the core cr0 reads the message A1 from the address addr1.
  • the core cr0 determines that the process P1 is a process in the computing node C0 where the process P0 is located and corresponds to the core cr1, and that the process P2 is a process in the computing node C1, that is, the communication between the process P1 and the process P0 is intra-node communication, and the communication between the processes P0 and P2 is inter-node communication.
  • the core cr0 can generate the command group shown in Table 2 according to the above information:
  • the command group includes two commands of the inline type command and the RDMA type command arranged in sequence, and the sequence of the two commands indicates their execution sequence.
  • in the inline type command, the operation type is the common type and the descriptor is SE1.
  • SE1 may include the identifier of the process P1 receiving the message, the identifier of the process P0 sending the message, the message identifier, the message A1 to be sent, and the like.
  • the RDMA type command includes SE2, and SE2 may include the identifier of the computing node C1 receiving the message, the identifier of the process P2 receiving the message, the identifier of the process P0 sending the message, the identifier of the message A1, the message A1, and so on.
  • the identifier of the computing node C1 is, for example, a network card identifier of the computing node C1.
  • step S302 the process P0 sends the generated command group to the MPU22.
  • the process P0 instructs the core cr0 to send the command group to the MPU22 after generating the above command group by calling the MPI-Bcast interface.
  • after the core cr0 sends the command group to the MPU 22, it can acquire other tasks (such as other processes) and execute the new tasks; that is to say, through the inter-process communication operation performed by the MPU, CPU resources are saved and system efficiency is improved.
  • step S303 when the MPU 22 determines that the target process is the process P1 according to the first command in the command group, it executes the first command to send data to the process P1.
  • after the MPU 22 executes the first command, the message sending command is stored in the preset queue of the MPU 22 to wait for a match with a message receiving command, where the matching includes matching the process ID of the sending process, the process ID of the receiving process, and the message ID between SE1 in the message sending command and RE1 in the message receiving command.
  • the process P1 can call MPI (such as the Recv interface) to receive the message A1 in SE1.
  • the incoming parameters of the called Recv interface include, for example, the address information addr2 for storing the received message A1, the identifier of the sending process P0, the message identifier, and the like.
  • the core cr1 corresponding to the process P1 generates a corresponding message receiving command after executing the call to the Recv interface.
  • the message receiving command may include RE1, and RE1 includes the identifier of the process P0 sending the message, the identifier of the process P1 receiving the message, the message identifier, and the like; afterwards, the core cr1 sends the generated message receiving command to the MPU 22.
  • after receiving the message receiving command, the scheduling unit 221 in the MPU 22 sends the message receiving command to the bilateral communication unit 222, and the bilateral communication unit 222 determines whether a message sending command matching the message receiving command is stored in the preset queue in the storage unit of the MPU 22; if not, the bilateral communication unit 222 determines again whether a matching message sending command is stored in the preset queue after waiting for a preset time.
  • the bilateral communication unit 222 can notify the scheduling unit 221 when the above-mentioned message sending command matches the above-mentioned message receiving command, and the scheduling unit 221 obtains the message A1 from SE1 and generates an inline message for directly sending to the core cr1 corresponding to the process P1.
  • the inline message includes, for example, the identifier of the process P0 that sends the message, the identifier of the process P1 that receives the message, and the message A1 to be sent.
  • the MPU22 sends the inline message to the core cr1.
  • the core cr1 can store the message in the received inline message into the message storage address Addr2 in RE1 in the message receiving command, so as to complete the intra-node communication between the process P0 and the process P1, that is, the sending of data from the process P0 to the process P1.
  • by carrying out the communication between the process P0 and the process P1 through the process described above, the MPU 22 replaces the CPU in matching the message sending command and the message receiving command, and after the matching succeeds, generates an inline message and sends it to the core cr1, reducing the CPU resource usage.
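  • The following is a minimal sketch of the send/receive matching that the bilateral communication unit is described as performing; the structure and function names are illustrative assumptions, since the document does not define them.

```c
/* A queued message sending command matches a message receiving command when
 * the sending-process identifier, receiving-process identifier and message
 * identifier in their descriptors agree. */
#include <stdbool.h>
#include <stdint.h>

struct descriptor {
    uint32_t src_process;   /* process sending the message   */
    uint32_t dst_process;   /* process receiving the message */
    uint32_t msg_id;        /* message identifier            */
};

static bool commands_match(const struct descriptor *se,  /* from send command */
                           const struct descriptor *re)  /* from recv command */
{
    return se->src_process == re->src_process &&
           se->dst_process == re->dst_process &&
           se->msg_id      == re->msg_id;
}

/* Scan the preset queue of pending send commands for one that matches a newly
 * arrived receive command; return its index, or -1 if no match exists yet (the
 * unit then re-checks after a preset waiting time). */
static int find_matching_send(const struct descriptor *queue, int queued,
                              const struct descriptor *re)
{
    for (int i = 0; i < queued; i++)
        if (commands_match(&queue[i], re))
            return i;
    return -1;
}
```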
  • the MPU 22 is not limited to sending the small packet message sent by the process P0 to the process P1 through the above process; it can also send a large packet message sent by the process P0 to the target process by executing the first command, and this process will be described in detail below with reference to FIG. 5.
  • step S304 when the MPU 22 determines that the target process is the process P2 located in the computing node C1 according to the second command in the command group, it sends the data to the network device 24 (such as an RDMA device) to transfer the data to the process P2.
  • the MPU 22 executes the second command in the command group; after determining that the target process (process P2) is a process in the computing node C1, the scheduling unit 221 generates an RDMA message for sending to the RDMA device according to SE2 in the command.
  • the RDMA message includes, for example, the message A1, the identifier of the process P0 that sent the message A1, the identifier of the computing node C1 that received the message A1, the identifier of the process P2 that received the message A1, and the like.
  • the MPU22 sends the RDMA packet to the RDMA device of the computing node C0.
  • after receiving the RDMA message, the RDMA device of the computing node C0 generates a message to be sent to the RDMA device of the computing node C1, and the message includes the message A1, the identifier of the process P0 that sent the message, the identifier of the computing node C1 that receives the message, the identifier of the process P2 that receives the message, and the like.
  • the RDMA device of the computing node C1 can send the message to the MPU in the computing node C1, and after the bilateral communication unit in the MPU of the computing node C1 successfully matches the message with the message receiving command sent by the process P2, the message can be sent to the process P2, thereby completing the communication between the process P0 and the process P2.
  • if an error occurs in the above process, the MPU can notify the MPU adaptation module, and the adaptation module will handle the error.
  • FIG. 5 is a flow chart of a large packet message inter-process communication method corresponding to the MPI-Bcast interface provided by the embodiment of the present application. This method can be executed by computing node C0
  • step S501 the process P0 invokes MPI to generate a command group.
  • process P0 can send message A1 to process P1 and process P2 by calling the MPI-Bcast interface.
  • the incoming parameters of the MPI-Bcast interface include, for example, the current memory address addr1 of the message A1 to be sent, the ID of the root process P0, the message ID, and the communication space ID.
  • the core cr0 corresponding to the process P0 executes the MPU adaptation module after executing the call of the MPI-Bcast interface by the process P0, and first determines whether the message A1 to be sent is a small packet message or a large packet message according to the message length in the address information; it is assumed that the message in this example is a large packet message.
  • the core cr0 determines that the process P1 is a process in the computing node C0 where the process P0 is located and corresponds to the core cr1, and that the process P2 is a process in the computing node C1, that is, the communication between the process P1 and the process P0 is intra-node communication, and the communication between the processes P0 and P2 is inter-node communication.
  • the core cr0 can generate the command group shown in Table 3 according to the above information:
  • the command group includes two commands of an LDMA type command and an RDMA type command arranged in sequence, and the sequence of the two commands indicates their execution sequence.
  • in the LDMA type command, the operation type is the common type, and the descriptor is SE3, which may include the identifier of the process P1 receiving the message, the identifier of the process P0 sending the message, the identifier of the message A1, the storage address addr1 of the message A1 to be sent, and the like.
  • the RDMA type command includes SE4, and SE4 may include the identifier of the process P2 receiving the message, the identifier of the process P0 sending the message, the identifier of the message A1, the storage address addr1 of the message A1 to be sent, and the like.
  • step S502 the process P0 sends the command groups shown in Table 3 to the MPU22.
  • for this step, reference may be made to the description of step S302 above, and details are not repeated here.
  • step S503 the MPU 22 generates an LDMA message according to the first command in the command group.
  • after the MPU 22 receives the command group of Table 3, it first executes the first command in the command group; after the MPU 22 executes the first command, the message sending command is stored in the preset queue in the storage unit of the MPU 22, waiting for matching with a message receiving command, where the matching includes matching the process ID of the sending process, the process ID of the receiving process, and the message ID between SE3 in the message sending command and RE in the message receiving command.
  • the process P1 can call MPI (such as the Recv interface) to receive the message A1 indicated by SE3.
  • the incoming parameters of the Recv interface include, for example, the memory address addr2 for storing the message A1, the identifier of the process P0 sending the message, the identifier of the message A1, and the like.
  • after the core cr1 corresponding to the process P1 executes the call to the Recv interface, it generates a corresponding message receiving command.
  • the message receiving command may include RE, and the RE includes the identifier of the process P0 that sends the message, the identifier of the process P1 that receives the message, the identifier of the message A1, the address addr2 for storing the message A1, and the like.
  • the core cr1 sends the generated message receiving command to the MPU22.
  • after receiving the message receiving command, the scheduling unit 221 in the MPU 22 sends the message receiving command to the bilateral communication unit 222, and the bilateral communication unit 222 determines whether a message sending command matching the message receiving command is stored in the preset queue of the MPU 22.
  • the bilateral communication unit 222 notifies the scheduling unit 221 when the above-mentioned message sending command and the above-mentioned message receiving command match successfully, and the scheduling unit 221 generates an LDMA message for sending to the LDMA device according to the above-mentioned message sending command and the above-mentioned message receiving command.
  • the LDMA message includes, for example, the identifier of the process PO that sends the message, the identifier of the process P1 that receives the message, the current storage address addr1 of the message A1, and the address addr2 that will be used to store the message A1.
  • step S504 the MPU 22 sends the LDMA message to the LDMA device.
  • step S505 the LDMA device sends data to the process P1 according to the LDMA message.
  • after receiving the LDMA message, the LDMA device obtains the message A1 from the address addr1 and stores the message A1 in the address addr2, thereby completing the data transmission between the process P0 and the process P1. By carrying out the communication between the process P0 and the process P1 through the process described above, the MPU 22 replaces the CPU in matching the SE in the message sending command with the RE in the message receiving command, generating the LDMA message, and instructing the LDMA device to transfer the message between different addresses, which reduces the occupation of CPU resources.
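  • A minimal sketch of this large-packet path is given below; memcpy stands in for the DMA hardware, and all structure and function names are assumptions for illustration only.

```c
/* After a send command (SE, holding the message's current address addr1)
 * matches a receive command (RE, holding the destination address addr2), the
 * MPU builds an LDMA message and the LDMA device moves the data between the
 * two addresses without involving a CPU core. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct ldma_msg {
    uint32_t src_process;  /* process sending the message (P0)          */
    uint32_t dst_process;  /* process receiving the message (P1)        */
    void    *src_addr;     /* current storage address (addr1)           */
    void    *dst_addr;     /* address reserved by the receiver (addr2)  */
    size_t   len;          /* message length                            */
};

/* What the scheduling unit is described as doing after a successful match. */
static struct ldma_msg build_ldma_msg(uint32_t src_p, uint32_t dst_p,
                                      void *addr1, void *addr2, size_t len)
{
    struct ldma_msg m = { src_p, dst_p, addr1, addr2, len };
    return m;
}

/* What the LDMA device is described as doing with that message. */
static void ldma_execute(const struct ldma_msg *m)
{
    memcpy(m->dst_addr, m->src_addr, m->len);   /* DMA transfer stand-in */
}
```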
  • step S506 the MPU 22 executes the second command in the command group (that is, the RDMA type message sending command), and generates an RDMA packet according to the command.
  • step S507: the MPU 22 sends the RDMA message to the RDMA device, so as to send the data to the process P2 in the computing node C1.
  • for steps S506 and S507, reference may be made to the above description of step S304, which will not be repeated here.
  • FIG. 6 is a flow chart of a data transmission method provided by an embodiment of the present application.
  • the method may be executed by computing node C0, for example, and the method includes:
  • step S601 process P0 sends data to MPU22;
  • step S602 the RDMA device sends the second data to the MPU22;
  • Step S603 the process P1 calls the MPI to generate a command, and the command is used to receive data from the target process;
  • Step S604 the process P1 sends the generated command to the MPU22;
  • step S605 the MPU22 executes a command to send data to the process P1.
  • Fig. 7 is a schematic diagram of inter-process communication in an embodiment.
  • the process P1 in the computing node C0 can receive the message A1 from the process P0 by calling the Recv interface, and the process P1 can also receive the message A2 from the process P2 by calling the Recv interface.
  • the process P0 and the process P1 are processes in the computing node C0
  • the process P2 is a process in the computing node C1.
  • step S601 the process P0 sends data to the MPU22.
  • the process P0 can call the MPI interface to send the message A1 to the process P1 through the method shown in FIG. 3; as described above, a message sending command can thus be generated, and the process P0 sends the message A1 to the MPU 22 by sending the message sending command to the MPU 22.
  • sending the message A1 to the MPU22 includes sending the message A1 directly to the MPU22 when the message A1 is a small packet message, or sending the storage address of the message A1 to the MPU22 when the message A1 is a large packet message.
  • step S602 the RDMA device sends data to the MPU22.
  • the process P2 in the computing node C1 can send the message A2 to the RDMA device in the computing node C0 through the method shown in FIG. 3 , so that the RDMA device can send the message A2 to the MPU22.
  • step S603 the process P1 invokes MPI to generate a command for receiving data from the target process.
  • step S604 process P1 sends the generated command to MPU22
  • Process P1 can receive message A1 or message A2 by calling the Recv interface.
  • the incoming parameters provided to the Recv interface include, for example, the address information used to store the received message (such as the memory address addr1), the identification of the process sending the message (process P0 or Process P2), message ID (A1 or A2).
  • the core cr1 corresponding to the process P1 executes the MPU adaptation module after executing the call of the Recv interface by the process P1.
  • for the call of the Recv interface used to receive A1, the core cr1 first determines whether the message is a small packet message or a large packet message according to the message length in the address information; assuming that the message is a small packet message in this example, the core cr1 can generate the following command:
  • RE1 includes an address addr1 for storing the received message, an identifier of the process P0 sending the message, an identifier of the process P1 receiving the message, and an identifier of the message A1.
  • RE2 includes the address addr2 for storing the received message, the identifier of the process P2 sending the message, the identifier of the process P1 receiving the message, and the identifier of the message A2.
  • step S605 MPU 22 executes a command to send data to process P1.
  • after receiving the above command, the MPU 22 matches the information of the received data (message A1 or message A2) according to the identifier of the process that sends the message, the identifier of the process that receives the message, and the message identifier in RE1 or RE2; after the match succeeds, it sends the data to the process P1. For example, for the above command for receiving the message A1, the MPU 22 may generate a message including the message A1 and send the message to the core cr1, and the core cr1 stores the message in the address addr1 in the command, thereby sending the message A1 to the process P1. For the above command for receiving the message A2, the MPU 22 may instruct the LDMA device to transfer the message A2 from its current storage address to the address addr2 in the command.
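  • The size-based dispatch just described can be sketched as follows; the threshold value and all names are illustrative assumptions, and memcpy again stands in for the inline delivery and the LDMA hardware.

```c
/* Once a receive command matches data waiting in the MPU, a small message is
 * returned directly as an inline message to the receiving core, while a large
 * message is handed to the LDMA device for an address-to-address transfer. */
#include <stddef.h>
#include <string.h>

#define SMALL_MSG_LIMIT 256   /* assumed "preset value" separating small/large */

static void send_inline_to_core(const void *data, size_t len, void *dst_addr)
{
    memcpy(dst_addr, data, len);       /* core stores the inline payload */
}

static void instruct_ldma(const void *cur_addr, void *dst_addr, size_t len)
{
    memcpy(dst_addr, cur_addr, len);   /* LDMA transfer stand-in */
}

static void deliver_to_process(const void *data, size_t len, void *dst_addr)
{
    if (len < SMALL_MSG_LIMIT)
        send_inline_to_core(data, len, dst_addr);   /* e.g. message A1 */
    else
        instruct_ldma(data, dst_addr, len);         /* e.g. message A2 */
}
```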
  • Fig. 8 is a schematic diagram of inter-process communication corresponding to the MPI-Reduce interface in an embodiment.
  • the inter-process communication corresponding to the MPI-Reduce interface includes two stages: phase 1 and phase 2.
  • in phase 1, other processes in the communication space (such as the process P1, the process P2, and the process P3) send their respective messages (A1, A2, and A3) to the root process (such as the process P0), and the root process P0 receives the multiple messages, adds the multiple messages (A1, A2 and A3) and the message A0 of the process P0, and thereby obtains the sum of all the messages; in phase 2, the process P0 sends the message sum B2 to the process P1, the process P2 and the process P3 respectively.
  • the process P0 in the computing node C0 can perform the above process by calling the MPI-Reduce interface.
  • FIG. 9 is a flow chart of a small packet message inter-process communication method corresponding to the MPI-Reduce interface provided by the embodiment of the present application. This process is executed, for example, by computing node C0.
  • step S901 the process P0 invokes the MPI-Reduce interface to generate a command group.
  • the process P0 in the computing node C0 can call the MPI-Reduce interface according to the MPI provided by the system to the application; after the core cr0 executes the interface call, it can run the MPU adaptation module to generate the command group corresponding to the MPI-Reduce interface.
  • the incoming parameters of the MPI-Reduce interface include, for example, the current storage address addr1 of the message A0 of the process P0, the address addr2 to be used to store the message sum, the message identifier, the identifier of the root process P0, and the communication space identifier, where the communication space identifier is used to indicate a process group composed of the process P0, the process P1, the process P2 and the process P3.
  • after the core cr0 executes the call of the process P0 to the MPI-Reduce interface, it first determines whether the message to be processed is a small packet message or a large packet message according to the message length in the address information; it is assumed that the message is a small packet message in this example.
  • the core cr0 determines that the process P1 is a process in the computing node C0 where the process P0 is located and corresponds to the core cr1, and that the process P2 and the process P3 are processes in the computing node C1, that is, the communication between the process P1 and the process P0 is intra-node communication, and the communication between the process P0 and the processes P2 and P3 is inter-node communication.
  • the core cr0 can generate the command group shown in Table 4 according to the above information:
  • the command group includes 5 commands arranged in order, and the arrangement order of the 5 commands indicates their execution order.
  • the first command includes RE2 and RE3
  • RE2 may include the identification of the process P2 that sent the message A2, the identification of the process P0 that received the message A2, the identification of the message A2, and the storage address addr2 for storing the received message A2, etc.
  • RE3 may include the identity of the computing node C1 that sent the message A3, the identity of the process P3 that sent the message A3, the identity of the process P0 that received the message A3, the identity of the message A3, the storage address addr2 for storing the received message A3, etc. .
  • the command type "RDMA" in the first command is used to indicate the receipt of the message A2 and the message A3 from the RDMA device, and the operation type "Calc" is used to indicate that the two received messages are to be processed by calculation, where the subfield "SUM" of the operation type can be used to indicate that the calculation is specifically an addition.
  • when the MPU 22 executes the first command, it can determine whether the MPU 22 or the LDMA device performs the calculation according to the size of the messages to be received.
  • the second command includes RE1, which may include the identifier of the process P1 that sent the message A1, the identifier of the process P0 that received the message A1, the identifier of the message A1, the identifier of the sum of the messages A0, A2 and A3, etc.
  • the command type "inline" in the second command is used to indicate the receipt of the message A1 from the core cr1 corresponding to the process P1, and the subfield "SUM" of the operation type "Calc" is used to indicate that the MPU 22 adds the message A1 received from the core cr1, the message A0, and the message sum B1 of the message A2 and the message A3, so as to obtain the message sum B2 of the four messages (A0, A1, A2, A3).
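  • The small-packet accumulation described for these two commands can be sketched as follows; the element count and all names are assumptions for illustration, and the buffers simply stand in for the messages A0 to A3.

```c
/* First command (RDMA, Calc/SUM): B1 = A2 + A3, received from the RDMA device.
 * Second command (inline, Calc/SUM): B2 = A0 + A1 + B1, where A1 arrives from
 * the local core; the element-wise addition is the "SUM" calculation subtype. */
#include <stddef.h>

static void accumulate(double *acc, const double *msg, size_t n)
{
    for (size_t i = 0; i < n; i++)
        acc[i] += msg[i];
}

static void reduce_small_packet(double *a0, const double *a1,
                                const double *a2, const double *a3, size_t n)
{
    double b1[64];                       /* sketch assumes n <= 64          */
    for (size_t i = 0; i < n; i++)
        b1[i] = a2[i] + a3[i];           /* B1 = A2 + A3                    */
    accumulate(a0, a1, n);               /* a0 now holds A0 + A1            */
    accumulate(a0, b1, n);               /* a0 now holds the message sum B2 */
}
```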
  • In step S902, the process P0 sends the command group to the MPU22.
  • In step S903, the RDMA device sends an RDMA packet to the MPU22, and the RDMA packet includes the message A2 sent by the process P2 in the computing node C1 to the process P0.
  • In the computing node C1, the process P2 can call the Send interface or the MPI-Bcast interface to send the message A2 to the process P0.
  • After the core cr2 corresponding to the process P2 runs the interface, similarly to the above, it can generate an RDMA-type command and send it to the MPU of the computing node C1, so that that MPU generates an RDMA message according to the command and sends the RDMA message to the RDMA device of the computing node C1.
  • The RDMA message includes the identifier of the process P2 sending the message A2, the identifier of the process P0 receiving the message A2, the message A2, the identifier of the message A2, and the like.
  • After receiving the RDMA message, the RDMA device of the computing node C1 generates a message to be sent to the RDMA device of the computing node C0.
  • That message includes the identifier of the process P2 that sent the message A2, the identifier of the process P0 that receives the message A2, the message A2, the identifier of the message A2, and so on.
  • After receiving the message sent by the RDMA device of the computing node C1, the RDMA device of the computing node C0 determines that the size of the message A2 is smaller than the preset value and therefore generates an RDMA message to be sent to the MPU22, which includes the identifier of the process P2 that sent the message A2, the identifier of the process P0 that receives the message A2, the message A2, the identifier of the message A2, and the like.
  • After receiving the RDMA message, the MPU22 stores the RDMA message in a preset queue of the storage unit of the MPU22, waiting for a match with a message receiving command.
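The passage implies that the receiving RDMA device hands a small packet's payload directly to the MPU, whereas for a large packet only the metadata is forwarded (see step S1103 below). A minimal sketch of that decision follows; the threshold constant and structure names are assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SMALL_PKT_THRESHOLD 256   /* assumed "preset value", in bytes */

struct rdma_msg {
    unsigned src_pid, dst_pid, msg_id;
    size_t   len;                                   /* length of the message, e.g. A2 */
    unsigned char payload[SMALL_PKT_THRESHOLD];     /* carried only for small packets */
    bool     has_payload;
};

/* Build the message that the local RDMA device forwards to the MPU. */
void rdma_to_mpu(struct rdma_msg *out, const struct rdma_msg *in)
{
    *out = *in;
    if (in->len < SMALL_PKT_THRESHOLD) {
        out->has_payload = true;    /* small packet: include the message itself */
    } else {
        out->has_payload = false;   /* large packet: forward identifiers only   */
        memset(out->payload, 0, sizeof out->payload);
    }
}
```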
  • step S904 the RDMA device sends an RDMA message to the MPU22, and the RDMA message includes information about the message A3 sent to the process P0 by the process P3 in the computing node C1.
  • In step S905, the MPU22 combines the messages in the two RDMA packets according to the first command in the command group to obtain the first result B1.
  • After receiving the command group shown in Table 4, the MPU22 first executes the first command.
  • the scheduling unit 221 in the MPU 22 sends the first command to the bilateral communication unit 222 after executing the first command.
  • The bilateral communication unit 222 determines whether RDMA messages matching RE2 and RE3 in this command have been received in the preset queue, where the matching compares the sending-process identifier, the receiving-process identifier and the message identifier in RE2 or RE3 with those in the RDMA message. After RE2 and RE3 have each been matched successfully against a received RDMA message, the bilateral communication unit 222 notifies the scheduling unit 221, which obtains the message A2 and the message A3 from the two RDMA messages, computes their sum A2+A3=B1, and stores the first result B1 in the storage unit of the MPU22.
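The matching can be pictured as a comparison of the three identifiers carried by the receive descriptor and by each queued message. The C sketch below assumes simplified descriptor and queue types; it is not a literal implementation of the bilateral communication unit 222.

```c
#include <stdbool.h>

struct recv_elem  { unsigned src_pid, dst_pid, msg_id; };             /* e.g. RE2 or RE3 */
struct queued_msg { unsigned src_pid, dst_pid, msg_id; bool valid; };

/* Return the index of the first queued RDMA message matching RE, or -1 if none. */
int match_receive(const struct recv_elem *re, const struct queued_msg *q, int qlen)
{
    for (int i = 0; i < qlen; i++) {
        if (q[i].valid &&
            q[i].src_pid == re->src_pid &&   /* sending process    */
            q[i].dst_pid == re->dst_pid &&   /* receiving process  */
            q[i].msg_id  == re->msg_id)      /* message identifier */
            return i;
    }
    return -1;  /* not yet received; the unit re-checks after waiting */
}
```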
  • step S906 the process P1 sends an inline message sending command to the MPU22.
  • the message sending command may include the identifier of the process P1 sending the message A1, the identifier of the process P0 receiving the message A1, the identifier of the message A1, and the message A1.
  • The process P1 can call MPI (e.g., the Send interface) to send the message A1 to the process P0.
  • the incoming parameters of the Send interface include, for example, the current storage address information of the message A1, the identifier of the process P0 receiving the message A1, the identifier of the message A1, and the like.
  • After executing the call to the Send interface, the core cr1 corresponding to the process P1 reads the message A1 and generates an inline message sending command, which may include the identifier of the process P1 sending the message A1, the identifier of the process P0 receiving the message A1, the identifier of the message A1, the message A1, and so on.
  • the core cr1 sends the generated message sending command to the MPU22.
  • After receiving the message sending command, the MPU22 stores it in the preset queue of the storage unit in the MPU22, waiting to be matched with a message receiving command.
  • In step S907, according to the second command in the command group (i.e., the inline message receiving command) and the received inline message sending command, the MPU22 combines the first result B1 with the message A1 in the message sending command and the message A0 of the process P0 to obtain the second result B2.
  • The MPU22 then executes the second command in Table 4; after executing it, the scheduling unit 221 in the MPU22 sends the second command to the bilateral communication unit 222.
  • The bilateral communication unit 222 determines whether an inline message sending command matching the message receiving command has been received in the preset queue, where the matching compares the sending-process identifier, the receiving-process identifier and the message identifier in RE1 of the message receiving command with those in the SE of the message sending command. After the match succeeds, the scheduling unit 221 obtains the message A0 according to RE1, reads the first result B1 from the MPU22, obtains the message A1 from the SE, and computes A1+B1+A0=B2, the second result.
  • In step S908, the MPU22 sends the second result to the process P0. Specifically, the MPU22 generates an inline message including the above second result and sends the inline message to the core cr0. After receiving the inline message, the core cr0 can obtain the second result and, according to the address addr2 in RE1, store the second result at the address addr2, so that the process P0 can obtain the second result.
  • step S909 the process P1 sends an inline message receiving command to the MPU22.
  • In step S910, the MPU22 executes the third command in Table 4 and sends the second result B2 to the process P1. Specifically, after the inline-type message receiving command is successfully matched with the third command, an inline message including the second result B2 is generated and sent to the core cr1 corresponding to the process P1, and the core cr1 stores the second result B2 at the address in the inline-type message receiving command.
  • In step S911, the MPU22 generates two RDMA packets to be sent to the RDMA device according to the RDMA-type message sending commands (i.e., the fourth command and the fifth command) in the command group.
  • In step S912, the MPU22 sends the two generated RDMA packets to the RDMA device, so that they are delivered to the processes P2 and P3 respectively.
  • For the above steps S911 to S915, reference may be made to the description of FIG. 3 above; details are not repeated here.
  • Fig. 10 is a schematic diagram of inter-process communication corresponding to the MPI-Reduce interface in an embodiment.
  • As shown in FIG. 10, in phase 1, the other processes in the communication space (for example, the process P1 and the process P2) send their respective messages (A1, A2) to the root process (for example, the process P0); after receiving the messages, the root process P0 adds the message A1, the message A2 and its own message A0 to obtain the sum B1 of all the messages. In phase 2, the process P0 sends the message sum B1 to the process P1 and the process P2 respectively.
  • the process P0 in the computing node C0 can perform the above process by calling the MPI-Reduce interface.
  • the inter-process communication diagram shown in FIG. 10 corresponds to the inter-process communication flow shown in FIG. 11.
  • FIG. 11 is a flow chart of a large packet message inter-process communication method corresponding to the MPI-Reduce interface provided by the embodiment of the present application. This process is executed, for example, by computing node C0.
  • step S1101 the process P0 invokes the MPI-Reduce interface to generate a command group.
  • The process P0 in the computing node C0 can call the MPI-Reduce interface from the MPI provided by the system to the application. After the core cr0 executes the interface call, it can run the MPU adaptation module to generate the command group corresponding to the MPI-Reduce interface.
  • The input parameters of the MPI-Reduce interface include, for example, the current storage address addr1 of the message A0 of the process P0, the address addr2 to be used to store the message sum, the message identifier, the identifier of the root process P0, and the communication space identifier.
  • The communication space identifier is used to indicate the process group composed of the process P0, the process P1 and the process P2.
  • The core cr0 executes the MPU adaptation module after the process P0 calls the MPI-Reduce interface. It first determines, according to the message length in the address information, whether the message to be processed is a small-packet message or a large-packet message; in this example the message is assumed to be a large-packet message.
  • The core cr0 then determines that the process P1 is a process in the computing node C0 where the process P0 is located and corresponds to the core cr1, and that the process P2 is a process in the computing node C1; that is, communication between the process P1 and the process P0 is intra-node communication, and communication between the process P0 and the process P2 is inter-node communication. Afterwards, the core cr0 can generate the command group shown in Table 5 according to the above information:
  • the command group includes 4 commands arranged in order, and the arrangement order of these 4 commands indicates their execution order.
  • the first command includes RE2, and RE2 may include the identifier of the process P2 that sent the message A2, the identifier of the process P0 that received the message A2, the identifier of the message A2, and the storage address addr2 for storing the received message A2, etc.
  • the command type "RDMA" in the first command indicates receiving the message A2 from the RDMA device, and the operation type "normal" indicates only receiving the message without processing the message.
  • The second command includes RE1, which may include the identifier of the process P1 that sent the message A1, the identifier of the process P0 that received the message A1, the identifier of the message A1, the address addr2 used to store the message sum, the current storage address addr1 of the message A0, etc.
  • The command type "LDMA" in the second command indicates receiving the message A1 from the LDMA device, and the operation type "Calc" indicates that the LDMA device performs an addition on the received message A1, the message A0 and the message A2 stored at the address addr2, obtains the message sum B1 of the three messages, and stores the message sum B1 at the address addr2 in RE1.
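Under the same hypothetical struct mpu_cmd and enum values as in the earlier sketch, the four-command group of Table 5 for the large-packet case could be written out as below; the descriptor contents (process identifiers, message identifiers, addr1/addr2/addr4) would be filled in by the MPU adaptation module and are omitted here.

```c
/* Assumed encoding of the Table 5 command group (large-packet MPI-Reduce). */
static const struct mpu_cmd reduce_large_group[4] = {
    { .type = CMD_RDMA, .op = OP_NORMAL,   .ndesc = 1 }, /* 1: receive A2; the RDMA device writes it to addr2 */
    { .type = CMD_LDMA, .op = OP_CALC_SUM, .ndesc = 1 }, /* 2: LDMA device adds A0 + A1 + A2 -> B1 at addr2   */
    { .type = CMD_LDMA, .op = OP_NORMAL,   .ndesc = 1 }, /* 3: deliver B1 to process P1 via the LDMA device   */
    { .type = CMD_RDMA, .op = OP_NORMAL,   .ndesc = 1 }, /* 4: deliver B1 to process P2 via the RDMA device   */
};
```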
  • step S1102 the process P0 sends the command group to the MPU22.
  • step S1103 the RDMA device sends an RDMA packet to the MPU22, and the RDMA packet includes information related to the message A2 sent to the process P0 by the process P2 in the computing node C1.
  • This step is different from step S903 in FIG. 9 in that, since the message A2 is a large packet message, the RDMA message does not include the message A2 itself.
  • step S1104 the MPU 22 receives the message according to the RDMA type message reception command in the command group.
  • After receiving the RDMA message, the MPU22 can match the message receiving command (i.e., the first command) in the command group against the RDMA message. After the matching succeeds, the MPU22 instructs the RDMA device, according to the RDMA message, to store the message A2 at the memory address addr2 specified in the message receiving command for storing the received message.
  • step S1105 the process P1 sends an LDMA type message sending command to the MPU22.
  • the process P1 can call MPI (such as the Send interface) to send the message A1 to the process P0.
  • the incoming parameters of the Send interface include, for example, the current storage address addr4 of the message A1, the identifier of the process P0 receiving the message A1, the identifier of the message A1, and the like.
  • the core cr1 corresponding to the process P1 generates a message sending command after executing the call to the Send interface, and the message sending command may include the identification of the process P1 sending the message A1, the identification of the process P0 receiving the message A1, the identification of the message A1, the current storage address addr4 of the message A1, and so on.
  • the core cr1 sends the generated message sending command to the MPU22.
  • After receiving the message sending command, the MPU22 stores it in a storage unit in the MPU22 to wait for matching.
  • step S1106 the MPU 22 generates an LDMA message according to the LDMA type message receiving command (that is, the second command) and the LDMA type message sending command in the command group.
  • the MPU 22 then executes the second command in Table 5, and the scheduling unit 221 in the MPU 22 sends the second command to the bilateral communication unit 222 after executing the second command.
  • The bilateral communication unit 222 determines whether a message sending command matching the message receiving command has been received in the preset queue of the storage unit, where the matching compares the sending-process identifier, the receiving-process identifier and the message identifier in the RE of the message receiving command with those in the SE of the message sending command.
  • After the MPU22 successfully matches the second command with the message sending command sent by the process P1, it generates an LDMA message according to the operation type "Calc" to instruct the LDMA device to add the message A0, the message A1 and the message A2.
  • the LDMA message may include the storage address addr1 of the message A0, the storage address addr2 of the message A2, the storage address addr4 of the message A1, and an instruction to add the message A0, the message A1 and the message A2.
  • step S1107 the MPU22 sends an LDMA message to the LDMA device.
  • step S1108 the LDMA device combines multiple messages according to the LDMA message to obtain a first result B1.
  • After receiving the LDMA message, the LDMA device, as instructed by the LDMA message, reads the message A0 from the address addr1, reads the message A2 from the address addr2, and reads the message A1 from the address addr4, computes the sum B1 of A0, A1 and A2, and stores the message sum B1 at the address addr2 in the second command.
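The addition performed by the LDMA device can be pictured as an element-wise sum over the three buffers named in the LDMA message. The sketch below assumes fixed-length double-precision payloads and an assumed message layout; it is not a description of the actual device logic.

```c
#include <stddef.h>

/* Assumed form of the LDMA message built by the MPU in step S1106. */
struct ldma_msg {
    const double *a0;   /* addr1: message A0 of process P0                */
    const double *a1;   /* addr4: message A1 from process P1              */
    double       *a2;   /* addr2: message A2, overwritten with the sum B1 */
    size_t        n;    /* element count of each message                  */
};

/* Element-wise sum A0 + A1 + A2, stored back at addr2 as the first result B1. */
void ldma_calc_sum(const struct ldma_msg *m)
{
    for (size_t i = 0; i < m->n; i++)
        m->a2[i] = m->a0[i] + m->a1[i] + m->a2[i];
}
```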
  • the process P0 can obtain the first result B1.
  • step S1109 the process P1 sends an LDMA type message reception command to the MPU22.
  • In step S1110, the MPU22 generates an LDMA message after the LDMA-type message receiving command is successfully matched with the LDMA-type message sending command (i.e., the third command) in the command group, where the current storage address of the message in the LDMA message can be the above address addr2.
  • step S1111 the MPU 22 sends the generated LDMA message to the LDMA device.
  • In step S1112, the LDMA device sends the first result B1 to the process P1 according to the LDMA message; specifically, it transfers the first result B1 from the address addr2 to the address in the LDMA-type message receiving command.
  • In step S1113, the MPU22 generates an RDMA packet according to the RDMA-type message sending command (i.e., the fourth command) in the command group, and in step S1114, the MPU22 sends the RDMA packet to the RDMA device. For the above steps S1109 to S1114, reference may be made to the description of FIG. 5 above; details are not repeated here.
  • FIG. 12 is a structural diagram of a chip provided by an embodiment of the present application. The chip includes a processing unit 121 and an interface 122.
  • The interface 122 is configured to receive a command sent by a first process in a first computing node, where the command is used to transmit data to a target process and the first computing node includes the chip.
  • The interface 122 is further configured to: when the processing unit 121 determines according to the command that the target process is a second process in the first computing node, send the data to the second process; and when the processing unit 121 determines according to the command that the target process is a third process in a second computing node, transmit the data to the third process through the first network device in the first computing node.
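The behaviour of the interface 122 can be summarized as a small dispatch routine: deliver locally when the target process resides in the same computing node, otherwise hand the data to the first network device. The helper functions and the locality test below are placeholders for illustration only; the real lookup and transport paths are not specified in this passage.

```c
#include <stdbool.h>
#include <stdio.h>

struct data_cmd { unsigned target_pid; const void *data; unsigned len; };

/* Placeholder helpers standing in for unspecified chip-internal paths. */
static bool target_is_local(unsigned pid)             { return pid < 100; /* assumed */ }
static void send_to_local_process(const struct data_cmd *c)
{ printf("local delivery to process %u (%u bytes)\n", c->target_pid, c->len); }
static void send_via_network_device(const struct data_cmd *c)
{ printf("delivery to remote process %u via the first network device (%u bytes)\n",
         c->target_pid, c->len); }

/* Sketch of the dispatch performed by the processing unit 121 through the interface 122. */
void dispatch(const struct data_cmd *cmd)
{
    if (target_is_local(cmd->target_pid))
        send_to_local_process(cmd);       /* target is the second process (same node)  */
    else
        send_via_network_device(cmd);     /* target is the third process (second node) */
}
```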
  • Those skilled in the art can understand that all or part of the steps of the above method embodiments may be implemented by hardware related to program instructions, and the aforementioned program can be stored in a computer-readable storage medium.
  • When executed, the program performs all or part of the steps of the above method embodiments; and the aforementioned storage medium includes various media that can store program code, such as read-only memory (ROM), random-access memory (RAM), magnetic disks or optical discs.
  • In the above embodiments, all or part of them may be implemented by software, hardware, firmware or any combination thereof.
  • When implemented using software, they may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present invention are produced in whole or in part. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from a website, computer, server or data center to another website, computer, server or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line) or wireless (e.g., infrared, radio, microwave) means.
  • The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center integrating one or more available media.
  • The available media can be magnetic media (for example, floppy disks, hard disks, tapes), optical media, or semiconductor media (such as a solid state disk (Solid State Disk, SSD)), etc.
  • In the several embodiments provided in this application, it should be understood that the disclosed devices and methods can be implemented in other ways without exceeding the scope of the present application.
  • the above-described embodiments are only illustrative.
  • the division of the modules or units is only a logical function division. In actual implementation, there may be other division methods.
  • For example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented.
  • The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or may be distributed over multiple network units.
  • Part or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment. It can be understood and implemented by those skilled in the art without creative effort.

Abstract

本申请实施例提供一种数据传输方法、计算设备和计算系统,所述计算设备包括第一处理器、第二处理器、和第一网络设备,第一网络设备用于连接计算设备至目标计算设备,第二处理器用于运行第一进程及第二进程;第一处理器用于:接收第一进程发送的命令,命令用于传输数据至目标进程;根据所述命令确定目标进程为第二进程时,则执行第一命令,以发送数据至第二进程;根据命令确定目标进程为位于第二计算设备的第三进程时,则通过第一网络设备将数据传输至第三进程。

Description

数据传输方法、计算设备及计算系统
本申请要求于2021年11月25日提交中国专利局、申请号为202111413487.2、申请名称为“一种通信加速的方法及装置”的中国专利申请、以及于2022年3月9日提交中国专利局、申请号为202210234247.4、申请名称为“数据传输方法、计算设备及计算系统”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及计算机技术领域,具体涉及一种数据传输方法、计算设备及计算系统。
背景技术
在高性能计算(High-Performance Computing,HPC)场景中,HPC集群在执行应用的任务时,需要进行频繁的进程间通信。其中,该进程间通信包含HPC集群中单个计算节点内的核间通信和计算节点之间的网络通信。在相关技术中,核间通信和网络通信都需要占用计算资源,从而降低了计算节点执行任务的效率。
发明内容
本申请实施例旨在提供一种数据传输方案,减少了在进行进程间通信时占用的计算资源。
本申请第一方面提供一种第一计算设备,包括第一处理器、至少一个第二处理器、和第一网络设备,所述第一网络设备用于连接所述第一计算设备至第二计算设备,所述至少一个第二处理器用于运行第一进程及第二进程;所述第一处理器用于:接收所述第一进程发送的第一命令,所述第命令用于传输数据至目标进程;根据所述第一命令确定所述目标进程为所述第二进程时,则执行所述第一命令,以发送所述数据至所述第二进程;根据所述第一命令确定所述目标进程为位于所述第二计算设备的第三进程时,则通过所述第一网络设备将所述数据传输至所述第三进程。
通过由第一处理器接收命令,并根据命令将数据发送给目标进程,节省了第二处理器的计算资源,提高了效率。
在一种实施方式中,所述第一进程用于调用消息传递接口MPI生成命令组,将所述命令组发送给所述第一处理器,所述第一命令为所述命令组中的命令。
在一种实施方式中,所述第一数据的长度小于预设值,所述第一数据携带在所述第一命令中,所述第一处理器具体用于:根据所述第一命令生成包括所述第一数据的报文,并将所述报文发送至所述第二进程。
通过由第一处理器根据命令生成报文并发送给目标进程,节省了第二处理器的计算资源,提高了效率。
在一种实施方式中,所述第一处理器具体用于:在预设队列中存储所述第一命令,以等待与所述第二进程发送的第二命令进行匹配,所述第二命令用于从所述第一进程接收所述第一数据,在对所述第一命令和所述第二命令匹配成功之后,将所述第一数据发送给所 述第二进程。
通过在预设队列中存储消息发送命令,以等待与消息接收命令进行匹配,节省了第二处理器的计算资源,提高了效率。
在一种实施方式中,所述第一数据的长度大于或者等于预设值,所述计算设备还包括内存访问设备,所述第一处理器具体用于:根据所述第一命令生成报文,并将所述报文发送给所述内存访问设备;所述内存访问设备用于根据所述报文将所述第一数据传输至所述第二进程。
通过由第一处理器生成报文并发送给内存访问设备以将数据发送给目标进程,节省了第二处理器的计算资源,提高了效率。
在一种实施方式中,所述第一处理器具体用于:根据所述第一命令生成报文,并将所述报文发送给所述第一网络设备;所述第一网络设备用于根据所述报文将所述第一数据通过所述第二计算设备传输至所述第三进程。
通过由第一处理器生成报文并发送给网络设备以将数据发送给目标进程,节省了第二处理器的计算资源,提高了效率。
在一种实施方式中,所述命令组中包括第三命令,所述第三命令用于从第四进程接收第二数据,所述第一处理器还用于:在所述第四进程为所述第二进程时,从所述第二进程接收所述第二数据;在所述第四进程为所述第三进程时,通过所述第一网络设备从所述第三进程接收所述第二数据;在所述第三命令指示不对所述第二数据进行处理时,执行所述第三命令,以将所述第二数据发送给所述第一进程。
通过由第一处理器接收目标进程发送的数据并将数据发送给第一进程,节省了第二处理器的计算资源,提高了效率。
在一种实施方式中,所述第二数据的长度小于预设值,所述第一处理器还用于:在所述第三命令指示对所述第二数据进行处理时,根据所述第三命令对所述第二数据进行处理,得到第三数据,将所述第三数据发送给所述第一进程。
通过由第一处理器对接收的数据进行处理之后再发送给第一进程,节省了第二处理器的计算资源,提高了效率。
在一种实施方式中,所述第二数据的长度大于或者等于预设值,所述第一处理器还用于:在所述第三命令指示对所述第二数据进行处理时,指示所述内存访问设备进行以下操作:根据所述第三命令对所述第二数据进行处理,得到第三数据,将所述第三数据传输至所述第一进程。
通过由第一处理器指示内存访问设备对接收的数据进行处理之后再发送给第一进程,节省了第二处理器的计算资源,提高了效率。
本申请第二方面提供一种数据传输方法,所述方法由第一计算节点执行,所述第一计算节点包括第一处理器、至少一个第二处理器、和第一网络设备,所述第一网络设备用于连接所述第一计算节点至第二计算节点,所述至少一个第二处理器运行第一进程及第二进程,所述方法包括:所述第一处理器接收所述第一进程发送的第一命令,所述第一命令用于传输第一数据至目标进程;所述第一处理器根据所述第一命令确定所述目标进程为所述第二进程时,则执行所述第一命令,以发送所述第一数据至所述第二进程;所述第一处理器根据所述第一命令确定所述目标进程为位于所述第二计算节点的第三进程时,则通过所 述第一网络设备将所述第一数据传输至所述第三进程。
在一种实施方式中,所述第一进程调用消息传递接口MPI生成命令组,将所述命令组发送给所述第一处理器,所述第一命令为所述命令组中的命令。
在一种实施方式中,所述命令中包括命令类型和描述符,所述命令类型包括以下任一种类型:内联类型、本地直接内存访问LDMA类型、远端直接内存访问RDMA类型,所述描述符的格式与所述命令类型对应。
在一种实施方式中,所述第一数据的长度小于预设值,所述第一数据携带在所述第一命令中,所述发送所述第一数据至所述第二进程包括:所述第一处理器根据所述第一命令生成包括所述第一数据的报文,并将所述报文发送至所述第二进程。
在一种实施方式中,所述发送所述第一数据至所述第二进程包括:所述第一处理器在预设队列中存储所述第一命令,以等待与所述第二进程发送的第二命令进行匹配,所述第二命令用于从所述第一进程接收所述第一数据,所述第一处理器在对所述第一命令和所述第二命令匹配成功之后,将所述第一数据发送给所述第二进程。
在一种实施方式中,所述第一数据的长度大于或者等于预设值,所述计算节点还包括内存访问设备,所述发送所述第一数据至所述第二进程包括:所述第一处理器根据所述第一命令生成报文,并将所述报文发送给所述内存访问设备;所述内存访问设备根据所述报文将所述第一数据传输至所述第二进程。
在一种实施方式中,所述第一处理器通过所述第一网络设备将所述第一数据传输至所述第三进程包括:所述第一处理器根据所述第一命令生成报文,并将所述报文发送给所述第一网络设备;所述第一网络设备根据所述报文指示所述第二计算节点将所述第一数据传输至所述第三进程。
在一种实施方式中,所述命令组中包括第三命令,所述第三命令用于从第四进程接收第二数据,所述方法还包括:在所述第四进程为所述第二进程时,所述第一处理器从所述第二进程接收所述第二数据;在所述第四进程为所述第三进程时,所述第一处理器通过所述第一网络设备从所述第三进程接收所述第二数据;在所述第三命令指示不对所述第二数据进行处理的情况中,所述第一处理器根据所述第三命令将所述第二数据发送给所述第一进程。
在一种实施方式中,所述第二数据的长度小于预设值,所述方法还包括:在所述第三命令指示对所述第二数据进行处理的情况中,所述第一处理器根据所述第三命令对所述第二数据进行处理,得到第三数据,将所述第三数据发送给所述第一进程。
在一种实施方式中,所述第二数据的长度大于或者等于预设值,所述方法还包括:在所述第三命令指示对所述第二数据进行处理的情况中,所述第一处理器指示所述内存访问设备进行以下操作:根据所述第三命令对所述第二数据进行处理,得到第三数据,将所述第三数据传输至所述第一进程。
本申请第三方面提供一种芯片,包括处理单元和接口,所述接口用于接收第一计算节点中的第一进程发送的命令,所述命令用于传输数据至目标进程,所述第一计算节点包括所述芯片;所述接口还用于:在所述处理单元根据所述命令确定所述目标进程为第一计算节点中的第二进程时,发送所述数据至所述第二进程;在根据所述命令确定所述目标进程为第二计算节点中的第三进程时,则通过所述第一计算节点中的第一网络设备将所述数据 传输至所述第三进程。
本申请第四方面提供一种计算系统,包括第一计算节点和第二计算节点,所述第一计算节点包括第一处理器、第二处理器和第一网络设备,所述第一计算节点通过所述第一网络设备与所述第二计算节点连接,所述第二处理器用于运行第一进程和第二进程,所述第二计算节点用于运行第三进程,所述第一处理器用于:接收所述第一进程发送的命令,所述命令用于传输数据至目标进程;根据所述命令确定所述目标进程为所述第二进程时,则发送所述数据至所述第二进程;根据所述命令确定所述目标进程为所述第三进程时,则通过所述第一网络设备将所述数据传输至所述第三进程。
附图说明
通过结合附图描述本申请实施例,可以使得本申请实施例更加清楚:
图1为本申请实施例提供的计算集群的示意图;
图2为本申请实施例提供的计算节点的结构的示意图;
图3为本申请实施例提供的一种数据传输方法的流程图;
图4为一实施例中MPI-Bcast接口对应的进程间通信的示意图;
图5为本申请实施例提供的MPI-Bcast接口对应的大包消息进程间通信方法流程图;
图6为本申请实施例提供的一种数据传输方法的流程图;
图7为一实施例中进程间通信的示意图;
图8为一实施例中MPI-Reduce接口对应的进程间通信的示意图;
图9为本申请实施例提供的MPI-Reduce接口对应的小包消息进程间通信方法的流程图;
图10为一实施例中MPI-Reduce接口对应的进程间通信的示意图;
图11为本申请实施例提供的MPI-Reduce接口对应的大包消息进程间通信方法流程图;
图12为本申请实施例提供的一种芯片的架构图。
具体实施方式
下面将结合附图,对本申请实施例中的技术方案进行描述。
图1为本申请实施例提供的计算集群的示意图。如图1所示,计算集群中包括多个计算节点,如计算节点C0、计算节点C1、计算节点Cm等。计算节点例如为应用服务器、分布式文件系统服务器等。在例如高性能计算(High Performance Computing,HPC)场景中(例如人工智能(Artificial Intelligenc,AI)场景),每个节点并行运行多个进程,例如,计算节点C0中同时并行运行P0、P1…Pi等多个进程,计算节点C1中同时并行运行Pi+1、Pi+2…Pj等多个进程,计算节点Cm中同时并行运行Pj+1、Pj+2…Pn等多个进程。另外,如图1中所示,每个进程可以与本节点内的其他进程进行进程间通信、或者可以与其他节点的进程进行进程间通信。例如,计算节点C0中的进程P0与本节点的进程Pi、计算节点C1中的进程Pi+1等进程进行网络通信。
在相关技术中,计算节点C0运行应用程序(Application,APP)而产生计算任务,计算任务可以由多个进程执行。所述应用程序例如可以为气象预报程序、分子动力学模拟程序等需要大量计算的应用程序。各个进程可通过操作系统提供给应用的消息传递接口 (Message-Passing Interface,MPI)来实施进程间通信。MPI包括双边通信接口(point-to-point message passing interface)、集合通信接口(collective communications interface)、单边通信接口等。
具体是,双边通信接口例如包括发送(Send)接口和接收(Receive,Recv)接口。其中,Send接口的传入参数中例如包括消息的当前存储地址信息(例如内存地址)、接收消息的进程的标识、消息标识等,可选的,Send接口的传入参数中还可以包括通信空间(Communicator)标识。所述通信空间由计算设备中的应用预先设置,包括可相互通信的一组进程,通信空间标识即进程组的组标识。Recv接口的传入参数中例如包括用于存储接收的消息的地址信息(例如内存地址)、发送消息的进程的标识、消息标识,可选的,Recv接口的传入参数中还可以包括通信空间标识等。
例如,进程P0可通过调用Send接口向进程Pk发送一个消息,进程Pk可以是计算节点C0中的一个进程,或者可以是其他计算节点中的进程。进程P0提供对该Send接口的传入参数,该传入参数可包括存储该消息的地址Addr1、接收消息的进程Pk的标识、消息标识。计算节点C0中的CPU在执行该Send接口之后,根据进程Pk是否是节点内进程构建相应的报文,并将该报文发送给进程Pk,该报文中包括发送消息的进程P0的标识、接收消息的进程Pk的标识、消息标识等信息。
进程Pk可调用Recv接口来接收进程P0发送的消息。进程Pk提供该Recv接口的传入参数,该传入参数中可包括将用于存储该消息的地址Addr2、发送消息的进程P0的标识、消息标识等。假设进程Pk为计算节点C1中的进程,计算节点C1中的CPU在执行该Recv接口之后,在接收的多个报文中确定与该Recv接口调用相匹配的报文,即确定该报文中的发送消息的进程P0的标识、接收消息的进程Pk的标识、消息标识都与该Recv接口调用都匹配,从该报文中获取消息,并将消息存储到地址Addr2中。
集合通信接口具体包括多种类型的接口,例如一对多类型、多对一类型、多对多类型等。其中,一对多类型接口用于使得将一个进程生成或者获取的消息发送给通信空间中的全部进程,一对多类型接口例如包括MPI-Bcast接口、MPI-Scatter接口等接口。多对一类型接口用于使得一个进程接收通信空间中的全部进程各自生成或者获取的消息,多对一类型接口例如包括MPI-Gather接口、MPI-Reduce接口等接口。多对多类型接口用于使得通信空间中的全部进程分别接收到该全部进程分别生成或者获取的消息,多对多类型接口例如包括MPI-Allgather接口、MPI-Allreduce接口等接口。
例如,MPI-Bcast接口为一种一对多类型的集合通信接口。MPI-Bcast接口用于从一个进程将消息发送给该进程所在通信空间中的全部进程。MPI-Bcast接口的传入参数例如包括待发送消息的当前存储地址信息、消息标识、根进程(root process)标识、以及通信空间标识等参数。其中,地址信息例如包括首地址和消息长度。根进程是指在一对多类型接口中的对应于“一”的进程或者在多对一类型接口中对应于“一”的进程。具体是,图1中的进程P0可调用MPI-Bcast接口以用于将进程P0生成的消息发送给通信空间中的多个进程(例如进程P0至Pj)。计算节点C0中的CPU可执行该MPI-Bcast接口,根据进程Pk是否是节点内进程构建相应的报文,并将该报文发送给各个进程Pk,其中,进程Pk可以指进程P0至Pj中的任一进程。
例如,MPI-Reduce接口为一种多对一类型的集合通信接口。MPI-Reduce接口用于使 得一个进程从多个进程接收到多个消息,并对该多个消息及自身的消息求和之后将消息和发送给通信空间中的全部进程。MPI-Reduce接口的传入参数例如包括将用于存储消息和的地址信息、消息标识、根进程标识、以及通信空间标识等参数。具体是,图1中的进程P0可调用MPI-Reduce接口以用于从通信空间中的多个进程(例如P0至Pj)获取多个消息,对该多个消息求和,并将消息和分别发送给P1至Pj。计算节点C0中的CPU可执行该MPI-Reduce接口,将从进程P1至Pj接收的报文与计算节点C0中对Recv接口的调用进行匹配,在匹配通过之后,对接收的多个报文中的多个消息及进程P0自身的消息进行求和,并生成发送给进程P1至Pj的多个报文,将多个报文分别发送给进程P1至Pj,以将消息和分别发送各个进程。
从上面所述的过程可以看出,在进行进程间通信的过程中,计算节点的CPU需要执行多种操作,如生成报文、接收报文、对报文进行匹配、对接收的消息进行计算等。如此,频繁的进程间通信会大量占用CPU资源,降低计算节点的运行效率。
本申请实施例提供一种进程间通信方案,通过在计算节点中设置消息处理器(Message Process Unit,MPU),由MPU处理节点内的进程间通信和节点间的进程间通信,从而节省计算节点的CPU资源,提高计算节点的计算效率。
图2为本申请实施例提供的计算节点的结构的示意图。如图2所示,以计算节点C0为例,计算节点C0中可包括CPU21、MPU22、内存访问设备23和网络设备24。其中,CPU21中包括多个处理器核(core),例如核cr0、核cr1…核crn等,CPU21通过多个核分别运行多个进程,从而可以并行运行多个进程。可以理解,图2中虽然示出单个CPU,实际中,计算节点可以包括多个CPU,每个CPU都可以包括多个处理器核,多个CPU中的多个处理器核可以并行运行多个进程。本申请实施例提供的方法同样适用于一个计算节点中不同CPU运行的多个进程间的通信。在一种实施方式中,MPU22可以包括专用集成电路芯片(Application Specific Integrated Circuit,ASIC)或现场可编程门阵列芯片(Field Programmable Gate Array,FPGA),通过ASIC芯片或FPGA芯片的运行执行进程间通信。在另一种实施方式中,MPU22可以包括微控制单元(Microcontroller Unit,MCU)和存储单元,通过由MCU执行存储单元中存储的代码进行进程间通信,下文中将MPU22中用于存储数据的部件统称为存储单元。MPU22中根据功能进行划分可包括调度单元221和双边通信单元222,调度单元221和双边通信单元222将在下文中详细描述。
图2中所示的内存访问设备23可进行数据到内存的存储或者读取,在一些实施方式中,内存访问设备23还可以具有简单运算能力。例如,内存访问设备23可以为本地直接内存访问(Local Direct Memory Access,LDMA)设备。LDMA设备是进行数据到内存的存取的硬件控制电路,是一种实现直接数据传输的专用处理器,其可以向CPU发出总线请求信号,从而接管对总线的控制权,进入DMA操作方式,可以发出地址信息,对内存进行寻址,从而进行对数据到内存的存储或者读取。
图2中所示的网络设备例如为远程直接内存访问(RemoteDirect Memory Access,RDMA)设备,具体可以包括RDMA网络接口控制器(RDMA-aware Network Interface Controller,RNIC)等。RDMA是一种直接进行远程内存存取的技术,可以直接将数据从一个计算节点快速迁移到另一个远程计算节点的存储器中,减少了CPU参与数据传输过程的消耗。
MPU22通过总线与CPU21、内存访问设备23和网络设备24相互连接。在计算节点C0 中的某个进程(或者应用)调用上述任一MPI之后,CPU21根据该MPI生成命令组(Command Group),并将该命令组发送给MPU22,以用于通过MPU22实现进程间通信。例如,计算节点C0中的进程P0在调用Bcast接口之后,CPU21根据Bcast接口生成多个用于发送消息的命令(消息发送命令),将多个消息发送命令发送给MPU22,以用于分别将数据发送给进程空间中的其他多个进程。MPU22中的调度单元221可以用于顺序执行命令组中的命令,进行与CPU21、内存访问设备23或网络设备24之间的数据传输。双边通信单元222可用于对MPU22接收的用于接收消息的命令(消息接收命令)和待接收的消息的信息进行匹配,以完成对消息的接收。
图3为本申请实施例提供的一种数据传输方法的流程图,所述方法例如可由计算节点C0执行,所述方法包括:
步骤S301,进程P0调用MPI生成命令组,所述命令组中的命令用于传输数据至目标进程;
步骤S302,进程P0将生成的命令组发送给MPU22;
步骤S303,MPU22根据命令组中的第一命令确定目标进程为进程P1时,则执行第一命令,以发送数据至进程P1;
步骤S304,MPU22根据命令组中的第二命令确定目标进程为位于计算节点C1的进程P2时,通过网络设备24(图中以RDMA示意示出)将数据传输至进程P2。
下文将详细描述图3所示的各个步骤。
首先,在步骤S301,进程P0调用MPI生成命令组。
在本申请实施例中,为了适应于对MPU硬件的添加,可在计算节点C0中安装MPU适配程序模块(下文简称为MPU适配模块),以用于响应于进程对MPI的调用生成供MPU执行的命令。例如,计算节点C0中的进程PO对应于核cr0,进程PO可根据计算节点C0中提供给应用的MPI进行对某个MPI(例如MPI-Bcast接口)的调用。核cr0在执行到该接口调用之后,可运行MPU适配模块,从而生成与MPI对应的命令组,该命令组将由MPU执行。其中,在命令组包括多个命令的情况中,该命令组的多个命令的排列顺序指示各个命令的执行顺序。
所述命令组中的命令可具有如表1所示的数据结构:
命令类型 操作类型 RSVD 描述符
表1
其中,命令类型可以为RDMA、LDMA或者内联(inline)中的任一类型。其中,RDMA类型指示该命令用于进行与RDMA设备之间的信息交互,其描述符对应于RDMA报文的格式。MPU适配模块可根据进程P0对MPI的传入参数,确定目标进程是否为其他节点中的进程,当确定目标进程是其他节点中的进程的情况中,将与该目标进程对应的命令的命令类型设置为RDMA类型。其中,描述符可包括发送消息描述符(Send element,SE)和接收消息描述符(Receive element,RE)。RDMA类型命令中的发送消息描述符中可包括待发送消息的当前存储地址、发送消息的进程标识、接收消息的计算节点的标识、接收消息的进程标识、消息标识等,接收消息描述符中可包括将用于存储接收消息的存储地址、发送消息的计算节点的标识、发送消息的进程标识、接收消息的进程标识、消息标识等。
LDMA类型指示该命令用于进行与LDMA设备之间的信息交互,其描述符对应于LDMA报文的格式。MPU适配模块可根据进程P0对MPI的传入参数,确定目标进程是否为其他节点中的进程,当确定目标进程是计算节点C0中的进程、且目标消息的大小大于或等于预设值(即大包消息)的情况中,将与该目标进程对应的命令的命令类型设置为LDMA类型。LDMA类型命令中的发送消息描述符中可包括待发送消息的当前存储地址、发送消息的进程标识、接收消息的进程标识、消息标识等,接收消息描述符中可包括将用于存储接收消息的存储地址、发送消息的进程标识、接收消息的进程标识、消息标识等。
内联类型指示该命令用于进行与本节点另一进程对应的CPU核之间的信息交互。MPU适配模块可根据进程P0对MPI的传入参数,确定目标进程是否为其他节点中的进程,当确定目标进程是计算节点C0中的进程、且目标消息的大小小于预设值(即小包消息)的情况中,将与该目标进程对应的命令的命令类型设置为内联类型。内联类型的命令中的描述符中直接包括将要发送给另一进程的消息。
所述操作类型包括普通类型(Common)和计算类型(Calculate,Calc)。其中,普通类型指示不对描述符指示的消息进行计算处理,计算类型指示对描述符指示的消息进行计算处理。在操作类型为Calc类型的情况中,在命令中的操作类型字段中还可以包括子字段,该子字段用于指示将要进行的计算处理具体是什么计算,例如加法计算、转置处理等。当命令类型为内联类型、操作类型为计算类型时,该条命令指示由MPU进行计算处理。当命令类型为LDMA类型或者RDMA类型、操作类型为计算类型、且命令中的消息为大包消息时,该条命令指示由LDMA进行计算处理。当命令类型为RDMA类型、操作类型为计算类型、且命令中的消息为小包消息时,该条命令指示由MPU进行计算处理。
另外,RSVD为保留字段。MPU适配模块通过如上所述生成提供给MPU的命令组,对于节点内通信和节点间通信使用统一的协议,通过命令类型指示该命令是用于进行节点内通信还是用于进行节点间通信,从而使得MPU无需根据通信目标进程所在节点进行协议切换。
图4为一实施例中MPI-Bcast接口对应的进程间通信的示意图。如图4所示,计算节点C0中的进程P0可通过调用MPI-Bcast接口将消息A1发送给进程P1和进程P2,其中,进程P1也是计算节点C0中的进程,进程P2为计算节点C1中的进程。
如上文所述,MPI-Bcast接口的传入参数例如包括待发送消息A1的当前存储地址addr1、根进程标识、消息标识、以及通信空间标识等参数。进程P0在调用该MPI-Bcast接口时向MPI-Bcast接口提供上述传入参数。其中,根进程标识为进程PO的标识,通信空间标识为包括进程PO、进程P1和进程P2的进程组的组标识。进程P0对应的核cr0在执行进程PO对MPI-Bcast接口的调用之后执行MPU适配模块,首先根据地址信息中的消息长度确定该消息为小包消息还是大包消息,假设在该实例中消息为小包消息,核cr0从地址addr1读取该消息A1。然后,核cr0根据应用提供的通信操作的底层信息确定进程P1为进程P0所在的计算节点C0中的进程且对应于核cr1,进程P2为计算节点C1中的进程,即进程P1与进程P0之间为节点内通信,进程P0与进程P2之间为节点间通信。之后,核cr0根据上述信息可生成如表2所示的命令组:
Inline 普通 RSVD SE1
RDMA 普通 RSVD SE2
表2
其中,该命令组包括顺序排列的inline类型命令和RDMA类型命令两条命令,这两条命令的排列顺序指示了其执行顺序。在inline类型命令中,操作类型为普通类型,且描述符为SE1。SE1中可包括接收消息的进程P1的标识、发送消息的进程P0的标识、消息标识、待发送的消息A1等。在RDMA类型命令中包括SE2,SE2可包括接收消息的计算节点C1的标识接收消息的进程P2的标识、发送消息的进程P0的标识、消息A2的标识、消息A1等。其中,计算节点C1的标识例如为计算节点C1的网卡标识。
在步骤S302,进程P0将生成的命令组发送给MPU22。
具体是,进程P0通过调用MPI-Bcast接口,指示核cr0在生成上述命令组之后将该命令组发送给MPU22。核cr0在将命令组发送给MPU22之后可获取其他任务(例如其他进程)并执行该新的任务。也就是说,通过由MPU进行进程间通信操作,节省了CPU资源,提高了系统效率。
在步骤S303,MPU22根据命令组中的第一命令确定目标进程为进程P1时,则执行第一命令,以发送数据至进程P1。
具体是,MPU22在接收到表2的命令组之后,首先执行命令组中的第1条命令,MPU22在执行该第1条命令之后,在确定目标进程(进程P1)为计算节点C0中的进程之后,将该消息发送命令存储到MPU22的预设队列中,以等待与消息接收命令的匹配,所述匹配包括对消息发送命令中的SE1和消息接收命令中的RE1的发送消息的进程标识、接收消息的进程标识、消息标识的匹配。通过将消息发送命令存储到预设队列中,避免了由于MPU未接收到消息接收命令而将消息发送命令返回给CPU的情况,保证了由MPU进行对消息发送命令和消息接收命令的匹配。
同时,进程P1可调用MPI(例如Recv接口),以对SE1中的消息A1进行接收。如上文所述,该调用的Recv接口的传入参数中例如包括用于存储接收的消息A1的地址信息addr2、发送进程P0的标识、消息标识等。进程P1对应的核cr1在执行该对Recv接口的调用之后生成对应的消息接收命令,该消息接收命令中可包括RE1,该RE1中包括发送消息的进程P0的标识、接收消息的进程P1的标识、消息标识等。之后,核cr1将该生成的消息接收命令发送给MPU22。
MPU22中的调度单元221在接收到该消息接收命令之后,将该消息接收命令发送给双边通信单元222,双边通信单元222确定MPU22的存储单元中的预设队列中是否存储有与该消息接收命令匹配的消息发送命令,如果为否,则双边通信单元222在等待预设时间之后再次确定预设队列中是否存储有匹配的消息发送命令。
双边通信单元222在对上述消息发送命令与上述消息接收命令匹配成功的情况下可通知调度单元221,调度单元221从SE1中获取消息A1,生成用于直接发送给进程P1对应的核cr0的inline报文,该inline报文中例如包括发送消息的进程PO的标识、接收消息的进程P1的标识和待发送的消息A1。之后,MPU22将inline报文发送给核cr1。核cr1可将该接收的报文中的消息存入消息接收命令中的RE1中的消息存储地址Addr2中,从而完成进程P0与进程P1之间的节点内通信,即进程P0到进程P1的数据发送。通过上文所述过程进行进程P0与进程P1之间的通信,由MPU22代替CPU进行消息发送命令与消息接收命令的匹配,并在匹配成功之后生成inline报文发送给核cr1,减少了对CPU资源的占 用。
可以理解,MPU22不限于通过上述过程将进程P0发送的小包消息发送给进程P1,也可以通过执行第一命令,将进程P0发送的大包消息发送给目标进程,该过程将在下文参考图5详细描述。
在步骤S304,MPU22根据命令组中的第二命令确定目标进程为位于计算节点C1的进程P2时,将数据发送给网络设备24(例如RDMA设备),以将数据传输至进程P2。
具体是,MPU22执行命令组中的第2条命令,在确定目标进程(进程P2)为计算节点C1中的进程之后,由调度单元221根据该命令中的SE2生成用于发送给RDMA设备的RDMA报文。该RDMA报文例如包括消息A1、发送消息A1的进程P0的标识、接收消息A1的计算节点C1的标识、接收消息A1的进程P2的标识等。之后,MPU22将RDMA报文发送给计算节点C0的RDMA设备。
计算节点C0的RDMA设备在接收到该RDMA报文之后,生成发送给计算节点C1的RDMA设备的报文,该报文中包括所述消息A1、发送消息的进程P0的标识、接收消息的计算节点C1的标识、接收消息的进程P2的标识等。计算节点C1的RDMA设备在接收到该报文之后,可将该报文发送给计算节点C1中的MPU,计算节点C1的MPU中的双边通信单元在将该报文与由进程P2发送的消息接收命令匹配成功之后,可以以将该消息发送给进程P2,从而完成进程P0与进程P2的通信。
当MPU在执行命令组的任一命令时出错,MPU可通知MPU适配模块,并由该适配模块进行错误处理。
图5为本申请实施例提供的MPI-Bcast接口对应的大包消息进程间通信方法流程图。该方法可由计算节点C0执行
如图5所示,首先,在步骤S501,进程P0调用MPI生成命令组。
与参考图3所述流程类似地,进程P0可通过调用MPI-Bcast接口来向进程P1和进程P2发送消息A1。MPI-Bcast接口的传入参数例如包括待发送消息A1的当前内存地址addr1、根进程P0的标识、消息标识、以及通信空间标识等参数。进程P0对应的核cr0在执行进程PO对MPI-Bcast接口的调用之后执行MPU适配模块,首先根据地址信息中的消息长度确定待发送的消息A1为小包消息还是大包消息,假设在该实例中消息为大包消息。然后,核cr0根据应用提供的通信操作的底层信息确定进程P1为进程P0所在的计算节点C0中的进程且对应于核cr1,进程P2为计算节点C1中的进程,即进程P1与进程P0之间为节点内通信,进程P0与进程P2之间为节点间通信。之后,核cr0根据上述信息可生成如表3所示的命令组:
LDMA 普通 RSVD SE3
RDMA 普通 RSVD SE4
表3
其中,该命令组包括顺序排列的LDMA类型命令和RDMA类型命令两条命令,这两条命令的排列顺序指示了其执行顺序。在LDMA类型命令中,操作类型为普通类型,且描述符为SE3,SE3中可包括接收消息的进程P1的标识、发送消息的进程P0的标识、消息A1的标识、待发送的消息A1的存储地址addr1等。在RDMA类型命令中包括SE4,SE4可包括 接收消息的进程P2的标识、发送消息的进程P0的标识、消息A1的标识、待发送的消息A1的存储地址addr1等。
在步骤S502,进程P0将表3所示的命令组发送给MPU22。该步骤可参考上文对步骤S302的描述,在此不再赘述。
在步骤S503MPU22根据命令组中的第一命令生成LDMA报文。
具体是,MPU22在接收到表3的命令组之后,首先执行命令组中的第1条命令,MPU22在执行该第1条命令之后,将该消息发送命令存储到MPU22的存储单元中的预设队列中,以等待与消息接收命令的匹配,所述匹配包括对消息发送命令中的SE3和消息接收命令中的RE的发送消息的进程标识、接收消息的进程标识、消息标识的匹配。
进程P1可调用MPI(例如Recv接口),以对SE1中的消息A1进行接收。如上文所述,Recv接口的传入参数中例如包括用于存储消息A1的内存地址addr2、发送消息的进程P0的标识、消息A1的标识等。进程P1对应的核cr1在执行该对Recv接口的调用之后,生成对应的消息接收命令,该消息接收命令中可包括RE,该RE包括发送消息的进程P0的标识、接收消息的进程P1的标识、消息A1的标识、用于存储消息A1的地址addr2等。之后,核cr1将该生成的消息接收命令发送给MPU22。MPU22中的调度单元221在接收到该消息接收命令之后,将该消息接收命令发送给双边通信单元222,双边通信单元222确定MPU22的预设队列中是否存储有与该消息接收命令匹配的消息发送命令。
双边通信单元222在对上述消息发送命令与上述消息接收命令匹配成功的情况下通知调度单元221,调度单元221根据上述消息发送命令和上述消息接收命令生成用于发送给LDMA设备的LDMA报文。该LDMA报文中例如包括发送消息的进程PO的标识、接收消息的进程P1的标识、消息A1的当前存储地址addr1、将用于存储该消息A1的地址addr2。
在步骤S504,MPU22将LDMA报文发送给LDMA设备。
在步骤S505,LDMA设备根据LDMA报文将数据发送给进程P1。
具体是,LDMA设备在接收到该LDMA报文之后,从地址addr1获取消息A1,将该消息A1存储到地址addr2,从而完成进程P0与进程P1之间的数据传输。通过上文所述过程进行进程P0与进程P1之间的通信,由MPU22代替CPU进行消息发送命令中的SE与消息接收命令中的RE的匹配,生成LDMA报文,并指示LDMA设备进行消息在不同地址之间的转移,减少了对CPU资源的占用。
在步骤S506,MPU22执行命令组中的第二命令(即RDMA类型消息发送命令),根据该命令生成RDMA报文。在步骤S508,MPU22将RDMA报文发送给RDMA设备,以将数据发送给计算节点C1中的进程P2。其中,步骤S506和S507可参考上文对步骤S304的描述,在此不再赘述。
图6为本申请实施例提供的一种数据传输方法的流程图,所述方法例如可由计算节点C0执行,所述方法包括:
在步骤S601,进程P0向MPU22发送数据;
在步骤S602,RDMA设备向MPU22发送第二数据;
步骤S603,进程P1调用MPI生成命令,所述命令用于从目标进程接收数据;
步骤S604,进程P1将生成的命令发送给MPU22;
步骤S605,MPU22执行命令,以将数据发送给进程P1。
图7为一实施例中进程间通信的示意图。如图7所示,计算节点C0中的进程P1可通过调用Recv接口从进程P0接收消息A1,进程P1还可以通过调用Recv接口从进程P2接收消息A2。其中,进程P0和进程P1是计算节点C0中的进程,进程P2为计算节点C1中的进程。
下文将结合图7描述图6所示的各个步骤。
首先,在步骤S601,进程P0向MPU22发送数据。
进程P0可通过如图3所示的方法调用MPI接口向进程P1发送消息A1,从而如上文所述,可生成消息发送命令,进程PO通过将该消息发送命令发送给MPU22,从而将消息A1发送给MPU22。其中,将消息A1发送给MPU22包括,在消息A1为小包消息的情况中将消息A1直接发送给MPU22,或者在消息A1为大包消息的情况中将消息A1的存储地址发送给MPU22。
在步骤S602,RDMA设备向MPU22发送数据。
参考图7,计算节点C1中的进程P2可通过图3所示的方法将消息A2发送至计算节点C0中的RDMA设备,从而该RDMA设备可将消息A2发送至MPU22。
在步骤S603,进程P1调用MPI生成命令,所述命令用于从目标进程接收数据。在步骤S604,进程P1将生成的命令发送给MPU22
进程P1可通过调用Recv接口来接收消息A1或消息A2。如上文所述,进程P1在调用Recv接口时,对Recv接口提供的传入参数中例如包括用于存储接收的消息的地址信息(例如内存地址addr1)、发送消息的进程的标识(进程P0或进程P2)、消息标识(A1或A2)。
进程P1对应的核cr1在执行进程P1对Recv接口的调用之后执行MPU适配模块,对于用于接收A1的Recv接口调用,首先根据地址信息中的消息长度确定该消息为小包消息还是大包消息,假设在该实例中消息为小包消息,核cr1可生成如下所示的命令:
Inline 普通 RSVD RE1
其中,RE1中包括用于存储接收的消息的地址addr1、发送消息的进程P0的标识、接收消息的进程P1的标识、消息A1的标识。
对于用于接收A2的Recv接口调用,和cr1可生成如下所示的命令:
RDMA 普通 RSVD RE2
其中,RE2中包括用于存储接收的消息的地址addr2、发送消息的进程P2的标识、接收消息的进程P1的标识、消息A2的标识。
在步骤S605,MPU22执行命令以将数据发送给进程P1。
MPU22在接收到上述命令之后,根据RE1或RE2中的发送消息的进程的标识、接收消息的进程的标识、消息标识与已接收的数据(消息A1或消息A2)的信息进行匹配,在匹配成功之后,将数据发送给进程P1。例如对于上述用于接收消息A1的命令,MPU22可以生成包括消息A1的报文,将该报文发送给核cr1,核cr1将该报文存储到命令中的地址addr1中,从而将消息A1发送给进程P1。对于上述用于接收消息A2的命令,MPU22可以指示LDMA设备将消息A2从当前的存储地址转移到命令中的地址addr2中。
图8为一实施例中MPI-Reduce接口对应的进程间通信的示意图。如图8所示,MPI-Reduce接口对应的进程间通信包括阶段1和阶段2两个阶段。在阶段1,通信空间中的其他进程(例如进程P1、进程P2和进程P3)分别将各自的消息(A1、A2和A3)发送 给根进程(例如进程P0),根进程P0在接收多个消息之后对该多个消息(A1、A2和A3)及进程P0的消息A0进行加计算,得到全部消息的和;在阶段2,进程P0将消息和B2分别发送给进程P1、进程P2和进程P3。计算节点C0中的进程P0可通过调用MPI-Reduce接口进行上述过程。
图9为本申请实施例提供的MPI-Reduce接口对应的小包消息进程间通信方法的流程图。该流程例如由计算节点C0执行。
如图9所示,首先在步骤S901,进程P0调用MPI-Reduce接口生成命令组。
计算节点C0中的进程PO可根据系统提供给应用的MPI进行对MPI-Reduce接口的调用,核cr0在执行到该接口调用之后,可运行MPU适配模块,从而生成与MPI-Reduce接口对应的命令组。
具体是,如上文所述,MPI-Reduce接口的传入参数例如包括进程P0的消息A0的当前存储地址addr1、将用于存储消息和的地址addr2、消息标识、根进程P0的标识、以及通信空间标识等参数,其中,通信空间标识用于指示由进程PO、进程P1、进程P2和进程P3构成的进程组。核cr0在执行进程PO对MPI-Reduce接口的调用之后,首先根据地址信息中的消息长度确定将要处理的消息为小包消息还是大包消息,假设在该实例中消息为小包消息。然后,核cr0根据底层信息确定进程P1为进程P0所在的计算节点C0中的进程且对应于核cr1,进程P2和进程P3为计算节点C1中的进程,即进程P1与进程P0之间为节点内通信,进程P0与进程P2和进程P3之间为节点间通信。之后,核cr0根据上述信息可生成如表4所示的命令组:
RDMA Calc(SUM) RSVD RE2 RE3
Inline Calc(SUM) RSVD RE1
Inline 普通 RSVD SE1
RDMA 普通 RSVD SE2
RDMA 普通 RSVD SE3
表4
其中,该命令组包括顺序排列的5条命令,这5条命令的排列顺序指示了其执行顺序。其中,第1条命令包括RE2和RE3,RE2中可包括发送消息A2的进程P2的标识、接收消息A2的进程P0的标识、消息A2的标识、用于存储接收的消息A2的存储地址addr2等,RE3中可包括发送消息A3的计算节点C1的标识、发送消息A3的进程P3的标识、接收消息A3的进程P0的标识、消息A3的标识、用于存储接收的消息A3的存储地址addr2等。第1条命令中的命令类型“RDMA”用于指示从RDMA设备接收消息A2和消息A3,操作类型“Calc”用于指示对接收的两个消息进行计算,其中,操作类型的子字段“SUM”中可用于指示该计算具体为加法计算。
MPU22在执行该第1条命令时,可根据待接收的消息大小确定是由MPU22进行所述计算还是由LDMA设备进行所述计算。第2条命令中包括RE1,RE1中可包括发送消息A1的 进程P1的标识、接收消息A1的进程P0的标识、消息A1的标识、消息A0、消息A2与消息A3的消息和的标识等。第2条命令中的命令类型“inline”用于指示从进程P1对应的核cr1接收消息A1,操作类型“Calc”的子字段“SUM”用于指示MPU22对从核cr1接收的消息A1与消息A0、以及消息A2与消息A3的消息和B1进行加法计算,得到四个消息(A0、A1、A2、A3)的消息和B2。
表4中第3-5条命令可参考上文中对表2中的命令的描述,其中,SE1-SE3中的当前存储消息的地址可以为RE1-RE3中的地址addr2。
在步骤S902,进程P0将命令组发送给MPU22。
在步骤S903,RDMA设备向MPU22发送RDMA报文,该RDMA报文中包括由计算节点C1中的进程P2发送给进程P0的消息A2。
在计算节点C1中,进程P2可调用Send接口或者MPI-Bcast接口以向进程P0发送消息A2。进程P2对应的核cr2在运行该接口之后,与上文类似地,可生成RDMA类型的命令发送给MPU,从而MPU根据该命令生成RDMA报文,并将该RDMA报文发送给计算节点C1的RDMA设备,该RDMA报文中包括发送该消息A2的进程P2的标识、接收该消息A2的进程P1的标识、消息A2、消息A2的标识等。计算节点C1的RDMA设备在接收到RDMA报文之后,生成发送给计算节点C0的RDMA设备的报文,该报文中包括发送该消息A2的进程P2的标识、接收该消息A2的进程P0的标识、消息A2、消息A2的标识等。计算节点C0的RDMA设备在接收到由计算节点C1的RDMA设备发送的报文之后,在确定消息A2的大小小于预设值的情况中,生成发送给MPU22的RDMA报文,该RDMA报文中包括发送该消息A2的进程P2的标识、接收该消息A2的进程P0的标识、消息A2、消息A2的标识等。
MPU22在接收到RDMA报文之后,在MPU22的存储单元的预设队列中存储该RDMA报文,以等待与消息接收命令的匹配。
在步骤S904,RDMA设备向MPU22发送RDMA报文,该RDMA报文中包括由计算节点C1中的进程P3发送给进程P0的消息A3的相关信息。该步骤可参考上文对步骤S703的描述,在此不再赘述。
在步骤S905,MPU22根据命令组中的第一命令(即第1条命令),对两个RDMA报文中的消息进行结合处理,得到第一结果B1。
MPU22在接收到表4所示的命令组之后,首先执行第1条命令。MPU22中的调度单元221在执行该第1条命令中之后,将该第1条命令发送给双边通信单元222。双边通信单元222在预设队列中确定是否已接收到与该条命令中的RE2和RE3匹配的RDMA报文,其中,所述匹配包括RE2或RE3与RDMA报文中的发送消息的进程标识、接收消息的进程标识、消息标识的匹配。
双边通信单元222在对RE2与第1次接收的RDMA报文匹配成功之后通知调度单元221,双边通信单元222在对RE3与第2次接收的RDMA报文匹配成功之后通知调度单元221,调度单元221从两次接收的RDMA报文中获取消息A2和消息A3,计算消息A2与消息A3之和,得到第一结果A2+A3=B1,并将第一结果B1存入MPU22中的存储单元中。
在步骤S906,进程P1向MPU22发送inline类型消息发送命令,该消息发送命令中可包括发送消息A1的进程P1的标识、接收消息A1的进程P0的标识、消息A1的标识、消息A1。
进程P1可调用MPI(例如Send接口),以向进程P0发送消息A1。如上文所述,Send接口的传入参数中例如包括消息A1的当前存储地址信息、接收消息A1的进程P0的标识、消息A1的标识等。进程P1对应的核cr1在执行该对Send接口的调用之后,读取该消息A1,生成inline类型消息发送命令,该消息发送命令中可包括发送消息A1的进程P1的标识、接收消息A1的进程P0的标识、消息A1的标识、消息A1等。之后,核cr1将该生成的消息发送命令发送给MPU22。MPU22在接收到该消息发送命令之后,将该消息发送命令存储到MPU22中的存储单元的预设队列中,以等待与消息接收命令进行匹配。
在步骤S907,MPU22根据命令组中的第二命令(即inline消息接收命令)和接收的inline消息发送命令对第一结果B1与消息发送命令中的消息A1、进程P0的消息A0进行结合处理,得到第二结果B2。
具体是,MPU22执行表4中的第2条命令,MPU22中的调度单元221在执行该第2条命令之后,将该第2条命令发送给双边通信单元222。双边通信单元222在预设队列中确定是否已接收到与该条消息接收命令匹配的inline消息发送命令,其中,所述匹配包括消息接收命令中的RE1与消息发送命令中的SE中的发送消息的进程标识、接收消息的进程标识、消息标识的匹配。
双边通信单元222在对RE1与接收的消息发送命令匹配成功之后通知调度单元221,调度单元221从RE1中获取消息A0、根据RE1从MPU22中读取第一结果B1,从SE1中获取消息A1,计算消息A1与第一结果B1和消息A0之和,得到第二结果A1+B1+A0=B2。
在步骤S908,MPU22将第二结果发送给进程P0。具体是,MPU22生成包括上述第二结果的inline报文,并将该inline报文发送给核cr0。核cr0在接收到该内联报文之后,可获取第二结果,并根据RE1中的地址addr2,将第二结果存入到地址addr2中,以使得进程P0可获取该第二结果。
步骤S909,进程P1向MPU22发送inline类型消息接收命令。在步骤S910,MPU22执行表4中的第3条命令,将第二结果B2发送给进程P1,具体是,在对inline类型消息接收命令和第3条命令匹配成功之后,生成内联报文,该内联报文中包括第二结果B2,将内联报文发送给进程P1对应的核Cr1,核Cr1将第二结果B2存储到内联型消息接收命令的地址中。在步骤S911,MPU22根据命令组中的RDMA类型消息发送命令(即第4条命令和第5条命令),生成用于发送给RDMA设备的两个RDMA报文。在步骤S912,MPU22将生成的两个RDMA报文发送给RDMA设备,以分别发送给进程P2和P3。上述步骤S911-步骤S915可参考上文中对图3的描述,在此不再赘述。
图10为一实施例中MPI-Reduce接口对应的进程间通信的示意图。如图10所示,在阶段1,通信空间中的其他进程(例如进程P1和进程P2)分别将各自的消息(A1、A2)发送给根进程(例如进程P0),根进程P0在接收多个消息之后对消息A1、消息A2及进程P0的消息A0进行加计算,得到全部消息的和B1;在阶段2,进程P0将消息和B1分别发送给进程P1和进程P2。计算节点C0中的进程P0可通过调用MPI-Reduce接口进行上述过程。图10所示进程间通信图与图11所示进程间通信流程对应。
图11为本申请实施例提供的MPI-Reduce接口对应的大包消息进程间通信方法流程图。该流程例如由计算节点C0执行。
如图11所示,首先在步骤S1101,进程P0调用MPI-Reduce接口生成命令组。
计算节点C0中的进程PO可根据系统提供给应用的MPI进行对MPI-Reduce接口的调用,核cr0在执行到该接口调用之后,可运行MPU适配模块,从而生成与MPI-Reduce接口对应的命令组。
具体是,如上文所述,MPI-Reduce接口的传入参数例如包括进程P0的消息A0的当前存储地址addr1、将用于存储消息和的地址addr2、消息标识、根进程P0的标识、以及通信空间标识等参数。其中,通信空间标识用于指示由进程PO、进程P1和进程P2构成的进程组。核cr0在执行进程PO对MPI-Reduce接口的调用之后执行MPU适配模块,首先根据地址信息中的消息长度确定将要处理的消息为小包消息还是大包消息,假设在该实例中消息为大包消息。然后,核cr0根据通信操作对应的底层信息确定进程P1为进程P0所在的计算节点C0中的进程且对应于核cr1,进程P2为计算节点C1中的进程,即进程P1与进程P0之间为节点内通信,进程P0与进程P2之间为节点间通信。之后,核cr0根据上述信息可生成如表5所示的命令组:
RDMA 普通 RSVD RE2
LDMA Calc(SUM) RSVD RE1
LDMA 普通 RSVD SE1
RDMA 普通 RSVD SE2
表5
其中,该命令组包括顺序排列的4条命令,这4条命令的排列顺序指示了其执行顺序。其中,第1条命令包括RE2,RE2中可包括发送消息A2的进程P2的标识、接收消息A2的进程P0的标识、消息A2的标识、用于存储接收的消息A2的存储地址addr2等。第1条命令中的命令类型“RDMA”指示从RDMA设备接收消息A2,操作类型“普通”指示仅接收消息而不对消息进行处理。第2条命令中包括RE1,RE1中可包括发送消息A1的进程P1的标识、接收消息A1的进程P0的标识、消息A1的标识、用于存储消息和的地址addr2、消息A0的当前存储地址addr1等。第2条命令中的命令类型“LDMA”指示从LDMA设备接收消息A1,操作类型“Calc”用于指示由LDMA设备对接收的消息A1、消息A0和地址addr2中存储的消息A2进行加法计算,得到三个消息的消息和B1,并将该消息和B1存储到RE1中的地址addr2中。
表5中第3-4条命令可参考上文中对表3中的命令的描述,其中,SE1-SE2中的当前存储消息的地址可以为RE1中用于存储消息和B1的地址addr2。
在步骤S1102,进程P0将命令组发送给MPU22。
在步骤S1103,RDMA设备向MPU22发送RDMA报文,该RDMA报文中包括由计算节点C1中的进程P2发送给进程P0的消息A2的相关信息。该步骤与图9中的步骤S903不同在于,由于消息A2是大包消息,在该RDMA报文中不包括消息A2自身。
在步骤S1104,MPU22根据命令组中的RDMA类型消息接收命令接收消息。
MPU22在接收到RDMA报文之后,可将命令组中的消息接收命令(即第1条命令)与RDMA报文进行匹配。在匹配成功之后,MPU22根据RDMA报文中指示RDMA设备将该消息A2 存储到消息接收命令中的用于存储接收消息的内存地址Addr2中。
在步骤S1105,进程P1向MPU22发送LDMA类型消息发送命令。
与上文类似地,进程P1可调用MPI(例如Send接口),以向进程P0发送消息A1。如上文所述,Send接口的传入参数中例如包括消息A1的当前存储地址addr4、接收消息A1的进程P0的标识、消息A1的标识等。进程P1对应的核cr1在执行该对Send接口的调用之后,生成消息发送命令,该消息发送命令中可包括发送消息A1的进程P1的标识、接收消息A1的进程P0的标识、消息A1的标识、消息A1的当前存储地址addr4等。之后,核cr1将该生成的消息发送命令发送给MPU22。MPU22在接收到该消息发送命令之后,将该消息发送命令存储到MPU22中的存储单元中,以等待匹配。
在步骤S1106,MPU22根据命令组中的LDMA类型消息接收命令(即第2条命令)和LDMA类型消息发送命令生成LDMA报文。
具体是,MPU22之后执行表5中的第2条命令,MPU22中的调度单元221在执行该第2条命令之后,将该第2条命令发送给双边通信单元222。双边通信单元222在存储单元的预设队列中确定是否已接收到与该条消息接收命令匹配的消息发送命令,其中,所述匹配包括消息接收命令中的RE与消息发送命令中的SE中的发送消息的进程标识、接收消息的进程标识、消息标识的匹配。
在MPU22对第2条命令与进程P1发送的消息发送命令匹配成功之后,根据操作类型“Calc”生成LDMA报文,以指示LDMA设备对消息A0、消息A1和消息A2进行相加。该LDMA报文中可包括消息A0的存储地址addr1、消息A2的存储地址addr2,消息A1的存储地址addr4,和对消息A0、消息A1和消息A2进行相加的指示。
在步骤S1107,MPU22向LDMA设备发送LDMA报文。
在步骤S1108,LDMA设备根据LDMA报文对多个消息进行结合处理,得到第一结果B1。
具体是,LDMA设备在接收到LDMA报文之后,根据该LDMA报文的指示,从地址addr1读取消息A0,从地址addr2读取消息A2,从地址addr4读取消息A1,计算得到A0、A1与A2之和B1,并将消息和B1存储到第2条命令中的地址addr2中。从而使得进程P0可获取第一结果B1。
在步骤S1109,进程P1向MPU22发送LDMA类型消息接收命令。步骤S1110,MPU22在对LDMA类型消息接收命令和命令组中的LDMA类型消息发送命令(即第3条命令)匹配成功之后,生成LDMA报文,其中该LDMA报文中的消息的当前存储地址可以为上述地址addr2。在步骤S1111,MPU22将生成的LDMA报文发送给LDMA设备。在步骤S1112,LDMA设备根据LDMA报文将第一结果B1发送给进程P1,具体是,将第一结果B1从地址addr2转移到LDMA类型消息接收命令中的地址中。在步骤S1113,MPU22根据命令组中的RDMA类型消息发送命令(即第4条命令)生成RDMA报文,在步骤S1114,MPU22将RDMA报文发送给RDMA设备。上述步骤S1109-步骤S1114可参考上文中对图5中的描述,在此不再赘述。
图12为本申请实施例提供的一种芯片的架构图,所述芯片包括处理单元121和接口122,
所述接口122用于接收第一计算节点中的第一进程发送的命令,所述命令用于传输数据至目标进程,所述第一计算节点包括所述芯片;
所述接口122还用于:在所述处理单元121根据所述命令确定所述目标进程为第一计算节点中的第二进程时,发送所述数据至所述第二进程;在根据所述命令确定所述目标进程为第二计算节点中的第三进程时,则通过所述第一计算节点中的第一网络设备将所述数据传输至所述第三进程。
需要理解,本文中的“第一”,“第二”等描述,仅仅为了描述的简单而对相似概念进行区分,并不具有其他限定作用。
本领域的技术人员可以清楚地了解到,本申请提供的各实施例的描述可以相互参照,为描述的方便和简洁,例如关于本申请实施例提供的各装置、设备的功能以及执行的步骤可以参照本申请方法实施例的相关描述,各方法实施例之间、各装置实施例之间也可以互相参照。
本领域技术人员可以理解:实现上述各方法实施例的全部或部分步骤可以通过程序指令相关的硬件来完成。前述的程序可以存储于一计算机可读取存储介质中。该程序在执行时,执行包括上述各方法实施例的全部或部分步骤;而前述的存储介质包括:只读存储器(read-only memory,ROM)、随机存取存储器(random-access memory,RAM)、磁盘或者光盘等各种可以存储程序代码的介质。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本发明实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质、或者半导体介质(例如固态硬盘(Solid State Disk,SSD)等。
在本申请所提供的几个实施例中,应该理解到,所揭露的装置和方法,在没有超过本申请的范围内,可以通过其他的方式实现。例如,以上所描述的实施例仅仅是示意性的,例如,所述模块或单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。本领域普通技术人员在不付出创造性劳动的情况下,即可以理解并实施。
另外,所描述装置和方法以及不同实施例的示意图,在不超出本申请的范围内,可以与其它系统,模块,技术或方法结合或集成。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电子、机械或其它的形式。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应所述以权利要求的保护范围为准。

Claims (19)

  1. 一种第一计算设备,其特征在于,包括第一处理器、至少一个第二处理器、和第一网络设备,
    所述第一网络设备用于连接所述第一计算设备至第二计算设备,所述至少一个第二处理器用于运行第一进程及第二进程;
    所述第一处理器用于:
    接收所述第一进程发送的第一命令,所述第一命令用于传输第一数据至目标进程;
    根据所述第一命令确定所述目标进程为所述第二进程时,则执行所述第一命令,以发送所述第一数据至所述第二进程;
    根据所述第一命令确定所述目标进程为位于所述第二计算设备的第三进程时,则通过所述第一网络设备将所述第一数据传输至所述第三进程。
  2. 根据权利要求1所述的计算设备,其特征在于,
    所述第一进程用于调用消息传递接口MPI生成命令组,将所述命令组发送给所述第一处理器,所述第一命令为所述命令组中的命令。
  3. 根据权利要求1或2所述的计算设备,其特征在于,所述第一数据的长度小于预设值,所述第一数据携带在所述第一命令中,所述第一处理器具体用于:
    根据所述第一命令生成包括所述第一数据的报文,并将所述报文发送至所述第二进程。
  4. 根据权利要求3所述的计算设备,其特征在于,所述第一处理器具体用于:
    在预设队列中存储所述第一命令,以等待与所述第二进程发送的第二命令进行匹配,所述第二命令用于从所述第一进程接收所述第一数据,在对所述第一命令和所述第二命令匹配成功之后,将所述第一数据发送给所述第二进程。
  5. 根据权利要求1或2所述的计算设备,其特征在于,所述第一数据的长度大于或者等于预设值,所述计算设备还包括内存访问设备,所述第一处理器具体用于:
    根据所述第一命令生成报文,并将所述报文发送给所述内存访问设备;
    所述内存访问设备用于根据所述报文将所述第一数据传输至所述第二进程。
  6. 根据权利要求1或2所述的计算设备,其特征在于,所述第一处理器具体用于:
    根据所述第一命令生成报文,并将所述报文发送给所述第一网络设备;
    所述第一网络设备用于根据所述报文将所述第一数据通过所述第二计算设备传输至所述第三进程。
  7. 根据权利要求5所述的计算设备,其特征在于,所述命令组中包括第三命令,所述第三命令用于从第四进程接收第二数据,所述第一处理器还用于:
    在所述第四进程为所述第二进程时,从所述第二进程接收所述第二数据;
    在所述第四进程为所述第三进程时,通过所述第一网络设备从所述第三进程接收所述第二数据;
    在所述第三命令指示不对所述第二数据进行处理时,执行所述第三命令,以将所述第二数据发送给所述第一进程。
  8. 根据权利要求7所述的计算设备,其特征在于,所述第二数据的长度小于预设值,所述第一处理器还用于:在所述第三命令指示对所述第二数据进行处理时,根据所述第三命令对所述第二数据进行处理,得到第三数据,将所述第三数据发送给所述第一进程。
  9. 根据权利要求7所述的计算设备,其特征在于,所述第二数据的长度大于或者等于预设值,所述第一处理器还用于:在所述第三命令指示对所述第二数据进行处理时,指示所述内存访问设备进行以下操作:根据所述第三命令对所述第二数据进行处理,得到第三数据,将所述第三数据传输至所述第一进程。
  10. 一种数据传输方法,所述方法由第一计算节点执行,所述第一计算节点包括第一处理器、至少一个第二处理器、和第一网络设备,所述第一网络设备用于连接所述第一计算节点至第二计算节点,所述至少一个第二处理器运行第一进程及第二进程,所述方法包括:
    所述第一处理器接收所述第一进程发送的第一命令,所述第一命令用于传输第一数据至目标进程;
    所述第一处理器根据所述第一命令确定所述目标进程为所述第二进程时,则执行所述第一命令,以发送所述第一数据至所述第二进程;
    所述第一处理器根据所述第一命令确定所述目标进程为位于所述第二计算节点的第三进程时,则通过所述第一网络设备将所述第一数据传输至所述第三进程。
  11. 根据权利要求10所述的方法,其特征在于,所述第一进程调用消息传递接口MPI生成命令组,将所述命令组发送给所述第一处理器,所述第一命令为所述命令组中的命令。
  12. 根据权利要求10或11所述的方法,其特征在于,所述第一数据的长度小于预设值,所述第一数据携带在所述第一命令中,所述发送所述第一数据至所述第二进程包括:
    所述第一处理器根据所述第一命令生成包括所述第一数据的报文,并将所述报文发送至所述第二进程。
  13. 根据权利要求12所述的方法,其特征在于,所述发送所述第一数据至所述第二进程包括:
    所述第一处理器在预设队列中存储所述第一命令,以等待与所述第二进程发送的第二命令进行匹配,所述第二命令用于从所述第一进程接收所述第一数据;
    所述第一处理器在对所述第一命令和所述第二命令匹配成功之后,将所述第一数据发送给所述第二进程。
  14. 根据权利要求10或11所述的方法,其特征在于,所述第一数据的长度大于或者等于预设值,所述计算节点还包括内存访问设备,所述发送所述第一数据至所述第二进程 包括:
    所述第一处理器根据所述第一命令生成报文,并将所述报文发送给所述内存访问设备;
    所述内存访问设备根据所述报文将所述第一数据传输至所述第二进程。
  15. 根据权利要求10或11所述的方法,其特征在于,所述第一处理器通过所述第一网络设备将所述第一数据传输至所述第三进程包括:
    所述第一处理器根据所述第一命令生成报文,并将所述报文发送给所述第一网络设备;
    所述第一网络设备根据所述报文将所述第一数据通过所述第二计算节点传输至所述第三进程。
  16. 根据权利要求14所述的方法,其特征在于,所述命令组中包括第三命令,所述第三命令用于从第四进程接收第二数据,所述方法还包括:
    在所述第四进程为所述第二进程时,所述第一处理器从所述第二进程接收所述第二数据;
    在所述第四进程为所述第三进程时,所述第一处理器通过所述第一网络设备从所述第三进程接收所述第二数据;
    在所述第三命令指示不对所述第二数据进行处理时,所述第一处理器执行所述第三命令,以将所述第二数据发送给所述第一进程。
  17. 根据权利要求16所述的方法,其特征在于,所述第二数据的长度小于预设值,所述方法还包括:在所述第三命令指示对所述第二数据进行处理时,所述第一处理器根据所述第三命令对所述第二数据进行处理,得到第三数据,将所述第三数据发送给所述第一进程。
  18. 根据权利要求16所述的方法,其特征在于,所述第二数据的长度大于或者等于预设值,所述方法还包括:在所述第三命令指示对所述第二数据进行处理时,所述第一处理器指示所述内存访问设备进行以下操作:根据所述第三命令对所述第二数据进行处理,得到第三数据,将所述第三数据传输至所述第一进程。
  19. 一种计算系统,其特征在于,包括第一计算节点和第二计算节点,所述第一计算节点包括第一处理器、第二处理器和第一网络设备,所述第一计算节点通过所述第一网络设备与所述第二计算节点连接,所述第二处理器用于运行第一进程和第二进程,所述第二计算节点用于运行第三进程;
    所述第一处理器用于:接收所述第一进程发送的命令,所述命令用于传输数据至目标进程;根据所述命令确定所述目标进程为所述第二进程时,则发送所述数据至所述第二进程;根据所述命令确定所述目标进程为所述第三进程时,则通过所述第一网络设备将所述数据传输至所述第三进程。
PCT/CN2022/104116 2021-11-25 2022-07-06 数据传输方法、计算设备及计算系统 WO2023093065A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202111413487.2 2021-11-25
CN202111413487 2021-11-25
CN202210234247.4 2022-03-09
CN202210234247.4A CN116170435A (zh) 2021-11-25 2022-03-09 数据传输方法、计算设备及计算系统

Publications (1)

Publication Number Publication Date
WO2023093065A1 true WO2023093065A1 (zh) 2023-06-01

Family

ID=86413765

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/104116 WO2023093065A1 (zh) 2021-11-25 2022-07-06 数据传输方法、计算设备及计算系统

Country Status (2)

Country Link
CN (1) CN116170435A (zh)
WO (1) WO2023093065A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101300551A (zh) * 2005-11-17 2008-11-05 国际商业机器公司 对称的多处理集群环境中的进程间的通信
US20100153966A1 (en) * 2008-12-16 2010-06-17 International Business Machines Corporation Techniques for dynamically assigning jobs to processors in a cluster using local job tables
CN102185672A (zh) * 2011-03-02 2011-09-14 浪潮(北京)电子信息产业有限公司 进程间通信方法和高速网络设备
CN112202599A (zh) * 2020-09-11 2021-01-08 北京科技大学 针对异构多核平台通信优化的拓扑感知映射方法及系统
CN113287286A (zh) * 2019-01-30 2021-08-20 华为技术有限公司 通过rdma进行分布式存储节点中的输入/输出处理


Also Published As

Publication number Publication date
CN116170435A (zh) 2023-05-26

Similar Documents

Publication Publication Date Title
US8635388B2 (en) Method and system for an OS virtualization-aware network interface card
US9231849B2 (en) Apparatus and method for controlling virtual switches
US8654798B2 (en) Barrier synchronization apparatus, barrier synchronization system, and barrier synchronization method
CN113485823A (zh) 数据传输方法、装置、网络设备、存储介质
US11922304B2 (en) Remote artificial intelligence (AI) acceleration system
CN102521201A (zh) 多核数字信号处理器片上系统及数据传输方法
EP3291089B1 (en) Data processing method and apparatus
CN112291293B (zh) 任务处理方法、相关设备及计算机存储介质
US20160062779A1 (en) Announcing virtual machine migration
US20230080588A1 (en) Mqtt protocol simulation method and simulation device
US20160048402A1 (en) Hash-based load balancing for bonded network interfaces
US20180262560A1 (en) Method and system for transmitting communication data
US10621124B2 (en) Method, device and computer program product for enabling SR-IOV functions in endpoint device
WO2023104194A1 (zh) 一种业务处理方法及装置
US9116881B2 (en) Routing switch apparatus, network switch system, and routing switching method
US9479438B2 (en) Link aggregation based on virtual interfaces of VLANs
CN113691466B (zh) 一种数据的传输方法、智能网卡、计算设备及存储介质
US20130013892A1 (en) Hierarchical multi-core processor, multi-core processor system, and computer product
WO2023093065A1 (zh) 数据传输方法、计算设备及计算系统
US20230153153A1 (en) Task processing method and apparatus
CN115827524A (zh) 一种数据传输方法以及装置
JP6337469B2 (ja) 通信システム及び通信方法
CN109460379A (zh) 一种串口选择的方法及切换装置
WO2022160714A1 (zh) 一种通信方法、装置以及系统
WO2024077999A1 (zh) 集合通信方法及计算集群

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22897161

Country of ref document: EP

Kind code of ref document: A1