WO2022228485A1 - Data transmission method, data processing method and related products - Google Patents

Data transmission method, data processing method and related products

Info

Publication number
WO2022228485A1
Authority
WO
WIPO (PCT)
Prior art keywords: data, node, address, memory, memory space
Application number
PCT/CN2022/089705
Other languages
English (en)
French (fr)
Inventor
李晓峰
吴沛
杨瑞
李品生
毛修斌
Original Assignee
Huawei Technologies Co., Ltd.
Application filed by Huawei Technologies Co., Ltd.
Priority to EP22794955.9A (EP4322003A1)
Publication of WO2022228485A1
Priority to US18/496,234 (US20240061802A1)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163Interprocessor communication
    • G06F15/173Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17306Intercommunication techniques
    • G06F15/17331Distributed shared memory [DSM], e.g. remote direct memory access [RDMA]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5072Grid computing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/06Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/20Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory

Definitions

  • the present application relates to the field of computer technology, and in particular, to a data transmission method, a data processing method and related products.
  • a computer cluster is a computer system composed of a group of independent computers using a high-speed communication network.
  • When a computer cluster processes bursty services, the following situation may occur: some computing nodes in the cluster are overloaded while other computing nodes have spare resources, which affects the processing progress of the service. Therefore, how to realize resource sharing across computing nodes in a computer cluster is an urgent problem to be solved.
  • the present application provides a data transmission method, a data processing method and related products, which can realize resource sharing across computing nodes in a computer cluster.
  • the present application provides a data transmission method, which is applied to a computer system, where the computer system includes a first computing node and a second computing node, the first computing node includes a first device and a first memory, and the second computing node includes a second device and a second memory; the first memory includes a first memory space, and the second memory includes a second memory space. The above method includes the following steps:
  • the first device obtains a cross-node read instruction, where the cross-node read instruction is used to instruct the first device to read the first data from the second memory space, and the cross-node read instruction includes a first source address and the size of the first data, the first source address being the virtual address of the second memory space; the first device stores a first correspondence, and the first correspondence includes the correspondence between the virtual address of the second memory space and the ID of the second computing node;
  • the first device determines the ID of the second computing node according to the virtual address of the second memory space and the first correspondence
  • the first device obtains the first network transmission message according to the ID of the second computing node and the cross-node read instruction, and sends the first network transmission message to the second device, where the first network transmission message includes the virtual address of the second memory space and the size of the first data;
  • the second device receives the first network transmission message, reads the first data from the second memory space, and sends the first data to the first device.
  • It can be seen that the first device can read the first data from the memory of the second computing node (that is, the second memory space), thereby realizing data transmission across computing nodes and, in turn, memory resource sharing across computing nodes. Moreover, a first correspondence is stored in the first device, so the first device can obtain the first network transmission message according to the first correspondence and send it to the second device. This process bypasses the CPU and the operating system of the first computing node; therefore, the above method can also improve data transmission efficiency across computing nodes.
  • the computing nodes in the computer system share resources in a memory resource pool, and the memory resource pool includes the first memory and the second memory.
  • the first computing node further includes a first processor
  • the method further includes: the first processor addresses the address space of the memory resource pool to obtain a global virtual address of the memory resource pool;
  • the first computing node accesses the storage space of the memory resource pool through the global virtual address. In this way, any one computing node in the computer system can obtain the address of the memory space of other computing nodes, so that the memory resources of other computing nodes can be used.
  • acquiring the cross-node read instruction by the first device includes: the first processor acquires, from the memory resource pool, the virtual address of the first memory space and the virtual address of the second memory space corresponding to the first data, generates the above-mentioned cross-node read instruction, and then sends the cross-node read instruction to the first device.
  • Alternatively, obtaining the cross-node read instruction by the first device includes: the first device obtains, from the memory resource pool, the virtual address of the first memory space and the virtual address of the second memory space corresponding to the first data, and generates the above-mentioned cross-node read instruction.
  • the above-mentioned cross-node read instruction may be generated by the first processor, or may be generated by the first device.
  • In the latter case, the first device can generate the cross-node read instruction by itself without waiting for the first processor to generate it, thereby improving the efficiency with which the first device reads the first data from the second memory space.
  • the above-mentioned cross-node read instruction further includes a first destination address, and the first destination address is a virtual address of the first memory space
  • the above method further includes: the first device receives the first data, and then writes the first data into the first memory space according to the virtual address of the first memory space. In this way, the first device can write the data read from the second memory space into the first memory space.
  • the above-mentioned first correspondence includes the correspondence between the global virtual address of the memory resource pool, the physical address of the storage space of the memory resource pool, and the ID of each computing node associated with the memory resource pool.
  • the first device writes the first data into the first memory space according to the virtual address of the first memory space, including: the first device determines the physical address of the first memory space according to the first correspondence and the virtual address of the first memory space, and then writes the first data into the first memory space by means of direct memory access (DMA).
  • the second device stores the first correspondence; the second device receiving the first network transmission message and reading the first data from the second memory space includes: the second device receives the first network transmission message, obtains the virtual address of the second memory space, determines the physical address of the second memory space according to the first correspondence and the virtual address of the second memory space, and then reads the first data from the second memory space by means of DMA. In this way, the speed at which the second device reads the first data from the second memory space can be improved.
  • The first correspondence is stored in the second device, so the second device can determine the physical address of the second memory space according to the first correspondence and read the first data from the second memory space. This process bypasses the CPU and the operating system of the second computing node; therefore, the above method can also improve data transmission efficiency across computing nodes.
  • the above-mentioned first memory further includes a third memory space
  • the above-mentioned second memory further includes a fourth memory space
  • the above-mentioned method further includes: the first device obtains a cross-node write instruction, where the cross-node write instruction is used to instruct the first device to write the second data into the fourth memory space, and the cross-node write instruction includes a second source address, a second destination address and the size of the second data, the second source address being the virtual address of the third memory space and the second destination address being the virtual address of the fourth memory space; the first device determines the physical address of the third memory space according to the above-mentioned first correspondence and the virtual address of the third memory space; the first device reads the second data from the third memory space; the first device determines the ID of the second computing node according to the first correspondence and the virtual address of the fourth memory space; the first device obtains a network transmission message carrying the second data according to the ID of the second computing node and the cross-node write instruction, and sends it to the second device.
  • It can be seen that the first device can write the second data into the memory of the second computing node (that is, the fourth memory space), thereby realizing data transmission across computing nodes and, in turn, memory resource sharing across computing nodes. Moreover, a first correspondence is stored in the first device, so the first device can obtain the second data according to the first correspondence and send the second data to the second device. This process bypasses the CPU and the operating system of the first computing node; therefore, the above method can improve the efficiency of data transmission across computing nodes.
  • the present application provides a data processing method applied to a computer system, where the computer system includes a first computing node and a second computing node, the first computing node includes a first device, and the second computing node includes a second device , the above method includes:
  • the first device acquires a cross-node acceleration instruction, where the cross-node acceleration instruction is used to instruct the first device to use the second device to process the third data, and the cross-node acceleration instruction includes the ID of the second device and the target acceleration function ID; the first device stores a second correspondence, and the second correspondence includes the correspondence between the ID of the second device and the ID of the second computing node;
  • the first device determines the ID of the second computing node according to the ID of the second device and the second correspondence
  • the first device obtains a third network transmission message according to the ID of the second computing node and the cross-node acceleration instruction, and sends the third network transmission message to the second device, where the third network transmission message includes the above-mentioned target acceleration function ID ;
  • the second device performs corresponding processing on the third data according to the target acceleration function ID
  • the second device sends the processing result of the third data to the first computing node.
  • the first device can use the second device in the second computing node to process the third data, thereby realizing data processing across computing nodes, thereby realizing computing resource sharing across computing nodes.
  • Moreover, the first device stores the second correspondence, so the first device can obtain the third network transmission message according to the second correspondence and send it to the second device. This process bypasses the CPU and the operating system of the first computing node; therefore, the above method can also improve data processing efficiency across computing nodes.
  • the computing nodes in the aforementioned computer system share resources in a computing resource pool, and the computing resource pool includes the aforementioned second device.
  • the first computing node further includes a first processor
  • the method further includes: the first processor numbers the acceleration devices in the computing resource pool and the acceleration functions of each acceleration device to obtain multiple acceleration device IDs and the acceleration function ID corresponding to each acceleration device ID; the first computing node uses the acceleration devices in the above-mentioned computing resource pool to process the third data through the multiple acceleration device IDs and the acceleration function ID corresponding to each acceleration device ID. In this way, any computing node in the computer system can obtain information on the computing resources of other computing nodes, so that the computing resources of other computing nodes can be used.
  • obtaining the cross-node acceleration instruction by the first device includes: the first processor obtains the ID of the second device and the target acceleration function ID corresponding to the third data from the computing resource pool, generates a cross-node acceleration instruction, and then sends the cross-node acceleration instruction to the first device.
  • Alternatively, obtaining the cross-node acceleration instruction by the first device includes: the first device obtains the ID of the second device and the target acceleration function ID corresponding to the third data from the computing resource pool, and generates the cross-node acceleration instruction. It can be seen that the above-mentioned cross-node acceleration instruction may be generated by the first processor, or may be generated by the first device.
  • In the latter case, the first device can generate the above-mentioned cross-node acceleration instruction by itself without waiting for the first processor to generate it, thereby improving data processing efficiency.
  • the above-mentioned cross-node acceleration instruction further includes a third source address and a third destination address, where the third source address is the address of the device storage space where the third data is stored, and the third destination address is the address of the device storage space into which the processing result of the third data is written.
  • the third source address is the address of the storage space of the first device, and before the second device performs corresponding processing on the third data according to the target acceleration function ID, the method further includes: the first device obtains the third source address according to the cross-node acceleration instruction, reads the third data from the storage space of the first device, and then sends the third data to the second device.
  • the third source address is an address of the storage space of the second device, and the third network transmission packet further includes the third source address; before the second device performs corresponding processing on the third data according to the target acceleration function ID, the above method further includes: the second device obtains the third source address according to the third network transmission message, and then reads the third data from the storage space of the second device.
  • It can be seen that the above-mentioned third data may be stored in the device storage space of the first device or in the device storage space of the second device; in either case, with the data processing method provided by this application, the third data can be processed by the second device.
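  • As an illustration of the second correspondence and of the fields carried by the cross-node acceleration instruction described above, the following C sketch shows one possible representation; the struct layout, field widths, and the linear lookup are assumptions for illustration rather than a format specified by the present application.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of the second correspondence: which computing node hosts a
 * given acceleration device in the computing resource pool. */
struct accel_map_entry {
    uint32_t device_id;   /* ID of an acceleration device              */
    uint32_t node_id;     /* ID of the computing node hosting it       */
};

/* Fields a cross-node acceleration request might carry (illustrative). */
struct accel_request {
    uint32_t device_id;       /* ID of the second device               */
    uint32_t accel_func_id;   /* target acceleration function ID       */
    uint64_t src_addr;        /* third source address (input data)     */
    uint64_t dst_addr;        /* third destination address (result)    */
};

/* Resolve the computing node that hosts a given acceleration device. */
static bool lookup_accel_node(const struct accel_map_entry *tbl, size_t n,
                              uint32_t device_id, uint32_t *node_id)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].device_id == device_id) {
            *node_id = tbl[i].node_id;
            return true;
        }
    }
    return false;   /* device not part of the computing resource pool */
}
```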
  • the present application provides a computer system, where the computer system includes a first computing node and a second computing node, the first computing node includes a first device and a first memory, and the second computing node includes a second device and a second memory; the first memory includes a first memory space, and the second memory includes a second memory space;
  • the first device is used to obtain a cross-node read instruction, the cross-node read instruction includes a first source address and a size of the first data, the first source address is a virtual address of the second memory space, and the first device stores a first correspondence,
  • the first correspondence includes the correspondence between the virtual address of the second memory space and the ID of the second computing node;
  • the first device is further configured to determine the ID of the second computing node according to the virtual address of the second memory space and the first correspondence;
  • the first device is further configured to obtain the first network transmission message according to the ID of the second computing node and the cross-node read instruction, and send the first network transmission message to the second device, where the first network transmission message includes the virtual address of the second memory space and the size of the first data;
  • the second device is configured to receive the first network transmission message, read the first data from the second memory space, and send the first data to the first device.
  • the computing nodes in the computer system share resources in a memory resource pool, and the memory resource pool includes the first memory and the second memory.
  • the first computing node further includes a first processor, and the first processor is configured to address the address space of the memory resource pool to obtain the global virtual address of the memory resource pool; the first computing node is configured to access the storage space of the memory resource pool through the global virtual address.
  • the first processor is further configured to obtain the virtual address of the first memory space and the virtual address of the second memory space corresponding to the first data from the memory resource pool, and generate a cross-node read instruction; the first processor is further configured to send the cross-node read instruction to the first device.
  • the above-mentioned first device is specifically configured to: obtain the virtual address of the first memory space and the virtual address of the second memory space corresponding to the above-mentioned first data from the above-mentioned memory resource pool, and generate a cross-node read instruction.
  • the above-mentioned cross-node read instruction further includes a first destination address, where the first destination address is a virtual address of the first memory space, and the first device is also used to receive the first data; the first device also uses for writing the first data into the first memory space according to the virtual address of the first memory space.
  • the above-mentioned first correspondence includes the correspondence between the global virtual address of the above-mentioned memory resource pool, the physical address of the storage space of the memory resource pool, and the ID of each computing node associated with the memory resource pool
  • the first device is specifically configured to: determine the physical address of the first memory space according to the first correspondence and the virtual address of the first memory space, and then write the first data into the first memory space by means of DMA.
  • the second device stores the above-mentioned first correspondence
  • the second device is specifically configured to: receive the first network transmission message, obtain the virtual address of the second memory space, and then according to the first correspondence and the virtual address of the second memory space, determine the physical address of the second memory space, and then read the first data from the second memory space by means of DMA.
  • the first memory further includes a third memory space
  • the second memory further includes a fourth memory space
  • the first device is further configured to obtain a cross-node write instruction
  • the cross-node write instruction is used for Instruct the first device to write the second data into the fourth memory space
  • the cross-node write instruction includes the second source address, the second destination address and the size of the second data
  • the second source address is the virtual address of the third memory space
  • the second destination address is the virtual address of the fourth memory space
  • the first device is also used to determine the physical address of the third memory space according to the first correspondence and the virtual address of the third memory space
  • the first device is also used to pass the DMA way to read the second data from the third memory space
  • the first device is also used to determine the ID of the second computing node according to the first correspondence and the virtual address of the fourth memory space
  • the first device is also used to obtain a network transmission message carrying the second data according to the ID of the second computing node and the cross-node write instruction, and to send the network transmission message to the second device.
  • the present application further provides a computer system, the computer system includes a first computing node and a second computing node, the first computing node includes a first device, and the second computing node includes a second device,
  • the first device is used to obtain a cross-node acceleration instruction
  • the cross-node acceleration instruction is used to instruct the first device to use the second device to process the third data
  • the cross-node acceleration instruction includes the ID of the second device and the target acceleration function ID
  • the first device stores a second correspondence, and the second correspondence includes the correspondence between the ID of the second device and the ID of the second computing node;
  • the first device is further configured to determine the ID of the second computing node according to the ID of the second device and the second correspondence;
  • the first device is further configured to obtain a third network transmission message according to the ID of the second computing node and the cross-node acceleration instruction, and send the third network transmission message to the second device, where the third network transmission message includes the target acceleration function ID;
  • the second device is further configured to perform corresponding processing on the third data according to the target acceleration function ID;
  • the second device is configured to send the processing result of the third data to the first computing node.
  • the computing nodes in the aforementioned computer system share resources in a computing resource pool, and the computing resource pool includes the aforementioned second device.
  • the first computing node further includes a first processor, and the first processor is configured to number the acceleration devices in the computing resource pool and the acceleration functions of each acceleration device, so as to obtain multiple acceleration device IDs and the acceleration function ID corresponding to each acceleration device ID; the first computing node is configured to use the acceleration devices in the computing resource pool to process the third data through the multiple acceleration device IDs and the acceleration function ID corresponding to each acceleration device ID.
  • the above-mentioned first processor is further configured to obtain the ID of the second device and the target acceleration function ID corresponding to the above-mentioned third data from the above-mentioned computing resource pool, and generate a cross-node acceleration instruction; the first processor is further configured to send the cross-node acceleration instruction to the first device.
  • the first device is specifically configured to: obtain the ID of the second device and the target acceleration function ID corresponding to the third data from the computing resource pool, and generate a cross-node acceleration instruction.
  • the above-mentioned cross-node acceleration instruction further includes a third source address and a third destination address, the third source address is the address of the device storage space where the third data is stored, and the third destination address is the address of the device storage space to which the processing result of the third data is written.
  • the above-mentioned third source address is the address of the storage space of the first device, and the first device is specifically configured to: obtain the third source address according to the cross-node acceleration instruction, read the third data from the storage space of the first device, and then send the third data to the second device.
  • the third source address is the address of the storage space of the second device
  • the third network transmission packet further includes the third source address
  • the second device is further configured to: obtain the third source address according to the third network transmission message, and then read the third data from the storage space of the second device.
  • the present application provides a computer-readable storage medium storing first computer instructions and second computer instructions, where the first computer instructions and the second computer instructions are run on the first computing node and the second computing node respectively, so as to execute the method in the first aspect or any possible implementation manner of the first aspect, or in the second aspect or any possible implementation manner of the second aspect, thereby realizing data processing between the first computing node and the second computing node.
  • FIG. 1 is a schematic structural diagram of a computer system provided by the application.
  • FIG. 2 is a schematic diagram of a memory resource pool and a first correspondence provided by the present application
  • FIG. 3 is a schematic flowchart of a data transmission method provided by the present application.
  • FIG. 4 is a schematic diagram of a format of a cross-node read instruction provided by the present application.
  • FIG. 6 is a schematic structural diagram of another computer system provided by the application.
  • FIG. 8 is a schematic flowchart of a data processing method provided by the present application.
  • FIG. 9 is a schematic diagram of a format of a cross-node acceleration instruction provided by the present application.
  • a computer system includes two or more computing nodes (that is, computers), and the resources of the computer system include two aspects: on the one hand, memory resources, that is, the memory resources owned by all computing nodes in the system; on the other hand, computing resources, that is, the computing resources owned by all computing nodes in the system.
  • the resource sharing of the computer system includes the sharing of the memory resources of the system and the sharing of the computing resources of the system.
  • the purpose of sharing the memory resources of the computer system is to build a memory resource pool, so that when the memory resources of a computing node in the computer system are insufficient, the computing node can use the memory of other computing nodes as a disk or cache to store some data; when the computing node needs to use the data, it reads the data from the memory of the other computing nodes. This solves the problem that the progress of task execution is affected because the memory configuration of a single computing node does not meet the actual demand.
  • the purpose of sharing the computing resources of the computer system is to build a computing resource pool. In this way, when a computing node in the computer system is overloaded, the computing node can use the computing power of other computing nodes to process a part of the tasks it needs to complete, so as to achieve global load balancing within the computer system and speed up the completion of tasks.
  • the sharing of computing resources of the computer system specifically refers to the sharing of acceleration resources of the computer system. Acceleration resources refer to accelerated computing capabilities, which can be provided by acceleration devices.
  • An acceleration device is a type of hardware that can reduce the workload of the CPU in the computing node and improve the efficiency of the computing node's processing tasks.
  • For example, neural-network processing units (NPUs) and data stream accelerators (DSAs) are acceleration devices.
  • the sharing of acceleration resources of a computer system can be understood as: when the acceleration device on a certain computing node in the computer system is overloaded, some computing tasks can be allocated to acceleration devices on other computing nodes in the system for execution, thereby reducing the workload of the CPU and acceleration device of that computing node and improving the completion efficiency of computing tasks.
  • the present application provides a data transmission method, which can be executed by a computer system. When the method is executed in the computer system, data transmission across computing nodes can be realized, thereby realizing the sharing of memory resources in the system.
  • the data transmission method provided by the present application will be described below with reference to the computer system shown in FIG. 1 .
  • FIG. 1 shows a schematic structural diagram of a computer system provided by the present application.
  • the computer system 100 includes a first computing node 110 and a second computing node 120 .
  • the first computing node 110 includes a first processor 111 , a first device 112 and a first memory 113
  • the first processor 111 includes a first resource manager 1111
  • the first device 112 includes a first management unit 1121 .
  • the second computing node 120 includes a second processor 121 , a second device 122 and a second memory 123
  • the second processor 121 includes a second resource manager 1211
  • the second device 122 includes a second management unit 1221 .
  • The first computing node 110 is described first:
  • the first processor 111 may include a central processing unit (CPU), an application specific integrated circuit (ASIC), or a programmable logic device (PLD).
  • the PLD may be Complex Programmable Logical Device (CPLD), Field Programmable Gate Array (FPGA), Generic Array Logic (GAL) or any combination thereof.
  • the first device 112 is an external device on the first computing node 110 .
  • the first device 112 may be a GPU, an NPU, a DSA, a tensor processing unit (TPU), an artificial intelligence (AI) chip, a network card, a data processing unit (DPU), or one or more integrated circuits.
  • the first processor 111 and the first device 112 may be connected through a peripheral component interconnect express (PCIe) standard, or may be connected through a compute express link (CXL).
  • the first processor 111 and the first device 112 may also be connected through other buses, such as peripheral component interconnect (PCI) or universal serial bus (USB), which are not specifically limited here.
  • the first memory 113 is the memory in the first computing node 110 and is used to store data for the CPU of the first computing node 110 and to exchange data with external memory (e.g., the memory of the first device 112) on the first computing node 110.
  • the first resource manager 1111 is a component in the first computing node 110 that manages memory resources owned by all computing nodes in the computer system 100 . Specifically, the first resource manager 1111 is used to construct a memory resource pool, and the memory resource pool includes the first memory 113 and the second memory 123 . The first resource manager 1111 is further configured to address the address space of the memory resource pool to obtain the global virtual address of the memory resource pool. The first resource manager 1111 is further configured to construct a first correspondence relationship, and configure the first correspondence relationship to the first management unit 1121 .
  • the first correspondence refers to the correspondence between the above-mentioned global virtual address, the physical address of the storage space of the memory resource pool, and the ID of each computing node associated with the memory resource pool (that is, the ID of the computing node that provides the above-mentioned storage space).
  • addressing the memory address space of the memory resource pool by the first resource manager 1111 refers to combining the discrete memory address space provided by the first memory 113 and the discrete memory address space provided by the second memory 123 into one virtual, linearly contiguous memory address space.
  • the computing nodes in the computer system 100 share the resources in the memory resource pool, and access the storage space of the memory resource pool through the above-mentioned global virtual address.
  • the first resource manager 1111 will be described:
  • the memory resource owned by the first computing node 110 is the memory space provided by the first memory 113 (for example, the first memory space and the third memory space shown in FIG. 1 ), and the memory resource owned by the second computing node 120 is the second memory The memory space provided by 123 (for example, the second memory space and the fourth memory space shown in FIG. 1 ).
  • the first resource manager 1111 is configured to acquire the memory information of the first memory 113 and the second memory 123, wherein the memory information of the first memory 113 includes the physical address of the first memory space and the physical address of the third memory space,
  • the memory information of the second memory 123 includes the physical address of the second memory space and the physical address of the fourth memory space.
  • the memory information of the first memory 113 further includes the size of the memory space provided by the first memory 113 (including the size of the available memory space and the size of the used memory space in the first memory 113), the physical address of the used memory space in the first memory 113, the physical address of the available memory space, and so on.
  • the memory information of the second memory 123 also includes the size of the memory space provided by the second memory 123 (including the size of the available memory space and the size of the used memory space in the second memory 123), the physical address of the used memory space in the second memory 123, the physical address of the available memory space, and so on.
  • the first resource manager 1111 is further configured to connect the memory space provided by the first memory 113 and the memory space provided by the second memory 123 into one memory space to obtain a memory resource pool, which includes the first memory space, the second memory space, the third memory space, and the fourth memory space. It then addresses the memory address space of the memory resource pool to obtain a global virtual address, where the global virtual address includes the virtual address of the first memory space, the virtual address of the second memory space, the virtual address of the third memory space, and the virtual address of the fourth memory space.
  • the first resource manager 1111 is further configured to construct a first correspondence relationship, and configure the first correspondence relationship to the first management unit 1121 .
  • the first correspondence includes: the correspondence between the virtual address of the first memory space, the physical address of the first memory space, and the ID of the first computing node 110; the correspondence between the virtual address of the second memory space, the physical address of the second memory space, and the ID of the second computing node 120; the correspondence between the virtual address of the third memory space, the physical address of the third memory space, and the ID of the first computing node 110; and the correspondence between the virtual address of the fourth memory space, the physical address of the fourth memory space, and the ID of the second computing node 120.
  • the first resource manager 1111 can obtain the memory resource pool and the first correspondence as shown in FIG. 2 through the above method.
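  • To make the first correspondence concrete, the following C sketch shows one possible way to represent it and to resolve a global virtual address into the owning node ID and the backing physical address. The entry layout, field names, and the linear lookup are illustrative assumptions; the present application does not prescribe this structure.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of the first correspondence: a contiguous range of the global
 * virtual address space, the physical address backing it, and the ID of
 * the computing node that provides the memory space. */
struct mem_map_entry {
    uint64_t gva_base;   /* global virtual address of the memory space   */
    uint64_t length;     /* size of the memory space in bytes            */
    uint64_t pa_base;    /* physical address on the owning node          */
    uint32_t node_id;    /* ID of the computing node providing the space */
};

/* Resolve a global virtual address to (node ID, physical address). */
static bool resolve_gva(const struct mem_map_entry *tbl, size_t n,
                        uint64_t gva, uint32_t *node_id, uint64_t *pa)
{
    for (size_t i = 0; i < n; i++) {
        if (gva >= tbl[i].gva_base && gva < tbl[i].gva_base + tbl[i].length) {
            *node_id = tbl[i].node_id;
            *pa = tbl[i].pa_base + (gva - tbl[i].gva_base);
            return true;
        }
    }
    return false;   /* address not part of the memory resource pool */
}
```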
  • the second processor 121 may include a CPU, and may also include an ASIC or a PLD, and the above-mentioned PLD may be a CPLD, an FPGA, a GAL, or any combination thereof.
  • the second device 122 is an external device on the second computing node 120 .
  • the second device 122 may be a GPU, an NPU, a DSA, a TPU, an artificial intelligence (AI) chip, a network card, a DPU, or one or more integrated circuits.
  • the second processor 121 and the second device 122 may be connected through PCIe, may also be connected through CXL, or may be connected through PCI, USB, etc., which are not specifically limited here.
  • the second memory 123 is the memory in the second computing node 120 and is used to store data for the CPU of the second computing node 120 and to exchange data with external memory (e.g., the memory of the second device 122) on the second computing node 120.
  • the second resource manager 1211 is a component in the second computing node 120 for managing memory resources owned by all computing nodes in the computer system 100 .
  • the second resource manager 1211 can manage the memory resources owned by all computing nodes in the computer system 100 in a manner similar to that of the first resource manager 1111, which will not be described here.
  • the second resource manager 1211 can also manage the memory resources owned by all the computing nodes in the computer system 100 in the following manner: after the first resource manager 1111 obtains the global virtual address and the first correspondence, it sends the global virtual address and the first correspondence to the second resource manager 1211, and then the second resource manager 1211 sends the first correspondence to the second management unit 1221.
  • the first computing node 110 and the second computing node 120 can communicate through the first device 112 and the second device 122 .
  • the first device 112 and the second device 122 may be connected through a wired interface or a wireless interface.
  • the wired interface can be an Ethernet interface, a controller area network interface, a local interconnect network (LIN) interface, etc.
  • the wireless interface can be a cellular network interface, a wireless local area network interface, etc., which are not specifically limited here.
  • the following takes the first computing node 110 reading data from the memory of the second computing node 120 and the first computing node 110 writing data to the memory of the second computing node 120 as examples to describe how the above computer system 100 implements memory resource sharing across computing nodes.
  • the first computing node 110 reads data from the memory of the second computing node 120
  • FIG. 3 shows a schematic flowchart of a data transmission method provided by the present application. The method includes but is not limited to the following steps:
  • S101 The first device 112 obtains a cross-node read instruction.
  • the cross-node read instruction is used to instruct the first device 112 to read the first data from the second memory space.
  • the cross-node read instruction may be an atomic instruction, for example, the ST64BV or ST64BV0 instruction of ARM, or the ENQCMD or ENQCMDS instruction of x86.
  • An atomic instruction is a command used to instruct a device to perform an atomic operation, that is, an operation that is not interrupted by the thread scheduling mechanism. Therefore, an atomic instruction can be understood as an instruction that, once executed, will not be interrupted until the execution is complete.
  • the cross-node read instruction includes a first source address, a first destination address, and a size of the first data.
  • the first source address is the virtual address of the memory space where the first data is stored, here the virtual address of the second memory space; the first destination address is the virtual address of the memory space into which the first data is written after being read, here the virtual address of the first memory space; the size of the first data may be the number of bytes of the first data.
  • the positions of the above-mentioned first source address, first destination address and the size of the first data in the cross-node read instruction may be allocated according to actual conditions.
  • the above-mentioned cross-node read instruction may also include other information such as first operation description information, where the first operation description information is used to describe the cross-node read instruction, thereby instructing the first device 112 that receives the instruction to read the first data from the memory space corresponding to the first source address and write the read first data into the memory space corresponding to the first destination address.
  • For example, the cross-node read instruction is a 64-byte instruction, in which:
  • the 0-7 bytes in the cross-node read command are used to fill in the first source address
  • the 8-15 bytes are used to fill in the first destination address
  • the 16-21 bytes are used to fill in the size of the first data
  • the 22-64 bytes are used to fill in other information contained in the cross-node read instruction, for example, the above-mentioned first operation description information.
  • FIG. 4 shows an exemplary format of a cross-node read instruction, and the format of the cross-node read instruction may also be other formats, which are not specifically limited in this application.
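  • The byte layout described above can be pictured with the following C sketch of a 64-byte cross-node read command. Only the byte offsets follow the description; the struct name, field names, and the meaning of the trailing bytes are assumptions for illustration.

```c
#include <stdint.h>

#pragma pack(push, 1)
struct cross_node_read_cmd {
    uint64_t src_addr;     /* bytes 0-7:  first source address (virtual
                              address of the second memory space)         */
    uint64_t dst_addr;     /* bytes 8-15: first destination address
                              (virtual address of the first memory space) */
    uint8_t  data_size[6]; /* bytes 16-21: size of the first data         */
    uint8_t  op_desc[42];  /* remaining bytes: first operation description
                              and other information                       */
};
#pragma pack(pop)

/* 8 + 8 + 6 + 42 = 64 bytes in total. */
_Static_assert(sizeof(struct cross_node_read_cmd) == 64,
               "cross-node read command is expected to be 64 bytes");
```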
  • obtaining the cross-node read instruction by the first device 112 includes: the first processor 111 obtains, from the memory resource pool, the virtual address of the first memory space and the virtual address of the second memory space corresponding to the first data, generates the cross-node read instruction, and sends it to the first device 112.
  • Alternatively, obtaining the cross-node read instruction by the first device 112 includes: the first device 112 obtains, from the memory resource pool, the virtual address of the first memory space and the virtual address of the second memory space corresponding to the first data, and generates the cross-node read instruction.
  • S102 The first device 112 obtains the first network transmission message according to the cross-node read instruction.
  • the first device 112 parses the cross-node read instruction to obtain the first source address and the size of the first data. Then, the first device 112 determines the ID of the second computing node 120 according to the first source address and the first correspondence stored in the first management unit 1121. Then, the first device 112 obtains a first network transmission packet according to the ID of the second computing node 120 and the cross-node read instruction, where the first network transmission packet includes the first source address, the size of the first data, a first source IP address, and a first destination IP address; the first source IP address is the IP address of the first computing node 110, and the first destination IP address is the IP address of the second computing node 120.
  • the ID of the second computing node 120 may be an IP address of the second computing node 120 , or may be a serial number used to indicate the second computing node 120 .
  • When the ID of the second computing node 120 is its IP address, the first device 112 obtaining the first network transmission message according to the ID of the second computing node 120 and the cross-node read instruction includes: the first device 112 encapsulates the first source address and the size of the first data according to the IP address of the first computing node 110 and the ID of the second computing node 120 to obtain the first network transmission message.
  • When the ID of the second computing node 120 is a serial number, the first device 112 obtaining the first network transmission message according to the ID of the second computing node 120 and the cross-node read instruction includes: the first device 112 determines the IP address of the second computing node 120 according to the ID of the second computing node 120, and then encapsulates the first source address and the size of the first data according to the IP address of the first computing node 110 and the IP address of the second computing node 120 to obtain the first network transmission message.
  • the first device 112 can also obtain the first network transmission packet in either of the following ways. Mode 1: the first device 112 encapsulates the first source address, the first destination address, and the size of the first data according to the IP address of the first computing node 110 and the IP address of the second computing node 120 to obtain the first network transmission message. Mode 2: the first device 112 encapsulates the cross-node read instruction according to the IP address of the first computing node 110 and the IP address of the second computing node 120 to obtain the first network transmission message.
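  • The following C sketch shows one possible encapsulation of the first network transmission message from the fields named above (source and destination IP addresses, first source address, optional first destination address, and data size). The message layout and the helper name build_read_request are hypothetical and only illustrate the idea of the encapsulation.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative payload of the first network transmission message. */
struct read_request_msg {
    uint32_t src_ip;      /* first source IP: the first computing node       */
    uint32_t dst_ip;      /* first destination IP: the second computing node */
    uint64_t src_addr;    /* first source address (second memory space)      */
    uint64_t dst_addr;    /* first destination address (optional, mode 1)    */
    uint64_t data_size;   /* size of the first data                          */
};

static void build_read_request(struct read_request_msg *msg,
                               uint32_t local_ip, uint32_t remote_ip,
                               uint64_t src_addr, uint64_t dst_addr,
                               uint64_t data_size)
{
    memset(msg, 0, sizeof(*msg));
    msg->src_ip    = local_ip;
    msg->dst_ip    = remote_ip;
    msg->src_addr  = src_addr;
    msg->dst_addr  = dst_addr;
    msg->data_size = data_size;
}
```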
  • S103 The first device 112 sends the first network transmission message to the second device 122.
  • S104 The second device 122 receives the first network transmission message, and reads the first data from the second memory space.
  • the second device 122 receives the first network transmission packet, and then parses the first network transmission packet to obtain the first source address and the size of the first data. Then, the second device 122 determines the physical address of the second memory space according to the first source address and the first correspondence stored in the second management unit 1221. Then, the second device 122 reads the first data from the second memory space according to the physical address of the second memory space.
  • the second device 122 may read the first data from the second memory space in a DMA manner.
  • DMA is a high-speed data transmission method.
  • When the second device 122 reads data from the second memory space through DMA, it does not need to rely on the CPU in the second computing node 120, so this method reduces the CPU overhead of copying data, thereby improving the efficiency with which the second device 122 reads data from the second memory space.
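  • On the receiving side, the handling described above might look like the following sketch, which reuses the mem_map_entry/resolve_gva and read_request_msg sketches given earlier; dma_read() is a placeholder for whatever DMA engine interface the second device 122 actually exposes.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder for the second device's DMA engine interface (assumed). */
extern int dma_read(uint64_t phys_addr, void *buf, uint64_t size);

/* Hypothetical handling on the second device: resolve the virtual address
 * of the second memory space to a physical address with the first
 * correspondence, then DMA-read the first data without involving the CPU
 * of the second computing node. */
static int handle_read_request(const struct read_request_msg *msg,
                               const struct mem_map_entry *tbl, size_t n,
                               void *buf)
{
    uint32_t node_id;
    uint64_t pa;

    if (!resolve_gva(tbl, n, msg->src_addr, &node_id, &pa))
        return -1;   /* address not in the memory resource pool */

    return dma_read(pa, buf, msg->data_size);
}
```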
  • S105 The second device 122 sends the first data to the first device 112.
  • the second device 122 encapsulates the first data to obtain the second network transmission message.
  • the second network transmission packet includes the first data, a second source IP address, and a second destination IP address, where the second source IP address is the IP address of the second computing node 120, and the second destination IP address is the IP address of the first computing node 110. Then, the second device 122 sends the second network transmission message to the first device 112.
  • S106 The first device 112 receives the first data, and writes the first data into the first memory space.
  • the first device 112 receives the second network transmission packet, and parses the second network transmission packet to obtain the first data.
  • the first device 112 also obtains the first destination address (that is, the virtual address of the first memory space) according to the above-mentioned cross-node read instruction, and determines the physical address of the first memory space according to the virtual address of the first memory space and the first correspondence stored in the first management unit 1121. Then, the first device 112 writes the above-mentioned first data into the first memory space.
  • When the above-mentioned second network transmission message further includes the virtual address of the first memory space, the first device 112 may write the first data into the first memory space in the following manner: after receiving the second network transmission message, the first device 112 parses the second network transmission message to obtain the first data and the virtual address of the first memory space, determines the physical address of the first memory space according to the virtual address of the first memory space and the first correspondence stored in the first management unit 1121, and then writes the first data into the first memory space.
  • the above-mentioned first network transmission message may include the virtual address (ie, the first destination address) of the first memory space.
  • the second device 122 may encapsulate the first data and the virtual address of the first memory space together, so as to obtain a second network transmission packet including the virtual address of the first memory space.
  • the first device 112 may write the first data into the first memory space in a DMA manner. In this way, the speed at which the first device 112 writes the first data to the first memory space can be improved.
  • The above S101-S106 describe the process of the first computing node 110 reading data from the memory of the second computing node 120. It should be understood that the process of the second computing node 120 reading data from the memory of the first computing node 110 is similar to the above-mentioned S101-S106 and is not described here for simplicity.
  • the first device 112 can read the first data from the second memory space, thereby realizing memory resource sharing across computing nodes in the computer system 100 .
  • In addition, the above data transmission process does not involve the first processor 111 or the second processor 121; that is to say, the CPU and the operating system in the first computing node 110 and the second computing node 120 can be bypassed by using the above data transmission method, so that data transmission efficiency across computing nodes can be improved, thereby improving the sharing efficiency of the memory resources of the computer system 100.
  • In this way, the CPU of the second computing node 120 can perform other tasks; and if the CPU of the first computing node 110 needs to generate the cross-node read instruction and send it to the first device 112, then after the CPU of the first computing node 110 sends the cross-node read instruction to the first device 112, it can also be released to perform other tasks, thereby reducing resource waste and improving resource utilization.
  • the first computing node 110 writes data to the memory of the second computing node 120
  • FIG. 5 shows a schematic flowchart of another data transmission method provided by the present application.
  • S201 The first device 112 obtains a cross-node write instruction.
  • the cross-node write instruction is used to instruct the first device 112 to write the second data into the fourth memory space. Similar to the above-mentioned cross-node read instruction, the cross-node write instruction may also be an atomic instruction.
  • the cross-node write instruction includes the second source address, the second destination address, and the size of the second data.
  • the second source address is the virtual address of the memory space in which the second data is stored, which here is the virtual address of the third memory space; the second destination address is the virtual address of the memory space into which the second data is to be written, which here is the virtual address of the fourth memory space; and the size of the second data may be the number of bytes of the second data.
  • the positions of the second source address, the second destination address, and the size of the second data in the cross-node write instruction may be allocated according to actual conditions.
  • it should also be understood that the above-mentioned cross-node write instruction may further include second operation description information and the like; the second operation description information is used to describe the cross-node write instruction, thereby instructing the first device 112 that receives the instruction to read the second data from the memory space corresponding to the second source address and to write the read second data into the memory space corresponding to the second destination address.
  • the specific format of the cross-node write instruction may also adopt the format of the cross-node read instruction as shown in FIG. 4 , which is not specifically limited in this application.
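  • as a minimal sketch only, the following C struct lays the cross-node write instruction out in the same 64-byte shape that FIG. 4 describes for the cross-node read instruction (bytes 0-7 for the source address, bytes 8-15 for the destination address, bytes 16-21 for the data size, and the remainder for operation description information); treating the remainder as bytes 22-63, and all struct and field names, are assumptions of the sketch.

```c
#include <stdint.h>

#pragma pack(push, 1)
/* Hypothetical 64-byte cross-node write instruction, reusing the layout that
 * FIG. 4 shows for the cross-node read instruction. */
struct cross_node_write_instr {
    uint64_t second_src_addr;   /* bytes 0-7:   virtual address of the third memory space  */
    uint64_t second_dst_addr;   /* bytes 8-15:  virtual address of the fourth memory space */
    uint8_t  data_size[6];      /* bytes 16-21: size of the second data, in bytes          */
    uint8_t  op_desc[42];       /* bytes 22-63: second operation description information   */
};
#pragma pack(pop)

_Static_assert(sizeof(struct cross_node_write_instr) == 64,
               "instruction must occupy exactly 64 bytes");
```

  • a packed 64-byte layout of this kind also fits the 64-byte atomic store/enqueue instructions (for example, ARM ST64BV or x86 ENQCMD) that the description mentions as possible carriers for such cross-node instructions.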
  • in a possible implementation, obtaining the cross-node write instruction by the first device 112 includes: the first processor 111 obtains the second source address and the second destination address corresponding to the second data from the memory resource pool, generates the cross-node write instruction, and sends it to the first device 112.
  • in another possible implementation, acquiring the cross-node write instruction by the first device 112 includes: the first device 112 acquires the second source address and the second destination address corresponding to the second data from the memory resource pool, and generates the cross-node write instruction.
  • S202 The first device 112 obtains the second data according to the cross-node write instruction.
  • specifically, after receiving the cross-node write instruction, the first device 112 parses the cross-node write instruction to obtain the second source address and the size of the second data, and then determines the physical address of the third memory space according to the second source address and the first correspondence stored in the first management unit 1121. Then, the first device 112 reads the second data from the third memory space according to the size of the second data.
  • the first device 112 can read the second data from the third memory space by means of DMA. In this way, the speed at which the first device obtains the second data can be improved.
  • S203 The first device 112 obtains the third network transmission message according to the cross-node write instruction.
  • specifically, after receiving the cross-node write instruction, the first device 112 parses the cross-node write instruction to obtain the second destination address, and then determines the ID of the second computing node 120 according to the second destination address and the first correspondence stored in the first management unit 1121. Then, the first device 112 encapsulates the second data and the second destination address according to the ID of the second computing node 120 to obtain a third network transmission message.
  • the third network transmission packet includes the second data, the second destination address, a third source IP address, and a third destination IP address, where the third source IP address is the IP address of the first computing node 110, and the third destination IP address is the IP address of the second computing node 120.
  • the ID of the second computing node 120 may be an IP address of the second computing node 120 , or may be a serial number used to indicate the second computing node 120 .
  • when the ID of the second computing node 120 is the IP address of the second computing node 120, that the first device 112 encapsulates the second data and the second destination address according to the ID of the second computing node 120 to obtain the third network transmission packet includes: the first device 112 encapsulates the second data and the second destination address according to the IP address of the first computing node 110 and the ID of the second computing node 120 to obtain the third network transmission message. When the ID of the second computing node 120 is a number used to indicate the second computing node 120, that the first device 112 encapsulates the second data and the second destination address according to the ID of the second computing node 120 to obtain the third network transmission packet includes: the first device 112 determines the IP address of the second computing node 120 according to the ID of the second computing node 120, and then encapsulates the second data and the second destination address according to the IP address of the first computing node 110 and the IP address of the second computing node 120 to obtain the third network transmission message.
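  • by way of example only, the sketch below shows one possible encapsulation of the third network transmission message: the node ID is resolved to an IP address when it is a plain number, and the second data is carried together with the second destination address behind a small header; the header layout, the node_id_to_ip helper, and the example IP values are assumptions of this sketch rather than anything specified in this application.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical on-wire header for the third network transmission message:
 * source/destination node IPs plus the destination virtual address, followed
 * by the payload (the second data). */
struct xnode_msg_hdr {
    uint32_t src_ip;        /* IP address of the first computing node  */
    uint32_t dst_ip;        /* IP address of the second computing node */
    uint64_t dst_virt_addr; /* second destination address              */
    uint32_t payload_len;   /* size of the second data in bytes        */
};

/* Minimal node-ID-to-IP table for illustration. */
static uint32_t node_id_to_ip(uint32_t node_id)
{
    static const uint32_t ip_by_node[] = { 0, 0x0A000001 /* node 1 */, 0x0A000002 /* node 2 */ };
    return node_id < 3 ? ip_by_node[node_id] : 0;
}

/* Build the message into 'buf'; returns the total length, or -1 if 'buf' is too small. */
static int build_third_msg(uint8_t *buf, size_t buf_len,
                           uint32_t local_ip, uint32_t dst_node_id,
                           uint64_t dst_virt_addr,
                           const void *second_data, uint32_t data_len)
{
    struct xnode_msg_hdr hdr = {
        .src_ip = local_ip,
        .dst_ip = node_id_to_ip(dst_node_id), /* the node ID may already be an IP */
        .dst_virt_addr = dst_virt_addr,
        .payload_len = data_len,
    };
    if (buf_len < sizeof(hdr) + data_len)
        return -1;
    memcpy(buf, &hdr, sizeof(hdr));
    memcpy(buf + sizeof(hdr), second_data, data_len);
    return (int)(sizeof(hdr) + data_len);
}
```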
  • S204: The first device 112 sends the third network transmission message to the second device 122.
  • S205: The second device 122 receives the third network transmission message, and writes the second data into the fourth memory space.
  • specifically, after receiving the third network transmission message, the second device 122 parses the third network transmission packet to obtain the second data and the second destination address, and then determines the physical address of the fourth memory space according to the second destination address and the first correspondence stored in the second management unit 1221. Then, the second device 122 writes the second data into the fourth memory space by means of DMA.
  • the above S201-S205 describe the process of the first computing node 110 writing data into the memory of the second computing node 120. It should be understood that the process of the second computing node 120 writing data into the memory of the first computing node 110 is similar to the above S201-S205, and is not described here for brevity.
  • the first device 112 can write the second data stored in the third memory space into the fourth memory space, thereby realizing memory resource sharing across computing nodes in the computer system 100 .
  • moreover, when data is transmitted by using the above data transmission method, except that the first processor 111 may need to generate the cross-node write instruction and send it to the first device 112 in the above S201, none of the other steps requires the first processor 111 or the second processor 121. That is to say, the CPU and the operating system in the first computing node 110 and the second computing node 120 can be bypassed by using the above data transmission method, so that the data transmission efficiency across computing nodes can be improved, thereby improving the sharing efficiency of the memory resources of the computer system 100.
  • in addition, during the process in which the first device 112 writes the second data into the fourth memory space, the CPU of the second computing node 120 may perform other tasks; and if the CPU of the first computing node 110 needs to generate the cross-node write instruction and send it to the first device 112, then after the CPU of the first computing node 110 sends the cross-node write instruction to the first device 112, it can also be released to perform other tasks, thereby reducing resource waste and improving resource utilization.
  • the present application also provides a data processing method.
  • when the method is executed in a computer system, data processing across computing nodes can be realized, thereby realizing the sharing of computing resources (acceleration resources) in the system.
  • the data processing method provided by the present application will be described below with reference to the computer system shown in FIG. 6.
  • FIG. 6 shows a schematic structural diagram of another computer system provided by the present application.
  • the computer system 200 includes a first computing node 210 and a second computing node 220 .
  • the first computing node 210 includes a first processor 211 , a first device 212 and a first memory 213 , the first processor 211 includes a first resource manager 2111 , and the first device 212 includes a first management unit 2121 .
  • the second computing node 220 includes a second processor 221 , a second device 222 and a second memory 223 , the second processor 221 includes a second resource manager 2211 , and the second device 222 includes a second management unit 2221 .
  • compared with the computer system 100 shown in FIG. 1, in the computer system 200 shown in FIG. 6:
  • the first device 212 and the second device 222 are external devices on the first computing node 210 and the second computing node 220, respectively. Both the first device 212 and the second device 222 have computing capabilities. In this embodiment of the present application, the computing capabilities of the first device 212 and the second device 222 may both be accelerated computing capabilities; in that case, the first device 212 and the second device 222 are the acceleration devices in the first computing node 210 and the second computing node 220, respectively.
  • the first device 212 or the second device 222 may be a GPU, an NPU, a DSA, a TPU, an artificial intelligence (AI) chip, a network card, a DPU, or one or more integrated circuits.
  • both the first device 212 and the second device 222 may have one or more acceleration functions, for example, a function for accelerating a data integrity verification process, a function for accelerating a data encryption and decryption process, a function for accelerating a data compression and decompression process, a function for accelerating a machine learning process, a function for accelerating a data classification process, a function for accelerating a deep learning process, a function for accelerating floating-point computation, and the like.
  • the first device 212 may not have computing capability.
  • the first resource manager 2111 is a component in the first computing node 210 for managing computing resources owned by all computing nodes in the computer system 200 .
  • the first resource manager 2111 is used to construct a computing resource pool, and the computing resource pool includes the second device 222 .
  • the computing resource pool further includes the first device 212 .
  • the first resource manager 2111 is further configured to number the acceleration devices included in all computing nodes in the computer system 200 and the acceleration functions possessed by each acceleration device to obtain a plurality of acceleration device IDs and an acceleration function ID corresponding to each acceleration device ID.
  • the first resource manager 2111 is further configured to construct a second correspondence relationship, and configure the second correspondence relationship to the first management unit 2121 .
  • the second correspondence refers to the correspondence among the ID of each acceleration device in the computing resource pool, the ID of each acceleration function possessed by that acceleration device, and the ID of each computing node associated with the computing resource pool (that is, the ID of the computing node where the acceleration device is located).
  • the computing nodes in the computer system 200 share the resources in the computing resource pool, and can use the acceleration devices in the computing resource pool to process data through the above-mentioned multiple acceleration device IDs and the acceleration function ID corresponding to each acceleration device ID.
  • taking the computer system 200 shown in FIG. 6 as an example, the first resource manager 2111 is described as follows:
  • the computing resources possessed by the first computing node 210 include the accelerated computing capability provided by the first device 212
  • the computing resources possessed by the second computing node 220 include the accelerated computing capability provided by the second device 222 .
  • in this case, the first resource manager 2111 is used to obtain the accelerated computing capability information of the first device 212 and the second device 222, where the accelerated computing capability information of the first device 212 includes the acceleration functions possessed by the first device 212, and the accelerated computing capability information of the second device 222 includes the acceleration functions possessed by the second device 222.
  • optionally, the accelerated computing capability information of the first device 212 further includes the used computing power and the available computing power of the first device 212, and the accelerated computing capability information of the second device 222 further includes the used computing power and the available computing power of the second device 222.
  • the first resource manager 2111 is also used to number the first device 212 and the second device 222 to obtain the ID of the first device 212 and the ID of the second device 222, and to number the acceleration functions possessed by the first device 212 and the acceleration functions possessed by the second device 222 to obtain the ID of each acceleration function.
  • the first resource manager 2111 is further configured to construct a second correspondence relationship, and configure the second correspondence relationship to the first management unit 2121 .
  • here, the second correspondence includes the correspondence among the ID of the first device 212, the IDs of the acceleration functions possessed by the first device 212, and the ID of the first computing node 210, and the correspondence among the ID of the second device 222, the IDs of the acceleration functions possessed by the second device 222, and the ID of the second computing node 220.
  • for example, the acceleration functions provided by the first device 212 include a function for accelerating the machine learning process and a function for accelerating the data encryption and decryption process, and the acceleration functions provided by the second device 222 include a function for accelerating the machine learning process and a function for accelerating the data compression and decompression process. In that case, the first resource manager 2111 may number the first device 212 as 1, number the second device 222 as 2, number the function for accelerating the machine learning process as 1, number the function for accelerating the data encryption and decryption process as 2, number the function for accelerating the data compression and decompression process as 3, number the first computing node as 1, and number the second computing node as 2, so as to obtain the second correspondence shown in FIG. 7.
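  • the numbering example above can be pictured as a small table. The following C sketch builds such a second correspondence and resolves an acceleration device ID to its computing node; the struct layout and the fixed-size function-ID array are assumptions made for illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* One row of the second correspondence: an acceleration device, the IDs of
 * the acceleration functions it offers, and the computing node it sits on. */
struct second_corr_entry {
    uint32_t accel_dev_id;
    uint32_t accel_func_ids[4]; /* 0 = unused slot */
    uint32_t node_id;
};

/* The example numbering above: function 1 = machine learning,
 * 2 = encryption/decryption, 3 = compression/decompression. */
static const struct second_corr_entry second_corr[] = {
    { .accel_dev_id = 1, .accel_func_ids = { 1, 2 }, .node_id = 1 }, /* first device 212  */
    { .accel_dev_id = 2, .accel_func_ids = { 1, 3 }, .node_id = 2 }, /* second device 222 */
};

/* Resolve the computing node that hosts a given acceleration device; 0 if unknown. */
static uint32_t node_of_accel_device(uint32_t accel_dev_id)
{
    for (size_t i = 0; i < sizeof(second_corr) / sizeof(second_corr[0]); i++)
        if (second_corr[i].accel_dev_id == accel_dev_id)
            return second_corr[i].node_id;
    return 0;
}
```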
  • the second resource manager 2211 is used in the second computing node 220 to manage computing resources owned by all computing nodes in the computer system 200 .
  • the second resource manager 2211 can manage the computing resources owned by all computing nodes in the computer system 200 in a manner similar to that of the first resource manager 2111, which is not described again here.
  • the second resource manager 2211 can also manage the computing resources owned by all computing nodes in the computer system 200 in the following manner: the first resource manager 2111 sends the relevant information of the computing resource pool (for example, the accelerated computing capability information of each acceleration device in the computing resource pool) and the second correspondence to the second resource manager 2211, and the second resource manager 2211 then sends the second correspondence to the second management unit 2221.
  • it should be noted that the functions of the first processor 211, the second processor 221, the first memory 213, and the second memory 223 in the computer system 200 are similar to the functions of the first processor 111, the second processor 121, the first memory 113, and the second memory 123 in the computer system 100, and the connection relationship between the first processor 211 and the first device 212 and the connection relationship between the second processor 221 and the second device 222 in the computer system 200 are respectively similar to the connection relationship between the first processor 111 and the first device 112 and the connection relationship between the second processor 121 and the second device 122 in the computer system 100; for simplicity, they are not described again in this embodiment of the present application.
  • the following describes how the computer system 200 implements the sharing of computing resources across computing nodes by taking, as an example, the first computing node 210 using the computing resources of the second computing node 220.
  • FIG. 8 shows a schematic flowchart of a data processing method provided by the present application. The method includes but is not limited to the following steps:
  • S301: The first device 212 obtains a cross-node acceleration instruction.
  • the cross-node acceleration instruction is used to instruct the first device 212 to use the second device 222 to process the third data. Similar to the above-mentioned cross-node read instruction and cross-node write instruction, the cross-node acceleration instruction may also be an atomic instruction.
  • the cross-node acceleration instruction includes a third source address, a third destination address, a size of the third data, a target acceleration device ID, and a target acceleration function ID.
  • the third source address is the address of the device storage space in which the third data is stored, which here is the address of the first storage space; the device storage space of the first device 212 includes the first storage space. The third destination address is the address of the device storage space into which the processing result of the third data is to be written, which here is the address of the second storage space; the device storage space of the first device 212 also includes the second storage space. The size of the third data may be the number of bytes of the third data. The target acceleration device ID is the ID of the acceleration device used by the first device 212 to process the third data, which here is the ID of the second device. The target acceleration function ID is the ID of an acceleration function possessed by the second device 222, and is used to instruct the second device 222 how to process the third data; for example, when the target acceleration function ID is the ID corresponding to the data integrity check function, the second device 222 performs a data integrity check operation on the third data, and for another example, when the target acceleration function ID is the ID corresponding to the data encryption function, the second device 222 performs a data encryption operation on the third data.
  • optionally, the cross-node acceleration instruction further includes the address of a third storage space; the device storage space of the second device 222 includes the third storage space, and the address of the third storage space is the address at which the second device 222 stores the third data after receiving it. It should be understood that the positions of the third source address, the third destination address, the size of the third data, the target acceleration device ID, the target acceleration function ID, and the address of the third storage space in the above-mentioned cross-node acceleration instruction can be allocated according to the actual situation.
  • taking FIG. 9 as an example, the cross-node acceleration instruction is a 64-byte instruction: bytes 0-7 of the cross-node acceleration instruction are used to fill in the third source address, bytes 8-15 are used to fill in the third destination address, bytes 16-23 are used to fill in the address of the third storage space, bytes 24-27 are used to fill in the target acceleration device ID, bytes 28-31 are used to fill in the target acceleration function ID, bytes 32-37 are used to fill in the size of the third data, and bytes 38-64 are used to fill in other information contained in the cross-node acceleration instruction, for example, third operation description information. The third operation description information is used to describe the cross-node acceleration instruction, thereby instructing the first device 212 that receives the instruction to use the second device 222 to process the third data.
  • FIG. 9 shows an exemplary format of a cross-node acceleration instruction, and the format of the cross-node acceleration instruction may also be other formats, which are not specifically limited in this application.
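  • for illustration, the byte layout described above for FIG. 9 can be written down as a packed C struct; interpreting the trailing field as bytes 38-63 so that the whole instruction occupies exactly 64 bytes, and all names below, are assumptions of this sketch.

```c
#include <stdint.h>

#pragma pack(push, 1)
/* Hypothetical 64-byte cross-node acceleration instruction following the
 * byte layout described above for FIG. 9. */
struct cross_node_accel_instr {
    uint64_t third_src_addr;     /* bytes 0-7:   address holding the third data     */
    uint64_t third_dst_addr;     /* bytes 8-15:  address for the processing result  */
    uint64_t third_storage_addr; /* bytes 16-23: address of the third storage space */
    uint32_t target_dev_id;      /* bytes 24-27: target acceleration device ID      */
    uint32_t target_func_id;     /* bytes 28-31: target acceleration function ID    */
    uint8_t  data_size[6];       /* bytes 32-37: size of the third data, in bytes   */
    uint8_t  op_desc[26];        /* bytes 38-63: third operation description, etc.  */
};
#pragma pack(pop)

_Static_assert(sizeof(struct cross_node_accel_instr) == 64,
               "cross-node acceleration instruction must be 64 bytes");
```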
  • in a possible implementation, obtaining the cross-node acceleration instruction by the first device 212 includes: the first processor 211 obtains the target acceleration device ID and the target acceleration function ID corresponding to the third data from the computing resource pool, generates the cross-node acceleration instruction, and sends it to the first device 212.
  • in another possible implementation, obtaining the cross-node acceleration instruction by the first device 212 includes: the first device 212 obtains the target acceleration device ID and the target acceleration function ID corresponding to the third data from the computing resource pool, and generates the cross-node acceleration instruction.
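  • assuming the packed instruction struct sketched after FIG. 9 above, the following helper illustrates how either the first processor 211 or the first device 212 might fill in such an instruction once the target acceleration device ID and target acceleration function ID have been obtained from the computing resource pool; the helper name and the little-endian encoding of the size field are assumptions of this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Fill in a cross_node_accel_instr (the packed struct sketched above) once the
 * target device and function have been chosen from the computing resource pool. */
static void build_cross_node_accel_instr(struct cross_node_accel_instr *ins,
                                         uint64_t third_src_addr,
                                         uint64_t third_dst_addr,
                                         uint64_t third_storage_addr,
                                         uint32_t target_dev_id,
                                         uint32_t target_func_id,
                                         uint64_t data_size_bytes)
{
    memset(ins, 0, sizeof(*ins));
    ins->third_src_addr     = third_src_addr;
    ins->third_dst_addr     = third_dst_addr;
    ins->third_storage_addr = third_storage_addr;
    ins->target_dev_id      = target_dev_id;
    ins->target_func_id     = target_func_id;
    for (int i = 0; i < 6; i++)               /* little-endian 48-bit size field */
        ins->data_size[i] = (uint8_t)(data_size_bytes >> (8 * i));
}
```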
  • S302: The first device 212 obtains the third data and the target acceleration function ID according to the cross-node acceleration instruction.
  • specifically, after receiving the cross-node acceleration instruction, the first device 212 parses the cross-node acceleration instruction to obtain the third source address, the target acceleration device ID, and the target acceleration function ID, and then reads the third data from the first storage space.
  • S303: The first device 212 encapsulates the third data and the target acceleration function ID to obtain a fourth network transmission message.
  • the first device 212 determines the ID of the second computing node 220 according to the target acceleration device ID and the second correspondence stored in the first management unit 2121 . Then, the first device 212 encapsulates the third data and the target acceleration function ID according to the ID of the second computing node 220 to obtain a fourth network transmission packet.
  • the fourth network transmission message includes the third data, the target acceleration function ID, a fourth source IP address, and a fourth destination IP address, where the fourth source IP address is the IP address of the first computing node 210, and the fourth destination IP address is the IP address of the second computing node 220.
  • the ID of the second computing node 220 may be an IP address of the second computing node 220 , or may be a serial number used to indicate the second computing node 220 .
  • when the ID of the second computing node 220 is the IP address of the second computing node 220, that the first device 212 encapsulates the third data and the target acceleration function ID according to the ID of the second computing node 220 to obtain the fourth network transmission packet includes: the first device 212 encapsulates the third data and the target acceleration function ID according to the IP address of the first computing node 210 and the ID of the second computing node 220 to obtain the fourth network transmission message. When the ID of the second computing node 220 is a number used to indicate the second computing node 220, that the first device 212 encapsulates the third data and the target acceleration function ID according to the ID of the second computing node 220 to obtain the fourth network transmission packet includes: the first device 212 determines the IP address of the second computing node 220 according to the ID of the second computing node 220, and then encapsulates the third data and the target acceleration function ID according to the IP address of the first computing node 210 and the IP address of the second computing node 220 to obtain the fourth network transmission message.
  • S304: The first device 212 sends the fourth network transmission packet to the second device 222.
  • S305: The second device 222 receives the fourth network transmission message, and processes the third data according to the fourth network transmission message.
  • the second device 222 parses the fourth network transmission packet to obtain third data and the target acceleration function ID, and then processes the third data according to the target acceleration function ID.
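  • as a sketch only, the dispatch on the target acceleration function ID might look as follows, reusing the example numbering from the FIG. 7 discussion (1 = machine learning, 2 = encryption/decryption, 3 = compression/decompression); the handler functions are empty placeholders standing in for the device's actual acceleration engines.

```c
#include <stdint.h>
#include <stddef.h>

/* Placeholder engines; a real acceleration device would run these in hardware. */
static int accel_machine_learning(const void *in, size_t n, void *out) { (void)in; (void)n; (void)out; return 0; }
static int accel_encrypt(const void *in, size_t n, void *out)          { (void)in; (void)n; (void)out; return 0; }
static int accel_compress(const void *in, size_t n, void *out)         { (void)in; (void)n; (void)out; return 0; }

/* Dispatch the third data to the engine named by the target acceleration
 * function ID, using the example numbering above. */
static int process_third_data(uint32_t target_func_id,
                              const void *third_data, size_t len, void *result)
{
    switch (target_func_id) {
    case 1: return accel_machine_learning(third_data, len, result);
    case 2: return accel_encrypt(third_data, len, result);
    case 3: return accel_compress(third_data, len, result);
    default: return -1; /* unknown acceleration function ID */
    }
}
```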
  • S306: The second device 222 sends the processing result of the third data to the first device 212.
  • the second device 222 encapsulates the processing result of the third data to obtain the fifth network transmission message.
  • the fifth network transmission packet includes the processing result of the third data, a fifth source IP address, and a fifth destination IP address, where the fifth source IP address is the IP address of the second computing node 220, and the fifth destination IP address is the IP address of the first computing node 210.
  • S307: The first device 212 receives the processing result of the third data, and writes the result into the second storage space.
  • the first device 212 parses the fifth network transmission packet to obtain a processing result of the third data.
  • the first device 212 also obtains the third destination address according to the above-mentioned cross-node acceleration instruction. Then, the first device 212 writes the processing result of the third data into the storage space (ie, the second storage space) corresponding to the third destination address.
  • in another possible implementation, the fifth network transmission message further includes the third destination address; in this case, that the first device 212 writes the processing result of the third data into the second storage space includes: the first device 212 parses the fifth network transmission message to obtain the processing result of the third data and the third destination address, and then writes the processing result of the third data into the second storage space.
  • it should be understood that, in this case, the above-mentioned fourth network transmission packet also includes the third destination address, so that the second device 222 can encapsulate the processing result of the third data together with the third destination address, thereby obtaining the fifth network transmission message.
  • in the above embodiment, the third data is data stored in the first device 212, and the processing result of the third data is written into the device storage space of the first device 212; that is, the third source address in the cross-node acceleration instruction is the address of the first storage space, and the third destination address in the cross-node acceleration instruction is the address of the second storage space.
  • the third data may also be data stored in the second device 222 , or data stored in the memory of the first computing node 210 , or data stored in the memory of the second computing node 220 .
  • the processing result of the third data may also be written into the memory of the first computing node 210 . The following will briefly describe the above situations.
  • case 1: the third data is data stored in the second device 222, and the processing result of the third data is written into the device storage space of the first device 212. In this case, the first device 212 can use the second device 222 to process the third data through the following steps: the first device 212 parses the cross-node acceleration instruction to obtain the ID of the second device, then determines the ID of the second computing node 220 according to the ID of the second device and the second correspondence stored in the first management unit 2121, then encapsulates the third source address and the target acceleration function ID according to the ID of the second computing node 220 to obtain a corresponding network transmission message, and sends the network transmission message to the second device 222. After receiving the network transmission message, the second device 222 obtains the third source address and the target acceleration function ID.
  • the second device 222 reads the third data from the fourth storage space, and performs corresponding processing on the third data according to the target acceleration function ID. Afterwards, the second device 222 sends the processing result of the third data to the first device 212, and after receiving the processing result of the third data, the first device 212 writes the processing result of the third data into the second storage space.
  • case 2: the third data is data stored in the memory of the first computing node 210, and the processing result of the third data is written into the memory of the first computing node 210. In this case, the computer system 200 can combine the above data transmission method to realize the above data processing across computing nodes; for the sake of simplicity, only the steps different from the above S301-S307 are described here:
  • the third source address included in the cross-node acceleration instruction is the virtual address of a fifth memory space, and the third destination address is the virtual address of a sixth memory space, where the first memory 213 includes the fifth memory space and the sixth memory space. That is to say, the cross-node acceleration instruction is obtained by the first processor 211 or the first device 212 through the following steps: obtaining the virtual address of the fifth memory space and the virtual address of the sixth memory space from the memory resource pool, obtaining the ID of the second device and the target acceleration function ID from the computing resource pool, and thereby generating the cross-node acceleration instruction.
  • in this case, the first resource manager 2111 is used not only to manage the computing resources owned by all the computing nodes in the computer system 200, but also to manage the memory resources owned by all the computing nodes in the computer system 200, in a manner similar to that of the first resource manager 1111. Therefore, in addition to the second correspondence, the first management unit 2121 also stores the correspondence among the virtual address of the fifth memory space, the physical address of the fifth memory space, and the ID of the first computing node 210, and the correspondence among the virtual address of the sixth memory space, the physical address of the sixth memory space, and the ID of the first computing node 210.
  • accordingly, the first device 212 obtains the third data through the following steps: after obtaining the cross-node acceleration instruction, the first device 212 parses the cross-node acceleration instruction to obtain the third source address, and then reads the third data from the fifth memory space according to the third source address and the correspondence, stored in the first management unit 2121, between the virtual address of the fifth memory space and the physical address of the fifth memory space.
  • the processing result of the third data is written into the memory space corresponding to the third destination address through the following steps: after obtaining the processing result of the third data, the first device 212 writes the processing result of the third data into the sixth memory space according to the third destination address and the correspondence, stored in the first management unit 2121, between the virtual address of the sixth memory space and the physical address of the sixth memory space.
  • for other combinations of where the third data is stored (for example, in the second device 222 or in the memory of the second computing node 220) and where the processing result of the third data is written (for example, into the memory of the first computing node 210 or into the device storage space of the first device 212), the method for the first device 212 to use the second device 222 to process the third data can also be adaptively modified in combination with the data transmission method provided in this application, which is not described here for brevity.
  • the above embodiment describes the process of the first computing node 210 processing data by using the acceleration device of the second computing node 220. It should be understood that the process of the second computing node 220 processing data by using the acceleration device of the first computing node 210 is similar to the process described in the above embodiment, and is not described here for brevity.
  • the first device 212 can use the second device 222 to process the third data, so as to realize the sharing of computing resources across computing nodes in the computer system 200 .
  • moreover, when data is processed by using the above data processing method, except that the first processor 211 may be required to generate the cross-node acceleration instruction and send it to the first device 212 in the above S301, none of the other steps requires the first processor 211 or the second processor 221; that is to say, the CPU and the operating system in the first computing node 210 and the second computing node 220 can be bypassed, so that the data processing efficiency across computing nodes can be improved.
  • in addition, during the process in which the third data is processed, the CPU of the second computing node 220 can perform other tasks; and if the CPU of the first computing node 210 needs to generate the cross-node acceleration instruction and send it to the first device 212, then after the CPU of the first computing node 210 sends the cross-node acceleration instruction to the first device 212, it can also be released to perform other tasks, thereby reducing resource waste and improving resource utilization.
  • the present application also provides a computer system.
  • the computer system 100 may include a first computing node 110 and a second computing node 120 , and the first computing node 110 may include a first device 112 and a first memory 113 , the second computing node 120 may include a second device 122 .
  • the first computing node 110 may further include a first processor 111 .
  • the first device 112 is configured to execute the aforementioned S101-S103, S106, S201-S204, and the second device 122 is configured to execute the aforementioned S104-S105 and S205.
  • when the first computing node 110 includes the first processor 111, the first processor 111 is configured to address the address space of the memory resource pool of the computer system 100 to obtain the global virtual address of the memory resource pool, and to construct the first correspondence.
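  • a minimal sketch of that addressing step, under the assumption that the contributed memory spaces are simply laid out back to back in one linear global virtual address space (as in the example of FIG. 2), could look as follows; the struct and function names are placeholders of this sketch.

```c
#include <stdint.h>
#include <stddef.h>

/* A physical memory space contributed to the memory resource pool. */
struct pool_region {
    uint64_t pa_start;   /* physical address of the contributed memory space */
    uint64_t size;       /* size of the memory space in bytes                */
    uint32_t node_id;    /* ID of the computing node providing the space     */
};

/* One row of the first correspondence built while addressing the pool. */
struct first_corr_row {
    uint64_t va_start;   /* assigned global virtual address                  */
    uint64_t pa_start;
    uint64_t size;
    uint32_t node_id;
};

/* Lay the contributed memory spaces out back to back in one linear global
 * virtual address space and record the first correspondence as we go. */
static size_t address_memory_pool(const struct pool_region *regions, size_t n,
                                  uint64_t va_base, struct first_corr_row *rows)
{
    uint64_t va = va_base;
    for (size_t i = 0; i < n; i++) {
        rows[i].va_start = va;
        rows[i].pa_start = regions[i].pa_start;
        rows[i].size     = regions[i].size;
        rows[i].node_id  = regions[i].node_id;
        va += regions[i].size;
    }
    return n;
}
```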
  • the first processor 111 is further configured to perform the steps of generating a cross-node read instruction in the foregoing S101, and sending the cross-node read instruction to the first device 112.
  • the first processor 111 may also be configured to perform the steps of generating a cross-node write instruction in the foregoing S201, and sending the cross-node write instruction to the first device 112.
  • the present application also provides a computer system.
  • the computer system 200 may include a first computing node 210 and a second computing node 220, the first computing node 210 may include a first device 212, and the second computing node 220 may include a second device 222 .
  • optionally, the first computing node 210 may further include a first processor 211.
  • the first device 212 is configured to perform the aforementioned steps S301-S304 and S307 and the steps performed by the first device 212 in cases 1 and 2, and the second device 222 is configured to perform the aforementioned steps S305-S306 and the steps performed by the second device 222 in cases 1 and 2.
  • the first processor 211 is configured to number the acceleration devices in the computing resource pool of the computer system 200 and the acceleration functions of each acceleration device to obtain multiple acceleration device IDs and the acceleration function ID corresponding to each acceleration device ID, and to construct the second correspondence.
  • the first processor 211 is further configured to perform the steps of generating the cross-node acceleration instruction in the foregoing S301, and sending the cross-node acceleration instruction to the first device 212.
  • the present application also provides a computer-readable storage medium, where the computer-readable storage medium stores first computer instructions and second computer instructions, and the first computer instructions and the second computer instructions respectively run on a first computing node (for example, the first computing node 110 shown in FIG. 1 or the first computing node 210 shown in FIG. 6) and a second computing node (for example, the second computing node 120 shown in FIG. 1 or the second computing node 220 shown in FIG. 6), so as to perform the methods described in the foregoing embodiments, thereby implementing data transmission and data processing between the first computing node and the second computing node.
  • the aforementioned computing nodes may be general purpose computers, special purpose computers, computer networks, or other programmable devices.
  • the above-mentioned computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the above-mentioned computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or twisted pair) or a wireless manner (for example, infrared, radio, or microwave).
  • the above-mentioned computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media.
  • the above-mentioned usable media may be magnetic media (eg, floppy disks, hard disks, magnetic tapes), optical media (eg, optical disks), or semiconductor media (eg, solid state disks (SSDs)).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Physics (AREA)
  • Multi Processors (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present application provides a data transmission method, a data processing method, and related products. The data transmission method is applied to a computer system that includes a first computing node and a second computing node, where the first computing node includes a first device and a first memory, the second computing node includes a second device and a second memory, the first memory includes a first memory space, and the second memory includes a second memory space. The method includes: the first device obtains a cross-node read instruction, where the cross-node read instruction includes the virtual address of the second memory space and the size of first data; the first device determines the ID of the second computing node according to the virtual address of the second memory space and a first correspondence, thereby obtaining a first network transmission packet, and sends the packet to the second device; and the second device receives the packet, reads the first data from the second memory space, and sends the first data to the first device. This method can improve the efficiency of data transmission across computing nodes.

Description

一种数据传输方法、数据处理方法及相关产品
本申请要求于2021年4月30日提交中国专利局、申请号为202110486548.1、发明名称为“数据传输的方法、装置和系统”的中国专利申请的优先权,以及于2021年6月28日提交的申请号为202110720639.7、发明名称为“一种数据传输方法、数据处理方法及相关产品”的中国专利申请的优先权,前述两件专利申请的全部内容通过引用结合在本申请中。
技术领域
本申请涉及计算机技术领域,尤其涉及一种数据传输方法、数据处理方法及相关产品。
背景技术
随着人工智能、云计算等技术的快速发展,计算机集群的应用也越来越广泛。计算机集群是由一组相互独立的计算机利用高速通信网络组成的一个计算机系统。计算机集群在处理突发业务时,可能出现以下情况:该集群中的某些计算节点不堪重负,而某些计算节点却资源过剩,这将影响业务的处理进度。因此,如何在计算机集群中实现跨计算节点的资源共享是一个亟需解决的问题。
发明内容
本申请提供了一种数据传输方法、数据处理方法及相关产品,能够在计算机集群中实现跨计算节点的资源共享。
第一方面,本申请提供了一种数据传输方法,应用于计算机系统,该计算机系统包括第一计算节点和第二计算节点,第一计算节点包括第一设备和第一内存,第二计算节点包括第二设备和第二内存,第一内存包括第一内存空间,第二内存包括第二内存空间,上述方法包括以下步骤:
第一设备获取跨节点读指令,跨节点读指令用于指示第一设备从第二内存空间中读取第一数据,跨节点读指令包括第一源地址和第一数据的大小,第一源地址为第二内存空间的虚拟地址,第一设备存储有第一对应关系,第一对应关系包括第二内存空间的虚拟地址与第二计算节点的ID之间的对应关系;
第一设备根据第二内存空间的虚拟地址和第一对应关系,确定第二计算节点的ID;
第一设备根据第二计算节点的ID和跨节点读指令,得到第一网络传输报文,并将第一网络传输报文发送至第二设备,第一网络传输报文包括第二内存空间的虚拟地址和第一数据的大小;
第二设备接收第一网络传输报文,从第二内存空间中读取第一数据,并将第一数据发送至第一设备。
实施第一方面所描述的方法,第一设备可以从第二计算节点的内存(即第二内存空间)中读取第一数据,实现了跨计算节点的数据传输,从而实现跨计算节点的内存资源共享。而且,第一设备中存储有第一对应关系,第一设备根据第一对应关系可以获得第一网络传输报文,并将第一网络传输报文发送至第二设备,这一过程可以绕过第一计算节点的CPU和操作 系统,因此,利用上述方法还能够提高跨计算节点的数据传输效率。
在一种可能的实现方式中,上述计算机系统中的计算节点共享内存资源池中的资源,内存资源池包括上述第一内存和第二内存。通过建立内存资源池,计算机系统中的任意一个计算节点可以使用其他计算节点的内存资源,从而解决因单个计算节点的内存配置不满足实际需求而影响任务执行进度的问题。
在一种可能的实现方式中,上述第一计算节点还包括第一处理器,上述方法还包括:第一处理器对内存资源池的地址空间进行编址,得到内存资源池的全局虚拟地址;第一计算节点通过全局虚拟地址访问内存资源池的存储空间。如此,计算机系统中的任意一个计算节点可以获得其他计算节点的内存空间的地址,从而可以使用其他计算节点的内存资源。
在一种可能的实现方式中,上述第一设备获取跨节点读指令,包括:第一处理器从内存资源池获得与上述第一数据对应的第一内存空间的虚拟地址和第二内存空间的虚拟地址,生成上述跨节点读指令,然后,第一处理器将跨节点读指令发送至第一设备。在另一种可能的实现方式中,上述第一设备获取跨节点读指令,包括:第一设备从内存资源池获得与上述第一数据对应的第一内存空间的虚拟地址和第二内存空间的虚拟地址,生成上述跨节点读指令。可以看出,上述跨节点读指令可以是第一处理器生成的,也可以是第一设备生成的。当第一处理器的负载过重或第一处理器需要优先处理其他任务时,第一设备可以自行生成跨节点读指令,无需等待第一处理器生成跨节点读指令,从而提高第一设备从第二内存空间读取第一数据的效率。
在一种可能的实现方式中,上述跨节点读指令还包括第一目的地址,第一目的地址为第一内存空间的虚拟地址,上述方法还包括:第一设备接收第一数据,然后根据第一内存空间的虚拟地址,将第一数据写入第一内存空间。如此,第一设备可以将从第二内存空间中读取的数据写入第一内存空间。
在一种可能的实现方式中,上述第一对应关系包括内存资源池的全局虚拟地址、内存资源池的存储空间的物理地址以及内存资源池关联的各个计算节点的ID之间的对应关系,上述第一设备根据第一内存空间的虚拟地址,将第一数据写入第一内存空间,包括:第一设备根据第一对应关系和第一内存空间的虚拟地址,确定第一内存空间的物理地址,然后通过直接内存访问(direct memory access,DMA)方式将第一数据写入第一内存空间。如此,可以提高第一设备将第一数据写入第一内存空间的速度。
在一种可能的实现方式中,上述第二设备存储有上述第一对应关系,上述第二设备接收第一网络传输报文,从第二内存空间中读取第一数据,包括:第二设备接收第一网络传输报文,获得第二内存空间的虚拟地址,然后根据第一对应关系和第二内存空间的虚拟地址,确定第二内存空间的物理地址,然后通过DMA方式从第二内存空间中读取第一数据。如此,可以提高第二设备从第二内存空间中读取第一数据的速度。而且,第二设备中存储有第一对应关系,使得第二设备可以根据第一对应关系确定第二内存空间的物理地址,从而从第二内存空间中读取第一数据,这一过程绕开了第二计算节点的CPU和操作系统,因此,利用上述方法还能够提供跨计算节点的数据传输效率。
在一种可能的实现方式中,上述第一内存还包括第三内存空间,上述第二内存还包括第四内存空间,上述方法还包括:第一设备获取跨节点写指令,跨节点写指令用于指示第一设备向第三内存空间中写入第二数据,跨节点写指令包括第二源地址、第二目的地址以及第二数据的大小,第二源地址为第三内存空间的虚拟地址,第二目的地址为第四内存空间的虚拟地址;第一设备根据上述第一对应关系和第三内存空间的虚拟地址,确定第三内存空间的物 理地址;第一设备通过DMA方式从第三内存空间中读取第二数据;第一设备根据第一对应关系和第四内存空间的虚拟地址,确定第二计算节点的ID;第一设备根据第二计算节点的ID和跨节点写指令,得到第二网络传输报文,并将第二网络传输报文发送至第二设备,其中,第二网络传输报文包括第四内存空间的虚拟地址和上述第二数据;第二设备接收第二网络传输报文,将第二数据写入第四内存空间。
实施上述实现方式所描述的方法,第一设备可以向第二计算节点的内存(即第四内存空间)中写入第二数据,实现了跨计算节点的数据传输,从而实现跨计算节点的内存资源共享。而且,第一设备中存储有第一对应关系,第一设备根据第一对应关系可以获得第二数据,并将第二数据发送至第二设备。这一过程绕开了第一计算节点的CPU和操作系统,因此,利用上述方法能够提高跨计算节点的数据传输效率。
第二方面,本申请提供了一种数据处理方法,应用于计算机系统,该计算机系统包括第一计算节点和第二计算节点,第一计算节点包括第一设备,第二计算节点包括第二设备,上述方法包括:
第一设备获取跨节点加速指令,跨节点加速指令用于指示第一设备使用第二设备来处理第三数据,跨节点读指令包括第二设备的ID和目标加速功能ID,第一设备存储有第二对应关系,第二对应关系包括第二设备的ID与第二计算节点的ID之间的对应关系;
第一设备根据第二设备的ID和第二对应关系,确定第二计算节点的ID;
第一设备根据第二计算节点的ID和跨节点加速指令,得到第三网络传输报文,并将第三网络传输报文发送至第二设备,第三网络传输报文包括上述目标加速功能ID;
第二设备根据目标加速功能ID,对第三数据进行相应的处理;
第二设备将第三数据的处理结果发送至第一计算节点。
实施第二方面所描述的方法,第一设备可以使用第二计算节点中的第二设备来处理第三数据,实现了跨计算节点的数据处理,从而实现跨计算节点的计算资源共享。而且,第一设备中存储有第二对应关系,第一设备根据第二对应关系可以获得第三网络传输报文,并将第三网络传输报文发送至第二设备,这一过程可以绕开第一计算节点的CPU和操作系统,因此,利用上述方法还能够提供跨计算节点的数据处理效率。
在一种可能的实现方式中,上述计算机系统中的计算节点共享计算资源池中的资源,计算资源池包括上述第二设备。通过建立计算资源池,计算机系统中的任意一个计算节点可以使用其他计算节点的计算资源,从而可以在上述计算机系统中实现全局的负载均衡,提高任务的处理效率。
在一种可能的实现方式中,上述第一计算节点还包括第一处理器,上述方法还包括:第一处理器对上述计算资源池中的加速设备及每个加速设备的加速功能进行编号,得到多个加速设备ID以及每个加速设备ID对应的加速功能ID;第一计算节点通过多个加速设备ID以及每个加速设备ID对应的加速功能ID,使用上述计算资源池中的加速设备对第三数据进行处理。如此,计算机系统中的任意一个计算节点可以获得其他计算节点的计算资源的信息,从而可以使用其他计算节点的计算资源。
在一种可能的实现方式中,上述第一设备获取跨节点加速指令,包括:第一处理器从计算资源池获得与上述第三数据对应的第二设备的ID以及目标加速功能ID,生成跨节点加速指令,然后将跨节点加速指令发送至第一设备。在另一种可能的实现方式中,上述第一设备获取跨节点加速指令,包括:第一设备从计算资源池获得与上述第三数据对应的第二设备的ID以及目标加速功能ID,生成跨节点加速指令。可以看出,上述跨节点加速指令可以是第一 处理器生成的,也可以是第一设备生成的。当第一处理器的负载过重或第一处理器需要优先处理其他任务时,第一设备可以自行生成上述跨节点加速指令,无需等待第一处理器生成跨节点加速指令,从而提高数据处理的效率。
在一种可能的实现方式中,上述跨节点加速指令还包括第三源地址和第三目的地址,第三源地址为存储有第三数据的设备存储空间的地址,第三目的地址为将第三数据的处理结果写入的设备存储空间的地址。
在一种可能的实现方式中,上述第三源地址为第一设备的存储空间的地址,在第二设备根据目标加速功能ID,对第三数据进行相应的处理之前,上述方法还包括:第一设备根据跨节点加速指令,获得第三源地址,然后从第一设备的存储空间中读取第三数据,然后将第三数据发送至第二设备。
在另一种可能的实现方式中,上述第三源地址为第二设备的存储空间的地址,上述第三网络传输报文还包括第三源地址,在第二设备根据目标加速功能ID,对第三数据进行相应的处理之前,上述方法还包括:第二设备根据第三网络传输报文,获取第三源地址,然后从第二设备的存储空间中读取第三数据。
可以看出,上述第三数据可以存储在第一设备的设备存储空间,还可以存储在第二设备的设备存储空间,利用本申请提供的数据处理方法均能够使用第二设备对第三数据进行处理。
第三方面,本申请提供了一种计算机系统,该计算机系统包括第一计算节点和第二计算节点,第一计算节点包括第一设备和第一内存,第二计算节点包括第二设备和第二内存,第一内存包括第一内存空间,第二内存包括第二内存空间,
第一设备用于获取跨节点读指令,跨节点读指令包括第一源地址和第一数据的大小,第一源地址为第二内存空间的虚拟地址,第一设备存储有第一对应关系,第一对应关系包括第二内存空间的虚拟地址与第二计算节点的ID之间的对应关系;
第一设备还用于根据第二内存空间的虚拟地址和第一对应关系,确定第二计算节点的ID;
第一设备还用于根据第二计算节点的ID和跨节点读指令,得到第一网络传输报文,并将第一网络传输报文发送至第二设备,第一网络传输报文包括第二内存空间的虚拟地址和第一数据的大小;
第二设备用于接收第一网络传输报文,从第二内存空间中读取第一数据,并将第一数据发送至第一设备。
在一种可能的实现方式中,上述计算机系统中的计算节点共享内存资源池中的资源,内存资源池包括上述第一内存和上述第二内存。
在一种可能的实现方式中,上述第一计算节点还包括第一处理器,第一处理器用于对上述内存资源池的地址空间进行编址,得到内存资源池的全局虚拟地址;第一计算节点用于通过全局虚拟地址访问内存资源池的存储空间。
在一种可能的实现方式中,上述第一处理器还用于从上述内存资源池获得与上述第一数据对应的第一内存空间的虚拟地址和第二内存空间的虚拟地址,生成跨节点读指令;第一处理器还用于将跨节点读指令发送至第一设备。
在一种可能的实现方式中,上述第一设备具体用于:从上述内存资源池获得与上述第一数据对应的第一内存空间的虚拟地址和第二内存空间的虚拟地址,生成跨节点读指令。
在一种可能的实现方式中,上述跨节点读指令还包括第一目的地址,第一目的地址为第一内存空间的虚拟地址,第一设备还用于接收第一数据;第一设备还用于根据第一内存空间 的虚拟地址,将第一数据写入第一内存空间。
在一种可能的实现方式中,上述第一对应关系包括上述内存资源池的全局虚拟地址、内存资源池的存储空间的物理地址以及内存资源池关联的各个计算节点的ID之间的对应关系,第一设备具体用于:根据第一对应关系和第一内存空间的虚拟地址,确定第一内存空间的物理地址,然后通过DMA方式将第一数据写入第一内存空间。
在一种可能的实现方式中,第二设备存储有上述第一对应关系,第二设备具体用于:接收第一网络传输报文,获得第二内存空间的虚拟地址,然后根据第一对应关系和第二内存空间的虚拟地址,确定第二内存空间的物理地址,然后通过DMA方式从第二内存空间中读取第一数据。
在一种可能的实现方式中,上述第一内存还包括第三内存空间,上述第二内存还包括第四内存空间,上述第一设备还用于获取跨节点写指令,跨节点写指令用于指示第一设备向第四内存空间中写入第二数据,跨节点写指令包括第二源地址、第二目的地址以及第二数据的大小,第二源地址为第三内存空间的虚拟地址,第二目的地址为第四内存空间的虚拟地址;第一设备还用于根据第一对应关系和第三内存空间的虚拟地址,确定第三内存空间的物理地址;第一设备还用于通过DMA方式从第三内存空间中读取第二数据;第一设备还用于根据第一对应关系和第四内存空间的虚拟地址,确定第二计算节点的ID;第一设备还用于根据第二计算节点的ID和跨节点写指令,得到第二网络传输报文,并将第二网络传输报文发送至第二设备,其中,第二网络传输报文包括第四内存空间的虚拟地址和第二数据;第二设备还用于接收第二网络传输报文,将第二数据写入第四内存空间。
第四方面,本申请还提供了一种计算机系统,该计算机系统包括第一计算节点和第二计算节点,第一计算节点包括第一设备,第二计算节点包括第二设备,
第一设备用于获取跨节点加速指令,跨节点加速指令用于指示第一设备使用第二设备来处理第三数据,跨节点加速指令包括第二设备的ID和目标加速功能ID,第一设备存储有第二对应关系,第二对应关系包括第二设备的ID与第二计算节点的ID之间的对应关系;
第一设备还用于根据第二设备的ID和第二对应关系,确定第二计算节点的ID;
第一设备还用于根据第二计算节点的ID和跨节点加速指令,得到第三网络传输报文,并将第三网络传输报文发送至第二设备,第三网络传输报文包括目标加速功能ID;
第二设备还用于根据目标加速功能ID,对第三数据进行相应的处理;
第二设备用于将第三数据的处理结果发送至第一计算节点。
在一种可能的实现方式中,上述计算机系统中的计算节点共享计算资源池中的资源,计算资源池包括上述第二设备。
在一种可能的实现方式中,上述第一计算节点还包括第一处理器,第一处理器用于对上述计算资源池中的加速设备及每个加速设备的加速功能进行编号,得到多个加速设备ID以及每个加速设备ID对应的加速功能ID;第一计算节点用于通过多个加速设备ID以及每个加速设备ID对应的加速功能ID,使用计算资源池中的加速设备对第三数据进行处理。
在一种可能的实现方式中,上述第一处理器还用于从上述计算资源池获得与上述第三数据对应的第二设备的ID以及目标加速功能ID,生成跨节点加速指令;第一处理器还用于将跨节点加速指令发送至第一设备。
在一种可能的实现方式中,第一设备具体用于:从上述计算资源池获得与上述第三数据对应的第二设备的ID以及目标加速功能ID,生成跨节点加速指令。
在一种可能的实现方式中,上述跨节点加速指令还包括第三源地址和第三目的地址,第 三源地址为存储有上述第三数据的设备存储空间的地址,第三目的地址为将上述第三数据的处理结果写入的设备存储空间的地址。
在一种可能的实现方式中,上述第三源地址为第一设备的存储空间的地址,第一设备具体用于:根据跨节点加速指令,获得第三源地址,然后从第一设备的存储空间中读取第三数据,然后将第三数据发送至第二设备。
在一种可能的实现方式中,上述第三源地址为第二设备的存储空间的地址,上述第三网络传输报文还包括第三源地址,第二设备还用于:根据第三网络传输报文,获取第三源地址,然后从第二设备的存储空间中读取第三数据。
第五方面,本申请提供了一种计算机可读存储介质,存储有第一计算机指令和第二计算机指令,第一计算机指令和第二计算指令分别运行在第一计算节点和第二计算节点上,以执行前述第一方面、第一方面的任意一种可能的实现方式、第二方面、第二方面的任意一种可能的实现方式中的方法,从而实现第一计算节点与第二计算节点之间的数据处理。
附图说明
图1是本申请提供的一种计算机系统的结构示意图;
图2是本申请提供的一种内存资源池及第一对应关系的示意图;
图3是本申请提供的一种数据传输方法的流程示意图;
图4是本申请提供的一种跨节点读指令的格式的示意图;
图5是本申请提供的另一种数据传输方法的流程示意图;
图6是本申请提供的另一种计算机系统的结构示意图;
图7是本申请提供的一种第二对应关系的示意图;
图8是本申请提供的一种数据处理方法的流程示意图;
图9是本申请提供的一种跨节点加速指令的格式的示意图。
具体实施方式
为了便于理解本申请提供的技术方案,首先介绍本申请适用的应用场景:计算机系统(例如,集群)的资源共享。
本申请中,计算机系统包括两个或两个以上的计算节点(即计算机),计算机系统的资源包括两个方面:一方面是内存资源,即该系统中所有计算节点拥有的内存资源;另一方面是计算资源,即该系统中所有计算节点拥有的计算资源。计算机系统的资源共享包括该系统的内存资源的共享,以及该系统的计算资源的共享。
计算机系统的内存资源的共享旨在构建一个内存资源池,如此,当计算机系统中的某个计算节点的内存资源不够用时,该计算节点可以把其他计算节点的内存当作磁盘或缓存,以用于存储一些数据,当该计算节点需要使用这些数据时,再从其他计算节点的内存中读取数据,从而解决因单个计算节点的内存配置不满足实际需求而影响任务执行进度的问题。
计算机系统的计算资源的共享旨在构建一个计算资源池,如此,当计算机系统中的某个计算节点的负载过重时,该计算节点可以使用其他计算节点的算力来处理一部分需要由本计算节点完成的任务,从而在计算机系统范围内实现全局负载均衡,以加快任务的完成进度。本申请实施例中,计算机系统的计算资源的共享具体是指计算机系统的加速资源的共享。加速资源是指加速计算能力,可以由加速设备提供。加速设备是一类能够减轻计算节点中CPU 的工作量,并提高计算节点处理任务的效率的硬件,例如,专门用于进行图像和图形相关运算工作的图形处理器(graphics processing unit,GPU)、专门用于处理视频和图像类的海量多媒体数据的神经网络处理器(neural-network processing units,NPU),数据流加速器(data stream accelerator,DSA)等。因此,计算机系统的加速资源的共享可以理解为:当计算机系统中的某个计算节点上的加速设备的负载过重时,可以将一些计算任务分配给该系统中的其他计算节点上的加速设备来执行,从而减轻该计算节点的CPU和加速设备的工作量,提高计算任务的完成效率。
本申请提供了一种数据传输方法,该方法可以由计算机系统执行,当在计算机系统中执行该方法时,能够实现跨计算节点的数据传输,从而在该系统中实现内存资源的共享。下面将结合图1示出的计算机系统介绍本申请提供的数据传输方法。
如图1所示,图1示出了本申请提供的一种计算机系统的结构示意图。其中,计算机系统100包括第一计算节点110和第二计算节点120。第一计算节点110包括第一处理器111、第一设备112以及第一内存113,第一处理器111包括第一资源管理器1111,第一设备112包括第一管理单元1121。第二计算节点120包括第二处理器121、第二设备122以及第二内存123,第二处理器121包括第二资源管理器1211,第二设备122包括第二管理单元1221。
第一计算节点110:
第一处理器111可以包括中央处理器(central processing unit,CPU),也可以包括专用集成电路(application specific integrated circuit,ASIC),或可编程逻辑器件(programmable logic device,PLD),上述PLD可以是复杂程序逻辑器件(complex programmable logical device,CPLD),现场可编程逻辑门阵列(field programmable gate array,FPGA),通用阵列逻辑(generic array logic,GAL)或其任意组合。
第一设备112是第一计算节点110上的外部设备。第一设备112可以是GPU、NPU、DSA、张量处理器(tensor processing unit,TPU)、人工智能(artificial intelligent)芯片、网卡、数据处理器(data processing unit,DPU)或者一个或多个集成电路。
可选的,第一处理器111和第一设备112之间可以通过快捷外围部件互连标准(peripheral component interconnect express,PCIe)连接,也可以通过计算快速链接(compute express link,CXL)连接。第一处理器111和第一设备112之间还可以通过其他总线连接,例如:外围部件互连标准(peripheral component interconnect,PCI)、通用串行总线(universal serial bus,USB)等,此处不作具体限定。
第一内存113为第一计算节点110中的内存,用于存储第一计算节点110的CPU中的数据,以及与第一计算节点110上的外部存储器(例如,第一设备112的存储器)交换数据。
第一资源管理器1111是第一计算节点110中管理计算机系统100中所有计算节点拥有的内存资源的部件。具体地,第一资源管理器1111用于构建内存资源池,内存资源池包括第一内存113和第二内存123。第一资源管理器1111还用于对内存资源池的地址空间进行编址,得到内存资源池的全局虚拟地址。第一资源管理器1111还用于构建第一对应关系,并将第一对应关系配置到第一管理单元1121。其中,第一对应关系是指上述全局虚拟地址、内存资源池的存储空间的物理地址、以及内存资源池关联的各个计算节点的ID(即提供上述存储空间的计算节点的ID)之间的对应关系。
本申请实施例中,第一资源管理器1111对内存资源池的内存地址空间进行编址是指:将第一内存113提供的离散的内存地址空间和第二内存123提供的离散的内存地址空间编辑成 一个虚拟的、线性连续的内存地址空间。计算机系统100中的计算节点共享内存资源池中的资源,并且通过上述全局虚拟地址访问内存资源池的存储空间。
以图1示出的计算机系统100为例,对第一资源管理器1111进行说明:
第一计算节点110拥有的内存资源为第一内存113提供的内存空间(例如,图1所示的第一内存空间、第三内存空间),第二计算节点120拥有的内存资源为第二内存123提供的内存空间(例如,图1所示的第二内存空间、第四内存空间)。那么,第一资源管理单元114用于获取第一内存113和第二内存123的内存信息,其中,第一内存113的内存信息包括第一内存空间的物理地址、第三内存空间的物理地址,第二内存123的内存信息包括第二内存空间的物理地址、第四内存空间的物理地址。
可选的,第一内存113的内存信息还包括第一内存113提供的内存空间的大小(包括第一内存113中可用的内存空间大小、已用的内存空间大小),第一内存113中已用的内存空间的物理地址、可用的内存空间的物理地址等。第二内存123的内存信息还包括第二内存123提供的内存空间的大小(包括第二内存123中可用的内存空间大小、已用的内存空间大小),第二内存123中已用的内存空间的物理地址、可用的内存空间的物理地址等。
第一资源管理器1111还用于将第一内存113提供的内存空间和第二内存123提供的内存空间连接成一个内存空间,得到内存资源池,内存资源池包括第一内存空间、第二内存空间、第三内存空间以及第四内存空间。然后,对内存资源池的内存地址空间进行编址,得到全局虚拟地址,全局虚拟地址包括第一内存空间的虚拟地址、第二内存空间的虚拟地址、第三内存空间的虚拟地址以及第四内存空间的虚拟地址。
第一资源管理器1111还用于构建第一对应关系,并将第一对应关系配置到第一管理单元1121。其中,第一对应关系包括第一内存空间的虚拟地址、第一内存空间的物理地址以及第一计算节点110的ID之间的对应关系,第二内存空间的虚拟地址、第二内存空间的物理地址以及第二计算节点120的ID之间的对应关系,第三内存空间的虚拟地址、第三内存空间的物理地址以及第一计算节点110的ID之间的对应关系,第四内存空间的虚拟地址、第四内存空间的物理地址以及第二计算节点120的ID之间的对应关系。
举例说明,假设第一内存空间的物理地址为100:200,第三内存空间的物理地址为300:350,第二内存空间的物理地址为110:210,第四内存空间的物理地址为400:500,那么,第一资源管理器1111通过上述方式可以得到如图2所示的内存资源池和第一对应关系。
第二计算节点120:
第二处理器121可以包括CPU,也可以包括ASIC、PLD,上述PLD可以是CPLD、FPGA、GAL或其任意组合。
第二设备122是第二计算节点120上的外部设备。第二设备122可以是GPU、NPU、DSA、TPU、人工智能(artificial intelligent)芯片、网卡、DPU或者一个或多个集成电路。
可选的,第二处理器121和第二设备122之间可以通过PCIe连接,也可以通过CXL连接,还可以通过PCI、USB等连接,此处不作具体限定。
第二内存123为第二计算节点120中的内存,用于存储第二计算节点120的CPU中的数据,以及与第二计算节点120上的外部存储器(例如,第二设备122的存储器)交换数据。
第二资源管理器1211是第二计算节点120中用于管理计算机系统100中所有计算节点拥有的内存资源的部件。可选的,第二资源管理器1211可以采用与第一资源管理器1111类似的方式来管理计算机系统100中所有计算节点拥有的内存资源,此处不再展开叙述。第二资源管理器1211也可以通过以下方式管理计算机系统100中所有计算节点拥有的内存资源:第 一资源管理器1111得到全局虚拟地址以及第一对应关系后,将全局虚拟地址以及第一对应关系发送至第二资源管理器1211,然后,第二资源管理器1211将第一对应关系发送至第二管理单元1221。
本申请实施例中,第一计算节点110和第二计算节点120之间能够通过第一设备112和第二设备122进行通信。可选的,第一设备112和第二设备122之间可以通过有线接口或无线接口进行连接。其中,有线接口可以是以太网接口、控制器局域网接口、局域互联网络(local interconnect network,LIN)接口等,无线接口可以是蜂窝网络接口、无线局域网接口等,此处不作具体限定。
下面以第一计算节点110从第二计算节点120的内存中读取数据、以及第一计算节点110向第二计算节点120的内存中写入数据为例,介绍上述计算机系统100如何实现跨计算节点的内存资源的共享。
(一)第一计算节点110从第二计算节点120的内存中的读取数据
如图3所示,图3示出了本申请提供的一种数据传输方法的流程示意图。该方法包括但不限于如下步骤:
S101:第一设备112获取跨节点读指令。
其中,跨节点读指令用于指示第一设备112从第二内存空间中读取第一数据。本申请实施例中,跨节点读指令可以是一个原子指令(atomic instruction),例如,ARM的ST64BV指令、ST64BV0指令,x86的ENQCMD指令、ENQCMDS指令等。原子指令是用于指示设备执行原子操作(atomic operation)的命令,原子操作是一种不会被线程调度机制打断的操作。因此,原子指令可以理解为一旦被执行就不会被打断,直至运行完毕的指令。
在一具体的实施例中,跨节点读指令包括第一源地址、第一目的地址以及第一数据的大小。其中,第一源地址为存储有第一数据的内存空间的虚拟地址,此处为第二内存空间的虚拟地址;第一目的地址为读取到第一数据后,将第一数据写入的内存空间的虚拟地址,此处为第一内存空间的虚拟地址;第一数据的大小可以是第一数据的字节数。
应理解,上述第一源地址、第一目的地址以及第一数据的大小在跨节点读指令中的位置可以根据实际情况进行分配。还应理解,上述跨节点读指令还可以包括第一操作描述信息等其他信息,第一操作描述信息用于描述跨节点读指令,从而指示接收到该指令的第一设备112从第一源地址对应的内存空间中读取第一数据,并将读取到的第一数据写入第一目的地址。以图4为例,跨节点读指令为一个64位的指令,跨节点读指令中的第0-7字节用于填写第一源地址,第8-15字节用于填写第一目的地址,第16-21字节用于填写第一数据的大小,第22-64字节用于填写跨节点读指令所包含的其他信息,例如,上述第一操作描述信息。图4示出了一种示例性的跨节点读指令的格式,跨节点读指令的格式还可以是其他的格式,本申请不作具体限定。
在一种可能的实现方式中,第一设备112获取跨节点读指令,包括:第一处理器111从内存资源池获得与第一数据对应的第一内存空间的虚拟地址、第二内存空间的虚拟地址,生成跨节点读指令,并发送至第一设备112。
在另一种可能的实现方式中,第一设备112获取跨节点读指令,包括:第一设备112从内存资源池获得与第一数据对应的第一内存空间的虚拟地址、第二内存空间的虚拟地址,生成跨节点读指令。
S102:第一设备112根据跨节点读指令得到第一网络传输报文。
具体地,第一设备112接收到跨节点读指令后,对跨节点读指令进行解析,获得第一源 地址和第一数据的大小。然后,第一设备111根据第一源地址和第一管理单元1121中存储的第一对应关系,确定第二计算节点120的ID。然后,第一设备111根据第二计算节点120的ID和跨节点读指令,得到第一网络传输报文,其中,第一网络传输报文包括第一源地址和第一数据的大小,以及第一源IP地址和第一目的IP地址,第一源IP地址为第一计算节点110的IP地址,第一目的IP地址为第二计算节点120的IP地址。
可选的,第二计算节点120的ID可以是第二计算节点120的IP地址,也可以是用于指示第二计算节点120的编号。当第二计算节点120的ID是第二计算节点120的IP地址时,第一设备112根据第二计算节点120的ID和跨节点读指令,得到第一网络传输报文,包括:第一设备112根据第一计算节点110的IP地址和第二计算节点120的ID,对第一源地址和第一数据的大小进行封装,得到第一网络传输报文。当第二计算节点120的ID是用于指示第二计算节点120的编号时,第一设备112根据第二计算节点120的ID和跨节点读指令,得到第一网络传输报文,包括:第一设备112根据第二计算节点120的ID,确定第二计算节点120的IP地址,然后根据第一计算节点110的IP地址和第二计算节点120的IP地址,对第一源地址和第一数据的大小进行封装,得到第一网络传输报文。
可选的,第一设备112还可以通过以下方式中的任意一种得到第一网络传输报文:方式1、第一设备111根据第一计算节点110的IP地址和第二计算节点120的IP地址,对第一源地址、第一目的地址以及第一数据的大小进行封装,得到第一网络传输报文。方式2、第一设备111根据第一计算节点110的IP地址和第二计算节点120的IP地址,对跨节点读指令进行封装,得到第一网络传输报文。
S103:第一设备112将第一网络传输报文发送至第二设备122。
S104:第二设备122接收第一网络传输报文,从第二内存空间中读取第一数据。
具体地,第二设备122接收第一网络传输报文,然后对第一网络传输报文进行解析,获得第一源地址和第一数据的大小。然后,第二设备122根据第一源地址和第二管理单元1221中存储的第一对应关系,确定第二内存空间的物理地址。然后,第二设备122根据第二内存的物理地址,从第二内存空间中读取第一数据。
可选的,第二设备122可以通过DMA方式从第二内存空间中读取第一数据。其中,DMA是一种高速的数据传输方式,当第二设备122通过DMA方式从第二内存空间中读取数据时,无需依赖第二计算节点120中的CPU,因此通过这种方式可以减少CPU拷贝数据的开销,从而提高第二设备122从第二内存空间中读取数据的效率。
S105:第二设备122将第一数据发送至第一设备112。
具体地,第二设备122对第一数据进行封装,得到第二网络传输报文。其中,第二网络传输报文包括第一数据、第二源IP地址以及第二目的IP地址,第二源IP地址为第二计算节点120的IP地址,第二目的IP地址为第一计算节点110的IP地址。然后,第二设备122将第二网络传输报文发送至第一设备112。
S106:第一设备112接收第一数据,并将第一数据写入第一内存空间。
在一种可能的实现方式中,第一设备112接收到第二网络传输报文,并对第二网络传输报文进行解析,获得第一数据。第一设备112还根据上述跨节点读指令,获得第一目的地址(即第一内存空间的虚拟地址),并根据第一内存空间的虚拟地址和第一管理单元1121中存储的第一关系,确定第一内存空间的物理地址。然后,第一设备112将上述第一数据写入第一内存空间。
在另一种可能的实现方式中,上述第二网络传输报文还包括第一内存空间的虚拟地址, 那么,第一设备112可以通过以下方式将第一数据写入第一内存空间:第一设备112接收到第二网络传输报文后,对第二网络传输报文进行解析,获得第一数据和第一内存空间的虚拟地址,然后根据第一内存空间的虚拟地址和第一管理单元1121中存储的第一关系,确定第一内存空间的物理地址,然后将第一数据写入第一内存空间。应理解,当第二网络传输报文包括第一内存空间的虚拟地址时,上述第一网络传输报文可以包括第一内存空间的虚拟地址(即第一目的地址),这样,在上述S106中,第二设备122可以对第一数据以及第一内存空间的虚拟地址一起进行封装,从而得到包括第一内存空间的虚拟地址的第二网络传输报文。
可选的,第一设备112可以通过DMA方式将第一数据写入第一内存空间。如此,可以提高第一设备112向第一内存空间写入第一数据的速度。
上述S101-S106描述了第一计算节点110从第二计算节点120的内存中读取数据的过程,应理解,第二计算节点120从第一计算节点110的内存中读取数据的过程与上述S101-S106的过程类似,为了简便,此处不再叙述。
通过图3示出的数据传输方法,第一设备112能够将从第二内存空间中读取第一数据,从而在计算机系统100中实现跨计算节点的内存资源共享。而且,在利用上述数据传输方法读取数据时,除了上述S101中可能需要第一处理器111生成跨节点读指令,并发送至第一设备112外,其他步骤均不需要第一处理器111和第二处理器121,也就是说,利用上述数据传输方法可以绕过第一计算节点110和第二计算节点120中的CPU和操作系统,如此,可以提高跨计算节点的数据传输效率,从而提高计算机系统100的内存资源的共享效率。另外,在第一设备112从第二内存空间中读取第一数据的过程中,第二计算节点120的CPU可以执行其他任务,而如果第一计算节点110的CPU需要生成跨节点读指令,并发送至第一设备112,那么,第一计算节点110的CPU将跨节点读指令发送至第一设备112之后,还可以释放出来执行其它任务,从而减少资源浪费,提供资源利用率。
(二)第一计算节点110向第二计算节点120的内存中写入数据
如图5所示,图5示出了本申请提供的另一种数据传输方法的流程示意图。
S201:第一设备112获取跨节点写指令。
其中,跨节点写指令用于指示第一设备112向第四内存空间写入第二数据。与上述跨节点读指令类似的,跨节点写指令也可以是一个原子指令。
在一具体的实施例中,跨节点写指令包括第二源地址、第二目的地址以及第二数据的大小。其中,第二源地址为存储有第二数据的内存空间的虚拟地址,此处为第三内存空间的虚拟地址;第二目的地址为将第二数据写入的内存空间的虚拟地址,此处为第四内存空间的虚拟地址;第二数据的大小可以是第二数据的字节数。
应理解,上述第二源地址、第二目的地址以及第二数据的大小在跨节点写指令中的位置可以根据实际情况进行分配。还应理解,上述跨节点写指令还可以包括第二操作描述信息等,第二操作描述信息用于描述跨节点写指令,从而指示接收到该指令的第一设备112从第二源地址对应的内存空间中读取第二数据,并将读取到的第二数据写入第二目的地址对应的内存空间。跨节点写指令的具体格式也可以采用如图4示出的跨节点读指令的格式,本申请不作具体限定。
在一种可能的实现方式中,第一设备112获取跨节点写指令,包括:第一处理器111从内存资源池获得与第二数据对应的第二源地址和第二目的地址,生成跨节点写指令,并发送至第一设备112。
在另一种可能的实现方式中,第一设备112获取跨节点读指令,包括:第一设备112从 内存资源池获得与第二数据对应的第二源地址和第二目的地址,生成跨节点写指令。
S202:第一设备112根据跨节点写指令获得第二数据。
具体地,第一设备112接收到跨节点写指令后,对跨节点写指令进行解析,获得第二源地址和第二数据的大小,然后根据第二源地址和第一管理单元1121中存储的第一对应关系,确定第三内存空间的物理地址。然后,第一设备112根据第二数据的大小,从第三内存空间中读取第二数据。
可选的,第一设备112可以通过DMA方式从第三内存空间中读取到第二数据,如此,可以提高第一设备获得第二数据的速度。
S203:第一设备112根据跨节点写指令得到第三网络传输报文。
具体地,第一设备112接收到跨节点写指令后,对跨节点写指令进行解析,获得第二目的地址,然后根据第二目的地址和第一管理单元1121中存储的第一对应关系,确定第二计算节点120的ID。然后,第一设备112根据第二计算节点120的ID,对第二数据、第二目的地址进行封装,得到第三网络传输报文。其中,第三网络传输报文包括第二数据、第二目的地址、以及第三源IP地址和第三目的IP地址,第三源IP地址为第一计算节点110的IP地址,第三目的IP地址为第二计算节点120的IP地址。
可选的,第二计算节点120的ID可以是第二计算节点120的IP地址,也可以是用于指示第二计算节点120的编号。当第二计算节点120的ID是第二计算节点120的IP地址时,第一设备112根据第二计算节点120的ID,对第二数据、第二目的地址进行封装,得到第三网络传输报文,包括:第一设备112根据第一计算节点110的IP地址和第二计算节点120的ID,对第二源地址和第二目的地址进行封装,得到第三网络传输报文。当第二计算节点120的ID是用于指示第二计算节点120的编号时,第一设备112根据第二计算节点120的ID,对第二数据、第二目的地址进行封装,得到第三网络传输报文,包括:第一设备112根据第二计算节点120的ID,确定第二计算节点120的IP地址,然后根据第一计算节点110的IP地址和第二计算节点120的IP地址,对第二数据、第二目的地址进行封装,得到第三网络传输报文。
S204:第一设备112将第三网络传输报文发送至第二设备122。
S205:第二设备122接收第三网络传输报文,将第二数据写入第四内存空间。
具体地,第二设备122接收第三网络传输报文后,对第三网络传输报文进行解析,获得第二数据以及第二目的地址,然后根据第二目的地址和第二管理单元1221中存储的第一对应关系,确定第四内存空间的物理地址。然后,第二设备122通过DMA方式将第二数据写入四内存空间。
上述S201-S205描述了第一计算节点110向第二计算节点的内存中写入数据的过程,应理解,第二计算节点120向第一计算节点110的内存中写入数据的过程与上述S201-S205类似,为了简便,此处不再叙述。
通过图5示出的数据传输方法,第一设备112能够将存储在第三内存空间中的第二数据写入第四内存空间,从而在计算机系统100中实现跨计算节点的内存资源共享。而且,在利用上述数据传输方法在传输数据时,除了上述S201中可能需要第一处理器111生成跨节点写指令,并发送至第一设备112外,其他步骤均不需要第一处理器111和第二处理器121,也就是说,利用上述数据传输方法可以绕过第一计算节点110和第二计算节点120中的CPU和操作系统,如此,可以提高跨计算节点的数据传输效率,从而提高计算机系统100的内存资源的共享效率。另外,在第一设备112向第四内存空间写入第二数据的过程中,第二计算节点 120的CPU可以执行其他任务,而如果第一计算节点110的CPU需要生成跨节点写指令,并发送至第一设备112,那么,第一计算节点110的CPU将跨节点写指令发送至第一设备112之后,还可以释放出来执行其它任务,从而减少资源浪费,提供资源利用率。
本申请还提供了一种数据处理方法,当在计算机系统中执行该方法时,能够实现跨计算节点的数据处理,从而在该系统中实现计算资源(加速资源)的共享。下面将结合图6示出的计算机系统介绍本申请提供的数据传输方法。
如图6所示,图6示出了本申请提供的另一种计算机系统的结构示意图。其中,计算机系统200包括第一计算节点210和第二计算节点220。第一计算节点210包括第一处理器211、第一设备212以及第一内存213,第一处理器211包括第一资源管理器2111,第一设备212包括第一管理单元2121。第二计算节点220包括第二处理器221、第二设备222以及第二内存223,第二处理器221包括第二资源管理器2211,第二设备222包括第二管理单元2221。相较于图1示出的计算机系统100,图6示出的计算机系统200中:
第一设备212和第二设备222分别是第一计算节点210和第二计算节点220上的外部设备。第一设备212和第二设备222均具有计算能力。本申请实施例中,第一设备212和第二设备222具有的计算能力均可以是加速计算能力,那么,第一设备212和第二设备222分别是第一计算节点210和第二计算节点220中的加速设备。可选的,第一设备212或第二设备222可以是GPU、NPU、DSA、TPU、人工智能(artificial intelligent)芯片、网卡、DPU或者一个或多个集成电路。
可选的,第一设备212和第二设备222均可以具有一种或多种加速功能,例如,用于加速数据完整性校验过程的功能,用于加速数据加密、解密过程的功能,用于加速数据压缩、解压缩过程的功能,用于加速机器学习过程的功能,用于加速数据分类过程的功能,用于加速深度学习过程的功能,用于加速浮点计算的功能等。
可选的,第一设备212也可以不具有计算能力。
第一资源管理器2111是第一计算节点210中用于管理计算机系统200中所有计算节点拥有的计算资源的部件。具体地,第一资源管理器2111用于构建计算资源池,计算资源池包括第二设备222。可选的,当第一设备212也具有计算能力时,计算资源池还包括第一设备212。第一资源管理器2111还用于将计算机系统200中所有计算节点包括的加速设备和每个加速设备具有的加速功能进行编号,得到多个加速设备ID以及每个加速设备ID对应的加速功能ID。第一资源管理器2111还用于构建第二对应关系,并将第二对应关系配置到第一管理单元2121。第二对应关系是指计算资源池中的加速设备的ID、每个加速设备具有的加速功能的ID以及计算资源池关联的各个计算节点的ID(即加速设备所在的计算节点的ID)之间的对应关系。
本申请实施例中,计算机系统200中的计算节点共享计算资源池中的资源,并且能够通过上述多个加速设备ID以及每个加速设备ID对应的加速功能ID,使用计算资源池中的加速设备来处理数据。
以图6示出的计算机系统200为例,对第一资源管理器2111进行说明:
第一计算节点210拥有的计算资源包括第一设备212提供的加速计算的能力,第二计算节点220拥有的计算资源包括第二设备222提供的加速计算的能力。那么,第一资源管理器2111用于获取第一设备212和第二设备222的加速计算能力的信息,其中,第一设备212的加速计算能力信息包括第一设备212具有的加速功能,第二设备222的加速计算能力信息包括第二设备222具有的加速功能。可选的,第一设备212的加速计算能力信息还包括第一设备212已用的算力和可用的算力,第二设备222的加速计算能力信息还包括第二设备222已用的算力和可用的算力。
第一资源管理器2111还用于对第一设备212和第二设备222进行编号,得到第一设备212的ID和第二设备222的ID,以及对第一设备212具有的加速功能和第二设备222具有的加速功能进行编号,得到每种加速功能的ID。
第一资源管理器2111还用于构建第二对应关系,并将第二对应关系配置到第一管理单元2121。其中,第二对应关系包括第一设备212的ID、第一设备212具有的加速功能的ID以及第一计算节点210的ID之间的对应关系,以及第二设备222的ID、第二设备222具有的加速功能的ID以及第二计算节点220的ID之间的对应关系。
举例说明,第一设备212提供的加速功能包括用于加速机器学习过程的功能和用于加速数据加密、解密过程的功能,第二设备222提供的加速功能包括用于加速机器学习过程的功能和用于加速数据压缩、解压缩过程的功能。那么,第一资源管理器2111可以将第一设备212编号为1,将第二设备222编号为2,将加速机器学习过程的功能编号为1,将加速数据加密、解密过程的功能编号为2,将加速数据压缩、解压缩过程的功能编号为3,将第一计算节点编号为1,将第二计算节点编号为2,从而得到如图7所示的第二对应关系。
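按照上述编号方式,图7所示的第二对应关系可以用下面这样一张简单的查找表来示意(C语言写法与变量名均为示例,实际存储形式不限于此):

/* 示意:第二对应关系的查找表(加速设备ID、加速功能ID、所在计算节点ID) */
struct accel_map_entry { int dev_id; int func_id; int node_id; };

static const struct accel_map_entry second_mapping[] = {
    { 1, 1, 1 },  /* 第一设备212:加速机器学习过程,位于第一计算节点210 */
    { 1, 2, 1 },  /* 第一设备212:加速数据加密、解密过程 */
    { 2, 1, 2 },  /* 第二设备222:加速机器学习过程,位于第二计算节点220 */
    { 2, 3, 2 },  /* 第二设备222:加速数据压缩、解压缩过程 */
};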
第二资源管理器2211是第二计算节点220中用于管理计算机系统200中所有计算节点拥有的计算资源的部件。可选的,第二资源管理器2211可以采用与第一资源管理器2111类似的方式来管理计算机系统200中所有计算节点拥有的计算资源,此处不再展开叙述。第二资源管理器2211也可以通过以下方式管理计算机系统200中所有计算节点拥有的计算资源:第一资源管理器2111将计算资源池的相关信息(例如,计算资源池中每个加速设备具有的加速计算能力的信息)以及第二对应关系发送至第二资源管理器2211,再由第二资源管理器2211将第二对应关系发送至第二管理单元2221。
需要说明的是,计算机系统200中的第一处理器211、第二处理器221、第一内存213、第二内存223的功能,与计算机系统100中的第一处理器111、第二处理器121、第一内存113、第二内存123的功能类似,计算机系统200中的第一处理器211与第一设备212的连接关系、第二处理器221与第二设备222的连接关系,分别与计算机系统100中的第一处理器111与第一设备112的连接关系、第二处理器121与第二设备122的连接关系类似,为了简便,本申请实施例不再叙述。
下面以第一计算节点210使用第二计算节点220的计算资源为例,介绍上述计算机系统200如何实现跨计算节点的计算资源的共享。
如图8所示,图8示出了本申请提供的一种数据处理方法的流程示意图。该方法包括但不限于如下步骤:
S301:第一设备212获取跨节点加速指令。
其中,跨节点加速指令用于指示第一设备212使用第二设备222对第三数据进行处理。与上述跨节点读指令、跨节点写指令类似的,跨节点加速指令也可以是一个原子指令。
在一具体的实施例中,跨节点加速指令包括第三源地址、第三目的地址、第三数据的大小、目标加速设备ID以及目标加速功能ID。其中,第三源地址为存储有第三数据的设备的存储空间的地址,此处为第一存储空间的地址,第一设备212的设备存储空间包括第一存储空间;第三目的地址为将第三数据的处理结果写入的设备存储空间的地址,此处为第二存储空间的地址,第一设备212的设备存储空间还包括第二存储空间;第三数据的大小可以是第三数据的字节数;目标加速设备ID为第一设备212使用的、用于处理第三数据的加速设备的ID,此处为第二设备的ID;目标加速功能ID为第二设备222具有的加速功能的ID,用于指示第二设备222处理第三数据,例如,当目标加速功能ID为数据完整性校验功能对应的ID时,第二设备222对第三数据执行数据完整性校验的操作,又例如,当目标加速功能ID为数据加密功能对应的ID时,第二设备222对第三数据执行数据加密的操作。
可选的,跨节点加速指令还包括第三存储空间的地址,第二设备222的设备存储空间包括第三存储空间,第三存储空间的地址为第二设备222接收到第三数据后,存储第三数据的地址。应理解,上述第三源地址、第三目的地址、第三数据的大小、目标加速设备ID、目标加速功能ID以及第三存储空间的地址在上述跨节点加速指令中的位置可以根据实际情况进行分配。以图9为例,跨节点加速指令为一个64字节的指令,跨节点加速指令中的第0-7字节用于填写第三源地址,第8-15字节用于填写第三目的地址,第16-23字节用于填写第三存储空间的地址,第24-27字节用于填写目标加速设备ID,第28-31字节用于填写目标加速功能ID,第32-37字节用于填写第三数据的大小,第38-63字节用于填写跨节点加速指令所包含的其他信息,例如,第三操作描述信息,第三操作描述信息用于描述跨节点加速指令,从而指示接收到该指令的第一设备212使用第二设备222来对第三数据进行处理。图9示出了一种示例性的跨节点加速指令的格式,跨节点加速指令的格式还可以是其他的格式,本申请不作具体限定。
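按照上述字节布局,跨节点加速指令可以用下面的C语言结构体来示意。结构体名与字段名均为假设,采用紧凑对齐以与上文给出的字节偏移一致:

#include <stdint.h>

#pragma pack(push, 1)
typedef struct {
    uint64_t src_addr;        /* 第0-7字节:第三源地址(第一存储空间的地址) */
    uint64_t dst_addr;        /* 第8-15字节:第三目的地址(第二存储空间的地址) */
    uint64_t remote_buf_addr; /* 第16-23字节:第三存储空间的地址 */
    uint32_t accel_dev_id;    /* 第24-27字节:目标加速设备ID */
    uint32_t accel_func_id;   /* 第28-31字节:目标加速功能ID */
    uint8_t  data_len[6];     /* 第32-37字节:第三数据的大小 */
    uint8_t  op_desc[26];     /* 第38-63字节:第三操作描述信息等其他信息 */
} cross_node_accel_cmd_t;     /* 共64字节 */
#pragma pack(pop)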
在一种可能的实现方式中,第一设备212获取跨节点加速指令,包括:第一处理器211从计算资源池获得与第三数据对应的目标加速设备ID以及目标加速功能ID,生成跨节点加速指令,并发送至第一设备212。
在另一种可能的实现方式中,第一设备212获取跨节点加速指令,包括:第一设备212从计算资源池获得与第三数据对应的目标加速设备ID以及目标加速功能ID,生成跨节点加速指令。
S302:第一设备212根据跨节点加速指令,获得第三数据和目标加速功能ID。
具体地,第一设备212接收到跨节点加速指令后,对跨节点加速指令进行解析,获得第三源地址、目标加速设备ID和目标加速功能ID,然后从第一存储空间中读取第三数据。
S303:第一设备212将第三数据和目标加速功能ID进行封装,得到第四网络传输报文。
具体地,第一设备212根据目标加速设备ID和第一管理单元2121中存储的第二对应关系,确定第二计算节点220的ID。然后,第一设备212根据第二计算节点220的ID对第三数据和目标加速功能ID进行封装,得到第四网络传输报文。其中,第四网络传输报文包括第三数据、目标加速功能ID,以及第四源IP地址、第四目的IP地址,第四源IP地址为第一计算节点210的IP地址,第四目的IP地址为第二计算节点220的IP地址。
可选的,第二计算节点220的ID可以是第二计算节点220的IP地址,也可以是用于指示第二计算节点220的编号。当第二计算节点220的ID是第二计算节点220的IP地址时,第一设备212根据第二计算节点220的ID对第三数据和目标加速功能ID进行封装,得到第四网络传输报文,包括:第一设备212根据第一计算节点210的IP地址和第二计算节点220的ID,对第三数据和目标加速功能ID进行封装,得到第四网络传输报文。当第二计算节点220的ID是用于指示第二计算节点220的编号时,第一设备212根据第二计算节点220的ID对第三数据和目标加速功能ID进行封装,得到第四网络传输报文,包括:第一设备212根据第二计算节点220的ID,确定第二计算节点220的IP地址,然后根据第一计算节点210的IP地址和第二计算节点220的IP地址,对第三数据和目标加速功能ID进行封装,得到第四网络传输报文。
S304:第一设备212将第四网络传输报文发送至第二设备222。
S305:第二设备222接收第四网络传输报文,并根据第四网络传输报文对第三数据进行处理。
具体地,第二设备222接收到第四网络传输报文后,对第四网络传输报文进行解析,获得第三数据和目标加速功能ID,然后根据目标加速功能ID对第三数据进行处理。
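下面给出第二设备222根据目标加速功能ID选择相应处理逻辑的一个示意性分发草图。其中的功能ID常量沿用上文示例中的编号,各 accel_* 加速函数均为假设,仅用于说明分发方式:

/* 示意:按目标加速功能ID分发处理(常量与加速函数均为假设) */
enum { FUNC_ML = 1, FUNC_ENCRYPT = 2, FUNC_COMPRESS = 3 };

int process_third_data(uint32_t func_id, const void *data, uint64_t len, void *result)
{
    switch (func_id) {
    case FUNC_ML:       return accel_machine_learning(data, len, result);
    case FUNC_ENCRYPT:  return accel_encrypt(data, len, result);
    case FUNC_COMPRESS: return accel_compress(data, len, result);
    default:            return -1;  /* 不支持的加速功能ID */
    }
}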
S306:第二设备222将第三数据的处理结果发送至第一设备212。
具体地,第二设备222对第三数据的处理结果进行封装,得到第五网络传输报文。其中,第五网络传输报文包括第三数据的处理结果、第五源IP地址以及第五目的IP地址,第五源IP地址为第二计算节点220的IP地址,第五目的IP地址为第一计算节点210的IP地址。
S307:第一设备212接收第三数据的处理结果,并将该结果写入第二存储空间。
在一种可能的实现方式中,第一设备212接收到第五网络传输报文后,对第五网络传输报文进行解析,获得第三数据的处理结果。第一设备212还根据上述跨节点加速指令,获得第三目的地址。然后,第一设备212将第三数据的处理结果写入第三目的地址对应的存储空间(即第二存储空间)。
在另一种可能的实现方式中,上述第五网络传输报文还包括第三目的地址,那么,第一设备212将第三数据的处理结果写入第二存储空间,包括:第一设备212接收到第五网络传输报文后,对第五网络传输报文进行解析,得到第三数据的处理结果和第三目的地址,然后将第三数据的处理结果写入第二存储空间。应理解,当第五网络传输报文包括第三目的地址时,上述第四网络传输报文也包括第三目的地址,这样,在上述S306中,第二设备222可以对第三数据的处理结果和第三目的地址一起进行封装,从而得到第五网络传输报文。
上述实施例描述了第三数据为第一设备212中存储的数据,第三数据的处理结果写入的是第一设备212的设备存储空间,即跨节点加速指令中的第三源地址为第一存储空间的地址,跨节点加速指令中的第三目的地址为第二存储空间的地址。在实际应用中,第三数据还可以是第二设备222中存储的数据,或第一计算节点210的内存中存储的数据,或第二计算节点220的内存中存储的数据。第三数据的处理结果还可以写入第一计算节点210的内存。下面将对以上几种情况进行简单的说明。
情况1、当第三数据为第二设备222中存储的数据、第三数据的处理结果写入的是第一设备212的设备存储空间时,跨节点加速指令中的第三源地址为第四存储空间的地址,第三目的地址仍为第二存储空间的地址,其中,第二设备222的设备存储空间包括第四存储空间。
那么,第一设备212在获取跨节点加速指令后,可以通过以下步骤使用第二设备222对第三数据进行处理:第一设备212对跨节点加速指令进行解析,获得第二设备的ID,然后根据第二设备的ID和第一管理单元2121中存储的第二对应关系,确定第二计算节点220的ID,然后根据第二计算节点220的ID,对第三源地址、目标加速功能ID进行封装,得到对应的网络传输报文,并将该网络传输报文发送至第二设备222,第二设备222接收到该网络传输报文后,获得第三源地址和目标加速功能ID。然后,第二设备222从第四存储空间中读取第三数据,并根据目标加速功能ID对第三数据进行相应的处理。之后,第二设备222将第三数据的处理结果发送至第一设备212,第一设备212接收到第三数据的处理结果后,将第三数据的处理结果写入第二存储空间。
情况2、当第三数据为第一计算节点210的内存中存储的数据、第三数据的处理结果写入的是第一计算节点210的内存中时,计算机系统200可以结合上述数据传输方法来实现上述跨计算节点的数据处理,为了简便,此处仅描述与上述S301-S307的不同步骤:
(1)跨节点加速指令包括的第三源地址为第五内存空间的虚拟地址,第三目的地址为第六内存空间的虚拟地址,其中,第一内存213包括第五内存空间和第六内存空间。也就是说,该跨节点加速指令是第一处理器211或第一设备212通过以下步骤得到的:从内存资源池获得第五内存空间的虚拟地址和第六内存空间的虚拟地址,从计算资源池获得第二设备的ID和目标加速功能ID,从而生成跨节点加速指令。
需要说明的是,在情况2中,第一资源管理器2111除了用于管理计算机系统200中所有计算节点拥有的计算资源,还用于管理计算机系统200中所有计算节点拥有的内存资源,具体可参见第一资源管理器1111的管理方式。因此,第一管理单元2121除了包括第二对应关系外,还包括第五内存空间的虚拟地址、第五内存空间的物理地址以及第一计算节点210的ID之间的对应关系,以及第六内存空间的虚拟地址、第六内存空间的物理地址以及第一计算节点210的ID之间的对应关系。
(2)第一设备212通过以下步骤获取第三数据:第一设备212获取跨节点加速指令后,对跨节点加速指令进行解析,获得第三源地址,然后根据第三源地址和第一管理单元2121中存储的第五内存空间的虚拟地址与第五内存空间的物理地址之间的对应关系,从第五内存空间中读取第三数据。
(3)第一设备212得到第三数据的处理结果后,通过以下步骤将第三数据的处理结果写入第三目的地址:第一设备212获取第三数据的处理结果后,根据第三目的地址以及第一管理单元2121中存储的第六内存空间的虚拟地址与第六内存空间的物理地址之间的对应关系,将第三数据的处理结果写入第六内存空间。
应理解,当第三数据为第二设备222中存储的数据、第三数据的处理结果写入的是第一计算节点210的内存中,或者第三数据为第二计算节点220的内存中存储的数据,第三数据的处理结果写入的是第一设备212的设备存储空间中,或者第三数据为第二计算节点220的内存中存储的数据,第三数据的处理结果写入的是第一计算节点210的内存中时,第一设备212使用第二设备222处理第三数据的方法也可以结合本申请提供的数据传输方法,具体过程可参见情况2所描述的方法,并对其进行适应性的修改,为了简便,此处不再叙述。
上述实施例描述了第一计算节点210使用第二计算节点220的加速设备处理数据的过程,应理解,第二计算节点220使用第一计算节点210的加速设备处理数据的过程与上述实施例描述的过程类似,为了简便,此处不再叙述。
通过上述数据处理方法,第一设备212能够使用第二设备222处理第三数据,从而在计算机系统200中实现跨计算节点的计算资源共享。而且,在利用上述数据处理方法处理数据时,除了上述S301中可能需要第一处理器211生成跨节点加速指令,并发送至第一设备212外,其他步骤均不需要第一处理器211和第二处理器221参与,也就是说,利用上述数据处理方法可以绕过第一计算节点210和第二计算节点220中的CPU和操作系统,如此,可以提高跨计算节点的数据处理效率,从而提高计算机系统200的计算资源的共享效率。另外,在第一设备212使用第二设备222处理第三数据的过程中,第二计算节点220的CPU可以执行其他任务,而如果第一计算节点210的CPU需要生成跨节点加速指令,并发送至第一设备212,那么,在第一计算节点210的CPU将跨节点加速指令发送至第一设备212之后,还可以释放出来执行其它任务,从而减少资源浪费,提高资源利用率。
上文中结合图1至图9,详细描述了本申请提供的数据传输方法和数据处理方法,下面将描述执行上述数据传输方法和数据处理方法的系统。
本申请还提供了一种计算机系统,如前述图1所示,计算机系统100可以包括第一计算节点110和第二计算节点120,第一计算节点110可以包括第一设备112、第一内存113,第二计算节点120可以包括第二设备122。可选的,第一计算节点110还可以包括第一处理器111。
其中,第一设备112用于执行前述S101-S103、S106、S201-S204,第二设备122用于执行前述S104-S105、S205。
可选的,当第一计算节点110包括第一处理器111时,第一处理器111用于对计算机系统100的内存资源池的地址空间进行编址,得到内存资源池的全局虚拟地址,以及构建第一对应关系。第一处理器111还用于执行前述S101中生成跨节点读指令,并将跨节点读指令发送至第一设备112的步骤。第一处理器111还可以用于执行前述S201中生成跨节点写指令,并将跨节点写指令发送至第一设备112的步骤。
本申请还提供了一种计算机系统,如前述图6所示,计算机系统200可以包括第一计算节点210和第二计算节点220,第一计算节点210可以包括第一设备212,第二计算节点220可以包括第二设备222。可选的,第一计算节点210还可以包括第一处理器211。
其中,第一设备212用于执行前述S301-S304、S307、以及情况1和情况2中第一设备212执行的步骤,第二设备222用于执行前述S305-S306、以及情况1和情况2中第二设备222执行的步骤。
可选的,当第一计算节点210包括第一处理器211时,第一处理器211用于对计算机系统200的计算资源池中的加速设备及每个加速设备的加速功能进行编号,得到多个加速设备ID以及每个加速设备ID对应的加速功能ID,以及构建第二对应关系。第一处理器211还用于执行前述S301中生成跨节点加速指令,并将跨节点加速指令发送至第一设备212的步骤。
本申请还提供了一种计算机可读存储介质,该计算机可读存储介质存储有第一计算机指令和第二计算机指令,第一计算机指令和第二计算机指令分别运行在第一计算节点(例如,图1所示的第一计算节点110、图6所示的第一计算节点210)和第二计算节点(例如,图1所示的第二计算节点120、图6所示的第二计算节点220)上,以实现前述方法实施例所描述的第一计算节点110与第二计算节点120之间的数据传输,以及第一计算节点210与第二计算节点220之间的数据处理,从而在计算机系统中实现跨计算节点的资源(内存资源和计算资源)的共享。
上述计算节点可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。上述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,上述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如,同轴电缆、光纤、双绞线)或无线(例如,红外、无线、微波)等方式向另一个网站站点、计算机、服务器或数据中心进行传输。上述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。上述可用介质可以是磁性介质(例如,软盘、硬盘、磁带)、光介质(例如,光盘)、或者半导体介质(例如,固态硬盘(solid state disk,SSD))。
以上所述,仅为本申请的具体实施方式。熟悉本技术领域的技术人员根据本申请提供的具体实施方式,可想到变化或替换,都应涵盖在本申请的保护范围之内。

Claims (33)

  1. 一种数据传输方法,其特征在于,应用于计算机系统,所述计算机系统包括第一计算节点和第二计算节点,所述第一计算节点包括第一设备和第一内存,所述第二计算节点包括第二设备和第二内存,所述第一内存包括第一内存空间,所述第二内存包括第二内存空间,所述方法包括:
    所述第一设备获取跨节点读指令,所述跨节点读指令包括第一源地址和第一数据的大小,所述第一源地址为所述第二内存空间的虚拟地址,所述第一设备存储有第一对应关系,所述第一对应关系包括所述第二内存空间的虚拟地址与所述第二计算节点的ID之间的对应关系;
    所述第一设备根据所述第二内存空间的虚拟地址和所述第一对应关系,确定所述第二计算节点的ID;
    所述第一设备根据所述第二计算节点的ID和所述跨节点读指令,得到第一网络传输报文,并将所述第一网络传输报文发送至所述第二设备,所述第一网络传输报文包括所述第二内存空间的虚拟地址和所述第一数据的大小;
    所述第二设备接收所述第一网络传输报文,从所述第二内存空间中读取所述第一数据,并将所述第一数据发送至所述第一设备。
  2. 根据权利要求1所述的方法,其特征在于,所述计算机系统中的计算节点共享内存资源池中的资源,所述内存资源池包括所述第一内存和所述第二内存。
  3. 根据权利要求2所述的方法,其特征在于,所述第一计算节点还包括第一处理器,所述方法还包括:
    所述第一处理器对所述内存资源池的地址空间进行编址,得到所述内存资源池的全局虚拟地址;
    所述第一计算节点通过所述全局虚拟地址访问所述内存资源池的存储空间。
  4. 根据权利要求3所述的方法,其特征在于,所述第一设备获取跨节点读指令,包括:
    所述第一处理器从所述内存资源池获得与所述第一数据对应的所述第一内存空间的虚拟地址和所述第二内存空间的虚拟地址,生成所述跨节点读指令;
    所述第一处理器将所述跨节点读指令发送至所述第一设备。
  5. 根据权利要求3所述的方法,其特征在于,所述第一设备获取所述跨节点读指令,包括:
    所述第一设备从所述内存资源池获得与所述第一数据对应的所述第一内存空间的虚拟地址和所述第二内存空间的虚拟地址,生成所述跨节点读指令。
  6. 根据权利要求1-5任一项所述的方法,其特征在于,所述跨节点读指令还包括第一目的地址,所述第一目的地址为所述第一内存空间的虚拟地址,所述方法还包括:
    所述第一设备接收所述第一数据;
    所述第一设备根据所述第一内存空间的虚拟地址,将所述第一数据写入所述第一内存空间。
  7. 根据权利要求6所述的方法,其特征在于,所述第一对应关系包括所述内存资源池的全局虚拟地址、所述内存资源池的存储空间的物理地址以及所述内存资源池关联的各个计算节点的ID之间的对应关系,所述第一设备根据所述第一内存空间的虚拟地址,将所述第一数据写入所述第一内存空间,包括:
    所述第一设备根据所述第一对应关系和所述第一内存空间的虚拟地址,确定所述第一内存空间的物理地址;
    所述第一设备通过直接内存访问DMA方式将所述第一数据写入所述第一内存空间。
  8. 根据权利要求7所述的方法,其特征在于,所述第二设备存储有所述第一对应关系,所述第二设备接收所述第一网络传输报文,从所述第二内存空间中读取所述第一数据,包括:
    所述第二设备接收所述第一网络传输报文,获得所述第二内存空间的虚拟地址;
    所述第二设备根据所述第一对应关系和第二内存空间的虚拟地址,确定所述第二内存空间的物理地址;
    所述第二设备通过所述DMA方式从所述第二内存空间中读取所述第一数据。
  9. 根据权利要求7或8所述的方法,其特征在于,所述第一内存还包括第三内存空间,所述第二内存还包括第四内存空间,所述方法还包括:
    所述第一设备获取跨节点写指令,所述跨节点写指令包括第二源地址、第二目的地址以及第二数据的大小,所述第二源地址为所述第三内存空间的虚拟地址,所述第二目的地址为所述第四内存空间的虚拟地址;
    所述第一设备根据所述第一对应关系和所述第三内存空间的虚拟地址,确定所述第三内存空间的物理地址;
    所述第一设备通过所述DMA方式从所述第三内存空间中读取所述第二数据;
    所述第一设备根据所述第一对应关系和所述第四内存空间的虚拟地址,确定所述第二计算节点的ID;
    所述第一设备根据所述第二计算节点的ID和所述跨节点写指令,得到第二网络传输报文,并将所述第二网络传输报文发送至所述第二设备,其中,所述第二网络传输报文包括所述第四内存空间的虚拟地址和所述第二数据;
    所述第二设备接收所述第二网络传输报文,将所述第二数据写入所述第四内存空间。
  10. 一种数据处理方法,其特征在于,应用于计算机系统,所述计算机系统包括第一计算节点和第二计算节点,所述第一计算节点包括第一设备,所述第二计算节点包括第二设备,所述方法包括:
    所述第一设备获取跨节点加速指令,所述跨节点加速指令包括所述第二设备的ID和目标加速功能ID,所述第一设备存储有第二对应关系,所述第二对应关系包括所述第二设备的ID与所述第二计算节点的ID之间的对应关系;
    所述第一设备根据所述第二设备的ID和所述第二对应关系,确定所述第二计算节点的ID;
    所述第一设备根据所述第二计算节点的ID和所述跨节点加速指令,得到第三网络传输报文,并将所述第三网络传输报文发送至所述第二设备,所述第三网络传输报文包括所述目标加速功能ID;
    所述第二设备根据所述目标加速功能ID,对第三数据进行相应的处理;
    所述第二设备将所述第三数据的处理结果发送至所述第一计算节点。
  11. 根据权利要求10所述的方法,其特征在于,所述计算机系统中的计算节点共享计算资源池中的资源,所述计算资源池包括所述第二设备。
  12. 根据权利要求11所述的方法,其特征在于,所述第一计算节点还包括第一处理器,所述方法还包括:
    所述第一处理器对所述计算资源池中的加速设备及每个加速设备的加速功能进行编号,得到多个加速设备ID以及每个加速设备ID对应的加速功能ID;
    所述第一计算节点通过所述多个加速设备ID以及所述每个加速设备ID对应的加速功能ID,使用所述计算资源池中的加速设备对所述第三数据进行处理。
  13. 根据权利要求12所述的方法,其特征在于,所述第一设备获取跨节点加速指令,包括:
    所述第一处理器从所述计算资源池获得与所述第三数据对应的所述第二设备的ID以及所述目标加速功能ID,生成所述跨节点加速指令;
    所述第一处理器将所述跨节点加速指令发送至所述第一设备。
  14. 根据权利要求12所述的方法,其特征在于,所述第一设备获取跨节点加速指令,包括:
    所述第一设备从所述计算资源池获得与所述第三数据对应的所述第二设备的ID以及所述目标加速功能ID,生成所述跨节点加速指令。
  15. 根据权利要求10-14任一项所述的方法,其特征在于,所述跨节点加速指令还包括第三源地址和第三目的地址,所述第三源地址为存储有所述第三数据的设备存储空间的地址,所述第三目的地址为将所述第三数据的处理结果写入的设备存储空间的地址。
  16. 根据权利要求15所述的方法,其特征在于,所述第三源地址为所述第一设备的存储空间的地址,在所述第二设备根据所述目标加速功能ID,对所述第三数据进行相应的处理之前,所述方法还包括:
    所述第一设备根据所述跨节点加速指令,获得所述第三源地址;
    所述第一设备从所述第一设备的存储空间中读取所述第三数据;
    所述第一设备将所述第三数据发送至所述第二设备。
  17. 根据权利要求15所述的方法,其特征在于,所述第三源地址为所述第二设备的存储空间的地址,所述第三网络传输报文还包括所述第三源地址,在所述第二设备根据所述目标加速功能ID,对所述第三数据进行相应的处理之前,所述方法还包括:
    所述第二设备根据所述第三网络传输报文,获得所述第三源地址;
    所述第二设备从所述第二设备的存储空间中读取所述第三数据。
  18. 一种计算机系统,所述计算机系统包括第一计算节点和第二计算节点,所述第一计算节点包括第一设备和第一内存,所述第二计算节点包括第二设备和第二内存,所述第一内存包括第一内存空间,所述第二内存包括第二内存空间,
    所述第一设备用于获取跨节点读指令,所述跨节点读指令包括第一源地址和第一数据的大小,所述第一源地址为所述第二内存空间的虚拟地址,所述第一设备存储有第一对应关系,所述第一对应关系包括所述第二内存空间的虚拟地址与所述第二计算节点的ID之间的对应关系;
    所述第一设备还用于根据所述第二内存空间的虚拟地址和所述第一对应关系,确定所述第二计算节点的ID;
    所述第一设备还用于根据所述第二计算节点的ID和所述跨节点读指令,得到第一网络传输报文,并将所述第一网络传输报文发送至所述第二设备,所述第一网络传输报文包括所述第二内存空间的虚拟地址和所述第一数据的大小;
    所述第二设备用于接收所述第一网络传输报文,从所述第二内存空间中读取所述第一数据,并将所述第一数据发送至所述第一设备。
  19. 根据权利要求18所述的系统,其特征在于,所述计算机系统中的计算节点共享内存资源池中的资源,所述内存资源池包括所述第一内存和所述第二内存。
  20. 根据权利要求19所述的系统,其特征在于,所述第一计算节点还包括第一处理器,
    所述第一处理器用于对所述内存资源池的地址空间进行编址,得到所述内存资源池的全局虚拟地址;
    所述第一计算节点用于通过所述全局虚拟地址访问所述内存资源池的存储空间。
  21. 根据权利要求20所述的系统,其特征在于,
    所述第一处理器还用于从所述内存资源池获得与所述第一数据对应的所述第一内存空间的虚拟地址和所述第二内存空间的虚拟地址,生成所述跨节点读指令;
    所述第一处理器还用于将所述跨节点读指令发送至所述第一设备。
  22. 根据权利要求20所述的系统,其特征在于,所述第一设备具体用于:
    从所述内存资源池获得与所述第一数据对应的所述第一内存空间的虚拟地址和所述第二内存空间的虚拟地址,生成所述跨节点读指令。
  23. 根据权利要求18-22任一项所述的系统,其特征在于,所述跨节点读指令还包括第一目的地址,所述第一目的地址为所述第一内存空间的虚拟地址,
    所述第一设备还用于接收所述第一数据;
    所述第一设备还用于根据所述第一内存空间的虚拟地址,将所述第一数据写入所述第一内存空间。
  24. 根据权利要求18-23任一项所述的系统,其特征在于,所述第一内存还包括第三内存空间,所述第二内存还包括第四内存空间,所述第一对应关系还包括所述第三内存空间的虚拟地址与所述第三内存空间的物理地址之间的对应关系,以及所述第四内存空间的虚拟地址和所述第二计算节点的ID之间的对应关系,
    所述第一设备还用于获取跨节点写指令,所述跨节点写指令包括第二源地址、第二目的地址以及第二数据的大小,所述第二源地址为所述第三内存空间的虚拟地址,所述第二目的地址为所述第四内存空间的虚拟地址;
    所述第一设备还用于根据所述第一对应关系和所述第三内存空间的虚拟地址,确定所述第三内存空间的物理地址;
    所述第一设备还用于通过直接内存访问DMA方式从所述第三内存空间中读取所述第二数据;
    所述第一设备还用于根据所述第一对应关系和所述第四内存空间的虚拟地址,确定所述第二计算节点的ID;
    所述第一设备还用于根据所述第二计算节点的ID和所述跨节点写指令,得到第二网络传输报文,并将所述第二网络传输报文发送至所述第二设备,其中,所述第二网络传输报文包括所述第四内存空间的虚拟地址和所述第二数据;
    所述第二设备还用于接收所述第二网络传输报文,将所述第二数据写入所述第四内存空间。
  25. 一种计算机系统,其特征在于,所述计算机系统包括第一计算节点和第二计算节点,所述第一计算节点包括第一设备,所述第二计算节点包括第二设备,
    所述第一设备用于获取跨节点加速指令,所述跨节点加速指令包括所述第二设备的ID和目标加速功能ID,所述第一设备存储有第二对应关系,所述第二对应关系包括所述第二设备的ID与所述第二计算节点的ID之间的对应关系;
    所述第一设备还用于根据所述第二设备的ID和所述第二对应关系,确定所述第二计算节点的ID;
    所述第一设备还用于根据所述第二计算节点的ID和所述跨节点加速指令,得到第三网络传输报文,并将所述第三网络传输报文发送至所述第二设备,所述第三网络传输报文包括所述目标加速功能ID;
    所述第二设备还用于根据所述目标加速功能ID,对第三数据进行相应的处理;
    所述第二设备还用于将所述第三数据的处理结果发送至所述第一计算节点。
  26. 根据权利要求25所述的系统,其特征在于,所述计算机系统中的计算节点共享计算资源池中的资源,所述计算资源池包括所述第二设备。
  27. 根据权利要求26所述的系统,其特征在于,所述第一计算节点还包括第一处理器,
    所述第一处理器用于对所述计算资源池中的加速设备及每个加速设备的加速功能进行编号,得到多个加速设备ID以及每个加速设备ID对应的加速功能ID;
    所述第一计算节点用于通过所述多个加速设备ID以及所述每个加速设备ID对应的加速功能ID,使用所述计算资源池中的加速设备对所述第三数据进行处理。
  28. 根据权利要求27所述的系统,其特征在于,
    所述第一处理器还用于从所述计算资源池获得与所述第三数据对应的所述第二设备的ID以及所述目标加速功能ID,生成所述跨节点加速指令;
    所述第一处理器还用于将所述跨节点加速指令发送至所述第一设备。
  29. 根据权利要求27所述的系统,其特征在于,所述第一设备具体用于:
    从所述计算资源池获得与所述第三数据对应的所述第二设备的ID以及所述目标加速功能ID,生成所述跨节点加速指令。
  30. 根据权利要求25-29任一项所述的系统,其特征在于,所述跨节点加速指令还包括第三源地址和第三目的地址,所述第三源地址为存储有所述第三数据的设备存储空间的地址,所述第三目的地址为将所述第三数据的处理结果写入的设备存储空间的地址。
  31. 根据权利要求30所述的系统,其特征在于,所述第三源地址为所述第一设备的存储空间的地址,所述第一设备具体用于:
    根据所述跨节点加速指令,获得所述第三源地址;
    从所述第一设备的存储空间中读取所述第三数据;
    将所述第三数据发送至所述第二设备。
  32. 根据权利要求30所述的系统,其特征在于,所述第三源地址为所述第二设备的存储空间的地址,所述第三网络传输报文还包括所述第三源地址,所述第二设备还用于:
    根据所述第三网络传输报文,获得所述第三源地址;
    从所述第二设备的存储空间中读取所述第三数据。
  33. 一种计算机可读存储介质,其特征在于,存储有第一计算机指令和第二计算机指令,所述第一计算机指令和所述第二计算机指令分别运行在第一计算节点和第二计算节点上,以实现前述权利要求1至17中任一项所述的第一计算节点与第二计算节点之间的数据传输或数据处理。
PCT/CN2022/089705 2021-04-30 2022-04-28 一种数据传输方法、数据处理方法及相关产品 WO2022228485A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP22794955.9A EP4322003A1 (en) 2021-04-30 2022-04-28 Data transmission method, data processing method, and related product
US18/496,234 US20240061802A1 (en) 2021-04-30 2023-10-27 Data Transmission Method, Data Processing Method, and Related Product

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202110486548.1 2021-04-30
CN202110486548 2021-04-30
CN202110720639.7A CN115269174A (zh) 2021-04-30 2021-06-28 一种数据传输方法、数据处理方法及相关产品
CN202110720639.7 2021-06-28

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/496,234 Continuation US20240061802A1 (en) 2021-04-30 2023-10-27 Data Transmission Method, Data Processing Method, and Related Product

Publications (1)

Publication Number Publication Date
WO2022228485A1 true WO2022228485A1 (zh) 2022-11-03

Family

ID=83758663

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/089705 WO2022228485A1 (zh) 2021-04-30 2022-04-28 一种数据传输方法、数据处理方法及相关产品

Country Status (4)

Country Link
US (1) US20240061802A1 (zh)
EP (1) EP4322003A1 (zh)
CN (1) CN115269174A (zh)
WO (1) WO2022228485A1 (zh)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140281107A1 (en) * 2013-03-15 2014-09-18 Google Inc. Efficient input/output (i/o) operations
CN104166628A (zh) * 2013-05-17 2014-11-26 华为技术有限公司 管理内存的方法、装置和系统
CN107003904A (zh) * 2015-04-28 2017-08-01 华为技术有限公司 一种内存管理方法、设备和系统
CN105404597A (zh) * 2015-10-21 2016-03-16 华为技术有限公司 数据传输的方法、设备及系统
CN110392084A (zh) * 2018-04-20 2019-10-29 伊姆西Ip控股有限责任公司 在分布式系统中管理地址的方法、设备和计算机程序产品

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116755889A (zh) * 2023-08-16 2023-09-15 北京国电通网络技术有限公司 应用于服务器集群数据交互的数据加速方法、装置与设备
CN116755889B (zh) * 2023-08-16 2023-10-27 北京国电通网络技术有限公司 应用于服务器集群数据交互的数据加速方法、装置与设备

Also Published As

Publication number Publication date
EP4322003A1 (en) 2024-02-14
CN115269174A (zh) 2022-11-01
US20240061802A1 (en) 2024-02-22


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22794955

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2022794955

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2022794955

Country of ref document: EP

Effective date: 20231110

NENP Non-entry into the national phase

Ref country code: DE