US12335372B2 - Common symmetric memory key for parallel processes - Google Patents

Common symmetric memory key for parallel processes

Info

Publication number
US12335372B2
Authority
US
United States
Prior art keywords
memory
parallel computing
key
symmetric
network interface
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US18/176,521
Other versions
US20240297781A1 (en)
Inventor
Manjunath Gorentla Venkata
Artem Yurievich Polyakov
Subhadeep Bhattacharya
Gil Bloch
William Ferrol Aderholdt
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mellanox Technologies Ltd
Original Assignee
Mellanox Technologies Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mellanox Technologies Ltd
Priority to US18/176,521
Assigned to MELLANOX TECHNOLOGIES, LTD. Assignors: BLOCH, GIL; Venkata, Manjunath Gorentla; Polyakov, Artem Yurievich; Aderholdt, William Ferrol; Bhattacharya, Subhadeep
Priority to DE102024201855.4A
Priority to CN202410230287.0A
Publication of US20240297781A1
Application granted
Publication of US12335372B2
Legal status: Active
Adjusted expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00: Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/08: Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
    • H04L9/0816: Key establishment, i.e. cryptographic processes or cryptographic protocols whereby a shared secret becomes available to two or more parties, for subsequent use
    • H04L9/0819: Key transport or distribution, i.e. key establishment techniques where one party creates or otherwise obtains a secret value, and securely transfers it to the other(s)
    • H04L9/0894: Escrow, recovery or storing of secret information, e.g. secret key escrow or cryptographic key storage
    • H04L9/14: Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols using a plurality of keys or algorithms
    • H04L9/40: Network security protocols
    • H04L2209/00: Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication
    • H04L2209/12: Details relating to cryptographic hardware or logic circuitry
    • H04L2209/125: Parallelization or pipelining, e.g. for accelerating processing of cryptographic operations

Definitions

  • the present invention relates to computer systems, and in particular, but not exclusively, to remote memory access.
  • An end point device, such as a management node, may initiate a parallel computing job with other end point devices across a network.
  • switches in the network may also participate in the parallel computing job by providing the various end point devices with the necessary data to perform the parallel computing job and also by performing such tasks as addition.
  • Parallel computing jobs may use remote direct memory access (RDMA) for data transfers between processes running on different end point devices.
  • advanced network interface controllers (NICs) are designed to support RDMA operations, in which the NIC transfers data by direct memory access from the memory of one computer into that of another.
  • InfiniBand and other RDMA-capable networks require pinning (e.g., registering) a remotely accessible memory to implement Remote Memory Access (RMA).
  • the RMA initiator needs a remote key (RKey) to access the corresponding memory.
  • each process participating in the job generates a unique RKey per remotely accessible memory region.
  • the different processes distribute the different RKeys for the job among themselves in order to facilitate RDMA data transfers between the processes on different hosts.
  • Each process stores a mapping table (RKey table) containing the RKeys of all memory regions of the different processes on the different hosts participating in the job.
  • the RKey table's size scales linearly with the number of memory regions and processes in the job.
  • the combined size of all RKey tables on a High-Performance Computing (HPC) compute node can grow up to the order of hundreds of Gigabytes.
  • this overhead can lead to thrashing on the RDMA network, impacting not only the network operations but also the compute part of the job.
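The linear scaling described above can be made concrete with a back-of-the-envelope sketch. The per-entry size, process count, and regions-per-process figures below are illustrative assumptions, not numbers taken from this patent:

```python
# Illustrative estimate of per-process and per-node RKey table memory.
# The 16-byte entry size and the counts used below are assumptions.
def rkey_table_bytes(num_processes: int, regions_per_process: int,
                     bytes_per_entry: int = 16) -> int:
    """One table entry per (remote process, remotely accessible region) pair."""
    return num_processes * regions_per_process * bytes_per_entry

# A job with one million processes, each exposing 4 RDMA memory regions:
per_process = rkey_table_bytes(num_processes=1_000_000, regions_per_process=4)
per_node = per_process * 128        # assume 128 processes co-located per node
print(per_process, per_node)        # 64000000 8192000000
```

Even with these modest assumptions the combined tables on one node approach 8 GB, consistent with the patent's observation that combined RKey table sizes can reach hundreds of gigabytes at larger scales.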
  • a parallel computing system including a key manager to assign symmetric memory keys to parallel computing jobs including a first symmetric memory key to a first parallel computing job, and a plurality of server nodes to execute parallel computing processes of the first parallel computing job, and cause registration of host memory regions of the server nodes with the assigned first symmetric memory key in corresponding network interface controllers of the server nodes so that different ones of the host memory regions are accessible with the first symmetric memory key by remote ones of the server nodes using remote direct memory access.
  • the server nodes include a first server node and a second server node, the first server node being to access one of the host memory regions of the second server node using remote direct memory access with the first symmetric memory key.
  • the server nodes include a first server node having a first host memory and a first network interface controller, and a second server node having a second host memory and a second network interface controller, the first network interface controller and the second network interface controller providing network access to the first server node and the second server node, respectively, and wherein the first server node is to execute a first parallel computing process of the first parallel computing job, receive the first symmetric memory key, select a first memory region of the first host memory, and provide the first symmetric memory key and an identification of the first memory region to the first network interface controller, the first network interface controller is to receive the first symmetric memory key and the identification of the first memory region, and perform a first registration of the first memory region with the first symmetric memory key, the second server node is to execute a second parallel computing process of the first parallel computing job, receive the first symmetric memory key, select a second memory region of the second host memory, and provide the first symmetric memory key and an identification of the second memory region to the second network interface controller, and the second network interface controller is to receive the first symmetric memory key and the identification of the second memory region, and perform a second registration of the second memory region with the first symmetric memory key.
  • the first network interface controller is to receive from a third parallel computing process of the first parallel computing job a first remote direct memory access request including the first symmetric memory key
  • the second network interface controller is to receive from a fourth parallel computing process of the first parallel computing job a second remote direct memory access request including the first symmetric memory key.
  • the first network interface controller is to receive from a third parallel computing process of the first parallel computing job a first remote direct memory access request including the first symmetric memory key, find the first memory region based on the first symmetric memory key included in the first remote direct memory access request and the first registration, and write data to, or retrieve data from, the first memory region of the first host memory responsively to the found first memory region and the first remote direct memory access request
  • the second network interface controller is to receive from a fourth parallel computing process of the first parallel computing job a second remote direct memory access request including the first symmetric memory key, find the second memory region based on the first symmetric memory key included in the second remote direct memory access request and the second registration, and write data to, or retrieve data from, the second memory region of the second host memory responsively to the found second memory region and the second remote direct memory access request.
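The key-to-region lookup described in the last two paragraphs can be sketched as follows. `NicSketch` and its methods are invented names, and a Python dict stands in for the NIC's hardware mapping table; the point is that each NIC resolves the same symmetric key to its own locally registered region:

```python
# Hypothetical sketch of a NIC resolving an incoming RDMA request: each NIC
# keeps its own (SM key -> local memory region) mapping, so the same key
# resolves to a different local region on each server node.
class NicSketch:
    def __init__(self):
        self._regions = {}                    # sm_key -> (base_address, length)

    def register(self, sm_key: int, base: int, length: int) -> None:
        self._regions[sm_key] = (base, length)

    def handle_rdma(self, sm_key: int, offset: int, length: int) -> int:
        # Find the registered region by the SM key carried in the request.
        base, region_len = self._regions[sm_key]
        if offset + length > region_len:
            raise ValueError("access outside registered region")
        return base + offset                  # local address to read or write

nic1, nic2 = NicSketch(), NicSketch()
nic1.register(sm_key=7, base=0x1000, length=4096)   # node 1's local region
nic2.register(sm_key=7, base=0x9000, length=4096)   # node 2's local region
# The same SM key 7 resolves to different local regions on each node:
print(hex(nic1.handle_rdma(7, 0x10, 8)), hex(nic2.handle_rdma(7, 0x10, 8)))
```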
  • the first parallel computing process includes a first parallel computing model and a first network library
  • the first parallel computing model is to receive the first symmetric memory key, select the first memory region of the first host memory, and provide the first symmetric memory key and the identification of the first memory region to the first network library
  • the first network library is to provide the first symmetric memory key and the identification of the first memory region to the first network interface controller
  • the second parallel computing process includes a second parallel computing model and a second network library
  • the second parallel computing model is to receive the first symmetric memory key, select the second memory region of the second host memory, and provide the first symmetric memory key and the identification of the second memory region to the second network library
  • the second network library is to provide the first symmetric memory key and the identification of the second memory region to the second network interface controller.
  • the key manager is to maintain a list of which of the symmetric memory keys are unassigned to active parallel computing jobs and which of the symmetric memory keys are assigned to the active parallel computing jobs, and assign one of the unassigned symmetric memory keys to the first parallel computing job.
  • the system includes a resource manager to establish the parallel computing processes of the first parallel computing job on the server nodes, request an unassigned symmetric memory key from the key manager for the first parallel computing job, receive the first symmetric memory key from the key manager for the first parallel computing job, and provide the first symmetric memory key to the parallel computing processes of the first parallel computing job running on the server nodes, wherein the key manager is to indicate in the list that the first symmetric memory key is assigned to the first parallel computing job.
  • the key manager is to assign the same symmetric memory key to two different parallel computing jobs which do not use one or more common ones of the network interface controllers.
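The reuse rule in this paragraph, keys may be shared only by jobs whose NIC sets do not overlap, can be sketched as simple bookkeeping. `KeyManagerSketch` and its method names are invented; the patent does not specify the key manager's data structures:

```python
# Hypothetical sketch of the key manager's bookkeeping: a pool of unassigned
# SM keys, plus the set of NICs each active key's jobs touch, so that a key
# may be reused by a second job whose NIC set is disjoint.
class KeyManagerSketch:
    def __init__(self, keys):
        self._pool = set(keys)        # unassigned SM keys
        self._active = {}             # sm_key -> set of NIC ids using it

    def assign(self, job_nics: set):
        # Prefer a fully unassigned key; otherwise reuse a key whose
        # current jobs share no NIC with the new job.
        if self._pool:
            key = self._pool.pop()
            self._active[key] = set(job_nics)
            return key
        for key, nics in self._active.items():
            if nics.isdisjoint(job_nics):
                nics |= job_nics
                return key
        raise RuntimeError("no SM key available for this NIC set")

    def release(self, key, job_nics: set):
        # On job completion the key returns to the pool once no job uses it.
        self._active[key] -= set(job_nics)
        if not self._active[key]:
            del self._active[key]
            self._pool.add(key)

km = KeyManagerSketch(keys=[1])
k_a = km.assign({"nicA", "nicB"})    # job A runs on NICs A and B
k_b = km.assign({"nicC"})            # job B reuses key 1: no shared NIC
print(k_a, k_b)                      # 1 1
```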
  • the key manager is to assign multiple symmetric memory keys to the first parallel computing job, wherein each of the server nodes is to cause registration of the multiple memory keys with corresponding multiple memory regions in a corresponding one of the network interface controllers.
  • a server node device including a network interface controller to send and receive packets over a network, and a host memory, and a processor to execute a first parallel computing process of a parallel computing job, receive a symmetric memory key from a key manager, cause registration of a host memory region of the host memory with the symmetric memory key in the network interface controller so that the host memory region is accessible via the network interface controller by different remote server nodes with the symmetric memory key using remote direct memory access, send a first remote direct memory access request including the symmetric memory key to access a host memory of a first remote server node executing a second parallel computing process of the parallel computing job with the symmetric memory key, and send a second remote direct memory access request including the symmetric memory key to access a host memory of a second remote server node executing a third parallel computing process of the parallel computing job with the symmetric memory key.
  • the processor is to select the memory region of the host memory, and provide the symmetric memory key and an identification of the memory region to the network interface controller, and the network interface controller is to receive the symmetric memory key and the identification of the memory region, and perform the registration of the memory region with the symmetric memory key.
  • the first parallel computing process includes a first parallel computing model and a first network library.
  • the first parallel computing model is to receive the symmetric memory key, select the memory region of the host memory, and provide the symmetric memory key and the identification of the memory region to the first network library, and the first network library is to provide the symmetric memory key and the identification of the memory region to the network interface controller.
  • a parallel computing method including assigning symmetric memory keys to parallel computing jobs including a first symmetric memory key to a first parallel computing job, executing parallel computing processes of the first parallel computing job, causing registration of host memory regions of server nodes with the assigned first symmetric memory key in corresponding network interface controllers of the server nodes so that different ones of the host memory regions are accessible with the first symmetric memory key by remote ones of the server nodes using remote direct memory access.
  • the method includes accessing, by a first server node, a host memory region of a second server node using remote direct memory access with the first symmetric memory key.
  • the method includes by a first server node executing a first parallel computing process of the first parallel computing job, receiving the first symmetric memory key, selecting a first memory region of a first host memory of the first server node, and providing the first symmetric memory key and an identification of the first memory region to a first network interface controller of the first server node, by the first network interface controller receiving the first symmetric memory key and the identification of the first memory region, and performing a first registration of the first memory region with the first symmetric memory key, by a second server node executing a second parallel computing process of the first parallel computing job, receiving the first symmetric memory key, selecting a second memory region of a second host memory of the second server node, and providing the first symmetric memory key and an identification of the second memory region to a second network interface controller of the second server node, and by the second network interface controller receiving the first symmetric memory key and the identification of the second memory region, and performing a second registration of the second memory region with the first symmetric memory key.
  • the method includes receiving, by the first network interface controller, from a third parallel computing process of the first parallel computing job a first remote direct memory access request including the first symmetric memory key, and receiving, by the second network interface controller, from a fourth parallel computing process of the first parallel computing job a second remote direct memory access request including the first symmetric memory key.
  • the method includes by the first network interface controller receiving from a third parallel computing process of the first parallel computing job a first remote direct memory access request including the first symmetric memory key, finding the first memory region based on the first symmetric memory key included in the first remote direct memory access request and the first registration, and writing data to, or retrieving data from, the first memory region of the first host memory responsively to the found first memory region and the first remote direct memory access request, and by the second network interface controller receiving from a fourth parallel computing process of the first parallel computing job a second remote direct memory access request including the first symmetric memory key, finding the second memory region based on the first symmetric memory key included in the second remote direct memory access request and the second registration, and writing data to, or retrieving data from, the second memory region of the second host memory responsively to the found second memory region and the second remote direct memory access request.
  • the first parallel computing process includes a first parallel computing model and a first network library, the second parallel computing process including a second parallel computing model and a second network library, the method further including by the first parallel computing model receiving the first symmetric memory key, selecting the first memory region of the first host memory, and providing the first symmetric memory key and the identification of the first memory region to the first network library, providing, by the first network library, the first symmetric memory key and the identification of the first memory region to the first network interface controller, by the second parallel computing model receiving the first symmetric memory key, selecting the second memory region of the second host memory, and providing the first symmetric memory key and the identification of the second memory region to the second network library, and providing, by the second network library, the first symmetric memory key and the identification of the second memory region to the second network interface controller.
  • the method includes maintaining a list of which of the symmetric memory keys are unassigned to active parallel computing jobs and which of the symmetric memory keys are assigned to the active parallel computing jobs, and assigning one of the unassigned symmetric memory keys to the first parallel computing job.
  • the method includes establishing the parallel computing processes of the first parallel computing job on the server nodes, requesting an unassigned symmetric memory key from a key manager for the first parallel computing job, receiving the first symmetric memory key from the key manager for the first parallel computing job, providing the first symmetric memory key to the parallel computing processes of the first parallel computing job running on the server nodes, and indicating in the list that the first symmetric memory key is assigned to the first parallel computing job.
  • the method includes assigning the same symmetric memory key to two different parallel computing jobs which do not use one or more common ones of the network interface controllers.
  • the method includes assigning multiple symmetric memory keys to the first parallel computing job, and causing registration of the multiple memory keys with corresponding multiple memory regions in a corresponding one of the network interface controllers.
  • a parallel computing method including sending and receiving packets over a network, executing a first parallel computing process of a parallel computing job, receiving a symmetric memory key from a key manager, causing registration of a host memory region of a host memory with the symmetric memory key in a network interface controller so that the host memory region is accessible via the network interface controller by different remote server nodes with the symmetric memory key using remote direct memory access, sending a first remote direct memory access request including the symmetric memory key to access a host memory of a first remote server node executing a second parallel computing process of the parallel computing job with the symmetric memory key, and sending a second remote direct memory access request including the symmetric memory key to access a host memory of a second remote server node executing a third parallel computing process of the parallel computing job with the symmetric memory key.
  • the method includes selecting the memory region of the host memory, providing the symmetric memory key and an identification of the memory region to the network interface controller, receiving, by the network interface controller, the symmetric memory key, and the identification of the memory region, and performing the registration of the memory region with the symmetric memory key.
  • FIG. 1 is a block diagram view of a parallel computing system constructed and operative in accordance with an embodiment of the present invention
  • FIG. 2 is a flowchart including steps in a method performed by a resource manager in the system of FIG. 1 ;
  • FIG. 3 is a flowchart including steps in a method performed by a key manager in the system of FIG. 1 ;
  • FIG. 4 is a block diagram view of the system of FIG. 1 illustrating memory key registration
  • FIG. 5 is a flowchart including steps in a method performed by a processor in a server node of the system of FIG. 1 to initiate memory key registration;
  • FIG. 6 is a block diagram view of a parallel computing process for use in the system of FIG. 1 ;
  • FIG. 7 is a flowchart including steps in a method performed by a network interface controller in a server node of the system of FIG. 1 performing memory key registration;
  • FIG. 8 is a block diagram view of the system of FIG. 1 illustrating two server nodes sending RDMA requests to two other server nodes;
  • FIG. 9 is a flowchart including steps in a method performed by a parallel computing process in the system of FIG. 1 to generate and send an RDMA request;
  • FIG. 10 is a flowchart including steps in a method performed by a network interface controller in the system of FIG. 1 processing a received RDMA request;
  • FIG. 11 is a block diagram view of the system of FIG. 1 showing a server node sending RDMA requests to different server nodes;
  • FIG. 12 is a block diagram view of the system of FIG. 1 showing two different processes of different parallel computing jobs sending RDMA requests to the same server node.
  • the underlying networks require HPC software, such as network libraries and programming model implementations, to register and pin the data buffers' memory. On registering and pinning, the information required to access this memory remotely is encapsulated in an RKey object.
  • a parallel application is a set of processes on different central processing unit (CPU) cores either in the same or different nodes working together to solve a single problem.
  • the parallel application is comprised of Execution Units (EUs) that perform computation and initiate communications.
  • EUs are typically represented by operating system (OS) processes or threads.
  • a job may include one or more parallel applications; for the sake of simplicity, the terms “job”, “parallel computing job”, “parallel application” and “application” are used interchangeably herein, assuming that a job includes a single parallel application. However, embodiments of the present invention may be used with a job including multiple parallel applications.
  • each of the EUs is required to hold the RKey for every memory region accessed with RDMA operations.
  • this results in each process holding a large number of RKeys and causing significant memory overhead.
  • typical RKey table memory consumption increases with the number of processes and RDMA-exposed memory regions in the parallel application which, as mentioned earlier, can be on the order of millions and is typically proportional to the number of CPU cores used by clusters and supercomputers.
  • embodiments of the present invention solve at least some of the above drawbacks by using a single memory key (termed “symmetric memory key” or “SM key”) for use by all the processes of a job to provide remote direct memory access to host memory instead of using RKeys which are unique to each registered host memory region for each process of the job.
  • the SM key includes information for accessing a single memory region per process (e.g., EU) for all the processes (e.g., EUs) in the job. As a consequence, it is not required to build and maintain an RKey table.
  • when a job includes multiple parallel applications, an SM Key is used for all the processes in each parallel application in that job.
  • a resource manager requests a new SM key for a new job from a key manager.
  • the resource manager may provide other functions such as establishing parallel computing processes of the new job on server nodes.
  • the key manager tracks which SM keys have been assigned to jobs and provides an unassigned SM key to the new job. When a job completes processing, the SM key assigned to that job generally returns to the pool of unassigned SM keys.
  • the same SM key may be assigned for use by different jobs if the different jobs are not accessing a common NIC.
  • the same SM key cannot be used by different jobs if those jobs are accessing one or more of the same NICs. Therefore, in some embodiments, the key manager tracks the use of SM Keys assigned to jobs and the NICs and/or server nodes participating in the jobs.
  • the resource manager may request that more than one SM key from the key manager be assigned to a parallel application in order to provide more memory regions for use by that parallel application during execution.
  • a parallel application may be assigned SM Key 1A for registering with a first memory region and SM Key 1B for registering with a second memory region, and so on.
  • an RDMA request for the parallel application may use SM Key 1A or SM Key 1B.
  • the resource manager may request one or more additional SM Keys while a job is already running, for example, if there is an additional memory need.
  • the resource manager receives the new SM key from the key manager and distributes the received SM Key to the processes of the job running on the different server nodes.
  • a process (of the new job), e.g., an EU, selects a memory region of its local host memory and provides the new SM key and an identification (ID) of the selected memory region to its NIC.
  • the NIC registers the new SM key with the selected memory region. In such a manner, all the processes on all the servers participating in this new job can access the host memory of each other using the SM key, as each local NIC knows the mapping between the SM key and the memory region in its local host memory assigned to be used by the new job.
  • the EU runs an HPC software stack including a parallel programming model and a network library, which may cause the NIC to register the SM Key with the selected memory region.
  • the parallel programming model may deliver the selected memory region and new SM key to the network library, which programs the NIC with this information to register the SM key with the selected memory region in NIC memory.
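The layering in the last two paragraphs (parallel programming model, then network library, then NIC) can be sketched as below. All class names are invented, and a dict stands in for the NIC memory that holds the key-to-region registration:

```python
# Hypothetical sketch of the registration call chain: the programming model
# selects a local region and hands the SM key and region ID to the network
# library, which programs the NIC with that information.
class NetworkLibrarySketch:
    """Stands in for an HPC network library; the API shown is invented."""
    def __init__(self, nic_table: dict):
        self._nic_table = nic_table            # stands in for NIC memory

    def program_nic(self, sm_key: int, region_id: int) -> None:
        # Register the SM key with the selected region in "NIC memory".
        self._nic_table[sm_key] = region_id

class ProgrammingModelSketch:
    """Stands in for a parallel programming model implementation."""
    def __init__(self, lib: NetworkLibrarySketch):
        self._lib = lib

    def on_job_start(self, sm_key: int, region_id: int) -> None:
        # Deliver the selected memory region and SM key to the library.
        self._lib.program_nic(sm_key, region_id)

nic_memory = {}
model = ProgrammingModelSketch(NetworkLibrarySketch(nic_memory))
model.on_job_start(sm_key=7, region_id=42)
print(nic_memory)                              # {7: 42}
```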
  • FIG. 1 is a block diagram view of a parallel computing system 10 constructed and operative in accordance with an embodiment of the present invention.
  • the system 10 includes a key manager 12 , a resource manager 14 , and a plurality of server nodes 18 .
  • the key manager 12 and the resource manager 14 may be disposed in independent network nodes or servers, or may be disposed in the same network node 16 or server.
  • Each server node 18 includes a processor 20 (e.g., central processing unit (CPU) and/or graphics processing unit (GPU) and/or field-programmable gate array(s) (FPGA(s)) and/or accelerator(s)), a host memory 22 , and a network interface controller (NIC) 24 .
  • the network interface controller 24 of each server node 18 provides network access for its server node 18 , and is configured to send and receive packets over a network.
  • the functions of the processor 20 may be combined in a single physical component or, alternatively, implemented using multiple physical components. These physical components may comprise hard-wired or programmable devices, or a combination of the two. In some embodiments, at least some of the functions of the processor 20 may be carried out by a programmable processor under the control of suitable software. This software may be downloaded to a device in electronic form, over a network, for example. Alternatively, or additionally, the software may be stored in tangible, non-transitory computer-readable storage media, such as optical, magnetic, or electronic memory.
  • the processors 20 of the server nodes 18 execute parallel computing processes of one or more parallel computing jobs, e.g., parallel computing job 1 .
  • the key manager 12 is configured to assign symmetric memory keys (SMKs) 26 (e.g., upon receiving requests from the resource manager 14 ) to the parallel computing jobs including SMK 1 to parallel computing job 1 .
  • Each processor 20 shown in FIG. 1 is shown executing a parallel computing process (PCP) 28 .
  • the processor 20 of server node 1 is shown executing PCP 1
  • the processor 20 of server node 2 is shown executing PCP 2 .
  • The server nodes 18 cause registration of respective host memory regions of the host memories 22 of the server nodes 18 with the assigned SMK 1 in the corresponding network interface controllers 24 of the server nodes 18 so that different host memory regions are accessible with SMK 1 by remote server nodes 18 using remote direct memory access.
  • Each server node 18 causes registration of its local host memory region with the assigned SMK 1 in its network interface controller 24 so that its host memory region is accessible with SMK 1 by remote server nodes 18 using remote direct memory access.
  • Server node 1 is configured to access the host memory region of server node 2 using remote direct memory access with SMK 1 . Memory registration is described in more detail below.
  • FIG. 2 is a flowchart 200 including steps in a method performed by the resource manager 14 in the system 10 of FIG. 1 .
  • The resource manager 14 is configured to: establish the parallel computing processes 28 of parallel computing job 1 on the server nodes 18 (block 202 ); request an unassigned symmetric memory key from the key manager 12 for parallel computing job 1 (block 204 ); receive the SMK 1 from the key manager 12 for parallel computing job 1 (block 206 ); and provide (instances of) SMK 1 to the parallel computing processes 28 of parallel computing job 1 running on the server nodes 18 (block 208 ).
  • FIG. 3 is a flowchart 300 including steps in a method performed by the key manager 12 in the system 10 of FIG. 1 .
  • The key manager 12 is configured to: maintain a list 34 of which symmetric memory keys 26 are assigned to the active parallel computing jobs (block 302 ) (and which network interface controllers 24 are being used in those jobs, to prevent the same key from being assigned to different jobs that use one or more common network interface controllers 24 ), and optionally which symmetric memory keys 26 are unassigned to active parallel computing jobs; assign one of the unassigned symmetric memory keys 26 (e.g., SMK 1 ) to parallel computing job 1 in response to the request from the resource manager 14 (block 304 ); indicate in the list 34 that SMK 1 is assigned to parallel computing job 1 (block 306 ); and, in response to parallel computing job 1 being completed, update the list 34 to show that SMK 1 is no longer assigned to an active parallel computing job (block 308 ).
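The key-assignment lifecycle of blocks 302 - 308 can be sketched as a toy simulation (a Python illustration only; the `KeyManager` class and its method names are hypothetical, not the patented implementation):

```python
class KeyManager:
    """Toy key manager: tracks which symmetric memory keys (SMKs) are
    assigned to active parallel computing jobs (cf. list 34)."""

    def __init__(self, key_space):
        self.unassigned = set(key_space)  # SMKs not used by any active job
        self.assigned = {}                # job id -> SMK (block 302)

    def assign(self, job_id):
        # Block 304: hand an unassigned SMK to the requesting job;
        # block 306: record the assignment in the list.
        smk = self.unassigned.pop()
        self.assigned[job_id] = smk
        return smk

    def release(self, job_id):
        # Block 308: the job completed, so its SMK becomes unassigned
        # and may later be handed to another job.
        self.unassigned.add(self.assigned.pop(job_id))
```

In practice the resource manager, not the job itself, would call `assign` on the job's behalf (cf. blocks 204 - 208 of FIG. 2).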
  • The key manager 12 may be configured to assign the same symmetric memory key to two or more different parallel computing jobs which do not use one or more common ones of the network interface controllers 24 .
  • The key manager 12 may be configured to assign multiple symmetric memory keys to parallel computing job 1 (in response to a request by the resource manager 14 ).
  • Each server node 18 is configured to cause registration of the multiple memory keys (assigned by the key manager 12 to parallel computing job 1 ) with corresponding multiple memory regions in a corresponding one of the network interface controllers 24 .
  • The key manager 12 may encrypt and/or sign the symmetric memory key(s) 26 to create a secure channel between the key manager 12 and the network interface controllers 24 of the server nodes 18 . Therefore, in some embodiments, each network interface controller 24 includes one or more secret keys to decrypt and/or verify the symmetric memory key(s) 26 .
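The sign/verify option can be illustrated with a keyed MAC over the key material. The sketch below uses Python's standard `hmac` module purely as an example; the disclosure does not name an algorithm, and the shared secret shown is a hypothetical stand-in for the NIC's secret key:

```python
import hashlib
import hmac

NIC_SECRET = b"hypothetical-shared-secret"  # provisioned in each NIC 24

def sign_smk(smk: bytes) -> bytes:
    """Key manager side: append a MAC so the NIC can verify the SMK."""
    return smk + hmac.new(NIC_SECRET, smk, hashlib.sha256).digest()

def verify_smk(signed: bytes) -> bytes:
    """NIC side: check the MAC before accepting the SMK for registration."""
    smk, tag = signed[:-32], signed[-32:]
    expected = hmac.new(NIC_SECRET, smk, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("symmetric memory key failed verification")
    return smk
```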
  • FIG. 4 is a block diagram view of the system 10 of FIG. 1 illustrating memory key registration.
  • FIG. 5 is a flowchart 500 including steps in a method performed by the processor 20 in server node 1 of the system 10 of FIG. 1 to initiate memory key registration.
  • The processor 20 of server node 1 is configured to: execute parallel computing process 1 (PCP 1 ) of parallel computing job 1 (block 502 ); receive SMK 1 from the resource manager 14 (block 504 ); select a memory region 30 (MR 1 ) of the host memory 22 of server node 1 (block 506 ); and cause registration (block 32 ) of MR 1 with SMK 1 in the network interface controller 24 of server node 1 so that MR 1 is accessible via that network interface controller 24 by different remote server nodes 18 (i.e., server nodes other than server node 1 which are also part of parallel computing job 1 ) with SMK 1 using remote direct memory access (RDMA), including providing SMK 1 and an identification of MR 1 to the network interface controller 24 of server node 1 (block 508 ).
  • The steps of blocks 502 - 508 may be similarly performed by each of the other server nodes 18 that receive SMK 1 from the resource manager 14 , except that in each server node 18 a different process is performed (e.g., PCP 2 in server node 2 ) and a different host memory region 30 may be selected (e.g., MR 2 in server node 2 ) and registered (block 32 ) by the network interface controller 24 of that server node 18 (e.g., MR 2 is registered with SMK 1 by the network interface controller 24 of server node 2 ).
  • FIG. 6 is a block diagram view of an example of parallel computing process 1 (PCP 1 ) for use in the system 10 of FIG. 1 .
  • Parallel computing process 1 (block 600 ) may include parallel computing model 1 (block 602 ) and network library 1 (block 604 ).
  • One or more of the steps of blocks 502 - 508 may be performed by parallel computing model 1 and one or more of the steps of blocks 502 - 508 may be performed by network library 1 .
  • Parallel computing model 1 may perform the steps of blocks 502 - 506 and provide SMK 1 and the identification of MR 1 to network library 1 , which then performs the step of block 508 .
  • PCP 2 may include parallel computing model 2 and network library 2 , and so on.
  • One or more of the steps of blocks 502 - 508 may be performed by parallel computing model 2 and one or more of the steps of blocks 502 - 508 may be performed by network library 2 for PCP 2 .
  • Parallel computing model 2 may perform the steps of blocks 502 - 506 and provide SMK 1 and the identification of MR 2 to network library 2 , which then performs the step of block 508 .
  • FIG. 7 is a flowchart 700 including steps in a method performed by the network interface controller 24 in server node 1 of the system 10 of FIG. 1 performing memory key registration (block 32 ).
  • The network interface controller 24 of server node 1 is configured to: receive SMK 1 and the identification of MR 1 (block 702 ); and perform a registration (block 32 ) of MR 1 with SMK 1 (block 704 ).
  • The network interface controllers 24 of the other server nodes 18 perform similar operations.
  • The network interface controller 24 of server node 2 is configured to receive SMK 1 and the identification of MR 2 , and perform a registration of MR 2 with SMK 1 .
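The registration of blocks 702 - 704 amounts to each NIC keeping its own mapping from the shared SMK to the locally selected memory region; a minimal sketch follows (a Python simulation with hypothetical names, not NIC firmware):

```python
class Nic:
    """Toy stand-in for a network interface controller 24 holding the
    registrations of block 32."""

    def __init__(self):
        self.registrations = {}  # SMK -> identification of the local MR

    def register(self, smk, mr_id):
        # Blocks 702 - 704: receive (SMK, MR id) and record the mapping.
        self.registrations[smk] = mr_id

# Each node registers a *different* local region under the *same* key:
nic1, nic2 = Nic(), Nic()
nic1.register("SMK 1", "MR 1")  # server node 1
nic2.register("SMK 1", "MR 2")  # server node 2
```

The key's "symmetry" lies exactly here: one key is valid job-wide, while each NIC privately resolves it to its own region.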
  • FIG. 8 is a block diagram view of the system 10 of FIG. 1 illustrating two server nodes (server nodes 3 and 4 ) sending RDMA requests 36 - 1 , 36 - 2 to two other server nodes (server nodes 1 and 2 , respectively).
  • FIG. 9 is a flowchart 900 including steps in a method performed by a parallel computing process (e.g., PCP 3 ) in the system 10 of FIG. 1 to generate and send the RDMA request 36 - 1 .
  • PCP 3 is configured to find the symmetric memory key 26 (i.e., SMK 1 ) for parallel computing job 1 of PCP 3 in a memory (block 902 ), which may include a table of symmetric memory keys 26 for the parallel computing jobs being run by server node 3 .
  • PCP 3 is configured to generate the RDMA request 36 - 1 with the found SM key (i.e., SMK 1 ) (block 904 ), and send the generated RDMA request 36 - 1 to a remote server (e.g., server node 1 ), which is executing PCP 1 of (the same) parallel computing job 1 (block 906 ).
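The request side (blocks 902 - 906) thus needs only a small per-node table keyed by job, rather than an RKey table keyed by every remote memory region; a sketch under the same hypothetical naming:

```python
# Per-node table of job -> SMK consulted in block 902 (illustrative).
job_keys = {"parallel computing job 1": "SMK 1",
            "parallel computing job 3": "SMK 3"}

def make_rdma_request(job_id, op, dest_node, offset, data=None, length=0):
    """Block 904: build an RDMA request carrying the job's SMK.
    Sending it (block 906) is left to the transport layer."""
    return {"smk": job_keys[job_id], "op": op, "dest": dest_node,
            "offset": offset, "data": data, "length": length}

req = make_rdma_request("parallel computing job 1", "write", "server node 1",
                        0, data=b"payload")
```

Note that the request is the same regardless of which remote node it targets, since every peer in the job registered under the same key.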
  • FIG. 8 also shows PCP 4 running on server node 4 performing similar steps to generate the RDMA request 36 - 2 and send the RDMA request 36 - 2 to server node 2 .
  • FIG. 10 is a flowchart 1000 including steps in a method performed by the network interface controller 24 of server node 1 in the system 10 of FIG. 1 processing the received RDMA request 36 - 1 .
  • Reference is also made to FIG. 8 .
  • The network interface controller 24 of server node 1 is configured to: receive from PCP 3 of the parallel computing job 1 the RDMA request 36 - 1 including SMK 1 (block 1002 ); find the memory region (i.e., MR 1 ) in the host memory 22 of server node 1 corresponding to SMK 1 based on SMK 1 being included in the RDMA request 36 - 1 and the registration (block 32 ) previously performed mapping SMK 1 to MR 1 (block 1004 ); and write data to, or retrieve data from, MR 1 of the host memory 22 of server node 1 responsively to the found memory region 30 (i.e., MR 1 ) and the RDMA request 36 - 1 , which is requesting performance of the RDMA process (block 1006 ).
  • The network interface controller 24 of server node 2 is configured to: receive from PCP 4 of the parallel computing job 1 the RDMA request 36 - 2 including SMK 1 ; find MR 2 in the host memory 22 of server node 2 based on SMK 1 included in the RDMA request 36 - 2 and the registration (block 32 ) previously performed mapping SMK 1 to MR 2 ; and write data to, or retrieve data from, MR 2 of the host memory 22 of server node 2 responsively to the found memory region 30 (i.e., MR 2 ) and the RDMA request 36 - 2 , which is requesting performance of the RDMA process.
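On the target side (blocks 1002 - 1006), the NIC resolves the memory region directly from the SMK carried in the request; the self-contained toy handler below illustrates this (hypothetical names, with `bytearray` buffers standing in for host memory 22):

```python
# Registrations of block 32 on one NIC: SMK -> (MR name, backing buffer).
registrations = {"SMK 1": ("MR 1", bytearray(16)),
                 "SMK 3": ("MR 6", bytearray(16))}

def handle_rdma_request(req):
    """Block 1002: receive the request; block 1004: find the region
    registered under its SMK; block 1006: write or read host memory."""
    mr_name, buf = registrations[req["smk"]]
    off = req["offset"]
    if req["op"] == "write":
        buf[off:off + len(req["data"])] = req["data"]
        return mr_name
    return bytes(buf[off:off + req["length"]])
```

Two jobs sharing the NIC remain separated because their requests carry different SMKs (e.g., SMK 1 resolving to MR 1 and SMK 3 to MR 6, as in FIG. 12).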
  • FIG. 11 is a block diagram view of the system 10 of FIG. 1 showing server node 1 sending RDMA requests 36 - 3 , 36 - 4 to different server nodes (server nodes 3 and 4 , respectively).
  • PCP 1 running on the processor 20 of server node 1 is configured to generate and send RDMA request 36 - 3 including SMK 1 to access the host memory 22 of remote server node 3 (which is executing PCP 3 of parallel computing job 1 ) with SMK 1 .
  • PCP 1 running on the processor 20 of server node 1 is configured to generate and send RDMA request 36 - 4 including SMK 1 to access the host memory 22 of remote server node 4 (which is executing PCP 4 of parallel computing job 1 ) with SMK 1 .
  • FIG. 12 is a block diagram view of the system 10 of FIG. 1 showing two different processes 28 of different parallel computing jobs (PCP 2 of job 1 and PCP 4 of job 3 ) sending RDMA requests to the same server node (server node 1 ).
  • PCP 2 and PCP 4 may be running on the same server node 18 or on different server nodes 18 .
  • PCP 2 of parallel computing job 1 running on a server node is configured to generate and send an RDMA request 36 - 5 including SMK 1 to access the host memory 22 of server node 1 (which is executing PCP 1 of parallel computing job 1 ) with SMK 1 .
  • PCP 4 of parallel computing job 3 is running on a server node (which is remote to server node 1 ) and is configured to generate and send an RDMA request 36 - 6 including SMK 3 (as SMK 3 is the SM key assigned by the key manager 12 to parallel computing job 3 ) to access the host memory 22 of server node 1 (which is executing PCP 9 of parallel computing job 3 ) with SMK 3 .
  • The network interface controller 24 of server node 1 is configured to: receive from PCP 2 of the parallel computing job 1 the RDMA request 36 - 5 including SMK 1 ; find MR 1 in the host memory 22 of server node 1 based on SMK 1 being included in the RDMA request 36 - 5 and the registration (block 32 ) previously performed mapping SMK 1 to MR 1 ; and write data to, or retrieve data from, MR 1 of the host memory 22 of server node 1 responsively to the found memory region 30 (i.e., MR 1 ) and the RDMA request 36 - 5 , which is requesting performance of the RDMA process.
  • The network interface controller 24 of server node 1 is configured to: receive from PCP 4 of the parallel computing job 3 the RDMA request 36 - 6 including SMK 3 ; find MR 6 in the host memory 22 of server node 1 based on SMK 3 being included in the RDMA request 36 - 6 and the registration (block 32 ) previously performed mapping SMK 3 to MR 6 ; and write data to, or retrieve data from, MR 6 of the host memory 22 of server node 1 responsively to the found memory region 30 (i.e., MR 6 ) and the RDMA request 36 - 6 , which is requesting performance of the RDMA process.


Abstract

In one embodiment, a parallel computing system includes a key manager to assign symmetric memory keys to parallel computing jobs including a first symmetric memory key to a first parallel computing job, and a plurality of server nodes to execute parallel computing processes of the first parallel computing job, and cause registration of host memory regions of the server nodes with the assigned first symmetric memory key in corresponding network interface controllers of the server nodes so that different ones of the host memory regions are accessible with the first symmetric memory key by remote ones of the server nodes using remote direct memory access.

Description

FIELD OF THE INVENTION
The present invention relates to computer systems, and in particular, but not exclusively, to remote memory access.
BACKGROUND
An end point device, such as a management node, may initiate a parallel computing job with other end point devices across a network. In addition to the other end point devices participating in the parallel computing job, switches in the network may also participate in the parallel computing job by providing the various end point devices with the necessary data to perform the parallel computing job and also by performing such tasks as addition.
Parallel computing jobs may use remote direct memory access (RDMA) for data transfers between processes running on different end point devices. Advanced network interface controllers (NICs) are designed to support RDMA operations, in which the NIC transfers data by direct memory access from the memory of one computer into that of another.
InfiniBand and other RDMA-capable networks require pinning (e.g., registering) a remotely accessible memory to implement Remote Memory Access (RMA). During the data transfer, the RMA initiator needs a remote key (RKey) to access the corresponding memory. In current parallel programming models, each process participating in the job generates a unique RKey per remotely accessible memory region. For example, a Host Channel Adapter (HCA) or network interface controller (NIC) provides an RKey to its local process running on a local host in response to a registration request from the local process to register a memory region of the local host memory. The different processes distribute the different RKeys for the job among themselves in order to facilitate RDMA data transfers between the processes on different hosts. Each process stores a mapping table (RKey table) containing the RKeys of all memory regions of the different processes on the different hosts participating in the job. The RKey table's size scales linearly with the number of memory regions and processes in the job. The combined size of all RKey tables on a High-Performance Computing (HPC) compute node can grow up to the order of hundreds of Gigabytes. Moreover, as the RKey table is searched on every network (RDMA) operation, this causes cache thrashing impacting not only the network operations but also the compute part of the job.
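The scaling problem can be made concrete with back-of-the-envelope arithmetic (the numbers below are illustrative assumptions, not figures from this disclosure): with per-process RKeys, each process stores one key for every remotely accessible region of every peer, while a common symmetric key collapses that to a single entry per job.

```python
def rkey_table_bytes(num_processes, regions_per_process, key_bytes=8):
    """Size of the per-process RKey table: one key for every
    (peer process, memory region) pair in the job."""
    return num_processes * regions_per_process * key_bytes

# Hypothetical job: 100,000 processes, 64 registered regions each.
per_process = rkey_table_bytes(100_000, 64)   # 51,200,000 bytes (~51 MB)
per_node = per_process * 128                  # 128 processes per node: ~6.5 GB
with_smk = 8                                  # a single shared key per job
```

Under these assumed parameters, a node's combined RKey tables run to gigabytes and must be searched on every RDMA operation, whereas the symmetric-key scheme needs only the job's one key.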
SUMMARY
There is provided in accordance with an embodiment of the present disclosure, a parallel computing system, including a key manager to assign symmetric memory keys to parallel computing jobs including a first symmetric memory key to a first parallel computing job, and a plurality of server nodes to execute parallel computing processes of the first parallel computing job, and cause registration of host memory regions of the server nodes with the assigned first symmetric memory key in corresponding network interface controllers of the server nodes so that different ones of the host memory regions are accessible with the first symmetric memory key by remote ones of the server nodes using remote direct memory access.
Further in accordance with an embodiment of the present disclosure the server nodes include a first server node and a second server node, the first server node being to access one of the host memory regions of the second server node using remote direct memory access with the first symmetric memory key.
Still further in accordance with an embodiment of the present disclosure the server nodes include a first server node having a first host memory and a first network interface controller, and a second server node having a second host memory and a second network interface controller, the first network interface controller and the second network interface controller providing network access to the first server node and the second server node, respectively, and wherein the first server node is to execute a first parallel computing process of the first parallel computing job, receive the first symmetric memory key, select a first memory region of the first host memory, and provide the first symmetric memory key and an identification of the first memory region to the first network interface controller, the first network interface controller is to receive the first symmetric memory key and the identification of the first memory region, and perform a first registration of the first memory region with the first symmetric memory key, the second server node is to execute a second parallel computing process of the first parallel computing job, receive the first symmetric memory key, select a second memory region of the second host memory, and provide the first symmetric memory key and an identification of the second memory region to the second network interface controller, and the second network interface controller is to receive the first symmetric memory key and the identification of the second memory region, and perform a second registration of the second memory region with the first symmetric memory key.
Additionally in accordance with an embodiment of the present disclosure the first network interface controller is to receive from a third parallel computing process of the first parallel computing job a first remote direct memory access request including the first symmetric memory key, and the second network interface controller is to receive from a fourth parallel computing process of the first parallel computing job a second remote direct memory access request including the first symmetric memory key.
Moreover, in accordance with an embodiment of the present disclosure the first network interface controller is to receive from a third parallel computing process of the first parallel computing job a first remote direct memory access request including the first symmetric memory key, find the first memory region based on the first symmetric memory key included in the first remote direct memory access request and the first registration, and write data to, or retrieve data from, the first memory region of the first host memory responsively to the found first memory region and the first remote direct memory access request, and the second network interface controller is to receive from a fourth parallel computing process of the first parallel computing job a second remote direct memory access request including the first symmetric memory key, find the second memory region based on the first symmetric memory key included in the second remote direct memory access request and the second registration, and write data to, or retrieve data from, the second memory region of the second host memory responsively to the found second memory region and the second remote direct memory access request.
Further in accordance with an embodiment of the present disclosure the first parallel computing process includes a first parallel computing model and a first network library, the first parallel computing model is to receive the first symmetric memory key, select the first memory region of the first host memory, and provide the first symmetric memory key and the identification of the first memory region to the first network library, the first network library is to provide the first symmetric memory key and the identification of the first memory region to the first network interface controller, the second parallel computing process includes a second parallel computing model and a second network library, the second parallel computing model is to receive the first symmetric memory key, select the second memory region of the second host memory, and provide the first symmetric memory key and the identification of the second memory region to the second network library, and the second network library is to provide the first symmetric memory key and the identification of the second memory region to the second network interface controller.
Still further in accordance with an embodiment of the present disclosure the key manager is to maintain a list of which of the symmetric memory keys are unassigned to active parallel computing jobs and which of the symmetric memory keys are assigned to the active parallel computing jobs, and assign one of the unassigned symmetric memory keys to the first parallel computing job.
Additionally in accordance with an embodiment of the present disclosure, the system includes a resource manager to establish the parallel computing processes of the first parallel computing job on the server nodes, request an unassigned symmetric memory key from the key manager for the first parallel computing job, receive the first symmetric memory key from the key manager for the first parallel computing job, and provide the first symmetric memory key to the parallel computing processes of the first parallel computing job running on the server nodes, wherein the key manager is to indicate in the list that the first symmetric memory key is assigned to the first parallel computing job.
Moreover, in accordance with an embodiment of the present disclosure the key manager is to assign the same symmetric memory key to two different parallel computing jobs which do not use one or more common ones of the network interface controllers.
Further in accordance with an embodiment of the present disclosure the key manager is to assign multiple symmetric memory keys to the first parallel computing job, wherein each of the server nodes is to cause registration of the multiple memory keys with corresponding multiple memory regions in a corresponding one of the network interface controllers.
There is also provided in accordance with another embodiment of the present disclosure, a server node device, including a network interface controller to send and receive packets over a network, and a host memory, and a processor to execute a first parallel computing process of a parallel computing job, receive a symmetric memory key from a key manager, cause registration of a host memory region of the host memory with the symmetric memory key in the network interface controller so that the host memory region is accessible via the network interface controller by different remote server nodes with the symmetric memory key using remote direct memory access, send a first remote direct memory access request including the symmetric memory key to access a host memory of a first remote server node executing a second parallel computing process of the parallel computing job with the symmetric memory key, and send a second remote direct memory access request including the symmetric memory key to access a host memory of a second remote server node executing a third parallel computing process of the parallel computing job with the symmetric memory key.
Still further in accordance with an embodiment of the present disclosure the processor is to select the memory region of the host memory, and provide the symmetric memory key and an identification of the memory region to the network interface controller, and the network interface controller is to receive the symmetric memory key and the identification of the memory region, and perform the registration of the memory region with the symmetric memory key.
Additionally in accordance with an embodiment of the present disclosure the first parallel computing process includes a first parallel computing model and a first network library.
Moreover, in accordance with an embodiment of the present disclosure the first parallel computing model is to receive the symmetric memory key, select the memory region of the host memory, and provide the symmetric memory key and the identification of the memory region to the first network library, and the first network library is to provide the symmetric memory key and the identification of the memory region to the network interface controller.
There is also provided in accordance with still another embodiment of the present disclosure, a parallel computing method, including assigning symmetric memory keys to parallel computing jobs including a first symmetric memory key to a first parallel computing job, executing parallel computing processes of the first parallel computing job, causing registration of host memory regions of server nodes with the assigned first symmetric memory key in corresponding network interface controllers of the server nodes so that different ones of the host memory regions are accessible with the first symmetric memory key by remote ones of the server nodes using remote direct memory access.
Further in accordance with an embodiment of the present disclosure, the method includes accessing, by a first server node, a host memory region of a second server node using remote direct memory access with the first symmetric memory key.
Still further in accordance with an embodiment of the present disclosure, the method includes by a first server node executing a first parallel computing process of the first parallel computing job, receiving the first symmetric memory key, selecting a first memory region of a first host memory of the first server node, and providing the first symmetric memory key and an identification of the first memory region to a first network interface controller of the first server node, by the first network interface controller receiving the first symmetric memory key and the identification of the first memory region, and performing a first registration of the first memory region with the first symmetric memory key, by a second server node executing a second parallel computing process of the first parallel computing job, receiving the first symmetric memory key, selecting a second memory region of a second host memory of the second server node, and providing the first symmetric memory key and an identification of the second memory region to a second network interface controller of the second server node, and by the second network interface controller receiving the first symmetric memory key and the identification of the second memory region, and performing a second registration of the second memory region with the first symmetric memory key.
Additionally in accordance with an embodiment of the present disclosure, the method includes receiving, by the first network interface controller, from a third parallel computing process of the first parallel computing job a first remote direct memory access request including the first symmetric memory key, and receiving, by the second network interface controller, from a fourth parallel computing process of the first parallel computing job a second remote direct memory access request including the first symmetric memory key.
Moreover in accordance with an embodiment of the present disclosure, the method includes by the first network interface controller receiving from a third parallel computing process of the first parallel computing job a first remote direct memory access request including the first symmetric memory key, finding the first memory region based on the first symmetric memory key included in the first remote direct memory access request and the first registration, and writing data to, or retrieving data from, the first memory region of the first host memory responsively to the found first memory region and the first remote direct memory access request, and by the second network interface controller receiving from a fourth parallel computing process of the first parallel computing job a second remote direct memory access request including the first symmetric memory key, finding the second memory region based on the first symmetric memory key included in the second remote direct memory access request and the second registration, and writing data to, or retrieving data from, the second memory region of the second host memory responsively to the found second memory region and the second remote direct memory access request.
Further in accordance with an embodiment of the present disclosure the first parallel computing process includes a first parallel computing model and a first network library, the second parallel computing process including a second parallel computing model and a second network library, the method further including by the first parallel computing model receiving the first symmetric memory key, selecting the first memory region of the first host memory, and providing the first symmetric memory key and the identification of the first memory region to the first network library, providing, by the first network library, the first symmetric memory key and the identification of the first memory region to the first network interface controller, by the second parallel computing model receiving the first symmetric memory key, selecting the second memory region of the second host memory, and providing the first symmetric memory key and the identification of the second memory region to the second network library, and providing, by the second network library, the first symmetric memory key and the identification of the second memory region to the second network interface controller.
Still further in accordance with an embodiment of the present disclosure, the method includes maintaining a list of which of the symmetric memory keys are unassigned to active parallel computing jobs and which of the symmetric memory keys are assigned to the active parallel computing jobs, and assigning one of the unassigned symmetric memory keys to the first parallel computing job.
Additionally in accordance with an embodiment of the present disclosure, the method includes establishing the parallel computing processes of the first parallel computing job on the server nodes, requesting an unassigned symmetric memory key from a key manager for the first parallel computing job, receiving the first symmetric memory key from the key manager for the first parallel computing job, providing the first symmetric memory key to the parallel computing processes of the first parallel computing job running on the server nodes, and indicating in the list that the first symmetric memory key is assigned to the first parallel computing job.
Moreover, in accordance with an embodiment of the present disclosure, the method includes assigning the same symmetric memory key to two different parallel computing jobs which do not use one or more common ones of the network interface controllers.
Further in accordance with an embodiment of the present disclosure, the method includes assigning multiple symmetric memory keys to the first parallel computing job, and causing registration of the multiple memory keys with corresponding multiple memory regions in a corresponding one of the network interface controllers.
There is also provided in accordance with still another embodiment of the present disclosure a parallel computing method, including sending and receiving packets over a network, executing a first parallel computing process of a parallel computing job, receiving a symmetric memory key from a key manager, causing registration of a host memory region of a host memory with the symmetric memory key in a network interface controller so that the host memory region is accessible via the network interface controller by different remote server nodes with the symmetric memory key using remote direct memory access, sending a first remote direct memory access request including the symmetric memory key to access a host memory of a first remote server node executing a second parallel computing process of the parallel computing job with the symmetric memory key, and sending a second remote direct memory access request including the symmetric memory key to access a host memory of a second remote server node executing a third parallel computing process of the parallel computing job with the symmetric memory key.
Still further in accordance with an embodiment of the present disclosure, the method includes selecting the memory region of the host memory, providing the symmetric memory key and an identification of the memory region to the network interface controller, receiving, by the network interface controller, the symmetric memory key, and the identification of the memory region, and performing the registration of the memory region with the symmetric memory key.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention will be understood from the following detailed description, taken in conjunction with the drawings in which:
FIG. 1 is a block diagram view of a parallel computing system constructed and operative in accordance with an embodiment of the present invention;
FIG. 2 is a flowchart including steps in a method performed by a resource manager in the system of FIG. 1 ;
FIG. 3 is a flowchart including steps in a method performed by a key manager in the system of FIG. 1 ;
FIG. 4 is a block diagram view of the system of FIG. 1 illustrating memory key registration;
FIG. 5 is a flowchart including steps in a method performed by a processor in a server node of the system of FIG. 1 to initiate memory key registration;
FIG. 6 is a block diagram view of a parallel computing process for use in the system of FIG. 1 ;
FIG. 7 is a flowchart including steps in a method performed by a network interface controller in a server node of the system of FIG. 1 performing memory key registration;
FIG. 8 is a block diagram view of the system of FIG. 1 illustrating two server nodes sending RDMA requests to two other server nodes;
FIG. 9 is a flowchart including steps in a method performed by a parallel computing process in the system of FIG. 1 to generate and send an RDMA request;
FIG. 10 is a flowchart including steps in a method performed by a network interface controller in the system of FIG. 1 processing a received RDMA request;
FIG. 11 is a block diagram view of the system of FIG. 1 showing a server node sending RDMA requests to different server nodes; and
FIG. 12 is a block diagram view of the system of FIG. 1 showing two different processes of different parallel computing jobs sending RDMA requests to the same server node.
DESCRIPTION OF EXAMPLE EMBODIMENTS Overview
For RDMA transfers, the underlying networks require HPC software, such as network libraries and programming model implementations, to register and pin the data buffers' memory. Upon registering and pinning, the information required to access this memory remotely is encapsulated in an RKey object.
A parallel application is a set of processes on different central processing unit (CPU) cores, either in the same or different nodes, working together to solve a single problem. The parallel application comprises Execution Units (EUs) that perform computation and initiate communications. The EUs are typically represented by operating system (OS) processes or threads. Although a job may include one or more parallel applications, it should be noted that for the sake of simplicity the terms "job", "parallel computing job", "parallel application" and "application" are used interchangeably herein assuming that a job includes a single parallel application. However, embodiments of the present invention may also be used with a job including multiple parallel applications.
For a typical parallel application with many Execution Units, often ranging from a few thousand to millions, each EU is required to hold the RKey for every memory region accessed with RDMA operations. In combination with the software meta-information associated with each RKey in existing implementations, this results in each process holding a large number of RKeys, causing significant memory overhead. RKey table memory consumption increases with the number of processes and RDMA-exposed memory regions in the parallel application, which, as mentioned earlier, can be on the order of millions and is typically proportional to the number of CPU cores used by clusters and supercomputers. The combined size of all RKey tables on a High-Performance Computing (HPC) compute node can grow to the order of hundreds of gigabytes. Moreover, as the RKey table is searched on every network (RDMA) operation, this causes cache thrashing, impacting not only the network operations but also the compute part of the application or job.
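The scaling described above can be illustrated with a back-of-envelope estimate. The numbers below (peer count, regions per peer, bytes per RKey entry) are hypothetical, chosen only to show the trend, not figures from this disclosure:

```python
# Illustrative estimate of per-process and per-node RKey table growth.
# All parameter values are hypothetical.

def rkey_table_bytes(num_peers: int, regions_per_peer: int,
                     bytes_per_entry: int = 64) -> int:
    """Memory one process needs to hold an RKey (plus associated
    meta-information) for every RDMA-exposed region of every peer."""
    return num_peers * regions_per_peer * bytes_per_entry

# One process addressing one million peers, one region each:
per_process = rkey_table_bytes(num_peers=1_000_000, regions_per_peer=1)
# roughly 64 MB for this single process

# With 128 processes per compute node, the combined tables on one node:
per_node = 128 * per_process
# roughly 8 GB on one node, before any additional memory regions
```

Even with these modest assumptions, the per-node total grows linearly in both process count and region count, which is the overhead the symmetric memory key removes.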
Therefore, embodiments of the present invention solve at least some of the above drawbacks by using a single memory key (termed “symmetric memory key” or “SM key”) for use by all the processes of a job to provide remote direct memory access to host memory instead of using RKeys which are unique to each registered host memory region for each process of the job. The SM key includes information for accessing a single memory region per process (e.g., EU) for all the processes (e.g., EUs) in the job. As a consequence, it is not required to build and maintain an RKey table.
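The contrast with per-region RKeys can be sketched minimally as follows. The class and variable names here are ours for illustration, not from the disclosure: each NIC keeps a single local mapping from the SM key to the one memory region registered for the job on that node, so a process carries just one key regardless of how many peers it addresses.

```python
# Minimal sketch: each node's NIC maps the job's SM key to the single
# local memory region registered for that job, replacing a per-process
# table holding one RKey entry per remote region.

class NicSketch:
    def __init__(self):
        self.sm_table = {}              # sm_key -> (base_addr, length)

    def register(self, sm_key, base_addr, length):
        self.sm_table[sm_key] = (base_addr, length)

    def resolve(self, sm_key):
        # A single lookup, independent of the number of peers in the job.
        return self.sm_table[sm_key]

SMK1 = 0x1A2B
nic1, nic2 = NicSketch(), NicSketch()
nic1.register(SMK1, base_addr=0x10000, length=4096)   # MR1 on node 1
nic2.register(SMK1, base_addr=0x80000, length=4096)   # MR2 on node 2

# The same key resolves to a different local region on each node.
mr1 = nic1.resolve(SMK1)
mr2 = nic2.resolve(SMK1)
```

Because each NIC resolves the key locally, no process needs to learn or store per-peer access information.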
It was mentioned previously that although in some cases a job includes multiple parallel applications, for the sake of simplicity the terms “job”, “parallel computing job”, “parallel application” and “application” are used interchangeably herein based on the assumption that a job generally includes a single parallel application. In some embodiments where a job includes multiple parallel applications, an SM Key is used for all the processes in each parallel application in that job.
In some embodiments, a resource manager requests a new SM key for a new job from a key manager. The resource manager may provide other functions such as establishing parallel computing processes of the new job on server nodes. The key manager tracks which SM keys have been assigned to jobs and provides an unassigned SM key to the new job. When a job completes processing, the SM key assigned to that job generally returns to the pool of unassigned SM keys.
In some embodiments, the same SM key may be assigned for use by different jobs if the different jobs are not accessing a common NIC. The same SM key cannot be used by different jobs if those jobs are accessing one or more of the same NICs. Therefore, in some embodiments, the key manager tracks the SM keys assigned to jobs as well as the NICs and/or server nodes participating in those jobs.
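The key manager's bookkeeping described above can be sketched as follows. This is a simplified illustration under our own naming (the disclosure does not prescribe these interfaces), and it reuses keys eagerly whenever the requesting job's NIC set is disjoint from those of every job already holding a key; a real manager might prefer fresh keys and reuse only when the pool runs low:

```python
# Hedged sketch of key-manager bookkeeping: track assigned SM keys and
# the NICs each job uses, allow two jobs to share a key only when their
# NIC sets are disjoint, and return keys to the pool on job completion.

class KeyManagerSketch:
    def __init__(self, key_pool):
        self.unassigned = set(key_pool)
        self.assigned = {}                  # sm_key -> {job_id: nic_set}

    def request_key(self, job_id, nic_set):
        nic_set = frozenset(nic_set)
        # Reuse an assigned key if no holder shares a NIC with this job.
        for key, jobs in self.assigned.items():
            if all(nic_set.isdisjoint(nics) for nics in jobs.values()):
                jobs[job_id] = nic_set
                return key
        key = self.unassigned.pop()
        self.assigned[key] = {job_id: nic_set}
        return key

    def release(self, job_id):
        # When a job completes, its key returns to the unassigned pool
        # once no remaining job holds it.
        for key, jobs in list(self.assigned.items()):
            jobs.pop(job_id, None)
            if not jobs:
                del self.assigned[key]
                self.unassigned.add(key)
```

For example, two jobs running on disjoint sets of NICs receive the same key, while a third job that overlaps either NIC set receives a different one.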
In some embodiments, the resource manager may request that more than one SM key from the key manager be assigned to a parallel application in order to provide more memory regions for use by that parallel application during execution. For example, a parallel application may be assigned SM Key 1A for registering with a first memory region and SM Key 1B for registering with a second memory region, and so on. In such a case, an RDMA request for the parallel application may use SM Key 1A or SM Key 1B.
In some embodiments, the resource manager may request one or more additional SM Keys while a job is already running, for example, if there is an additional memory need.
In some embodiments, the resource manager receives the new SM key from the key manager and distributes the received SM key to the processes of the job running on the different server nodes. In each server node, a process of the new job (e.g., an EU) running on that server selects a memory region in its local host memory and provides the new SM key and an identification (ID) of the selected memory region to the NIC of that server. The NIC then registers the new SM key with the selected memory region. In this manner, all the processes on all the servers participating in the new job can access each other's host memory using the SM key, as each local NIC knows the mapping between the SM key and the memory region in its local host memory assigned for use by the new job.
In some embodiments, the EU runs an HPC software stack, including a parallel programming model and a network library, which may cause the NIC to register the SM key with the selected memory region. For example, the parallel programming model may deliver the selected memory region and the new SM key to the network library, which programs the NIC with this information to register the SM key with the selected memory region in NIC memory.
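The registration path above can be sketched in miniature. All names here are ours for illustration; the disclosure does not define these interfaces: the programming-model side picks a local region, then the network library programs the NIC with the (key, region) pair.

```python
# Rough sketch of the registration path: programming model selects a
# local memory region; network library programs the NIC with the
# SM key and the region identification.

class Nic:
    def __init__(self):
        self.registrations = {}         # sm_key -> memory-region id

    def register(self, sm_key, region_id):
        self.registrations[sm_key] = region_id

def network_library_register(nic, sm_key, region_id):
    # The network library is the component that actually programs the NIC.
    nic.register(sm_key, region_id)

def process_setup(nic, sm_key, select_region):
    # Programming-model side: choose a region in local host memory,
    # then delegate NIC programming to the network library.
    region_id = select_region()
    network_library_register(nic, sm_key, region_id)
    return region_id

nic1 = Nic()
process_setup(nic1, sm_key=0x1A, select_region=lambda: "MR1")
```

After setup, the NIC holds the mapping needed to resolve incoming RDMA requests that carry this SM key.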
System Description
Reference is now made to FIG. 1 , which is a block diagram view of a parallel computing system 10 constructed and operative in accordance with an embodiment of the present invention.
The system 10 includes a key manager 12, a resource manager 14, and a plurality of server nodes 18. The key manager 12 and the resource manager 14 may be disposed in independent network nodes or servers, or may be disposed in the same network node 16 or server. Each server node 18 includes a processor 20 (e.g., central processing unit (CPU) and/or graphics processing unit (GPU) or Field Programmable Gate Array(s) (FPGA(s)) and/or accelerator(s)), host memory 22, and network interface controller (NIC) 24. The network interface controller 24 of each server node 18 provides network access for its server node 18, and is configured to send and receive packets over a network.
In practice, some or all of the functions of the processor 20 may be combined in a single physical component or, alternatively, implemented using multiple physical components. These physical components may comprise hard-wired or programmable devices, or a combination of the two. In some embodiments, at least some of the functions of the processor 20 may be carried out by a programmable processor under the control of suitable software. This software may be downloaded to a device in electronic form, over a network, for example. Alternatively, or additionally, the software may be stored in tangible, non-transitory computer-readable storage media, such as optical, magnetic, or electronic memory.
The processors 20 of the server nodes 18 execute parallel computing processes of one or more parallel computing jobs, e.g., parallel computing job 1. The key manager 12 is configured to assign symmetric memory keys (SMKs) 26 (e.g., upon receiving requests from the resource manager 14) to the parallel computing jobs including SMK1 to parallel computing job 1. Each processor 20 shown in FIG. 1 is shown executing a parallel computing process (PCP) 28. For example, the processor 20 of server node 1 is shown executing PCP1 and the processor 20 of server node 2 is shown executing PCP2.
As will be described in more detail below with reference to FIG. 4 , the server nodes 18 cause registration of respective host memory regions of the host memories 22 of the server nodes 18 with the assigned SMK1 in the corresponding network interface controllers 24 of the server nodes 18 so that different host memory regions are accessible with SMK1 by remote server nodes 18 using remote direct memory access. In other words, each server node 18 causes registration of its local host memory region with the assigned SMK1 in its network interface controller 24 so that its host memory region is accessible with SMK1 by remote server nodes 18 using remote direct memory access. For example, server node 1 is configured to access the host memory region of server node 2 using remote direct memory access with SMK1. Memory registration is described in more detail below.
Reference is now made to FIG. 2 , which is a flowchart 200 including steps in a method performed by the resource manager 14 in the system 10 of FIG. 1 . Reference is also made to FIG. 1 . The resource manager 14 is configured to: establish the parallel computing processes 28 of parallel computing job 1 on the server nodes 18 (block 202); request an unassigned symmetric memory key from the key manager 12 for parallel computing job 1 (block 204); receive the SMK1 from the key manager 12 for parallel computing job 1 (block 206); and provide (instances of) SMK1 to the parallel computing processes 28 of parallel computing job 1 running on the server nodes 18 (block 208).
Reference is now made to FIG. 3 , which is a flowchart 300 including steps in a method performed by the key manager 12 in the system 10 of FIG. 1 . Reference is also made to FIG. 1 . The key manager 12 is configured to: maintain a list 34 of which symmetric memory keys 26 are assigned to the active parallel computing jobs, which network interface controllers 24 are being used in those jobs (to prevent the same key from being assigned to different jobs using one or more common network interface controllers 24), and optionally which symmetric memory keys 26 are unassigned to active parallel computing jobs (block 302); assign one of the unassigned symmetric memory keys 26 (e.g., SMK1) to parallel computing job 1 in response to the request from the resource manager 14 (block 304); indicate in the list 34 that SMK1 is assigned to parallel computing job 1 (block 306); and in response to parallel computing job 1 being completed, update the list 34 to show that SMK1 is now unassigned to an active parallel computing job (block 308). An active parallel computing job is a parallel computing job which is being, or has been, set up for processing and has not yet completed processing.
In some embodiments, the key manager 12 may be configured to assign the same symmetric memory key to two or more different parallel computing jobs which do not use one or more common ones of the network interface controllers 24. In some embodiments, the key manager 12 may be configured to assign multiple symmetric memory keys to parallel computing job 1 (in response to a request by the resource manager 14). In such embodiments, each server node 18 is configured to cause registration of the multiple memory keys (assigned by the key manager 12 to parallel computing job 1) with corresponding multiple memory regions in a corresponding one of the network interface controllers 24.
To ensure the protection of the parallel computing job/application from intentional and unintentional SMK conflicts, the key manager 12 may encrypt and/or sign symmetric memory key(s) 26 to create a secure channel between the key manager 12 and the network interface controllers 24 of the server nodes 18. Therefore, in some embodiments, the network interface controller 24 includes one or more secret keys to decrypt and/or verify the symmetric memory key(s) 26.
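The secure-channel idea above can be illustrated with a message authentication code: the key manager signs each SM key with a secret shared with the NICs, and a NIC refuses to honor a key whose tag does not verify. This is only an assumption-laden sketch; real deployments might instead use encryption or asymmetric signatures, and the shared secret, function names, and key encoding below are all hypothetical.

```python
# Sketch of SM-key integrity protection between the key manager and a
# NIC using an HMAC over the key value. The shared secret is assumed to
# be provisioned out of band.

import hmac
import hashlib

SHARED_SECRET = b"provisioned-out-of-band"   # hypothetical secret

def sign_sm_key(sm_key: int) -> bytes:
    """Key-manager side: produce an authentication tag for an SM key."""
    return hmac.new(SHARED_SECRET, sm_key.to_bytes(8, "big"),
                    hashlib.sha256).digest()

def nic_verify(sm_key: int, tag: bytes) -> bool:
    """NIC side: accept the SM key only if the tag verifies."""
    expected = sign_sm_key(sm_key)
    # Constant-time comparison avoids leaking tag bytes via timing.
    return hmac.compare_digest(expected, tag)

tag = sign_sm_key(0x1A2B)
```

A genuine (key, tag) pair verifies, while a mismatched or forged key is rejected, which is the conflict protection the paragraph above describes.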
Reference is now made to FIGS. 4 and 5 . FIG. 4 is a block diagram view of the system 10 of FIG. 1 illustrating memory key registration. FIG. 5 is a flowchart 500 including steps in a method performed by the processor 20 in server node 1 of the system 10 of FIG. 1 to initiate memory key registration.
The processor 20 of server node 1 is configured to: execute parallel computing process 1 (PCP1) of parallel computing job 1 (block 502); receive SMK1 from the resource manager 14 (block 504); select a memory region 30 (MR1) of the host memory 22 of server node 1 (block 506); and cause registration (block 32) of MR1 with SMK1 in the network interface controller 24 of server node 1 so that MR1 is accessible via that network interface controller 24 by different remote server nodes 18 (i.e., server nodes other than server node 1 which are also part of parallel computing job 1) with SMK1 using remote direct memory access (RDMA), including providing SMK1 and an identification of MR1 to the network interface controller 24 of server node 1 (block 508). The above steps are also performed by each of the other server nodes 18 that receive SMK1 from the resource manager 14 except that in each server node 18 a different process is performed (e.g., PCP2 in server node 2) and a different host memory region 30 may be selected (e.g., MR2 in server node 2) and registered (block 32) by the network interface controller 24 of that server node 18 (e.g., MR2 is registered with SMK1 by the network interface controller 24 of server node 2).
Reference is now made to FIG. 6 , which is a block diagram view of an example of parallel computing process 1 (PCP1) for use in the system 10 of FIG. 1 . Reference is also made to FIG. 5 . Parallel computing process 1 (block 600) may include parallel computing model 1 (block 602) and network library 1 (block 604).
One or more of the steps of blocks 502-508 may be performed by parallel computing model 1 and one or more of the steps of blocks 502-508 may be performed by network library 1. For example, parallel computing model 1 may perform the steps of blocks 502-506 and provide SMK1 and the identification of MR1 to network library 1 which then performs the step of block 508.
Similarly, PCP2 may include parallel computing model 2 and network library 2, and so on. In a similar manner as described above with respect to PCP1, one or more of the steps of blocks 502-508 may be performed by parallel computing model 2 and one or more of the steps of blocks 502-508 may be performed by network library 2 for PCP2. For example, parallel computing model 2 may perform the steps of blocks 502-506 and provide SMK1 and the identification of MR2 to network library 2 which then performs the step of block 508.
Reference is now made to FIG. 7 , which is a flowchart 700 including steps in a method performed by the network interface controller 24 in server node 1 of the system 10 of FIG. 1 performing memory key registration (block 32). Reference is also made to FIG. 4 . The network interface controller 24 of server node 1 is configured to: receive SMK1 and the identification of MR1 (block 702); and perform a registration (block 32) of MR1 with SMK1 (block 704). The network interface controllers 24 of the other server nodes 18 perform similar operations. For example, the network interface controller 24 of server node 2 is configured to receive SMK1 and the identification of MR2, and perform a registration of MR2 with SMK1.
Reference is now made to FIGS. 8 and 9 . FIG. 8 is a block diagram view of the system 10 of FIG. 1 illustrating two server nodes (server nodes 3 and 4) sending RDMA requests 36-1, 36-2 to two other server nodes (server nodes 1 and 2, respectively). FIG. 9 is a flowchart 900 including steps in a method performed by a parallel computing process (e.g., PCP3) in the system 10 of FIG. 1 to generate and send the RDMA request 36-1. PCP3 is configured to find the symmetric memory key 26 (i.e., SMK1) for parallel computing job 1 of PCP3 in a memory (block 902), which may include a table of symmetric memory keys 26 for the parallel computing jobs being run by server node 3. PCP3 is configured to generate the RDMA request 36-1 with the found SM key (i.e., SMK1) (block 904), and send the generated RDMA request 36-1 to a remote server (e.g., server node 1), which is executing PCP1 of (the same) parallel computing job 1 (block 906). FIG. 8 also shows PCP4 running on server node 4 performing similar steps to generate the RDMA request 36-2 and send the RDMA request 36-2 to server node 2.
Reference is now made to FIG. 10 , which is a flowchart 1000 including steps in a method performed by the network interface controller 24 of server node 1 in the system 10 of FIG. 1 processing the received RDMA request 36-1. Reference is also made to FIG. 8 . The network interface controller 24 of server node 1 is configured to: receive from PCP3 of the parallel computing job 1 the RDMA request 36-1 including SMK1 (block 1002); find the memory region (i.e., MR1) in the host memory 22 of server node 1 corresponding to SMK1 based on SMK1 being included in the RDMA request 36-1 and the registration (block 32) previously performed mapping SMK1 to MR1 (block 1004); and write data to, or retrieve data from, MR1 of the host memory 22 of server node 1 responsively to the found memory region 30 (i.e., MR1) and the RDMA request 36-1, which is requesting performance of the RDMA process (block 1006).
Similarly, the network interface controller 24 of server node 2 is configured to: receive from PCP4 of the parallel computing job 1 the RDMA request 36-2 including SMK1; find MR2 in the host memory 22 of server node 2 based on SMK1 included in the RDMA request 36-2 and the registration (block 32) previously performed mapping SMK1 to MR2; and write data to, or retrieve data from, MR2 of the host memory 22 of server node 2 responsively to the found memory region 30 (i.e., MR2) and the RDMA request 36-2, which is requesting performance of the RDMA process.
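The request path of FIGS. 9 and 10 can be sketched under a simplified model of our own (the class and method names below are illustrative, not from the disclosure): the sender attaches the job's SM key to the request, and the receiving NIC uses its local registration to find which memory region the key maps to before performing the write.

```python
# Sketch of RDMA request handling at a target NIC: resolve the SM key
# carried in the request to the locally registered memory region, then
# perform the requested data transfer against that region.

class TargetNic:
    def __init__(self, host_memory):
        self.host_memory = host_memory   # region id -> bytearray
        self.registrations = {}          # sm_key -> region id

    def register(self, sm_key, region_id):
        self.registrations[sm_key] = region_id

    def handle_rdma_write(self, sm_key, offset, data):
        # Find the memory region via the SM key and the prior registration.
        region_id = self.registrations[sm_key]
        region = self.host_memory[region_id]
        region[offset:offset + len(data)] = data
        return region_id

# Server node 1: MR1 registered under the job's SM key.
SMK1 = 0x1A2B
nic1 = TargetNic({"MR1": bytearray(16)})
nic1.register(SMK1, "MR1")

# A remote process of the same job sends a write carrying only SMK1.
written_region = nic1.handle_rdma_write(SMK1, offset=0, data=b"hello")
```

A second NIC in the same job would resolve the same SM key to its own locally registered region, which is why the requester needs no per-target key.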
Reference is now made to FIG. 11 , which is a block diagram view of the system 10 of FIG. 1 showing server node 1 sending RDMA requests 36-3, 36-4 to different server nodes (server nodes 3 and 4, respectively). PCP1 running on the processor 20 of server node 1 is configured to generate and send RDMA request 36-3 including SMK1 to access the host memory 22 of remote server node 3 (which is executing PCP3 of parallel computing job 1) with SMK1. Similarly, PCP1 running on the processor 20 of server node 1 is configured to generate and send RDMA request 36-4 including SMK1 to access the host memory 22 of remote server node 4 (which is executing PCP4 of parallel computing job 1) with SMK1.
Reference is now made to FIG. 12 , which is a block diagram view of the system 10 of FIG. 1 showing two different processes 28 of different parallel computing jobs (PCP2 of job 1 and PCP4 of job 3) sending RDMA requests to the same server node (server node 1). PCP2 and PCP4 may be running on the same server node 18 or on different server nodes 18.
PCP2 of parallel computing job 1 running on a server node (which is remote to server node 1) is configured to generate and send an RDMA request 36-5 including SMK1 to access the host memory 22 of server node 1 (which is executing PCP1 of parallel computing job 1) with SMK1.
PCP4 of parallel computing job 3 is running on a server node (which is remote to server node 1) and is configured to generate and send an RDMA request 36-6 including SMK3 (as SMK3 is the SM key assigned by the key manager 12 to parallel computing job 3) to access the host memory 22 of server node 1 (which is executing PCP9 of parallel computing job 3) with SMK3.
The network interface controller 24 of server node 1 is configured to: receive from PCP2 of the parallel computing job 1 the RDMA request 36-5 including SMK1; find MR1 in the host memory 22 of server node 1 based on SMK1 being included in the RDMA request 36-5 and the registration (block 32) previously performed mapping SMK1 to MR1; and write data to, or retrieve data from, MR1 of the host memory 22 of server node 1 responsively to the found memory region 30 (i.e., MR1) and the RDMA request 36-5, which is requesting performance of the RDMA process.
The network interface controller 24 of server node 1 is configured to: receive from PCP4 of the parallel computing job 3 the RDMA request 36-6 including SMK3; find MR6 in the host memory 22 of server node 1 based on SMK3 being included in the RDMA request 36-6 and the registration (block 32) previously performed mapping SMK3 to MR6; and write data to, or retrieve data from, MR6 of the host memory 22 of server node 1 responsively to the found memory region 30 (i.e., MR6) and the RDMA request 36-6, which is requesting performance of the RDMA process.
Various features of the invention which are, for clarity, described in the contexts of separate embodiments may also be provided in combination in a single embodiment. Conversely, various features of the invention which are, for brevity, described in the context of a single embodiment may also be provided separately or in any suitable sub-combination.
The embodiments described above are cited by way of example, and the present invention is not limited by what has been particularly shown and described hereinabove. Rather the scope of the invention includes both combinations and sub-combinations of the various features described hereinabove, as well as variations and modifications thereof which would occur to persons skilled in the art upon reading the foregoing description and which are not disclosed in the prior art.

Claims (26)

What is claimed is:
1. A parallel computing system, comprising:
a plurality of server nodes including respective processors configured to:
execute parallel computing processes of a first parallel computing job, symmetric memory keys being assigned by a key manager to parallel computing jobs, including a first symmetric memory key assigned to the first parallel computing job; and
cause registration of host memory regions of the server nodes with the assigned first symmetric memory key in corresponding network interface controllers of the server nodes so that different ones of the host memory regions are accessible with the first symmetric memory key by remote ones of the server nodes using remote direct memory access, wherein:
the server nodes include: a first server node including a first host memory and a first network interface controller; and a second server node including a second processor, a second host memory, and a second network interface controller;
the first network interface controller and the second network interface controller are configured to provide network access to the first server node and the second server node, respectively;
the first network interface controller is configured to perform a first registration of a first memory region of the first host memory with the first symmetric memory key; and
the second network interface controller is configured to perform a second registration of a second memory region of the second host memory with the first symmetric memory key.
2. The system according to claim 1, wherein the first server node includes a first processor configured to access one of the host memory regions of the second server node using remote direct memory access with the first symmetric memory key.
3. The system according to claim 1, wherein:
the first server node includes a first processor configured to:
execute a first parallel computing process of the first parallel computing job;
receive the first symmetric memory key;
select the first memory region of the first host memory; and
provide the first symmetric memory key and an identification of the first memory region to the first network interface controller;
the first network interface controller is configured to receive the first symmetric memory key and the identification of the first memory region;
the second server node includes a second processor configured to:
execute a second parallel computing process of the first parallel computing job;
receive the first symmetric memory key;
select the second memory region of the second host memory; and
provide the first symmetric memory key and an identification of the second memory region to the second network interface controller; and
the second network interface controller is configured to receive the first symmetric memory key and the identification of the second memory region.
4. The system according to claim 3, wherein:
the first network interface controller is configured to receive from a third parallel computing process of the first parallel computing job a first remote direct memory access request including the first symmetric memory key; and
the second network interface controller is configured to receive from a fourth parallel computing process of the first parallel computing job a second remote direct memory access request including the first symmetric memory key.
5. The system according to claim 3, wherein:
the first network interface controller is configured to:
receive from a third parallel computing process of the first parallel computing job a first remote direct memory access request including the first symmetric memory key;
find the first memory region based on the first symmetric memory key included in the first remote direct memory access request and the first registration; and
write data to, or retrieve data from, the first memory region of the first host memory responsively to the found first memory region and the first remote direct memory access request; and
the second network interface controller is configured to:
receive from a fourth parallel computing process of the first parallel computing job a second remote direct memory access request including the first symmetric memory key;
find the second memory region based on the first symmetric memory key included in the second remote direct memory access request and the second registration; and
write data to, or retrieve data from, the second memory region of the second host memory responsively to the found second memory region and the second remote direct memory access request.
6. The system according to claim 3, wherein:
the first parallel computing process includes a first parallel computing model and a first network library;
the first processor is configured to execute the first parallel computing model to: receive the first symmetric memory key; select the first memory region of the first host memory; and provide the first symmetric memory key and the identification of the first memory region to the first network library;
the first processor is configured to execute the first network library to provide the first symmetric memory key and the identification of the first memory region to the first network interface controller;
the second parallel computing process includes a second parallel computing model and a second network library;
the second processor is configured to execute the second parallel computing model to: receive the first symmetric memory key; select the second memory region of the second host memory; and provide the first symmetric memory key and the identification of the second memory region to the second network library; and
the second processor is configured to execute the second network library to provide the first symmetric memory key and the identification of the second memory region to the second network interface controller.
7. The system according to claim 1, further comprising the key manager, the key manager being configured to: maintain a list of which of the symmetric memory keys are unassigned to active parallel computing jobs and which of the symmetric memory keys are assigned to the active parallel computing jobs; and assign one of the unassigned symmetric memory keys to the first parallel computing job.
8. The system according to claim 7, further comprising a resource manager configured to:
establish the parallel computing processes of the first parallel computing job on the server nodes;
request an unassigned symmetric memory key from the key manager for the first parallel computing job;
receive the first symmetric memory key from the key manager for the first parallel computing job; and
provide the first symmetric memory key to the parallel computing processes of the first parallel computing job running on the server nodes, wherein the key manager is configured to indicate in the list that the first symmetric memory key is assigned to the first parallel computing job.
9. The system according to claim 1, wherein the key manager is configured to assign the same symmetric memory key to two different parallel computing jobs which do not use one or more common ones of the network interface controllers.
10. The system according to claim 1, wherein the key manager is configured to assign multiple symmetric memory keys to the first parallel computing job, wherein each of the server nodes is to cause registration of the multiple symmetric memory keys with corresponding multiple memory regions in a corresponding one of the network interface controllers.
11. A server node device, comprising:
a network interface controller to send and receive packets over a network;
a host memory; and
a processor to:
execute a first parallel computing process of a parallel computing job;
receive a symmetric memory key from a key manager;
cause registration of a host memory region of the host memory with the symmetric memory key in the network interface controller so that the host memory region is accessible via the network interface controller by different remote server nodes with the symmetric memory key using remote direct memory access;
send a first remote direct memory access request including the symmetric memory key to access a host memory of a first remote server node executing a second parallel computing process of the parallel computing job with the symmetric memory key; and
send a second remote direct memory access request including the symmetric memory key to access a host memory of a second remote server node executing a third parallel computing process of the parallel computing job with the symmetric memory key.
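The per-node flow of claim 11 (register one local region under the job-wide key, then target any peer's memory with that same key) can be sketched as a simulation. `SimulatedNic` and `RdmaRequest` are hypothetical stand-ins; a real deployment would go through a verbs-style NIC interface rather than a Python dictionary.

```python
from dataclasses import dataclass

# Hypothetical, simulated stand-ins for the NIC and the RDMA request of the
# claim above; the key carried in each request is the one symmetric key.

@dataclass(frozen=True)
class RdmaRequest:
    op: str            # "read" or "write"
    key: int           # symmetric memory key carried in the request
    offset: int
    length: int

class SimulatedNic:
    def __init__(self):
        self.registry = {}                   # symmetric key -> (base, length)

    def register(self, key, base, length):
        # Registration makes the local region reachable under the key.
        self.registry[key] = (base, length)

def run_process(nic, key, region_base, region_len, peers):
    # 1. Register the local host memory region under the job-wide key.
    nic.register(key, region_base, region_len)
    # 2. Issue one RDMA request per remote peer, all carrying the same key.
    return [RdmaRequest("write", key, 0, 64) for _ in peers]

reqs = run_process(SimulatedNic(), key=7, region_base=0x1000,
                   region_len=4096, peers=["node1", "node2"])
```

Note that the requests to the two peers are identical apart from their destination: no per-peer key exchange is needed.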
12. The device according to claim 11, wherein:
the processor is to:
select the memory region of the host memory; and
provide the symmetric memory key and an identification of the memory region to the network interface controller; and
the network interface controller is to:
receive the symmetric memory key and the identification of the memory region; and
perform the registration of the memory region with the symmetric memory key.
13. The device according to claim 12, wherein the first parallel computing process includes a first parallel computing model and a first network library.
14. The device according to claim 13, wherein the processor is configured to execute:
the first parallel computing model to:
receive the symmetric memory key;
select the memory region of the host memory; and
provide the symmetric memory key and the identification of the memory region to the first network library; and
the first network library to provide the symmetric memory key and the identification of the memory region to the network interface controller.
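The layering in claims 13 and 14 (parallel computing model selects the region, network library forwards key and region identification, NIC performs the registration) can be shown as a minimal pass-through sketch. The class names and the `"symmetric_heap_0"` region identifier are illustrative assumptions, not terms from the patent.

```python
# Hypothetical sketch of the claim's layering: model -> network library -> NIC.

class Nic:
    def __init__(self):
        self.registrations = {}              # symmetric key -> region id

    def register(self, key, region_id):
        # The NIC performs the actual registration of the region.
        self.registrations[key] = region_id

class NetworkLibrary:
    def __init__(self, nic):
        self.nic = nic

    def register_memory(self, key, region_id):
        # Pass-through: forward the key and region identification to the NIC.
        self.nic.register(key, region_id)

class ParallelComputingModel:
    def __init__(self, library):
        self.library = library

    def on_key_received(self, key):
        region_id = self.select_region()     # the model chooses the region
        self.library.register_memory(key, region_id)

    def select_region(self):
        return "symmetric_heap_0"            # placeholder region identifier

nic = Nic()
model = ParallelComputingModel(NetworkLibrary(nic))
model.on_key_received(42)
```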
15. A parallel computing method, comprising:
assigning symmetric memory keys to parallel computing jobs including a first symmetric memory key to a first parallel computing job;
executing parallel computing processes of the first parallel computing job;
causing registration of host memory regions of server nodes with the assigned first symmetric memory key in corresponding network interface controllers of the server nodes so that different ones of the host memory regions are accessible with the first symmetric memory key by remote ones of the server nodes using remote direct memory access;
providing network access by a first network interface controller to a first server node;
providing network access by a second network interface controller to a second server node;
performing, by the first network interface controller, a first registration of a first memory region of a first host memory of the first server node with the first symmetric memory key; and
performing, by the second network interface controller, a second registration of a second memory region of a second host memory of the second server node with the first symmetric memory key.
16. The method according to claim 15, further comprising accessing, by a first server node, a host memory region of a second server node using remote direct memory access with the first symmetric memory key.
17. The method according to claim 15, further comprising:
by a first server node:
executing a first parallel computing process of the first parallel computing job;
receiving the first symmetric memory key;
selecting the first memory region of a first host memory of the first server node; and
providing the first symmetric memory key and an identification of the first memory region to a first network interface controller of the first server node;
by the first network interface controller receiving the first symmetric memory key and the identification of the first memory region;
by a second server node:
executing a second parallel computing process of the first parallel computing job;
receiving the first symmetric memory key;
selecting the second memory region of a second host memory of the second server node; and
providing the first symmetric memory key and an identification of the second memory region to a second network interface controller of the second server node; and
by the second network interface controller receiving the first symmetric memory key and the identification of the second memory region.
18. The method according to claim 17, further comprising:
receiving, by the first network interface controller, from a third parallel computing process of the first parallel computing job a first remote direct memory access request including the first symmetric memory key; and
receiving, by the second network interface controller, from a fourth parallel computing process of the first parallel computing job a second remote direct memory access request including the first symmetric memory key.
19. The method according to claim 17, further comprising:
by the first network interface controller:
receiving from a third parallel computing process of the first parallel computing job a first remote direct memory access request including the first symmetric memory key;
finding the first memory region based on the first symmetric memory key included in the first remote direct memory access request and the first registration; and
writing data to, or retrieving data from, the first memory region of the first host memory responsively to the found first memory region and the first remote direct memory access request; and
by the second network interface controller:
receiving from a fourth parallel computing process of the first parallel computing job a second remote direct memory access request including the first symmetric memory key;
finding the second memory region based on the first symmetric memory key included in the second remote direct memory access request and the second registration; and
writing data to, or retrieving data from, the second memory region of the second host memory responsively to the found second memory region and the second remote direct memory access request.
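The NIC-side handling in claim 19 (find the registered region from the symmetric key carried in the incoming request, then write to or retrieve from it) can be sketched as follows. `NicEndpoint`, the byte-array host memory, and the bounds check are all illustrative assumptions about one way such a lookup could behave.

```python
# Hypothetical sketch: the registration maps the symmetric key to a local
# region, and an incoming RDMA request is resolved against that mapping.

class NicEndpoint:
    def __init__(self, host_memory_size=4096):
        self.host_memory = bytearray(host_memory_size)
        self.regions = {}                    # symmetric key -> (base, length)

    def register(self, key, base, length):
        self.regions[key] = (base, length)

    def handle_request(self, op, key, offset, data=None, length=0):
        # Find the region via the key included in the request.
        base, region_len = self.regions[key]
        if offset + max(length, len(data or b"")) > region_len:
            raise ValueError("access outside registered region")
        if op == "write":
            self.host_memory[base + offset: base + offset + len(data)] = data
            return None
        return bytes(self.host_memory[base + offset: base + offset + length])

nic_a = NicEndpoint()
nic_a.register(key=7, base=256, length=1024)
nic_a.handle_request("write", key=7, offset=0, data=b"hello")
```

Two endpoints holding the same key would resolve it to different local regions, which is the point of per-NIC registration.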
20. The method according to claim 17, wherein the first parallel computing process includes a first parallel computing model and a first network library, the second parallel computing process including a second parallel computing model and a second network library, the method further comprising:
by the first parallel computing model: receiving the first symmetric memory key; selecting the first memory region of the first host memory; and providing the first symmetric memory key and the identification of the first memory region to the first network library;
providing, by the first network library, the first symmetric memory key and the identification of the first memory region to the first network interface controller;
by the second parallel computing model: receiving the first symmetric memory key; selecting the second memory region of the second host memory; and providing the first symmetric memory key and the identification of the second memory region to the second network library; and
providing, by the second network library, the first symmetric memory key and the identification of the second memory region to the second network interface controller.
21. The method according to claim 17, further comprising:
maintaining a list of which of the symmetric memory keys are unassigned to active parallel computing jobs and which of the symmetric memory keys are assigned to the active parallel computing jobs; and
assigning one of the unassigned symmetric memory keys to the first parallel computing job.
22. The method according to claim 21, further comprising:
establishing the parallel computing processes of the first parallel computing job on the server nodes;
requesting an unassigned symmetric memory key from a key manager for the first parallel computing job;
receiving the first symmetric memory key from the key manager for the first parallel computing job;
providing the first symmetric memory key to the parallel computing processes of the first parallel computing job running on the server nodes; and
indicating in the list that the first symmetric memory key is assigned to the first parallel computing job.
23. The method according to claim 15, further comprising assigning the same symmetric memory key to two different parallel computing jobs which do not use one or more common ones of the network interface controllers.
24. The method according to claim 15, further comprising:
assigning multiple symmetric memory keys to the first parallel computing job; and
causing registration of the multiple symmetric memory keys with corresponding multiple memory regions in a corresponding one of the network interface controllers.
25. A parallel computing method, comprising:
sending and receiving packets over a network;
executing a first parallel computing process of a parallel computing job;
receiving a symmetric memory key from a key manager;
causing registration of a host memory region of a host memory with the symmetric memory key in a network interface controller so that the host memory region is accessible via the network interface controller by different remote server nodes with the symmetric memory key using remote direct memory access;
sending a first remote direct memory access request including the symmetric memory key to access a host memory of a first remote server node executing a second parallel computing process of the parallel computing job with the symmetric memory key; and
sending a second remote direct memory access request including the symmetric memory key to access a host memory of a second remote server node executing a third parallel computing process of the parallel computing job with the symmetric memory key.
26. The method according to claim 25, further comprising:
selecting the memory region of the host memory;
providing the symmetric memory key and an identification of the memory region to the network interface controller;
receiving, by the network interface controller, the symmetric memory key and the identification of the memory region; and
performing the registration of the memory region with the symmetric memory key.
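Taken together, the claims describe one job-wide key under which every node registers its own local region, after which any node can target any peer with that same key. A compact end-to-end simulation (hypothetical `Node` class and `rdma_write` helper, not a real RDMA API) of that idea:

```python
# End-to-end illustration (simulated): every node registers its own buffer
# under the one job-wide symmetric key, and any node can target any peer's
# buffer using that same key.

class Node:
    def __init__(self, name):
        self.name = name
        self.memory = {}                     # symmetric key -> local buffer

    def register(self, key, size):
        self.memory[key] = bytearray(size)

    def rdma_write(self, peer, key, offset, data):
        # The request carries only the shared key, not a per-peer key.
        buf = peer.memory[key]
        buf[offset:offset + len(data)] = data

JOB_KEY = 7                                  # one key for the whole job
nodes = [Node("n0"), Node("n1"), Node("n2")]
for n in nodes:
    n.register(JOB_KEY, 64)                  # same key on every node

nodes[0].rdma_write(nodes[1], JOB_KEY, 0, b"from-n0")
nodes[2].rdma_write(nodes[0], JOB_KEY, 8, b"from-n2")
```

With a per-peer key scheme each node would instead have to learn N-1 distinct keys; the common key removes that exchange entirely.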
US18/176,521 2023-03-01 2023-03-01 Common symmetric memory key for parallel processes Active 2043-10-20 US12335372B2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US18/176,521 US12335372B2 (en) 2023-03-01 2023-03-01 Common symmetric memory key for parallel processes
DE102024201855.4A DE102024201855A1 (en) 2023-03-01 2024-02-28 SHARED SYMMETRICAL MEMORY KEY FOR PARALLEL PROCESSES
CN202410230287.0A CN118590220A (en) 2023-03-01 2024-02-29 Common symmetric memory keys for parallel processing


Publications (2)

Publication Number Publication Date
US20240297781A1 US20240297781A1 (en) 2024-09-05
US12335372B2 true US12335372B2 (en) 2025-06-17

Family

ID=92422369


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20240250815A1 (en) * 2023-01-23 2024-07-25 Hewlett Packard Enterprise Development Lp Scalable key state for network encryption

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130185381A1 (en) * 2012-01-17 2013-07-18 International Business Machines Corporation Configuring Compute Nodes In A Parallel Computer Using Remote Direct Memory Access ('RDMA')
US20160239430A1 (en) * 2015-02-12 2016-08-18 Red Hat Israel, Ltd. Local access dma with shared memory pool
US20190384923A1 (en) * 2018-06-13 2019-12-19 International Business Machines Corporation Mechanism to enable secure memory sharing between enclaves and i/o adapters
US20200242258A1 (en) * 2019-04-11 2020-07-30 Intel Corporation Protected data accesses using remote copy operations
US11792003B2 (en) * 2020-09-29 2023-10-17 Vmware, Inc. Distributed storage system and method of reusing symmetric keys for encrypted message transmissions


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Diamond, Noah, Scott Graham, and Gilbert Clark. "Securing InfiniBand traffic with BlueField-2 data processing units." International Conference on Critical Infrastructure Protection. Cham: Springer Nature Switzerland. (Year: 2022). *
Schuh, Henry N., et al. "Xenic: SmartNIC-accelerated distributed transactions." Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles. (Year: 2021). *
Taranov, Konstantin, et al. "sRDMA—Efficient NIC-based Authentication and Encryption for Remote Direct Memory Access." 2020 USENIX Annual Technical Conference (USENIX ATC 20). (Year: 2020). *



Legal Events

Date Code Title Description
AS Assignment

Owner name: MELLANOX TECHNOLOGIES, LTD., ISRAEL

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:VENKATA, MANJUNATH GORENTLA;POLYAKOV, ARTEM YURIEVICH;BHATTACHARYA, SUBHADEEP;AND OTHERS;SIGNING DATES FROM 20230201 TO 20230221;REEL/FRAME:062837/0981

FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STCF Information on status: patent grant

Free format text: PATENTED CASE