WO2022037777A1 - Interchangeable queue type for network connections - Google Patents

Interchangeable queue type for network connections

Info

Publication number
WO2022037777A1
Authority
WO
WIPO (PCT)
Prior art keywords
type
network connection
network
request
connection
Prior art date
Application number
PCT/EP2020/073259
Other languages
French (fr)
Inventor
Ben-Shahar BELKAR
Lior Khermosh
Reuven Cohen
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. filed Critical Huawei Technologies Co., Ltd.
Priority to PCT/EP2020/073259 priority Critical patent/WO2022037777A1/en
Priority to CN202080103258.8A priority patent/CN115885270A/en
Publication of WO2022037777A1 publication Critical patent/WO2022037777A1/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/173 Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17306 Intercommunication techniques
    • G06F15/17331 Distributed shared memory [DSM], e.g. remote direct memory access [RDMA]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/50 Queue scheduling

Definitions

  • the present disclosure in some embodiments thereof, relates to network connections and, more specifically, but not exclusively, to systems and methods for management of resources for establishing network connections.
  • a network node for example a server, may establish and simultaneously support thousands of network connections to other network nodes, such as storage servers, endpoint devices, and other servers in order to provide exchange of data across the network.
  • the large number of simultaneous network connections consumes a significant amount of resources.
  • processing circuitry for selecting a type of a queue pair, QP, for data transfer across a network
  • the processing circuitry is configured for: receiving a request to establish a network connection for data transfer across the network, analyzing a plurality of resource-related parameters each indicative of a respective resource related state of the network connection for establishment, selecting the type for the QP from a plurality of candidate types according to the analysis, and establishing the network connection with the selected type of the QP for data transfer across the network.
  • a method for selecting a type of a queue pair, QP, for data transfer across a network comprises: receiving a request to establish a network connection for data transfer across the network, analyzing a plurality of resource-related parameters each indicative of a respective resource related state of the network connection for establishment, selecting the type for the QP from a plurality of candidate types according to the analysis, and establishing the network connection with the selected type of the QP for data transfer across the network.
  • a computer program comprising program instructions which, when executed by a processor, cause the processor to: receive a request to establish a network connection for data transfer across the network, analyze a plurality of resource-related parameters each indicative of a respective resource related state of the network connection for establishment, select the type for the QP from a plurality of candidate types according to the analysis, and establish the network connection with the selected type of the QP for data transfer across the network.
  • the requested QP type for a new connection does not necessarily represent the best use of the existing resources of the host and/or device and/or network, for example, memory, hardware resources, processing resources, cache, and network resources.
  • RC QPs may be the best solution from a transport perspective, but RC QPs are very expensive in terms of available resources.
  • the type of the QP that provides the best overall (e.g., globally for the device, network and/or host) utilization of available resources may be selected, rather than the QP type that is best for the requesting application.
  • Optimization of overall resources is especially significant for high end servers that establish thousands of QPs for thousands of network connections, for example, 10,000, 100,000, or 1,000,000.
  • the dynamic selection of the QP type for establishing connections provides improved utilization of resources over existing approaches where each application is granted whichever QP type it requested. For example, consider N network nodes, each running M processes that establish network connections. When all M processes on a node communicate with all the processes on all the other nodes, the number of RC QPs needed for this “all-to-all” communication is $M^2 \cdot (N-1)$ per node.
  • RD has a lower footprint: each node needs M QPs + N “end-to-end” (EE) connections to achieve the same all-to-all communication pattern. But RD is limited on transport. Its most significant limitation is the single outstanding message supported per EE context.
  • DC, which is a proprietary solution, has a lower footprint than RC, but if it uses many DCIs/DCTs it still consumes a lot of resources and is not efficient for caching. Another drawback of DC is that it must use frequent connect-disconnect operations.
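The footprint comparison above can be checked with a few lines of arithmetic. This is a minimal sketch that reads the RC expression as M squared times (N - 1) and the RD cost as M QPs plus N EE contexts, as stated in the text; the function names and the example cluster size are hypothetical and chosen only for illustration.

```python
def rc_qps_per_node(num_nodes: int, procs_per_node: int) -> int:
    """RC QPs one node needs for all-to-all communication: M^2 * (N - 1)."""
    return procs_per_node ** 2 * (num_nodes - 1)


def rd_resources_per_node(num_nodes: int, procs_per_node: int) -> int:
    """RD resources one node needs: M QPs plus N end-to-end (EE) contexts."""
    return procs_per_node + num_nodes


if __name__ == "__main__":
    # Hypothetical cluster: 64 nodes, 32 processes per node.
    n, m = 64, 32
    print("RC QPs per node:", rc_qps_per_node(n, m))
    print("RD QPs + EE contexts per node:", rd_resources_per_node(n, m))
```

For the assumed cluster, the RC count grows quadratically in the number of processes per node, while the RD count grows only linearly, which is the gap the text refers to.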
  • first, second and third aspects further configured for and/or further comprising: re-analyzing the plurality of resource-related parameters during data transfer over the established network connection, re-selecting another type of the QP from the plurality of candidate types according to the re-analysis, establishing a second network connection with the re-selected other type of the QP, and at least one of: dynamically transferring the network connection to the second network connection, and reinitiating the transfer of data of the network connection to be transferred over the second network connection.
  • the QP types of the existing connection may be dynamically changed during transfer of data over the network using the established network connection, to provide a different QP type that is expected to improve utilization of resources.
  • the network connection comprises a first network connection
  • receiving a second request to establish a second network connection for data transfer across the network, conducting a second analysis of the plurality of resource-related parameters each indicative of a respective resource related state of the established first network connection and for establishment of the second network connection, selecting a second type of the QP from the plurality of candidate types according to the second analysis, establishing the second network connection with the second type of QP, dynamically migrating the first network connection to the second network connection, wherein the second network connection transfers data across the network for the first network connection and for the second network connection.
  • Two network connections to transfer two sets of data over the network may be merged into a single network connection.
  • the merged second network connection may use fewer resources than the two independent network connections.
  • the request is provided by a first application
  • the second request is provided by a second application
  • the second network connection transfers data over the network for the first application and the second application.
  • Two independent network connections to transfer two sets of data over the network requested by two different applications may be merged into a single network connection that is used by both applications.
  • the merged second network connection may use fewer resources than the two independent network connections.
  • dynamically migrating comprises: stalling network traffic on the first network connection, receiving acknowledge messages that packets transmitted over the first network connection prior to the stalling have been received by a device at another end of the first network connection, and using the second type of QP of the second network connection for additional network traffic destined for the first network connection.
  • the migration may be performed transparently, without significant interruption.
  • first, second and third aspects further configured for and/or further comprising: in response to receiving an indication that at least one first packet of the additional network traffic has passed over the second network connection, terminating the first network connection.
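The migration steps above (stall, wait for acknowledgements, switch, then terminate the first connection) can be sketched as a small state machine. This is an illustrative model only, not the patented implementation: the class, state, and method names are hypothetical, and packets are represented by plain identifiers rather than real RDMA work requests.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class MigrationState(Enum):
    ACTIVE = auto()      # traffic flows on the first connection
    STALLED = auto()     # new sends are held until outstanding packets are acked
    SWITCHED = auto()    # new traffic uses the second connection's QP
    TERMINATED = auto()  # first connection has been torn down


@dataclass
class Migration:
    state: MigrationState = MigrationState.ACTIVE
    unacked: set = field(default_factory=set)          # sent on first connection
    held: list = field(default_factory=list)           # queued while stalled
    sent_on_second: list = field(default_factory=list)

    def send(self, packet_id):
        if self.state is MigrationState.ACTIVE:
            self.unacked.add(packet_id)
        elif self.state is MigrationState.STALLED:
            self.held.append(packet_id)
        else:
            self.sent_on_second.append(packet_id)

    def stall(self):
        """Stop sending on the first connection and wait for acknowledgements."""
        self.state = MigrationState.STALLED
        self._switch_if_drained()

    def on_ack(self, packet_id):
        self.unacked.discard(packet_id)
        self._switch_if_drained()

    def _switch_if_drained(self):
        if self.state is MigrationState.STALLED and not self.unacked:
            self.state = MigrationState.SWITCHED
            self.sent_on_second.extend(self.held)  # flush held traffic
            self.held.clear()

    def on_second_delivered(self):
        """Called once a packet is known to have passed over the second connection."""
        if self.state is MigrationState.SWITCHED:
            self.state = MigrationState.TERMINATED
```

In this sketch, traffic submitted while the first connection is stalled is buffered and then flushed to the second connection, so ordering across the switch is preserved for the held packets.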
  • first, second and third aspects further configured for and/or further comprising: receiving a third request to establish a third network connection for data transfer across the network, conducting a third analysis of the plurality of resource-related parameters each indicative of a respective resource related state of the established network connection and for establishment of the third network connection, selecting a third type of the QP from the plurality of candidate types according to the third analysis, wherein a fourth network connection with the third type of QP has been previously established by the processing circuitry for a fourth request prior to receiving the third request, and using the fourth network connection for transfer of data across the network associated with the third request and with the fourth request, wherein the third network connection is not independently established.
  • the network traffic destined for a newly established network connection may be added to the existing network connection. Resources may be saved by using the already existing network connection, rather than adding another network connection.
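The reuse of an already established connection for a later request can be sketched as a lookup table. This is a minimal illustration under stated assumptions: connections are keyed by remote node and selected QP type (the text does not specify the key), and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SharedConnection:
    remote: str
    qp_type: str
    users: list = field(default_factory=list)  # requests multiplexed onto this connection


class ConnectionTable:
    """Hypothetical registry of established connections."""

    def __init__(self):
        self._by_key = {}

    def attach(self, request_id, remote, selected_qp_type):
        """Attach a request to a matching connection, creating one only if none exists.

        Returns (connection, created), where created is False when an
        existing connection was reused instead of establishing a new one.
        """
        key = (remote, selected_qp_type)
        conn = self._by_key.get(key)
        created = conn is None
        if created:
            conn = SharedConnection(remote, selected_qp_type)
            self._by_key[key] = conn
        conn.users.append(request_id)
        return conn, created
```

A later request whose selected QP type matches an existing connection to the same remote node is attached to that connection, so no additional QP is allocated for it.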
  • the request to establish the network connection is for a first type of QP, and wherein a second type of QP different than the first type is selected.
  • the selection process may be performed in an implicit mode, where the requested QP type is ignored and the QP type providing the best utilization of available resources is selected without the requesting application and/or process being aware that the requested QP type has been changed.
  • the request to establish the network connection is for a reliable type of QP or for an unreliable type of QP of a first sub-type of QP
  • the selected type of QP is the reliable type of QP or the unreliable type of QP as defined by the request of a second sub-type of QP different than the first sub-type of QP defined by the request.
  • Maintaining the reliability or unreliability of the QP types according to the requested QP type for the network connection maintains compatibility with the applications and/or processes that use the data transferred over the network. For example, an application that expects reliable transfer of data receives reliably transferred data, and does not need to handle unexpected unreliably transferred data.
  • the network connection is for the reliable type of QP
  • the selected type of QP is of the unreliable type of QP with reliability provided by a reliability layer implemented by at least one of or combination of: software, firmware, and hardware.
  • the requested reliability may be provided using the unreliable QP type with the reliability layer, which may improve resource utilization by using the unreliable QP type which uses fewer resources than the reliable QP type.
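The constraint that the selected type keeps the requested reliability can be sketched as a candidate filter. This is an illustrative sketch: the grouping uses only the types whose names in this document state their reliability (DC is left out because the text does not say), and the function name and the `reliability_layer_available` flag are hypothetical.

```python
# Grouping by the reliability stated in each type's name in this document.
RELIABLE = {"RC", "RD", "XRC", "SRD"}
UNRELIABLE = {"UC", "UD"}


def allowed_candidates(requested, reliability_layer_available=False):
    """Return (qp_type, needs_reliability_layer) pairs that preserve the
    reliability semantics of the requested QP type."""
    if requested in RELIABLE:
        out = [(t, False) for t in sorted(RELIABLE)]
        if reliability_layer_available:
            # An unreliable QP may serve a reliable request only when a
            # software/firmware/hardware reliability layer sits on top of it.
            out += [(t, True) for t in sorted(UNRELIABLE)]
        return out
    if requested in UNRELIABLE:
        return [(t, False) for t in sorted(UNRELIABLE)]
    raise ValueError(f"unknown QP type: {requested}")
```

The selector would then pick among the returned candidates according to the resource analysis, so an application that asked for reliable transfer never receives unreliably transferred data.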
  • the request to establish the network connection is for an interchangeable QP type, IQT, and wherein the type of the QP is selected from the plurality of candidate types that exclude the dynamic type.
  • The highest and/or best resource utilization efficiency may result when processes and/or applications use the dynamic QP type to indicate that any of the candidate types may be selected for the QP.
  • the plurality of resource-related parameters are selected from a group consisting of: number of network nodes an application that provided the request is communicating with, number of existing network connections of the application that provided the request, topology of existing network connection of the application, total number of active QPs of network connections that were established by the processing circuitry, whether QPs of the type of the request are available, transport reliability of the network, current memory utilization, and current utilization of the processing circuitry.
  • the parameters provide a picture of the existing resource usage, which enables selecting the QP type for the network connection that will make the best use of the existing resources.
  • the plurality of candidate types for the QP are selected from a group consisting of: Reliable Connection (RC), Reliable Datagram (RD), Extended Reliable Connection (XRC), Unreliable Datagram (UD), Unreliable Connection (UC), Scalable Reliable Datagram (SRD), and Dynamic Connection (DC).
  • the analyzing is done using at least one of: a set of rules for the plurality of resource-related parameters that result in the selected type for the QP, a classifier that receives the plurality of resource-related parameters as input and generates an outcome of the selected type for the QP where the classifier is trained on a training dataset of plurality of sample resource-related parameters and a label of type of QP, and optimization code that uses a mathematical model and/or set of equations to compute the type for the QP that optimizes the plurality of resource-related parameters.
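Of the three analysis options above, the rule-set variant is the simplest to illustrate. The following is a minimal sketch, not the method claimed in the document: the field names mirror the listed resource-related parameters, while the rules themselves and every threshold (100,000 active QPs, 90% memory utilization) are hypothetical values chosen only to make the example concrete.

```python
from dataclasses import dataclass


@dataclass
class ResourceParams:
    peer_nodes: int                 # nodes the requesting application talks to
    app_connections: int            # existing connections of the application
    total_active_qps: int           # active QPs established by the circuitry
    requested_type_available: bool  # whether QPs of the requested type are free
    network_reliable: bool          # transport reliability of the network
    memory_utilization: float       # 0.0 .. 1.0
    cpu_utilization: float          # 0.0 .. 1.0


# Ordered (predicate, selected type) rules; the first match wins.
# All thresholds are illustrative assumptions.
RULES = [
    (lambda req, p: req == "RC" and p.total_active_qps > 100_000, "RD"),
    (lambda req, p: req == "RC" and p.memory_utilization > 0.9, "RD"),
    (lambda req, p: req == "UC" and not p.requested_type_available, "UD"),
]


def select_qp_type(requested: str, params: ResourceParams) -> str:
    """Return the QP type chosen by the first matching rule, else the requested type."""
    for predicate, chosen in RULES:
        if predicate(requested, params):
            return chosen
    return requested
```

A trained classifier or an optimization model could replace `select_qp_type` behind the same signature, taking the same parameters as input and returning a QP type.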
  • the data transfer across the network is according to a Remote Direct Memory Access, RDMA, protocol, and the plurality of candidate types of QP are defined by a network transport protocol for RDMA.
  • RDMA Remote Direct Memory Access
  • The viability of RDMA relies heavily on its reliability, high bandwidth, and low-latency properties. Therefore, selection of the QP type plays an important role in determining the balance between reliability, high bandwidth, and low-latency of RDMA, especially for a large number of network connections. At least some embodiments described herein select the optimal QP types for RDMA to maximize low-latency while meeting required reliability and/or lower memory footprint and/or utilize device cache in an optimal way (e.g., prevent a large amount of cache misses), especially when a large number of connections are established.
  • the network transport protocol for RDMA defining the plurality of candidate types of QP is selected from a group consisting of: InfiniBand (IB), RDMA over Converged Ethernet (RoCE), RoCEv2, iWARP, and derivatives of the aforementioned.
  • FIG. 1 is a block diagram of a computing device for selecting a QP type for data transfer across a network, in accordance with some embodiments.
  • FIG. 2 is a block diagram of multiple computing devices transferring data across a network using the selected QP type, in accordance with some embodiments.
  • FIG. 3 is a flowchart of a method for selecting a QP type for data transfer across a network, in accordance with some embodiments.
  • FIGs. 4A-4F are schematics depicting exemplary QP types for selection, in accordance with some embodiments.
  • FIG. 5 is a schematic depicting a comparison of dynamically selecting the QP type with standard approaches, in accordance with some embodiments.
  • FIG. 6 is another schematic depicting a comparison of dynamically selecting the QP type with standard approaches, in accordance with some embodiments.
  • FIG. 7 is yet another schematic depicting a comparison of dynamically selecting the QP type with standard approaches, in accordance with some embodiments.
  • the present disclosure in some embodiments thereof, relates to network connections and, more specifically, but not exclusively, to systems and methods for management of resources for establishing network connections.
  • An aspect of some embodiments relates to processing circuitry, systems, methods, an apparatus, and/or code instructions (i.e., stored on a computer readable medium for execution by one or more hardware processors) for automated selection of a type of a queue pair (QP), also referred to as QP type, for transfer of data across a network between two devices, for example, based on a Remote Direct Memory Access (RDMA) protocol, where the QP types are defined by a transport protocol, for example, InfiniBand™, RDMA over Converged Ethernet (RoCE), RoCEv2, iWARP, proprietary QP types (e.g., of different vendors), and derivatives of the aforementioned.
  • QP queue pair
  • the selection of the QP type is in response to a requested establishment of a connection for transfer of the data across the network, for example, by an application uploading and/or downloading data to a remote data storage device.
  • One or more resource-related parameters are analyzed.
  • Each resource-related parameter is indicative of a respective resource related state of the connection for establishment, for example, based on the available resources for establishment of the network connection, for example, available memory, available QPs of a requested type, utilization state of processor(s), state of network (e.g., noisy or not).
  • the QP type is selected from multiple candidate QP types according to the analysis.
  • the QP type is selected for optimizing the available resources for establishment of the connection, for example, in comparison to the requested QP type and/or in comparison to the candidate QP types.
  • the initially requested QP type may be changed to the selected QP type.
  • the connection is established with the selected QP type for transfer of data across the network.
  • the QP type used for the connection may be dynamically re-selected while the connection is active.
  • Another QP type may be selected from the candidate QP types according to a re-analysis of the current state of the resource-related parameters.
  • Another connection may be newly established using the dynamically re-selected QP type.
  • the existing connection may be migrated to the newly established connection with the re-selected QP type.
  • the QP type is for establishment of another connection, where one or more other connections have been previously established.
  • the connection that is requested may be joined to an existing connection using a selected QP type, for example, by multiplexing the two (or more) streams of data for transfer across the existing connection.
  • a new connection is established using a QP type selected based on an analysis of the resource-related parameters. One or more previously established connections are migrated to the new connection.
  • the requested QP type for a new connection does not necessarily represent the best use of the existing resources of the host and/or device and/or network, for example, memory, hardware resources, processing resources, cache, and network resources.
  • RC QPs may be the best solution from a transport perspective, but RC QPs are very expensive in terms of available resources.
  • the type of the QP that provides the best overall (e.g., globally for the device, network and/or host) utilization of available resources may be selected, rather than the QP type that is best for the requesting application.
  • Optimization of overall resources is especially significant for high end servers that establish thousands of QPs for thousands of network connections, for example, 10,000, 100,000, or 1,000,000.
  • the dynamic selection of the QP type for establishing connections provides improved utilization of resources over existing approaches where each application is granted whichever QP type it requested. For example, consider N network nodes, each running M processes that establish network connections. When all M processes on a node communicate with all the processes on all the other nodes, the number of RC QPs needed for this “all-to-all” communication is $M^2 \cdot (N-1)$ per node.
  • RD has a lower footprint: each node needs M QPs + N “end-to-end” (EE) connections to achieve the same all-to-all communication pattern. But RD is limited on transport. Its most significant limitation is the single outstanding message supported per EE context.
  • DC, which is a proprietary solution, has a lower footprint than RC, but if it uses many DCIs/DCTs it still consumes a lot of resources and is not efficient for caching. Another drawback of DC is that it must use frequent connect-disconnect operations.
  • the data transfer described herein between different computing devices is based on the Remote Direct Memory Access (RDMA) protocol, which is the primary method of transport for remote memory operations.
  • the viability of RDMA relies heavily on its reliability, high bandwidth and low-latency properties. Therefore, selection of the QP type plays an important role in determining the balance between reliability, high bandwidth, and low-latency of RDMA, especially for a large number of network connections.
  • At least some embodiments described herein select the optimal QP types for RDMA to maximize low-latency while meeting required reliability and/or lower memory footprint and/or utilize device cache in an optimal way (e.g., prevent a large amount of cache misses), especially when a large number of connections are established.
  • the different QP types are available.
  • the different QP types may be represented as a combination of a first sub-type, and a second sub-type.
  • the first sub-type is Reliable or Unreliable.
  • the Reliable first sub-type provides a guarantee that messages are delivered at most once, mostly in order and without corruption.
  • the Unreliable first sub-type does not provide any guarantee that the messages will be delivered or about the order of the packets.
  • every packet has a cyclic redundancy check (CRC) and corrupted packets are dropped (for any transport type).
  • CRC cyclic redundancy check
  • the Reliability of a QP transport type refers to the whole message reliability.
  • the second sub-type is Connected or Unconnected.
  • the Connected second sub-type refers to one send/receive QP being associated with exactly one other QP.
  • the Unconnected second sub-type refers to one send/receive QP being associated with multiple other QPs.
  • Exemplary QP types are defined by the InfiniBand™ specification, including Reliable Connection (RC), Reliable Datagram (RD), Extended Reliable Connection (XRC), Unreliable Datagram (UD), and Unreliable Connection (UC). Additional QP types have been developed, for example, Scalable Reliable Datagram (SRD), and Dynamic Connection (DC) also known as Dynamically Connected Transport (DCT).
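The decomposition of a QP type into a reliability sub-type and a connection sub-type can be written down as a small lookup table. This is an illustrative sketch that covers only the four basic types (RC, UC, RD, UD), treating a datagram type as Unconnected; the enum and function names are hypothetical, and the extended types (XRC, SRD, DC) are deliberately left out because the text does not spell out both of their sub-types.

```python
from enum import Enum
from typing import NamedTuple


class Reliability(Enum):
    RELIABLE = "reliable"
    UNRELIABLE = "unreliable"


class Association(Enum):
    CONNECTED = "connected"      # one QP associated with exactly one other QP
    UNCONNECTED = "unconnected"  # one QP associated with multiple other QPs


class QPSubTypes(NamedTuple):
    reliability: Reliability
    association: Association


QP_TYPES = {
    "RC": QPSubTypes(Reliability.RELIABLE, Association.CONNECTED),
    "UC": QPSubTypes(Reliability.UNRELIABLE, Association.CONNECTED),
    "RD": QPSubTypes(Reliability.RELIABLE, Association.UNCONNECTED),
    "UD": QPSubTypes(Reliability.UNRELIABLE, Association.UNCONNECTED),
}


def same_reliability(a: str, b: str) -> bool:
    """True when two QP types share the reliability sub-type."""
    return QP_TYPES[a].reliability is QP_TYPES[b].reliability
```

A table like this makes it straightforward to ask whether swapping one type for another changes only the connection sub-type, for example RC to RD, or also the reliability that the application sees.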
  • the present disclosure may be a system, a method, and/or a computer program product.
  • the computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present disclosure.
  • the computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device.
  • the computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.
  • the computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
  • LAN local area network
  • WAN wide area network
  • electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present disclosure.
  • FPGA field-programmable gate arrays
  • PLA programmable logic arrays
  • each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
  • the functions noted in the block may occur out of the order noted in the figures.
  • two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • FIG. 1 is a block diagram of a computing device 104 for selecting a QP type for data transfer across a network 120, in accordance with some embodiments.
  • FIG. 2 is a block diagram of multiple computing devices 104 transferring data across network 120 using the selected QP type, in accordance with some embodiments.
  • FIG. 3 is a flowchart of a method for selecting a QP type for data transfer across a network, in accordance with some embodiments.
  • Computing device 104 may implement the acts of the method described with reference to FIG.
  • processor(s) 102A of a computing device 104 executing code instructions (e.g., code 150) stored in a memory 106A, by processor(s) 102A of computing device 104 implemented in hardware to perform the instructions defined by code 150, by processor(s) 102B of a network interface device (e.g., network interface card) 114 executing code instructions (e.g., code 150) stored in a memory 106B, and/or processor(s) 102B implemented in hardware to perform the instructions defined by code 150.
  • Queue 106B stores the QP of the selected type, as described herein.
  • QP 106B may be stored, for example, by memory 106A of computing device 104 and/or memory 106B of network interface device 114.
  • Computing device 104 may act as a network node, and may sometimes be referred to herein as a network node.
  • computing device 104 communicates with one or multiple other instances of the computing device 104 (e.g., another network node) over a network 120, by establishing the network connection using the QP of the dynamically selected type, as described herein.
  • Computing device 104 may be implemented as, for example, a server, a client, an initiator network node that initiates the establishment of the network connection, and/or a target network node that receives the request from the initiator to establish the network connection.
  • Computing device 104 may be implemented as, for example, one or more of: a computing cloud, a single computing device (e.g., client terminal), a group of computing devices arranged in parallel, a network server, a local server, a remote server, a client terminal, a mobile device, a stationary device, a kiosk, a smartphone, a laptop, a tablet computer, a wearable computing device, a glasses computing device, a watch computing device, and a desktop computer.
  • Processor(s) 102 may be implemented as, for example, central processing unit(s) (CPU), graphics processing unit(s) (GPU), field programmable gate array(s) (FPGA), digital signal processor(s) (DSP), application specific integrated circuit(s) (ASIC), customized circuit(s), processors for interfacing with other units, and/or specialized hardware accelerators.
  • Processor(s) 102 may be implemented as a single processor, a multi-core processor, and/or a cluster of processors arranged for parallel processing (which may include homogenous and/or heterogeneous processor architectures).
  • Memory 106A stores code instructions implementable by processor(s) 102A, and/or memory 106B stores code instructions implementable by processor(s) 102B of network interface device 114.
  • Memory 106A-B is implemented as, for example, a random access memory (RAM), read-only memory (ROM), and/or a storage device, for example, non-volatile memory, magnetic media, semiconductor memory devices, hard drive, removable storage, and optical media (e.g., DVD, CD-ROM).
  • RAM random access memory
  • ROM read-only memory
  • Memory 106A may store virtual machine manager (VMM) 108 that manages and/or runs one or more virtual machines (VM) 110.
  • VMM 108 may be implemented as a hypervisor.
  • VMM 108 may be implemented in hardware, software, firmware, and/or a combination of the aforementioned.
  • Each VM 110 executes one or more virtual function (VF) drivers 112.
  • VF virtual function
  • Computing device 104 includes and/or is in communication with one or more network interface devices 114, optionally network interface cards and/or network adapters.
  • Network interface device 114 may include processor(s) 102B and memory 106B. Additional features of the methods described herein may be implemented by computing device 104 (e.g., processor(s) 102A executing code 105 stored in memory 106A) and/or by network interface device 114 (e.g., processor(s) 102B executing code 105 stored in memory 106B).
  • Computing device 104 may include and/or be in communication with one or more data storage devices 118.
  • Data storage devices 118 may store, for example, the candidate QP types that may be selected. It is noted that code instructions may be selectively loaded from data storage device 118 into memory 106 for execution by processor(s) 102.
  • Data storage device(s) 118 may be implemented as, for example, a memory, a local hard-drive, a removable storage unit, an optical disk, a storage device, and/or as a remote server and/or computing cloud (e.g., accessed via a network connection).
  • Computing device 104 may be in communication with network 120 via network interface device 114. Network 120 may be, for example, the internet, a local area network, a virtual network, a wireless network, a cellular network, a local bus, a point to point link (e.g., wired), and/or combinations of the aforementioned.
  • Network interface device 114 may be associated with one or more physical function (PF) driver(s) 116.
  • Network interface device 114 may be virtualized, for use by multiple VMs 110 via corresponding executed VF drivers 112.
  • different VMs 110 may access network 120 by VF driver(s) 112 used to access PF driver(s) 116 of the network interface.
  • Computing device 104 may include and/or be in communication with one or more physical user interfaces 122 that include a mechanism for user interaction, for example, to enter data (e.g., define set of rules for selection of the QP types) and/or to view data (e.g., view available resources and/or established connections).
  • Exemplary physical user interfaces 122 include, for example, one or more of, a touchscreen, a display, gesture activation devices, a keyboard, a mouse, voice activated software using speakers and microphone, and an orchestrator sending data over a network interface.
  • multiple instances of computing device 104 which include at least processors 102A and/or 102B, transfer data to each other over network 120 with network connections using the selected QP type 106B.
  • Various implementations of the multiple instance of computing device 104 are described below, for example, transfer of data between two instances of computing device 104 (e.g., one computing device is an initiator/local and the other is a target/remote), all-to-all communication where each computing device 104 communicates with all other computing devices 104, and single to multiple where one computing device 104 communicates with multiple other computing devices 104.
  • a request to establish a network connection for data transfer across the network is received.
  • the request may be initiated, for example, by an application running on a local computing device.
  • the request may be initiated from a remote device (e.g., client, such as an initiating device) for establishment of the network connection for communication with a local device (e.g., server, such as a target device).
  • the request may be in response to, and/or may include, devices notifying each other that they support the capability of dynamic selection of QP type, as described herein.
  • a handshake is performed between the remote and local device (e.g., between the target and initiator device) to ensure that both devices support the dynamic selection of QP type, as described herein.
  • the capability for dynamic selection of the QP type may be enabled and/or managed, for example, using a discovery capability process and/or negotiating protocol.
  • the request is received, for example, by one or more of: by the processing circuitry running the application, by the network interface card that established the network connection for connecting to the network, and/or by a hypervisor hosting the application.
  • the request may be to use a QP type defined by the Application for establishment of the network connection.
  • the Application may select the QP type that is best for its own needs, without consideration of the available resources.
  • the requesting Application may be unaware of the actual QP that is selected when the selected QP is different than the requested QP type.
  • the request defines a dynamic QP type for establishment of the network connection.
  • the dynamic QP type indicates that the Application is aware that any candidate QP type may be selected, as described herein.
  • Each resource-related parameter is indicative of a respective resource related state of the network connection for establishment.
  • Each resource-related parameter may be indicative of the real time state of one or more resources used to establish the network connection, and/or used to process the data transmitted over the established network connection.
  • Resource-related parameters may indicate the current resource availability state of one or more components for example, processor(s), memory, available QPs of the requested type, remotely connected nodes, and the network.
  • the parameters provide a picture of the existing resource usage, which enables selecting the QP type for the network connection that makes the best use of the existing resources.
  • Exemplary resource-related parameters include: number of network nodes the Application that provided the request is communicating with, number of existing network connections of the Application that provided the request, topology of existing network connection of the Application, total number of active QPs of network connections that were previously established, whether remaining QPs of the type of the request are available for allocations, transport reliability of the network, current memory utilization, and current utilization of the processing circuitry.
  • the analysis may be performed using one or more methods. For example, using a set of rules for the resource-related parameters that result in the selected type for the QP.
  • the set of rules may be manually and/or automatically defined.
  • the set of rules may be based on a prediction that following the set of rules may improve utilization of the available resources, and/or optimize the available resources.
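As an illustration of the rule-based analysis, the following is a minimal sketch, not the implementation described herein. The `ResourceParams` fields, the thresholds `max_peers` and `max_memory`, the `"DYNAMIC"` placeholder, and the choice of RD and UD as fallbacks are all assumptions made for the example. The sketch keeps the selected type in the same reliability group as the requested type.

```python
from dataclasses import dataclass

# Hypothetical grouping of a subset of the candidate QP types by reliability.
RELIABLE = {"RC", "RD", "XRC", "DC"}
UNRELIABLE = {"UC", "UD"}
DYNAMIC = "DYNAMIC"  # placeholder for a request that names no specific QP type


@dataclass
class ResourceParams:
    peer_nodes: int            # nodes the requesting application communicates with
    free_qps: int              # remaining QPs available for allocation
    memory_utilization: float  # current memory utilization, 0.0 to 1.0


def select_qp_type(requested, params, max_peers=8, max_memory=0.8):
    """Apply an illustrative rule set; thresholds are assumed, not prescribed."""
    if requested != DYNAMIC and requested not in RELIABLE | UNRELIABLE:
        raise ValueError(f"unknown QP type: {requested}")
    under_pressure = (
        params.free_qps == 0
        or params.peer_nodes > max_peers
        or params.memory_utilization > max_memory
    )
    if requested == DYNAMIC:
        # Any candidate may be chosen; prefer a datagram type under pressure.
        return "RD" if under_pressure else "RC"
    if not under_pressure:
        return requested
    # Stay within the reliability group of the requested type.
    return "RD" if requested in RELIABLE else "UD"
```

For example, a request for RC with no free QPs yields RD under these assumed rules, while the same request with ample resources is honored as RC.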
  • the analysis is performed using a trained resource classifier that receives the resource-related parameters as input and generates an outcome of the selected type for the QP.
  • the resource classifier is trained on a training dataset of sample resource-related parameters and a label of a type of QP.
  • optimization code uses a mathematical model and/or set of equations to compute the type for the QP that optimizes the resource-related parameters and/or that simulates the resource outcomes for different QP types.
  • the QP type is selected from multiple candidate QP types according to the analysis.
  • the QP type may be defined for the RDMA and/or transport protocol.
  • the QP types may be based on published definitions, and/or based on custom created QP types.
  • the QPs are created according to the selected type.
  • the type of QP is specified when the QP is created.
  • Exemplary QP types include: Reliable Connection (RC), Reliable Datagram (RD), Extended Reliable Connection (XRC), Unreliable Datagram (UD), Unreliable Connection (UC), Scalable Reliable Datagram (SRD), and Dynamic Connection (DC), which are discussed below in additional detail.
  • the selected QP type is within a same group as the requested QP type.
  • the group may be a reliable group of reliable QP types, or an unreliable group of unreliable QP types.
  • the selected QP type may be of a different sub-type than the requested QP type. Maintaining the reliability or unreliability of the QP types according to the requested QP type for the network connection maintains compatibility with the applications and/or processes that use the data transferred over the network. For example, an application that expects reliable transfer of data receives reliably transferred data, and does not need to handle unexpected unreliably transferred data.
  • the network connection is for the reliable type of QP
  • the selected type of QP is of the unreliable type of QP with reliability provided by a reliability layer implemented by one of, or combination of: software, firmware, and hardware.
  • the requested reliability may be provided using the unreliable QP type with the reliability layer, which may improve resource utilization by using the unreliable QP type which uses fewer resources than the reliable QP type.
  • the request to establish the network connection may be for one type of QP, and another QP type that is different than the requested QP type is selected.
  • the selection process may be performed in an implicit mode, where the request is ignored and the QP type providing best optimization of available resources is selected without the requesting Application and/or process being aware that the requested QP type has been changed.
  • the request to establish the network connection may be for an interchangeable QP type (IQT) that is dynamically selected.
  • the dynamic QP type is not an actual QP type, but an indication for dynamic selection of the QP type, without specifying which specific QP type is requested for establishment of the network connection.
  • the dynamic QP type indicates that the Application will handle any of the QP types that are actually selected from the candidate types (which exclude the dynamic type). Highest and/or best resource utilization efficiency may result when processes and/or applications use the dynamic QP type to indicate that any of the candidate types may be selected for the QP.
  • Each QP type may include a pair of queues, a send queue (SQ) and a receive queue (RQ).
  • Message transfer requests may be posted to the SQ, for example, by the Application sending the data across the network.
  • SQ Logic transmits an outbound message transfer request to a remote QP’s RQ Logic, i.e., only untagged operations - SEND opcodes, since read and write messages do not pass through the RQ.
  • Selection of the QP types may reduce the total number of QPs and/or reduce the total number of network connections and/or improve utilization of processing resources, as described herein.
  • the network connection is established with the selected type of the QP for data transfer across the network.
  • the QP contexts of the two QPs may be each programmed with the identity of the remote QP as well as the address of the port behind which the remote QP lives.
  • the port is constant, and there are other fields to distinguish the application, for example:
  • RoCE RD - RDETH and DETH has EEC and source QP
  • the following is an exemplary process for establishing the network connection using the selected QP type, between a client which requests establishment of the network connection with a service in a remote network node, and a server which hosts the service provided to the client.
  • the client sends to the server a REQ message with an indication (e.g., ServiceID) of the service to establish the network connection with.
  • the server verifies that the service exists, and then creates a local QP and/or an EEC and a QP, and sends back info in a message about the created QP and/or EEC to the requesting client.
  • the new local QP and/or EEC is in the ready to receive (RTR) state, i.e., ready to receive messages, but it cannot send messages until the client’s QP’s setup is complete.
  • the client receives the message, and uses the info in the message to complete the setup of the local QP and/or EEC, and transition its local QP and/or EEC to the ready to send (RTS) state.
  • the client sends a message to the server to transition its QP and/or EEC to the RTS state.
  • the server transitions its local QP and/or EEC to RTS state.
  • the server’s local QP and/or EEC will automatically transition from the RTR to the RTS state (even when the message from the client has not been received) when the first packet sent by the remote QP and/or EEC’s Send Logic is received.
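The exchange above can be sketched as a small state machine. This is a simplified model under stated assumptions: the class and method names (`Server.on_req`, `Server.on_ready`, `Server.on_first_packet`, `Client.connect`), the `INIT` state, and the QP number 17 are hypothetical, and message passing is modeled as direct method calls rather than network messages.

```python
from enum import Enum


class QPState(Enum):
    INIT = "INIT"
    RTR = "RTR"  # ready to receive
    RTS = "RTS"  # ready to send


class Server:
    def __init__(self, services):
        self.services = set(services)
        self.state = QPState.INIT

    def on_req(self, service_id):
        # Verify that the service exists, then create the local QP in RTR.
        if service_id not in self.services:
            raise LookupError(f"no such service: {service_id}")
        self.state = QPState.RTR
        return {"qp_num": 17}  # info about the created QP (value is made up)

    def on_ready(self):
        # Message from the client asking the server to move to RTS.
        self.state = QPState.RTS

    def on_first_packet(self):
        # A first data packet also moves RTR to RTS, even when the
        # client's message has not been received.
        if self.state is QPState.RTR:
            self.state = QPState.RTS


class Client:
    def __init__(self):
        self.state = QPState.INIT
        self.remote_qp = None

    def connect(self, server, service_id):
        reply = server.on_req(service_id)  # REQ, answered with QP info
        self.remote_qp = reply["qp_num"]   # complete local QP setup
        self.state = QPState.RTS
        server.on_ready()                  # tell the server to go to RTS
```

After `Client().connect(server, "storage")` both ends are in the RTS state; calling `on_first_packet()` on a server that is still in RTR has the same effect on the server side.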
  • the iterations may be performed for monitoring the resource-related parameters, to detect significant changes that trigger a re-analysis of the resource-related parameters, for example, increased noise in the network, new connections to new client terminals, reduction in available memory and/or reduction in processor capacity. The iterations may be performed without being necessarily triggered by the monitoring.
  • the resource-related parameters may be re-analyzed, for example, at defined time intervals (e.g., every minute, every 5 minutes, every 30 minutes, every hour, and other time intervals), and/or while data is transferred over the network and/or when the network connection is active but no data is being transferred, and/or every time a new network connection is being brought up.
  • the QP types of the existing connection may be dynamically changed during transfer of data over the network using the established network connection, to provide a different QP type that is expected to improve utilization of resources.
  • the resource-related parameters may be re-analyzed while the established network connection is active, optionally during data transfer over the established network connection and/or when no data transfer is taking place.
  • another QP type is re-selected from the candidate types according to the re-analysis.
  • another new network connection may be established with the re-selected other type of the QP.
  • the previously established network connection may be dynamically migrated to the new network connection, and/or the transfer of data over the previously established network connection may be reinstated to be transferred over the new network connection.
  • one or more features described with reference to 302-308 are iterated.
  • the iterations may be performed for processing of new requests to establish new network connections, for example, by the same Application, and/or by a different Application.
  • data traffic on the previously established network connection may be migrated to the new network connection, and multiplexed with other data traffic designated for the new network connection.
  • the previously established network connection may be terminated once the data traffic has been migrated to the new network connection.
  • data designated for the new network connection is migrated to the previously established network connection, and multiplexed with other data traffic being transferred over the previously established network connection.
  • the new network connection (as in 308) is not necessarily established.
  • data traffic on the previously established network connection may be migrated to the new network connection, in response to an iteration of 302-308.
  • Two network connections to transfer two sets of data over the network may be merged into a single network connection.
  • the merged second network connection may use fewer resources than the two independent network connections.
  • an additional request to establish another network connection for data transfer across the network is received, from the same application that issued the previous request, and/or from a different application.
  • an additional analysis of the resource-related parameters is performed.
  • the additional analysis of the resource-related parameters may be performed while the previously established network connection is active, optionally transferring data.
  • the additional analysis may represent the resource impact of the previously established network connection and/or for establishment of the additional network connection, optionally considering the resource impact of a combination of the previously established and requested network connections.
  • another QP type is selected according to the additional analysis.
  • the additional network connection with the newly selected type of QP is established.
  • the previously established network connection is migrated to the newly established (i.e., additional) network connection.
  • the newly established (i.e., additional) network connection transfers data across the network for the previous network connection and for the newly established network connection, for example, by multiplexing the two data streams.
  • Two independent network connections to transfer two sets of data over the network requested by two different applications may be merged into a single network connection that is used by both applications.
  • the merged second network connection may use fewer resources than the two independent network connections.
  • the previously established network connection may have been established in response to a request from a first Application.
  • the new request may be provided by a second Application, which is different from the first Application.
  • the newly established network connection transfers data over the network for the first application and the second application.
  • the dynamic migration from the previously established network connection to the newly established network connection is performed by an exemplary process that stalls the current network traffic over the previously established network connection, for performing the migration transparently, without significant interruption, and/or without the Application associated with the current network traffic being aware of the migration.
  • the dynamic migration may be performed using the following exemplary process: network traffic on the previously established network connection is stalled. Acknowledge messages are received. The acknowledge messages indicate that packets transmitted over the previously established network connection prior to the stalling have been received by a device at another end of the previously established network connection. The acknowledge messages indicate that the previously established network connection is “empty”, i.e., no packets are currently traversing the previously established network connection. Additional network traffic destined for the previously established network connection is redirected to the additional (i.e., newly established) network connection, that uses the additional (i.e., newly) selected QP type.
  • the previously established network connection may be terminated when the data traffic destined for the previously established network connection has been redirected to the additional (i.e., newly established) network connection.
  • the previously established network connection may be terminated in response to receiving an indication that at least one first packet of the re-routed network traffic has passed over the additional network connection, reaching the device at the other end of the additional network connection. Resources tied up by the previously established network connection are freed by the termination of the previously established network connection.
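The stall, drain, redirect, and terminate sequence can be sketched as follows. This is an in-memory simulation under stated assumptions: the `Connection` class, its fields, and the `migrate` function are hypothetical names, packet sequence numbers are simple list indices, and the arrival of the first redirected packet is simulated by acknowledging it immediately.

```python
class Connection:
    def __init__(self, qp_type):
        self.qp_type = qp_type
        self.stalled = False
        self.open = True
        self.sent = []          # payloads handed to this connection
        self.in_flight = set()  # sequence numbers not yet acknowledged

    def send(self, payload):
        if self.stalled or not self.open:
            raise RuntimeError("connection is stalled or closed")
        psn = len(self.sent)
        self.sent.append(payload)
        self.in_flight.add(psn)
        return psn

    def ack(self, psn):
        self.in_flight.discard(psn)


def migrate(old, new, acks, queued):
    """Move traffic from `old` to `new`; returns sequence numbers used on `new`."""
    old.stalled = True                 # 1. stall traffic on the old connection
    for psn in acks:                   # 2. process acknowledge messages
        old.ack(psn)
    if old.in_flight:                  # the old connection must be "empty"
        raise RuntimeError("old connection not drained yet")
    redirected = [new.send(p) for p in queued]  # 3. redirect further traffic
    if redirected:
        new.ack(redirected[0])         # simulated: first packet reached the peer
        old.open = False               # 4. terminate the old connection
    return redirected
```

In this sketch the old connection is closed only after the first redirected packet is acknowledged, which mirrors the termination condition described above.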
  • data traffic destined for a new network connection may be initialized on the previously established network connection, in response to an iteration of 302-308.
  • the network traffic destined for a newly established network connection may be added to the existing network connection. Resources may be saved by using the already existing network connection, rather than adding another network connection.
  • an additional request to establish another network connection for data transfer across the network is received, from the same application that issued the previous request, and/or from a different application.
  • an additional analysis of the resource-related parameters is performed.
  • the additional analysis of the resource-related parameters may be performed while the previously established network connection is active, optionally transferring data.
  • the additional analysis may represent the resource impact of the previously established network connection and/or for establishment of the additional network connection, optionally considering the resource impact of a combination of the previously established and requested network connections.
  • another QP type is selected according to the additional analysis. It is noted that in 308, another new network connection with the newly selected QP type is not established.
  • a previously established network connection that uses the QP type that is the same as the newly selected QP type is identified.
  • the previously established network connection has been previously established in response to a previous request, before the new request to establish the additional network connection has been received.
  • the previously established network connection is used for transfer of data across the network for both network traffic originally destined for the previously established network connection, and for network traffic destined for the additional network connection (which is not independently established), for example, by multiplexing both data streams.
  • the new data stream may be initialized on the previously established network connection.
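A minimal sketch of reusing an existing connection instead of establishing a new one is shown below. The `ConnectionTable` class, its `attach` method, and the dictionary layout are assumptions made for illustration; multiplexing is modeled only as a list of stream identifiers sharing one connection record.

```python
class ConnectionTable:
    """Tracks established connections keyed by (remote node, QP type)."""

    def __init__(self):
        self._by_key = {}

    def __len__(self):
        return len(self._by_key)

    def attach(self, remote, qp_type, stream_id):
        """Place a stream on a matching connection, creating one if needed.

        Returns (connection, reused), where `reused` is True when a
        previously established connection with the same QP type was found.
        """
        key = (remote, qp_type)
        conn = self._by_key.get(key)
        reused = conn is not None
        if conn is None:
            conn = {"remote": remote, "qp_type": qp_type, "streams": []}
            self._by_key[key] = conn
        conn["streams"].append(stream_id)  # streams are multiplexed together
        return conn, reused
```

With this table, a second request that resolves to the same remote node and the same selected QP type adds its stream to the existing record, and the number of connections stays at one.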
  • the devices (e.g., target and initiator, server and client) may handshake and/or notify each other whether they support the dynamic QP type selection capability.
  • resource-related parameters are analyzed.
  • a first RC QP is selected.
  • the selected first RC QP and first network connection is created on both devices.
  • the RC QP traffic starts.
  • an additional request is received.
  • the request may be to establish a network connection using RC QP, or no specific QP is specified (e.g., the dynamic QP type is requested).
  • an additional analysis of the current state of the resource-related parameters is performed.
  • the RD QP type is selected, for example, based on the analysis that determines that the RD QP type saves resources over another QP type such as the RC QP type.
  • the RD QP type may be selected even when the RC QP type is requested.
  • an RD QP and an End to End Context (EEC) are created, which are used by the second network connection for transfer of traffic across the network.
  • the traffic of the RD QP starts for the first EE (End to End).
  • data traffic over the first established network connection that uses the RC QP type is migrated to the additional network connection established with the RD QP type. Additional data starts transmitting over the new selected RD QP type after creating the EEC on the remote device.
  • the network interface device and/or hypervisor stalls the traffic over the previously established network connection using the RC QP for a short period of time to “drain the pipe”, i.e., make sure that no packets are being transmitted over the previously established network connection.
  • the WQEs are posted on the RD QP of the additional (i.e., newly established) network connection rather than on the RC QP of the previously established network connection.
  • FIGs. 4A-4F are schematics depicting exemplary QP types for selection, in accordance with some embodiments.
  • FIG. 4A depicts a reliable connection (RC) QP type.
  • RC is usually selected for applications that require the highest quality of service for data transmission over a network, for example, mission critical applications.
  • RC provides the highest level of reliability and predictability available.
  • ACKs: a positive acknowledgement packet that is returned to signal the successful receipt and processing of a send or RDMA write request packet.
  • NAKs: a negative acknowledgement packet that is returned to signal one of the following: a temporary receiver not ready condition, a PSN sequence error NAK, and a fatal NAK error.
  • the RC protocol consumes a substantial amount of bandwidth.
  • a local QP 402 on a local host 404 is associated with exactly one remote QP 406 on a remote host 408, forming a dedicated channel between the QPs for transmission of data 410.
  • the hardware protocol between QPs 402 and 406 provides reliable transport. Reliable transport is provided by detecting missing, corrupted or invalid packets, automatically suspending further activity between the two QPs, resending the lost/faulty packets, and then resuming operation. Every queue operation is acknowledged 412 and every operation completes exactly once, and in the same order it was issued.
  • the maximum message size is not limited by packet size or by the maximum transfer unit (MTU) size defined for the channel. Segmentation and re-assembly of long messages happens in hardware and is transparent to the application.
  • RC supports all InfiniBand™ services: SEND/RECEIVE, RDMA-Read, RDMA-Write and Atomic.
  • FIG. 4B depicts an unreliable connection (UC) QP type. Since the target QP’s RQ Logic does not generate ACKs and NAKs, the UC protocol consumes significantly less bandwidth than the RC protocol.
  • the setup of the Unreliable Connection is similar to the Reliable Connection described with reference to FIG. 4A, including a dedicated channel between local and remote QPs 402 and 406 of local and remote hosts 404 and 408 for transmission of data 410.
  • Send queue operations are marked as complete as soon as the QP transmits them.
  • a missing or erroneous message is not automatically retried. It is dropped and the QP does not provide the sender with any indication as to whether the message was successfully delivered or not.
  • RC service operations complete in order, and the maximum size of a message is not limited by the packet size or path MTU. SEND/RECEIVE and RDMA-Write are supported, but RDMA-Read and Atomic Operations are not.
  • UC provides an efficient communication for some applications, like streaming, for which missing data is not critical.
  • FIG. 4C depicts an Unreliable Datagram (UD) QP type.
  • A UD QP can send messages to and receive messages from any number of UD QPs located in one or more other network nodes. No ACK or NAK is returned for each request packet received. Since the target QP’s Logic does not generate ACKs and NAKs, the UD protocol consumes significantly less bandwidth than the RC and RD protocols.
  • Unreliable Datagram is a connectionless service.
  • A UD QP can potentially send data to any other UD QP in the system.
  • a QP 402 on an initiating host 404 sends Data 410A-C to any of QPs 406A-C on target hosts 408A-C.
  • the sending QP must send a key, called a Q Key, which must match the receiver’s Q Key, or the message is dropped. This prevents writing to unintended locations.
  • the only valid operation is SEND/RECEIVE.
  • a QP configured for Unreliable Datagram service cannot detect lost data, data received out of order or data received multiple times. In all these cases, the message to which this data belongs is said to be received in error, and is silently dropped.
  • the maximum message size is limited by the largest packet size (MTU) supported by the path (256 bytes to 4K bytes). If reliability is needed, it is provided by the upper layer software protocols.
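The Q Key check and the MTU limit of the Unreliable Datagram service can be illustrated with a short sketch. The function names `ud_send` and `ud_receive` and the dictionary packet format are hypothetical, and the sketch ignores packet headers, so the payload length is compared directly against the path MTU.

```python
# MTU sizes within the 256 byte to 4K byte range mentioned above.
VALID_MTUS = (256, 512, 1024, 2048, 4096)


def ud_send(payload, path_mtu, q_key):
    """Build a single-packet UD message; UD does not segment long messages."""
    if path_mtu not in VALID_MTUS:
        raise ValueError(f"unsupported path MTU: {path_mtu}")
    if len(payload) > path_mtu:
        raise ValueError("message exceeds the path MTU")
    return {"q_key": q_key, "payload": payload}


def ud_receive(packet, receiver_q_key, receive_queue):
    """Deliver the packet if the Q Key matches; otherwise drop it silently."""
    if packet["q_key"] != receiver_q_key:
        return False  # dropped, and the sender gets no indication
    receive_queue.append(packet["payload"])
    return True
```

A message larger than the path MTU is rejected at the sender in this sketch, which is where an upper layer protocol would have to split it and add its own reliability if needed.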
  • FIG. 4D depicts a Reliable Datagram (RD) QP type.
  • An RD QP can send messages to and receive messages from any number of RD QPs located in one or more other network nodes. It does so through one or more “pipelines” that are established between the local network node and one or more remote network nodes.
  • Each “pipeline” is referred to as a Reliable Datagram Channel (RDC) and acts as the conduit through which multiple local client RD QPs send messages to and receive messages from RD QPs residing in the remote network node. Due to the generation of ACKs and NAKs, the RD protocol consumes a substantial amount of bandwidth.
  • the Reliable Datagram service combines the features of RC and UD service. In particular, it allows the same QP to interact with multiple remote QPs simultaneously, while providing reliability.
  • a QP 402 on an initiating host 404 sends Data 410A-C to any of QPs 406A-C on target hosts 408A-C, and receives acknowledgements (ACKs) 412A-C.
  • RD provides a multiplexed reliable connection channel.
  • the QP 402 is logically associated with a set of remote RD QPs 406A-C. This service is most useful to an application having a number of different processes running on each node that need to communicate with each other in a reliable manner.
  • FIG. 4E depicts an Extended Reliable Connection (XRC) QP type.
  • XRC will be described in the environment of data 410 and acknowledgements (ACK) 412 transmitted between a local QP 402 and a remote QP 406 of respective local host 404 and remote host 408.
  • XRC allows significant savings in the number of QPs required to establish all-to-all process connectivity in large clusters.
  • XRC is different from RD in several ways, but first and foremost it eliminates the most significant limitation of the RD Transport Service: having a single outstanding message per EE context. The savings in the overall number of required QPs occurs because of the way XRC operates on the responder side.
  • the responder connection context (denoted as XRC TGT QP) allows the requester process to send messages targeting multiple destination XRC SRQs 414, which belong to multiple processes on the responder's node.
  • a process in one node can communicate with all processes on a remote node, thus reducing by a factor of p (number of processes per node) the total number of QPs required for full connectivity, compared to when RC QPs are used.
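The factor of p can be checked with a short calculation. The sketch below assumes a simplified counting model that is not stated in the text: under RC each process needs one QP per remote process, and under XRC each process needs one QP per remote node. Responder-side XRC TGT QPs are not counted. The function names are hypothetical.

```python
def rc_qps_per_process(nodes, procs_per_node):
    """One RC QP for every process on every other node."""
    return (nodes - 1) * procs_per_node


def xrc_qps_per_process(nodes, procs_per_node):
    """One XRC QP for every other node, regardless of its process count."""
    return nodes - 1


# Example: 16 nodes with 8 processes each.
rc = rc_qps_per_process(16, 8)    # 15 remote nodes * 8 processes
xrc = xrc_qps_per_process(16, 8)  # 15 remote nodes
ratio = rc // xrc                 # equals the number of processes per node
```

Under this model the ratio between the two counts equals the number of processes per node, matching the reduction by a factor of p.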
  • XRC SRQs 414 are the responder node per-process receive queues that can be targeted from multiple remote end nodes through the XRC TGT QPs.
  • XRC SRQs are in a way equivalent to the receive queues in RD QPs, and as such only one is required per process to allow it to receive messages from any process on any node in the cluster.
  • RD QPs are limited to be used with RD EE contexts in their same Reliable Datagram Domain (RDD).
  • the XRC Transport Service implements an equivalent XRC Domain mechanism that serves the same purpose.
  • XRC TGT QPs can only be used as a conduit to access XRC SRQs that were setup on their same XRC Domain.
  • FIG. 4F depicts a Dynamically Connected (DC) QP type.
  • DC is a scalable Transport Service, which reduces the number of QPs per node, compared to RC.
  • DC has RC-like reliability semantics.
  • DC has a symmetric API.
  • On the responder's side there is a DC Target, or DCT 416, where one is enough.
  • On the requester's side there is a DC Initiator, or DCI 418, where one is enough.
  • DC forms “temporary connections”. The first send-WR on a DCI connects this DCI to a remote DCT. A second send-WR uses this open connection.
  • DCI disconnects after some idle period without sends.
  • a DCI can “switch destinations”, if the next send-WR has a different destination specified.
  • a DCT has a pool of “responders” (DCRs). Each incoming DC connection is allocated a DCR.
  • DCI recycling tradeoffs: with too few DCIs, the same DCI switches back and forth between destinations, causing redundant connect/disconnect flows (worst case: per send), which hurts latency. With too many DCIs, the result is still not as bad as N^2 RC QPs, but it consumes resources and is bad for caching. Best practice is to maintain a <DCI, dest> hash table, reducing connection re-establishment, with an LRU recycling policy to increase the odds of picking a disconnected DCI to send on.
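The hash table with LRU recycling can be sketched as follows. The `DciPool` class, its `dci_for` method, and the `reconnects` counter are hypothetical names for this example; DCIs are represented by integer indices, and connecting or disconnecting is only counted, not performed.

```python
from collections import OrderedDict


class DciPool:
    """Maps destinations to DCIs and recycles the least recently used DCI."""

    def __init__(self, num_dcis):
        self._free = list(range(num_dcis))
        # dest -> DCI index, ordered from least to most recently used.
        self._by_dest = OrderedDict()
        self.reconnects = 0  # counts connection (re-)establishments

    def dci_for(self, dest):
        if dest in self._by_dest:
            # Reuse the DCI already connected to this destination.
            self._by_dest.move_to_end(dest)
            return self._by_dest[dest]
        if self._free:
            dci = self._free.pop()
        else:
            # Recycle the DCI that has been idle the longest; it is the
            # most likely to have disconnected already.
            _, dci = self._by_dest.popitem(last=False)
        self._by_dest[dest] = dci
        self.reconnects += 1
        return dci
```

With a pool of two DCIs, sends to A, B, A, C establish three connections in total: the repeated send to A reuses its DCI, and the send to C recycles the DCI last used for B.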
  • FIGs. 5-7 are schematics depicting a comparison of dynamically selecting the QP type with standard approaches, in accordance with some embodiments.
  • FIG. 5 depicts a process of establishing RC connections according to standard approaches.
  • Schematic 502A depicts establishment of a first RC network connection 504A (for reliable transfer of Data by providing ACK) between a server host 506 and a client host 508 using RC QP type 510A.
  • Schematic 502B depicts establishment of a second RC network connection 504B between server 506 and client 508 using RC QP type 510B.
  • First RC connection 504A connects between Process A 512 running on server 506 and Process C 514 running on client 508.
  • Second RC connection 504B connects between Process A 512 running on server 506 and Process D 516 running on client 508.
  • FIG. 6 depicts an example of a process of establishing connections using embodiments described herein that dynamically select the QP type.
  • Schematic 602A depicts establishment of a first RC network connection 604A (for reliable transfer of Data by providing ACK) between a server host 606 and a client host 608 using RC QP type 610A. It is noted that the process depicted in schematic 602A may be the same as the standard approach depicted in schematic 502A of FIG. 5.
  • Schematic 602B depicts the response to a request from client 608 to add a second RC connection to server 606.
  • First RC connection 604A is between Process A 612 running on server 606 and Process C 614 running on client 608.
  • the request is for a second RC connection between Process A 612 (i.e., the same as for the first RC connection) running on server 606 and Process D 616 running on client 608 (i.e., different than for the first RC connection).
  • the QP type is changed from RC type 610A to RD type 610B, both on the server 606 and on the client 608.
  • the RD QP type may be selected, for example, to save hardware resources and/or when no more RC QPs are available.
  • FIG. 7 depicts another example of a process of establishing connections using embodiments described herein that dynamically select the QP type.
  • Schematic 702A depicts establishment of a first RC network connection 704A (for reliable transfer of data by providing ACK) between a server host 706 and a client host 708 using RC QP type 710A. It is noted that the process depicted in schematic 702A may be the same as the standard approach depicted in schematic 502A of FIG. 5 and/or schematic 602A of FIG. 6.
  • Schematic 702B depicts the response to a request from client 708 to add a second RC connection to server 706.
  • a second RC network connection would be established between server 706 and client 708.
  • a DC QP type 710B is dynamically selected and used to establish a network connection 704B between server 706 and client 708.
  • Previous RC QP type 710A used to establish network connection 704A is migrated to network connection 704B established using DC QP 710B.
  • the DC QP type may be selected, for example, to save hardware resources and/or when no more RC QPs are available.
  • composition or method may include additional ingredients and/or steps, but only if the additional ingredients and/or steps do not materially alter the basic and novel characteristics of the claimed composition or method.
  • a compound or “at least one compound” may include a plurality of compounds, including mixtures thereof.
  • a numerical range is indicated herein, it is meant to include any cited numeral (fractional or integral) within the indicated range.
  • the phrases “ranging/ranges between” a first indicate number and a second indicate number and “ranging/ranges from” a first indicate number “to” a second indicate number are used herein interchangeably and are meant to include the first and second indicated numbers and all the fractional and integral numerals therebetween.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer And Data Communications (AREA)

Abstract

There is provided processing circuitry, a method, and code instructions for selecting a type of a queue pair, QP, for data transfer across a network, for example, based on a Remote Direct Memory Access (RDMA) protocol. A request to establish a connection for data transfer across the network is received. Resource-related parameters are analyzed. Each resource-related parameter is indicative of a respective resource related state of the connection for establishment. The type for the QP is selected from multiple candidate types according to the analysis. The candidate QP types may be defined by a transport protocol. The connection with the selected type of the QP for data transfer across the network is established.

Description

INTERCHANGEABLE QUEUE TYPE FOR NETWORK CONNECTIONS
BACKGROUND
The present disclosure, in some embodiments thereof, relates to network connections and, more specifically, but not exclusively, to systems and methods for management of resources for establishing network connections.
A network node, for example a server, may establish and simultaneously support thousands of network connections to other network nodes, such as storage servers, endpoint devices, and other servers, in order to provide exchange of data across the network. The large number of simultaneous network connections consumes a significant amount of resources.
SUMMARY
It is an object of the present disclosure to provide processing circuitry, a computing device, a method, and a computer-readable storage medium for data transfer across a network.
The foregoing and other objects are achieved by the features of the independent claims. Further implementation forms are apparent from the dependent claims, the description and the figures.
According to a first aspect, processing circuitry for selecting a type of a queue pair, QP, for data transfer across a network, the processing circuitry is configured for: receiving a request to establish a network connection for data transfer across the network, analyzing a plurality of resource-related parameters each indicative of a respective resource related state of the network connection for establishment, selecting the type for the QP from a plurality of candidate types according to the analysis, and establishing the network connection with the selected type of the QP for data transfer across the network.
According to a second aspect, a method for selecting a type of a queue pair, QP, for data transfer across a network, comprises: receiving a request to establish a network connection for data transfer across the network, analyzing a plurality of resource-related parameters each indicative of a respective resource related state of the network connection for establishment, selecting the type for the QP from a plurality of candidate types according to the analysis, and establishing the network connection with the selected type of the QP for data transfer across the network.
According to a third aspect, a computer program comprising program instructions which, when executed by a processor, cause the processor to: receive a request to establish a network connection for data transfer across the network, analyze a plurality of resource-related parameters each indicative of a respective resource related state of the network connection for establishment, select the type for the QP from a plurality of candidate types according to the analysis, and establish the network connection with the selected type of the QP for data transfer across the network.
The requested QP type for a new connection may not necessarily represent the optimal use of existing resources of the host and/or device and/or network, for example, memory, hardware resources, processing resources, cache, and network resources. For example, RC QPs may be the best solution from a transport perspective, but RC QPs are very expensive in terms of available resources. The type of the QP that provides the best overall (e.g., globally for the device, network and/or host) utilization of available resources may be selected, rather than the QP that is best for the requesting application.
Optimization of overall resources is especially significant for high end servers that establish thousands of QPs for thousands of network connections, for example, 10 000, 100 000, or 1 000 000.
The dynamic selection of the QP type for establishing connections provides improved utilization of resources over existing approaches where each application is granted whichever QP type it requested. For example, consider N network nodes, each running M processes that establish network connections. For the case of all M processes wishing to communicate with all the processes on all the nodes, the number of RC QPs needed to engage in this “all to all” communication is (M^2)*(N-1) per node. RD has a lower footprint: each node needs M QPs + N “end-to-end” (EE) connections to achieve the same all-to-all communication pattern. But RD is limited on transport. Its most significant limitation is the single outstanding message supported per EE context. DC, which is a proprietary solution, has a lower footprint compared to RC, but if it uses many DCI/DCTs it still consumes a lot of resources and is not efficient for caching. Another drawback of DC is that it must use frequent connect-disconnect operations.
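As a rough worked example of the footprint figures above, the following sketch evaluates both expressions for one cluster size. The function names and the sample values of N and M are illustrative assumptions and are not taken from the disclosure.

```python
def rc_qps_per_node(n_nodes: int, m_procs: int) -> int:
    """RC QPs needed per node for all-to-all traffic: (M^2) * (N - 1)."""
    return (m_procs ** 2) * (n_nodes - 1)


def rd_objects_per_node(n_nodes: int, m_procs: int) -> int:
    """RD footprint per node: M QPs plus N end-to-end (EE) contexts."""
    return m_procs + n_nodes


if __name__ == "__main__":
    n, m = 64, 16  # hypothetical cluster: 64 nodes, 16 processes per node
    print("RC QPs per node:", rc_qps_per_node(n, m))
    print("RD QPs + EE contexts per node:", rd_objects_per_node(n, m))
```

For these assumed values the RC count grows quadratically in M, while the RD count grows only linearly in M and N.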
In a further implementation form of the first, second and third aspects, further configured for and/or further comprising: re-analyzing the plurality of resource-related parameters during data transfer over the established network connection, re-selecting another type of the QP from the plurality of candidate types according to the re-analysis, establishing a second network connection with the re-selected other type of the QP, and at least one of: dynamically transferring the network connection to the second network connection, and reinitiating the transfer of data of the network connection to be transferred over the second network connection.
When the resource-related parameters change, the QP types of the existing connection may be dynamically changed during transfer of data over the network using the established network connection, to provide a different QP type that is expected to improve utilization of resources.
In a further implementation form of the first, second and third aspects, further configured for and/or further comprising: wherein the network connection comprises a first network connection, receiving a second request to establish a second network connection for data transfer across the network, conducting a second analysis of the plurality of resource-related parameters each indicative of a respective resource related state of the established first network connection and for establishment of the second network connection, selecting a second type of the QP from the plurality of candidate types according to the second analysis, establishing the second network connection with the second type of QP, dynamically migrating the first network connection to the second network connection, wherein the second network connection transfers data across the network for the first network connection and for the second network connection.
Two network connections to transfer two sets of data over the network may be merged into a single network connection. The merged second network connection may use fewer resources than the two independent network connections.
In a further implementation form of the first, second and third aspects, the request is provided by a first application, the second request is provided by a second application, and the second network connection transfers data over the network for the first application and the second application.
Two independent network connections to transfer two sets of data over the network requested by two different applications may be merged into a single network connection that is used by both applications. The merged second network connection may use fewer resources than the two independent network connections.
In a further implementation form of the first, second and third aspects, dynamically migrating comprises: stalling network traffic on the first network connection, receiving acknowledge messages that packets transmitted over the first network connection prior to the stalling have been received by a device at another end of the first network connection, and using the second type of QP of the second network connection for additional network traffic destined for the first network connection.
By stalling the current network traffic over the previously established network connection, the migration may be performed transparently, without significant interruption.
In a further implementation form of the first, second and third aspects, further configured for and/or further comprising: in response to receiving an indication that at least one first packet of the additional network traffic has passed over the second network connection, terminating the first network connection.
Resources tied up by the previously established network connection are freed by the termination of the previously established network connection.
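The migration steps described above (stall, wait for acknowledgements, redirect traffic, terminate) can be sketched as follows. This is a minimal in-memory simulation under stated assumptions: the `Connection` class, its fields, and the `migrate` function are hypothetical names, and acknowledgements are simulated rather than received from a peer.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    """Hypothetical stand-in for a network connection backed by one QP."""
    qp_type: str
    stalled: bool = False
    is_open: bool = True
    unacked: list = field(default_factory=list)    # packets awaiting ACK
    delivered: list = field(default_factory=list)  # packets already ACKed

    def send(self, packet: str) -> None:
        if self.stalled or not self.is_open:
            raise RuntimeError("connection does not accept new traffic")
        self.unacked.append(packet)

    def receive_acks(self) -> None:
        """Simulate the peer acknowledging every outstanding packet."""
        self.delivered.extend(self.unacked)
        self.unacked.clear()


def migrate(old: Connection, new: Connection, pending: list) -> None:
    old.stalled = True              # 1. stall traffic on the first connection
    old.receive_acks()              # 2. wait until pre-stall packets are ACKed
    for index, packet in enumerate(pending):
        new.send(packet)            # 3. later traffic uses the second QP type
        if index == 0:
            new.receive_acks()      # indication that a first packet passed
            old.is_open = False     # 4. terminate the first connection


if __name__ == "__main__":
    first = Connection("RC")
    first.send("p1")
    second = Connection("DC")
    migrate(first, second, ["p2", "p3"])
    print(first.is_open, second.delivered, second.unacked)
```

Terminating the first connection only after a packet has passed over the second one mirrors the ordering given in the text; a real implementation would also need to handle lost acknowledgements and timeouts.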
In a further implementation form of the first, second and third aspects, further configured for and/or further comprising: receiving a third request to establish a third network connection for data transfer across the network, conducting a third analysis of the plurality of resource-related parameters each indicative of a respective resource related state of the established network connection and for establishment of the third network connection, selecting a third type of the QP from the plurality of candidate types according to the third analysis, wherein a fourth network connection with the third type of QP has been previously established by the processing circuitry for a fourth request prior to receiving the third request, and using the fourth network connection for transfer of data across the network associated with the third request and with the fourth request, wherein the third network connection is not independently established.
When a network connection with the selected second QP type already exists, the network traffic destined for a newly established network connection may be added to the existing network connection. Resources may be saved by using the already existing network connection, rather than adding another network connection.
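One way to picture this reuse is a lookup table keyed by peer and QP type, sketched below. The `ConnectionTable` class, its method names, and the integer connection identifiers are assumptions made for illustration only.

```python
class ConnectionTable:
    """Hypothetical table of established connections keyed by (peer, QP type)."""

    def __init__(self) -> None:
        self._by_key = {}   # (peer, qp_type) -> connection id
        self._users = {}    # connection id -> request ids sharing it
        self._next_id = 0

    def attach(self, request_id: str, peer: str, qp_type: str) -> int:
        """Return an existing connection of the selected type, or create one."""
        key = (peer, qp_type)
        conn_id = self._by_key.get(key)
        if conn_id is None:
            # No connection of this type to this peer yet: establish one.
            conn_id = self._next_id
            self._next_id += 1
            self._by_key[key] = conn_id
            self._users[conn_id] = []
        self._users[conn_id].append(request_id)
        return conn_id

    def users(self, conn_id: int) -> list:
        return list(self._users[conn_id])


if __name__ == "__main__":
    table = ConnectionTable()
    fourth = table.attach("request-4", "server", "DC")
    third = table.attach("request-3", "server", "DC")  # reuses the same id
    print(fourth == third, table.users(fourth))
```

In this sketch the later request never creates a connection of its own; its traffic is simply associated with the connection that already exists.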
In a further implementation form of the first, second and third aspects, the request to establish the network connection is for a first type of QP, and wherein a second type of QP different than the first type is selected.
The selection process may be performed in an implicit mode, where the request is ignored and the QP type providing best optimization of available resources is selected without the requesting Application and/or process being aware that the requested QP type has been changed.
In a further implementation form of the first, second and third aspects, the request to establish the network connection is for a reliable type of QP or for an unreliable type of QP of a first sub-type of QP, and wherein the selected type of QP is the reliable type of QP or the unreliable type of QP as defined by the request of a second sub-type of QP different than the first sub-type of QP defined by the request.
Maintaining the reliability or unreliability of the QP types according to the requested QP type for the network connection maintains compatibility with the applications and/or processes that use the data transferred over the network. For example, an application that expects reliable transfer of data receives reliably transferred data, and does not need to handle unexpected unreliably transferred data.
In a further implementation form of the first, second and third aspects, the network connection is for the reliable type of QP, and the selected type of QP is of the unreliable type of QP with reliability provided by a reliability layer implemented by at least one of or combination of: software, firmware, and hardware. Rather than using resource intensive reliable QP types, the requested reliability may be provided using the unreliable QP type with the reliability layer, which may improve resource utilization by using the unreliable QP type which uses fewer resources than the reliable QP type.
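The idea of layering reliability over an unreliable QP type can be illustrated with a simple stop-and-wait scheme using sequence numbers and retransmission. This is only a sketch of one possible reliability layer, not the mechanism of the disclosure; the function name, the `drop_pattern` argument, and the retry limit are hypothetical, and lost acknowledgements are not modeled.

```python
def send_reliably(messages, drop_pattern, max_retries=5):
    """Deliver messages in order over a lossy channel by retransmitting.

    drop_pattern yields one boolean per transmission attempt; True means
    the attempt was lost. Attempts beyond the pattern are never lost.
    Returns the received messages and the total number of attempts.
    """
    drops = iter(drop_pattern)
    received = []
    attempts = 0
    for seq, message in enumerate(messages):
        for _ in range(max_retries + 1):
            attempts += 1
            lost = next(drops, False)
            if not lost:
                # The receiver accepts only the expected sequence number.
                if seq == len(received):
                    received.append(message)
                break  # acknowledged, move on to the next message
        else:
            raise TimeoutError(f"message {seq} was not acknowledged")
    return received, attempts


if __name__ == "__main__":
    data, tries = send_reliably(["a", "b", "c"],
                                [True, False, False, True, True, False])
    print(data, tries)
```

The application still sees every message exactly once and in order, while the underlying transport is allowed to lose individual transmissions.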
In a further implementation form of the first, second and third aspects, the request to establish the network connection is for an interchangeable QP type, IQT, and wherein the type of the QP is selected from the plurality of candidate types that exclude the dynamic type.
The highest and/or best resource utilization efficiency may result when processes and/or applications use the dynamic QP type to indicate that any of the candidate types may be selected for the QP.
In a further implementation form of the first, second and third aspects, the plurality of resource-related parameters are selected from a group consisting of: number of network nodes an application that provided the request is communicating with, number of existing network connections of the application that provided the request, topology of existing network connection of the application, total number of active QPs of network connections that were established by the processing circuitry, whether QPs of the type of the request are available, transport reliability of the network, current memory utilization, and current utilization of the processing circuitry.
The parameters provide a picture of the existing resource usage, which enables selecting the QP type for the network connection that will provide most optimal use of the existing resources.
In a further implementation form of the first, second and third aspects, the plurality of candidate types for the QP pair are selected from a group consisting of: Reliable Connection, RC, Reliable Datagram, RD, Extended Reliable Connection, XRC, Unreliable Datagram, UD, Unreliable Connection, UC, Scalable Reliable Datagram, SRD, and Dynamic Connection, DC.
In a further implementation form of the first, second and third aspects, the analyzing is done using at least one of: a set of rules for the plurality of resource-related parameters that result in the selected type for the QP, a classifier that receives the plurality of resource-related parameters as input and generates an outcome of the selected type for the QP where the classifier is trained on a training dataset of plurality of sample resource-related parameters and a label of type of QP, and optimization code that uses a mathematical model and/or set of equations to compute the type for the QP that optimizes the plurality of resource-related parameters.
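The first option, a set of rules over the resource-related parameters, might look like the following sketch. The parameter subset, the thresholds, and the specific rules are assumptions chosen for illustration; the disclosure does not prescribe particular values or a particular rule order.

```python
from dataclasses import dataclass


@dataclass
class ResourceState:
    """Hypothetical subset of the resource-related parameters."""
    peer_nodes: int                  # nodes the requesting application talks to
    active_qps: int                  # total active QPs on this device
    requested_type_available: bool   # whether QPs of the requested type remain
    memory_utilization: float        # fraction in the range 0.0 to 1.0


def select_qp_type(requested: str, state: ResourceState,
                   qp_budget: int = 10_000,
                   memory_limit: float = 0.8) -> str:
    """Return a QP type name chosen by simple, illustrative rules."""
    if requested in ("UD", "UC"):
        # Rule 1: keep unreliable requests unchanged in this sketch.
        return requested
    if not state.requested_type_available:
        # Rule 2: no QPs of the requested type left, fall back to DC.
        return "DC"
    if state.active_qps >= qp_budget or state.memory_utilization >= memory_limit:
        # Rule 3: resources are under pressure, prefer the smaller RD footprint.
        return "RD"
    # Rule 4: otherwise honor the request.
    return requested


if __name__ == "__main__":
    busy = ResourceState(peer_nodes=32, active_qps=20_000,
                         requested_type_available=True, memory_utilization=0.4)
    print(select_qp_type("RC", busy))
```

A trained classifier or an optimization model could replace the body of `select_qp_type` while keeping the same inputs and output.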
In a further implementation form of the first, second and third aspects, the data transfer across the network is according to a Remote Direct Memory Access, RDMA, protocol, and the plurality of candidate types of QP are defined by a network transport protocol for RDMA.
The viability of RDMA relies heavily on its reliability, high bandwidth, and low-latency properties. Therefore, selection of the QP type plays an important role in determining the balance between reliability, high bandwidth, and low-latency of RDMA, especially for a large number of network connections. At least some embodiments described herein select the optimal QP types for RDMA to maximize low-latency while meeting required reliability and/or lower memory footprint and/or utilize device cache in an optimal way (e.g., prevent a large amount of cache misses), especially when a large number of connections are established.
In a further implementation form of the first, second and third aspects, the network transport protocol for RDMA defining the plurality of candidate types of QP is selected from a group consisting of: InfiniBand, IB, RoCE, Remote Direct Memory Access, RDMA, over Converged Ethernet, RoCEv2, iWARP, and derivatives of the aforementioned.
Unless otherwise defined, all technical and/or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the disclosure pertains. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of embodiments of the disclosure, exemplary methods and/or materials are described below. In case of conflict, the patent specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and are not intended to be necessarily limiting.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
Some embodiments of the disclosure are herein described, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of embodiments of the disclosure. In this regard, the description taken with the drawings makes apparent to those skilled in the art how embodiments of the disclosure may be practiced.
In the drawings:
FIG. 1 is a block diagram of a computing device for selecting a QP type for data transfer across a network, in accordance with some embodiments;
FIG. 2 is a block diagram of multiple computing devices transferring data across network using the selected QP type, in accordance with some embodiments;
FIG. 3 is a flowchart of a method for selecting a QP type for data transfer across a network, in accordance with some embodiments;
FIGs. 4A-4F are schematics depicting exemplary QP types for selection, in accordance with some embodiments;
FIG. 5 is a schematic depicting a comparison of dynamically selecting the QP type with standard approaches, in accordance with some embodiments;
FIG. 6 is another schematic depicting a comparison of dynamically selecting the QP type with standard approaches, in accordance with some embodiments; and
FIG. 7 is yet another schematic depicting a comparison of dynamically selecting the QP type with standard approaches, in accordance with some embodiments.
DETAILED DESCRIPTION
The present disclosure, in some embodiments thereof, relates to network connections and, more specifically, but not exclusively, to systems and methods for management of resources for establishing network connections.
An aspect of some embodiments relates to processing circuitry, systems, methods, an apparatus, and/or code instructions (i.e., stored on a computer readable medium for execution by one or more hardware processors) for automated selection of a type of a queue pair (QP), also referred to as QP type, for transfer of data across a network between two devices, for example, based on a Remote Direct Memory Access (RDMA) protocol, where the QP types are defined by a transport protocol, for example, Infiniband™, RoCE, Remote Direct Memory Access, RDMA, over Converged Ethernet, RoCEv2, iWARP, proprietary QP types (e.g., of different vendors), and derivatives of the aforementioned. The selection of the QP type is in response to a requested establishment of a connection for transfer of the data across the network, for example, by an application uploading and/or downloading data to a remote data storage device. One or more resource-related parameters are analyzed. Each resource-related parameter is indicative of a respective resource related state of the connection for establishment, for example, based on the available resources for establishment of the network connection, for example, available memory, available QPs of a requested type, utilization state of processor(s), state of network (e.g., noisy or not). The QP type is selected from multiple candidate QP types according to the analysis. The QP type is selected for optimizing the available resources for establishment of the connection, for example, in comparison to the requested QP type and/or in comparison to the candidate QP types. The initially requested QP type may be changed to the selected QP type. The connection is established with the selected QP type for transfer of data across the network.
Optionally, the QP type used for the connection may be dynamically re-selected while the connection is active. Another QP type may be selected from the candidate QP types according to a re-analysis of the current state of the resource-related parameters. Another connection may be newly established using the dynamically re-selected QP type. The existing connection may be migrated to the newly established connection with the re-selected QP type.
Optionally, the QP type is for establishment of another connection, where one or more other connections have been previously established. Based on the analysis of the resource-related parameters, the connection that is requested may be joined to an existing connection using a selected QP type, for example, by multiplexing the two (or more) streams of data for transfer across the existing connection. In another implementation, a new connection is established using a QP type selected based on an analysis of the resource-related parameters. One or more previously established connections are migrated to the new connection.
The requested QP type for a new connection may not necessarily represent the optimal use of existing resources of the host and/or device and/or network, for example, memory, hardware resources, processing resources, cache, and network resources. For example, RC QPs may be the best solution from a transport perspective, but RC QPs are very expensive in terms of available resources. The type of the QP that provides the best overall (e.g., globally for the device, network and/or host) utilization of available resources may be selected, rather than the QP that is best for the requesting application.
Optimization of overall resources is especially significant for high end servers that establish thousands of QPs for thousands of network connections, for example, 10,000, 100,000, or 1,000,000.
The dynamic selection of the QP type for establishing connections provides improved utilization of resources over existing approaches where each application is granted whichever QP type it requested. For example, consider N network nodes, each running M processes that establish network connections. For the case of all M processes wishing to communicate with all the processes on all the nodes, the number of RC QPs needed to engage in this “all to all” communication is (M^2)*(N-1) per node. RD has a lower footprint: each node needs M QPs + N “end-to-end” (EE) connections to achieve the same all-to-all communication pattern. But RD is limited on transport. Its most significant limitation is the single outstanding message supported per EE context. DC, which is a proprietary solution, has a lower footprint compared to RC, but if it uses many DCI/DCTs it still consumes a lot of resources and is not efficient for caching. Another drawback of DC is that it must use frequent connect-disconnect operations.
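The connect-disconnect cost of DC depends on how DCIs are reused. The DCI recycling practice noted in connection with FIG. 4F (a destination-keyed hash table with LRU recycling) can be sketched as below. The `DciPool` class, its method names, and the `connects` counter are hypothetical and exist only to make the trade-off visible.

```python
from collections import OrderedDict


class DciPool:
    """Hypothetical pool that maps destinations to DCIs with LRU recycling."""

    def __init__(self, num_dcis: int) -> None:
        self._free = list(range(num_dcis))
        # dest -> DCI index, ordered from least to most recently used
        self._by_dest = OrderedDict()
        self.connects = 0  # number of (re)connect flows triggered

    def dci_for(self, dest: str) -> int:
        if dest in self._by_dest:
            # Destination already has a connected DCI: reuse it.
            self._by_dest.move_to_end(dest)
            return self._by_dest[dest]
        if self._free:
            dci = self._free.pop()
        else:
            # Recycle the least recently used DCI, which is the most
            # likely to have disconnected after its idle period.
            _, dci = self._by_dest.popitem(last=False)
        self._by_dest[dest] = dci
        self.connects += 1
        return dci


if __name__ == "__main__":
    pool = DciPool(num_dcis=2)
    for dest in ["a", "b", "a", "b", "c", "b"]:
        pool.dci_for(dest)
    print("connect flows:", pool.connects)
```

With a single DCI, the same alternating traffic would trigger a reconnect on almost every send, which is the worst case described for too few DCIs.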
Optionally, the data transfer described herein between different computing devices is based on the Remote Direct Memory Access (RDMA) protocol, which is the primary method of transport for remote memory operations. The viability of RDMA relies heavily on its reliability, high bandwidth and low-latency properties. Therefore, selection of the QP type plays an important role in determining the balance between reliability, high bandwidth, and low-latency of RDMA, especially for a large number of network connections. At least some embodiments described herein select the optimal QP types for RDMA to maximize low-latency while meeting required reliability and/or lower memory footprint and/or utilize device cache in an optimal way (e.g., prevent a large amount of cache misses), especially when a large number of connections are established.
For RDMA, different QP types are available. The different QP types may be represented as a combination of a first sub-type and a second sub-type. The first sub-type is Reliable or Unreliable. The Reliable first sub-type provides a guarantee that messages are delivered at most once, mostly in order and without corruption. The Unreliable first sub-type does not provide any guarantee that the messages will be delivered or about the order of the packets. In RDMA, every packet has a cyclic redundancy check (CRC) and corrupted packets are dropped (for any transport type). The Reliability of a QP transport type refers to the whole message reliability. The second sub-type is Connected or Unconnected. The Connected second sub-type refers to one send/receive QP being associated with exactly one other QP. The Unconnected second sub-type refers to one send/receive QP being associated with multiple other QPs. Exemplary QP types are defined by the Infiniband™ specification, including Reliable Connection (RC), Reliable Datagram (RD), Extended Reliable Connection (XRC), Unreliable Datagram (UD), and Unreliable Connection (UC). Additional QP types have been developed, for example, Scalable Reliable Datagram (SRD), and Dynamic Connection (DC) also known as Dynamically Connected Transport (DCT).
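The two-sub-type view of QP types can be written down as a small table. The sketch below covers only RC, UC, RD, and UD, whose sub-types follow directly from their names; XRC, SRD, and DC are left out because their classification is not spelled out here. The `Traits` and `QpType` names and the helper function are assumptions for illustration.

```python
from enum import Enum
from typing import NamedTuple


class Traits(NamedTuple):
    reliable: bool   # first sub-type: Reliable or Unreliable
    connected: bool  # second sub-type: Connected or Unconnected


class QpType(Enum):
    RC = Traits(reliable=True, connected=True)
    UC = Traits(reliable=False, connected=True)
    RD = Traits(reliable=True, connected=False)
    UD = Traits(reliable=False, connected=False)


def same_reliability_alternatives(requested: QpType) -> list:
    """Candidate types that keep the requested reliability sub-type."""
    return [candidate for candidate in QpType
            if candidate is not requested
            and candidate.value.reliable == requested.value.reliable]


if __name__ == "__main__":
    print([t.name for t in same_reliability_alternatives(QpType.RC)])
```

Filtering candidates this way corresponds to the implementation form in which the selected type keeps the reliability requested by the application and differs only in the second sub-type.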
Before explaining at least one embodiment of the disclosure in detail, it is to be understood that the disclosure is not necessarily limited in its application to the details of construction and the arrangement of the components and/or methods set forth in the following description and/or illustrated in the drawings and/or the Examples. The disclosure is capable of other embodiments or of being practiced or carried out in various ways.
The present disclosure may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present disclosure.
The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.
The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present disclosure.
Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.
Reference is now made to FIG. 1, which is a block diagram of a computing device 104 for selecting a QP type for data transfer across a network 120, in accordance with some embodiments. Reference is also made to FIG. 2, which is a block diagram of multiple computing devices 104 transferring data across network 120 using the selected QP type, in accordance with some embodiments. Reference is also made to FIG. 3, which is a flowchart of a method for selecting a QP type for data transfer across a network, in accordance with some embodiments. Computing device 104 may implement the acts of the method described with reference to FIG. 3, for example, by one or more, or combination of: processor(s) 102A of a computing device 104 executing code instructions (e.g., code 150) stored in a memory 106A, by processor(s) 102A of computing device 104 implemented in hardware to perform the instructions defined by code 150, by processor(s) 102B of a network interface device (e.g., network interface card) 114 executing code instructions (e.g., code 150) stored in a memory 106B, and/or processor(s) 102B implemented in hardware to perform the instructions defined by code 150.
Queue 106B stores the QP of the selected type, as described herein. QP 106B may be stored, for example, by memory 106A of computing device 104 and/or memory 106B of network interface device 114.
Computing device 104 may act as a network node, and may sometimes be referred to herein as a network node.
As shown, computing device 104 communicates with one or multiple other instances of the computing device 104 (e.g., another network node) over a network 120, by establishing the network connection using the QP of the dynamically selected type, as described herein. Computing device 104 may be implemented as, for example, a server, a client, an initiator network node that initiates the establishment of the network connection, and/or a target network node that receives the request from the initiator to establish the network connection.
Computing device 104 may be implemented as, for example, one or more of: a computing cloud, a single computing device (e.g., client terminal), a group of computing devices arranged in parallel, a network server, a local server, a remote server, a client terminal, a mobile device, a stationary device, a kiosk, a smartphone, a laptop, a tablet computer, a wearable computing device, a glasses computing device, a watch computing device, and a desktop computer.
Processor(s) 102 may be implemented as, for example, central processing unit(s) (CPU), graphics processing unit(s) (GPU), field programmable gate array(s) (FPGA), digital signal processor(s) (DSP), application specific integrated circuit(s) (ASIC), customized circuit(s), processors for interfacing with other units, and/or specialized hardware accelerators. Processor(s) 102 may be implemented as a single processor, a multi-core processor, and/or a cluster of processors arranged for parallel processing (which may include homogenous and/or heterogeneous processor architectures).
Memory 106A stores code instructions implementable by processor(s) 102A, and/or memory 106B stores code instructions implementable by processor(s) 102B of network interface device 114. Memory 106A-B is implemented as, for example, a random access memory (RAM), read-only memory (ROM), and/or a storage device, for example, non-volatile memory, magnetic media, semiconductor memory devices, hard drive, removable storage, and optical media (e.g., DVD, CD-ROM).
Memory 106A may store virtual machine manager (VMM) 108 that manages and/or runs one or more virtual machines (VM) 110. VMM 108 may be implemented as a hypervisor. VMM 108 may be implemented in hardware, software, firmware, and/or a combination of the aforementioned.
Each VM 110 executes one or more virtual function (VF) drivers 112.
Computing device 104 includes and/or is in communication with one or more network interface devices 114, optionally network interface cards and/or network adapters.
Network interface device 114 may include processor(s) 102B and memory 106B. Features of the methods described herein may be implemented by computing device 104 (e.g., processor(s) 102A executing code 105 stored in memory 106A) and/or by network interface device 114 (e.g., processor(s) 102B executing code 105 stored in memory 106B).
Computing device 104 may include and/or be in communication with one or more data storage devices 118. Data storage devices 118 may store, for example, the candidate QP types that may be selected. It is noted that code instructions may be selectively loaded from data storage device 118 into memory 106 for execution by processor(s) 102. Data storage device(s) 118 may be implemented as, for example, a memory, a local hard-drive, a removable storage unit, an optical disk, a storage device, and/or as a remote server and/or computing cloud (e.g., accessed via a network connection).
Computing device 104 may be in communication, via network interface device 114, with network 120, which may be, for example, the internet, a local area network, a virtual network, a wireless network, a cellular network, a local bus, a point to point link (e.g., wired), and/or combinations of the aforementioned.
Network interface device 114 may be associated with one or more physical function (PF) driver(s) 116. Network interface device 114 may be virtualized, for use by multiple VMs 110 via corresponding executed VF drivers 112. For example, different VMs 110 may access network 120 by VF driver(s) 112 used to access PF driver(s) 116 of the network interface.
Computing device 104 may include and/or be in communication with one or more physical user interfaces 122 that include a mechanism for user interaction, for example, to enter data (e.g., define set of rules for selection of the QP types) and/or to view data (e.g., view available resources and/or established connections).
Exemplary physical user interfaces 122 include, for example, one or more of, a touchscreen, a display, gesture activation devices, a keyboard, a mouse, voice activated software using speakers and microphone, and an orchestrator sending data over a network interface.
Referring now back to FIG. 2, multiple instances of computing device 104, which include at least processors 102A and/or 102B, transfer data to each other over network 120 with network connections using the selected QP type 106B. Various implementations of the multiple instances of computing device 104 are described below, for example, transfer of data between two instances of computing device 104 (e.g., one computing device is an initiator/local and the other is a target/remote), all-to-all communication where each computing device 104 communicates with all other computing devices 104, and single to multiple where one computing device 104 communicates with multiple other computing devices 104.
Referring now back to FIG. 3, at 302, a request to establish a network connection for data transfer across the network is received. The request may be initiated, for example, by an application running on a local computing device. In another example, the request may be initiated from a remote device (e.g., client, such as an initiating device) for establishment of the network connection for communication with a local device (e.g., server, such as a target device).
The request may be in response to, and/or may include, devices notifying each other that they support the capability of dynamic selection of QP type, as described herein. Optionally, a handshake is performed between the remote and local device (e.g., between the target and initiator device) to ensure that both devices support the dynamic selection of QP type, as described herein. The capability for dynamic selection of the QP type may be enabled and/or managed, for example, using a discovery capability process and/or negotiating protocol.
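The capability check described above can be illustrated with a minimal sketch. The bitmask encoding, the flag name CAP_DYNAMIC_QP_TYPE, and the Node class are assumptions made for illustration only; the disclosure does not define a wire format for the handshake.

```python
from dataclasses import dataclass

# Hypothetical capability flag; the encoding is an assumption for illustration.
CAP_DYNAMIC_QP_TYPE = 0x1


@dataclass
class Node:
    name: str
    capabilities: int  # bitmask of advertised capabilities (assumed encoding)


def negotiate_dynamic_qp(initiator: Node, target: Node) -> bool:
    """Return True only when both ends advertise dynamic QP type selection."""
    common = initiator.capabilities & target.capabilities
    return bool(common & CAP_DYNAMIC_QP_TYPE)


client = Node("initiator", CAP_DYNAMIC_QP_TYPE)
server = Node("target", CAP_DYNAMIC_QP_TYPE)
legacy = Node("legacy-target", 0)

print(negotiate_dynamic_qp(client, server))  # both sides advertise the capability
print(negotiate_dynamic_qp(client, legacy))  # one side does not
```

When the negotiation fails, one reasonable fallback is to establish the connection with the QP type named in the request.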
The request is received, for example, by one or more of: by the processing circuitry running the application, by the network interface card that established the network connection for connecting to the network, and/or by a hypervisor hosting the application.
The request may be to use a QP type defined by the Application for establishment of the network connection. The Application may select the QP type that is best for its own needs, without consideration of the available resources. The requesting Application may be unaware of the actual QP that is selected when the selected QP is different than the requested QP type.
Alternatively, the request defines a dynamic QP type for establishment of the network connection. The dynamic QP type indicates that the Application is aware that any candidate QP type may be selected, as described herein.
At 304, one or more resource-related parameters are analyzed. Each resource-related parameter is indicative of a respective resource related state of the network connection for establishment. Each resource-related parameter may be indicative of the real time state of one or more resources used to establish the network connection, and/or used to process the data transmitted over the established network connection. Resource-related parameters may indicate the current resource availability state of one or more components, for example, processor(s), memory, available QPs of the requested type, remotely connected nodes, and the network.
The parameters provide a picture of the existing resource usage, which enables selecting the QP type for the network connection that will make the most efficient use of the existing resources. Exemplary resource-related parameters include: number of network nodes the Application that provided the request is communicating with, number of existing network connections of the Application that provided the request, topology of existing network connections of the Application, total number of active QPs of network connections that were previously established, whether remaining QPs of the type of the request are available for allocations, transport reliability of the network, current memory utilization, and current utilization of the processing circuitry.
The analysis may be performed using one or more methods. For example, using a set of rules for the resource-related parameters that result in the selected type for the QP. The set of rules may be manually and/or automatically defined. The set of rules may be based on a prediction that following the set of rules may improve utilization of the available resources, and/or optimize the available resources. In another example, the analysis is performed using a trained resource classifier that receives the resource-related parameters as input and generates an outcome of the selected type for the QP. The resource classifier is trained on a training dataset of sample resource-related parameters and a label of a type of QP. In yet another example, optimization code uses a mathematical model and/or set of equations to compute the type for the QP that optimizes the resource-related parameters and/or that simulates the resource outcomes for different QP types.
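The rule-based variant of the analysis can be sketched as follows. The parameter names, the thresholds MAX_NODES_FOR_RC and HIGH_MEMORY, and the specific rules are hypothetical; they only illustrate how a set of rules may map resource-related parameters to a QP type, and are not rules taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class ResourceParams:
    remote_nodes: int               # nodes the Application communicates with
    active_qps: int                 # QPs of previously established connections
    requested_type_available: bool  # whether QPs of the requested type remain
    memory_utilization: float       # 0.0 to 1.0

# Illustrative thresholds only (assumptions, not values from the disclosure).
MAX_NODES_FOR_RC = 8
HIGH_MEMORY = 0.8


def select_qp_type(requested: str, p: ResourceParams) -> str:
    """Apply a small, hypothetical rule set to pick a QP type."""
    if requested == "RC":
        # No RC QPs left to allocate: fall back to another reliable type.
        if not p.requested_type_available:
            return "RD"
        # Many peers or memory pressure: prefer the multiplexed reliable type.
        if p.remote_nodes > MAX_NODES_FOR_RC or p.memory_utilization > HIGH_MEMORY:
            return "RD"
    return requested


light = ResourceParams(remote_nodes=2, active_qps=10,
                       requested_type_available=True, memory_utilization=0.3)
heavy = ResourceParams(remote_nodes=32, active_qps=900,
                       requested_type_available=True, memory_utilization=0.5)
print(select_qp_type("RC", light))
print(select_qp_type("RC", heavy))
```

A trained classifier or an optimization model, as mentioned in the text, would replace the body of select_qp_type while keeping the same inputs and output.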
At 306, the QP type is selected from multiple candidate QP types according to the analysis. The QP type may be defined for the RDMA and/or transport protocol. The QP types may be based on published definitions, and/or based on custom created QP types.
The QPs are created according to the selected type. The type of QP is specified when the QP is created.
Exemplary QP types include: Reliable Connection (RC), Reliable Datagram (RD), Extended Reliable Connection (XRC), Unreliable Datagram (UD), Unreliable Connection (UC), Scalable Reliable Datagram (SRD), and Dynamic Connection (DC), which are discussed below in additional detail.
Optionally, the selected QP type is within a same group as the requested QP type. The group may be a reliable group of reliable QP types, or an unreliable group of unreliable QP types. Within each group, the selected QP type may be of a different sub-type than the requested QP type. Maintaining the reliability or unreliability of the QP types according to the requested QP type for the network connection maintains compatibility with the applications and/or processes that use the data transferred over the network. For example, an application that expects reliable transfer of data receives reliably transferred data, and does not need to handle unexpected unreliably transferred data.
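The same-group constraint can be sketched as a filter over the candidate types. Assigning SRD and DC to the reliable group is an assumption made for this illustration; the text above only names the two groups.

```python
# Grouping is an assumption for illustration; only the existence of a reliable
# group and an unreliable group is stated in the text.
RELIABLE = {"RC", "RD", "XRC", "SRD", "DC"}
UNRELIABLE = {"UD", "UC"}


def same_group_candidates(requested: str, candidates: list) -> list:
    """Keep only candidates in the same reliability group as the request."""
    if requested in RELIABLE:
        group = RELIABLE
    elif requested in UNRELIABLE:
        group = UNRELIABLE
    else:
        raise ValueError(f"unknown QP type: {requested}")
    return [c for c in candidates if c in group]


print(same_group_candidates("RC", ["UD", "RD", "RC", "UC"]))
print(same_group_candidates("UC", ["UD", "RD", "RC", "UC"]))
```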
Optionally, the network connection is for the reliable type of QP, and the selected type of QP is of the unreliable type of QP with reliability provided by a reliability layer implemented by one of, or combination of: software, firmware, and hardware. Rather than using resource intensive reliable QP types, the requested reliability may be provided using the unreliable QP type with the reliability layer, which may improve resource utilization by using the unreliable QP type which uses fewer resources than the reliable QP type.
The request to establish the network connection may be for one type of QP, and another QP type that is different than the requested QP type is selected. The selection process may be performed in an implicit mode, where the request is ignored and the QP type providing best optimization of available resources is selected without the requesting Application and/or process being aware that the requested QP type has been changed.
The request to establish the network connection may be for an interchangeable QP type (IQT) that is dynamically selected. The dynamic QP type is not an actual QP type, but an indication for dynamic selection of the QP type, without specifying which specific QP type is requested for establishment of the network connection. The dynamic QP type indicates that the Application will handle any of the QP types that are actually selected from the candidate types (that exclude the dynamic type). The highest resource utilization efficiency may result when processes and/or applications use the dynamic QP type to indicate that any of the candidate types may be selected for the QP.
Each QP type may include a pair of queues, a send queue (SQ) and a receive queue (RQ). Message transfer requests may be posted to the SQ, for example, by the Application sending the data across the network. As each message is executed, SQ Logic transmits an outbound message transfer request to a remote QP’s RQ Logic, i.e., only untagged operations - SEND opcodes, since read and write messages do not pass through the RQ. Work Requests (WRs) may be posted to the RQ to handle certain types of inbound message transfer requests transmitted to the RQ Logic by a remote QP’s SQ Logic. Each computing device, i.e., network node, may implement multiple QPs of one or more types, for example, as many as 1 million possible QPs, each of which may be capable of sending messages to and receiving messages from one or more QPs in remote network nodes. Selection of the QP types may reduce the total number of QPs and/or reduce the total number of network connections and/or improve utilization of processing resources, as described herein.
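The send queue / receive queue pairing can be modeled in a few lines. This is a simplified software model for illustration only: the class QueuePair and the function deliver are hypothetical names, and real QPs are hardware-managed structures accessed through a verbs interface.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class QueuePair:
    qp_type: str
    send_queue: deque = field(default_factory=deque)     # posted SEND requests
    receive_queue: deque = field(default_factory=deque)  # posted receive WRs

    def post_send(self, message: bytes) -> None:
        self.send_queue.append(message)

    def post_recv(self, buffer_id: int) -> None:
        self.receive_queue.append(buffer_id)


def deliver(src: QueuePair, dst: QueuePair):
    """Move one SEND from src's SQ into a receive WR posted on dst's RQ."""
    if not src.send_queue or not dst.receive_queue:
        return None  # nothing to send, or no receive WR posted at the target
    message = src.send_queue.popleft()
    buffer_id = dst.receive_queue.popleft()
    return buffer_id, message


local = QueuePair("RC")
remote = QueuePair("RC")
remote.post_recv(7)
local.post_send(b"hello")
print(deliver(local, remote))
```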
At 308, the network connection is established with the selected type of the QP for data transfer across the network.
Before any messages may be transferred, a connection is established between the QPs of the two network nodes. The QP contexts of the two QPs may be each programmed with the identity of the remote QP as well as the address of the port behind which the remote QP lives. For the case of RoCE, the port is constant, and there are other fields to distinguish the application, for example:
1. RoCE RC - QP ID on the BTH header
2. RoCE RD - RDETH and DETH have EEC and source QP
The following is an exemplary process for establishing the network connection using the selected QP type, between a client which requests establishment of the network connection with a service in a remote network node, and a server which hosts the service provided to the client. The client sends to the server a REQ message with an indication (e.g., ServiceID) of the service to establish the network connection with. The server verifies that the service exists, and then creates a local QP and/or an EEC and a QP, and sends back to the requesting client a message with information about the created QP and/or EEC. At this point, the new local QP and/or EEC is in the ready to receive (RTR) state, i.e., ready to receive messages, but it cannot send messages until the client’s QP’s setup is complete. The client receives the message, uses the information in the message to complete the setup of its local QP and/or EEC, and transitions its local QP and/or EEC to the ready to send (RTS) state. The client sends a message to the server to transition its QP and/or EEC to the RTS state. Upon receipt of the message, the server transitions its local QP and/or EEC to the RTS state. It should be noted that the server’s local QP and/or EEC will automatically transition from the RTR to the RTS state (even when the message from the client has not been received) when the first packet sent by the remote QP and/or EEC’s Send Logic is received.
At 310, one or more features described with reference to 304-308 are iterated. The iterations may be performed for monitoring the resource-related parameters, to detect significant changes that trigger a re-analysis of the resource-related parameters, for example, increased noise in the network, new connections to new client terminals, reduction in available memory and/or reduction in processor capacity. The iterations may be performed without being necessarily triggered by the monitoring.
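As an illustration of the client/server establishment exchange described for 308, the following sketch walks two endpoints through the RTR and RTS states. The Endpoint class, the dictionary-shaped messages, and carrying the client's QP number in the REQ are assumptions for illustration; the INIT state is also added here as a starting point and is not named in the text.

```python
from enum import Enum


class QPState(Enum):
    INIT = "INIT"
    RTR = "RTR"  # ready to receive
    RTS = "RTS"  # ready to send


class Endpoint:
    def __init__(self, name, qp_num, services=()):
        self.name = name
        self.qp_num = qp_num          # hypothetical local QP identifier
        self.services = set(services)
        self.remote_qp_num = None
        self.state = QPState.INIT


def establish(client, server, service_id):
    """Walk both endpoints through the REQ / reply / ready exchange."""
    trace = []

    # 1. Client -> server: REQ naming the service (and, by assumption, its QP).
    req = {"service_id": service_id, "qp_num": client.qp_num}
    if req["service_id"] not in server.services:
        raise LookupError(f"unknown service: {service_id}")

    # 2. Server creates its QP, records the remote QP, and can now receive.
    server.remote_qp_num = req["qp_num"]
    server.state = QPState.RTR
    trace.append((server.name, server.state.value))

    # 3. Server -> client: reply describing the created QP.
    reply = {"qp_num": server.qp_num}

    # 4. Client completes its setup and becomes ready to send.
    client.remote_qp_num = reply["qp_num"]
    client.state = QPState.RTS
    trace.append((client.name, client.state.value))

    # 5. Client -> server: ready message; the server may now send as well.
    server.state = QPState.RTS
    trace.append((server.name, server.state.value))
    return trace


client = Endpoint("client", qp_num=17)
server = Endpoint("server", qp_num=42, services={"storage"})
print(establish(client, server, "storage"))
```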
The resource-related parameters may be re-analyzed, for example, at defined time intervals (e.g., every minute, every 5 minutes, every 30 minutes, every hour, and other time intervals), and/or while data is transferred over the network and/or when the network connection is active but no data is being transferred, and/or every time a new network connection is being brought up.
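The re-analysis triggers can be sketched as a single predicate that combines a periodic interval, a new connection request, and a detected change in the monitored parameters. The 20% relative-change threshold and the parameter names are assumptions for illustration; the disclosure does not quantify what counts as a significant change.

```python
def significant_change(previous: dict, current: dict, threshold: float = 0.2) -> bool:
    """True when any monitored parameter moved by more than `threshold` (relative)."""
    for name, old in previous.items():
        new = current.get(name, old)
        baseline = max(abs(old), 1e-9)  # avoid division by zero
        if abs(new - old) / baseline > threshold:
            return True
    return False


def should_reanalyze(now_s, last_s, interval_s, previous, current,
                     new_connection=False) -> bool:
    """Decide whether the resource-related parameters should be re-analyzed."""
    if new_connection:
        return True
    if significant_change(previous, current):
        return True
    return now_s - last_s >= interval_s


before = {"memory_utilization": 0.5, "connections": 10}
after = {"memory_utilization": 0.7, "connections": 10}
print(should_reanalyze(30, 0, 60, before, before))  # nothing changed, interval not reached
print(should_reanalyze(30, 0, 60, before, after))   # memory utilization jumped
```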
When the resource-related parameters change, the QP types of the existing connection may be dynamically changed during transfer of data over the network using the established network connection, to provide a different QP type that is expected to improve utilization of resources.
As in 304, the resource-related parameters may be re-analyzed while the established network connection is active, optionally during data transfer over the established network connection and/or when no data transfer is taking place. As in 306, another QP type is re-selected from the candidate types according to the re-analysis. As in 308, another new network connection may be established with the re-selected other type of the QP. The previously established network connection may be dynamically migrated to the new network connection, and/or the transfer of data over the previously established network connection may be reinstated to be transferred over the new network connection.
At 312, one or more features described with reference to 302-308 are iterated. The iterations may be performed for processing of new requests to establish new network connections, for example, by the same Application, and/or by a different Application.
Following the iterations of 302-308, in 314, data traffic on the previously established network connection may be migrated to the new network connection, and multiplexed with other data traffic designated for the new network connection. In such a case, the previously established network connection may be terminated once the data traffic has been migrated to the new network connection. Alternatively, following the iterations of 302-308, in 316, data designated for the new network connection is migrated to the previously established network connection, and multiplexed with other data traffic being transferred over the previously established network connection. In such a case, the new network connection (as in 308) is not necessarily established.
At 314, data traffic on the previously established network connection may be migrated to the new network connection, in response to an iteration of 302-308. Two network connections to transfer two sets of data over the network may be merged into a single network connection. The merged second network connection may use fewer resources than the two independent network connections.
As in 302, an additional request to establish another network connection for data transfer across the network is received, from the same application that issued the previous request, and/or from a different application. As in 304, an additional analysis of the resource-related parameters is performed. The additional analysis of the resource-related parameters may be performed while the previously established network connection is active, optionally transferring data. The additional analysis may represent the resource impact of the previously established network connection and/or for establishment of the additional network connection, optionally considering the resource impact of a combination of the previously established and requested network connections. As in 306, another QP type is selected according to the additional analysis. As in 308, the additional network connection with the newly selected type of QP is established. Now, in 314, the previously established network connection is migrated to the newly established (i.e., additional) network connection. The newly established (i.e., additional) network connection transfers data across the network for the previous network connection and for the newly established network connection, for example, by multiplexing the two data streams.
Two independent network connections to transfer two sets of data over the network requested by two different applications may be merged into a single network connection that is used by both applications. The merged second network connection may use fewer resources than the two independent network connections. The previously established network connection may have been established in response to a request from a first Application. The new request may be provided by a second Application, which is different from the first Application. The newly established network connection transfers data over the network for the first application and the second application. Optionally, the dynamic migration from the previously established network connection to the newly established network connection is performed by an exemplary process that stalls the current network traffic over the previously established network connection, for performing the migration transparently, without significant interruption, and/or without the Application associated with the current network traffic being aware of the migration.
The dynamic migration may be performed using the following exemplary process: network traffic on the previously established network connection is stalled. Acknowledge messages are received. The acknowledge messages indicate that packets transmitted over the previously established network connection prior to the stalling have been received by a device at another end of the previously established network connection. The acknowledge messages indicate that the previously established network connection is “empty”, i.e., no packets are currently traversing the previously established network connection. Additional network traffic destined for the previously established network connection is redirected to the additional (i.e., newly established) network connection, that uses the additional (i.e., newly) selected QP type.
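The stall, drain, and redirect sequence can be sketched as follows. The Connection and Migrator classes are hypothetical software models; holding traffic in a list while the old connection drains is one possible way to realize the stall, and is an assumption of this sketch.

```python
class Connection:
    def __init__(self, qp_type):
        self.qp_type = qp_type
        self.unacked = set()  # sequence numbers sent but not yet acknowledged
        self.sent = []        # payloads in transmission order
        self.stalled = False
        self._next_psn = 0

    def send(self, payload):
        psn = self._next_psn
        self._next_psn += 1
        self.unacked.add(psn)
        self.sent.append(payload)
        return psn

    def drained(self):
        return not self.unacked  # "empty": no packets still in flight


class Migrator:
    def __init__(self, old, new):
        self.old, self.new = old, new
        self.held = []  # traffic held back while the old connection drains

    def begin(self):
        self.old.stalled = True  # step 1: stall traffic on the old connection

    def on_ack(self, psn):
        self.old.unacked.discard(psn)  # step 2: collect acknowledgements
        if self.old.stalled and self.old.drained():
            self._flush()

    def submit(self, payload):
        if not self.old.stalled:
            self.old.send(payload)
            return "old"
        if not self.old.drained():
            self.held.append(payload)
            return "held"
        self._flush()
        self.new.send(payload)  # step 3: redirect to the new connection
        return "new"

    def _flush(self):
        while self.held:
            self.new.send(self.held.pop(0))


old, new = Connection("RC"), Connection("RD")
m = Migrator(old, new)
print(m.submit(b"a"))  # still on the old connection
m.begin()
print(m.submit(b"b"))  # held while the old connection drains
m.on_ack(0)            # last outstanding packet acknowledged
print(m.submit(b"c"))  # redirected to the new connection
```

Because held payloads are flushed before any later payload is sent, the order seen on the new connection matches the order in which the application submitted the data.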
The previously established network connection may be terminated when the data traffic destined for the previously established network connection has been redirected to the additional (i.e., newly established) network connection. Optionally, the previously established network connection may be terminated in response to receiving an indication that at least one first packet of the re-routed network traffic has passed over the additional network connection, reaching the device at the other end of the additional network connection. Resources tied up by the previously established network connection are freed by the termination of the previously established network connection.
At 316, data traffic destined for a new network connection may be initialized on the previously established network connection, in response to an iteration of 302-308. When a network connection with the selected second QP type already exists, the network traffic destined for a newly established network connection may be added to the existing network connection. Resources may be saved by using the already existing network connection, rather than adding another network connection.
As in 302, an additional request to establish another network connection for data transfer across the network is received, from the same application that issued the previous request, and/or from a different application. As in 304, an additional analysis of the resource-related parameters is performed. The additional analysis of the resource-related parameters may be performed while the previously established network connection is active, optionally transferring data. The additional analysis may represent the resource impact of the previously established network connection and/or for establishment of the additional network connection, optionally considering the resource impact of a combination of the previously established and requested network connections. As in 306, another QP type is selected according to the additional analysis. It is noted that in 308, another new network connection with the newly selected QP type is not established. Rather, a previously established network connection that uses the QP type that is the same as the newly selected QP type is identified. The previously established network connection has been previously established in response to a previous request, before the new request to establish the additional network connection has been received. Now, in 316, the previously established network connection is used for transfer of data across the network for both network traffic originally destined for the previously established network connection, and for network traffic destined for the additional network connection (which is not independently established), for example, by multiplexing both data streams. The new data stream may be initialized on the previously established network connection.
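The reuse path of 316 can be sketched as a lookup keyed by remote node and selected QP type: when a matching connection already exists, the new data stream is attached to it instead of establishing another connection. The ConnectionTable class and its dictionary-shaped records are hypothetical and serve only to illustrate the lookup.

```python
class ConnectionTable:
    def __init__(self):
        self._by_key = {}  # (remote node, QP type) -> connection record

    def attach(self, remote: str, qp_type: str, stream_id: str):
        """Attach a stream to an existing matching connection, or create one."""
        key = (remote, qp_type)
        conn = self._by_key.get(key)
        created = conn is None
        if created:
            conn = {"remote": remote, "qp_type": qp_type, "streams": []}
            self._by_key[key] = conn
        conn["streams"].append(stream_id)  # streams are multiplexed on one connection
        return conn, created

    def __len__(self):
        return len(self._by_key)


table = ConnectionTable()
first, created_first = table.attach("nodeB", "RD", "app1")
second, created_second = table.attach("nodeB", "RD", "app2")
print(created_first, created_second, len(table), second["streams"])
```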
An example based on the method described with reference to FIG. 3 is now provided.
As in 302, during the connection establishment (e.g., RDMA CM) the devices (e.g., target and initiator, server and client) may handshake and/or notify whether they support the dynamic QP type selection capability.
As in 304, resource-related parameters are analyzed.
As in 306, a first RC QP is selected.
As in 308, the selected first RC QP and first network connection are created on both devices. The RC QP traffic starts.
As in 312, another iteration of 302-308 is performed.
As in 302, an additional request is received. The request may be to establish a network connection using RC QP, or no specific QP is specified (e.g., the dynamic QP type is requested). As in 304, an additional analysis of the current state of the resource-related parameters is performed.
As in 306, the RD QP type is selected, for example, based on the analysis that determines that the RD QP type saves resources over another QP type such as the RC QP type. The RD QP type may be selected even when the RC QP type is requested.
As in 308, an RD QP and an EEC (End to End Context) are created, which are used by the second network connection for transfer of traffic across the network.
It is noted that now there exist two QPs in total on each host: one is RC and another is RD.
The traffic of the RD QP starts for the first EE (End to End).
As in 314, data traffic over the first established network connection that uses the RC QP type is migrated to the additional network connection established with the RD QP type. Additional data starts transmitting over the new selected RD QP type after creating the EEC on the remote device.
The network interface device and/or hypervisor stalls the traffic over the previously established network connection using the RC QP for a short period of time to “drain the pipe”, i.e., make sure that no packets are being transmitted over the previously established network connection. In response to receiving acknowledgement messages (ACKs) for all packets that were sent over the previously established network connection, the WQEs are posted on the RD QP of the additional (i.e., newly established) network connection rather than on the RC QP of the previously established network connection.
Now there exist four QPs in total: one RC QP and one RD QP on each side. Once the first packet has passed over the newly established network connection that uses RD QP, the previously established network connection that uses RC QP is destroyed. Now there exist two QPs in total: one RD QP on the server and one RD QP on the client. Selecting the RD QP type improves resource utilization over the case of using only RC QPs, where there would have been four QPs in total and a higher memory footprint in comparison to the selection of the RD QP and migration from the previous RC QP to the newly established RD QP.
Reference is now made to FIGs. 4A-4F, which are schematics depicting exemplary QP types for selection, in accordance with some embodiments.
FIG. 4A depicts a reliable connection (RC) QP type. RC is usually selected for applications that require the highest quality of service for data transmission over a network, for example, mission critical applications. RC provides the highest level of reliability and predictability available. However, due to the generation of ACKs (a positive acknowledge packet that is returned to signal the successful receipt and processing of a send or RDMA write request packet) and NAKs (a negative acknowledgement packet that is returned to signal one of the following: a temporary receiver not ready condition, a PSN sequence error NAK, and a fatal NAK error), the RC protocol consumes a substantial amount of bandwidth.
In an RC implementation a local QP 402 on a local host 404 is associated with exactly one remote QP 406 on a remote host 408, forming a dedicated channel between the QPs for transmission of data 410. The hardware protocol between QPs 402 and 406 provides reliable transport. Reliable transport is provided by detecting missing, corrupted or invalid packets, automatically suspending further activity between the two QPs, resending the lost/faulty packets, and then resuming operation. Every queue operation is acknowledged 412 and every operation completes exactly once, and in the same order it was issued. The maximum message size is not limited by packet size or by the maximum transfer unit (MTU) size defined for the channel. Segmentation and re-assembly of long messages happens in hardware and is transparent to the application. RC supports all InfiniBand™ services: SEND/RECEIVE, RDMA-Read, RDMA-Write and Atomic.
FIG. 4B depicts an unreliable connection (UC) QP type. Since the target QP’s RQ Logic does not generate ACKs and NAKs, the UC protocol consumes significantly less bandwidth than the RC protocol.
The setup of the Unreliable Connection is similar to the Reliable Connection described with reference to FIG. 4A, including a dedicated channel between local and remote QPs 402 and 406 of local and remote hosts 404 and 408 for transmission of data 410. However, there is no acknowledgement provided. Send queue operations are marked as complete as soon as the QP transmits them. A missing or erroneous message is not automatically retried. It is dropped and the QP does not provide the sender with any indication as to whether the message was successfully delivered or not. Like RC service, operations complete in order, and the maximum size of a message is not limited by the packet size or path MTU. SEND/RECEIVE and RDMA-Write are supported, but RDMA-Read and Atomic Operations are not. UC provides efficient communication for some applications, like streaming, for which missing data is not critical.
FIG. 4C depicts an Unreliable Datagram (UD) QP type. A UD QP can send messages to and receive messages from any number of UD QPs located in one or more other network nodes. No ACK or NAK is returned for each request packet received. Since the target QP’s Logic does not generate ACKs and NAKs, the UD protocol consumes significantly less bandwidth than the RC and RD protocols.
Unreliable Datagram is a connectionless service. A UD QP can potentially send data to any other UD QP in the system. A QP 402 on an initiating host 404 sends Data 410A-C to any of QPs 406A-C on target hosts 408A-C. The sending QP must send a key, called a Q Key, which must match the receiver’s Q Key, or the message is dropped. This prevents writing to unintended locations. The only valid operation is SEND/RECEIVE. A QP configured for Unreliable Datagram service cannot detect lost data, data received out of order or data received multiple times. In all these cases, the message to which this data belongs is said to be received in error, and is silently dropped. The maximum message size is limited by the largest packet size (MTU) supported by the path (256 bytes to 4K bytes). If reliability is needed, it is provided by the upper layer software protocols.
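The Q Key check and the MTU limit described above can be sketched as follows. This is an illustrative software model under stated assumptions, not a real verbs interface: the class `UdQueuePair`, the function `ud_send` and the default MTU of 4096 bytes are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class UdQueuePair:
    """Hypothetical model of a receiving UD QP."""
    q_key: int
    received: list = field(default_factory=list)


def ud_send(target: UdQueuePair, q_key: int, payload: bytes,
            path_mtu: int = 4096) -> None:
    """Send one datagram; the sender learns nothing about delivery."""
    if len(payload) > path_mtu:
        # A UD message must fit in a single packet.
        raise ValueError("UD message exceeds path MTU")
    if q_key != target.q_key:
        return  # Q Key mismatch: silently dropped, no NAK is generated
    target.received.append(payload)
```

Because `ud_send` returns nothing in either case, any reliability has to be added by an upper layer, as noted above.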
FIG. 4D depicts a Reliable Datagram (RD) QP type. An RD QP can send messages to and receive messages from any number of RD QPs located in one or more other network nodes. It does so through one or more “pipelines” that are established between the local network node and one or more remote network nodes. Each “pipeline” is referred to as a Reliable Datagram Channel (RDC) and acts as the conduit through which multiple local client RD QPs send messages to and receive messages from RD QPs residing in the remote network node. Due to the generation of ACKs and NAKs, the RD protocol consumes a substantial amount of bandwidth.
The Reliable Datagram service combines the features of RC and UD service. In particular, it allows the same QP to interact with multiple remote QPs simultaneously, while providing reliability. A QP 402 on an initiating host 404 sends Data 410A-C to any of QPs 406A-C on target hosts 408A-C, and receives acknowledgements (ACKs) 412A-C. Essentially RD provides a multiplexed reliable connection channel. The QP 402 is logically associated with a set of remote RD QPs 406A-C. This service is most useful to an application having a number of different processes running on each node that need to communicate with each other in a reliable manner. Operations are acknowledged, complete exactly once, complete in order, and are automatically retried on error. All InfiniBand™ services are supported, i.e. SEND/RECEIVE, RDMA-Write, RDMA-Read, and Atomic Operations. The endpoint of a Reliable Datagram channel is called EEC (End to End Context), EEC 1 to EEC 6 as shown. A reliable datagram domain (RDD) determines which sets of RD QPs can access which sets of EECs. Each EEC is shared by all Reliable Datagram QPs for that RDD.
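The properties of the RC, UC, UD and RD services described with reference to FIGs. 4A-4D can be summarized as a lookup table. The sketch below is one possible encoding; the names `Op`, `SERVICES`, `supports` and `candidates` are assumptions introduced for illustration.

```python
from enum import Flag, auto


class Op(Flag):
    SEND_RECEIVE = auto()
    RDMA_WRITE = auto()
    RDMA_READ = auto()
    ATOMIC = auto()


ALL_OPS = Op.SEND_RECEIVE | Op.RDMA_WRITE | Op.RDMA_READ | Op.ATOMIC

# Supported operations and delivery semantics, as stated in the description.
SERVICES = {
    "RC": {"ops": ALL_OPS, "reliable": True, "connected": True},
    "UC": {"ops": Op.SEND_RECEIVE | Op.RDMA_WRITE, "reliable": False, "connected": True},
    "UD": {"ops": Op.SEND_RECEIVE, "reliable": False, "connected": False},
    "RD": {"ops": ALL_OPS, "reliable": True, "connected": False},
}


def supports(qp_type: str, op: Op) -> bool:
    """True if the given service offers the operation."""
    return bool(SERVICES[qp_type]["ops"] & op)


def candidates(required_ops: Op, need_reliable: bool) -> list[str]:
    """Services that offer all required operations and, if asked, reliability."""
    return [
        name for name, svc in SERVICES.items()
        if (svc["ops"] & required_ops) == required_ops
        and (svc["reliable"] or not need_reliable)
    ]
```

A table of this kind is one way to narrow the candidate QP types before any resource-related analysis is applied.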
FIG. 4E depicts an Extended Reliable Connection (XRC) QP type. XRC will be described in the environment of data 410 and acknowledgements (ACK) 412 transmitted between a local QP 402 and a remote QP 406 of respective local host 404 and remote host 408. XRC allows significant savings in the number of QPs required to establish all-to-all process connectivity in large clusters. XRC is different from RD in several ways, but first and foremost it eliminates the most significant limitation of the RD Transport Service: having a single outstanding message per EE context. The savings in the overall number of required QPs occur because of the way XRC operates on the responder side. The responder connection context (denoted as XRC TGT QP) allows the requester process to send messages targeting multiple destination XRC SRQs 414, which belong to multiple processes on the responder's node. Thus, with a single (XRC INI) QP, a process in one node can communicate with all processes on a remote node, reducing by a factor of p (the number of processes per node) the total number of QPs required for full connectivity, compared to when RC QPs are used. XRC SRQs 414 are the responder node per-process receive queues that can be targeted from multiple remote end nodes through the XRC TGT QPs. They are in a way equivalent to the receive queues in RD QPs, and as such only one is required per process to allow it to receive messages from any process on any node in the cluster. In a similar way as RD QPs are limited to be used with RD EE contexts in their same Reliable Datagram Domain (RDD), the XRC Transport Service implements an equivalent XRC Domain mechanism that serves the same purpose. XRC TGT QPs can only be used as a conduit to access XRC SRQs that were set up on their same XRC Domain.
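The factor-of-p saving can be checked with simple arithmetic. The sketch below assumes a cluster of N nodes with p processes per node, full all-to-all connectivity, and counts requester-side QPs only (XRC TGT QPs and SRQs are not counted); the function names are hypothetical.

```python
def rc_qps_per_process(nodes: int, procs_per_node: int) -> int:
    """RC needs one QP per remote process: (N - 1) * p."""
    return (nodes - 1) * procs_per_node


def xrc_ini_qps_per_process(nodes: int) -> int:
    """XRC needs one XRC INI QP per remote node: N - 1."""
    return nodes - 1


def savings_factor(nodes: int, procs_per_node: int) -> float:
    """Ratio of RC QPs to XRC INI QPs per process; equals p."""
    if nodes < 2:
        raise ValueError("at least two nodes are required")
    return rc_qps_per_process(nodes, procs_per_node) / xrc_ini_qps_per_process(nodes)
```

For example, with 64 nodes and 16 processes per node, each process needs 63 × 16 = 1008 RC QPs but only 63 XRC INI QPs under these assumptions.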
FIG. 4F depicts a Dynamically Connected (DC) QP type. A QP 402 on an initiating host 404 sends Data 410A-C to any of QPs 406A-C on target hosts 408A-C, and receives acknowledgements (ACKs) 412A-C. DC is a scalable Transport Service, which reduces the number of QPs per node compared to RC. DC has RC-like reliability semantics and a symmetric API. On the responder's side there is a DC Target, or DCT 416, where one is enough. On the requester's side there is a DC Initiator, or DCI 418, where one is enough. DC forms “temporary connections”. The first send-WR on a DCI connects this DCI to a remote DCT. The second send-WR uses this open connection. The DCI disconnects after some idle period without sends. A DCI can “switch destinations” if the next send-WR has a different destination specified. A DCT has a pool of “responders” (DCRs). Each incoming DC connection is allocated a DCR. DCI recycling involves tradeoffs. With too few DCIs, the same DCI switches back and forth between destinations, causing redundant connect/disconnect flows (in the worst case, one per send), which hurts latency. With too many DCIs, the situation is still not as bad as N^2 RC QPs, but resources are consumed and caching suffers. Best practice is to maintain a <DCI, dest> hash table, which reduces connection re-establishment, together with an LRU recycling policy, which increases the odds of picking a disconnected DCI to send on.
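The DCI recycling practice described above (a destination-to-DCI hash table combined with LRU recycling) can be sketched as follows. This is a simplified software model: the class `DciPool`, its `acquire` method and the `connects` counter are hypothetical, and idle-timeout disconnects are not modelled.

```python
from collections import OrderedDict


class DciPool:
    """Hypothetical pool of DCIs keyed by destination, recycled in LRU order."""

    def __init__(self, size: int) -> None:
        if size < 1:
            raise ValueError("pool needs at least one DCI")
        self._free = list(range(size))   # DCI ids that were never connected
        self._by_dest = OrderedDict()    # dest -> DCI id, least recently used first
        self.connects = 0                # number of connect flows triggered

    def acquire(self, dest: str) -> int:
        """Return the DCI to use for a send to dest."""
        if dest in self._by_dest:
            # Hash-table hit: reuse the open connection, no reconnect needed.
            self._by_dest.move_to_end(dest)
            return self._by_dest[dest]
        if self._free:
            dci = self._free.pop()
        else:
            # Recycle the least recently used DCI; it is the one most likely
            # to have already disconnected after its idle period.
            _, dci = self._by_dest.popitem(last=False)
        self._by_dest[dest] = dci
        self.connects += 1
        return dci
```

With the send pattern a, b, a, b, c, a, a pool of one DCI reconnects on every send (six connects), a pool of two reconnects four times, and a pool of three reconnects only three times, which illustrates the "too few DCIs" tradeoff.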
Reference is now made to FIGs. 5-7 which are schematics depicting a comparison of dynamically selecting the QP type with standard approaches, in accordance with some embodiments.
FIG. 5 depicts a process of establishing RC connections according to standard approaches. Schematic 502A depicts establishment of a first RC network connection 504A (for reliable transfer of Data by providing ACK) between a server host 506 and a client host 508 using RC QP type 510A.
Schematic 502B depicts establishment of a second RC network connection 504B between server 506 and client 508 using RC QP type 510B. First RC connection 504A connects between Process A 512 running on server 506 and Process C 514 running on client 508. Second RC connection 504B connects between Process A 512 running on server 506 and Process D 516 running on client 508.
FIG. 6 depicts an example of a process of establishing connections using embodiments described herein that dynamically select the QP type. Schematic 602A depicts establishment of a first RC network connection 604A (for reliable transfer of Data by providing ACK) between a server host 606 and a client host 608 using RC QP type 610A. It is noted that the process depicted in schematic 602A may be the same as the standard approach depicted in schematic 502A of FIG. 5.
Schematic 602B depicts the response to a request from client 608 to add a second RC connection to server 606. First RC connection 604A is between Process A 612 running on server 606 and Process C 614 running on client 608. The request is for a second RC connection between Process A 612 (i.e., the same as for the first RC connection) running on server 606 and Process D 616 running on client 608 (i.e., different than for the first RC connection). In response to receiving the request to establish the second RC connection, the QP type is changed from RC type 610A to RD type 610B, both on the server 606 and on the client 608. The RD QP type may be selected, for example, to save hardware resources and/or when no more RC QPs are available.
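The behaviour illustrated in FIG. 6 can be expressed as a small set of rules. The sketch below is one possible rule set only; the disclosure does not fix particular thresholds, and the names `ResourceState`, `select_qp_type` and the chosen parameters are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceState:
    """Hypothetical subset of resource-related parameters."""
    requested_type: str                # QP type named in the request
    existing_connections_to_peer: int  # connections already open to the same host
    free_rc_qps: int                   # RC QPs still available


def select_qp_type(state: ResourceState, shared_type: str = "RD") -> str:
    """Pick the QP type for a new connection request."""
    if state.requested_type != "RC":
        return state.requested_type  # this sketch only re-maps RC requests
    if state.free_rc_qps == 0 or state.existing_connections_to_peer >= 1:
        # Share one multiplexed QP instead of opening another RC QP.
        return shared_type
    return "RC"
```

Under these rules the first request is served with RC, while a second request towards the same host yields RD; passing `shared_type="DC"` would select a DC QP instead.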
FIG. 7 depicts another example of a process of establishing connections using embodiments described herein that dynamically select the QP type. Schematic 702A depicts establishment of a first RC network connection 704A (for reliable transfer of data by providing ACK) between a server host 706 and a client host 708 using RC QP type 710A. It is noted that the process depicted in schematic 702A may be the same as the standard approach depicted in schematic 502A of FIG. 5 and/or schematic 602A of FIG. 6.
Schematic 702B depicts the response to a request from client 708 to add a second RC connection to server 706. Using standard approaches, a second RC network connection would be established between server 706 and client 708. However, according to at least one embodiment described herein, a DC QP type 710B is dynamically selected and used to establish a network connection 704B between server 706 and client 708. Previous RC QP type 710A used to establish network connection 704A is migrated to network connection 704B established using DC QP 710B. The DC QP type may be selected, for example, to save hardware resources and/or when no more RC QPs are available.
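The migration of an existing connection to a newly established one follows the steps recited in claims 5 and 6: stall traffic on the first connection, wait for acknowledgements of packets already sent, route further traffic over the second connection, and terminate the first connection once traffic is confirmed on the second. The sketch below is a minimal software model of that sequence; the classes `Connection` and `MigratingChannel` are hypothetical, and an acknowledgement on the second connection is assumed to serve as the indication that a packet has passed over it.

```python
class Connection:
    """Hypothetical connection that tracks unacknowledged packets."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.sent = []
        self.unacked = set()
        self.open = True
        self._next_seq = 0

    def send(self, payload: bytes) -> int:
        seq = self._next_seq
        self._next_seq += 1
        self.sent.append(payload)
        self.unacked.add(seq)
        return seq

    def ack(self, seq: int) -> None:
        self.unacked.discard(seq)


class MigratingChannel:
    def __init__(self, first: Connection) -> None:
        self.first = first
        self.second = None     # set when migration starts
        self.switched = False
        self.held = []         # traffic held back while stalled

    def send(self, payload: bytes) -> None:
        if self.second is None:
            self.first.send(payload)
        elif not self.switched:
            self.held.append(payload)   # first connection is stalled
        else:
            self.second.send(payload)

    def start_migration(self, second: Connection) -> None:
        self.second = second
        self._maybe_switch()

    def ack_first(self, seq: int) -> None:
        self.first.ack(seq)
        if self.second is not None:
            self._maybe_switch()

    def ack_second(self, seq: int) -> None:
        if self.second is None:
            raise RuntimeError("migration has not started")
        self.second.ack(seq)
        self.first.open = False   # traffic confirmed on the new connection

    def _maybe_switch(self) -> None:
        # Switch only once everything sent before the stall is acknowledged.
        if not self.switched and not self.first.unacked:
            self.switched = True
            for payload in self.held:
                self.second.send(payload)
            self.held.clear()
```

Holding traffic until the first connection has drained keeps packets from the two connections from interleaving, so ordering is preserved across the switch in this model.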
Other systems, methods, features, and advantages of the present disclosure will be or become apparent to one with skill in the art upon examination of the following drawings and detailed description. It is intended that all such additional systems, methods, features, and advantages be included within this description, be within the scope of the present disclosure, and be protected by the accompanying claims. The descriptions of the various embodiments of the present disclosure have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
It is expected that during the life of a patent maturing from this application many relevant QP types will be developed and the scope of the term QP type is intended to include all such new technologies a priori.
As used herein the term “about” refers to ± 10 %.
The terms "comprises", "comprising", "includes", "including", "having" and their conjugates mean "including but not limited to". This term encompasses the terms "consisting of" and "consisting essentially of".
The phrase "consisting essentially of" means that the composition or method may include additional ingredients and/or steps, but only if the additional ingredients and/or steps do not materially alter the basic and novel characteristics of the claimed composition or method.
As used herein, the singular forms "a", "an" and "the" include plural references unless the context clearly dictates otherwise. For example, the term "a compound" or "at least one compound" may include a plurality of compounds, including mixtures thereof.
The word “exemplary” is used herein to mean “serving as an example, instance or illustration”. Any embodiment described as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments and/or to exclude the incorporation of features from other embodiments.
The word “optionally” is used herein to mean “is provided in some embodiments and not provided in other embodiments”. Any particular embodiment of the disclosure may include a plurality of “optional” features unless such features conflict. Throughout this application, various embodiments of this disclosure may be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the disclosure. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
Whenever a numerical range is indicated herein, it is meant to include any cited numeral (fractional or integral) within the indicated range. The phrases “ranging/ranges between” a first indicated number and a second indicated number and “ranging/ranges from” a first indicated number “to” a second indicated number are used herein interchangeably and are meant to include the first and second indicated numbers and all the fractional and integral numerals therebetween.
It is appreciated that certain features of the disclosure, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the disclosure, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination or as suitable in any other described embodiment of the disclosure. Certain features described in the context of various embodiments are not to be considered essential features of those embodiments, unless the embodiment is inoperative without those elements.
All publications, patents and patent applications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present disclosure. To the extent that section headings are used, they should not be construed as necessarily limiting.

Claims

WHAT IS CLAIMED IS:
1. Processing circuitry (102A, 102B) for selecting a type of a queue pair, QP, for data transfer across a network (120), the processing circuitry configured for: receiving a request to establish a network connection for data transfer across the network; analyzing a plurality of resource-related parameters each indicative of a respective resource related state of the network connection for establishment; selecting the type for the QP from a plurality of candidate types according to the analysis; and establishing the network connection with the selected type of the QP for data transfer across the network.
2. The processing circuitry of claim 1, further configured for: re-analyzing the plurality of resource-related parameters during data transfer over the established network connection; re-selecting another type of the QP from the plurality of candidate types according to the re-analysis; establishing a second network connection with the re-selected other type of the QP; and at least one of: dynamically transferring the network connection to the second network connection, and reinitiating the transfer of data of the network connection to be transferred over the second network connection.
3. The processing circuitry of any of the previous claims, further configured for: wherein the network connection comprises a first network connection, receiving a second request to establish a second network connection for data transfer across the network; conducting a second analysis of the plurality of resource-related parameters each indicative of a respective resource related state of the established first network connection and for establishment of the second network connection; selecting a second type of the QP from the plurality of candidate types according to the second analysis; establishing the second network connection with the second type of QP; dynamically migrating the first network connection to the second network connection, wherein the second network connection transfers data across the network for the first network connection and for the second network connection.
4. The processing circuitry of claim 3, wherein the request is provided by a first application, the second request is provided by a second application, and the second network connection transfers data over the network for the first application and the second application.
5. The processing circuitry of claim 3 or claim 4, wherein dynamically migrating comprises: stalling network traffic on the first network connection; receiving acknowledge messages that packets transmitted over the first network connection prior to the stalling have been received by a device at another end of the first network connection; and using the second type of QP of the second network connection for additional network traffic destined for the first network connection.
6. The processing circuitry of claim 5, further configured for: in response to receiving an indication that at least one first packet of the additional network traffic has passed over the second network connection, terminating the first network connection.
7. The processing circuitry of any of the previous claims, further configured for: receiving a third request to establish a third network connection for data transfer across the network; conducting a third analysis of the plurality of resource-related parameters each indicative of a respective resource related state of the established network connection and for establishment of the third network connection; selecting a third type of the QP from the plurality of candidate types according to the third analysis; wherein a fourth network connection with the third type of QP has been previously established by the processing circuitry for a fourth request prior to receiving the third request; and using the fourth network connection for transfer of data across the network associated with the third request and with the fourth request, wherein the third network connection is not independently established.
8. The processing circuitry of any of the previous claims, wherein the request to establish the network connection is for a first type of QP, and wherein a second type of QP different than the first type is selected.
9. The processing circuitry of any of the previous claims, wherein the request to establish the network connection is for a reliable type of QP or for an unreliable type of QP of a first subtype of QP, and wherein the selected type of QP is the reliable type of QP or the unreliable type of QP as defined by the request of a second sub-type of QP different than the first sub-type of QP defined by the request.
10. The processing circuitry of claim 9, wherein the network connection is for the reliable type of QP, and the selected type of QP is of the unreliable type of QP with reliability provided by a reliability layer implemented by at least one of or combination of: software, firmware, and hardware.
11. The processing circuitry of any of the previous claims, wherein the request to establish the network connection is for an interchangeable QP type, IQT, and wherein the type of the QP is selected from the plurality of candidate types that exclude the dynamic type.
12. The processing circuitry of any of the previous claims, wherein the plurality of resource- related parameters are selected from a group consisting of: number of network nodes an application that provided the request is communicating with, number of existing network connections of the application that provided the request, topology of existing network connection of the application, total number of active QPs of network connections that were established by the processing circuitry, whether QPs of the type of the request are available, transport reliability of the network, current memory utilization, and current utilization of the processing circuitry.
13. The processing circuitry of any of the previous claims, wherein the plurality of candidate types for the QP pair are selected from a group consisting of: Reliable Connection, RC, Reliable Datagram, RD, Extended Reliable Connection, XRC, Unreliable Datagram, UD, Unreliable Connection, UC, Scalable Reliable Datagram, SRD, and Dynamic Connection, DC.
14. The processing circuitry of any of the previous claims, wherein the analyzing is done using at least one of: a set of rules for the plurality of resource-related parameters that result in the selected type for the QP, a classifier that receives the plurality of resource-related parameters as input and generates an outcome of the selected type for the QP where the classifier is trained on a training dataset of plurality of sample resource-related parameters and a label of type of QP, and optimization code that uses a mathematical model and/or set of equations to compute the type for the QP that optimizes the plurality of resource-related parameters.
15. The processing circuitry of any of the previous claims, wherein the data transfer across the network is according to a Remote Direct Memory Access, RDMA, protocol, and the plurality of candidate types of QP are defined by a network transport protocol for RDMA.
16. The processing circuitry of claim 15, wherein the network transport protocol for RDMA defining the plurality of candidate types of QP is selected from a group consisting of: InfiniBand, IB, RoCE, Remote Direct Memory Access, RDMA, over Converged Ethernet, RoCEv2, iWARP, and derivatives of the aforementioned.
17. A method for selecting a type of a queue pair, QP, for data transfer across a network, comprising: receiving a request to establish a network connection for data transfer across the network (302); analyzing a plurality of resource-related parameters each indicative of a respective resource related state of the network connection for establishment (304); selecting the type for the QP from a plurality of candidate types according to the analysis (306); and establishing the network connection with the selected type of the QP for data transfer across the network (308).
18. A computer program comprising program instructions (150) which, when executed by a processor (102A, 102B), cause the processor to: receive a request to establish a network connection for data transfer across the network; analyze a plurality of resource-related parameters each indicative of a respective resource related state of the network connection for establishment; select the type for the QP from a plurality of candidate types according to the analysis; and establish the network connection with the selected type of the QP for data transfer across the network.
PCT/EP2020/073259 2020-08-19 2020-08-19 Interchangeable queue type for network connections WO2022037777A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/EP2020/073259 WO2022037777A1 (en) 2020-08-19 2020-08-19 Interchangeable queue type for network connections
CN202080103258.8A CN115885270A (en) 2020-08-19 2020-08-19 Exchangeable queue types for network connections

Publications (1)

Publication Number Publication Date
WO2022037777A1 true WO2022037777A1 (en) 2022-02-24

Family

ID=72474277

Country Status (2)

Country Link
CN (1) CN115885270A (en)
WO (1) WO2022037777A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090201926A1 (en) * 2006-08-30 2009-08-13 Mellanox Technologies Ltd Fibre channel processing by a host channel adapter

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
PARK JIWOONG ET AL: "SoftDC: software-based dynamically connected transport", CLUSTER COMPUTING, BALTZER SCIENCE PUBLISHERS, BUSSUM, NL, vol. 23, no. 1, 19 March 2019 (2019-03-19), pages 347 - 357, XP037003640, ISSN: 1386-7857, [retrieved on 20190319], DOI: 10.1007/S10586-019-02926-0 *

Also Published As

Publication number Publication date
CN115885270A (en) 2023-03-31

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20771767

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20771767

Country of ref document: EP

Kind code of ref document: A1