CN115885270A - Exchangeable queue types for network connections - Google Patents


Publication number
CN115885270A
Authority
CN
China
Prior art keywords
network
type
network connection
resource
request
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202080103258.8A
Other languages
Chinese (zh)
Inventor
本-沙哈尔·贝尔彻
利奥·赫尔莫什
鲁文·科恩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of CN115885270A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/173 Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17306 Intercommunication techniques
    • G06F15/17331 Distributed shared memory [DSM], e.g. remote direct memory access [RDMA]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/50 Queue scheduling

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer And Data Communications (AREA)

Abstract

Processing circuitry, methods, and code instructions are provided for selecting a type of Queue Pair (QP) for data transmission across a network, e.g., based on a Remote Direct Memory Access (RDMA) protocol. A request to establish a connection for data transfer across the network is received. Resource-related parameters are analyzed, each indicating a respective resource-related status for establishing the connection. The type of the QP is selected from a plurality of candidate types according to the analysis. The candidate QP types may be defined by a transport protocol. The connection is established using the selected type of the QP for data transfer across the network.

Description

Exchangeable queue types for network connections
Background
The present invention, in some embodiments thereof, relates to network connections and, more particularly, but not exclusively, to resource management systems and methods for establishing network connections.
Network nodes, such as servers, may establish and simultaneously support thousands of network connections with other network nodes, such as storage servers, endpoint devices, and other servers, to provide data exchange across a network. The large number of simultaneous network connections consumes a large amount of resources.
Disclosure of Invention
It is an object of the present invention to provide processing circuitry, computing devices, methods and computer-readable storage media for data transmission across a network.
The above and other objects are achieved by the features of the independent claims. Other implementations are apparent from the dependent claims, the description and the drawings.
According to a first aspect, a processing circuit for selecting a type of Queue Pair (QP) for data transmission across a network is configured for: receiving a request to establish a network connection for data transmission across the network; analyzing a plurality of resource-related parameters, each resource-related parameter indicating a respective resource-related status for establishing the network connection; selecting the type of the QP from a plurality of candidate types according to the analysis; and establishing the network connection using the selected type of the QP for data transmission across the network.
According to a second aspect, a method for selecting a type of Queue Pair (QP) for data transmission across a network comprises: receiving a request to establish a network connection for data transmission across the network; analyzing a plurality of resource-related parameters, each resource-related parameter indicating a respective resource-related status for establishing the network connection; selecting the type of the QP from a plurality of candidate types according to the analysis; and establishing the network connection using the selected type of the QP for data transmission across the network.
According to a third aspect, a computer program comprises program instructions which, when executed by a processor, cause the processor to: receive a request to establish a network connection for data transmission across the network; analyze a plurality of resource-related parameters, each resource-related parameter indicating a respective resource-related status for establishing the network connection; select the type of the QP from a plurality of candidate types according to the analysis; and establish the network connection using the selected type of the QP for data transmission across the network.
The QP type requested for a new connection does not necessarily represent the best use of the existing resources of the host and/or device and/or network, such as memory, hardware resources, processing resources, cache, and network resources. For example, an RC QP may be the best solution in terms of transmission, but an RC QP is very expensive in terms of available resources. The QP type that provides the best overall utilization of the available resources (e.g., globally across the device, network, and/or host) may be selected instead of the QP type that is best for the requesting application.
Optimization of the overall resources is particularly important for high-end servers that establish thousands of QPs for thousands of network connections, e.g., 10,000, 100,000, or 1,000,000.
Dynamically selecting the QP type used to establish the connection provides better resource utilization than existing methods, in which each application is granted the QP type it requests. For example, consider N network nodes, each running M processes that establish network connections. When all M processes wish to communicate with all processes on all other nodes, the number of RC QPs required for this "many-to-many" communication is M^2 × (N-1) per node. RD has a lower footprint: each node requires M QPs plus N "end-to-end" (EE) connections to achieve the same many-to-many communication pattern. But RD is limited in transmission; its most important limitation is that only a single outstanding message is supported per EE context. DC, a proprietary scheme, has a lower footprint than RC, but if it uses many DCIs/DCTs it still consumes a lot of resources and its buffering efficiency is low. Another disadvantage of DC is that it must perform frequent connect-disconnect operations.
In another implementation of the first, second, and third aspects, the processing circuit is further configured for, and/or the method further comprises: re-analyzing the plurality of resource-related parameters during data transmission over the established network connection; reselecting another type of the QP from the plurality of candidate types according to the re-analysis; establishing a second network connection using the reselected other type of the QP; and performing at least one of the following operations: dynamically migrating the network connection to the second network connection, or re-initiating the data transfer of the network connection over the second network connection.
As the resource-related parameters change, the QP type of an existing connection may be changed dynamically while the established network connection is being used to transmit data over the network, providing a different QP type that is expected to improve resource utilization.
In another implementation of the first, second, and third aspects, the network connection is a first network connection, and the processing circuit is further configured for, and/or the method further comprises: receiving a second request to establish a second network connection for data transmission across the network; performing a second analysis on the plurality of resource-related parameters, each resource-related parameter indicating a respective resource-related status of the established first network connection and for establishing the second network connection; selecting a second type of the QP from the plurality of candidate types according to the second analysis; establishing the second network connection using the second type of the QP; and dynamically migrating the first network connection to the second network connection, wherein the second network connection transports data across the network for both the first network connection and the second network connection.
Two network connections transmitting two data sets over the network may be merged into a single network connection. The merged second network connection may use fewer resources than two separate network connections.
In another implementation form of the first, second and third aspects, the request is provided by a first application, the second request is provided by a second application, and the second network connection transfers data for the first and second applications over the network.
Two separate network connections requested by two different applications for transmitting two data sets over the network may be combined into a single network connection used by the two applications. The merged second network connection may use fewer resources than two separate network connections.
In another implementation of the first, second, and third aspects, dynamically migrating includes: suspending network traffic on the first network connection; receiving an acknowledgement message indicating that the packets sent over the first network connection before the suspension have been received by the device at the other end of the first network connection; and using the second network connection, with the second type of the QP, for additional network traffic originally destined for the first network connection.
By suspending current network traffic on a previously established network connection, migration can be performed transparently without significant interruption.
In another implementation of the first, second, and third aspects, the processing circuit is further configured for, and/or the method further comprises: terminating the first network connection in response to receiving an indication that at least a first packet of the additional network traffic has passed through the second network connection.
Terminating the previously established network connection releases the resources bound to it.
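The migration sequence described above (suspend, wait for acknowledgement, redirect traffic, terminate) can be sketched as follows. This is a minimal in-memory illustration, not the patented implementation: the `Connection` class, the `migrate` function, and the `wait_for_acks` callback are hypothetical names, and a successful send on the new connection is assumed to stand in for the "indication that a first packet has passed through the second network connection".

```python
class Connection:
    """Hypothetical in-memory stand-in for a network connection backed by a QP."""

    def __init__(self, qp_type):
        self.qp_type = qp_type
        self.sent = []          # packets handed to this connection
        self.acked = 0          # number of packets acknowledged by the peer
        self.suspended = False
        self.open = True

    def send(self, packet):
        if self.suspended or not self.open:
            raise RuntimeError("connection is not accepting traffic")
        self.sent.append(packet)

    def all_acked(self):
        return self.acked == len(self.sent)


def migrate(old, new, pending, wait_for_acks):
    """Move traffic from `old` to `new` following the four steps in the text."""
    old.suspended = True                 # 1. suspend traffic on the first connection
    wait_for_acks(old)                   # 2. wait until the peer has acknowledged
    if not old.all_acked():
        raise RuntimeError("packets sent before suspension were not acknowledged")
    for packet in pending:               # 3. redirect the additional traffic
        new.send(packet)
        if old.open:
            # 4. assumption: a successful send is treated as the indication
            #    that a first packet passed through the second connection
            old.open = False


if __name__ == "__main__":
    first = Connection("RC")
    first.send("a")
    second = Connection("RD")

    def ack_everything(conn):            # simulated peer acknowledgement
        conn.acked = len(conn.sent)

    migrate(first, second, ["b", "c"], ack_everything)
    print(first.open, second.sent)
```

Terminating the old connection only after traffic is flowing on the new one keeps a usable path available if the migration has to be abandoned before step 3.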
In another implementation of the first, second, and third aspects, the processing circuit is further configured for, and/or the method further comprises: receiving a third request to establish a third network connection for data transmission across the network; performing a third analysis on the plurality of resource-related parameters, each resource-related parameter indicating a respective resource-related status of the established network connection and for establishing the third network connection; selecting a third type of the QP from the plurality of candidate types according to the third analysis, wherein, prior to receiving the third request, a fourth network connection using the third type of the QP has been previously established by the processing circuit for a fourth request; and using the fourth network connection to transfer data across the network for both the third request and the fourth request, wherein the third network connection is not established independently.
When a network connection with the selected QP type already exists, network traffic destined for the connection that would otherwise be newly established may be added to the existing network connection. Resources may be saved by using an existing network connection rather than adding another network connection.
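The reuse of an existing connection can be illustrated with a small lookup table. This is a sketch under stated assumptions: `SharedConnection` and `ConnectionTable` are hypothetical names, and keying the table by the pair (peer, QP type) is an assumption made for the example, since the text only requires that a connection of the selected QP type already exists.

```python
class SharedConnection:
    """Hypothetical connection that can carry traffic for several requests."""

    def __init__(self, peer, qp_type):
        self.peer = peer
        self.qp_type = qp_type
        self.request_ids = []   # requests multiplexed over this connection


class ConnectionTable:
    """Tracks established connections so that new requests can join them."""

    def __init__(self):
        self._by_key = {}

    def __len__(self):
        return len(self._by_key)

    def attach(self, request_id, peer, selected_qp_type):
        """Return (connection, created). A new connection is created only
        when none with the selected QP type exists for this peer."""
        key = (peer, selected_qp_type)
        conn = self._by_key.get(key)
        created = conn is None
        if created:
            conn = SharedConnection(peer, selected_qp_type)
            self._by_key[key] = conn
        conn.request_ids.append(request_id)
        return conn, created


if __name__ == "__main__":
    table = ConnectionTable()
    fourth, _ = table.attach("request-4", "node-B", "RD")   # established earlier
    third, created = table.attach("request-3", "node-B", "RD")
    print(created, third is fourth, third.request_ids)
```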
In another implementation of the first, second, and third aspects, the request to establish the network connection is for a first type of QP, and a second type of QP, different from the first type, is selected.
The selection process may be performed in an implicit mode, where the request is ignored and the QP type that provides the best optimization of the available resources is selected, without the requesting application and/or process knowing that the requested QP type has changed.
In another implementation of the first, second, and third aspects, the request to establish the network connection is for a reliable QP type or an unreliable QP type of a first subtype, and the selected type of QP has the same reliability (reliable or unreliable) as defined by the request but is of a second subtype that is different from the first subtype defined by the request.
Maintaining reliability or unreliability of QP types according to the QP type of the network connection request maintains compatibility with applications and/or processes that use the data transmitted over the network. For example, applications desiring reliable data transfer receive data transferred in a reliable manner and do not need to process unexpected data transferred in an unreliable manner.
In another implementation of the first, second and third aspects, the request to establish the network connection is for the reliable QP type and the selected type of QP is the unreliable QP type, wherein reliability is provided by a reliability layer implemented by at least one of, or a combination of, software, firmware and hardware.
The requested reliability may be provided using an unreliable QP type together with the reliability layer, instead of a resource-intensive reliable QP type. This may improve resource utilization, since the unreliable QP type uses fewer resources than the reliable QP type.
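One way to picture a reliability layer on top of an unreliable QP type is a stop-and-wait scheme with sequence numbers and retransmission. The sketch below is illustrative only: the text does not specify how the reliability layer works, and `LossyChannel` and `reliable_send` are hypothetical names. The channel drops whole round trips deterministically so the example is reproducible; a real layer would also need duplicate suppression for the case where only the acknowledgement is lost.

```python
class LossyChannel:
    """Hypothetical stand-in for an unreliable QP.

    The send attempts whose zero-based index is in `drop_on` are lost."""

    def __init__(self, drop_on):
        self.drop_on = set(drop_on)
        self.attempts = 0
        self.delivered = []     # (sequence number, payload) seen by the receiver

    def send(self, seq, payload):
        attempt = self.attempts
        self.attempts += 1
        if attempt in self.drop_on:
            return None         # lost: no acknowledgement comes back
        self.delivered.append((seq, payload))
        return seq              # acknowledgement carrying the sequence number


def reliable_send(channel, messages, max_retries=5):
    """Send each message until it is acknowledged, in order (stop-and-wait)."""
    for seq, payload in enumerate(messages):
        for _ in range(max_retries + 1):
            if channel.send(seq, payload) == seq:
                break           # acknowledged, move on to the next message
        else:
            raise TimeoutError(f"message {seq} was never acknowledged")


if __name__ == "__main__":
    channel = LossyChannel(drop_on={0, 2})
    reliable_send(channel, ["x", "y", "z"])
    print(channel.delivered, channel.attempts)
```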
In another implementation of the first, second, and third aspects, the request to establish the network connection is for an Interchangeable QP Type (IQT), and wherein the type of the QP is selected from the plurality of candidate types that do not include a dynamic type.
The highest and/or best resource utilization efficiency may be achieved when the process and/or application requests the dynamic QP type, thereby indicating that any candidate type may be selected for the QP.
In another implementation form of the first, second and third aspects, the plurality of resource-related parameters are selected from the group consisting of: a number of network nodes in communication with an application providing the request, a number of existing network connections of the application providing the request, a topology of existing network connections of the application, a total number of active QPs of network connections established by the processing circuitry, whether a QP of the type of the request is available, a transmission reliability of the network, a current memory utilization, and a current utilization of the processing circuitry.
These parameters provide a description of the existing resource usage, which allows for the selection of the QP type for the network connection that will provide the best utilization of the existing resources.
In another implementation of the first, second, and third aspects, the plurality of candidate types of the QP is selected from the group consisting of: a Reliable Connection (RC), a Reliable Datagram (RD), an extended reliable connection (XRC), an Unreliable Datagram (UD), an Unreliable Connection (UC), a Scalable Reliable Datagram (SRD), and a Dynamic Connection (DC).
In another implementation form of the first, second and third aspects, the analyzing is done using at least one of: a set of rules that maps the plurality of resource-related parameters to the selected type of the QP; a classifier that receives the plurality of resource-related parameters as input and outputs the selected type of the QP, wherein the classifier is trained on a training data set of a plurality of sample resource-related parameters labeled with a QP type; and optimization code that calculates the type of the QP that optimizes the plurality of resource-related parameters using a mathematical model and/or a system of equations.
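The rule-based variant of the analysis could look like the sketch below. It is a hypothetical example: the `ResourceParams` fields are a subset of the resource-related parameters listed earlier, while the thresholds (`qp_budget`, `memory_limit`) and the specific rules are invented for illustration and are not taken from the specification. The fallback keeps the reliability of the requested type and only changes connected to datagram.

```python
from dataclasses import dataclass

BASIC_TYPES = {"RC", "RD", "UC", "UD"}
RELIABLE_TYPES = {"RC", "RD"}


@dataclass
class ResourceParams:
    """A few of the resource-related parameters named in the text."""
    active_qps: int                 # total number of active QPs
    requested_type_available: bool  # whether a QP of the requested type is available
    memory_utilization: float       # current memory utilization, 0.0 to 1.0


def select_qp_type(requested, params, qp_budget=10_000, memory_limit=0.8):
    """Apply two illustrative rules (thresholds are assumptions):

    1. If the requested type is available and resources are not under
       pressure, grant the requested type.
    2. Otherwise fall back to the datagram type with the same reliability.
    """
    if requested not in BASIC_TYPES:
        raise ValueError(f"unsupported QP type in this sketch: {requested}")
    under_pressure = (
        params.active_qps >= qp_budget
        or params.memory_utilization >= memory_limit
    )
    if params.requested_type_available and not under_pressure:
        return requested
    return "RD" if requested in RELIABLE_TYPES else "UD"


if __name__ == "__main__":
    relaxed = ResourceParams(active_qps=100, requested_type_available=True,
                             memory_utilization=0.3)
    crowded = ResourceParams(active_qps=20_000, requested_type_available=True,
                             memory_utilization=0.3)
    print(select_qp_type("RC", relaxed), select_qp_type("RC", crowded))
```

A classifier or an optimization model, the other two options named above, would replace the body of `select_qp_type` while keeping the same inputs and output.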
In another implementation of the first, second, and third aspects, the data transmission across the network is according to a Remote Direct Memory Access (RDMA) protocol, and the plurality of candidate types of QPs are defined by a network transport protocol for RDMA.
The feasibility of RDMA depends to a large extent on its reliability, high bandwidth and low latency characteristics. Therefore, the choice of QP type plays an important role in deciding the balance between RDMA reliability, high bandwidth and low latency, especially for large numbers of network connections. At least some embodiments described herein select an optimal QP type for RDMA to minimize latency while meeting the required reliability and/or reducing memory occupancy, and/or to utilize device caching in an optimal manner (e.g., to prevent large numbers of cache misses), particularly when large numbers of connections are established.
In another implementation of the first, second and third aspects, the network transport protocol for RDMA defining the plurality of candidate types of QP is selected from the group consisting of: InfiniBand (IB), RDMA over Converged Ethernet (RoCE), RoCEv2, iWARP, and derivatives of the above.
Unless defined otherwise, all technical and/or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of embodiments of the present invention, exemplary methods and/or materials are described below. In case of conflict, the present patent specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and not necessarily limiting.
Drawings
Some embodiments of the invention are described herein, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of the embodiments of the present invention only. In this regard, it will be apparent to those skilled in the art from the description of the figures how embodiments of the invention may be practiced.
In the drawings:
FIG. 1 is a block diagram of a computing device for selecting a QP type for data transmission across a network, in accordance with some embodiments;
FIG. 2 is a block diagram of multiple computing devices transmitting data across a network using a selected QP type, according to some embodiments;
FIG. 3 is a flow diagram of a method for selecting a QP type for data transmission across a network, in accordance with some embodiments;
FIGS. 4A-4F are schematic diagrams of example QP types for selection, in accordance with some embodiments;
FIG. 5 is a schematic diagram of a comparison of dynamically selected QP types to a standard method, according to some embodiments;
FIG. 6 is another schematic diagram of a comparison of dynamically selecting QP types to a standard method, according to some embodiments;
FIG. 7 is yet another schematic diagram of a comparison of dynamically selecting QP types to a standard method, according to some embodiments.
Detailed Description
The present invention, in some embodiments thereof, relates to network connections and, more particularly, but not exclusively, to resource management systems and methods for establishing network connections.
An aspect of some embodiments relates to processing circuits, systems, methods, apparatuses, and/or code instructions (i.e., stored on a computer-readable medium for execution by one or more hardware processors) for automatically selecting a type of Queue Pair (QP), also referred to as a QP type, for transferring data across a network between two devices, e.g., based on the Remote Direct Memory Access (RDMA) protocol, wherein the QP type is defined by a transport protocol, e.g., InfiniBand™, RDMA over Converged Ethernet (RoCE), RoCEv2, iWARP, proprietary QP types (e.g., from different vendors), and derivatives thereof. The QP type is selected in response to a request to establish a connection for transferring data across the network, e.g., a request by an application that uploads data to and/or downloads data from a remote data storage device. One or more resource-related parameters are analyzed. Each resource-related parameter indicates a respective resource-related status for establishing the connection, e.g., based on the resources available for establishing the network connection, such as available memory, available QPs of the requested type, the utilization status of one or more processors, and the network status (e.g., whether there is noise). A QP type is selected from a plurality of candidate QP types based on the analysis. The QP type is selected to optimize the use of the available resources for establishing the connection, e.g., as compared to the requested QP type and/or as compared to the other candidate QP types. The QP type originally requested may be changed to the selected QP type. A connection is established using the selected QP type for data transfer across the network.
Alternatively, the QP type for the connection may be dynamically reselected while the connection is active. From the re-analysis of the current state of the resource-related parameter, another QP type may be selected from the candidate QP types. Another connection may be newly established using the dynamically reselected QP type. The existing connection may be migrated to the newly established connection using the reselected QP type.
Optionally, the QP type is used to establish another connection, where one or more other connections have been previously established. Based on the analysis of the resource-related parameters, the requested connection may join the existing connection using the selected QP type, e.g., by multiplexing two (or more) data streams for transmission across the existing connection. In another implementation, the new connection is established using a QP type selected based on an analysis of the resource-related parameters. One or more previously established connections will migrate to the new connection.
The QP type requested for a new connection does not necessarily represent the best use of the existing resources of the host and/or device and/or network, such as memory, hardware resources, processing resources, cache, and network resources. For example, an RC QP may be the best solution in terms of transmission, but an RC QP is very expensive in terms of available resources. The QP type that provides the best overall utilization of the available resources (e.g., globally across the device, network, and/or host) may be selected instead of the QP type that is best for the requesting application.
Optimization of the overall resources is particularly important for high-end servers that establish thousands of QPs for thousands of network connections, e.g., 10,000, 100,000, or 1,000,000.
Dynamically selecting the QP type used to establish the connection provides better resource utilization than existing methods, in which each application is granted the QP type it requests. For example, consider N network nodes, each running M processes that establish network connections. When all M processes wish to communicate with all processes on all other nodes, the number of RC QPs required for this "many-to-many" communication is M^2 × (N-1) per node. RD has a lower footprint: each node requires M QPs plus N "end-to-end" (EE) connections to achieve the same many-to-many communication pattern. But RD is limited in transmission; its most important limitation is that only a single outstanding message is supported per EE context. DC, a proprietary scheme, has a lower footprint than RC, but if it uses many DCIs/DCTs it still consumes a lot of resources and its buffering efficiency is low. Another disadvantage of DC is that it must perform frequent connect-disconnect operations.
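To make the many-to-many arithmetic concrete, the per-node counts stated above can be computed directly. The function names below are hypothetical, and the example values of M and N are arbitrary.

```python
from typing import Tuple


def rc_qps_per_node(m: int, n: int) -> int:
    """RC: each of the M local processes needs one QP for each of the
    M processes on each of the N - 1 other nodes, i.e. M^2 * (N - 1)."""
    return m * m * (n - 1)


def rd_resources_per_node(m: int, n: int) -> Tuple[int, int]:
    """RD: M QPs plus N end-to-end (EE) connections, as stated in the text."""
    return m, n


if __name__ == "__main__":
    for m, n in [(8, 16), (32, 128)]:
        qps, ee = rd_resources_per_node(m, n)
        print(f"M={m}, N={n}: RC needs {rc_qps_per_node(m, n)} QPs per node; "
              f"RD needs {qps} QPs and {ee} EE connections per node")
```

With M = 8 and N = 16, RC needs 960 QPs per node, whereas RD needs 8 QPs and 16 EE connections. RC grows quadratically in M, which is why the choice of QP type matters most on nodes that run many communicating processes.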
Optionally, the data transfer between different computing devices described herein is based on the Remote Direct Memory Access (RDMA) protocol, which is the primary transfer method for remote memory operations. The feasibility of RDMA depends to a large extent on its reliability, high bandwidth and low latency characteristics. Therefore, the choice of QP type plays an important role in deciding the balance between RDMA reliability, high bandwidth and low latency, especially for large numbers of network connections. At least some embodiments described herein select an optimal QP type for RDMA to maximize low latency while meeting required reliability and/or lower memory occupancy, and/or to utilize device caching in an optimal manner (e.g., to prevent large numbers of cache misses), particularly when large numbers of connections are established.
For RDMA, different QP types are available. The different QP types may be represented as a combination of a first subtype and a second subtype. The first subtype is reliable or unreliable. A reliable first subtype guarantees that messages are delivered at most once, in order, and without corruption. An unreliable first subtype provides no guarantee that messages will be delivered, nor any guarantee regarding packet order. In RDMA, there is a Cyclic Redundancy Check (CRC) for each packet, and a corrupted packet is dropped (for any transport type). The reliability of the QP transport type refers to the reliability of the entire message. The second subtype is connected or unconnected. A connected second subtype means that one send/receive QP is associated with exactly one other QP. An unconnected second subtype means that one send/receive QP is associated with multiple other QPs. Exemplary QP types are defined by the InfiniBand™ specification, including a Reliable Connection (RC), a Reliable Datagram (RD), an extended reliable connection (XRC), an Unreliable Datagram (UD), and an Unreliable Connection (UC). Additional QP types have also been developed, such as Scalable Reliable Datagram (SRD) and Dynamic Connection (DC), also known as Dynamic Connected Transport (DCT).
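The two-subtype view of QP types can be written down as a small table. The sketch below covers only the four types whose names spell out both subtypes (RC, RD, UC, UD); how XRC, SRD, and DC would be classified is not stated in the text and is left out. `QpSubtypes` and `same_reliability_candidates` are hypothetical names, and the helper shows how a selector could keep the reliability requested by an application while changing the connected or unconnected subtype.

```python
from typing import List, NamedTuple


class QpSubtypes(NamedTuple):
    reliable: bool    # first subtype: reliable or unreliable
    connected: bool   # second subtype: connected or unconnected (datagram)


# Only the four types whose names state both subtypes are listed here.
BASIC_QP_TYPES = {
    "RC": QpSubtypes(reliable=True, connected=True),
    "RD": QpSubtypes(reliable=True, connected=False),
    "UC": QpSubtypes(reliable=False, connected=True),
    "UD": QpSubtypes(reliable=False, connected=False),
}


def same_reliability_candidates(requested: str) -> List[str]:
    """Other QP types that keep the reliability of the requested type."""
    wanted = BASIC_QP_TYPES[requested].reliable
    return sorted(
        name
        for name, subtypes in BASIC_QP_TYPES.items()
        if subtypes.reliable == wanted and name != requested
    )


if __name__ == "__main__":
    for name in ("RC", "UD"):
        print(name, BASIC_QP_TYPES[name], same_reliability_candidates(name))
```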
Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not necessarily limited in its application to the details of construction and the arrangement of the components and/or methods set forth in the following description and/or illustrated in the drawings and/or examples. The invention is capable of other embodiments or of being practiced or carried out in various ways.
The present invention may be a system, method and/or computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions that cause a processor to perform various aspects of the present invention.
The computer readable storage medium may be a tangible device capable of retaining and storing instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a corresponding computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network.
The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit comprising a programmable logic circuit, a field-programmable gate array (FPGA), a Programmable Logic Array (PLA), or the like, can execute the computer-readable program instructions to perform various aspects of the present invention by personalizing the electronic circuit with state information of the computer-readable program instructions.
Aspects of the present invention are described herein in connection with flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products provided by embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Referring now to FIG. 1, FIG. 1 is a block diagram of a computing device 104 for selecting a QP type to transmit data across a network 120, in accordance with some embodiments. Referring also to FIG. 2, FIG. 2 is a block diagram of multiple computing devices 104 using a selected QP type to transmit data across the network 120, according to some embodiments. Referring also to FIG. 3, FIG. 3 is a flow diagram of a method for selecting a QP type for data transmission across a network, according to some embodiments. The computing device 104 may implement the acts of the method described with reference to FIG. 3, for example, by one or more, or a combination, of the following: the one or more processors 102A of the computing device 104 execute code instructions (e.g., code 150) stored in the memory 106A; the one or more processors 102A of the computing device 104 are implemented in hardware to execute the instructions defined by the code 150; the one or more processors 102B of the network interface device (e.g., network interface card) 114 execute code instructions (e.g., code 150) stored in the memory 106B; and/or the one or more processors 102B are implemented in hardware to execute the instructions defined by the code 150.
The queue 106B stores QP of the selected type, as described herein. The QP 106B can be stored, for example, by the memory 106A of the computing device 104 and/or the memory 106B of the network interface device 114.
The computing device 104 may act as a network node and may sometimes be referred to herein as a network node.
As shown, the computing device 104 communicates with one or more other instances of the computing device 104 (e.g., another network node) over the network 120 by establishing a network connection using a dynamically selected type of QP, as described herein. Computing device 104 may be implemented, for example, as a server, a client, an initiator network node that initiates network connection establishment, and/or a target network node that receives a request from an initiator to establish a network connection.
The computing device 104 may be implemented, for example, as one or more of the following: a computing cloud, a single computing device (e.g., a client terminal), a group of computing devices arranged in parallel, a network server, a local server, a remote server, a client terminal, a mobile device, a stationary device, a kiosk, a smartphone, a laptop, a tablet, a wearable computing device, a glasses computing device, a watch computing device, and a desktop computer.
The one or more processors 102 are implemented, for example, as one or more Central Processing Units (CPUs), one or more Graphics Processing Units (GPUs), one or more Field Programmable Gate Arrays (FPGAs), one or more Digital Signal Processors (DSPs), one or more Application Specific Integrated Circuits (ASICs), one or more custom circuits, processors for interfacing with other units, and/or dedicated hardware accelerators. The one or more processors 102 may be implemented as a single processor, a multi-core processor, and/or a cluster of processors arranged for parallel processing (which may include homogeneous and/or heterogeneous processor architectures).
The memory 106A stores code instructions that may be implemented by the one or more processors 102A and/or the memory 106B stores code instructions that may be implemented by the processor 102B of the network interface device 114. The memories 106A-106B are implemented, for example, as Random Access Memories (RAMs), read-only memories (ROMs), and/or storage devices such as non-volatile memories, magnetic media, semiconductor memory devices, hard disk drives, removable memories, and optical media (e.g., DVDs, CD-ROMs).
Memory 106A may store a Virtual Machine Manager (VMM) 108 that manages and/or runs one or more Virtual Machines (VMs) 110. The VMM 108 may be implemented as a virtual machine hypervisor. The VMM 108 may be implemented in hardware, software, firmware, and/or a combination thereof.
Each VM 110 executes one or more Virtual Function (VF) drivers 112.
The computing device 104 includes and/or communicates with one or more network interface devices 114, optionally a network interface card and/or a network adapter.
The network interface device 114 may include one or more processors 102B and memory 106B. Features of the methods described herein may be implemented by the computing device 104 (e.g., the one or more processors 102A executing the code 150 stored in the memory 106A) and/or by the network interface device 114 (e.g., the one or more processors 102B executing the code 150 stored in the memory 106B).
The computing device 104 may include and/or be in communication with one or more data storage devices 118. The data store 118 may store, for example, candidate QP types that may be selected. It should be noted that the code instructions may optionally be loaded from the data storage device 118 into the memory 106 for execution by the processor 102. The one or more data storage devices 118 may be implemented as, for example, memory, a local hard disk, a removable storage unit, an optical disk, a storage device, and/or a remote server and/or computing cloud (e.g., accessed via a network connection).
The computing device 104 may communicate with a network 120, such as the internet, a local area network, a virtual network, a wireless network, a cellular network, a local bus, a point-to-point link (e.g., wired), and/or combinations thereof, through the network interface device 114.
The network interface device 114 may be associated with one or more Physical Function (PF) drivers 116. The network interface device 114 may be virtualized for use by multiple VMs 110 through corresponding executing VF drivers 112. For example, different VMs 110 may access the network 120 through one or more VF drivers 112 that are used to access one or more PF drivers 116 of the network interface.
The computing device 104 may include and/or communicate with one or more physical user interfaces 122 that include mechanisms for user interaction, for example, to input data (e.g., define a set of rules for selecting QP types) and/or view data (e.g., view available resources and/or established connections).
Exemplary physical user interfaces 122 include, for example, one or more of a touch screen, a display, a gesture activation device, a keyboard, a mouse, voice activated software using a speaker and microphone, and a coordinator to send data over a network interface.
Returning now to fig. 2, multiple instances of the computing device 104 (including at least the processors 102A and/or 102B) communicate data with each other over the network 120 through a network connection using the selected QP type 106B. Various implementations of multiple instances of the computing device 104 are described below, such as data transfer between two instances of the computing device 104 (e.g., one computing device is an initiator device/local device and the other is a target device/remote device), many-to-many communication where each computing device 104 communicates with all other computing devices 104, and single-to-many communication where one computing device 104 communicates with multiple other computing devices 104.
Returning now to fig. 3, at 302, a request to establish a network connection for data transfer across a network is received. The request may be initiated, for example, by an application running on the local computing device. In another example, a request may be initiated from a remote device (e.g., a client, e.g., an initiator device) to establish a network connection for communicating with a local device (e.g., a server, e.g., a target device).
The request may be issued in response to, and/or may include, the devices informing each other that they support dynamic selection of QP type, as described herein. Optionally, a handshake is performed between the remote device and the local device (e.g., between the target device and the initiator device) to ensure that both devices support dynamic selection of QP type, as described herein. For example, capability discovery procedures and/or negotiation protocols may be used to enable and/or manage the dynamic selection of QP type.
For example, the request is received by one or more of: processing circuitry running an application, a network interface card establishing a network connection for connecting to a network, and/or a virtual machine hypervisor hosting an application.
The request may be to establish a network connection using an application-defined QP type. An application may select the QP type that best suits its own needs, regardless of the available resources. When the selected QP is different from the requested QP type, the requesting application may not know the actual QP selected.
Alternatively, the request defines a dynamic QP type for establishing the network connection. The dynamic QP type indicates that the application knows that any candidate QP type can be selected, as described herein.
At 304, one or more resource-related parameters are analyzed. Each resource-related parameter indicates a respective resource-related status for the established network connection. Each resource-related parameter may indicate a real-time status of one or more resources used to establish the network connection and/or used to process data transmitted over the established network connection. The resource-related parameters may indicate the current resource availability status of one or more components, such as one or more processors, memory, available QPs for request types, remotely connected nodes, and networks.
These parameters provide a description of the existing resource usage, which allows for the selection of the QP type for the network connection that will provide the best utilization of the existing resources.
Exemplary resource-related parameters include: the number of network nodes communicating with the application providing the request, the number of existing network connections of the application providing the request, the topology of the existing network connections of the application, the total number of active QPs for previously established network connections, whether the remaining QPs of the request type are available for allocation, the transmission reliability of the network, the current memory utilization, and the current utilization of the processing circuitry.
The analysis may be performed using one or more methods. For example, a rule set is applied to the resource-related parameters to generate the selected QP type. The rule set may be defined manually and/or automatically. The rule set may be based on the prediction that adherence to the rule set improves utilization of, and/or optimizes, the available resources. In another example, the analysis is performed using a trained resource classifier that receives the resource-related parameters as input and outputs the selected QP type. The resource classifier is trained on a training data set of sample resource-related parameters labeled with QP types. In yet another example, optimization code uses a mathematical model and/or a system of equations to calculate the QP type that optimizes the resource-related parameters and/or to simulate the resource outcome for different QP types.
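The rule-set approach described above can be sketched as follows. This is only an illustrative sketch: the `ResourceParams` fields, the thresholds (8 remote nodes, 80% memory utilization), and the particular rules are assumptions introduced here and do not come from the described embodiments.

```python
from dataclasses import dataclass

# Hypothetical grouping used only for this sketch.
RELIABLE = {"RC", "RD", "XRC"}
UNRELIABLE = {"UC", "UD"}


@dataclass
class ResourceParams:
    """Hypothetical snapshot of resource-related parameters."""
    remote_nodes: int            # nodes communicating with the requesting application
    free_qps_of_requested: int   # remaining QPs of the requested type
    memory_utilization: float    # current memory utilization, 0.0 to 1.0


def select_qp_type(requested: str, p: ResourceParams) -> str:
    """Apply a small, assumed rule set and return the selected QP type."""
    if requested in UNRELIABLE:
        # Many peers favor the connectionless datagram type.
        return "UD" if p.remote_nodes > 8 else "UC"
    if requested in RELIABLE:
        # No QPs of the requested type left, or memory is scarce:
        # fall back to a type that multiplexes many peers over one QP.
        if p.free_qps_of_requested == 0 or p.memory_utilization > 0.8:
            return "RD"
        # Many peers but enough memory: reduce the QP count with XRC.
        if p.remote_nodes > 8:
            return "XRC"
        return "RC"
    raise ValueError(f"unknown QP type: {requested}")
```

In this sketch the selection stays within the reliability group of the requested type; a trained classifier or an optimization model could replace the body of `select_qp_type` without changing its interface.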
At 306, a QP type is selected from a plurality of candidate QP types according to the analysis. QP types may be defined for RDMA and/or transport protocols. The QP type can be based on published definitions and/or based on custom created QP types.
The QP will be created according to the selected type. The type of QP is specified when the QP is created.
Exemplary QP types include: Reliable Connection (RC), Reliable Datagram (RD), Extended Reliable Connection (XRC), Unreliable Datagram (UD), Unreliable Connection (UC), Scalable Reliable Datagram (SRD), and Dynamic Connection (DC), which are discussed in detail below.
Optionally, the selected QP type is in the same group as the requested QP type. The group may be a reliable group of a reliable QP type or an unreliable group of an unreliable QP type. Within each group, the selected QP type may belong to a different subtype than the requested QP type. Maintaining the reliability or unreliability of QP types according to the QP type of network connection request maintains compatibility with applications and/or processes that use the data transmitted over the network. For example, an application desiring reliable data transfer receives data transferred in a reliable manner and does not need to process unexpected data transferred in an unreliable manner.
Optionally, the network connection is for a reliable QP type, and the selected type of QP is an unreliable type of QP, wherein reliability is provided by a reliability layer implemented by one or a combination of software, firmware, and hardware. The requested reliability may be provided using an unreliable QP type with the reliability layer instead of using a resource intensive reliable QP type, which may improve resource utilization by using an unreliable QP type that uses fewer resources than a reliable QP type.
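The two options above (staying within the reliability group of the requested type, or using an unreliable type together with a reliability layer) can be expressed as a filter on the candidate types. The following is a minimal sketch; the function name, the `reliability_layer` flag, and the assignment of types to groups by their names are assumptions made for illustration.

```python
# Grouping assumed from the type names; DC is omitted because its
# name alone does not state a reliability group.
RELIABLE_GROUP = ("RC", "RD", "XRC", "SRD")
UNRELIABLE_GROUP = ("UD", "UC")


def candidate_types(requested: str, reliability_layer: bool = False) -> tuple:
    """Return the QP types that may be selected for a requested type.

    reliability_layer: hypothetical flag stating that software, firmware,
    or hardware adds reliability on top of an unreliable QP.
    """
    if requested in RELIABLE_GROUP:
        if reliability_layer:
            # Unreliable types become acceptable because the layer
            # restores the reliability the application expects.
            return RELIABLE_GROUP + UNRELIABLE_GROUP
        return RELIABLE_GROUP
    if requested in UNRELIABLE_GROUP:
        return UNRELIABLE_GROUP
    raise ValueError(f"unknown QP type: {requested}")
```

For example, `candidate_types("RC")` excludes UD, while `candidate_types("RC", reliability_layer=True)` admits it.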
The request to establish the network connection may be for one QP type and another QP type different from the requested QP type is selected. The selection process may be performed in an implicit mode, where the request is ignored and the QP type that provides the best optimization of the available resources is selected, without the requesting application and/or process knowing that the requested QP type has changed.
The request to establish the network connection may be for a dynamically selected Interchangeable QP Type (IQT). The dynamic QP type is not an actual QP type, but rather an indication that the QP type is to be dynamically selected, without specifying which particular QP type is requested to establish the network connection. The dynamic QP type indicates that the application will handle any QP type actually selected from the candidate types (excluding the dynamic type). The highest resource utilization efficiency may be achieved when a process and/or application uses the dynamic QP type to indicate that any candidate QP type may be selected.
Each QP type may include a queue pair: a Send Queue (SQ) and a Receive Queue (RQ). A message transfer request may be issued to the SQ, for example, by an application sending data across the network. As each message is executed, the SQ logic transmits the outbound message transfer request to the RQ logic of the remote QP. This applies only to untagged operations using the Send opcode, because read and write messages do not pass through the RQ. Work Requests (WRs) may be issued to the RQ to handle certain types of inbound message transfer requests transferred to the RQ logic by the SQ logic of the remote QP. Each computing device, i.e., network node, may implement multiple QPs of one or more types, e.g., up to 100 million possible QPs, and each QP may be capable of sending and receiving messages to and from one or more QPs in a remote network node. The selection of QP types may reduce the total number of QPs and/or reduce the total number of network connections and/or increase utilization of processing resources, as described herein.
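The queue pair structure described above can be modeled in a few lines. This is a toy in-memory sketch, not a model of real hardware or of any verbs API: the class and method names are hypothetical, and only the Send operation, which consumes a work request posted on the remote RQ, is represented.

```python
from collections import deque


class QueuePair:
    """Toy model of a QP: a send queue (SQ) and a receive queue (RQ)."""

    def __init__(self, qp_type: str):
        self.qp_type = qp_type       # type is fixed when the QP is created
        self.send_queue = deque()    # outbound message transfer requests
        self.recv_queue = deque()    # posted receive work requests (buffers)
        self.remote = None           # peer QP, set when the connection is made
        self.received = []           # (buffer_id, payload) pairs that completed

    def post_recv(self, buffer_id: str) -> None:
        """Post a receive work request so an inbound Send has a buffer."""
        self.recv_queue.append(buffer_id)

    def post_send(self, payload: bytes) -> None:
        """Queue an outbound Send request on the SQ."""
        self.send_queue.append(payload)

    def process_send_queue(self) -> None:
        """Execute queued Sends by handing each to the remote RQ logic."""
        while self.send_queue:
            self.remote._deliver(self.send_queue.popleft())

    def _deliver(self, payload: bytes) -> None:
        # An inbound Send consumes one posted receive work request.
        if not self.recv_queue:
            raise RuntimeError("no receive work request posted")
        self.received.append((self.recv_queue.popleft(), payload))
```

A short usage example: create two `QueuePair("RC")` objects, point each at the other through `remote`, call `post_recv` on the receiver and `post_send` followed by `process_send_queue` on the sender.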
At 308, a network connection is established using the selected type of QP for data transfer across the network.
Before any messages can be transmitted, a connection is established between the QPs of the two network nodes. The QP contexts for the two QPs can each be programmed with the identity of the remote QP and the address of the port on which the remote QP is located. For RoCE, the port is constant and there are other fields to differentiate applications as well, for example:
RoCE RC: the QP ID in the BTH header.
RoCE RD: the RDETH and DETH headers, with the EEC and the source QP.
The following is an exemplary process of establishing a network connection using a selected QP type between a client requesting establishment of a network connection with a service in a remote network node and a server hosting the service provided to the client. The client sends a REQ message to the server with an indication of the service (e.g., serviceID) with which the network connection is established. The server verifies that the service exists, then creates a local QP and/or EEC and QP, and sends information in a message about the created QP and/or EEC back to the requesting client. At this point, the new local QP and/or EEC is in Ready To Receive (RTR) state, i.e., ready to receive messages, but it cannot send messages until the QP settings of the client are complete. The client receives the message, completes the establishment of the local QP and/or the EEC by using the information in the message, and converts the local QP and/or the EEC into a Ready To Send (RTS) state. The client sends a message to the server to convert the QP and/or EEC to the RTS state. Upon receipt of the message, the server transitions its local QP and/or EEC to RTS state. Note that the local QP and/or EEC of the server will automatically transition from RTR to RTS state when the first packet sent by the send logic of the remote QP and/or EEC is received (even if no message has been received from the client).
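The state transitions in the exemplary process above can be traced with a small simulation. This is a sketch under stated assumptions: the `Endpoint` class, the `establish` function, and the QP naming are hypothetical, and messages are modeled as plain function steps rather than network packets.

```python
class Endpoint:
    """Hypothetical endpoint holding one QP and its connection state."""

    def __init__(self, name: str, services=()):
        self.name = name
        self.services = set(services)
        self.state = "INIT"
        self.remote_qp = None


def establish(client: Endpoint, server: Endpoint, service_id: str) -> list:
    """Walk through the exchange and return the observed state changes."""
    trace = []
    # 1. Client sends REQ naming the service; server verifies it exists.
    if service_id not in server.services:
        raise LookupError(f"service {service_id!r} not found")
    # 2. Server creates its local QP, which can now receive but not send.
    server.state = "RTR"
    trace.append(("server", server.state))
    reply = {"remote_qp": server.name + "-qp"}
    # 3. Client completes its local QP using the reply and may now send.
    client.remote_qp = reply["remote_qp"]
    client.state = "RTS"
    trace.append(("client", client.state))
    # 4. Client notifies the server, which moves from RTR to RTS.
    server.remote_qp = client.name + "-qp"
    server.state = "RTS"
    trace.append(("server", server.state))
    return trace
```

Calling `establish(Endpoint("c"), Endpoint("s", {"svc"}), "svc")` yields the sequence server RTR, client RTS, server RTS, matching the order in the text.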
At 310, one or more features described with reference to 304-308 are iterated. Iterations may be performed to monitor resource-related parameters to detect significant changes that trigger re-analysis of the resource-related parameters, such as increased noise in the network, formation of new connections to new client terminals, reduction in available memory, and/or reduction in processor capacity. The iteration may be performed without necessarily being triggered by monitoring. For example, the resource-related parameters may be re-analyzed at defined time intervals (e.g., every minute, every 5 minutes, every 30 minutes, every hour, and other time intervals), and/or when data is transmitted over the network and/or when the network connection is active but not transmitting data, and/or each time a new network connection is initiated.
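A possible trigger for the re-analysis described above combines a periodic interval with a check for significant parameter changes. The following is a minimal sketch; the function name, the 300-second default interval, and the 20% relative-change threshold are assumptions chosen for illustration.

```python
def needs_reanalysis(previous: dict, current: dict,
                     seconds_since_last: float,
                     interval: float = 300.0,
                     threshold: float = 0.2) -> bool:
    """Decide whether the resource-related parameters should be re-analyzed.

    previous/current: hypothetical maps from parameter name to numeric value.
    """
    # Time-based trigger: re-analyze at a defined interval.
    if seconds_since_last >= interval:
        return True
    # Change-based trigger: any parameter moved by at least `threshold`
    # relative to its previous value.
    for key, old in previous.items():
        new = current.get(key, old)
        base = max(abs(old), 1e-9)   # avoid division by zero
        if abs(new - old) / base >= threshold:
            return True
    return False
```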
As the resource-related parameters change, the QP type of the existing connection may be dynamically changed during the use of the established network connection to transmit data over the network to provide a different QP type that is expected to improve resource utilization.
The resource-related parameters may be re-analyzed while the established network connection is active, optionally during data transmission over the established network connection and/or while no data transmission occurs, as in 304. Another QP type is reselected from the candidate types based on the re-analysis, as in 306. A new network connection may be established using the reselected QP type, as in 308. The previously established network connection may be dynamically migrated to the new network connection and/or data transfers over the previously established network connection may be resumed over the new network connection.
At 312, one or more features described with reference to 302-308 are iterated. Iterations may be performed for processing the new request to establish a new network connection, e.g., by the same application and/or by a different application.
After the iterations 302 through 308, data traffic on the previously established network connection may be migrated to the new network connection and multiplexed with other data traffic designated for the new network connection at 314. In this case, the previously established network connection may be terminated once the data traffic has migrated to the new network connection.
Alternatively, after the iterations 302 through 308, the data specified for the new network connection is migrated to the previously established network connection and multiplexed with other data traffic transmitted over the previously established network connection at 316. In this case, a new network connection (as in 308) is not necessarily established.
At 314, data traffic on the previously established network connection may be migrated to the new network connection in response to the iterations 302 through 308. Two network connections transmitting two data sets over the network may be merged into a single network connection. The merged second network connection may use fewer resources than two separate network connections.
Additional requests to establish another network connection for data transfer across the network are received, as in 302, from the same application that issued the previous request and/or from a different application. Additional analysis of the resource-related parameters is performed, as in 304. Additional analysis of resource-related parameters may be performed while a previously established network connection is active, optionally transferring data. The additional analysis may represent the resource impact of the previously established network connections and/or of establishing additional network connections, optionally taking into account the resource impact of a combination of previously established network connections and requested network connections. Another QP type is selected based on the additional analysis, as in 306. Additional network connections are established using the newly selected QP type, as in 308. Now, in 314, the previously established network connection is migrated to the newly established (i.e., additional) network connection. The newly established (i.e., additional) network connection transports data across the network for the previous network connection and for the newly established network connection, e.g., by multiplexing two data streams.
Two separate network connections requested by two different applications for transmitting two data sets over the network may be combined into a single network connection used by the two applications. The merged second network connection may use less resources than two separate network connections. The previously established network connection may be established in response to a request from the first application. The new request may be provided by a second application different from the first application. The newly established network connection transfers data for the first application and the second application over the network.
Optionally, the dynamic migration from a previously established network connection to a newly established network connection is performed by an exemplary process that suspends current network traffic on the previously established network connection for transparently performing the migration without significant interruption and/or without the application associated with the current network traffic being aware of the migration.
Live migration may be performed using the following exemplary process: network traffic on the previously established network connection is suspended. An acknowledgement message is received. The acknowledgement message indicates that a packet transmitted over the previously established network connection prior to the suspension has been received by a device at the other end of the previously established network connection. The acknowledgement message indicates that the previously established network connection is "empty," i.e., no packets are currently traversing the previously established network connection. Additional network traffic destined for the previously established network connection is redirected to the additional (i.e., newly established) network connection using the additional (i.e., newly established) selected QP type.
The previously established network connection may be terminated when data traffic designated for the previously established network connection has been redirected to the additional (i.e., newly established) network connection. Optionally, the previously established network connection may be terminated in response to receiving an indication that at least one first packet of the rerouted network traffic has passed through the additional network connection to the device at the other end of the additional network connection. Resources tied up by the previously established network connection are released by terminating the previously established network connection.
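The live migration steps above (suspend, wait until all packets are acknowledged, redirect, terminate) can be sketched as a small simulation. The `Connection` class and the `migrate` function are hypothetical names; acknowledgements are passed in as a list of sequence numbers instead of arriving over a network.

```python
class Connection:
    """Hypothetical connection that tracks unacknowledged packets."""

    def __init__(self, qp_type: str):
        self.qp_type = qp_type
        self.paused = False
        self.open = True
        self.unacked = set()   # sequence numbers still in transit
        self.sent = []

    def send(self, seq: int, data: bytes) -> None:
        if self.paused or not self.open:
            raise RuntimeError("connection is paused or closed")
        self.unacked.add(seq)
        self.sent.append(data)

    def ack(self, seq: int) -> None:
        self.unacked.discard(seq)


def migrate(old: Connection, new: Connection, acks, redirected) -> bool:
    """Migrate traffic from `old` to `new`; return True when completed."""
    old.paused = True                  # 1. suspend traffic on the old connection
    for seq in acks:                   # 2. process acknowledgements received
        old.ack(seq)
    if old.unacked:                    #    packets still in transit: not drained
        return False
    for seq, data in redirected:       # 3. redirect traffic to the new connection
        new.send(seq, data)
    old.open = False                   # 4. terminate the old connection
    return True
```

If `migrate` returns False, the old connection stays paused and the caller can retry once further acknowledgements arrive.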
At 316, data traffic directed to the new network connection may be initialized over the previously established network connection in response to the iterations of 302 through 308. When a network connection with the selected second QP type already exists, network traffic destined for the newly established network connection may be added to the existing network connection. Resources may be saved by using an existing network connection rather than adding another network connection.
Additional requests to establish another network connection for data transfer across the network are received, as in 302, from the same application that issued the previous request and/or from a different application. Additional analysis of the resource-related parameters is performed, as in 304. Additional analysis of resource-related parameters may be performed while a previously established network connection is active, optionally transferring data. The additional analysis may represent the resource impact of the previously established network connections and/or of establishing additional network connections, optionally taking into account the resource impact of a combination of previously established network connections and requested network connections. Another QP type is selected based on the additional analysis, as in 306. Note that another new network connection is not established using the newly selected QP type at 308. Instead, a previously established network connection is identified that uses the same QP type as the newly selected QP type. The previously established network connection was established in response to a previous request, before the new request to establish an additional network connection was received. Now, in 316, the previously established network connection is used to transport across the network both the network traffic originally destined for the previously established network connection and the network traffic destined for the additional network connection (which is not established separately), e.g., by multiplexing the two data streams. The new data flow may be initialized over the previously established network connection.
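The reuse decision described above can be sketched as a lookup over the existing connections. The dictionary layout and the function name `attach_stream` are assumptions for illustration; a real implementation would also check that the existing connection reaches the same remote node.

```python
def attach_stream(connections: list, stream_id: str, selected_type: str):
    """Attach a data stream to a connection of the selected QP type.

    Returns (connection, created), where `created` is False when an
    existing connection was reused and the streams are multiplexed.
    """
    for conn in connections:
        if conn["qp_type"] == selected_type:
            # Reuse: multiplex the new stream over the existing connection.
            conn["streams"].append(stream_id)
            return conn, False
    # No match: a new connection is needed (as in 308).
    conn = {"qp_type": selected_type, "streams": [stream_id]}
    connections.append(conn)
    return conn, True
```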
An example based on the method described with reference to fig. 3 is now provided.
As in 302, during connection setup (e.g., RDMA CM), devices (e.g., target and initiator devices, servers and clients) may handshake and/or inform each other whether dynamic QP type selection capability is supported.
The resource-related parameters are analyzed, as in 304.
As in 306, a first RC QP is selected.
A selected first RC QP and first network connection are created on both devices, as in 308. The RC QP flow begins.
Another iteration of 302 through 308 is performed, as in 312.
Additional requests are received, as in 302. The request may be to establish a network connection using an RC QP, or to not specify a particular QP (e.g., request a dynamic QP type).
Additional analysis of the current state of the resource-related parameter is performed, as in 304.
As in 306, the RD QP type is selected, for example, based on an analysis that determines that the RD QP type saves resources over another QP type (e.g., an RC QP type). The RD QP type may be selected even if an RC QP type is requested.
As in 308, an RD QP and an end-to-end context (EEC) are created, which are connected by the second network connection for transporting traffic across the network.
Note that there are now a total of two QPs on each host: one is RC and the other is RD.
The RD QP flow begins from the first end-to-end (EE) context.
Data traffic on the first established network connection using the RC QP type is migrated to an additional network connection established using the RD QP type as in 314. After the EEC is created on the remote device, additional data begins to be sent over the newly selected RD QP type.
The network interface device and/or virtual machine hypervisor pauses traffic on the previously established network connection that uses the RC QP for a short time to "drain the pipe," i.e., to ensure that no packets remain in transit over the previously established network connection. In response to receiving acknowledgement messages (ACKs) for all packets sent over the previously established network connection, work queue elements (WQEs) are posted on the RD QP of the additional (i.e., newly established) network connection, rather than on the RC QP of the previously established network connection.
There are now a total of four QPs: one RC QP and one RD QP per side. Once the first packet passes through the newly established network connection using the RD QP, the previously established network connection using the RC QP is destroyed. There are now a total of two QPs: one RD QP on the server and one RD QP on the client. Selecting the RD QP type improves resource utilization compared with the case where only RC QPs are used, which would require a total of four QPs and more memory than selecting an RD QP and migrating from the previous RC QP to the newly established RD QP.
Referring now to fig. 4A-4F, fig. 4A-4F are schematic diagrams of exemplary QP types for selection, according to some embodiments.
FIG. 4A depicts a Reliable Connection (RC) QP type. The RC is typically selected for applications requiring the highest quality of service, such as mission-critical applications, to transmit data over the network. The RC provides the highest level of reliability and predictability. However, the RC protocol consumes a large amount of bandwidth due to the generation of ACKs (positive acknowledgement packets returned to signal successful reception and processing of a Send or RDMA write request packet) and NAKs (negative acknowledgement packets returned to signal one of a temporary receiver not ready (RNR) condition, a PSN sequence error, and a fatal error).
In an RC implementation, the local QP 402 on the local host 404 is associated with exactly one remote QP 406 on the remote host 408, forming a dedicated channel between the QPs for transmitting data 410. The hardware protocol between QP 402 and QP 406 provides reliable transport. Reliable transmission is achieved by detecting lost, damaged, or invalid packets, automatically suspending further activity between the two QPs, resending the lost/failed packets, and then resuming operation. Each queue operation is acknowledged 412, each operation is completed exactly once, and operations complete in the order in which they were issued. The maximum message size is not limited by the packet size or the Maximum Transmission Unit (MTU) size defined for the channel. Segmentation and reassembly of long messages occurs in hardware and is transparent to the application. RC supports all InfiniBand™ services: send/receive, RDMA read, RDMA write, and atomic operations.
Fig. 4B depicts an Unreliable Connection (UC) QP type. The UC protocol consumes significantly less bandwidth than the RC protocol because the RQ logic of the target QP does not generate ACKs and NAKs.
The setup of the unreliable connection is similar to the reliable connection described with reference to fig. 4A, including a dedicated channel between the local QP 402 of the local host 404 and the remote QP 406 of the remote host 408 for transmitting data 410. However, no acknowledgement is provided. A send queue operation is marked as complete immediately after being transferred by the QP. Lost or erroneous messages are not automatically retried. They are discarded, and the QP does not provide any indication to the sender as to whether the message was successfully delivered. As with the RC service, operations complete in order and the maximum message size is not limited by the packet size or path MTU. Send/receive and RDMA writes are supported, but RDMA reads and atomic operations are not supported. UC provides efficient communication for certain applications (e.g., streaming media) for which occasional data loss is not critical.
Fig. 4C depicts Unreliable Datagram (UD) QP types. The UD QP may send and receive messages to any number of UD QPs located in one or more other network nodes. No ACK or NAK is returned for each request packet received. Since the logic of the target QP does not generate ACKs and NAKs, the UD protocol consumes significantly less bandwidth than the RC and RD protocols.
Unreliable datagram is a connectionless service. A UD QP may send data to any other UD QP in the system. The QP 402 on the initiating host 404 sends data 410A-410C to any one of the QPs 406A-406C on the target hosts 408A-408C. The sending QP must send a key named Q_Key that must match the receiving party's Q_Key; otherwise the message is discarded. This prevents writing to unexpected locations. The only valid operation is send/receive. QPs for unreliable datagram services cannot detect lost data, data received out of order, or data received multiple times. In all these cases, the message to which this data belongs is considered as erroneously received and is silently discarded. The maximum message size is limited by the maximum packet size (MTU) supported by the path (256 bytes to 4 kbytes). If reliability is required, it is provided by upper layer software protocols.
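The two UD constraints described above, the single-packet size limit and the Q_Key match, can be sketched as follows. The function names and the dictionary message format are hypothetical; the set of permitted path MTU sizes reflects the 256-byte to 4-kbyte range given in the text.

```python
# Path MTU sizes within the 256-byte to 4-kbyte range.
VALID_MTUS = (256, 512, 1024, 2048, 4096)


def ud_build_message(payload: bytes, q_key: int, path_mtu: int = 4096) -> dict:
    """Build a UD message; the whole message must fit in one packet."""
    if path_mtu not in VALID_MTUS:
        raise ValueError(f"unsupported path MTU: {path_mtu}")
    if len(payload) > path_mtu:
        raise ValueError("UD message exceeds the path MTU")
    return {"q_key": q_key, "payload": payload}


def ud_receive(message: dict, receiver_q_key: int):
    """Return the payload, or None when the message is silently discarded."""
    if message["q_key"] != receiver_q_key:
        # Q_Key mismatch: drop without notifying the sender.
        return None
    return message["payload"]
```

Because a dropped message returns `None` without any signal to the sender, an upper layer protocol would need its own acknowledgements to provide reliability.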
FIG. 4D depicts Reliable Datagram (RD) QP types. The RD QPs may send and receive messages to and from any number of RD QPs located in one or more other network nodes. It performs this operation through one or more "pipes" established between the local network node and one or more remote network nodes. Each "pipe" is referred to as a Reliable Datagram Channel (RDC) and serves as a pipe through which multiple local client RD QPs send and receive messages to and from RD QPs residing in remote network nodes. The RD protocol consumes a large amount of bandwidth due to the generation of ACKs and NAKs.
The reliable datagram service combines the features of the RC and UD services. In particular, it allows the same QP to interact with multiple remote QPs simultaneously, while providing reliability. The QP402 on the initiating host 404 sends data 410A-410C to any of the QPs 406A-406C on the target hosts 408A-408C and receives Acknowledgements (ACKs) 412A-412C. In essence, RD provides a multiplexed reliable connection channel. The QP402 is logically associated with a set of remote RD QPs 406A-406C. This service is most useful for applications that run many different processes on each node and need those processes to communicate with each other reliably. Operations are acknowledged, completed exactly once and in order, and automatically retried in the event of an error. All InfiniBand™ services are supported, namely send/receive, RDMA write, RDMA read, and atomic operations. The endpoints of reliable datagram channels are referred to as end-to-end contexts (EECs), shown as EEC_1 to EEC_6. The reliable datagram domain (RDD) determines which set of RD QPs may access which set of EECs. Each EEC is shared by all reliable datagram QPs of the RDD.
Fig. 4E depicts an extended reliable connection (XRC) QP type. XRC will be described in the context of data 410 and Acknowledgements (ACKs) 412, which are transmitted between the local QP402 of the respective local host 404 and the remote QP406 of the remote hosts 404 and 408. XRC allows significant savings in the number of QPs required to establish all-to-all process connectivity in a large cluster. XRC differs from RD in several ways, but first and foremost it removes the most important restriction of the RD transport service: that only a single message may be outstanding per EE context. The savings in the total number of QPs required result from the way XRC operates on the responder side. The responder connection context (denoted XRC TGT QP) allows the requesting process to send messages to multiple destination XRC SRQs 414, which belong to multiple processes on the responder node. Thus, using a single (XRC INI) QP, a process in one node can communicate with all processes on a remote node, thereby reducing the total number of QPs required for full connectivity by a factor of p (the number of processes per node) compared to using RC QPs. XRC SRQs 414 are per-process receive queues on the responder node that can be targeted by multiple remote end nodes through an XRC TGT QP. They behave somewhat like the receive queue of an RD QP, so each process needs only one to receive messages from any process on any node in the cluster. Just as an RD QP is restricted to use with RD EE contexts in its own Reliable Datagram Domain (RDD), the XRC transport service implements an equivalent XRC domain mechanism serving the same purpose. An XRC TGT QP can only be used as a conduit to access XRC SRQs located in the same XRC domain.
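The factor-of-p saving can be checked with a short calculation. The sketch below rests on simplifying assumptions that are not taken from the patent: every node runs the same number of processes, only connections to remote nodes are counted, and only initiator-side QPs are considered. The function names are hypothetical.

```python
def rc_qps_per_process(nodes: int, procs_per_node: int) -> int:
    """RC: one QP per remote process, on every remote node."""
    return (nodes - 1) * procs_per_node


def xrc_ini_qps_per_process(nodes: int) -> int:
    """XRC: one XRC INI QP per remote node reaches all of its processes."""
    return nodes - 1


# Example: a cluster of 16 nodes with 32 processes each.
rc = rc_qps_per_process(16, 32)      # 15 * 32 = 480
xrc = xrc_ini_qps_per_process(16)    # 15
reduction = rc // xrc                # 32, i.e. p
```

Under these assumptions the ratio between the two counts is exactly p, the number of processes per node.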
FIG. 4F depicts a Dynamic Connected (DC) QP type. The QP402 on the initiating host 404 sends data 410A-410C to any of the QPs 406A-406C on the target hosts 408A-408C and receives Acknowledgements (ACKs) 412A-412C. DC is a scalable transport service that reduces the number of QPs per node compared to RC. DC has RC-like reliability semantics and a symmetric API. On the responder side, there is a DC target (DCT) 416, one of which is sufficient. On the requestor side, there is a DC initiator (DCI) 418, one of which is sufficient. DC forms "temporary connections": the first send work request (WR) on a DCI connects that DCI to the remote DCT, and subsequent send WRs use this open connection. The DCI disconnects after an idle period with no transmissions. If the next send WR specifies a different destination, the DCI may "switch destinations". The DCT has a pool of "responders" (DCRs), and each incoming DC connection is assigned a DCR. The number of DCIs involves a trade-off. With too few DCIs, the same DCI switches back and forth between destinations, causing redundant connect/disconnect flows (in the worst case, one per transmission), which hurts latency. With too many DCIs, the result is still not as bad as N^2 RC QPs, but resources are consumed and caching suffers. The best practice is to maintain a <DCI, destination> hash table to reduce connection re-establishment, together with an LRU eviction policy that increases the probability that the DCI selected for a new destination is one that was about to disconnect anyway.
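The <DCI, destination> hash table with LRU eviction mentioned above might be sketched as follows. This is a minimal illustration, not the patent's implementation: the class `DciPool`, its method `acquire`, and the `reconnects` counter are hypothetical names, DCIs are represented by plain integers, and no RDMA library is called.

```python
from collections import OrderedDict


class DciPool:
    """Toy destination -> DCI table with LRU eviction (hypothetical)."""

    def __init__(self, num_dcis: int):
        self.free = list(range(num_dcis))   # DCIs not yet bound to a destination
        self.by_dest = OrderedDict()        # destination -> DCI, least recent first
        self.reconnects = 0                 # number of connect flows triggered

    def acquire(self, dest: str) -> int:
        # Hit: the DCI is already connected to this destination; reuse it.
        if dest in self.by_dest:
            self.by_dest.move_to_end(dest)
            return self.by_dest[dest]
        # Miss: take an unused DCI, or evict the least recently used binding.
        if self.free:
            dci = self.free.pop()
        else:
            _, dci = self.by_dest.popitem(last=False)
        self.by_dest[dest] = dci
        self.reconnects += 1                # the DCI must (re)connect to dest
        return dci
```

With a pool that is too small, alternating destinations cause a reconnect on almost every call, which corresponds to the "too few DCIs" case described above; a larger pool trades memory for fewer reconnects.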
Reference is now made to figs. 5-7, which are schematic diagrams comparing the dynamic selection of a QP type with a standard method, in accordance with some embodiments.
Fig. 5 describes the process of establishing an RC connection according to standard methods. Diagram 502A depicts the use of an RC QP type 510A to establish a first RC network connection 504A (for reliably transmitting data by providing an ACK) between the server host 506 and the client host 508.
Diagram 502B depicts the use of an RC QP type 510B to establish a second RC network connection 504B between the server 506 and the client 508. A first RC connection 504A connects process A 512 running on server 506 and process C 514 running on client 508. A second RC connection 504B connects process A 512 running on server 506 and process D 516 running on client 508.
Fig. 6 depicts an example of a process for establishing a connection using an embodiment of dynamically selecting a QP type described herein. Diagram 602A depicts the use of an RC QP type 610A to establish a first RC network connection 604A between the server host 606 and the client host 608 (for reliable transmission of data by providing an ACK). It is noted that the process depicted in diagram 602A may be the same as the standard method depicted in diagram 502A of fig. 5.
Diagram 602B depicts a response to a request from client 608 to add a second RC connection to server 606. The first RC connection 604A is between process A 612 running on the server 606 and process C 614 running on the client 608. The request for the second RC connection is between process A 612 running on server 606 (i.e., the same as for the first RC connection) and process D 616 running on client 608 (i.e., different from the first RC connection). In response to receiving the request to establish the second RC connection, the QP type is changed from RC type 610A to RD type 610B on server 606 and client 608. For example, an RD QP type may be selected in order to save hardware resources and/or when no more RC QPs are available.
Fig. 7 depicts another example of a process for establishing a connection using an embodiment of dynamically selecting a QP type described herein. Diagram 702A depicts the use of RC QP type 710A to establish a first RC network connection 704A (for reliably transmitting data by providing an ACK) between the server host 706 and the client host 708. It is noted that the process depicted in diagram 702A may be the same as the standard method depicted in diagram 502A of FIG. 5 and/or diagram 602A of FIG. 6.
Diagram 702B depicts a response to a request from a client 708 to add a second RC connection to a server 706. Using standard methods, a second RC network connection would be established between the server 706 and the client 708. However, according to at least one embodiment described herein, the DC QP type 710B is dynamically selected and used to establish the network connection 704B between the server 706 and the client 708. The network connection 704A, previously established using the RC QP type 710A, is migrated to the network connection 704B established using the DC QP 710B. For example, a DC QP type may be selected in order to save hardware resources and/or when no more RC QPs are available.
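One way to picture the selection shown in figs. 6 and 7 is a small rule set over resource-related parameters. The sketch below is purely illustrative: the `ResourceState` fields, the `peer_threshold` value, and the rules themselves are assumptions chosen to reproduce the two examples, not the rules used by any embodiment.

```python
from dataclasses import dataclass


@dataclass
class ResourceState:
    """Hypothetical snapshot of resource-related parameters."""
    requested_type: str    # QP type named in the connection request
    free_rc_qps: int       # RC QPs still available on the adapter
    peer_processes: int    # remote processes the application talks to,
                           # counting the connection being requested
    dc_supported: bool     # whether the adapter offers the DC type


def select_qp_type(state: ResourceState, peer_threshold: int = 2) -> str:
    """Pick a QP type for a new connection request (illustrative rules)."""
    # Only RC requests are considered for substitution in this sketch.
    if state.requested_type != "RC":
        return state.requested_type
    # Keep RC while RC QPs remain and the fan-out is small.
    if state.free_rc_qps > 0 and state.peer_processes < peer_threshold:
        return "RC"
    # Otherwise fall back to a type that multiplexes peers on one QP.
    return "DC" if state.dc_supported else "RD"
```

With these assumed rules, the first request (one peer) keeps RC, while the second request yields RD as in fig. 6 when DC is unavailable, or DC as in fig. 7 when it is supported.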
Other systems, methods, features and advantages of the invention will be, or will become, apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, methods, features and advantages be included within this description, be within the scope of the invention, and be protected by the accompanying claims.
The description of the various embodiments of the present invention is intended to be illustrative, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein were chosen to best explain the principles of the embodiments, the practical application or technical improvement over existing technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
It is expected that during the life of a patent maturing from this application many relevant QP types will be developed and the scope of the term QP type is intended to include all such new technologies a priori.
The term "about" as used herein means ± 10%.
The terms "including", "having" and variations thereof mean "including but not limited to". This term includes the terms "consisting of … …" and "consisting essentially of … …".
The phrase "consisting essentially of … …" means that a composition or method may include additional components and/or steps, provided that the additional components and/or steps do not materially alter the basic and novel characteristics of the claimed composition or method.
As used herein, the singular forms "a", "an" and "the" include plural referents unless the context clearly dictates otherwise. For example, the term "a complex" or "at least one complex" may include a plurality of complexes, including mixtures thereof.
The word "exemplary" is used herein to mean "serving as an example, instance, or illustration". Any embodiment described as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments and/or to exclude the presence of other combinations of features of other embodiments.
The word "optionally" is used herein to mean "provided in some embodiments and not provided in other embodiments". Any particular embodiment of the invention may include a plurality of "optional" features unless such features conflict.
In the present application, various embodiments of the present invention may be presented in a range format. It is to be understood that the description in range format is merely for convenience and brevity and should not be construed as a fixed limitation on the scope of the present invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible sub-ranges as well as individual numerical values within that range. For example, a description of a range such as from 1 to 6 should be considered to have specifically disclosed sub-ranges from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6, etc., as well as individual numbers within that range such as 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
When a range of numbers is indicated herein, the expression includes any number (fractional or integer) recited within the indicated range. The phrases "a range between a first indicated number and a second indicated number" and "a range from a first indicated number to a second indicated number" are used interchangeably herein and are meant to include the first indicated number and the second indicated number as well as all fractions and integers in between.
It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination or as any suitable alternative embodiment of the invention. Certain features described in the context of various embodiments are not considered essential features of those embodiments, unless the embodiments are inoperable without those elements.
All publications, patents, and patent applications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present invention. To the extent that section headings are used, they should not be construed as necessarily limiting.

Claims (18)

1. A processing circuit (102A, 102B) for selecting a type of Queue Pair (QP) for data transmission across a network (120), the processing circuit being configured to:
receiving a request to establish a network connection for data transmission across the network;
analyzing a plurality of resource-related parameters, each resource-related parameter indicating a respective resource-related status of the network connection for establishment;
selecting the type of the QP from a plurality of candidate types according to the analysis;
establishing the network connection using the selected type of the QP for data transfer across the network.
2. The processing circuit of claim 1, further configured to:
re-analyzing the plurality of resource-related parameters during data transmission over the established network connection;
reselecting another type of the QP from the plurality of candidate types according to the re-analysis;
establishing a second network connection using the other type of the reselection of the QP;
performing at least one of the following operations: dynamically transferring the network connection to the second network connection, re-initiating transfer of data of the network connection to be transferred over the second network connection.
3. The processing circuit of any of the preceding claims, wherein the network connection is a first network connection, the processing circuit being further configured to:
receiving a second request to establish a second network connection for data transmission across the network;
performing a second analysis on the plurality of resource-related parameters, each resource-related parameter indicating a respective resource-related status of the established first network connection and being used for the establishment of the second network connection;
selecting a second type of the QP from the plurality of candidate types according to the second analysis;
establishing the second network connection using a second type of the QP;
dynamically migrating the first network connection to the second network connection, wherein the second network connection transports data across the network for the first network connection and the second network connection.
4. The processing circuit of claim 3, wherein the request is provided by a first application, wherein the second request is provided by a second application, and wherein the second network connection transfers data for the first application and the second application over the network.
5. The processing circuit of claim 3 or 4, wherein dynamically migrating comprises:
suspending network traffic on the first network connection;
receiving an acknowledgement message that a packet sent over the first network connection before the suspension has been received by a device at the other end of the first network connection;
using the second type of the QP for the second network connection for additional network traffic destined for the first network connection.
6. The processing circuit of claim 5, further configured to: terminating the first network connection in response to receiving an indication that at least one first packet of the additional network traffic has passed through the second network connection.
7. The processing circuit of any of the preceding claims, further configured to:
receiving a third request to establish a third network connection for data transmission across the network;
performing a third analysis on the plurality of resource-related parameters, each resource-related parameter indicating a respective resource-related status of the established network connection and being used for establishment of the third network connection;
selecting a third type of the QP from the plurality of candidate types according to the third analysis;
prior to receiving the third request, a fourth network connection of a third type using the QP has been previously established by the processing circuit for a fourth request;
using the fourth network connection to transfer data across the network associated with the third request and the fourth request, wherein the third network connection is not established independently.
8. The processing circuit according to any of the preceding claims, wherein the request to establish the network connection is for a first type of QP, and wherein a second type of QP different from the first type is selected.
9. The processing circuit according to any of the preceding claims, wherein the request to establish the network connection is a QP reliable type or a QP unreliable type for a first subtype of QP, and wherein the selected type of QP is a QP reliable type or a QP unreliable type for a second subtype of QP defined by the request, the second subtype of QP defined by the request being different from the first subtype of QP defined by the request.
10. The processing circuit of claim 9, wherein the network connection is for the QP reliable type and the selected type of QP is the QP unreliable type, wherein reliability is provided by a reliability layer implemented by at least one or a combination of software, firmware, and hardware.
11. The processing circuit of any preceding claim, wherein the request to establish the network connection is for an Interchangeable QP Type (IQT), and wherein the type of the QP is selected from the plurality of candidate types that do not include a dynamic type.
12. The processing circuit according to any of the preceding claims, wherein the plurality of resource-related parameters are selected from the group consisting of: a number of network nodes in communication with an application providing the request, a number of existing network connections of the application providing the request, a topology of existing network connections of the application, a total number of active QPs for network connections established by the processing circuitry, whether a QP of the type of the request is available, a transmission reliability of the network, a current memory utilization, and a current utilization of the processing circuitry.
13. The processing circuit according to any of the preceding claims, wherein the plurality of candidate types of the QP are selected from the group consisting of: a Reliable Connection (RC), a Reliable Datagram (RD), an extended reliable connection (XRC), an Unreliable Datagram (UD), an Unreliable Connection (UC), a scalable reliable datagram (SRD), and a Dynamic Connection (DC).
14. The processing circuit according to any of the preceding claims, wherein the analysis is done using at least one of: a set of rules for generating the plurality of resource-related parameters of the selected type of the QP; a classifier that receives the plurality of resource-related parameters as input and generates a result of the selected type of the QP, wherein the classifier is trained on training data sets of a plurality of sample resource-related parameters and labels of a QP type; optimization code that uses a mathematical model and/or a system of equations to calculate the type of the QP that optimizes the plurality of resource-related parameters.
15. The processing circuit of any preceding claim, wherein the data transfer across the network is in accordance with a Remote Direct Memory Access (RDMA) protocol, and the plurality of candidate types of QPs are defined by a network transfer protocol for RDMA.
16. The processing circuit of claim 15, wherein the network transport protocol for RDMA defining the plurality of candidate types of QPs is selected from the group consisting of: InfiniBand (IB), remote direct memory access over converged Ethernet (RoCE), RoCEv2, iWARP, and derivatives thereof.
17. A method for selecting a type of Queue Pair (QP) for data transmission across a network, comprising:
receiving a request to establish a network connection for data transfer across the network (302);
analyzing a plurality of resource-related parameters, each resource-related parameter indicating a respective resource-related status for the established network connection (304);
selecting the type of the QP from a plurality of candidate types according to the analyzing (306);
establishing the network connection using the selected type of the QP for data transmission across the network (308).
18. A computer program comprising program instructions (150), wherein the program instructions (150), when executed by a processor (102A, 102B), cause the processor to:
receiving a request to establish a network connection for data transmission across the network;
analyzing a plurality of resource-related parameters, each resource-related parameter indicating a respective resource-related status of the network connection for establishment;
selecting the type of the QP from a plurality of candidate types according to the analysis;
establishing the network connection using the selected type of the QP for data transfer across the network.
CN202080103258.8A 2020-08-19 2020-08-19 Exchangeable queue types for network connections Pending CN115885270A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2020/073259 WO2022037777A1 (en) 2020-08-19 2020-08-19 Interchangeable queue type for network connections

Publications (1)

Publication Number Publication Date
CN115885270A true CN115885270A (en) 2023-03-31

Family

ID=72474277

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080103258.8A Pending CN115885270A (en) 2020-08-19 2020-08-19 Exchangeable queue types for network connections

Country Status (2)

Country Link
CN (1) CN115885270A (en)
WO (1) WO2022037777A1 (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8948199B2 (en) * 2006-08-30 2015-02-03 Mellanox Technologies Ltd. Fibre channel processing by a host channel adapter

Also Published As

Publication number Publication date
WO2022037777A1 (en) 2022-02-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination