US20180285151A1 - Dynamic load balancing in network interface cards for optimal system level performance - Google Patents

Publication number
US20180285151A1
Authority
US
United States
Prior art keywords
cpu core
receive queue
nic
overloaded
queue length
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US15/476,379
Inventor
Ren Wang
Daniel P. Daly
Antoine Kaufmann
Saikrishna Edupuganti
Tsung-Yuan C. Tai
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intel Corp
Original Assignee
Intel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corp
Priority to US15/476,379 (US20180285151A1)
Assigned to INTEL CORPORATION. Assignment of assignors interest (see document for details). Assignors: WANG, REN; EDUPUGANTI, Saikrishna; KAUFMANN, Antoine; TAI, TSUNG-YUAN C.; DALY, Daniel P.
Priority to CN201810213728.0A (CN108694087A)
Priority to DE102018204859.2A (DE102018204859A1)
Publication of US20180285151A1
Priority to US17/152,573 (US20210141676A1)
Legal status: Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3024 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a central processing unit [CPU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F11/3433 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment for load management

Definitions

  • NICs: network interface cards
  • HFIs: host fabric interfaces
  • CPU: central processing unit
  • FIG. 1 is a functional block diagram illustrating an example of a system 100 that includes a computing device 110 , such as a network appliance.
  • the computing device 110 includes a central processing unit (CPU) 112 for executing instructions as well as a memory 114 for storing such instructions.
  • the CPU 112 has n CPU cores.
  • the term core generally refers to a basic computation unit of the CPU.
  • the memory 114 may include random access memory (RAM), flash memory, hard disks, solid state disks, optical disks, or any suitable combination thereof.
  • the computing device 110 also includes a network interface card (NIC) 116 for enabling the computing device 110 to communicate with at least one other computing device 120 , such as an external or otherwise remote device, by way of a communication medium such as a wired or wireless packet network, for example.
  • the computing device 110 may thus transmit data to and/or receive data from the other computing device(s) by way of its NIC 116 .
  • the NIC 116 has n receive queues for receiving data, e.g., ingress packets, from the other computing device(s).
  • NICs can steer data flows, e.g., data packets, to any of a number of receive queues by way of Receive Side Scaling (RSS) or implementation of a flow director.
  • RSS: Receive Side Scaling
  • Servers generally take advantage of such capabilities to distribute connections, e.g., transmission control protocol (TCP) connections, to different CPU cores for processing.
  • TCP: transmission control protocol
  • RSS typically includes application of a filter that applies a hash function over the packet headers of received data packets.
  • An indirection table can then be used to map each data packet to a certain receive queue, e.g., based on the corresponding hash value.
  • the CPU cores can then be assigned to work on one or more specific queues in order to enable distributed processing.
  • RSS usually involves the mapping of many data flows into a limited number of receive queues targeting a limited number of CPU cores
  • there is typically a high likelihood of data traffic imbalance in which one or more CPU cores are disadvantageously required to handle a higher amount of data traffic. While such CPU cores struggle to keep up with the incoming data packets, other CPU cores remain relatively idle. Such situations are inefficient and not optimal for system-wide performance.
  • FIG. 1 is a functional block diagram illustrating an example of a system having a computing device that includes a central processing unit (CPU), a memory, and a network interface card (NIC).
  • FIG. 2 is a functional block diagram illustrating a first example of a system having a computing device that includes a network interface card (NIC) and at least one central processing unit (CPU) core in accordance with certain embodiments of the disclosed technology.
  • FIG. 3 is a flow diagram illustrating an example of a computer-implemented method of performing CPU core load balancing in accordance with certain embodiments of the disclosed technology.
  • FIG. 4 is a flow diagram illustrating another example of a computer-implemented method of performing CPU core load balancing in accordance with certain embodiments of the disclosed technology.
  • FIG. 5 illustrates an example of multiple receive queue thresholds in accordance with certain embodiments of the disclosed technology.
  • references in the specification to “one embodiment,” “an embodiment,” “an illustrative embodiment,” etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may or may not necessarily include that particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, such feature, structure, or characteristic can be employed in connection with another disclosed embodiment whether or not such feature is explicitly described in conjunction with such other disclosed embodiment.
  • the disclosed embodiments may be implemented, in some cases, in hardware, firmware, software, or any combination thereof.
  • the disclosed embodiments may also be implemented as instructions (e.g. a computer program product) carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage mediums, which may be read and executed by one or more processors.
  • a machine-readable storage medium may be embodied as any storage device, mechanism, or other physical structure for storing or transmitting information in a form readable by a machine (e.g., a volatile or non-volatile memory, a media disc, or other media device).
  • Embodiments of the disclosed technology generally pertain to network interface card (NIC)-based adaptive techniques for performing dynamic load distribution among multiple CPU cores.
  • the NIC can effectively and dynamically load-balance incoming data traffic and consequently optimize the full-system performance. Indeed, significant improvement may be realized in network processing performance with many workloads without requiring software support.
  • Embodiments can address both connection-oriented and connectionless data traffic.
  • Such dynamic load balancing in a NIC generally includes detecting whether one or more of the CPU cores are overloaded. Such detection can be done, for example, by measuring CPU core responsiveness speed in real-time using one or more metrics such as receive queue length. If a determination is made that a certain CPU core is overloaded, a portion of data packets that were originally targeted or otherwise mapped to the CPU core can be directed elsewhere. For example, the data packets can be redirected to a relatively idle CPU core.
  • FIG. 2 is a functional block diagram illustrating a first example of a system 200 having a computing device 210 that includes a network interface card (NIC) 216 , such as an Ethernet card, and at least one central processing unit (CPU) core 230 in accordance with certain embodiments of the disclosed technology.
  • the NIC 216 has n receive queues, such as registers or other suitable data storage components, for receiving data packets from other computing devices.
  • a first receive queue 217 of the NIC 216 may receive one or more incoming data packets 205 , e.g., from a separate computing device over a wired or wireless connection.
  • Each of the n receive queues of the NIC 216 may be mapped to one or more CPU cores. This mapping may be re-configurable, e.g., depending on the hardware specifications and/or other details of the particular implementation. In the example, data packets sent to the first receive queue 217 and the nth receive queue 218 are mapped to a first CPU core 230 .
  • the data packets from either or both of the first and nth receive queues 217 and 218 may be redirected, e.g., re-mapped to, another CPU core such as the nth CPU core 231 .
  • the CPU core to which the data packets are redirected may be selected based on a determination that the CPU core is less busy than the first CPU core 230 .
  • Certain embodiments may include an NIC-based load balancer configured to handle different run-time situations during redirection. For example, with regard to situations involving TCP connections, SYN packets (which typically mark the beginning of a new data flow) may be identified and steered to a lightly loaded CPU core when the system determines that the CPU core handling the data traffic is over-loaded. A flow director may implement an exact match rule identifying this data flow along with an action to redirect the packets to the CPU core having a lighter load. This advantageously maintains the data flow affinity for subsequent data packets belonging to this flow. Also, existing connections may continue to be served by their original CPU core choices.
  • a portion of the key may be used to direct data packets to a lookup CPU core to minimize cross-core snoop and also maximize the system performance generally, because the same key typically directs to the same CPU core for the lookup.
  • Embodiments may cause data packet redirection to happen immediately once CPU core congestion is detected by modifying an RSS indirection table to point some hash values to receive queues that are serviced by CPU cores having a lighter load.
  • Current load balancing techniques are generally performed at a dedicated appliance or a server in front of multiple servers, which may work at the node level but not at the CPU core level. While other techniques can be implemented in software, such as Receive Packet Steering (RPS, a software implementation of RSS), and can be used in conjunction with utilities that help monitor CPU load, e.g., mpstat, such techniques disadvantageously result in extra latency and also occupy valuable CPU cycles for the load balancing tasks. In contrast, embodiments of the disclosed technology advantageously enable an NIC to transparently balance the load in real time, without software interference in the critical path.
  • RPS: Receive Packet Steering
  • Embodiments generally include use of an NIC's capability to steer data packets/flows to different receive queues to be processed by different CPU cores, advantageously resulting in improvements in latency, such as avoiding core-core transferring for TCP connections, and also increased data throughput.
  • dynamic load balancing may need to be enforced to change the mapping of data packets to CPU cores.
  • RSS may be used to perform a hash function on the data packet header and map data flows to different receive queues assigned to different CPU cores using a corresponding indirection table. Multiple data queues can be mapped to the same CPU core, and this can be configured by a user.
  • Other embodiments may include a flow director having programmable filters that can be used to identify specific data flows or sets of data flows based on an exact match of a portion of data packets, for example, and then route the data packet(s) to specific receive queues, e.g., mapped to specific CPU cores.
  • FIG. 3 is a flow diagram illustrating an example of a computer-implemented method 300 of performing CPU core load balancing in accordance with certain embodiments of the disclosed technology.
  • a particular CPU core is monitored by the NIC.
  • the NIC may monitor the queue length for a receive queue that is mapped to the CPU core.
  • the queue length for the receive queue may include a quantified measure of how many computing tasks for the receive queue are lined up at that particular moment, e.g., the number of outstanding receive packets, e.g., data packets, that have not yet been processed by the corresponding CPU core.
  • object-level affinity generally results in distributing requests to corresponding CPU cores based on the application's partitions, e.g., key space partitioning. Requests having the same key (or same region of keys) may be sent to the same CPU core for processing, thus significantly reducing cross-core communication overhead and improving performance, often significantly.
  • overloading of a CPU core must be detected. This may be accomplished by enabling the CPU cores to communicate with the NIC, e.g., using out-of-band messaging, about their utilization. Alternatively, the NIC may observe the length of the receive queue mapped to a certain CPU core. If the NIC determines that a certain receive queue length exceeds a particular threshold, it may determine that overloading is occurring and subsequently steer data traffic destined for that CPU core elsewhere.
  • receive queue length generally refers to a quantified measure of how many computing tasks for a certain receive queue are awaiting processing by the NIC at a particular moment, e.g., the number of outstanding receive packets, such as data packets, that have not yet been processed by a corresponding CPU core.
  • FIG. 4 is a flow diagram illustrating another example of a computer-implemented method 400 of performing load balancing in accordance with certain embodiments of the disclosed technology.
  • the queue length for a particular receive queue e.g., a receive queue of a NIC, that is mapped to a certain CPU core is monitored by the NIC.
  • a determination may be made as to how close the receive queue length is to the first threshold, how quickly the receive queue length is approaching—or moving away from—the first threshold, the queue length of other receive queues that are mapped to the CPU core, or any combination thereof.
  • FIG. 5 illustrates an example 500 of multiple receive queue thresholds in accordance with certain embodiments of the disclosed technology.
  • the example 500 includes a first threshold 505 and a second threshold 510 , such as the first and second thresholds discussed above in connection with FIG. 4 , for example.
  • the NIC may identify SYN packets and re-steer new data flows to other CPU cores.
  • the flow director may have filters placed earlier in the receive path of the NIC and the target flow may be added into the flow director to steer subsequent data packets of that flow to a relatively idle CPU core.
  • the NIC may first perform a match against the flow director filters and, if there is a match, the data packet may be steered to its newly selected CPU core; otherwise, the data packet may continue to the RSS indirection table, e.g., by default.
  • the data can be partitioned, e.g., sharded, such that each CPU core can exclusively access its own partition in parallel processing without inter-core communication.
  • Object-level core affinity generally involves distribution of requests to CPU cores based on the application's partitioning. For example, requests sharing the same key would all go to the CPU core handling that key's partition.
  • Embodiments can include detecting the overloaded CPU core by monitoring the receive queue length, and re-configuring the RSS indirection table such that the congested CPU core is mapped to fewer queues.
  • An embodiment of the technologies disclosed herein may include any one or more, and any combination of, the examples described below.
  • Example 1 includes a network interface card (NIC) configured to monitor a first central processing unit (CPU) core mapped to a first receive queue having a receive queue length; determine whether the CPU core is overloaded based at least in part on the receive queue length; and, responsive to a determination that the CPU core is overloaded, redirect data packets that were targeted from the first receive queue to the first CPU core to a second CPU core.
  • Example 2 includes the subject matter of Example 1, the NIC further configured to determine that the second CPU core has a lighter load than the first CPU core.
  • Example 3 includes the subject matter of any of Examples 1-2, and wherein determining whether the CPU core is overloaded includes determining whether the receive queue length exceeds a first threshold, further wherein the determination that the CPU core is overloaded is based at least in part on a determination that the receive queue length does exceed the first threshold.
  • Example 4 includes the subject matter of any of Examples 1-3, and wherein determining whether the CPU core is overloaded further includes, responsive to a determination that the receive queue length does not exceed the first threshold, determining whether the receive queue length exceeds a second threshold, further wherein the determination that the CPU core is overloaded is based at least in part on a determination that the receive queue length does exceed the second threshold.
  • Example 5 includes the subject matter of Example 4, and wherein the redirecting is performed probabilistically.
  • Example 6 includes the subject matter of any of Examples 1-5, and wherein the redirecting includes the NIC identifying SYN packets and re-steering new data flows to at least the second CPU core.
  • Example 7 includes the subject matter of any of Examples 1-6, the NIC further configured to repeat the monitoring and determining continuously.
  • Example 8 includes the subject matter of Example 7, the NIC further configured to repeat the receiving and determining at a specified time interval.
  • Example 9 includes the subject matter of any of Examples 1-8, and wherein the NIC is an Ethernet card.
  • Example 10 includes a system comprising: a network interface card (NIC) of a first computing device, the NIC having a first receive queue; a first central processing unit (CPU) core of the first computing device, the first CPU core being mapped to the first receive queue; and hardware configured to determine, based at least in part on a receive queue length of the receive queue, whether the first CPU core is overloaded.
  • Example 11 includes the subject matter of Example 10, the system further comprising a second CPU core having a lighter load than the first CPU core.
  • Example 12 includes the subject matter of any of Examples 10-11, and wherein the hardware is further configured to cause data packets that were targeted for the first CPU core to be redirected to the second CPU core.
  • Example 13 includes the subject matter of any of Examples 10-12, and wherein the hardware is configured to determine whether the first CPU core is overloaded by determining that a receive queue length of the first receive queue exceeds a first threshold.
  • Example 14 includes the subject matter of any of Examples 10-13, and wherein the hardware is configured to determine whether the first CPU core is overloaded by determining that the receive queue length of the first receive queue does not exceed the first threshold but does exceed a second threshold.
  • Example 15 includes the subject matter of any of Examples 10-14, and wherein the NIC is an Ethernet card.
  • Example 16 includes one or more non-transitory, computer-readable media comprising instructions that, when executed by a processor, cause the processor to perform operations pertaining to load balancing in a network interface card (NIC), the operations comprising: monitoring a first central processing unit (CPU) core of the NIC, wherein the first CPU core is mapped to a first receive queue having a receive queue length; determining whether the CPU core is overloaded based at least in part on the receive queue length; and responsive to a determination that the CPU core is overloaded, redirecting data packets that were targeted from the first receive queue to the first CPU core to a second CPU core.
  • Example 17 includes the subject matter of Example 16, and wherein the operations further comprise determining that the second CPU core has a lighter load than the first CPU core.
  • Example 18 includes the subject matter of any of Examples 16-17, and wherein determining whether the CPU core is overloaded includes determining whether the receive queue length exceeds a first threshold, further wherein the determination that the CPU core is overloaded is based at least in part on a determination that the receive queue length does exceed the first threshold.
  • Example 19 includes the subject matter of any of Examples 16-18, and wherein determining whether the CPU core is overloaded further includes, responsive to a determination that the receive queue length does not exceed the first threshold, determining whether the receive queue length exceeds a second threshold, further wherein the determination that the CPU core is overloaded is based at least in part on a determination that the receive queue length does exceed the second threshold.
  • Example 20 includes the subject matter of any of Examples 16-19, and wherein the redirecting is performed probabilistically.
  • Example 21 includes the subject matter of any of Examples 16-20, and wherein the operations further include repeating the monitoring and determining continuously.
  • Embodiments of the disclosed technology may be incorporated in various types of architectures.
  • certain embodiments may be implemented as any of or a combination of the following: one or more microchips or integrated circuits interconnected using a motherboard, a graphics and/or video processor, a multicore processor, hardwired logic, software stored by a memory device and executed by a microprocessor, firmware, an application specific integrated circuit (ASIC), and/or a field programmable gate array (FPGA).
  • logic as used herein may include, by way of example, software, hardware, or any combination thereof.
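The receive-path ordering described above (flow director filters first, then the RSS indirection table by default, with SYN packets of new flows steered away from an overloaded core) can be illustrated with a minimal Python sketch. The class name `NicSteering`, its methods, the table size, and the flow-key tuples are hypothetical names chosen for illustration; they are not taken from the disclosure and do not model any particular NIC.

```python
class NicSteering:
    """Toy model of a NIC receive path: flow director first, then RSS."""

    def __init__(self, num_cores, table_size=8):
        # RSS indirection table: hash bucket -> CPU core (via its receive queue).
        # Assumption: buckets are filled round-robin across cores.
        self.table = [i % num_cores for i in range(table_size)]
        # Flow director: exact-match flow key -> CPU core.
        self.flow_director = {}
        # Cores currently flagged as overloaded (e.g., by queue-length checks).
        self.overloaded = set()

    def steer(self, flow, flow_hash, is_syn, lighter_core):
        """Return the core that should process this packet."""
        # 1. Exact-match filters are consulted before RSS.
        if flow in self.flow_director:
            return self.flow_director[flow]
        # 2. Otherwise fall through to the RSS indirection table.
        core = self.table[flow_hash % len(self.table)]
        # 3. A SYN marks a new flow; if its default core is overloaded,
        #    pin the flow to the lighter core so later packets follow it.
        if is_syn and core in self.overloaded:
            self.flow_director[flow] = lighter_core
            return lighter_core
        return core

    def remap_buckets(self, busy_core, lighter_core, count):
        """Repoint up to `count` hash buckets from busy_core to lighter_core."""
        moved = 0
        for i, core in enumerate(self.table):
            if moved == count:
                break
            if core == busy_core:
                self.table[i] = lighter_core
                moved += 1
        return moved
```

In this sketch the flow director path leaves existing connections on their original core and only moves new flows, whereas `remap_buckets` changes the indirection table and therefore takes effect immediately for every flow hashing to the moved buckets, including established ones.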

Abstract

A network interface card (NIC) can be configured to monitor a first central processing unit (CPU) core mapped to a first receive queue having a receive queue length. The NIC can also be configured to determine whether the CPU core is overloaded based on the receive queue length. The NIC can also be configured to redirect data packets that were targeted from the first receive queue to the CPU core to another CPU core responsive to a determination that the CPU core is overloaded.
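The monitor, determine, and redirect sequence summarized above can be sketched in Python as follows. The two-threshold check loosely follows Examples 3 to 5; the linear redirect probability between the thresholds, the function names, and the dictionary of per-core queue lengths are assumptions made for illustration only and are not specified by the disclosure.

```python
import random


def is_overloaded(queue_len, high, low, rng=random.random):
    """Decide whether traffic for a core should be redirected.

    Assumes high > low. Above `high` the core is always treated as
    overloaded; between `low` and `high` the decision is probabilistic.
    """
    if queue_len > high:
        return True
    if queue_len > low:
        # Assumed policy: probability grows linearly from 0 at `low`
        # to 1 at `high`. The source only says "probabilistically".
        p = (queue_len - low) / (high - low)
        return rng() < p
    return False


def pick_lighter_core(queue_lens, busy_core):
    """Return the core with the shortest receive queue, or None.

    `queue_lens` maps core id -> current receive queue length.
    None is returned when no other core is less busy than `busy_core`.
    """
    candidates = {c: n for c, n in queue_lens.items() if c != busy_core}
    if not candidates:
        return None
    target = min(candidates, key=candidates.get)
    return target if candidates[target] < queue_lens[busy_core] else None
```

For example, with thresholds of 64 and 32 packets, a queue length of 48 yields a redirect probability of 0.5 under the assumed linear policy, and `pick_lighter_core({0: 90, 1: 5, 2: 20}, 0)` selects core 1.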

Description

    TECHNICAL FIELD
  • The disclosed technology relates generally to network interface cards (NICs), also referred to herein as host fabric interfaces (HFIs), central processing unit (CPU) cores, and CPU core load distribution management.
  • BACKGROUND
  • FIG. 1 is a functional block diagram illustrating an example of a system 100 that includes a computing device 110, such as a network appliance. In the example, the computing device 110 includes a central processing unit (CPU) 112 for executing instructions as well as a memory 114 for storing such instructions. The CPU 112 has n CPU cores. As used herein, the term core generally refers to a basic computation unit of the CPU. The memory 114 may include random access memory (RAM), flash memory, hard disks, solid state disks, optical disks, or any suitable combination thereof.
  • The computing device 110 also includes a network interface card (NIC) 116 for enabling the computing device 110 to communicate with at least one other computing device 120, such as an external or otherwise remote device, by way of a communication medium such as a wired or wireless packet network, for example. The computing device 110 may thus transmit data to and/or receive data from the other computing device(s) by way of its NIC 116. For example, the NIC 116 has n receive queues for receiving data, e.g., ingress packets, from the other computing device(s).
  • Generally, NICs can steer data flows, e.g., data packets, to any of a number of receive queues by way of Receive Side Scaling (RSS) or implementation of a flow director. Servers generally take advantage of such capabilities to distribute connections, e.g., transmission control protocol (TCP) connections, to different CPU cores for processing.
  • The use of RSS typically includes application of a filter that applies a hash function over the packet headers of received data packets. An indirection table can then be used to map each data packet to a certain receive queue, e.g., based on the corresponding hash value. The CPU cores can then be assigned to work on one or more specific queues in order to enable distributed processing.
  • Because RSS usually involves the mapping of many data flows into a limited number of receive queues targeting a limited number of CPU cores, there is typically a high likelihood of data traffic imbalance, in which one or more CPU cores are disadvantageously required to handle a higher amount of data traffic. While such CPU cores struggle to keep up with the incoming data packets, other CPU cores remain relatively idle. Such situations are inefficient and not optimal for system-wide performance.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The concepts described herein are illustrated by way of example and not by way of limitation in the accompanying figures. For simplicity and clarity of illustration, elements illustrated in the figures are not drawn to scale unless otherwise noted.
  • FIG. 1 is a functional block diagram illustrating an example of a system having a computing device that includes a central processing unit (CPU), a memory, and a network interface card (NIC).
  • FIG. 2 is a functional block diagram illustrating a first example of a system having a computing device that includes a network interface card (NIC) and at least one central processing unit (CPU) core in accordance with certain embodiments of the disclosed technology.
  • FIG. 3 is a flow diagram illustrating an example of a computer-implemented method of performing CPU core load balancing in accordance with certain embodiments of the disclosed technology.
  • FIG. 4 is a flow diagram illustrating another example of a computer-implemented method of performing CPU core load balancing in accordance with certain embodiments of the disclosed technology.
  • FIG. 5 illustrates an example of multiple receive queue thresholds in accordance with certain embodiments of the disclosed technology.
  • DETAILED DESCRIPTION OF THE DRAWINGS
  • While the concepts of the present disclosure are susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and will be described herein in detail. It should be understood, however, that there is no intent to limit the concepts of the present disclosure to the particular forms disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives consistent with the present disclosure and the appended claims.
  • References in the specification to “one embodiment,” “an embodiment,” “an illustrative embodiment,” etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may or may not necessarily include that particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, such feature, structure, or characteristic can be employed in connection with another disclosed embodiment whether or not such feature is explicitly described in conjunction with such other disclosed embodiment.
  • The disclosed embodiments may be implemented, in some cases, in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions (e.g. a computer program product) carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage mediums, which may be read and executed by one or more processors. A machine-readable storage medium may be embodied as any storage device, mechanism, or other physical structure for storing or transmitting information in a form readable by a machine (e.g., a volatile or non-volatile memory, a media disc, or other media device).
  • In the drawings, some structural or method features may be shown in specific arrangements and/or orderings. However, it should be appreciated that such specific arrangements and/or orderings may not be required. Rather, in some embodiments, such features may be arranged in a different manner and/or order than shown in the illustrative figures. Additionally, the inclusion of a structural or method feature in a particular figure is not meant to imply that such feature is required in all embodiments and, in some embodiments, may not be included or may be combined with other features.
  • Embodiments of the disclosed technology generally pertain to network interface card (NIC)-based adaptive techniques for performing dynamic load distribution among multiple CPU cores. In such embodiments, the NIC can effectively and dynamically load-balance incoming data traffic and consequently optimize the full-system performance. Indeed, significant improvement may be realized in network processing performance with many workloads without requiring software support.
  • Embodiments can address both connection-oriented and connectionless data traffic. Such dynamic load balancing in a NIC generally includes detecting whether one or more of the CPU cores are overloaded. Such detection can be done, for example, by measuring CPU core responsiveness speed in real-time using one or more metrics such as receive queue length. If a determination is made that a certain CPU core is overloaded, a portion of data packets that were originally targeted or otherwise mapped to the CPU core can be directed elsewhere. For example, the data packets can be redirected to a relatively idle CPU core.
  • FIG. 2 is a functional block diagram illustrating a first example of a system 200 having a computing device 210 that includes a network interface card (NIC) 216, such as an Ethernet card, and at least one central processing unit (CPU) core 230 in accordance with certain embodiments of the disclosed technology. It should be noted that, as used herein, the terms NIC and host fabric interface (HFI) are interchangeable.
  • In the example, the NIC 216 has n receive queues, such as registers or other suitable data storage components, for receiving data packets from other computing devices. A first receive queue 217 of the NIC 216 may receive one or more incoming data packets 205, e.g., from a separate computing device over a wired or wireless connection.
  • Each of the n receive queues of the NIC 216 may be mapped to one or more CPU cores. This mapping may be re-configurable, e.g., depending on the hardware specifications and/or other details of the particular implementation. In the example, data packets sent to the first receive queue 217 and the nth receive queue 218 are mapped to a first CPU core 230.
  • Responsive to a determination that the first CPU core 230 is overloaded, e.g., the lengths of either or both of the first and nth receive queues 217 and 218 exceed a certain threshold, the data packets from either or both of the first and nth receive queues 217 and 218 may be redirected, e.g., re-mapped to, another CPU core such as the nth CPU core 231. The CPU core to which the data packets are redirected may be selected based on a determination that the CPU core is less busy than the first CPU core 230.
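  • The selection of a less busy CPU core described above can be sketched as follows. This is only an illustrative sketch, not the disclosed implementation: the function name `pick_redirect_target`, the dictionary layout, and the use of summed receive queue lengths as the load measure are assumptions made for the example.

```python
def pick_redirect_target(queue_lengths, queue_to_core, overloaded_core):
    """Return the least-loaded core that is less busy than overloaded_core.

    queue_lengths:  hypothetical map of receive queue id -> outstanding packets
    queue_to_core:  hypothetical map of receive queue id -> CPU core id
    Returns None when no other core has a smaller backlog.
    """
    # Sum the backlog of every receive queue mapped to each core.
    load = {}
    for queue, core in queue_to_core.items():
        load[core] = load.get(core, 0) + queue_lengths.get(queue, 0)

    hot_load = load.get(overloaded_core, 0)
    # Only cores that are strictly less busy are valid redirect targets.
    candidates = {c: l for c, l in load.items()
                  if c != overloaded_core and l < hot_load}
    if not candidates:
        return None
    # Ties are broken by the lower core id to keep the choice deterministic.
    return min(candidates, key=lambda c: (candidates[c], c))
```

  • For instance, with two receive queues mapped to core 0 and one receive queue each mapped to cores 1 and 2, the sketch returns whichever of cores 1 and 2 currently has the smaller backlog.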
  • Certain embodiments may include an NIC-based load balancer configured to handle different run-time situations during redirection. For example, with regard to situations involving TCP connections, SYN packets (which typically mark the beginning of a new data flow) may be identified and steered to a lightly loaded CPU core when the system determines that the CPU core handling the data traffic is over-loaded. A flow director may implement an exact match rule identifying this data flow along with an action to redirect the packets to the CPU core having a lighter load. This advantageously maintains the data flow affinity for subsequent data packets belonging to this flow. Also, existing connections may continue to be served by their original CPU core choices.
  • With regard to situations involving connection-less workloads, such as those involving key-value store (KVS) (e.g., memcached), for example, a portion of the key may be used to direct data packets to a lookup CPU core to minimize cross-core snoop and also maximize the system performance generally, because the same key typically directs to the same CPU core for the lookup. Embodiments may cause data packet redirection to happen immediately once CPU core congestion is detected by modifying an RSS indirection table to point some hash values to receive queues that are serviced by CPU cores having a lighter load.
  • Current load balancing techniques are generally performed at a dedicated appliance or a server in front of multiple servers, which may work at the node level but not at the CPU core level. While other techniques can be implemented in software, such as Receive Packet Steering (RPS, a software implementation of RSS), and can be used in conjunction with utilities that help monitor CPU load, e.g., mpstat, such techniques disadvantageously result in extra latency and also occupy valuable CPU cycles for the load balancing tasks. In contrast, embodiments of the disclosed technology advantageously enable an NIC to transparently balance the load in real time, without software interference in the critical path.
  • Embodiments generally include use of an NIC's capability to steer data packets/flows to different receive queues to be processed by different CPU cores, advantageously resulting in improvements in latency, such as avoiding core-core transferring for TCP connections, and also increased data throughput. In certain situations, dynamic load balancing may need to be enforced to change the mapping of data packets to CPU cores.
  • In certain embodiments, RSS may be used to perform a hash function on the data packet header and map data flows to different receive queues assigned to different CPU cores using a corresponding indirection table. Multiple data queues can be mapped to the same CPU core, and this can be configured by a user.
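  • The hash-and-indirection lookup described above can be illustrated with a short sketch. It assumes CRC-32 (from Python's `zlib`) as a stand-in hash; actual RSS hardware typically uses a Toeplitz hash, and the table size, function names, and flow tuple layout here are hypothetical.

```python
import zlib


def flow_hash(src_ip, dst_ip, src_port, dst_port):
    """Stand-in for the RSS hash over the packet header fields (assumption)."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key)


def build_indirection_table(num_entries, num_queues):
    """Spread hash buckets round-robin across the receive queues."""
    return [i % num_queues for i in range(num_entries)]


def select_core(flow, table, queue_to_core):
    """Map a flow to (receive queue, CPU core) through the indirection table."""
    bucket = flow_hash(*flow) % len(table)
    queue = table[bucket]
    return queue, queue_to_core[queue]


# Several receive queues may be mapped to the same CPU core.
QUEUE_TO_CORE = {0: 0, 1: 1, 2: 0, 3: 1}
```

  • Because the lookup depends only on the header fields, every packet of a given flow resolves to the same receive queue and CPU core until the table is changed.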
  • Other embodiments may include a flow director having programmable filters that can be used to identify specific data flows or sets of data flows based on an exact match of a portion of data packets, for example, and then route the data packet(s) to specific receive queues, e.g., mapped to specific CPU cores.
  • FIG. 3 is a flow diagram illustrating an example of a computer-implemented method 300 of performing CPU core load balancing in accordance with certain embodiments of the disclosed technology. At block 302, a particular CPU core is monitored by the NIC. For example, the NIC may monitor the queue length for a receive queue that is mapped to the CPU core. The queue length for the receive queue may include a quantified measure of how many computing tasks for the receive queue are lined up at that particular moment, e.g., the number of outstanding receive packets, e.g., data packets, that have not yet been processed by the corresponding CPU core.
  • At block 304, a determination is made as to whether the CPU core is overloaded. Such determination may be made by the NIC at least in part based on the receive queue length, for example. Responsive to a determination that the CPU core is overloaded, e.g., that the receive queue length is too large, data packets that had been targeted to the CPU core may be redirected to a different CPU core, as indicated at block 306, and processing may return to block 302; otherwise, processing simply proceeds directly back to block 302.
  • It will be appreciated that subsequent performance of the process 300 beginning at block 302 may be done continuously, on a periodic basis, or responsive to a certain event such as a user request, for example. It will also be appreciated that the CPU cores may be part of the same CPU or separate CPUs.
  • There are a number of situations in which current attempts at RSS-based load balancing fail to meet performance requirements, such as situations in which TCP flows overload a certain CPU core. With RSS, data packet headers having a certain hash value can be mapped to a certain CPU core based on the corresponding indirection table. If a certain CPU core is handling a few large TCP flows, or temporarily gets too many flows mapped to it, that CPU core becomes overloaded. In such situations, new data flows may be re-assigned to CPU cores that have a lighter load, for example.
  • In situations involving KVS-type workloads, object-level affinity generally results in distributing requests to corresponding CPU cores based on the application's partitions, e.g., key space partitioning. Requests having the same key (or same region of keys) may be sent to the same CPU core for processing, thus reducing cross-core communication overhead and improving performance, often significantly.
  • In order to perform dynamic load balancing, overloading of a CPU core must be detected. This may be accomplished by enabling the CPU cores to communicate with the NIC, e.g., using out-of-band messaging, about their utilization. Alternatively, the NIC may observe the receive queue length to a certain CPU core. If the NIC determines that a certain receive queue length exceeds a particular threshold, it may determine that overloading is occurring and subsequently steer data traffic destined for that CPU core elsewhere.
  • As used herein, the term receive queue length generally refers to a quantified measure of how many computing tasks for a certain receive queue are awaiting processing by the NIC at a particular moment, e.g., the number of outstanding receive packets, such as data packets, that have not yet been processed by a corresponding CPU core.
  • FIG. 4 is a flow diagram illustrating another example of a computer-implemented method 400 of performing load balancing in accordance with certain embodiments of the disclosed technology. At block 402, the queue length for a particular receive queue, e.g., a receive queue of a NIC, that is mapped to a certain CPU core is monitored by the NIC.
  • At block 404, a determination is made as to whether the receive queue length exceeds a first threshold. Responsive to a determination that the receive queue length does exceed the first threshold, data packets that had been targeted to the CPU core may be redirected to a different CPU core, as indicated at block 406, and processing may return to block 402; otherwise, processing proceeds to block 408.
  • At block 408, a determination is made as to whether the receive queue length exceeds a second threshold. Responsive to a determination that the receive queue length does exceed the second threshold, data packets that had been targeted to the CPU core may be redirected to a different CPU core probabilistically, as indicated at block 410, and processing may return to block 402; otherwise, processing simply returns to block 402.
  • In situations where the receive queue length exceeds the second threshold but not the first threshold, a determination may be made as to how close the receive queue length is to the first threshold, how quickly the receive queue length is approaching—or moving away from—the first threshold, the queue length of other receive queues that are mapped to the CPU core, or any combination thereof.
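  • The two-threshold decision of FIG. 4 can be sketched as follows. The disclosure does not specify how the probability is computed between the thresholds; the linear ramp based on how close the receive queue length is to the first threshold is an assumption made for illustration, as are the function names.

```python
import random


def redirect_probability(queue_len, first_threshold, second_threshold):
    """Probability of redirecting a packet, given the receive queue length.

    Assumes first_threshold > second_threshold. The linear ramp between the
    thresholds is an illustrative choice, not taken from the disclosure.
    """
    if first_threshold <= second_threshold:
        raise ValueError("first_threshold must exceed second_threshold")
    if queue_len > first_threshold:
        return 1.0  # blocks 404/406: always redirect
    if queue_len > second_threshold:
        # blocks 408/410: redirect probabilistically, more often as the
        # queue length approaches the first threshold
        return (queue_len - second_threshold) / (first_threshold - second_threshold)
    return 0.0  # below both thresholds: leave the mapping alone


def should_redirect(queue_len, first_threshold, second_threshold, rng=random):
    """Draw one redirect decision; rng only needs a random() method."""
    p = redirect_probability(queue_len, first_threshold, second_threshold)
    return rng.random() < p
```

  • A rate-of-change term or the lengths of other receive queues mapped to the same CPU core, both mentioned above, could be folded into `redirect_probability` in the same way.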
  • FIG. 5 illustrates an example 500 of multiple receive queue thresholds in accordance with certain embodiments of the disclosed technology. The example 500 includes a first threshold 505 and a second threshold 510, such as the first and second thresholds discussed above in connection with FIG. 4, for example.
  • In situations involving TCP connections, data packets belonging to the same connection are generally not sent to different CPU cores. Thus, when a CPU core is determined to be overloaded, the NIC may identify SYN packets and re-steer new data flows to other CPU cores. In such embodiments, the flow director may have filters placed earlier in the receive path of the NIC and the target flow may be added into the flow director to steer subsequent data packets of that flow to a relatively idle CPU core. Subsequently, when a new data packet comes in, the NIC may first perform a match against the flow director filters and, if there is a match, the data packet may be steered to its newly selected CPU core; otherwise, the data packet may continue to the RSS indirection table, e.g., by default.
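  • The lookup order described above (flow director filters first, RSS indirection table as the default) can be sketched as follows. The class `NicSteering`, its method names, the CRC-32 stand-in hash, and the simplification of mapping hash buckets directly to CPU cores are all assumptions made for the example.

```python
import zlib


class NicSteering:
    """Toy model of the receive path: exact-match filters, then RSS."""

    def __init__(self, rss_table):
        self.rss_table = rss_table  # hash bucket -> CPU core (queues elided)
        self.flow_director = {}     # exact-match rules: flow tuple -> CPU core

    def rss_core(self, flow):
        # CRC-32 stands in for the hardware RSS hash (assumption).
        return self.rss_table[zlib.crc32(repr(flow).encode()) % len(self.rss_table)]

    def steer(self, flow, is_syn, overloaded, idle_core):
        # 1. A matching flow director rule wins, preserving flow affinity.
        if flow in self.flow_director:
            return self.flow_director[flow]
        core = self.rss_core(flow)
        # 2. Only a SYN, which starts a new flow, may be moved off an
        #    overloaded core; a rule is installed for its later packets.
        if is_syn and core in overloaded and idle_core is not None:
            self.flow_director[flow] = idle_core
            return idle_core
        # 3. Existing connections stay on their original core.
        return core
```

  • In this sketch a flow that was re-steered on its SYN keeps going to the newly selected CPU core even after the original core is no longer overloaded, while packets of already established connections are never moved.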
  • For KVS workloads (e.g., memcached), the data can be partitioned, e.g., sharded, such that each CPU core can exclusively access its own partition in parallel processing without inter-core communication. Object-level core affinity generally involves distribution of requests to CPU cores based on the application's partitioning. For example, requests sharing the same key would all go to the CPU core handling that key's partition. Embodiments can include detecting the overloaded CPU core by monitoring the receive queue length, and re-configuring the RSS indirection table such that the congested CPU core is mapped to fewer queues.
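  • Re-configuring the RSS indirection table to relieve a congested CPU core can be sketched as follows. The function name `shed_buckets`, the `fraction` parameter, and the policy of moving the first matching entries are hypothetical; the disclosure does not state how many entries are moved or which ones.

```python
def shed_buckets(table, queue_to_core, congested_core, target_queue, fraction=0.5):
    """Return a new indirection table with some entries moved off a hot core.

    table:          list mapping hash bucket -> receive queue id
    queue_to_core:  map of receive queue id -> CPU core id
    target_queue:   a receive queue serviced by a lighter-loaded core
    fraction:       share of the hot core's buckets to move (assumption)
    """
    # Buckets whose receive queue is serviced by the congested core.
    hot = [i for i, q in enumerate(table) if queue_to_core[q] == congested_core]
    to_move = hot[: int(len(hot) * fraction)]

    new_table = list(table)  # leave the caller's table untouched
    for bucket in to_move:
        new_table[bucket] = target_queue
    return new_table
```

  • After the update, requests whose keys hash to a moved bucket are all served by the new CPU core, so requests sharing a key still land on a single core, although that core then needs access to the corresponding data partition.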
  • EXAMPLES
  • Illustrative examples of the technologies disclosed herein are provided below. An embodiment of the technologies may include any one or more, and any combination of, the examples described below.
  • Example 1 includes a network interface card (NIC) configured to monitor a first central processing unit (CPU) core mapped to a first receive queue having a receive queue length; determine whether the CPU core is overloaded based at least in part on the receive queue length; and, responsive to a determination that the CPU core is overloaded, redirect data packets that were targeted from the first receive queue to the first CPU core to a second CPU core.
  • Example 2 includes the subject matter of Example 1, the NIC further configured to determine that the second CPU core has a lighter load than the first CPU core.
  • Example 3 includes the subject matter of any of Examples 1-2, and wherein determining whether the CPU core is overloaded includes determining whether the receive queue length exceeds a first threshold, further wherein the determination that the CPU core is overloaded is based at least in part on a determination that the receive queue length does exceed the first threshold.
  • Example 4 includes the subject matter of any of Examples 1-3, and wherein determining whether the CPU core is overloaded further includes, responsive to a determination that the receive queue length does not exceed the first threshold, determining whether the receive queue length exceeds a second threshold, further wherein the determination that the CPU core is overloaded is based at least in part on a determination that the receive queue length does exceed the second threshold.
  • Example 5 includes the subject matter of Example 4, and wherein the redirecting is performed probabilistically.
  • Example 6 includes the subject matter of any of Examples 1-5, and wherein the redirecting includes the NIC identifying SYN packets and re-steering new data flows to at least the second CPU core.
  • Example 7 includes the subject matter of any of Examples 1-6, the NIC further configured to repeat the monitoring and determining continuously.
  • Example 8 includes the subject matter of Example 7, the NIC further configured to repeat the receiving and determining at a specified time interval.
  • Example 9 includes the subject matter of any of Examples 1-8, and wherein the NIC is an Ethernet card.
  • Example 10 includes a system comprising: a network interface card (NIC) of a first computing device, the NIC having a first receive queue; a first central processing unit (CPU) core of the first computing device, the first CPU core being mapped to the first receive queue; and hardware configured to determine, based at least in part on a receive queue length of the receive queue, whether the first CPU core is overloaded.
  • Example 11 includes the subject matter of Example 10, the system further comprising a second CPU core having a lighter load than the first CPU core.
  • Example 12 includes the subject matter of any of Examples 10-11, and wherein the hardware is further configured to cause data packets that were targeted for the first CPU core to be redirected to the second CPU core.
  • Example 13 includes the subject matter of any of Examples 10-12, and wherein the hardware is configured to determine whether the first CPU core is overloaded by determining that a receive queue length of the first receive queue exceeds a first threshold.
  • Example 14 includes the subject matter of any of Examples 10-13, and wherein the hardware is configured to determine whether the first CPU core is overloaded by determining that the receive queue length of the first receive queue does not exceed the first threshold but does exceed a second threshold.
  • Example 15 includes the subject matter of any of Examples 10-14, and wherein the NIC is an Ethernet card.
  • Example 16 includes one or more non-transitory, computer-readable media comprising instructions that, when executed by a processor, cause the processor to perform operations pertaining to load balancing in a network interface card (NIC), the operations comprising: monitoring a first central processing unit (CPU) core of the NIC, wherein the first CPU core is mapped to a first receive queue having a receive queue length; determining whether the CPU core is overloaded based at least in part on the receive queue length; and responsive to a determination that the CPU core is overloaded, redirecting data packets that were targeted from the first receive queue to the first CPU core to a second CPU core.
  • Example 17 includes the subject matter of Example 16, and wherein the operations further comprise determining that the second CPU core has a lighter load than the first CPU core.
  • Example 18 includes the subject matter of any of Examples 16-17, and wherein determining whether the CPU core is overloaded includes determining whether the receive queue length exceeds a first threshold, further wherein the determination that the CPU core is overloaded is based at least in part on a determination that the receive queue length does exceed the first threshold.
  • Example 19 includes the subject matter of any of Examples 16-18, and wherein determining whether the CPU core is overloaded further includes, responsive to a determination that the receive queue length does not exceed the first threshold, determining whether the receive queue length exceeds a second threshold, further wherein the determination that the CPU core is overloaded is based at least in part on a determination that the receive queue length does exceed the second threshold.
  • Example 20 includes the subject matter of any of Examples 16-19, and wherein the redirecting is performed probabilistically.
  • Example 21 includes the subject matter of any of Examples 16-20, and wherein the operations further include repeating the monitoring and determining continuously.
  • The previously described versions of the disclosed subject matter have many advantages that were either described or would be apparent to a person of ordinary skill. Even so, all of these advantages or features are not required in all versions of the disclosed apparatus, systems, or methods.
  • Additionally, this written description makes reference to particular features. It is to be understood that the disclosure in this specification includes all possible combinations of those particular features. For example, where a particular feature is disclosed in the context of a particular aspect or embodiment, that feature can also be used, to the extent possible, in the context of other aspects and embodiments.
  • Also, when reference is made in this application to a method having two or more defined steps or operations, the defined steps or operations can be carried out in any order or simultaneously, unless the context excludes those possibilities.
  • Embodiments of the disclosed technology may be incorporated in various types of architectures. For example, certain embodiments may be implemented as any of or a combination of the following: one or more microchips or integrated circuits interconnected using a motherboard, a graphics and/or video processor, a multicore processor, hardwired logic, software stored by a memory device and executed by a microprocessor, firmware, an application specific integrated circuit (ASIC), and/or a field programmable gate array (FPGA). The term “logic” as used herein may include, by way of example, software, hardware, or any combination thereof.
  • Although specific embodiments have been illustrated and described herein, it will be appreciated by those of ordinary skill in the art that a wide variety of alternate and/or equivalent implementations may be substituted for the specific embodiments shown and described without departing from the scope of the embodiments of the disclosed technology. This application is intended to cover any adaptations or variations of the embodiments illustrated and described herein. Therefore, it is manifestly intended that embodiments of the disclosed technology be limited only by the following claims and equivalents thereof.

Claims (21)

We claim:
1. A network interface card (NIC) configured to:
monitor a first central processing unit (CPU) core mapped to a first receive queue having a receive queue length;
determine whether the CPU core is overloaded based at least in part on the receive queue length; and
responsive to a determination that the CPU core is overloaded, redirect data packets that were targeted from the first receive queue to the first CPU core to a second CPU core.
2. The NIC of claim 1, further configured to determine that the second CPU core has a lighter load than the first CPU core.
3. The NIC of claim 1, wherein the NIC is configured to determine whether the CPU core is overloaded by determining whether the receive queue length exceeds a first threshold, and further wherein the determination that the CPU core is overloaded is based at least in part on a determination that the receive queue length does exceed the first threshold.
4. The NIC of claim 3, wherein the NIC is further configured to determine whether the CPU core is overloaded by, responsive to a determination that the receive queue length does not exceed the first threshold, determining whether the receive queue length exceeds a second threshold, and further wherein the determination that the CPU core is overloaded is based at least in part on a determination that the receive queue length does exceed the second threshold.
5. The NIC of claim 4, wherein the redirecting is performed probabilistically.
6. The NIC of claim 1, wherein the redirecting includes identifying SYN packets and re-steering new data flows to at least the second CPU core.
7. The NIC of claim 1, wherein the NIC is further configured to repeat the monitoring and determining continuously.
8. The NIC of claim 7, wherein the NIC is further configured to repeat the receiving and determining at a specified time interval.
9. The NIC of claim 1, wherein the NIC is an Ethernet card.
10. A system for performing data traffic load balancing in a first computing device, the system comprising:
a network interface card (NIC) of the first computing device, the NIC having a first receive queue;
a first central processing unit (CPU) core of the first computing device, the first CPU core being mapped to the first receive queue; and
hardware configured to determine, based at least in part on a receive queue length of the receive queue, whether the first CPU core is overloaded.
11. The system of claim 10, further comprising a second CPU core having a lighter load than the first CPU core.
12. The system of claim 11, wherein the hardware is further configured to cause data packets that were targeted for the first CPU core to be redirected to the second CPU core.
13. The system of claim 12, wherein the hardware is configured to determine whether the first CPU core is overloaded by determining that a receive queue length of the first receive queue exceeds a first threshold.
14. The system of claim 13, wherein the hardware is configured to determine whether the first CPU core is overloaded by determining that the receive queue length of the first receive queue does not exceed the first threshold but does exceed a second threshold.
15. The system of claim 10, wherein the NIC is an Ethernet card.
16. One or more non-transitory, computer-readable media comprising instructions that, when executed by a processor, cause the processor to perform operations pertaining to load balancing in a network interface card (NIC), the operations comprising:
monitoring a first central processing unit (CPU) core of the NIC, wherein the first CPU core is mapped to a first receive queue having a receive queue length;
determining whether the CPU core is overloaded based at least in part on the receive queue length; and
responsive to a determination that the CPU core is overloaded, redirecting data packets that were targeted from the first receive queue to the first CPU core to a second CPU core.
17. The one or more non-transitory, computer-readable media of claim 16, wherein the operations further comprise determining that the second CPU core has a lighter load than the first CPU core.
18. The one or more non-transitory, computer-readable media of claim 16, wherein determining whether the CPU core is overloaded includes determining whether the receive queue length exceeds a first threshold, further wherein the determination that the CPU core is overloaded is based at least in part on a determination that the receive queue length does exceed the first threshold.
19. The one or more non-transitory, computer-readable media of claim 18, wherein determining whether the CPU core is overloaded further includes, responsive to a determination that the receive queue length does not exceed the first threshold, determining whether the receive queue length exceeds a second threshold, further wherein the determination that the CPU core is overloaded is based at least in part on a determination that the receive queue length does exceed the second threshold.
20. The one or more non-transitory, computer-readable media of claim 19, wherein the redirecting is performed probabilistically.
21. The one or more non-transitory, computer-readable media of claim 16, wherein the operations further include repeating the monitoring and determining continuously.
US15/476,379 2017-03-31 2017-03-31 Dynamic load balancing in network interface cards for optimal system level performance Abandoned US20180285151A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
US15/476,379 US20180285151A1 (en) 2017-03-31 2017-03-31 Dynamic load balancing in network interface cards for optimal system level performance
CN201810213728.0A CN108694087A (en) 2017-03-31 2018-03-15 For the dynamic load leveling in the network interface card of optimal system grade performance
DE102018204859.2A DE102018204859A1 (en) 2017-03-31 2018-03-29 Dynamic load balancing on network interface cards for optimal system-level performance
US17/152,573 US20210141676A1 (en) 2017-03-31 2021-01-19 Dynamic load balancing in network interface cards for optimal system level performance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US15/476,379 US20180285151A1 (en) 2017-03-31 2017-03-31 Dynamic load balancing in network interface cards for optimal system level performance

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/152,573 Continuation US20210141676A1 (en) 2017-03-31 2021-01-19 Dynamic load balancing in network interface cards for optimal system level performance

Publications (1)

Publication Number Publication Date
US20180285151A1 true US20180285151A1 (en) 2018-10-04

Family

ID=63525260

Family Applications (2)

Application Number Title Priority Date Filing Date
US15/476,379 Abandoned US20180285151A1 (en) 2017-03-31 2017-03-31 Dynamic load balancing in network interface cards for optimal system level performance
US17/152,573 Pending US20210141676A1 (en) 2017-03-31 2021-01-19 Dynamic load balancing in network interface cards for optimal system level performance

Family Applications After (1)

Application Number Title Priority Date Filing Date
US17/152,573 Pending US20210141676A1 (en) 2017-03-31 2021-01-19 Dynamic load balancing in network interface cards for optimal system level performance

Country Status (3)

Country Link
US (2) US20180285151A1 (en)
CN (1) CN108694087A (en)
DE (1) DE102018204859A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109450816B (en) * 2018-11-19 2022-08-12 迈普通信技术股份有限公司 Queue scheduling method, device, network equipment and storage medium
CN115174482B (en) * 2019-05-21 2023-06-02 超聚变数字技术有限公司 Message distribution method and device of network equipment
CN110688229B (en) * 2019-10-12 2022-08-02 阿波罗智能技术(北京)有限公司 Task processing method and device
US20210224138A1 (en) * 2020-01-21 2021-07-22 Vmware, Inc. Packet processing with load imbalance handling
US11343152B2 (en) * 2020-04-07 2022-05-24 Cisco Technology, Inc. Traffic management for smart network interface cards
US11876885B2 (en) * 2020-07-02 2024-01-16 Mellanox Technologies, Ltd. Clock queue with arming and/or self-arming features
CN112073327B (en) * 2020-08-19 2023-02-24 广东省新一代通信与网络创新研究院 Anti-congestion software distribution method, device and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050033809A1 (en) * 2003-08-08 2005-02-10 Teamon Systems, Inc. Communications system providing server load balancing based upon weighted health metrics and related methods
US8762534B1 (en) * 2011-05-11 2014-06-24 Juniper Networks, Inc. Server load balancing using a fair weighted hashing technique
US20160156462A1 (en) * 2013-08-30 2016-06-02 L-3 Communications Corporation Cryptographic Device with Detachable Data Planes
US20170230451A1 (en) * 2016-02-04 2017-08-10 Citrix Systems, Inc. System and method for cloud aware application delivery controller
US20180270347A1 (en) * 2017-03-15 2018-09-20 Citrix Systems, Inc. Systems and methods for quality of experience for interactive application in hybrid wan
US20180367460A1 (en) * 2016-02-05 2018-12-20 Huawei Technologies Co., Ltd. Data flow processing method and apparatus, and system

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6760775B1 (en) * 1999-03-05 2004-07-06 At&T Corp. System, method and apparatus for network service load and reliability management
US7499399B2 (en) * 2003-12-12 2009-03-03 Intel Corporation Method and system to determine whether a circular queue is empty or full
US7715428B2 (en) * 2007-01-31 2010-05-11 International Business Machines Corporation Multicore communication processing
US8346999B2 (en) * 2009-12-15 2013-01-01 Intel Corporation Dynamic receive queue balancing with high and low thresholds
US8565092B2 (en) * 2010-11-18 2013-10-22 Cisco Technology, Inc. Dynamic flow redistribution for head of line blocking avoidance
US9602437B1 (en) * 2012-10-03 2017-03-21 Tracey M. Bernath System and method for accelerating network applications using an enhanced network interface and massively parallel distributed processing
US9621633B2 (en) * 2013-03-15 2017-04-11 Intel Corporation Flow director-based low latency networking
US9497281B2 (en) * 2013-04-06 2016-11-15 Citrix Systems, Inc. Systems and methods to cache packet steering decisions for a cluster of load balancers
US9450881B2 (en) * 2013-07-09 2016-09-20 Intel Corporation Method and system for traffic metering to limit a received packet rate
KR101583325B1 (en) * 2014-08-12 2016-01-07 주식회사 구버넷 Network interface apparatus and method for processing virtual packets
US10079740B2 (en) * 2014-11-04 2018-09-18 Fermi Research Alliance, Llc Packet capture engine for commodity network interface cards in high-speed networks
US10148575B2 (en) * 2014-12-22 2018-12-04 Telefonaktiebolaget Lm Ericsson (Publ) Adaptive load balancing in packet processing
US9678806B2 (en) * 2015-06-26 2017-06-13 Advanced Micro Devices, Inc. Method and apparatus for distributing processing core workloads among processing cores
WO2017014758A1 (en) * 2015-07-21 2017-01-26 Hewlett Packard Enterprise Development Lp Providing power to a server
CN106527653A (en) * 2016-10-12 2017-03-22 东软集团股份有限公司 CPU frequency adjusting method and apparatus
US10341241B2 (en) * 2016-11-10 2019-07-02 Hughes Network Systems, Llc History-based classification of traffic into QoS class with self-update
US10050884B1 (en) * 2017-03-21 2018-08-14 Citrix Systems, Inc. Method to remap high priority connection with large congestion window to high latency link to achieve better performance

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180316609A1 (en) * 2017-04-26 2018-11-01 Futurewei Technologies, Inc. Packet Batch Processing with Graph-Path Based Pre-Classification
US10439930B2 (en) * 2017-04-26 2019-10-08 Futurewei Technologies, Inc. Packet batch processing with graph-path based pre-classification
US20180336067A1 (en) * 2017-05-17 2018-11-22 Samsung Electronics Co., Ltd. Method and apparatus for data processing based on multicore
US10802885B2 (en) * 2017-05-17 2020-10-13 Samsung Electronics Co., Ltd. Method and apparatus for data processing based on multicore
US10572400B2 (en) * 2017-06-15 2020-02-25 Mellanox Technologies, Ltd. Shared processing of a packet flow by multiple cores
US20180365176A1 (en) * 2017-06-15 2018-12-20 Mellanox Technologies, Ltd. Shared processing of a packet flow by multiple cores
US11593134B2 (en) 2018-01-26 2023-02-28 Nicira, Inc. Throttling CPU utilization by implementing a rate limiter
US11646980B2 (en) 2018-03-30 2023-05-09 Intel Corporation Technologies for packet forwarding on ingress queue overflow
US20190319933A1 (en) * 2018-04-12 2019-10-17 Alibaba Group Holding Limited Cooperative tls acceleration
US10476801B1 (en) * 2018-04-27 2019-11-12 Nicira, Inc. Dynamic distribution of RSS engines to virtual machines based on flow data
US20200028790A1 (en) * 2018-07-23 2020-01-23 Dell Products L.P. Buffer shortage management system
US10686716B2 (en) 2018-07-23 2020-06-16 Vmware, Inc. Dynamic processing of packets using multiple receive queue features
US11356381B2 (en) 2018-07-23 2022-06-07 Vmware, Inc. Dynamic processing of packets using multiple receive queue features
US10834005B2 (en) * 2018-07-23 2020-11-10 Dell Products L.P. Buffer shortage management system
US20200036636A1 (en) * 2018-07-25 2020-01-30 Vmware, Inc. Selection of paired transmit queue
US11025546B2 (en) * 2018-07-25 2021-06-01 Vmware, Inc. Selection of paired transmit queue
US11848869B2 (en) 2018-07-25 2023-12-19 Vmware, Inc. Selection of paired transmit queue
US11182205B2 (en) * 2019-01-02 2021-11-23 Mellanox Technologies, Ltd. Multi-processor queuing model
US20200210230A1 (en) * 2019-01-02 2020-07-02 Mellanox Technologies, Ltd. Multi-Processor Queuing Model
US10721186B1 (en) * 2019-03-30 2020-07-21 Fortinet, Inc. Packet processing with per-CPU (central processing unit) flow tables in a network device
US11575607B2 (en) * 2019-09-11 2023-02-07 Intel Corporation Dynamic load balancing for multi-core computing environments
CN110618880A (en) * 2019-09-19 2019-12-27 中国银行股份有限公司 Cross-system data transmission system and method
US11483246B2 (en) 2020-01-13 2022-10-25 Vmware, Inc. Tenant-specific quality of service
US11599395B2 (en) * 2020-02-19 2023-03-07 Vmware, Inc. Dynamic core allocation
CN111314249A (en) * 2020-05-08 2020-06-19 深圳震有科技股份有限公司 Method and server for avoiding data packet loss of 5G data forwarding plane
US11539633B2 (en) 2020-08-31 2022-12-27 Vmware, Inc. Determining whether to rate limit traffic
US20220269426A1 (en) * 2021-02-19 2022-08-25 Vast Data Ltd. Resource allocation in a storage system
US11644988B2 (en) * 2021-02-19 2023-05-09 Vast Data Ltd. Resource allocation in a storage system
US11799784B2 (en) 2021-06-08 2023-10-24 Vmware, Inc. Virtualized QoS support in software defined networks
CN114553678A (en) * 2022-02-09 2022-05-27 紫光云(南京)数字技术有限公司 Diagnosis method for soft SLB traffic problem of cloud network
WO2024069219A1 (en) * 2022-09-30 2024-04-04 Telefonaktiebolaget Lm Ericsson (Publ) Receive side application auto-scaling
CN115858152A (en) * 2022-11-27 2023-03-28 北京泰策科技有限公司 DNS load balancing performance optimization scheme based on single port
US11973693B1 (en) * 2023-03-13 2024-04-30 International Business Machines Corporation Symmetric receive-side scaling (RSS) for asymmetric flows

Also Published As

Publication number Publication date
US20210141676A1 (en) 2021-05-13
CN108694087A (en) 2018-10-23
DE102018204859A1 (en) 2018-10-04

Similar Documents

Publication Publication Date Title
US20210141676A1 (en) Dynamic load balancing in network interface cards for optimal system level performance
US8446824B2 (en) NUMA-aware scaling for network devices
US10044797B2 (en) Load balancing of distributed services
US9225651B2 (en) Method and apparatus for load balancing
US20160241482A1 (en) Packet communication apparatus and packet communication method
JP2018532172A (en) Method and system for resource scheduling
CN108933829A (en) A kind of load-balancing method and device
CN109729022B (en) Data sending method, device and system based on software defined network
US20150358402A1 (en) Efficient and scalable pull-based load distribution
EP2859448A2 (en) System and method for providing low latency to applications using heterogeneous processors
CN114553780A (en) Load balancing method and device and network card
US20140223026A1 (en) Flow control mechanism for a storage server
US11563830B2 (en) Method and system for processing network packets
US10250517B2 (en) Completion-side client throttling
Lei et al. Parallelizing packet processing in container overlay networks
CN110235105B (en) System and method for client-side throttling after server processing in a trusted client component
CN110289990B (en) Network function virtualization system, method and storage medium based on GPU
US11003506B2 (en) Technique for determining a load of an application
US10284502B2 (en) Dynamic optimization for IP forwarding performance
KR20160080266A (en) Packet processing apparatus and method for cpu load balancing
JP5796693B2 (en) Stabilization system, stabilization method, computer apparatus and program
US20230059820A1 (en) Methods and apparatuses for resource management of a network connection to process tasks across the network
US8219725B2 (en) Cache optimized balanced handling of initiatives in a non-uniform multiprocessor computing system
Zou et al. P4RSS: Load-Aware Intra-Server Load Balancing with Programmable Switching ASICs
CN117221185A (en) Network traffic evaluation method, network measurement device and system

Legal Events

Date Code Title Description
AS Assignment. Owner name: INTEL CORPORATION, CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:WANG, REN;DALY, DANIEL P.;KAUFMANN, ANTOINE;AND OTHERS;SIGNING DATES FROM 20170327 TO 20170403;REEL/FRAME:042034/0715
STCT Information on status: administrative procedure adjustment. Free format text: PROSECUTION SUSPENDED
STPP Information on status: patent application and granting procedure in general. Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION
STPP Information on status: patent application and granting procedure in general. Free format text: FINAL REJECTION MAILED
STCB Information on status: application discontinuation. Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION