US20210382755A1 - Load balancing deterministically-subsetted processing resources using fractional loads - Google Patents

Load balancing deterministically-subsetted processing resources using fractional loads

Info

Publication number
US20210382755A1
Authority
US
United States
Prior art keywords
servers
client
clients
subset
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US17/407,343
Inventor
Bryce Anderson
Daniel Furse
Eugene Ma
Ruben Oanta
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Twitter Inc
Original Assignee
Twitter Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Twitter Inc
Priority to US17/407,343
Publication of US20210382755A1
Assigned to MORGAN STANLEY SENIOR FUNDING, INC. SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: TWITTER, INC.


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1001Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004Server selection for load balancing
    • H04L67/1008Server selection for load balancing based on parameters of servers, e.g. available memory or workload
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5077Logical partitioning of resources; Management or configuration of virtualized resources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • H04L29/08144
    • H04L29/08153
    • H04L29/08243
    • H04L29/08252
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00Traffic control in data switching networks
    • H04L47/10Flow control; Congestion control
    • H04L47/12Avoiding congestion; Recovering from congestion
    • H04L47/125Avoiding congestion; Recovering from congestion by balancing the load, e.g. traffic engineering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1001Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1002
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1001Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004Server selection for load balancing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1001Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004Server selection for load balancing
    • H04L67/1025Dynamic adaptation of the criteria on which the server selection is based
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1001Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1027Persistence of sessions during load balancing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1001Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1029Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers using data related to the state of servers by a load balancer
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1001Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1031Controlling of the operation of servers by a load balancer, e.g. adding or removing servers that serve requests
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/02Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]

Definitions

  • This disclosure relates generally to balancing load across a collection of processing resources, and more particularly to techniques for distributing substantially equal amounts of load across a collection of deterministically-subsetted processing resources.
  • a load balancer distributes load across a collection of processing resources, such as, for example, computers configured to perform computing tasks such as data processing tasks, communication/networking tasks and/or data storage tasks.
  • Example loads processed by the processing resources may include service requests (also referred to as “processing requests”) for causing one or more computing tasks to be performed by a processing resource.
  • service requests can include, by way of example and without limitation, requests to write data (e.g., a social media post, write to storage), requests to read data (e.g., accessing a social media post, requesting a timeline from a social media service, read from storage), search requests, compute requests, data download/upload requests, data display requests and the like.
  • the “load” may include a volume of data from/to storage and/or volume of network traffic.
  • Load balancing is an important consideration in any processing system, and helps ensure the performance, scalability, and resilience of high transaction volume processing systems that have multiple processing resources.
  • a load balancer may operate to control the distribution of the service requests across the multiple servers in order to reduce latency and/or increase the proportion of successfully serviced requests.
  • The various types of processing resources to which the load is distributed are sometimes collectively referred to as “servers” in this disclosure.
  • Various techniques and algorithms have been proposed for load balancing among a set of servers. These techniques include, for example, round robin load balancing and least loaded load balancing.
  • Deterministic subsetting enables each client to be configured to maintain connections to only a subset (also referred to as an “aperture”) of the servers to which it sends load, such as service requests.
  • with deterministic subsetting (“deterministic aperture”) load balancing, a client is not required to establish connections with every server in a large set of servers that services a particular type of service request, and instead is only required to send its load to the smaller number of servers in the subset with which the client establishes connections.
  • Example embodiments disclosed herein are related to improved deterministic aperture load balancing techniques for balancing load from a set of clients among a set of servers. These improved load balancing techniques enable each client to distribute load to only a subset (or an “aperture”) of the servers, and/or allow at least one client to distribute a fractional load to some of the servers to which it is connected such that each server of the set of servers receives substantially the same amount of load.
  • the improved techniques reduce the overhead involved in balancing load among a set of servers. Moreover, allowing a client to distribute a fractional load to at least one server enables more even distribution of load among the servers.
  • a load balancing method for balancing a processing load of a plurality of clients among a plurality of servers.
  • the method comprises assigning a different subset of the plurality of servers to each respective client in the plurality of clients.
  • Each client is configured to distribute processing requests only to servers in the subset assigned thereto.
  • the load balancing method further includes, for each respective client in the plurality of clients, determining respective load weights for distributing processing requests to the servers in the subset assigned thereto.
  • the load weights for each respective client are determined such that each server of the plurality of servers processes substantially the same unit amount of processing requests and such that at least one server of the plurality of servers is assigned to multiple clients in the plurality of clients. At least one of the load weights for each respective client is a fraction of another one of the load weights for the client.
  • the load-balancing method may also include controlling the plurality of clients to distribute processing requests based on the determined load weights.
  • Another example embodiment provides a load balancing computer processing system including a plurality of clients.
  • Each respective client includes communication circuitry and a processor.
  • the processor is configured to control the communication circuitry of the respective client to distribute processing requests to a respective subset of a plurality of servers over a communication network by transmitting a first weighted-quantity of the processing requests to each of one or more of the servers in the respective subset and transmitting to at least one other server in the respective subset a second weighted-quantity of the processing requests.
  • the second weighted-quantity is a fraction of the first weighted-quantity such that the respective client is configured to distribute a same volume of processing requests as other clients in the plurality of clients and such that at least one server of the plurality of servers is assigned to multiple clients.
  • Another embodiment provides a non-transitory computer readable storage medium storing computer program instructions that, when executed by a processor of a client, causes the client to balance load distributed among a plurality of servers.
  • the computer program instructions include instructions for determining a total number of the plurality of servers as a server set size, a total number of a plurality of clients as a client set size, a unique identifier assigned to the client, and a subset size for the client.
  • the subset size is the total number of servers to be connected with the client.
  • the instructions further include, based upon the determined server set size, client set size, subset size, and identifier assigned to the client, determining a subset of servers from the plurality of servers and relative load weights for servers in the selected subset, so that a same volume of processing requests is distributed to the plurality of servers by the client as other clients in the plurality of clients.
  • each of the relative load weights indicates a relative amount of processing requests transmitted from the client to a respective server in the subset.
  • the selected subset has a size of at least the determined subset size and of the same size as the respective subsets selected by each other client in the plurality of clients.
  • At least one of the relative load weights for one server in the selected subset is a fraction of another of the relative load weights for another server in the selected subset.
  • the present disclosure uses the phrases substantially the same amount of load, or substantially equal amounts of load, to indicate that the amounts of load distributed to the servers may be the same, or very nearly the same (e.g., varying only by a relatively small margin, such as 5%, 2%, or 1%), across the servers in a server subset.
  • although the clients are programmatically configured to distribute the same amount of load to respective servers in a subset of servers, network conditions and/or processing request availability may result in some of the servers receiving a marginally lower amount of work than the other servers in the subset.
  • FIG. 1 illustrates a non-limiting, example system architecture of an example system supporting balancing load from a set of clients across a set of servers;
  • FIG. 2 illustrates a non-limiting, example workflow for an example load balancer
  • FIG. 3A illustrates a non-limiting, example workflow for determining subsets of servers and relative load weights for servers in each subset
  • FIG. 3B illustrates a non-limiting, example workflow for determining subsets of servers and relative load weights for servers in each subset
  • FIG. 4 illustrates a non-limiting, example configuration of another example system supporting balancing load from a set of clients across a set of servers;
  • FIG. 5 illustrates a non-limiting, example logical ring topology for determining server subsets and relative load weights of servers in each subset
  • FIG. 6 illustrates another non-limiting, example logical ring topology for determining server subsets and relative load weights of servers in each subset
  • FIG. 7 illustrates a non-limiting, example block diagram for an example device on which load balancing according to embodiments can be implemented.
  • certain systems, devices, processes, and methods are disclosed for balancing load across a collection of processing resources. More particularly, certain example embodiments relate to techniques for distributing substantially equal amounts of load across a plurality of deterministically subsetted servers. In the following description, for purposes of explanation, numerous specific details are set forth to provide a thorough understanding of example embodiments.
  • each client in a set of clients may distribute load to only a subset (or an “aperture”) of the set of servers.
  • Subsetting therefore enables a client to use processing resources sufficient to service its load, and to less frequently incur penalties for connection establishment. Avoiding a large number of connections can result in reduced overhead, and may also enhance the functionality of circuit breaker logic and the like, because such mechanisms may be more effective with the increased per-connection traffic that results from connecting to a smaller subset of servers.
  • the deterministic subsetting of the plurality of servers as used in example embodiments is different from random subsetting of servers used by some conventional load balancing systems. Random subsetting may result in a load imbalance, e.g., some servers may be picked more frequently than the others due to inherent probabilities associated with selecting a server randomly.
  • the random selection of servers for each client's subset may result in a load distribution that closely resembles a binomial distribution. For example, when two clients transmit service requests to two servers, and each client only randomly picks one of the servers for its service requests, there is a 50% probability of each server getting an equal amount of load, a 25% chance of one server getting all the load and a 25% chance of the other server getting all the load from the two clients.
  • This phenomenon, in which the load is unevenly distributed among the set of servers, is known as “load banding” or “banding.” Reducing or minimizing load banding in random subsetting may require tuning each client's connections over which service requests are transmitted to servers to achieve a more even distribution.
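  • The two-server example above is easy to simulate. The following Scala sketch (illustrative only, not part of the patent text) draws each client's random server choice many times; it prints roughly 25%, 50%, and 25% for server 0 carrying zero, one, or two clients' load, the binomial banding described above:

        import scala.util.Random

        // Illustrative simulation of random subsetting with 2 clients and
        // 2 servers, each client picking one server uniformly at random.
        object RandomSubsettingBanding {
          def main(args: Array[String]): Unit = {
            val rng = new Random(42)
            val trials = 100000
            val counts = Array.fill(3)(0) // index = number of clients landing on server 0
            for (_ <- 1 to trials) {
              val clientsOnServer0 = Seq.fill(2)(rng.nextInt(2)).count(_ == 0)
              counts(clientsOnServer0) += 1
            }
            counts.zipWithIndex.foreach { case (n, k) =>
              println(f"server 0 carries $k client load(s): ${100.0 * n / trials}%.1f%%")
            }
          }
        }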
  • Deterministic subsetting (also referred to as “fixed-size subsetting”) which is used in example embodiments is an enhanced server subset selection method used to mitigate the load banding problem while reducing or minimizing the number of connections.
  • each client is aware of the set of its peer clients, and deterministically selects a fixed number of servers with which to connect.
  • a set of clients can distribute its service requests across a set of servers without having established a large number of connections to servers. This technique can reduce the load banding associated with random subsetting in many configurations, and hence reduces or eliminates the need for tuning connections for each client to reduce load banding. Reduced load banding improves utilization of processing resources by distributing loads more evenly.
  • deterministic subsetting can still result in a particular kind of load banding problem (e.g., off-by-one errors) if, for example, the number of clients multiplied by the number of connections each client requires is not a multiple of the number of servers in the set of servers. For example, if two clients need to transmit service requests to a set of seven servers, and each client needs to establish connections with four of the servers, then at least one of the seven servers may receive requests from two clients while the other servers may only receive requests from one client. In this example, if each server receives the same amount of load from each client it connects with, the load would not be evenly distributed among these servers.
  • a “unit load” is the maximum load any particular client in the set of clients is configured to transmit to any particular server in its subset.
  • Fractional load capability enables a client to transmit a “unit load” amount of service requests to some servers in its server subset and fractional amounts of a unit load to the other servers in its server subset.
  • each client is configured to distribute its load of service requests equally among the servers in its server subset.
  • at least one client is configured to send one or more fractional amounts of a unit load to one or more servers in its server subset, while another client is configured to send one full amount of a unit load to a server in its server subset.
  • a server may receive different amounts of load from respective clients.
  • the load balancing system is configured to distribute the total load from all clients in a set of clients equally to each server in the set of servers.
  • a load balancing system may be configured to distribute fractional units of load from multiple clients (e.g., two clients) to one server with the total sum of the load from these clients to the server always being equal to the load of the other servers, e.g., one unit of load.
  • when servers are assigned to different numbers of server subsets (e.g., some servers may be assigned to two subsets while other servers are assigned to only one subset), different amounts of fractional load may be assigned to a server in each subset to which it is assigned in order to configure the load balancing system to distribute the same amount of load to each server in the set of servers.
  • although different load amounts may be assigned to the same server for different clients in certain example embodiments, the total sum of the loads assigned to the server is the same as that of the other servers. A worked example is sketched below.
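  • By way of illustration only (this sketch is ours, and the particular weight layout is one valid split, not taken from the patent), the two-client, seven-server example above can be balanced exactly once fractional loads are allowed: each client is responsible for 7/2 = 3 1/2 unit loads, so each connects to three servers at a full unit weight and shares the boundary server at weight 1/2. The short Scala program below checks the totals:

        // Checking the two-client, seven-server example with fractional loads.
        object FractionalTotals {
          def main(args: Array[String]): Unit = {
            val weights = Map( // client -> (server -> relative load weight)
              0 -> Map(0 -> 1.0, 1 -> 1.0, 2 -> 1.0, 3 -> 0.5),
              1 -> Map(3 -> 0.5, 4 -> 1.0, 5 -> 1.0, 6 -> 1.0)
            )
            // each client offers the same capacity: 3.5 unit loads
            weights.foreach { case (c, w) => println(s"client $c total: ${w.values.sum}") }
            // and every server receives exactly one unit load in aggregate
            val perServer = weights.values.flatten.groupMapReduce(_._1)(_._2)(_ + _)
            perServer.toSeq.sorted.foreach { case (s, t) => println(s"server $s total: $t") }
          }
        }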
  • FIG. 1 illustrates a non-limiting, example system 100 in which balancing load from a set of clients across a set of servers is implemented according to certain example embodiments.
  • Some example embodiments may have different and/or additional devices and sub-modules than those described in relation to FIG. 1 .
  • the described functions may be distributed among the sub-modules in a different manner than is described.
  • the system 100 comprises a set of clients 110 , a set of servers 120 (also referred to as “processing resources”), a set of user devices 140 that may be operated by end users, and a central server 130 .
  • the set of clients 110, set of servers 120, and central server 130 may be communicatively connected via a network such as a local area network and/or a wide area network (e.g., the internet).
  • the communicative connections may include wired and/or wireless connections.
  • User devices 140 may be connected (by wired and/or wireless connections) to the clients 110 via the internet.
  • service requests (also referred to as “processing requests”) originating on the user devices 140 are received by the clients 110 and distributed to the servers 120.
  • These service requests can include, by way of example and without limitation, transmitting messages (e.g., a social media post), requesting messages (e.g., accessing a social media post, requesting a timeline), search requests, and the like.
  • the central server 130 may be a naming service server or the like, and may not be required in some example embodiments.
  • Each client in the set of clients 110 may be a frontend server, such as, for example, a proxy server, or the like.
  • the set of clients 110 may include all proxy servers or the like that receive all incoming service requests for a particular service (e.g., service requests for messaging).
  • the set of clients 110 are configured to load balance service requests they receive by distributing the service requests among the set of servers 120 . Some of the clients 110 may perform additional tasks such as routing, etc.
  • the set of clients 110 may be homogeneous, whereas in other example embodiments the set of clients 110 may include non-homogeneous clients.
  • Each server in the set of servers 120 may be a backend server, such as an HTTP server or the like.
  • Each server 121 - 127 may be configured to receive service requests from clients, perform computing and respond to the client or other entity.
  • the set of servers 120 may include all HTTP servers, application servers or the like that process service requests for a particular type of service (e.g., service requests for messaging).
  • the servers 121 - 127 are homogeneous in that they have identical or similar configurations.
  • the servers 121 - 127 are homogeneous in processing capabilities (e.g., one or more of processor types, processor numbers, memory capacity, etc.) and incur identical or at least similar times to process identical load amounts of service requests received from clients.
  • the servers 121-127 may not be homogeneous with respect to each other, and may incur different amounts of time and/or consume different system capacities with respect to each other to process identical load amounts of service requests.
  • Some of the set of clients 110 and/or some of the set of servers 120 may not necessarily be separate or different machines. In some example embodiments, at least some of the set of clients 110 and/or servers 120 may be multiple virtual machines running on one or more hosts interconnected by a communication network.
  • Examples of the user devices 140 may include a desktop computer, a mobile phone, a laptop computer, a tablet computer, or any of other kinds of devices that are configured to generate service requests automatically or in response to user input(s).
  • the user devices 140 are, respectively, a laptop computer 141 , a mobile phone 142 , a desktop computer 143 , and other electronic devices.
  • other types of properly configured devices (e.g., home appliances, Internet of Things (IoT) devices, and the like) may also operate as user devices.
  • user devices 140 may include a server or the like that generates service requests.
  • One or more of the user devices 141 - 143 may initiate a stream of requests (e.g., HTTP requests) sent to one or more of the set of clients 110 .
  • services are implemented as many homogeneous and/or interchangeable servers 120 running on a set of computers.
  • clients 110 running on a set of computers hold connections to these servers.
  • a stream of service requests such as HTTP requests, may be transmitted from the user devices 140 to the set of clients 110 .
  • a client may determine which server(s) should handle the request and then transmit the service request (or corresponding one or more processing requests) to the server(s).
  • a single incoming request may trigger a series of dependent processing requests to several servers.
  • ideally, the load of service requests for a given service is spread evenly over all servers providing that service and, at any given point in time, these servers each consume the same amount of processing resources and/or have the same or similar response times.
  • clients 110 may include any type of frontend server that receives service requests from other devices (e.g., user devices 140) and directs that traffic, with or without having performed some processing and/or modifications on that traffic, to one or more servers 120 that perform backend processing responsive to the service requests.
  • the servers 120 may respond to the user device 140 that originated a service request either directly or via one of the clients 110 .
  • the load balancing system's clients and servers are implemented on a Finagle platform (Finagle is an extensible RPC system for the JVM, used to construct high concurrency servers).
  • the set of clients 110 may be Finagle processes operating as HTTP servers configured to receive incoming service requests from instances of a social network application running on user devices 140 and to distribute the service requests to the set of servers 120 of Finagle processes that perform application processing responsive to the service requests.
  • These service requests can include, by way of example and without limitation, transmitting messages, requesting messages (e.g., requesting a timeline), search requests, and the like.
  • service requests from a client to a server are transmitted over an established connection between the client and server.
  • Connections between each of the set of clients 110 and one or more of the set of servers 120 may be established and maintained in different ways.
  • a pool of connections between a client and its associated servers is established as the client starts up and/or is initialized and will remain open, with service requests flowing through them, until the client is shut down or fails.
  • a connection is established and terminated for each service request, possibly resulting in significant cost and latency.
  • after a connection remains idle for a long time it may be switched to a cheap “inactive” mode, in which fewer resources are used to maintain the connection.
  • each client in the set of clients 110 is configured to balance its load by distributing its service requests to a smallest subset (“aperture”) of servers that can satisfy the client's concurrency requirements.
  • Concurrency requirements may be preconfigured and specified as a number of concurrent connections to be maintained by a client.
  • FIG. 1 illustrates a server subset configuration in which clients 110 each have a concurrency requirement of three concurrent connections.
  • client 111 is assigned a server subset 151
  • client 112 is assigned a server subset 153
  • client 113 is assigned server subset 152.
  • Client 111 only connects to servers 121 , 122 and 123 in server subset 151
  • client 112 only connects to servers 125 , 126 , and 127 in server subset 153
  • client 113 only connects to servers 123 , 124 , and 125 in server subset 152 .
  • each client in the set of clients 110 operates independently to distribute its load among the servers in its corresponding subset of servers such that the total load from all clients in the set of clients 110 is evenly distributed across the set of servers 120 .
  • This configuration allows each of the clients to use resources commensurate with its offered load and to incur penalties associated with connection establishment less frequently. That is, clients in these example embodiments are not required to maintain connections with every server in a large set of servers, and are required only to maintain connections to the typically much smaller number of servers in their assigned subsets of servers.
  • An appropriate size for server subsets for a load balancing system may be determined based on the characteristics of the system.
  • a system operator may determine a subset size based on historical load of the system and input to the system.
  • a subset size may be automatically determined by a central server or a client based upon historical information and/or the latest load in the system.
  • a determined subset size may be communicated from the central server to all of the clients or from one client to the other clients.
  • the subset size can be large enough so that every server receives at least some of the load distributed by the clients.
  • the server subset size may be automatically determined (e.g., by a client and/or a central server in communication with the load balancer) in accordance with, for example, client load and/or certain preconfigured restriction requirements of clients. That is, in example embodiments, a client may determine a subset size that accommodates its expected load, e.g., its historical load and/or currently pending service requests. Moreover, for certain example embodiments, it is desirable that the number of connections for each client be at or above a minimum number of connections required, for example, to ensure that even low-throughput clients have a relatively safe amount of redundancy.
  • the load balancing system may include a feedback controller on a client or a central server that can organically accommodate a client's load.
  • although the size of each of the server subsets 151-153 is 3, the present disclosure is not limited in this respect and different subset sizes may be used depending on a particular system, as discussed above. One plausible sizing heuristic is sketched below.
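  • As one hedged illustration (the patent does not prescribe a specific formula; the function name and parameters below are our own), a subset size might be computed by covering expected concurrency while enforcing the minimum connection count:

        object SubsetSizing {
          // Hypothetical heuristic; names and formula are assumptions,
          // not taken from the patent.
          def subsetSize(expectedConcurrentRequests: Double,
                         perServerConcurrency: Double,
                         minConnections: Int): Int =
            math.max(minConnections,
                     math.ceil(expectedConcurrentRequests / perServerConcurrency).toInt)
        }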
  • Fractional load enables at least some clients to send an entire unit load amount to some servers in their respective server subsets, while sending fractional amounts of a unit load to others of the servers in their respective subsets.
  • a server may be sent fractional units of load from multiple clients with the total sum of the loads from these multiple clients preferably being equal to the respective loads of the other servers in the set of servers.
  • all servers in the set of servers are loaded equally, with at least one boundary server (e.g., a server that is in more than one server subset) receiving a total load of a full unit load from two or more clients, each of which sends a fractional amount of load.
  • one server in the set of servers 120 may receive different fractional units of load (e.g., 1/3 and 2/3 of a unit of load) from two clients in the set of clients 110 with the total sum of the load from these two clients being equal to the load of the other servers (e.g., one full unit of load).
  • none of the servers of the set of servers receives fractional amounts of load. That is, each server in the set of servers may only be in one server subset and may receive a full unit load from only one client. For example, when there are three clients and nine servers and the required minimum number of connections is 3, each client will simply connect with three servers.
  • a client may be configured to assign different relative load weights to different servers in its server subset. For example, if the client is to send a unit load amount of service requests to a first server in its assigned subset of servers, it may assign a relative weight of 1 to the first server, and if the client is to send a fraction x of a unit load amount of service requests to a second server in its subset, it may assign a relative weight of x to the second server.
  • the sum of the relative weights assigned to any particular server by all clients that send service requests to it may be equal to the relative weight corresponding to a unit load.
  • various relative load weights are determined for servers in each subset, as follows: (1) in the server subset 151 for client 1: servers 0-2 are assigned relative load weights 1, 1, and 1/3, respectively; (2) in the server subset 153 for client 2: servers 4-6 are assigned relative load weights 1/3, 1, and 1, respectively; and (3) in the server subset 152 for client 3: servers 2-4 are assigned relative load weights 2/3, 1, and 2/3, respectively.
  • the total sum of the relative load weights of the servers in each subset is the same, e.g., 2 1/3, meaning that each client is assigned the same amount of server capacity.
  • all of the servers preferably receive the same volume of service requests from their connected clients.
  • servers 0, 1, 3, 5 and 6 are each assigned to only one subset, and their relative load weights are the same, e.g., 1.
  • servers 2 and 4 are each assigned to two subsets, and they may each receive a fractional unit of load (e.g., 1/3 or 2/3) from two clients with the total sum of the load received at each of servers 2 and 4 being equal to one, which is the same as the relative load weight of the other servers in the set of servers.
  • server 2 will be sent a fraction 1/3 of a unit load from client 1, relative to one unit load sent from client 1 to each of servers 0 and 1, and will be sent a fraction 2/3 of a unit load from client 3, relative to one unit load sent from client 3 to server 3.
  • server 4 will be sent a fraction 1/3 of a unit load from client 2, relative to one unit load sent from client 2 to each of servers 5 and 6, and will be sent a fraction 2/3 of a unit load from client 3, relative to one unit load sent from client 3 to server 3.
  • the total sum of the load received by server 2 or 4 is the same as the load received by the other servers.
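  • The FIG. 1 numbers can be checked mechanically. The sketch below (ours, not the patent's) stores each relative weight as an integer count of thirds so the arithmetic stays exact: every client's weights sum to 7/3 = 2 1/3 unit loads, and every server's incoming weights sum to exactly one unit:

        // Exact check of the FIG. 1 weights, stored as integer thirds.
        object Fig1WeightCheck {
          def main(args: Array[String]): Unit = {
            val thirds = Map( // client -> (server -> weight in units of 1/3)
              1 -> Map(0 -> 3, 1 -> 3, 2 -> 1),
              3 -> Map(2 -> 2, 3 -> 3, 4 -> 2),
              2 -> Map(4 -> 1, 5 -> 3, 6 -> 3)
            )
            thirds.foreach { case (c, w) =>
              println(s"client $c capacity: ${w.values.sum}/3 unit loads") // 7/3 each
            }
            val perServer = thirds.values.flatten.groupMapReduce(_._1)(_._2)(_ + _)
            assert(perServer.values.forall(_ == 3)) // each server: 3/3 = one unit
            println("every server receives exactly one unit load")
          }
        }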
  • the load from the set of clients 110 may be uniformly distributed across the set of servers 120 .
  • the load balancing system operates to configure the set of clients 110 to distribute the total load evenly among the set of servers.
  • each client in the set of clients 110 includes an independent load balancer, so that load balancing decisions are made independently by each client in the set of clients 111 - 113 .
  • the load balancing decisions may be made independently by each client in the set of clients, without a centralized coordination mechanism, explicit coordination between clients, or specific relationships between the sizes of the set of clients and the set of servers.
  • each client in the set of clients 110 is only required to know the size of the set of clients, its “position” in the set of clients, the size of the set of servers, and addresses of the servers in order to connect to them.
  • this architecture not only allows each client to reduce latency and increase the success rate associated with distributed service requests, but also enables clients to operate with fewer dependencies and points of failure while still converging on a balanced (e.g., evenly distributed) global load distribution.
  • the size of the set of clients and/or the size of the set of servers may change over time. For example, certain clients or servers may become unavailable or disconnected for upgrading, replaced or temporarily shut down. Any such changes to the sets of clients or servers may cause load balancing decisions to be recalculated. Therefore, it is desirable for each client to know the current status of the set of servers and the set of clients.
  • a client may subscribe to information regarding selected status changes of the set of servers and the set of clients from the central server 130 .
  • the central server may include a peer server set watcher as a process which monitors (e.g., continuously or periodically) the size and/or composition of the set of servers, and each client in the set of clients may register a recalculation of the server subsets (e.g., a closure) to run in its respective process when the peer server set watcher signals a change in the set of servers.
  • each client in the set of clients maintains a persistent connection to a central server, such as, a naming service server (e.g., a WilyNS endpoint or a lookup bound endpoint) which operates to push updates from a bound name.
  • a lookup bound service running on the lookup bound endpoint may use a data structure such as the map Map[Path, (Option[Response], Queue[Promise])].
  • each server in the set of servers and/or the set of servers being monitored may be represented as a bound name in the map.
  • Client subscriptions requesting to be notified regarding updates may be represented as “promises” in the map. Each requested bound name (represented as a “Path” in the map) may have the last good response, along with a queue of requests waiting for the next response.
  • the lookup bound endpoint may update the last good response and drain the queue by fulfilling each promise.
  • the naming service server may “push” updates to clients that have subscribed for such updates. Clients may subsequently send a follow-up request after receiving a “push”—this pattern may sometimes be referred to as “long-polling”.
  • client requests may include an optional stamp. If the stamp matches the stamp of the last cached response, the request is enqueued in the map. Otherwise, the request is satisfied synchronously.
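  • A minimal sketch of such a lookup bound structure, assuming plain scala.concurrent primitives rather than Finagle's actual types (Path and Response below are stand-ins, and the class is our illustration, not the patent's implementation):

        import scala.collection.mutable
        import scala.concurrent.{Future, Promise}

        // Per path: the last good (stamp, response) pair plus a queue of
        // promises for subscribers waiting on the next change.
        final class LookupBound[Path, Response] {
          private val state =
            mutable.Map.empty[Path, (Option[(Long, Response)], mutable.Queue[Promise[Response]])]

          // Long-poll: answer immediately if the caller's stamp is stale
          // (or absent), otherwise enqueue until the next update.
          def request(path: Path, stamp: Option[Long]): Future[Response] = synchronized {
            val (last, queue) = state.getOrElseUpdate(path, (None, mutable.Queue.empty))
            last match {
              case Some((s, resp)) if !stamp.contains(s) => Future.successful(resp)
              case _ =>
                val p = Promise[Response]()
                queue.enqueue(p)
                p.future
            }
          }

          // New response for a bound name: cache it and drain the waiting queue.
          def update(path: Path, stamp: Long, resp: Response): Unit = synchronized {
            val (_, queue) = state.getOrElseUpdate(path, (None, mutable.Queue.empty))
            state(path) = (Some((stamp, resp)), queue)
            while (queue.nonEmpty) queue.dequeue().success(resp)
          }
        }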
  • load balancing decisions may be made (instead of, or in addition to, being made independently by the respective clients) by the central server 130 based on relevant information, including the size of the set of clients, an identifier of each client (e.g., an index or a “position” of the client in the set of clients), and the size of the set of servers.
  • identifiers of the set of clients may be sorted, and hence the identifier of each client may have a unique “position” with respect to other identifiers.
  • servers and clients may selectively update the central server 130 with respect to changes to their operating status to enable the central server 130 to dynamically make load balancing decisions.
  • the central server 130 may determine the load balancing configurations for each of the clients in the set of clients 110 and may distribute the configurations to the respective clients, which thereafter operate to distribute load according to those configurations. In some other embodiments, the central server 130 may itself actively monitor (e.g., by polling) for changes in the set of clients and/or the set of servers. With respect to load balancing decisions, such as recalculations and/or reconfigurations of server subsets, being made centrally at the central server 130, at least in some aspects, a naming service server is a natural point of integration for this functionality since the naming server is responsible for directing clients to particular servers, for example, by interpreting a client's logical destination address and returning a concrete bound name or address.
  • FIG. 2 illustrates a non-limiting, example workflow of a process 200 for an example load balancer.
  • process 200 may be implemented by each of the clients in a set of clients such as the set of clients 110 .
  • the set of clients 110 with each implementing an instance of process 200 , may operate to configure the set of clients to evenly distribute load according to example embodiments.
  • the load balancer determines the total number of clients as a client set size and determines the total number of servers as a server set size. For example, in the system 100 of FIG. 1 , the client set size of the set of clients 110 would be determined as 3, and the server set size of the set of servers 120 would be determined as 7.
  • the set of clients 110 are logically arranged in an ordered sequence and each client is assigned a unique identifier (e.g., an “index number”) representing its “position” in the ordered sequence relative to the other clients.
  • process 200 may also determine a position for at least one client in the set of clients.
  • each of the server subsets includes the same number of servers.
  • server subsets may include different numbers of servers. For example, when the client set size multiplied by the number of connections required by a client is not a multiple of the server set size, some servers may be assigned to multiple subsets (e.g., server 2 in FIG. 1 being assigned to subsets 151 and 152 ).
  • Relative load weights may then be assigned to each server in each server subset, where, for each server, the assigned relative load weight in a particular subset represents a relative proportion of the total load that it is expected to receive.
  • different relative load weights may be assigned to different servers in each subset. For example, as described above in relation to FIG. 1, server subset 151 has relative load weights 1, 1, and 1/3 assigned respectively to server 0, server 1, and server 2; and server subset 152 has relative load weights 2/3, 1, and 2/3 assigned to server 2, server 3, and server 4, respectively.
  • the relative weight 1 assigned to server 0 may represent that the total load for server 0 is to be received from the client to which the subset 151 is assigned; likewise, the relative weight 1/3 assigned to server 2 as a member of subset 151 may represent that server 2 is to receive only 1/3 of its total load from the client to which subset 151 is assigned.
  • different proportions of the service requests may be transmitted from a client to respective servers in its assigned subset.
  • the total sum of the relative load weights assigned to each server in its associated subset(s) is the same as that of the other servers, and hence, in some embodiments, the same volume of service requests will be transmitted to each of the servers.
  • the relative load weight 1/3 represents 1/3 of the total load at server 2.
  • note, however, that in general a relative weight w assigned to a server by one client does not by itself represent a fraction w of the total load received by that server; the fraction is w divided by the sum of all relative load weights assigned to that server.
  • the determination of the subsets may also be subject to other constraints, such as, for example, having a server subset size that is equal to or greater than a specified minimum number of connections required for each client.
  • the specified minimum number of connections for a client can be considered a minimum concurrency requirement to, among other things, ensure that each client has a minimum level of redundancy. Since, as noted above, a client establishes a connection with each server to which it distributes load, the minimum number of connections requirement represents a minimum number of servers that are required to be in a server subset.
  • the load balancer may operate to determine server subsets that, in addition to satisfying the criteria specified in the previous paragraph, also meet specified constraints, such as, for example, the minimum connection constraint. If the size of the server subset determined according to those criteria is less than the specified minimum number of connections, then the set of servers may be logically expanded by duplicating the servers a number of times sufficient to satisfy the constraints, and the logically expanded set of servers may be divided among the set of clients.
  • the system may dynamically determine the minimum number of connections based, for example, on projected and/or actual load amounts. Further details of operation 220 are described in relation to FIGS. 3A and 3B .
  • each determined subset is assigned to a respective client.
  • the assignment of a server subset to a client may be made according to the client's “position” in the set of clients relative to the other clients, for example, specified as a unique identifier (e.g., an index) assigned to the client.
  • other techniques may be used to assign each subset to a respective client.
  • each client may perform operation 230 independently to determine its assigned server subset (or the same set of server subsets and assignments as other clients) such that the server subsets are uniformly assigned to the clients.
  • although FIG. 2 illustrates operation 230 as following operation 220, it should be understood that each subset may instead be assigned to a respective client in the set of clients at operation 220.
  • Operations 210 - 230 relate to establishing the configurations in each of the clients 110 .
  • each client may establish a connection to each of the servers in its server subset.
  • the connections may be used for distributing the service requests.
  • a client may distribute service requests to the servers of its assigned subset in accordance with their relative load weights.
  • a client may transmit a weighted-quantity or a proportion of service requests to each of one or more servers in its assigned subset, while also transmitting another weighted-quantity or another proportion of service requests to at least one other server in its assigned subset, where the other weighted-quantity is a fraction of the first weighted-quantity (or, equivalently, the other proportion is a fraction of the first proportion).
  • the client 1 may transmit a first weighted-quantity (e.g., a unit load) of service requests to servers 0 and 1, while transmitting a second weighted-quantity of service requests to server 2.
  • the second weighted-quantity is a fraction (e.g., 1/3) of the first weighted-quantity. That is, the second weighted-quantity is a 1/3 fractional load of the unit load.
  • Another client may distribute a fractional load to more than one server in its assigned subset.
  • client 3 distributes a 2/3 fractional load to each of servers 2 and 4, respectively, relative to one unit of load distributed to server 3, in its assigned subset 152.
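  • One simple way to honor relative load weights when distributing individual requests (the patent does not mandate a particular picker; this weighted random choice is our illustration) is to select each request's server with probability proportional to its weight:

        import scala.util.Random

        // Weighted random server selection; one possible picker, not the
        // patent's mandated algorithm.
        final class WeightedPicker[S](weighted: Seq[(S, Double)],
                                      rng: Random = new Random()) {
          private val total = weighted.map(_._2).sum
          def pick(): S = {
            var r = rng.nextDouble() * total
            var chosen: Option[S] = None
            val it = weighted.iterator
            while (chosen.isEmpty && it.hasNext) {
              val (server, w) = it.next()
              if (r < w) chosen = Some(server) else r -= w
            }
            chosen.getOrElse(weighted.last._1) // floating-point round-off fallback
          }
        }

        // usage, e.g., for client 1 of FIG. 1: server 2 receives about
        // (1/3) / (7/3) = 1/7 of this client's requests over time.
        //   val picker = new WeightedPicker(
        //     Seq("server0" -> 1.0, "server1" -> 1.0, "server2" -> 1.0 / 3))
        //   picker.pick()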
  • each client in a set of clients may transmit service requests to several servers in a set of servers
  • respective servers in the set of servers may receive service requests from different numbers of clients.
  • some of the servers may receive service requests from multiple clients, while other servers may only receive service requests from a single client.
  • servers 2 and 4 will receive service requests from two clients, while the other servers only receive service requests from a single client.
  • FIG. 3A illustrates a non-limiting, example process 300 A for determining subsets of servers and relative load weights for servers in each subset. According to some embodiments, process 300 A may be performed during operation 220 of process 200 described above with respect to FIG. 2 .
  • a client width number is determined based on the number of clients (e.g., the size of the set of clients) and the number of servers (e.g., the size of the set of servers).
  • the client width number can be considered as the amount of unit loads of service requests to be distributed by each client.
  • the client width number is calculated by dividing the number of servers by the number of clients. In the example system shown in FIG. 1, the client width number is 2 1/3, which is the result of dividing the number of servers (e.g., 7 in FIG. 1) by the number of clients (e.g., 3 in FIG. 1).
  • the load balancer determines whether the client width number is less than a minimum subset size, such as the above described specified minimum number of connections required for each client.
  • if the load balancer determines at operation 320 that the determined client width number is not less than the minimum subset size, then it will determine subsets of servers and relative load weights for servers in each subset at operation 330 as described above. For example, if the specified required minimum number of connections for each client in the system 100 is 2, which is less than the calculated client width number of 2 1/3, then the server subsets can be determined by dividing the set of servers 120 into three equal-sized subsets for the set of clients 110 as described above.
  • if the load balancer determines at operation 320 that the determined client width number is less than the minimum subset size, then it logically expands the set of servers at operation 340.
  • the set of servers may be logically expanded by duplicating the servers in the set of servers. For example, the seven servers shown in FIG. 1 in the set of servers may be logically expanded by duplication to fourteen servers, or twenty-one servers, etc., so that the client width number, re-calculated based on the logically expanded set of servers, is larger than the minimum subset size.
  • the load balancer can logically expand the set of servers by duplicating these servers to yield fourteen server instances, and hence each of the servers may be regarded as two server instances, each of which will be assigned to at least one server subset.
  • each server in the set of servers may be assigned to at least two subsets of servers, and hence may receive service requests from at least two clients, as indicated in FIG. 4 .
  • FIG. 4 illustrates a non-limiting, example configuration of another example system 400 for balancing load from a set of clients 410 (e.g., clients 411 , 412 and 413 or clients 1 - 3 ) across a set of servers 420 (e.g., servers 421 - 427 or servers 0 - 6 ).
  • the number of clients 410 is 3, the number of servers 420 is 7, and the minimum number of connections required by each client is 4.
  • a client width number of 2 1/3, as initially calculated at operation 310, is less than the required number of connections, 4. Consequently, the set of servers 420 is logically expanded to include fourteen server instances at operation 340.
  • the load balancer proceeds to operation 310 to re-calculate another client width number based on the number of clients and the number of the expanded server instances (e.g., the size of the expanded set of servers). For example, for the example system illustrated in FIG. 4, at operation 310, a client width number (4 2/3) may be calculated by dividing the number of the expanded server instances (e.g., 14) by the number of the clients (e.g., 3). Thus, the balancer will determine at operation 320 that the client width number 4 2/3 is no longer less than the required number of connections 4, and hence subsets of servers can be determined based on the expanded server instances at operation 330.
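  • Operations 310-340 of FIG. 3A can be summarized in a few lines. The sketch below (our reading, with hypothetical names) grows the server list by whole copies until each client's width meets the minimum subset size; for 7 servers, 3 clients, and a minimum of 4 it returns 14 instances and a width of 4 2/3, matching FIG. 4:

        // Sketch of the FIG. 3A loop (operations 310-340) under our reading.
        object ClientWidth {
          def expandedServerInstances(numServers: Int,
                                      numClients: Int,
                                      minSubset: Int): (Int, Double) = {
            var instances = numServers
            var width = instances.toDouble / numClients // operation 310
            while (width < minSubset) {                 // operation 320
              instances += numServers                   // operation 340: duplicate servers
              width = instances.toDouble / numClients   // recompute at operation 310
            }
            (instances, width)
          }

          def main(args: Array[String]): Unit =
            println(expandedServerInstances(7, 3, 4)) // (14, 4.666...), as in FIG. 4
        }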
  • FIG. 3B illustrates another non-limiting, example process 300 B for determining subsets of servers and relative load weights for servers in each subset.
  • process 300 B may be performed during operation 220 of process 200 described above with respect to FIG. 2 .
  • process 300B comprises the same operations as process 300A except for operation 340. That is, the operation 340 of process 300A in FIG. 3A is replaced with an operation 350 in FIG. 3B.
  • if the load balancer determines at operation 320 that the determined client width number is less than the minimum subset size, then it logically multiplies (e.g., doubles, triples, etc.) the client width number at operation 350.
  • For example, the calculated client width number 2⅓ may be doubled to 4⅔. After that, it will be determined at operation 320 that the doubled client width number is larger than the required minimum number of connections, 4, and hence subsets of servers can be determined based on the expanded server instances at operation 330 (see the sketch below).
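  • As a rough, non-authoritative sketch of operations 310-340 of process 300A (exact fractions via Python's standard library; the helper name is hypothetical and this is not code from the disclosure), the client width number may be re-calculated while the server set is logically duplicated until it reaches the minimum subset size:

```python
from fractions import Fraction

def client_width_number(num_servers, num_clients, min_subset_size):
    """Operations 310-340: logically duplicate the server set until the
    re-calculated client width number is at least the minimum subset size."""
    instances = num_servers
    while Fraction(instances, num_clients) < min_subset_size:
        instances += num_servers  # operation 340: logical duplication
    return Fraction(instances, num_clients), instances

# FIG. 4 example: 7 servers, 3 clients, 4 required connections.
width, instances = client_width_number(7, 3, 4)
print(width, instances)  # 14/3 (i.e., 4 2/3) and 14 server instances
```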
  • In this configuration, some of the servers may be assigned to three subsets with different relative load weights while other servers are assigned to two subsets. Ideally, all of the servers will still receive the same volume of service requests from their connected clients. As shown in FIG. 4, a server subset including servers 0-2 and 4-6 is assigned to client 1, a server subset including servers 2-6 is assigned to client 2, and a server subset including servers 0-4 is assigned to client 3. Thus, client 1 will send service requests to six servers, while clients 2 and 3 will send service requests to five servers.
  • For the subset assigned to client 1, a relative load weight 1 is assigned to server instances 0-1 and 5-6, and a relative load weight ⅓ is assigned to both server instances 2 and 4;
  • for the subset assigned to client 2, a relative load weight 1 is assigned to server instances 3-6, and a relative load weight ⅔ is assigned to server instance 2; and
  • for the subset assigned to client 3, a relative load weight 1 is assigned to server instances 0-3, and a relative load weight ⅔ is assigned to server instance 4.
  • In this example, the total sum of the relative load weights of all of the servers in each subset is the same as the determined client width number, 4⅔, and the total sum of the relative load weights assigned to each of the servers is also the same, 2, as checked in the sketch below.
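  • These sums can be checked directly. The following sketch (weights transcribed from the FIG. 4 configuration above; illustrative only) verifies that every subset sums to 4⅔ and every server's assigned weights sum to 2:

```python
from fractions import Fraction as F

# Relative load weights per client, transcribed from the FIG. 4 example.
weights = {
    1: {0: F(1), 1: F(1), 2: F(1, 3), 4: F(1, 3), 5: F(1), 6: F(1)},
    2: {2: F(2, 3), 3: F(1), 4: F(1), 5: F(1), 6: F(1)},
    3: {0: F(1), 1: F(1), 2: F(1), 3: F(1), 4: F(2, 3)},
}
# Each subset sums to the client width number, 4 2/3 ...
assert all(sum(w.values()) == F(14, 3) for w in weights.values())
# ... and the weights assigned to each server total the same amount, 2.
assert all(sum(w.get(s, F(0)) for w in weights.values()) == 2 for s in range(7))
```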
  • FIG. 5 illustrates a non-limiting, example logical ring topology 500 that may be used in determining server subsets and relative load weights of servers in each subset.
  • In this topology, a set of servers is uniformly distributed on a continuous logical ring (e.g., a server ring) 510 such that each server is assigned a “slice” of the same length on the ring.
  • That is, the entire circumference of the ring can be equally divided among the set of servers so as to assign a slice of the circumference to each server in the set of servers.
  • A slice assigned to a server may be referred to as a “server slice”.
  • The entire circumference of the ring is also equally sub-divided and allocated to each of the clients in the set of clients, so that each client is also assigned its own portion of the ring.
  • A portion of the ring assigned to a client may be referred to as the “domain” of the client, and that client is configured to only distribute service requests to servers within its domain. That is, each client is configured to distribute service requests only to servers whose server slices overlap the client's own assigned portion (i.e., domain). It is noted that, due to the capability to distribute fractional loads as described above, the lengths of the portions of the ring assigned to all of the clients are the same.
  • In the ring topology 500 shown in FIG. 5, seven servers (servers 0-6) are uniformly distributed on a continuous ring 510. Servers 0-6 are assigned slices 511-517, respectively, and the lengths of the slices 511-517 are the same. As described above, the concept of a fractional aperture is introduced in example embodiments to enable a client to distribute fractional units of load to one or more servers. As a result, a server ring can be represented as a continuous ring and can be divided at any point of the ring.
  • Each client is required to communicate with the servers which fall within its domain (e.g., servers whose server slices overlap that client's portion of the ring).
  • However, a client is not required to treat the servers in its assigned subset equally, as different proportions of service requests may be sent from the client to different servers.
  • In FIG. 5, the ring is evenly divided into three client slices 521, 522, and 523 for clients 1-3, respectively, and hence each client is equally assigned a ⅓ portion of the entire ring.
  • Note that the boundaries of the portions 521-523 assigned to the clients do not need to be aligned with the edges of server slices 511-517. That is, a server slice (e.g., one of the slices 511-517) can be shared by two clients. For example, in FIG. 5, the server slice 513 for server 2 is shared by clients 1 and 3, and the server slice 515 for server 4 is shared by clients 2 and 3.
  • Thus, the ring topology 500 represents the relationship between the set of servers 0-6 and the set of clients 1-3, including the server subset assigned to each client and the relative load weights assigned to servers in each subset.
  • Specifically, the ring topology 500 shows the following relationships between servers 0-6 and clients 1-3: (1) a server subset including servers 0-2 is assigned to client 1, and servers 0-2 in this subset are assigned relative load weights 1, 1, and ⅓, respectively; (2) a server subset including servers 4-6 is assigned to client 2, and servers 4-6 in this subset are assigned relative load weights ⅓, 1, and 1, respectively; and (3) a server subset including servers 2-4 is assigned to client 3, and servers 2-4 in this subset are assigned relative load weights ⅔, 1, and ⅔, respectively.
  • With this configuration, servers 0-6 are loaded equally, with servers 2 and 4 (which are boundary servers) each receiving a full unit of load from two clients combined; the sketch below renders this construction in code.
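  • The ring construction lends itself to a compact implementation. The following Python sketch is one plausible rendering under stated assumptions (clients take consecutive equal arcs in index order; exact arithmetic via Fraction; the function name is hypothetical and the arc order may not match the figure's client labels). It divides a logically duplicated server ring into equal client arcs and reads off each server slice's overlap with an arc as that server's relative load weight:

```python
from fractions import Fraction

def subsets_and_weights(num_servers, num_clients, min_connections=1):
    # Logically duplicate the ring until each client's arc spans at least
    # min_connections server slices (cf. operations 310-340 above).
    copies = 1
    while Fraction(num_servers * copies, num_clients) < min_connections:
        copies += 1
    width = Fraction(num_servers * copies, num_clients)  # client width number
    assignments = []
    for client in range(num_clients):
        start, end = client * width, (client + 1) * width
        weights, pos = {}, start
        while pos < end:
            # Advance to the end of the current unit-length server slice,
            # or to the end of the client's arc, whichever comes first.
            slice_end = min(end, Fraction(int(pos) + 1))
            server = int(pos) % num_servers  # fold duplicated slices onto servers
            weights[server] = weights.get(server, Fraction(0)) + (slice_end - pos)
            pos = slice_end
        assignments.append(weights)
    return assignments

# FIG. 5 case (7 servers, 3 clients, no expansion needed): the three arcs are
# {0: 1, 1: 1, 2: 1/3}, {2: 2/3, 3: 1, 4: 2/3}, and {4: 1/3, 5: 1, 6: 1},
# i.e., the subsets of clients 1, 3, and 2 in the figure, respectively.
print(subsets_and_weights(7, 3))
```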
  • The logical ring topology also illustrates features and advantages of some example embodiments.
  • For example, changes to the set of servers as represented in the logical server ring have diminishing effects on clients whose domains are further away from the change on the logical ring.
  • For instance, replacement of a server on one part of the ring may not affect a client on the radially opposite part whose assigned portion of the ring does not overlap the changed servers.
  • The capability to effect changes to some servers without affecting a substantial number of the server subsets may have beneficial implications by reducing resource churn and enabling the swapping/upgrading of servers with some degree of seamlessness to ongoing servicing of incoming service requests.
  • In some cases, the number of connections required by each client may be 1 or 2, so that this constraint can be fulfilled by dividing the server set (e.g., seven servers) evenly among the three clients without logically expanding the client slices 521, 522, and 523 or the servers 0-6.
  • In other cases, the number of connections required by each client may be larger than the number calculated by dividing the number of servers by the number of clients.
  • In such cases, the client slices 521, 522, and 523 may be repeatedly expanded by duplication until the total of the server slices (in part or in whole) overlapping each client slice is larger than the required number of connections.
  • Alternatively, the set of servers 0-6 may be repeatedly logically expanded by duplication until the number calculated by dividing the size of the expanded set of servers by the number of clients is larger than the required number of connections.
  • FIG. 6 illustrates another non-limiting, example ring topology 600 for determining server subsets and relative load weights of servers in each subset.
  • In the ring topology 600, seven servers (servers 0-6) are uniformly distributed on a continuous ring 610 so that each server is assigned a slice of the server ring 610, and the lengths of the server slices assigned to servers 0-6, respectively, are the same.
  • Specifically, servers 0-6 are assigned server slices 611-617, respectively.
  • The minimum number of connections required by clients 1-3 in this example is four. Therefore, the server subsets cannot be determined by merely dividing the server ring 610 evenly into three equally-sized portions for the three clients, as that only yields 2⅓ server units for each client, which is less than 4. In other words, assigning server subsets to clients by dividing the server ring 610 equally among the three clients would yield only three connections to servers for each client, which is less than the required four connections. Therefore, in some example embodiments, the server ring 610 is logically expanded so that each client can be assigned a domain that overlaps at least four server slices.
  • In this example, logically duplicating the server ring 610 once is sufficient to satisfy the client connection requirement.
  • This logical expansion can be represented as the load balancer wrapping the client domains around the server ring 610 twice to satisfy the required number of connections. That is, each of the seven servers is treated as if logically expanded into two server instances, and the connections between clients and servers are expanded as well. After that, the two logically expanded server rings are evenly divided into three portions 621, 622, and 623 among the three clients, so that each client is assigned one portion of the expanded two server rings.
  • Equivalently, each of the client slices 521, 522, and 523 may be expanded (e.g., doubled) so that each client slice overlaps at least four server slices in total.
  • In this example, the client slices 521, 522, and 523 shown in FIG. 5 are expanded to client slices 621, 622, and 623. Due to the expansion of the client slices, the total number of servers overlapping the expanded client slices also grows. That is, the connections between clients and servers are expanded as well.
  • Clients 1-3 are assigned portions 621-623, respectively.
  • Each of the portions of the server rings corresponds to a subset of servers assigned to a client, and indicates the relationship between the servers in the subset and the client, including relative load weights assigned to the servers of the subset.
  • In particular: a server subset including server instances 0-2 and 4-6 is assigned to client 1, and, in this subset, server instances 0-2 and 4-6 are assigned relative load weights 1, 1, ⅓, ⅓, 1, and 1, respectively;
  • a server subset including server instances 2-6 is assigned to client 2, and, in this subset, server instances 2-6 are assigned relative load weights ⅔, 1, 1, 1, and 1, respectively; and
  • a server subset including server instances 0-4 is assigned to client 3, and, in this subset, server instances 0-4 are assigned relative load weights 1, 1, 1, 1, and ⅔, respectively.
  • As a result, server 2 receives ⅓ unit of load from client 1, ⅔ unit of load from client 2, and one full unit of load from client 3;
  • and server 4 receives ⅓ unit of load from client 1, ⅔ unit of load from client 3, and one full unit of load from client 2.
  • Servers 2 and 4 thus each receive fractional loads from two different clients that together amount to a full unit of load, in addition to the full unit received from a third client.
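  • Under the same assumptions, this doubled-ring configuration falls out of the ring-slicing sketch given earlier by raising the connection floor to four (arc order may differ from the figure's client labels):

```python
for arc, weights in enumerate(subsets_and_weights(7, 3, min_connections=4)):
    print(arc, {s: str(w) for s, w in sorted(weights.items())})
# One arc per client, e.g. {0: '1', 1: '1', 2: '1/3', 4: '1/3', 5: '1', 6: '1'};
# summing any one server's weights across the three arcs gives 2, as above.
```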
  • In example embodiments, a load balancer may handle restarts and failures gracefully and robustly by continuing to load servers (e.g., backend servers) uniformly while minimizing or reducing churn.
  • Each client may be required to re-determine its subset of servers and the relative load weights for servers in that subset, by any of the methods described above, when the client becomes aware of a change in the number of servers, the number of clients, the subset size, and/or its position in the set of clients.
  • When a server becomes unavailable, its clients may at least temporarily select a replacement server. When a replacement server is selected, the clients may create new TCP connections, which creates additional overhead. Similarly, when a client restarts, it may be required to reopen the connections to all of its servers.
  • A set of clients is assumed to converge on a uniform server subset size when they are offered, or configured to receive, the same amount of load.
  • In some embodiments, one or more clients may dynamically expand their respective server subsets. For example, when a client receives a burst of traffic beyond projected levels that it determines cannot be handled by the servers of the current subset, the client may temporarily expand the number of servers it distributes to, for example, by temporarily expanding its server subset. This adjustment may or may not be performed in a coordinated fashion.
  • FIG. 7 illustrates a non-limiting, example block diagram for an example device 700.
  • The example device 700 may be a computer implementing any of the clients or any of the servers described above in connection with FIGS. 1-6, or a device hosting at least one of these clients and/or one of these servers.
  • The device 700 includes a communication module 710, an input/output module 720, a processing system 730, and a storage 740, all of which may be communicatively linked by a system bus, network, or other connection mechanisms.
  • The communication module 710 functions to allow the device 700 to communicate with one or more of the other devices (e.g., user devices, clients, servers, or a global server).
  • The communication module 710 is configured to transmit data to other devices and/or receive data from other devices.
  • For example, the communication module 710 may comprise one or more communication interfaces supporting satellite communications, radio communications, telephone communications, cellular communications, internet communications, and/or the like.
  • In some embodiments, the communication module 710 may comprise a wireless transceiver with a connected antenna, a wireless LAN module, a radio-frequency (RF), infrared, or Bluetooth® transceiver, and/or a near field communication transceiver module.
  • The data storage 740 may comprise one or more volatile and/or non-volatile storage components, such as a hard disk, a magnetic disk, an optical disk, read-only memory (ROM), and/or random access memory (RAM), and may include removable and/or non-removable components.
  • The data storage 740 may be integrated in whole or in part with the processing system 730.
  • The processing system 730 may comprise one or more processors 731, including one or more general purpose processors and/or one or more special purpose processors (e.g., DSPs, GPUs, FPGAs, or ASICs).
  • The processing system 730 may be capable of executing application program instructions (e.g., compiled or non-compiled program and/or machine code) stored in the data storage 740 to perform any of the functions and processes described above.
  • The data storage 740 may include a non-transitory computer-readable medium, having stored thereon program instructions that, if executed by the processing system 730, cause the device 700 to perform any of the processes or functions disclosed herein and/or illustrated by the accompanying drawings.
  • The program instructions stored in the storage 740 may include an operating system program and one or more application programs, such as program instructions for one of the above-described load balancers.
  • The operations in the example processes of FIGS. 2-3 can be defined by the program instructions stored in the storage 740 and controlled by the processing system 730 executing the program instructions.
  • The input/output module 720 of the device 700 may enable the device 700 to interact with a human or non-human user, such as to receive input from a user and to provide output to the user.
  • The input/output module 720 may include a touch-sensitive or presence-sensitive panel, keypad, keyboard, trackball, joystick, microphone, still camera and/or video camera, and the like.
  • The input/output module 720 may also include one or more output components, such as a display device, which may be combined with a touch-sensitive or presence-sensitive panel.
  • The input/output module 720 may display various user interfaces to enable a user or an operator to access services or functions provided by the device 700.
  • The improved deterministic subsetting load balancing techniques of various embodiments may be implemented such that the load balancing configurations are either centrally determined or determined in a distributed manner. Whereas the central determination results in less computing overhead, the distributed determination further improves the resilience and robustness of the load balancing.
  • Both types of load balancing determinations in example embodiments yield load balancing systems that enable more even distribution of loads and better control of the maximum load levels experienced by particular servers.
  • These improved characteristics of the load balancing improve the computing performance and/or the memory use of the computers used in the load balancing system and, moreover, improve overall system latency, throughput, and responsiveness.
  • The above-described embodiments may also be used for distributing load among resources other than servers, such as computers, network links, processors, hard drives, etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computer And Data Communications (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

Systems and methods are described for load balancing between a set of servers. Subsets of servers from the set of servers are assigned, via deterministic subsetting, to respective clients from a set of clients. Unlike conventional load balancing techniques using deterministic subsetting, the disclosed techniques enable configuring a client to distribute different amounts of load among the servers in its server subset. Techniques for constructing the subsets are also described.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application is a continuation of application Ser. No. 16/775,292, filed on Jan. 29, 2020, which is a continuation of application Ser. No. 16/101,748, filed on Aug. 13, 2018 (now U.S. Pat. No. 10,579,432, issued Mar. 3, 2020). The entire contents of each of the applications are incorporated herein by reference.
  • FIELD OF THE DISCLOSURE
  • This disclosure relates generally to balancing load across a collection of processing resources, and more particularly to techniques for distributing substantially equal amounts of load across a collection of deterministically-subsetted processing resources.
  • BACKGROUND
  • A load balancer distributes load across a collection of processing resources, such as, for example, computers configured to perform computing tasks such as data processing tasks, communication/networking tasks and/or data storage tasks. Example loads processed by the processing resources may include service requests (also referred to as “processing requests”) for causing one or more computing tasks to be performed by a processing resource. These service requests can include, by way of example and without limitation, requests to write data (e.g., a social media post, write to storage), requests to read data (e.g., accessing a social media post, requesting a timeline from a social media service, read from storage), search requests, compute requests, data download/upload requests, data display requests and the like. In some example embodiments, the “load” may include a volume of data from/to storage and/or volume of network traffic.
  • Load balancing is an important consideration in any processing system, and helps ensure the performance, scalability, and resilience of high transaction volume processing systems that have multiple processing resources. When processing of service requests can be distributed over multiple servers in a system, a load balancer may operate to control the distribution of the service requests across the multiple servers in order to reduce latency and/or increase the proportion of successfully serviced requests.
  • The various types of processing resources to which the load is distributed are sometimes collectively referred to as “servers” in this disclosure. Various techniques and algorithms have been proposed for load balancing among a set of servers. These techniques include, for example, round robin load balancing, and least loaded load balancing.
  • However, when a set of clients use these conventional load distribution techniques to distribute load to a plurality of servers that perform computing tasks in response to the received load, the overhead for establishing and maintaining connections between each of the clients and the set of servers can be high.
  • “Deterministic subsetting” enables each client to be configured to maintain connections to only a subset (also referred to as an “aperture”) of the servers to which it sends load such as service requests. With deterministic subsetting (“deterministic aperture”) load balancing, a client is not required to establish connections with every server in a large set of servers that services a particular type of service request, and instead is only required to send its load over a smaller number of servers corresponding to the subset of servers with which the client establishes connections.
  • SUMMARY OF EXAMPLE EMBODIMENTS
  • Example embodiments disclosed herein are related to improved deterministic aperture load balancing techniques for balancing load from a set of clients among a set of servers. These improved load balancing techniques enable each client to distribute load to only a subset (or an “aperture”) of the servers, and/or allow at least one client to distribute a fractional load to some of the servers to which it is connected such that each server of the set of servers receives substantially the same amount of load.
  • By maintaining connections and distributing load to only a subset of servers assigned to a client, the improved techniques reduce the overhead involved in balancing load among a set of servers. Moreover, allowing a client to distribute a fractional load to at least one server enables more even distribution of load among the servers. These improved characteristics not only improve the computing performance and/or utilization of processing resources of the clients and the servers, but also enable the clients to reduce latency and increase success rate associated with distributed service requests.
  • According to one embodiment, there is provided a load balancing method for balancing a processing load of a plurality of clients among a plurality of servers. The method comprises assigning a different subset of the plurality of servers to each respective client in the plurality of clients. Each client is configured to distribute processing requests only to servers in the subset assigned thereto. The load balancing method further includes, for each respective client in the plurality of clients, determining respective load weights for distributing processing requests to the servers in the subset assigned thereto. The load weights for each respective client are determined such that each server of the plurality of servers processes substantially the same unit amount of processing requests and such that at least one server of the plurality of servers is assigned to multiple clients in the plurality of clients. At least one of the load weights for each respective client is a fraction of another one of the load weights for the client. The load-balancing method may also include controlling the plurality of clients to distribute processing requests based on the determined load weights.
  • Another example embodiment provides a load balancing computer processing system including a plurality of clients. Each respective client includes communication circuitry and a processor. The processor is configured to control the communication circuitry of the respective client to distribute processing requests to a respective subset of a plurality of servers over a communication network by transmitting a first weighted-quantity of the processing requests to each of one or more of the servers in the respective subset and transmitting to at least one other server in the respective subset a second weighted-quantity of the processing requests. The second weighted-quantity is a fraction of the first weighted-quantity such that the respective client is configured to distribute a same volume of processing requests as other clients in the plurality of clients and such that at least one server of the plurality of servers is assigned to multiple clients.
  • Another embodiment provides a non-transitory computer readable storage medium storing computer program instructions that, when executed by a processor of a client, cause the client to balance load distributed among a plurality of servers. The computer program instructions include instructions for determining a total number of the plurality of servers as a server set size, a total number of a plurality of clients as a client set size, a unique identifier assigned to the client, and a subset size for the client. The subset size is the total number of servers to be connected with the client. The instructions further include, based upon the determined server set size, client set size, subset size, and identifier assigned to the client, determining a subset of servers from the plurality of servers and relative load weights for servers in the selected subset, so that the same volume of processing requests is distributed to the plurality of servers by the client as by the other clients in the plurality of clients.
  • In these example embodiments, each of the relative load weights indicates the relative amount of processing requests transmitted from the client to a respective server in the subset. The selected subset has a size of at least the determined subset size and is of the same size as the respective subsets selected by each other client in the plurality of clients. At least one of the relative load weights for one server in the selected subset is a fraction of another of the relative load weights for another server in the selected subset.
  • The present disclosure uses the phrases “substantially the same amount of load” or “substantially equal amounts of load” to indicate that the amounts of load distributed to the servers may be the same, or very nearly the same (e.g., varying only by a relatively small margin, such as any of 5%, 2%, 1%, etc.), across the servers in a server subset. For example, in some embodiments, although the clients are programmatically configured to distribute the same amount of load to respective servers in a subset of servers, network conditions and/or processing request availability may result in some of the servers receiving a marginally lower amount of work than the other servers in the subset.
  • The example embodiments, aspects, and advantages disclosed herein may be provided in any suitable combination or sub-combination to achieve yet further example embodiments.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The accompanying drawings facilitate an understanding of example embodiments of this invention. In the drawings:
  • FIG. 1 illustrates a non-limiting, example system architecture of an example system supporting balancing load from a set of clients across a set of servers;
  • FIG. 2 illustrates a non-limiting, example workflow for an example load balancer;
  • FIG. 3A illustrates a non-limiting, example workflow for determining subsets of servers and relative load weights for servers in each subset;
  • FIG. 3B illustrates a non-limiting, example workflow for determining subsets of servers and relative load weights for servers in each subset;
  • FIG. 4 illustrates a non-limiting, example configuration of another example system supporting balancing load from a set of clients across a set of servers;
  • FIG. 5 illustrates a non-limiting, example logical ring topology for determining server subsets and relative load weights of servers in each subset;
  • FIG. 6 illustrates another non-limiting, example logical ring topology for determining server subsets and relative load weights of servers in each subset; and
  • FIG. 7 illustrates a non-limiting, example block diagram for an example device on which load balancing according to embodiments can be implemented.
  • DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS
  • In accordance with certain example embodiments, certain systems, devices, processes, and methods are disclosed for balancing load across a collection of processing resources. More particularly, certain example embodiments relate to techniques for distributing substantially equal amounts of load across a plurality of deterministically subsetted servers. In the following description, for purposes of explanation, numerous specific details are set forth to provide a thorough understanding of example embodiments.
  • As described above, with the use of deterministic subsetting, each client in a set of clients may distribute load to only a subset (or an “aperture”) of the set of servers. Subsetting therefore enables a client to use processing resources sufficient to service its load, and to less frequently incur penalties for connection establishment. Avoiding a large number of connections can result in reduced overhead, and may also result in enhanced functionality of circuit breaker logic and the like because their function may be more effective with the increased traffic from a smaller subset of servers.
  • The deterministic subsetting of the plurality of servers as used in example embodiments is different from random subsetting of servers used by some conventional load balancing systems. Random subsetting may result in a load imbalance, e.g., some servers may be picked more frequently than the others due to inherent probabilities associated with selecting a server randomly. The random selection of servers for each client's subset may result in a load distribution that closely resembles a binomial distribution. For example, when two clients transmit service requests to two servers, and each client only randomly picks one of the servers for its service requests, there is a 50% probability of each server getting an equal amount of load, a 25% chance of one server getting all the load and a 25% chance of the other server getting all the load from the two clients. This phenomenon, in which the load is unevenly distributed among the set of servers, is known as “load banding” or “banding.” Reducing or minimizing load banding in random subsetting may require tuning each client's connections over which service requests are transmitted to servers to achieve a more even distribution.
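  • For intuition, the two-client, two-server arithmetic above can be checked with a short simulation (a throwaway sketch, not part of the disclosed system):

```python
import random
from collections import Counter

def trial(num_clients=2, num_servers=2):
    # Each client randomly picks a single server to receive all of its load.
    picks = [random.randrange(num_servers) for _ in range(num_clients)]
    return tuple(sorted(Counter(picks).get(s, 0) for s in range(num_servers)))

outcomes = Counter(trial() for _ in range(100_000))
print(outcomes)  # ~50% (1, 1): even split; ~50% (0, 2): one server gets it all
```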
  • Deterministic subsetting (also referred to as “fixed-size subsetting”) which is used in example embodiments is an enhanced server subset selection method used to mitigate the load banding problem while reducing or minimizing the number of connections. In deterministic subsetting, each client is aware of the set of its peer clients, and deterministically selects a fixed number of servers with which to connect. In deterministic subsetting, a set of clients can distribute its service requests across a set of servers without having established a large number of connections to servers. This technique can reduce the load banding associated with random subsetting in many configurations, and hence reduces or eliminates the need for tuning connections for each client to reduce load banding. Reduced load banding improves utilization of processing resources by distributing loads more evenly.
  • However, deterministic subsetting can still result in a particular kind of load banding problem (e.g., off-by-one errors) if, for example, the number of clients multiplied by the number of connections each client requires is not a multiple of the number of servers in the set of servers. For example, if two clients need to transmit service requests to a set of seven servers, and each client needs to establish connections with four of the servers, then at least one of the seven servers may receive requests from two clients while the other servers may only receive requests from one client. In this example, if each server receives the same amount of load from each client it connects with, the load would not be evenly distributed among these servers.
  • Certain example embodiments described herein introduce “fractional load” to load balancers that use deterministic subsetting. A “unit load” is the maximum load any particular client in the set of clients is configured to transmit to any particular server in its subset. Fractional load capability enables a client to transmit a “unit load” amount of service requests to some servers in its server subset and fractional amounts of a unit load to the other servers in its server subset. In contrast, in conventional deterministic subsetting techniques, each client is configured to distribute its load of service requests equally among the servers in its server subset. In other example embodiments, at least one client is configured to send one or more fractional amounts of a unit load to one or more servers in its server subset, while another client is configured to send one full amount of a unit load to a server in its server subset.
  • Moreover, whereas in conventional deterministic subsetting each server receives a same amount of load (e.g., a unit load) from one or more clients, according to example embodiments described herein, a server may receive different amounts of load from respective clients. According to certain example embodiments, the load balancing system is configured to distribute the total load from all clients in a set of clients equally to each server in the set of servers. Thus, for example, a load balancing system according to an embodiment may be configured to distribute fractional units of load from multiple clients (e.g., two clients) to one server with the total sum of the load from these clients to the server always being equal to the load of the other servers, e.g., one unit of load. As servers may be assigned to different numbers of server subsets (e.g., some servers may be assigned to two subsets while other servers are assigned to only one subset), different amounts of fractional load may be assigned to a server in each subset to which it is assigned in order to configure the load balancing system to distribute the same amount of load to each server in the set of servers. Although different load amounts may be assigned to the same server for different clients in certain example embodiments, the total sum of the loads assigned to the server is the same as that of the other servers.
  • FIG. 1 illustrates a non-limiting, example system 100 in which balancing load from a set of clients across a set of servers is implemented according to certain example embodiments. Some example embodiments may have different and/or additional devices and sub-modules than those described in relation to FIG. 1. Moreover, in other example embodiments, the described functions may be distributed among the sub-modules in a different manner than is described.
  • The system 100 comprises a set of clients 110, a set of servers 120 (also referred to as “processing resources”), a set of user devices 140 that may be operated by end users, and a central server 130. The set of clients 110, set of servers 120, and central server 130 may be communicatively connected via a network such as a local area network and/or wide area network (e.g., the internet). The communicative connections may include wired and/or wireless connections. User devices 140 may be connected (by wired and/or wireless connections) to the clients 110 via the internet. In certain example embodiments, service requests (also referred to as “processing requests”) originating on the user devices 140 are received by the clients 110 and distributed to the servers 120. These service requests can include, by way of example and without limitation, transmitting messages (e.g., a social media post), requesting messages (e.g., accessing a social media post, requesting a timeline), search requests, and the like. The central server 130, for example, may be a naming service server or the like, and may not be required in some example embodiments.
  • Each client in the set of clients 110, for example, clients 111, 112, and 113 (also referred to as clients 1, 2, and 3), may be a frontend server, such as, for example, a proxy server, or the like. According to some example embodiments, at a particular organization and/or server farm, the set of clients 110 may include all proxy servers or the like that receive all incoming service requests for a particular service (e.g., service requests for messaging). The set of clients 110 are configured to load balance service requests they receive by distributing the service requests among the set of servers 120. Some of the clients 110 may perform additional tasks such as routing, etc. In some example embodiments, the set of clients 110 may be homogenous, whereas in other example embodiments the set of clients 110 may include non-homogeneous clients.
  • Each server in the set of servers 120, for example, servers 121-127 (also referred to as servers 0-6), may be a backend server, such as an HTTP server or the like. Each server 121-127 may be configured to receive service requests from clients, perform computing and respond to the client or other entity. According to some example embodiments, at a particular organization and/or server farm, the set of servers 120 may include all HTTP servers, application servers or the like that process service requests for a particular type of service (e.g., service requests for messaging). In certain example embodiments, the servers 121-127 are homogeneous in that they have identical or similar configurations. In some example embodiments, the servers 121-127 are homogeneous in processing capabilities (e.g., one or more of processor types, processor numbers, memory capacity, etc.) and incur identical or at least similar times to process identical load amounts of service requests received from clients. In yet other example embodiments, the servers 121-127 may not be homogeneous with respect to each other, and may incur different amounts of times and/or system capacities with respect to each other to process identical load amounts of service requests.
  • Some of the set of clients 110 and/or some of the set of servers 120 may not necessarily be separate or different machines. In some example embodiments, at least some of the set of clients 110 and/or servers 120 may be multiple virtual machines running on one or more hosts interconnected by a communication network.
  • Examples of the user devices 140 may include a desktop computer, a mobile phone, a laptop computer, a tablet computer, or any of other kinds of devices that are configured to generate service requests automatically or in response to user input(s). As shown in FIG. 1, in certain example embodiments, the user devices 140 are, respectively, a laptop computer 141, a mobile phone 142, a desktop computer 143, and other electronic devices. However, it will be appreciated that other types of properly configured devices (e.g., home appliances, Internet of Things (IoT) devices, and the like) may be used as a user device 140. For example, in some embodiments, user devices 140 may include a server or the like that generates service requests. One or more of the user devices 141-143 may initiate a stream of requests (e.g., HTTP requests) sent to one or more of the set of clients 110.
  • In certain example embodiments, services are implemented as many homogeneous and/or interchangeable servers 120 running on a set of computers. On the other hand, clients 110 running on a set of computers hold connections to these servers. In an example embodiment, a stream of service requests, such as HTTP requests, may be transmitted from the user devices 140 to the set of clients 110. For each incoming service request, a client may determine which server(s) should handle the request and then transmit the service request (or corresponding one or more processing requests) to the server(s). In some examples, a single incoming request may trigger a series of dependent processing requests to several servers.
  • In an ideal case, the load of service requests for a given service is spread evenly over all servers providing that service and, at any given point in time, these servers each consume the same amount of processing resources and/or have the same or similar response times.
  • As described above, clients 110 may include any type of frontend server that receives service requests from other devices (e.g., user devices 140) and directs that traffic, with or without having performed some processing and/or modifications on that traffic, to one or more servers 120 that perform backend processing responsive to the service requests. The servers 120 may respond to the user device 140 that originated a service request either directly or via one of the clients 110. In an example embodiment, the load balancing system's clients and servers are implemented on a Finagle platform (Finagle is an extensible RPC system for the JVM, used to construct high-concurrency servers). For example, the set of clients 110 may be Finagle processes operating as HTTP servers configured to receive incoming service requests from instances of a social network application running on user devices 140 and to distribute the service requests to the set of servers 120 of Finagle processes that perform application processing responsive to the service requests. These service requests can include, by way of example and without limitation, transmitting messages, requesting messages (e.g., requesting a timeline), search requests, and the like.
  • In some embodiments, service requests from a client to a server are transmitted over an established connection between the client and server. Connections between each of the set of clients 110 and one or more of the set of servers 120 may be established and maintained in different ways. In an example embodiment, a pool of connections between a client and its associated servers is established as the client starts up and/or is initialized and will remain open, with service requests flowing through them, until the client is shut down or fails. In another example embodiment, a connection is established and terminated for each service request, possibly resulting in significant cost and latency. In yet another example embodiment, after a connection remains idle for a long time, it may be switched to a cheap “inactive” mode, in which fewer resources are used to maintain the connection.
  • In certain example embodiments, each client in the set of clients 110 is configured to balance its load by distributing its service requests to a smallest subset (“aperture”) of servers that can satisfy the client's concurrency requirements. Concurrency requirements may be preconfigured and specified as a number of concurrent connections to be maintained by a client. For example and without limitation, FIG. 1 illustrates a server subset configuration in which clients 110 each have a concurrency requirement of three concurrent connections. As shown in FIG. 1, client 111 is assigned a server subset 151, client 112 is assigned a server subset 153, and client 113 is assigned a server subset 152. Client 111 only connects to servers 121, 122, and 123 in server subset 151, client 112 only connects to servers 125, 126, and 127 in server subset 153, and client 113 only connects to servers 123, 124, and 125 in server subset 152. After the initial configuration of subsets, each client in the set of clients 110 operates independently to distribute its load among the servers in its corresponding subset of servers such that the total load from all clients in the set of clients 110 is evenly distributed across the set of servers 120. This configuration allows each of the clients to use resources commensurate with its offered load and to incur penalties associated with connection establishment less frequently. That is, clients in these example embodiments are not required to maintain connections with every server in a large set of servers, and are required only to maintain connections to the typically much smaller number of servers in their assigned subsets of servers.
  • An appropriate size for server subsets for a load balancing system according to example embodiments may be determined based on the characteristics of the system. In an example embodiment, a system operator may determine a subset size based on historical load of the system and input to the system. In other example embodiments, a subset size may be automatically determined by a central server or a client based upon historical information and/or the latest load in the system. Moreover, in certain example embodiments, a determined subset size may be communicated from the central server to all of the clients or from one client to the other clients. For example, when the number of clients in the set of clients 110 is significantly smaller than the number of servers in the set of servers 120, the subset size may need to be large enough so that no server in the set of servers is left receiving none of the load distributed by the clients. In another example, in some systems, there can be frequent load imbalances among the clients. For example, some of the clients may occasionally send “bursts” of requests. Because these bursts of requests will only be concentrated in those clients' assigned subsets of servers, a larger subset size may be needed in order to ensure that load is spread evenly across servers in the set of servers.
  • In certain example embodiments, the server subset size may be automatically determined (e.g., by a client and/or a central server in communication with the load balancer) in accordance with, for example, client load and/or certain preconfigured restriction requirements of clients. That is, in example embodiments, a client may determine a subset size that accommodates its expected load, e.g., its historical load and/or currently pending service requests. Moreover, for certain example embodiments, it is desirable that the number of connections for each client be at or above a minimum number of connections required, for example, to ensure that even low-throughput clients have a relatively safe amount of redundancy. In some example embodiments, the load balancing system may include a feedback controller on a client or a central server that can organically accommodate a client's load. Although, in the example embodiment illustrated in FIG. 1, the size of each of the server subsets 151-153 is 3, the present disclosure is not limited in this respect and different subset sizes may be used depending on a particular system, as discussed above.
  • The example embodiments described herein improve on conventional random subsetting and deterministic subsetting by introducing, among other things, “fractional load.” Fractional load enables at least some clients to send an entire unit load amount to some servers in their respective server subsets, while sending fractional amounts of a unit load to others of the servers in their respective subsets. A server may be sent fractional units of load from multiple clients with the total sum of the loads from these multiple clients preferably being equal to the respective loads of the other servers in the set of servers. According to some example embodiments, all servers in the set of servers are loaded equally with at least one boundary server (e.g., a server that is in more than one server subset) receiving a total load of a full unit load from two or more clients each of which sends a fractional amount of load. For example, one server in the set of servers 120 may receive different fractional units of load (e.g., ⅓ and ⅔ unit of load) from two clients in the set of clients 110 with the total sum of the load from these two clients being equal to the load of the other servers (e.g., one full unit of load). In other example embodiments, none of the servers of the set of servers receives fractional amounts of load. That is, each server in the set of servers may only be in one server subset and may receive a full unit load from only one client. For example, when there are three clients and nine servers and the required minimum number of connections is 3, each client will simply connect with three servers.
  • As servers may be assigned to different numbers of subsets or clients in example embodiments, some servers may be assigned to two subsets/clients while other servers are assigned to only one subset/client. Accordingly, in example embodiments, a client may be configured to assign different relative load weights to different servers in its server subset. For example, if the client is to send a unit load amount of service requests to a first server in its assigned subset of servers, it may assign a relative weight of 1 to the first server, and if the client is to send a fraction x of a unit load amount of service requests to a second server in its subset, it may assign a relative weight of x to the second server. The sum of the relative weights assigned to any particular server by all clients that send service requests to it may be equal to the relative weight corresponding to a unit load. In the example embodiment illustrated in FIG. 1, various relative load weights are determined for servers in each subset, as follows: (1) in the server subset 151 for client 1: servers 0-2 are assigned with relative load weights 1, 1, and ⅓ respectively; (2) in the server subset 153 for client 2: servers 4-6 are assigned with relative load weights ⅓, 1, and 1 respectively; and (3) in the server subset 152 for client 3: servers 2-4 are assigned with relative load weights ⅔, 1, and ⅔ respectively.
  • In this example, the total sum of the relative load weights of the servers in each subset is the same, e.g., 2⅓, meaning that each client is assigned the same amount of server capacity.
  • Moreover, all of the servers preferably receive the same volume of service requests from their connected clients. As shown in FIG. 1, in this example embodiment, servers 0, 1, 3, 5, and 6 are each assigned to only one subset, and their relative load weights are the same, e.g., 1. In contrast, servers 2 and 4 are each assigned to two subsets, and they may each receive a fractional unit of load (e.g., ⅓ or ⅔) from two clients with the total sum of the load received at each of servers 2 and 4 being equal to one, which is the same as the relative load weight of the other servers in the set of servers. In accordance with the relative load weights assigned to server 2 in the subsets 151 and 152, server 2 will be sent a fraction ⅓ of a unit load from client 1, relative to one unit load sent from client 1 to each of servers 0 and 1, and will be sent a fraction ⅔ of a unit load from client 3, relative to one unit load sent from client 3 to server 3. Similarly, in accordance with the relative load weights assigned to server 4 in subsets 152 and 153, server 4 will be sent a fraction ⅓ of a unit load from client 2, relative to one unit load sent from client 2 to each of servers 5 and 6, and will be sent a fraction ⅔ of a unit load from client 3, relative to one unit load sent from client 3 to server 3.
  • When service requests are transmitted from clients to servers in accordance with the above configuration, the total sum of the load received by server 2 or 4 is the same as the load received by the other servers. Thus, by using fractional loads as described, the load from the set of clients 110 may be uniformly distributed across the set of servers 120.
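  • The FIG. 1 configuration can be checked numerically in the same way as the expanded configuration earlier (weights transcribed from the description above; illustrative only):

```python
from fractions import Fraction as F

weights = {  # FIG. 1: relative load weights per client (subsets 151-153)
    1: {0: F(1), 1: F(1), 2: F(1, 3)},
    2: {4: F(1, 3), 5: F(1), 6: F(1)},
    3: {2: F(2, 3), 3: F(1), 4: F(2, 3)},
}
assert all(sum(w.values()) == F(7, 3) for w in weights.values())  # 2 1/3 per client
assert all(sum(w.get(s, F(0)) for w in weights.values()) == 1 for s in range(7))
```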
  • The load balancing system according to example embodiments operates to configure the set of clients 110 to distribute the total load evenly among the set of servers. In certain example embodiments, each client in the set of clients 110 includes an independent load balancer, so that load balancing decisions are made independently by each of the clients 111-113. In various example embodiments, the load balancing decisions may be made independently by each client in the set of clients, without any one or more of a centralized coordination mechanism, explicit coordination between clients, or specific relationships between the sizes of the set of clients and the set of servers. According to certain example embodiments, to make load balancing decisions, each client in the set of clients 110 (e.g., clients 111-113) is only required to know the size of the set of clients, its “position” in the set of clients, the size of the set of servers, and the addresses of the servers in order to connect to them. Among many other advantages, this architecture not only allows each client to reduce latency and increase the success rate associated with distributed service requests, but also enables clients to operate with fewer dependencies and points of failure while still converging on a balanced (e.g., evenly distributed) global load distribution.
  • In some embodiments, the size of the set of clients and/or the size of the set of servers may change over time. For example, certain clients or servers may become unavailable or disconnected for upgrading, may be replaced, or may be temporarily shut down. Any such changes to the sets of clients or servers may cause load balancing decisions to be recalculated. Therefore, it is desirable for each client to know the current status of the set of servers and the set of clients. In an example embodiment, a client may subscribe to information regarding selected status changes of the set of servers and the set of clients from the central server 130. For example, the central server may include a peer server set watcher as a process which monitors (e.g., continuously or periodically) the size and/or composition of the set of servers, and each client in the set of clients may register a recalculation of the server subsets (e.g., a closure) to run in its respective process when the peer server set watcher signals a change in the set of servers.
  • In an example embodiment, each client (e.g., implemented as a Finagle process) in the set of clients maintains a persistent connection to a central server, such as a naming service server (e.g., a WilyNS endpoint or a lookup bound endpoint), which operates to push updates from a bound name. A lookup bound service running on the lookup bound endpoint may use a data structure such as the map Map[Path, (Option[Response], Queue[Promise])]. Each server in the set of servers and/or the set of servers being monitored may be represented as a bound name in the map. Client subscriptions requesting to be notified regarding updates may be represented as “promises” in the map. Each requested bound name (represented as a “Path” in the map) may have the last good response, along with a queue of requests waiting for the next response. When the set of servers updates, the lookup bound endpoint may update the last good response and drain the queue by fulfilling each promise. In this way, the naming service server may “push” updates to clients that have subscribed for such updates. Clients may subsequently send a follow-up request after receiving a “push”; this pattern may sometimes be referred to as “long-polling”. In order to synchronize the clients' and the naming service server's views of the set of servers, client requests may include an optional stamp. If the stamp matches the stamp of the last cached response, the request is enqueued in the map. Otherwise, the request is satisfied synchronously.
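  • The long-polling pattern just described can be sketched as follows (an asyncio approximation under stated assumptions; the class and method names are hypothetical and this is not the WilyNS/Finagle implementation). Each path keeps its last good response, a stamp, and a queue of waiting promises; stale stamps are answered synchronously, while matching stamps wait for the next update:

```python
import asyncio

class LookupBoundSketch:
    def __init__(self):
        # path -> (last_response_or_None, stamp, waiting_futures)
        self._paths = {}

    async def lookup(self, path, stamp=None):
        last, cur_stamp, waiters = self._paths.setdefault(path, (None, 0, []))
        if last is not None and stamp != cur_stamp:
            return last, cur_stamp  # client's view is stale: reply synchronously
        fut = asyncio.get_running_loop().create_future()
        waiters.append(fut)         # stamps match: enqueue until the next update
        return await fut

    def update(self, path, response):
        _, stamp, waiters = self._paths.setdefault(path, (None, 0, []))
        self._paths[path] = (response, stamp + 1, [])
        for fut in waiters:         # drain the queue by fulfilling each promise
            fut.set_result((response, stamp + 1))
```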
  • In another example embodiment, load balancing decisions may be made (instead of, or in addition to, being made independently by the respective clients) by the central server 130 based on relevant information, including the size of the set of clients, an identifier of each client (e.g., an index or a “position” of the client in the set of clients), and the size of the set of servers. In example embodiments, identifiers of the set of clients may be sorted, and hence the identifier of each client may have a unique “position” with respect to the other identifiers. In some embodiments, servers and clients may selectively update the central server 130 with respect to changes to their operating status to enable the central server 130 to dynamically make load balancing decisions. That is, the central server 130 may determine the load balancing configurations for each of the clients in the set of clients 110 and may distribute the configurations to the respective clients, which thereafter operate to distribute load according to those configurations. In some other embodiments, the central server 130 may itself actively monitor (e.g., by polling) for changes in the set of clients and/or the set of servers. With respect to load balancing decisions, such as recalculations and/or reconfigurations of server subsets, being made centrally at the central server 130, at least in some aspects, a naming service server is a natural point of integration for this functionality, since the naming server is responsible for directing clients to particular servers, for example, by interpreting a client's logical destination address and returning a concrete bound name or address.
  • FIG. 2 illustrates a non-limiting, example workflow of a process 200 for an example load balancer. In certain example embodiments, process 200 may be implemented by each of the clients in a set of clients such as the set of clients 110. The set of clients 110, with each client implementing an instance of process 200, may thereby operate to distribute load evenly according to example embodiments.
  • After entering process 200, at operation 210, the load balancer determines the total number of clients as a client set size and determines the total number of servers as a server set size. For example, in the system 100 of FIG. 1, the client set size of the set of clients 110 would be determined as 3, and the server set size of the set of servers 120 would be determined as 7. In some example embodiments, the set of clients 110 are logically arranged in an ordered sequence and each client is assigned a unique identifier (e.g., an “index number”) representing its “position” in the ordered sequence relative to the other clients. In these example embodiments, process 200 may also determine a position for at least one client in the set of clients.
  • Based at least on the determined server set size, the set of servers, and the determined client set size, a plurality of subsets of servers and relative load weights for servers in each subset are determined at operation 220. In some example embodiments, each of the server subsets includes the same number of servers. In other example embodiments, server subsets may include different numbers of servers. For example, when the client set size multiplied by the number of connections required by a client is not a multiple of the server set size, some servers may be assigned to multiple subsets (e.g., server 2 in FIG. 1 being assigned to subsets 151 and 152). Relative load weights may then be assigned to each server in each server subset, where, for each server, the assigned relative load weight in a particular subset represents the relative proportion of its total load that it is expected to receive through that subset. In order to configure the load balancing system to distribute the same volume of service requests to each of the servers in the set of servers, different relative load weights may be assigned to different servers in each subset. For example, as described above in relation to FIG. 1, server subset 151 has relative load weights 1, 1, and ⅓ assigned respectively to server 0, server 1, and server 2; and server subset 152 has relative load weights ⅔, 1, and ⅔ assigned to server 2, server 3, and server 4, respectively. In this example, the relative weight 1 assigned to server 0 may represent that the total load for server 0 is to be received from the client to which the subset 151 is assigned; likewise, the relative weight ⅓ assigned to server 2 as a member of subset 151 may represent that server 2 is to receive only ⅓ of its total load from the client to which subset 151 is assigned. In accordance with the different relative load weights, different proportions of the service requests may be transmitted from a client to respective servers in its assigned subset. The total sum of the relative load weights assigned to each server across its associated subset(s) is the same as that of the other servers, and hence, in some embodiments, the same volume of service requests will be transmitted to each of the servers. It should be noted that in the above example, in which the set of servers is not logically duplicated when forming server subsets, the relative load weight ⅓ represents ⅓ of the total load at server 2. In cases in which the set of servers is duplicated prior to forming server subsets, however, a relative weight w assigned to a server by one client does not represent a fraction w of the total load assigned to that server.
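  • As a concrete check of these numbers, the short Scala tally below assumes (consistent with FIG. 5, described later) that client 2 is assigned servers 4-6 with weights ⅓, 1, and 1; each client then issues 7/3 units of load in total, and each server receives exactly one unit in total, up to floating-point rounding.

    // Hypothetical tally of the FIG. 1 weight assignment.
    val weights: Map[Int, Map[Int, Double]] = Map(
      1 -> Map(0 -> 1.0, 1 -> 1.0, 2 -> 1.0 / 3),    // subset 151
      2 -> Map(4 -> 1.0 / 3, 5 -> 1.0, 6 -> 1.0),    // assumed, per FIG. 5
      3 -> Map(2 -> 2.0 / 3, 3 -> 1.0, 4 -> 2.0 / 3) // subset 152
    )
    // Each client issues 7/3 units of load in total.
    val perClient = weights.map { case (c, ws) => c -> ws.values.sum }
    // Each server receives one unit of load in total across its subsets.
    val perServer = weights.values.flatten.toSeq
      .groupMapReduce { case (s, _) => s } { case (_, w) => w }(_ + _)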
  • In certain example embodiments, the determination of the subsets may also be subject to other constraints, such as, for example, having a server subset size that is equal to or greater than a specified minimum number of connections required for each client. The specified minimum number of connections for a client can be considered a minimum concurrency requirement that, among other things, ensures that each client has a minimum level of redundancy. Since, as noted above, a client establishes a connection with each server to which it distributes load, the minimum number of connections requirement represents a minimum number of servers that are required to be in a server subset. The load balancer may operate to determine server subsets that, in addition to satisfying the criteria specified in the previous paragraph, also meet specified constraints, such as, for example, the minimum connection constraint. If the size of the server subset determined according to the criteria described in the paragraph above is less than the specified minimum number of connections, then the set of servers may be logically expanded by duplicating the servers a number of times sufficient to satisfy the constraints, and the logically expanded set of servers may be divided among the set of clients. In some embodiments, instead of the minimum number of connections being statically configured as a configuration parameter, the system may dynamically determine it based, for example, on projected and/or actual load amounts. Further details of operation 220 are described in relation to FIGS. 3A and 3B.
  • At operation 230, after the determination of the server subsets and relative load weights for servers in each subset, each determined subset is assigned to a respective client. In some example embodiments, the assignment of a server subset to a client may be made according to the client's "position" in the set of clients relative to the other clients, for example, specified as a unique identifier (e.g., an index) assigned to the client. In other example embodiments, other techniques may be used to assign each subset to a respective client. In example embodiments, when all the clients in the same set of clients use unique identifiers (positions) based on the same ordered sequence, each client may perform operation 230 independently to determine its assigned server subset (arriving at the same set of server subsets and assignments as the other clients), such that the server subsets are uniformly assigned to the clients. Although FIG. 2 illustrates operation 230 as following operation 220, it should be understood that each subset may instead be assigned to a respective client in the set of clients at operation 220.
  • Operations 210-230 relate to establishing the configurations in each of the clients 110. After the configurations are determined, each client may establish a connection to each of the servers in its server subset. The connections may be used for distributing the service requests.
  • At operation 240, a client may distribute service requests to the servers of its assigned subset in accordance with their relative load weights. In certain example embodiments, a client may transmit a weighted-quantity or a proportion of service requests to each of one or more servers in its assigned subset, while also transmitting another weighted-quantity or another proportion of service requests to at least one other server in its assigned subset, where the other weighted-quantity is a fraction of the weighted-quantity (or, equivalently, the other proportion is a fraction of the proportion). In the example system 100 shown in FIG. 1, client 1 may transmit a first weighted-quantity (e.g., a unit load) of service requests to servers 0 and 1, while transmitting a second weighted-quantity of service requests to server 2. Moreover, the second weighted-quantity is a fraction (e.g., ⅓) of the first weighted-quantity. That is, the second weighted-quantity is a ⅓ fractional load of the unit load.
  • Another client may distribute a fractional load to more than one server in its assigned subset. For example, in the example system 100 shown in FIG. 1, client 3 distributes a ⅔ fractional load to each of server 2 and server 4, relative to the one unit of load it distributes to server 3, in its assigned subset 152.
  • According to example embodiments, while each client in a set of clients may transmit service requests to several servers in a set of servers, respective servers in the set of servers may receive service requests from different numbers of clients. For example, some of the servers may receive service requests from multiple clients, while other servers may only receive service requests from a single client. In the example system 100 shown in FIG. 1, servers 2 and 4 will receive service requests from two clients, while the other servers only receive service requests from a single client.
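  • One simple way (not mandated by this description) to realize the weighted distribution of operation 240 is to pick, for each outgoing request, a server from the assigned subset with probability proportional to its relative load weight; a client with weights 1, 1, and ⅓ then sends 3/7, 3/7, and 1/7 of its requests to the respective servers. A minimal Scala sketch, assuming weights are given as (serverId, weight) pairs:

    import scala.util.Random

    // Pick a server with probability proportional to its relative load weight.
    def pick(weights: Seq[(Int, Double)], rng: Random): Int = {
      val total = weights.map { case (_, w) => w }.sum
      var r = rng.nextDouble() * total
      for ((server, w) <- weights) {
        r -= w
        if (r <= 0) return server
      }
      weights.last._1 // guards against floating-point underflow
    }

    // Client 1 of FIG. 1: roughly 3/7, 3/7, and 1/7 of requests go to
    // servers 0, 1, and 2, respectively.
    val rng = new Random(42)
    val picks = Seq.fill(70000)(pick(Seq(0 -> 1.0, 1 -> 1.0, 2 -> 1.0 / 3), rng))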
  • FIG. 3A illustrates a non-limiting, example process 300A for determining subsets of servers and relative load weights for servers in each subset. According to some embodiments, process 300A may be performed during operation 220 of process 200 described above with respect to FIG. 2.
  • At operation 310, a client width number is determined based on the number of clients (e.g., the size of the set of clients) and the number of servers (e.g., the size of the set of servers). In some aspects, the client width number can be considered the number of unit loads of service requests to be distributed by each client. In an example embodiment, the client width number is calculated by dividing the number of servers by the number of clients. In the example system shown in FIG. 1, the client width number is 2⅓, which is the result of dividing the number of servers (e.g., 7 in FIG. 1) by the number of clients (e.g., 3 in FIG. 1).
  • At operation 320, the load balancer determines whether the client width number is less than a minimum subset size, such as the above described specified minimum number of connections required for each client.
  • If the load balancer determines at operation 320 that the determined client width number is not less than the minimum subset size, then it will determine subsets of servers and relative load weights for servers in each subset at operation 330 as described above. For example, if the specified required minimum number of connections for each client in the system 100 is 2, which is less than the calculated client width number 2⅓, then the server subsets can be determined by dividing the set of servers 120 into three equal-sized subsets for the set of clients 110 as described above.
  • On the other hand, if the load balancer determines at operation 320 that the determined client width number is less than the minimum subset size, then it logically expands the servers in the set of servers at operation 340. The set of servers may be logically expanded by duplicating the servers in the set of servers. For example, the seven servers shown in FIG. 1 in the set of servers may be logically expanded by duplication to fourteen servers, or twenty-one servers, etc., so that the client width number, re-calculated based on the logically expanded set of servers, is larger than the minimum subset size.
  • In the system 100 shown in FIG. 1, if the minimum number of connections required by each client is 4, then the initial client width number 2⅓ as calculated at operation 310 is less than the required minimum number of connections. As a result, the load balancer can logically expand the set of servers by duplicating these servers into fourteen server instances, and hence each of the servers may be regarded as two server instances, each of which will be assigned to at least one server subset. Thus, due to the logical expansion of servers, each server in the set of servers may be assigned to at least two subsets of servers, and hence may receive service requests from at least two clients, as indicated in FIG. 4.
  • FIG. 4 illustrates a non-limiting, example configuration of another example system 400 for balancing load from a set of clients 410 (e.g., clients 411, 412 and 413 or clients 1-3) across a set of servers 420 (e.g., servers 421-427 or servers 0-6). In this example system 400, the number of clients 410 is 3, the number of servers 420 is 7, and the minimum number of connections required by each client is 4. As a result, a client width number 2⅓, as initially calculated at operation 310, is less than the required number of connections 4. Consequently, the set of servers 420 is logically expanded to include fourteen server instances at operation 340.
  • After the logical expansion of the servers at operation 340, the load balancer proceeds to operation 310 to re-calculate another client width number based on the number of clients and the number of the expanded server instances (e.g., size of the expanded set of servers). For example, for the example system illustrated in FIG. 4, at operation 310, a client width number (4⅔) may be calculated by dividing the number of the expanded server instances (e.g., 14) by the number of the clients (e.g., 3). Thus, the balancer will determine at operation 320 that the client width number 4⅔ is no longer less than the required number of connections 4, and hence subsets of servers can be determined based on the expanded server instances at operation 330.
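  • Operations 310-340 can be summarized in a short Scala sketch (the helper name is illustrative): duplicate the server set until the re-calculated client width number is no longer less than the minimum subset size. Process 300B of FIG. 3B, described next, reaches the same width by multiplying the client width number itself instead of duplicating servers.

    // Operations 310-340 (illustrative): duplicate the server set until the
    // client width number is no longer less than the minimum subset size.
    def expandedServerInstances(servers: Int, clients: Int, minSubsetSize: Int): Int = {
      var instances = servers
      while (instances.toDouble / clients < minSubsetSize) // operation 320
        instances += servers                               // operation 340
      instances // client width number = instances / clients
    }

    // FIG. 4: 7 servers, 3 clients, minimum 4 connections per client
    // yields 14 server instances, i.e., a client width number of 4 2/3.
    val serverInstances = expandedServerInstances(7, 3, 4)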
  • FIG. 3B illustrates another non-limiting, example process 300B for determining subsets of servers and relative load weights for servers in each subset. According to some embodiments, process 300B may be performed during operation 220 of process 200 described above with respect to FIG. 2. As shown in FIG. 3B, process 300B comprises the same operations as process 300A except for operation 340. That is, the operation 340 of process 300A in FIG. 3A is replaced with an operation 350 in FIG. 3B. According to process 300B, if the load balancer determines at operation 320 that the determined client width number is less than the minimum subset size, then it logically multiplies (e.g., doubles, triples, etc.) the client width number at operation 350. For example, in the system 100 shown in FIG. 1, the calculated client width number 2⅓ may be doubled to 4⅔. After that, it will be determined at operation 320 that the doubled client width number is larger than the required minimum number of connections 4, and hence subsets of servers can be determined based on the expanded server instances at operation 330.
  • At operation 330, due to the "fractional load" capability, some of the servers may be assigned to three subsets with different relative load weights while other servers are assigned to two subsets. Ideally, all of the servers will still receive the same volume of service requests from their connected clients. As shown in FIG. 4, a server subset including servers 0-2 and 4-6 is assigned to client 1, a server subset including servers 2-6 is assigned to client 2, and a server subset including servers 0-4 is assigned to client 3. Thus, client 1 will send service requests to six servers, while clients 2 and 3 will send service requests to five servers. Moreover, different relative load weights may be assigned to different servers in each of these subsets, as follows: (1) in the server subset for client 1, a relative load weight 1 is assigned to server instances 0-1 and 5-6, and a relative load weight ⅓ is assigned to both server instances 2 and 4; (2) in the server subset for client 2, a relative load weight 1 is assigned to server instances 3-6 and a relative load weight ⅔ is assigned to server instance 2; and (3) in the server subset for client 3, a relative load weight 1 is assigned to server instances 0-3, and a relative load weight ⅔ is assigned to server instance 4.
  • In this example, the total sum of the relative load weights of all of the servers in each subset is the same as the determined client width number (4⅔), and the total sum of the relative load weights assigned to each of the servers across its subsets is also the same (2).
  • The above described methods of determining subsets of servers and relative load weights for servers in each subset may be implemented, according to some embodiments, by forming (e.g., in the memory of the computer performing process 200 described above) a logical ring topology representing relationships between a set of servers (e.g., set of servers 120) and a set of clients (e.g., set of clients 110). FIG. 5 illustrates a non-limiting, example logical ring topology 500 that may be used in determining server subsets and relative load weights of servers in each subset. In the ring topology 500, a set of servers are uniformly distributed on a continuous logical ring (e.g., a server ring) 510 such that each server is assigned a "slice" of the same length on the ring. For example, the entire circumference of the ring can be equally divided among the set of servers so as to assign a slice of the circumference to each server in the set of servers. A slice assigned to a server may be referred to as a "server slice". In addition, the entire circumference of the ring is also equally sub-divided and allocated to each of the clients in the set of clients, so that each client is also assigned its own portion of the ring. A portion of the ring assigned to a client may be referred to as the "domain" of the client, and that client is configured to only distribute service requests to servers within its domain. That is, each client is configured to distribute service requests only to servers whose server slices overlap the client's own assigned portion (i.e., domain). It is noted that, due to the capability to distribute fractional loads as described above, the lengths of the portions of the ring assigned to all of the clients are the same.
  • In the ring topology 500 shown in FIG. 5, seven servers (servers 0-6) are uniformly distributed on a continuous ring 510. Servers 0-6 are assigned to slices 511-517 respectively, and the lengths of the slices 511-517 are the same. As described above, the concept of fractional aperture is introduced in example embodiments to enable a client to distribute fractional units of load to one or more servers. As a result, a server ring can be represented as a continuous ring and can be divided at any point of the ring. This allows a load balancer to naturally express full ring coverage: each client is required to communicate with the servers which fall within its domain (e.g., servers whose server slices overlap that client's portion of the ring). Moreover, in contrast to conventional deterministic subsetting, a client is not required to treat the servers in its assigned subset equally, as different proportions of service requests may be sent from the client to different servers.
  • In this example, due to the capability of clients to distribute fractional loads, the ring is evenly divided into three client slices 521, 522, and 523 for clients 1-3 respectively, and hence each client is assigned an equal ⅓ portion of the entire ring. The boundaries of the portions 521-523 assigned to each client do not need to be aligned with the edges of server slices 511-517. That is, a server slice (e.g., one of the slices 511-517) can be shared by two clients. For example, in FIG. 5, the server slice 513 for server 2 is shared by clients 1 and 3, and the server slice 515 for server 4 is shared by clients 2 and 3.
  • By evenly dividing the server ring 510 among the clients 1-3, the ring topology 500 represents the relationship between the set of servers 0-6 and the set of clients 1-3, including the server subset assigned to each client and the relative load weights assigned to servers in each subset. For example, the ring topology 500 shows the following relationships between the servers 0-6 and clients 1-3: (1) a server subset including servers 0-2 is assigned to client 1, and servers 0-2 in this subset are assigned relative load weights 1, 1, and ⅓ respectively; (2) a server subset including servers 4-6 is assigned to client 2, and servers 4-6 in this subset are assigned relative load weights ⅓, 1, and 1 respectively; and (3) a server subset including servers 2-4 is assigned to client 3, and servers 2-4 in this subset are assigned relative load weights ⅔, 1, and ⅔ respectively.
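  • The ring construction can be made concrete with a short Scala sketch. It assumes the ring is the unit interval [0, 1), with server slice s occupying [s/S, (s+1)/S) and client domain c occupying [c/C, (c+1)/C); the relative load weight of a server for a client is then the fraction of that server's slice overlapped by the client's domain. (The mapping of domains to client indices in this sketch may differ from the labeling in FIG. 5, but the three domains and their weights are the same.)

    // For each client: its server subset and relative load weights, computed
    // as overlaps between client domains and server slices on a unit ring.
    def ringWeights(servers: Int, clients: Int): Map[Int, Seq[(Int, Double)]] = {
      val serverSlice = 1.0 / servers
      val clientSlice = 1.0 / clients
      (0 until clients).map { c =>
        val (lo, hi) = (c * clientSlice, (c + 1) * clientSlice)
        val subset = for {
          s <- 0 until servers
          overlap = math.min(hi, (s + 1) * serverSlice) -
                    math.max(lo, s * serverSlice)
          if overlap > 1e-9 * serverSlice // skip empty or boundary-only overlaps
        } yield s -> overlap / serverSlice // fraction of server s's unit load
        c -> subset
      }.toMap
    }

    // ringWeights(7, 3) yields, up to rounding, the three FIG. 5 domains:
    //   servers 0-2 with weights 1, 1, 1/3
    //   servers 2-4 with weights 2/3, 1, 2/3
    //   servers 4-6 with weights 1/3, 1, 1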
  • In the logical ring topology, all of the servers are loaded equally (e.g., with a full unit load amount of work). The majority of servers may each receive its full unit of load from a single client. However, a boundary server (e.g., a server that belongs to more than one subset) may receive its full unit of load split between two different clients. As shown in the example logical ring topology 500, servers 0-6 are loaded equally, with servers 2 and 4 (which are boundary servers) each receiving its full unit of load from two clients combined.
  • The logical ring topology illustrates features and advantages of some example embodiments. For example, in some example embodiments, changes to the set of servers as represented in the logical server ring have diminishing effects on clients whose domains are further away from the change on the logical ring. For example, replacement of a server on one part of the ring may not affect a client on the radially opposite part whose assigned portion of the ring does not overlap the changed server's slice. In practice, the capability to effect changes to some servers without affecting a substantial number of the server subsets may have beneficial implications by reducing resource churn and enabling the swapping/upgrading of servers with some degree of seamlessness with respect to ongoing servicing of incoming service requests.
  • In the example embodiment illustrated in FIG. 5, the number of connections required by each client (e.g., the higher of the number of connections determined according to an estimated/projected load, or the specified minimum number of connections) may be 1 or 2, so that this constraint can be fulfilled by dividing the size of the server set (e.g., seven servers) evenly among the three clients without logically expanding the client slices 521, 522 and 523 or the servers 0-6. In contrast, in other example embodiments, the number of connections required by each client may be larger than the number calculated by dividing the number of servers by the number of clients. In some example embodiments, in order to accommodate the higher number of required connections, the client slices 521, 522 and 523 may be repeatedly expanded by duplicating the client slices until the total number of server slices overlapping each of the client slices, in whole or in part, is larger than the required number of connections. In other example embodiments, the set of servers 0-6 may be repeatedly logically expanded by duplicating the set of servers until the number calculated by dividing the size of the expanded set of servers by the number of clients is larger than the required number of connections.
  • FIG. 6 illustrates another non-limiting, example ring topology 600 for determining server subsets and relative load weights of servers in each subset. In the ring topology 600, seven servers (servers 0-6) are uniformly distributed on a continuous ring 610 so that each of the servers is assigned a slice of the server ring 610 and the lengths of the server slices assigned to servers 0-6 respectively are the same. In particular, servers 0-6 are assigned to server slices 611-617 respectively.
  • However, in contrast to the example shown in FIG. 5, the minimum number of connections required by clients 1-3 in this example is four. Therefore, the server subsets cannot be determined by merely dividing the server ring 610 evenly into three equally-sized portions for the three clients, as that only yields 2⅓ server units for each client, which is less than 4. In other words, assigning server subsets to clients by dividing the server ring 610 equally among the three clients would yield only three connections to servers for each client, which is less than the required four connections. Therefore, in some example embodiments, the server ring 610 is logically expanded so that each client can be assigned a domain that overlaps at least four server slices. In this particular example, logically duplicating the server ring 610 once is sufficient to satisfy the client connection requirement. This logical expansion can be represented as the load balancer wrapping the client domains around the server ring 610 twice to satisfy the required number of connections. That is, each of the seven servers is treated as if it were logically expanded into two server instances, and the connections between clients and servers are expanded as well. After that, the logically expanded two server rings are evenly divided into three portions 621, 622 and 623 among the three clients, so that each client is assigned one portion of the expanded two server rings. In other example embodiments, each of the client slices 521, 522 and 523 may be expanded (e.g., doubled) so that each client slice overlaps at least a total of four server slices. As shown in FIG. 6, the client slices 521, 522 and 523 shown in FIG. 5 are expanded to client slices 621, 622 and 623. Due to the expansion of the client slices, the total number of servers overlapping with the expanded client slices is also expanded. That is, the connections between clients and servers are expanded as well.
  • As shown in FIG. 6, clients 1-3 are assigned the portions 621-623 respectively. Each of the portions of the server rings corresponds to a subset of servers assigned to a client, and indicates the relationship between the servers in the subset and the client, including the relative load weights assigned to the servers of the subset. In particular: (1) a server subset including server instances 0-2 and 4-6 is assigned to client 1, and, in this subset, server instances 0-2 and 4-6 are assigned relative load weights 1, 1, ⅓, ⅓, 1, and 1 respectively; (2) a server subset including server instances 2-6 is assigned to client 2, and, in this subset, server instances 2-6 are assigned relative load weights ⅔, 1, 1, 1, and 1 respectively; and (3) a server subset including server instances 0-4 is assigned to client 3, and, in this subset, server instances 0-4 are assigned relative load weights 1, 1, 1, 1, and ⅔ respectively.
  • In the above example, all servers 0-6 are equally loaded, with each client utilizing 4⅔ server units. Servers 0-1, 3 and 5-6 each receive two full units of load from two clients, while servers 2 and 4 each receive a total of two full units of load from three clients. Specifically, as shown in FIGS. 4 and 6, server 2 receives ⅓ unit of load from client 1, ⅔ unit of load from client 2, and one full unit of load from client 3, while server 4 receives ⅓ unit of load from client 1, ⅔ unit of load from client 3, and one full unit of load from client 2. Servers 2 and 4 thus each receive one of their two units of load assembled from the fractional loads of two different clients.
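  • Combining the two sketches above reproduces these FIG. 6 numbers (again up to which domain is labeled with which client in the figure): expand the seven servers to fourteen instances, compute the ring overlaps, and fold instance i back onto server i % 7.

    // FIG. 6 (illustrative): 7 servers, 3 clients, minimum 4 connections.
    val instances = expandedServerInstances(7, 3, 4) // 14 server instances
    val perInstanceWeights = ringWeights(instances, 3)
    val perServerWeights = perInstanceWeights.map { case (c, ws) =>
      c -> ws.groupMapReduce { case (i, _) => i % 7 } { case (_, w) => w }(_ + _)
    }
    // Up to rounding: each client's weights sum to 4 2/3, each server's
    // weights across clients sum to 2, and servers 2 and 4 are split
    // 1/3 + 2/3 + 1 across three clients.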
  • According to some embodiments, a load balancer may handle restarts and failures gracefully and robustly by continuing to load servers (e.g., backend servers) uniformly while minimizing or reducing churn. To adjust load balancing in response to such changes as restarts and failures, in a load balancing system (e.g., as that described in relation to FIGS. 2-3) where all load balancing decisions are made independently by each of the clients, the clients may be required to re-determine their respective subsets of servers and relative load weights for servers in each subset by any of the methods described above when a client becomes aware of a change in the number of servers, the number of clients, the subset size, and/or its position in the set of clients. For example, when a server becomes unavailable, its clients may at least temporarily select a replacement server. When a replacement server is selected, the clients may create new TCP connections, which creates additional overhead. Similarly, when a client restarts, it may be required to reopen the connections to all its servers. Alternatively, a centralized load balancer (e.g., running on a central server) may make load balancing decisions when it becomes aware of any changes in the number of servers, the number of clients, the subset size, and/or the identifier of a client.
  • In the above described example embodiments, a set of clients is assumed to converge on a uniform server subset size when they are offered, or configured to receive, the same amount of load. However, in some embodiments, one or more clients may dynamically expand their respective server subsets. For example, when a client receives a burst of traffic beyond projected levels that it determines cannot be handled by the servers of the current subset, the client may temporarily expand the number of servers it distributes load to, for example, by temporarily expanding its server subset. This adjustment may or may not be performed in a coordinated fashion.
  • FIG. 7 illustrates a non-limiting, example block diagram for an example device 700. The example device 700 may be a computer implementing any of the clients or any of the servers described above in connection with FIGS. 1-6, or a device hosting at least one of these clients and/or one of these servers. In this example embodiment, the device 700 includes a communication module 710, an input/output module 720, a processing system 730, and a storage 740, all of which may be communicatively linked by a system bus, network, or other connection mechanisms.
  • The communication module 710 functions to allow the device 700 to communicate with one or more of the other devices (e.g., user devices, clients, servers or a global server). The communication module 710 is configured to transmit data to other devices and/or receive data from other devices.
  • In certain example embodiments, the communication module 710 may comprise one or more communication interfaces supporting satellite communications, radio communications, telephone communications, cellular communications, internet communications, and/or the like. In other example embodiments, the communication module 710 may comprise a wireless transceiver with a connected antenna, a wireless LAN module, a radio-frequency (RF), infrared, or Bluetooth® transceiver, and/or a near field communication transceiver module. One or more of these communication components may collectively provide a communication mechanism by which the device 700 can communicate with other devices, platforms and/or networks.
  • The data storage 740 may comprise one or more volatile and/or non-volatile storage components, such as a hard disk, a magnetic disk, an optical disk, read only memory (ROM) and/or random access memory (RAM), and may include removable and/or non-removable components. The data storage 740 may be integrated in whole or in part with the processing system 730.
  • The processing system 730 may comprise one or more processors 731, including one or more general purpose processors and/or one or more special purpose processors (e.g., DSPs, GPUs, FPGAs or ASICs). The processing system 730 may be capable of executing application program instructions (e.g., compiled or non-compiled program and/or machine code) stored in data storage 740 to perform any of the functions and processes described above. The data storage 740 may include a non-transitory computer-readable medium, having stored thereon program instructions that, if executed by the processing system 730, cause the device 700 to perform any of the processes or functions disclosed herein and/or illustrated by the accompanying drawings.
  • In certain example embodiments, the program instructions stored in the storage 740 may include an operating system program and one or more application programs, such as program instructions for one of the above-described load balancers. For example, the operations in example processes of FIGS. 2-3 can be defined by the program instructions stored in the storage 740 and controlled by processing system 730 executing the program instructions.
  • The input/output module 720 of the device 700 may enable the device 700 to interact with a human or non-human user, such as to receive input from a user and to provide output to the user. The input/output module 720 may include a touch-sensitive or presence-sensitive panel, keypad, keyboard, trackball, joystick, microphone, still camera and/or video camera, and the like. The input/output module 720 may also include one or more output components such as a display device, which may be combined with a touch-sensitive or presence-sensitive panel. In an example embodiment, the input/output module 720 may display various user interfaces to enable a user or an operator to access services or functions provided by the device 700.
  • As described above, the improved deterministic subsetting load balancing techniques of various embodiments may be implemented such that the load balancing configurations are either centrally determined or determined in a distributed manner. Whereas the central determination results in less computing overhead, the distributed determination further improves the resilience and robustness of the load balancing. Both types of load balancing determinations in example embodiments yield load balancing systems that enable more even distribution of loads and better control of the maximum load levels experienced by certain servers. These improved characteristics of the load balancing improve the computing performance and/or the memory use of the computers used in the load balancing system, and moreover improve overall system latency, throughput and responsiveness. The above described embodiments may also be used for distributing load among resources other than servers, such as computers, network links, processors, hard drives, etc.
  • While the disclosure has been described in connection with what is presently considered to be the most practical and preferred embodiments, it is to be understood that the disclosure is not to be limited to the disclosed embodiments, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (21)

1. A load-balancing method for balancing a processing load of a plurality of clients among a plurality of servers, the method comprising:
assigning a different subset of the plurality of servers to each respective client in the plurality of clients, said each client being configured to distribute processing requests only to the servers in the subset assigned thereto;
for each respective client in the plurality of clients, determining respective load weights for distributing processing requests to the servers in the subset assigned thereto, wherein the load weights for each respective client are determined such that at least one server of the plurality of servers is assigned to multiple clients in the plurality of clients, and at least one of the load weights for each respective client is a fraction of another one of the load weights for the client; and
controlling the plurality of clients to distribute processing requests based on the determined load weights,
wherein the assigning a different subset of the plurality of servers to each respective client comprises:
determining a client width number based upon the number of servers in the plurality of servers and the number of clients in the plurality of clients; and
dividing the plurality of servers into said subsets in accordance with the determined client width number.
2. The method of claim 1, wherein, for a particular client in the plurality of clients, respective load weights in the subset assigned thereto are determined based upon the number of servers in the plurality of servers, the number of clients in the plurality of clients, and an identifier of the client, and
wherein the dividing the plurality of servers into said subsets in accordance with the determined client width number comprises:
determining whether the client width number is not less than a preconfigured subset size;
upon determination that the client width number is not less than the preconfigured subset size, evenly dividing the plurality of servers into said subsets; and
upon determination that the client width number is less than the preconfigured subset size, forming a logical expanded set of servers by duplicating the plurality of servers, and, based upon the logical expanded set of servers, evenly dividing the plurality of servers into said subsets.
3. The method of claim 1, wherein each of the load weights indicates a relative volume of processing requests to be transmitted from a client to servers in the subset assigned to the client.
4. The method of claim 1, wherein the assigning a different subset of the plurality of servers to each respective client further comprises:
representing the plurality of servers in a continuous logical ring in a memory, with a respective server slice of a first width in the ring representing each server;
evenly dividing the ring into sub-portions of a second width, the number of the sub-portions being equal to the number of clients in the plurality of clients; and
assigning the subset to the respective client based upon a corresponding one of the sub-portions.
5. The method of claim 4, wherein determining respective load weights for distributing the processing requests to the servers in the subset assigned to each respective client comprises:
determining the respective load weights based upon respective widths of portions of corresponding server slices overlapping with the sub-portion corresponding to the respective client.
6. A client in a load balancing computer processing system comprising a plurality of clients, the client comprising communication circuitry and at least one processor, wherein the at least one processor is configured to:
control the communication circuitry to distribute processing requests to a respective subset of a plurality of servers over a communication network by transmitting a first weighted-quantity of the processing requests to each of one or more of the servers in the respective subset and transmitting to at least one other server in the respective subset a second weighted-quantity of the processing requests, the second weighted-quantity being a fraction of the first weighted-quantity such that at least one server of the plurality of servers is assigned to multiple clients, and
determine the respective subset of servers and a proportion of processing requests to be sent to each server in the respective subset based at least upon information about other said clients, information about the plurality of servers, and size of the respective subset of servers,
wherein determining the respective subset of servers and the proportion of processing requests to be sent to each server in the respective subset comprises:
determining the number of clients in the plurality of clients, the number of the servers in the plurality of servers, and the specified subset size;
determining a client width number based upon the number of servers in the plurality of servers and the number of clients in the plurality of clients; and
dividing the plurality of servers into said subsets in accordance with the determined client width number.
7. The client according to claim 6, wherein the one or more of the servers comprise a majority of servers in the respective subset of servers.
8. The client according to claim 6, wherein the first weighted-quantity and the second weighted-quantity are determined based upon at least the number of servers in the plurality of servers, and the number of clients in the plurality of clients.
9. The client according to claim 8, wherein the dividing the plurality of servers into said subsets in accordance with the determined client width number comprises:
determining whether the client width number is less than a preconfigured subset size;
upon determination that the client width number is not less than the preconfigured subset size, evenly dividing the plurality of servers into subsets of servers; and
upon determination that the client width number is less than the preconfigured subset size, forming a logical expanded set of servers by duplicating the plurality of servers, and, based upon the logical expanded set of servers, evenly dividing the plurality of servers into subsets of servers.
10. The client according to claim 6, wherein the client is configured to transmit the second weighted-quantity of processing requests to a first one of the other servers in the respective subset and to transmit a third weighted-quantity of processing requests to a second one of the other servers, the second and third weighted-quantities each being a fraction of the first weighted-quantity.
11. The client according to claim 6, wherein at least one of said respective subsets of servers has a different number of servers than others of the respective subsets of servers.
12. The client according to claim 6, wherein the client is configured to determine, independently of other said clients, the respective subset of servers, and the proportion of processing requests to be sent to each server in the respective subset based upon information about other said clients, information about the plurality of servers, the size of the respective subset of servers, and a unique identifier for the client.
13. The client according to claim 6, wherein each server in the plurality of servers is configured to be homogeneous with respect to other servers in the plurality of servers.
14. The client according to claim 6, wherein each of the servers is configured to receive processing requests from at least one of the clients, and at least one of the servers being configured to receive first and second amounts of the processing requests from two clients, a total sum of the first and second amounts of the processing requests being equal to the first weighted-quantity of the processing requests.
15. The client according to claim 6, wherein the client is configured to receive the processing requests from a plurality of other devices.
16. The client according to claim 6, wherein the first weighted-quantity is uniform among all clients in the plurality of clients, and wherein the second weighted-quantity is non-uniform among at least some of the clients in the plurality of clients.
17. The client according to claim 6, wherein a first client in the plurality of clients is configured to transmit the second weighted-quantity of the respectively corresponding processing requests to a first one of the servers, and a second client of the plurality of clients is configured to transmit the second weighted-quantity of the respectively corresponding processing requests to the first one of the servers and a second one of the servers.
18. The client according to claim 6, wherein each of the servers is configured as an HTTP backend server and each of the clients is configured as an HTTP proxy.
19. The client according to claim 6, wherein determining the respective subset of servers and the proportion of processing requests to be sent to each server in the respective subset further comprises:
representing the plurality of servers in a continuous logical ring, with a respective slice of a first width in the ring representing each server;
evenly dividing the ring into sub-portions of a second width, a total number of the sub-portions being equal to the number of clients in the plurality of clients; and
determining the respective subset based upon a corresponding one of the sub-portions.
20. The client according to claim 6, wherein the client is configured to establish a communication connection to each server in a corresponding subset of servers, and wherein at least one server in the plurality of servers is replaceable or removable without causing at least one of the plurality of clients to re-establish previously established connections to servers in a corresponding subset of servers.
21. A non-transitory computer readable storage medium storing computer program instructions that, when executed by at least one processor of a client in a computer processing system, causes the client to balance load distributed over a communication network among a plurality of servers, comprising:
determining a total number of the plurality of servers as a server set size, a total number of a plurality of clients as a client set size, a unique identifier assigned to the client, and a subset size for the client, the subset size being a total number of servers to be connected with the client; and
based upon the determined server set size, client set size, subset size, and identifier assigned to the client, determining a subset of servers from the plurality of servers, and relative load weights for servers in the selected subset, each of the relative load weights indicating relative amounts of processing requests transmitted from the client to a respective server in the subset, and the selected subset having a size of at least the determined subset size and of a same size as respective subsets selected by each other client in the plurality of clients,
wherein at least one of the relative load weights for one server in the selected subset is a fraction of another of the relative load weights for another server in the selected subset,
wherein determining a subset of servers from the plurality of servers comprises:
determining a client width number based upon the server set size and the client set size; and
dividing the plurality of servers into subsets of servers in accordance with the determined client width number.
US17/407,343 2018-08-13 2021-08-20 Load balancing deterministically-subsetted processing resources using fractional loads Abandoned US20210382755A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US16/101,748 US10579432B1 (en) 2018-08-13 2018-08-13 Load balancing deterministically-subsetted processing resources using fractional loads
US16/775,292 US11119827B2 (en) 2018-08-13 2020-01-29 Load balancing deterministically-subsetted processing resources using fractional loads
US17/407,343 US20210382755A1 (en) 2018-08-13 2021-08-20 Load balancing deterministically-subsetted processing resources using fractional loads

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US16/775,292 Continuation US11119827B2 (en) 2018-08-13 2020-01-29 Load balancing deterministically-subsetted processing resources using fractional loads

Publications (1)

Publication Number Publication Date
US20210382755A1 (en) 2021-12-09

Family Applications (3)

Application Number Title Priority Date Filing Date
US16/101,748 Expired - Fee Related US10579432B1 (en) 2018-08-13 2018-08-13 Load balancing deterministically-subsetted processing resources using fractional loads
US16/775,292 Active US11119827B2 (en) 2018-08-13 2020-01-29 Load balancing deterministically-subsetted processing resources using fractional loads
US17/407,343 Abandoned US20210382755A1 (en) 2018-08-13 2021-08-20 Load balancing deterministically-subsetted processing resources using fractional loads
