US20230232195A1 - Collective scaling of applications - Google Patents

Collective scaling of applications

Info

Publication number
US20230232195A1
Authority
US
United States
Prior art keywords
service
requests
chain
services
scaling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
US17/729,776
Inventor
Sudipta Biswas
Monotosh Das
Hemant Kumar Shaw
Shubham Chauhan
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
VMware LLC
Original Assignee
VMware LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by VMware LLC
Assigned to VMWARE, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: BISWAS, Sudipta; DAS, Monotosh; SHAW, Hemant Kumar; CHAUHAN, Shubham
Priority to PCT/US2022/039025 (published as WO2023140895A1)
Publication of US20230232195A1
Assigned to VMware LLC CHANGE OF NAME (SEE DOCUMENT FOR DETAILS). Assignors: VMWARE, INC.


Classifications

    • H04W 4/50 — Service provisioning or reconfiguring (services specially adapted for wireless communication networks)
    • H04L 41/0897 — Bandwidth or capacity management by horizontal or vertical scaling of resources, or by migrating entities, e.g. virtual resources or entities
    • H04L 41/40 — Maintenance, administration or management of data switching networks using virtualisation of network functions or resources, e.g. SDN or NFV entities
    • H04L 41/5025 — Ensuring fulfilment of SLA by proactively reacting to service quality change, e.g. by reconfiguration after service quality degradation or upgrade
    • H04L 47/125 — Avoiding congestion; Recovering from congestion by balancing the load, e.g. traffic engineering
    • H04L 47/82 — Admission control; Resource allocation: miscellaneous aspects
    • H04L 67/1008 — Server selection for load balancing based on parameters of servers, e.g. available memory or workload
    • H04L 67/60 — Scheduling or organising the servicing of application requests using the analysis and optimisation of the required network resources
    • H04W 28/0289 — Congestion control (network traffic management)

Definitions

  • NFV: Network Functions Virtualization
  • Microservices allow parts of applications (or different services in a service chain) to react independently to user input.
  • Kubernetes is widely considered the most popular microservices orchestration platform.
  • auto-scaling is a mechanism by which applications can be scaled in or scaled out based on triggers. These triggers are typically based on observing an individual service and scaling that service as necessary.
  • these reactive measures can cause issues for latency-sensitive workloads such as those implemented in a 5G service chain. As such, better techniques for auto-scaling such workloads would be useful.
  • Some embodiments provide a method for pre-emptively scaling resources allocated to one application based on identifying an amount of traffic received at another, related application.
  • the method identifies a first number of requests received at a first application and, based on this first number of requests, determines that a second application that processes at least a subset of the requests after processing by the first application requires additional resources to handle a second number of requests that will be received at the second application.
  • the method increases the number of resources available to the second application prior to the second application receiving this second number of requests, in order to avoid processing delays and/or dropped requests at the second application.
  • the first and second applications are services in a service chain (e.g., for a 5G or other telecommunications network) that includes at least two services applied to the requests (e.g., audio and/or video calls).
  • some embodiments compute, for each respective service in the chain, a respective scaling factor that estimates the percentage of the requests received at the first service that will subsequently be received at the respective service, and use that factor to deploy additional resources to the respective service.
  • the services of the service chain are implemented as virtualized network functions. For instance, some embodiments deploy each service as one or more Pods in a Kubernetes cluster for the service chain. Each Pod is allocated a particular amount of resources (e.g., memory and processing capability) that enable the Pod to perform its respective service on a particular number of requests in a given time period (e.g., requests/second).
  • a front-end load balancer is configured to receive the requests and measure the number of requests received in real-time or near-real-time.
  • a data plane component of the load balancer receives and load balances the traffic at least among instances of the first service in the service chain in addition to providing information about the processed traffic to a control plane component of the load balancer.
  • the control plane component uses this traffic information to measure the number of requests and provide that information to a scaler module that also operates (e.g., as a Pod or set of Pods) in the Kubernetes cluster.
  • the scaler module, in some embodiments, (i) computes the scaling factors for each service in the service chain and (ii) handles auto-scaling the services based on these scaling factors, traffic measurements from the load balancer (e.g., a number of requests forwarded by the load balancer to the first service), and data indicating the processing capabilities of each Pod for each service (enabling a judgment as to when the number of Pods should be increased or decreased for a given service).
  • the scaler module computes the scaling factors either once at initial setup of the cluster or on a regular basis (e.g., depending on whether the inputs to the scaling factors use predefined or real-world data).
  • the scaler module of some embodiments generates a graph (e.g., a directed acyclic graph) of the service chain.
  • each service is represented as a node and each direct path from one service to another is represented as an edge.
  • Each edge from a first node to a second node has an associated coefficient that specifies an estimate of the percentage of requests received at the service represented by the first node that are forwarded to the service represented by the second node (as opposed to being dropped, blocked, or forwarded to a different service).
  • coefficients may be specified by a user (e.g., a network administrator) or based on real-world measurement of the number of requests received at each of the services (e.g., over a given time period).
  • For each service, the scaler module uses the graph to identify each path through the service chain from the first service in the service chain to that service. For each such path, the scaler module multiplies the coefficients along the path in order to compute a factor for the path. The scaling factor for a given service is then the sum of the computed factors for each of the paths from the first service to that service, representing the estimated percentage of the requests received by the first service that will need to be processed by that service (the scaling factor for the first service is always 1). Other embodiments use an equivalent computation that performs the component calculations in a different order in order to reduce the number of multiplications.
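As a concrete sketch of this path-product computation (the graph encoding and function name are illustrative, not from the patent):

```python
def scaling_factor(dag, first, target):
    """Estimate the fraction of requests received at `first` that will
    reach `target`: for every path from `first` to `target`, multiply
    the direct-path coefficients along the path, then sum the per-path
    products. `dag` maps service -> [(next_service, coefficient), ...].
    The first service's own factor comes out as 1."""
    coeff = {(a, b): c for a, nbrs in dag.items() for b, c in nbrs}

    def paths(node, path):
        # Depth-first enumeration of every path ending at `target`.
        if node == target:
            yield path
        for nxt, _ in dag.get(node, []):
            yield from paths(nxt, path + [nxt])

    total = 0.0
    for path in paths(first, [first]):
        product = 1.0
        for a, b in zip(path, path[1:]):
            product *= coeff[(a, b)]
        total += product
    return total
```

With the five-service graph of FIG. 4 (A→B: 0.7, B→C: 0.5, B→E: 0.8, E→C: 0.5, C→D: 0.3), the factor for service C sums its two paths: 0.7·0.5 + 0.7·0.8·0.5 = 0.63.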
  • the scaler module determines whether each of the services needs to be scaled (e.g., whether additional Pods should be instantiated for each service). Specifically, for one or more metrics (e.g., total requests, requests per second, latency (which is correlated with the rate of incoming traffic), etc.), the capacity of each Pod is specified for each service. A current value for each metric (based on the metrics from the load balancer and the scaling factor for the service) is divided by the Pod capacity for a given service to determine the number of Pods that will be required for the service.
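In a sketch (names illustrative): with a measured front-end rate R, a service's scaling factor f, and a per-Pod capacity c for the relevant metric, the required Pod count is ceil(R·f / c):

```python
import math

def required_pods(front_end_rps, factor, pod_capacity_rps):
    """Pods a service needs for the traffic that is about to arrive: the
    measured front-end request rate, scaled by the service's scaling
    factor, divided by the per-Pod capacity for that service (rounded
    up, with a floor of one Pod)."""
    return max(1, math.ceil(front_end_rps * factor / pod_capacity_rps))
```

For example, at 1,000 requests/second with a scaling factor of 0.63 and Pods that each handle 200 requests/second, the service needs ceil(630 / 200) = 4 Pods.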
  • the scaler module manages the deployment of additional Pods for the service. In this manner, if a large increase in traffic is detected at the load balancer, all of the services can be scaled up to meet this demand prior to the receipt of all of those requests at the services.
  • FIG. 1 conceptually illustrates the architecture of a service chain deployment of some embodiments.
  • FIG. 2 conceptually illustrates the architecture of the scaler module of some embodiments.
  • FIG. 3 conceptually illustrates a process of some embodiments for computing scaling factors for a set of services in a service chain.
  • FIG. 4 conceptually illustrates an example of a directed acyclic graph for a service chain that includes five services.
  • FIG. 5 illustrates a table showing scaling factor computations for each of the services shown in the graph of FIG. 4 .
  • FIG. 6 illustrates a table providing an example of flow data over a time period for a particular path between two applications.
  • FIG. 7 conceptually illustrates a process of some embodiments for determining whether scaling of services in a service chain is required based on traffic expected to arrive at those services and initiating that scaling if needed.
  • FIG. 8 conceptually illustrates an example of a service chain as deployed.
  • FIG. 9 conceptually illustrates scaling up the deployment of the service chain of FIG. 8 in response to receiving a first traffic measurement.
  • FIG. 10 illustrates a table showing the computations to arrive at the scaling decisions for the example shown in FIG. 9 .
  • FIG. 11 conceptually illustrates scaling down the deployment of the service chain of FIG. 8 in response to receiving a second traffic measurement.
  • FIG. 12 illustrates a table showing the computations to arrive at the scaling decisions for the example shown in FIG. 11 .
  • FIG. 13 conceptually illustrates an electronic system with which some embodiments of the invention are implemented.
  • Some embodiments provide a method for pre-emptively scaling resources allocated to one application based on identifying an amount of traffic received at another, related application.
  • the method identifies a first number of requests received at a first application and, based on this first number of requests, determines that a second application (which processes at least a subset of the requests after processing by the first application) requires additional resources to handle a second number of requests that will be received at the second application.
  • the method increases the number of resources available to the second application prior to the second application receiving this second number of requests, in order to avoid processing delays and/or dropped requests at the second application.
  • the first and second applications are services in a service chain (e.g., for a 5G or other telecommunications network) that includes at least two services applied to the requests (e.g., audio and/or video calls).
  • some embodiments compute, for each respective service in the chain, a respective scaling factor that estimates the percentage of the requests received at the first service that will subsequently be received at the respective service, and use that factor to deploy additional resources to the respective service.
  • the services of the service chain are implemented as virtualized network functions. For instance, some embodiments deploy each service as one or more Pods in a Kubernetes cluster for the service chain.
  • FIG. 1 conceptually illustrates the architecture of such a service chain deployment 100 of some embodiments. As shown, the deployment 100 includes a Kubernetes cluster 105 as well as a front-end load balancer.
  • the Kubernetes cluster 105 includes an ingress controller 120 for the load balancer, a Kubernetes ingress object 125 , a scaler module 130 , and the service chain 135 (which, as described below, includes multiple services).
  • the front-end load balancer includes both a data plane 110 and a controller 115 .
  • front-end load balancers such as Avi Vantage can be configured to define virtual services as a front-end for Kubernetes-based applications, generally via ingress controllers. Each virtual service maps to back-end server pools, serviced by application pods in the Kubernetes cluster. In some embodiments, these virtual services are the ingress point for incoming traffic when used at the edge of the cluster, as in the deployment 100 . That is, all traffic sent to the cluster passes initially through the front-end load balancer. In the example of a 5G or other telecommunication network, this traffic may include audio and/or video calls, potentially in addition to other types of traffic.
  • an initial data message or set of data messages for a call (which can be referred to as a request) is sent to the service chain via the front-end load balancer (e.g., for the service chain to perform authentication and other tasks for the call), while subsequent traffic (carrying audio and/or video data) does not need to be processed through the service chain.
  • the load balancer data plane 110 receives the incoming requests and load balances this ingressing traffic (possibly in addition to providing additional services, such as web application firewall).
  • the load balancer data plane 110 may be implemented by a single appliance, a centralized cluster of appliances or virtualized data compute nodes (e.g., bare metal computers, virtual machines, containers, etc.), or a distributed set of appliances or virtualized data compute nodes.
  • the load balancer data plane 110 may load balance traffic between different Pods implementing the first service in the service chain and then forward the traffic to the selected Pods.
  • the load balancer data plane 110 performs additional service chaining features, such as defining a path through the service chain for an incoming data message (e.g., by selecting Pods for each of the services and embedding this selection in a header of that data message).
  • the load balancer data plane 110 gathers information about the incoming requests and provides this traffic information to the load balancer controller 115 .
  • the load balancer controller 115 is a centralized control plane that manages one or more instances of the load balancer data plane 110 .
  • the load balancer controller 115 defines traffic rules (e.g., based on administrator input) and configures the data plane 110 to enforce these traffic rules.
  • the controller 115 gathers various traffic metrics from the data plane 110 (and aggregates these metrics in the case of a distributed data plane).
  • the controller 115 also makes these aggregated metrics accessible (e.g., to an administrator, the scaler module 130 , etc.).
  • the metrics are accessible (to administrators, other modules, etc.) via application programming interfaces (APIs) such as representational state transfer (REST) APIs.
  • the ingress controller 120 for the load balancer handles conversion of data between the Kubernetes cluster and the front-end load balancer.
  • the ingress controller 120 is implemented on one or more Pods in the Kubernetes cluster 105 .
  • This ingress controller 120 listens to a Kubernetes API server and translates Kubernetes data (e.g., the ingress object 125 , service objects, etc.) into the data model used by the front-end load balancer.
  • the ingress controller 120 communicates the translated information to the load balancer controller 115 via API calls in some embodiments to automate the implementation of this configuration by the load balancer data plane 110 .
  • the ingress object 125 is a Kubernetes object that defines external access to the Kubernetes cluster 105 .
  • the ingress object 125 exposes routes to services within the cluster (e.g., the first service in the service chain 135 ) and can define how load balancing should be performed.
  • the ingress controller 120 is responsible for translating this ingress object into configuration data to provide to the front-end load balancer controller 115 .
  • the scaler module 130 also operates on one or more Pods within the Kubernetes cluster 105 .
  • the scaler module 130 is responsible for (i) computing the scaling factors for each service in the service chain 135 and (ii) initiating pre-emptive auto-scaling of the services based on these computed scaling factors, traffic measurements from the front-end load balancer controller 115 , and data indicating the processing capabilities of each service.
  • the scaling factor for each respective service estimates the percentage of the traffic received at the front-end load balancer (and thus received at the first service in the service chain) that will be subsequently received at the respective service.
  • the scaling factors can either be computed once (at initial setup of the cluster) or on a regular basis.
  • the pre-emptive auto-scaling decisions output by the scaler module 130 specify, in some embodiments, when the number of Pods should be increased or decreased for a given service in the service chain. The operation of the scaler module 130 is described in additional detail below by reference to FIG. 2 .
  • the service chain 135 is a set of services deployed in a specific topology.
  • each of these services is implemented as a “micro-service” and is deployed as a set of one or more Pods.
  • the service chain may be implemented as a set of virtual machines (VMs) or other data compute nodes (DCNs) rather than as Pods in a Kubernetes cluster.
  • a front-end load balancer can still be configured to measure incoming traffic and provide this data to a scaler module (executing, e.g., on a different VM) that performs similar auto-scaling operations outside of the Kubernetes context.
  • the service chain 135 includes three services 140 - 150 .
  • the first service 140 receives traffic directly from the load balancer data plane 110 (potentially via Kubernetes ingress modules) and sends portions of this traffic (after performing processing) to both the second service 145 and the third service 150 .
  • the second service 145 also sends a portion of its traffic (after it has performed its processing) to the third service 150 .
  • the first service 140 is implemented using two Pods
  • the second service 145 is implemented using three Pods
  • the third service 150 is implemented using a single Pod.
  • Each Pod is allocated a particular amount of resources (e.g., memory and processing capability) that enable the Pod to perform its respective service on a particular number of requests in a given time period (e.g., requests/second).
  • This per-Pod capacity can vary from one service to the next based on the physical resources allocated to each Pod for the service as well as the resources required to process an individual request by the service.
  • the services can include firewalls, forwarding elements (e.g., routers and/or switches), load balancers, VPN edges, intrusion detection and prevention services, logging functions, network address translation (NAT) functions, telecommunications network specific gateway functions, and other services, depending on the needs of the network.
  • FIG. 2 conceptually illustrates a more detailed view of the architecture of the scaler 200 of some embodiments (e.g., the scaler module 130 ).
  • the scaler 200 includes a modeler 205 , a metrics receiver 210 , and an auto-scaling and deployment module 215 .
  • the modeler 205 receives the service chain topology as well as direct path coefficients and uses this information to (i) define a graph representing the service chain and (ii) compute the scaling factors for each of the services in the service chain (i.e., the scaling factors for estimating the percentage of traffic received at the first service that will be subsequently received at the other services).
  • the modeler stores (e.g., in memory) the graph and scaling factors 220 for use by the auto-scaling and deployment module 215 .
  • the modeler 205 receives the service chain topology information from the services themselves or from other Kubernetes constructs.
  • This topology indicates the direct paths between services in the service chain.
  • the direct path coefficients specify, for each direct path in the service chain topology from a first service to a second service, the portion of traffic received at the first service that is forwarded on to the second service.
  • the paths from the first service 140 to the second service 145 , from the first service 140 to the third service 150 , and from the second service 145 to the third service 150 each have their own associated direct path coefficients.
  • the direct path coefficients may be administrator-specified or be based on recent observation of the service chain traffic.
  • traffic information from the services in the service chain is provided to the modeler 205 , which regularly determines the ratios of traffic forwarded from one service to the next.
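That ratio calculation might look like the following sketch (the data shapes and names are assumptions):

```python
def direct_path_coefficients(received, forwarded):
    """Estimate direct-path coefficients from one observation window.
    `received` maps service -> requests it received in the window;
    `forwarded` maps (src, dst) -> requests src forwarded to dst.
    The coefficient for an edge is forwarded / received at its source."""
    return {(src, dst): count / received[src]
            for (src, dst), count in forwarded.items()}
```

If service A received 1,000 requests in the past hour and forwarded 700 of them to service B, the A→B coefficient is estimated as 0.7.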
  • the operations of the modeler 205 to define the service chain graph and compute the scaling factors are described further below by reference to FIG. 3 .
  • the metrics receiver 210 receives traffic metrics, including those indicating the number of requests received at the first service in the service chain.
  • the metrics receiver 210 receives API schema information for the front-end load balancer from the ingress controller and uses this API information to retrieve the traffic metrics from the load balancer controller via API calls.
  • the specific metrics received can include a total number of requests, requests per unit time (e.g., requests per second or millisecond), etc. As these metrics are retrieved, the metrics receiver 210 provides the metrics to the auto-scaling and deployment module 215 .
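A sketch of this retrieval pattern follows; the endpoint path, metric name, and response shape are invented for illustration — the controller's actual REST API will differ:

```python
import json
import urllib.request

# Hypothetical endpoint template; a real controller exposes its own paths.
METRICS_URL = "{controller}/api/metrics/virtualservice/{vs}?metric_id=requests_per_second"

def latest_sample(metrics_response):
    """Pull the most recent requests/sec value out of a response assumed
    to be shaped like {"series": [{"data": [{"value": 950.0}, ...]}]}."""
    return metrics_response["series"][-1]["data"][-1]["value"]

def fetch_request_rate(controller, virtual_service):
    """Poll the load balancer controller for the current request rate."""
    url = METRICS_URL.format(controller=controller, vs=virtual_service)
    with urllib.request.urlopen(url) as response:
        return latest_sample(json.load(response))
```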
  • the auto-scaling and deployment module 215 uses the scaling factors computed by the modeler 205 to determine, in real-time, whether any of the services in the service chain need to be scaled (e.g., either instantiation of additional Pods or removal of Pods) based on the traffic metrics.
  • the capacity of each Pod is specified for each service (the capacity can vary between services) for one or more metrics (e.g., requests per unit time) and provided to the auto-scaling and deployment module 215 (e.g., as an administrator-provided variable or based on observation).
  • the current value for this metric (as received from the load balancer controller and multiplied by the scaling factor for a given service) is divided by the Pod capacity for the service to determine the number of Pods that will be required for the service. If the actual number of Pods is less than the required number of Pods, then the auto-scaling and deployment module 215 manages the deployment of additional Pods for the service. In this manner, if a large increase in traffic is detected at the load balancer, all of the services can be scaled up to meet this demand prior to the receipt of all of those requests at the services.
  • the auto-scaling and deployment module 215 manages the deletion of one or more Pods for the service.
  • the auto-scaling and deployment module 215 either handles the deployment/deletion operations directly or provides the necessary instructions to a Kubernetes control plane module that handles these operations for the cluster.
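Taken together, the decision step described above might be sketched as follows (function and key names are illustrative):

```python
import math

def scaling_decisions(front_end_rps, scaling_factors, pod_capacity, current_pods):
    """Compare, per service, the Pods required for the traffic that is
    about to arrive against the Pods currently deployed. Returns
    service -> ("scale-out" | "scale-in" | "steady", target_pod_count)."""
    decisions = {}
    for svc, factor in scaling_factors.items():
        # Expected rate at this service, divided by per-Pod capacity.
        target = max(1, math.ceil(front_end_rps * factor / pod_capacity[svc]))
        if target > current_pods[svc]:
            action = "scale-out"
        elif target < current_pods[svc]:
            action = "scale-in"
        else:
            action = "steady"
        decisions[svc] = (action, target)
    return decisions
```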
  • the operations of the auto-scaling and deployment module 215 to predictively auto-scale the services of a service chain will be described in detail below by reference to FIG. 7 .
  • FIG. 3 conceptually illustrates a process 300 of some embodiments for computing scaling factors for a set of services in a service chain.
  • the process 300 is performed by a scaler module (e.g., the modeler 205 of the scaler module 200 shown in FIG. 2 ).
  • this process 300 (or a similar process) may be performed once at initial configuration of the service chain or at regular intervals (if the direct path coefficients change over time).
  • FIGS. 4 and 5 illustrate an example of the calculation of scaling factors for a service chain.
  • the process 300 begins by receiving (at 305 ) a service chain topology and a set of direct path coefficients.
  • the service chain topology indicates which services forward traffic directly to other services (i.e., indicates direct paths between services in the service chain).
  • the service chain topology information can be received from the services themselves or from other Kubernetes constructs.
  • the direct path coefficients specify, for each direct path in the service chain topology from a first service to a second service, the portion of traffic received at the first service that is forwarded on to the second service.
  • the direct path coefficients may be administrator-specified or be based on recent observation of the service chain traffic. In the latter case, traffic information from the services in the service chain is provided to the scaler, which regularly determines the ratios of traffic forwarded from one service to the next (e.g., on an hourly basis based on the past hour of traffic).
  • the service chain topology is user-specified information that defines connections between services.
  • the service chain topology (referred to as NetworkServiceTopology) is a cluster-scoped construct capable of chaining services from different namespaces together in the cluster, thereby allowing service administrators to operate in their own namespaces while handing the job of service chaining to the infrastructure administrator.
  • the process 300 defines (at 310 ) a graph for the service chain.
  • some embodiments define a directed acyclic graph (DAG) based on the service chain topology (e.g., based on each user-specified connection).
  • each service is represented as a node and each direct path from one service to another is represented as an edge.
  • Each edge from a first node to a second node has an associated coefficient (i.e., the direct path coefficient for the connection represented by that edge) that specifies an estimate of the percentage of requests received at the service represented by the first node that are forwarded to the service represented by the second node (as opposed to being dropped, blocked, or forwarded to a different service).
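Constructing that graph from the user-specified connections, and checking that it is in fact acyclic, might look like this sketch (names illustrative):

```python
def build_dag(connections):
    """Build the service-chain graph from (src, dst, coefficient) triples
    and verify it is acyclic. Returns adjacency:
    service -> [(next_service, coefficient), ...]."""
    dag = {}
    for src, dst, coeff in connections:
        dag.setdefault(src, []).append((dst, coeff))
        dag.setdefault(dst, [])

    # Kahn's algorithm: a topological order covers every node iff no cycle.
    indegree = {node: 0 for node in dag}
    for neighbors in dag.values():
        for nxt, _ in neighbors:
            indegree[nxt] += 1
    ready = [node for node, deg in indegree.items() if deg == 0]
    visited = 0
    while ready:
        node = ready.pop()
        visited += 1
        for nxt, _ in dag[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if visited != len(dag):
        raise ValueError("service chain topology contains a cycle")
    return dag
```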
  • FIG. 4 conceptually illustrates an example of a DAG 400 for a service chain that includes five services.
  • Each service (A-E) is represented by a node in the DAG 400 , with each of the edges having an associated direct path coefficient.
  • service A is expected to forward 70% of its traffic to service B, which is expected to forward 50% of its traffic to service C and 80% of its traffic to service E (meaning that service B is expected to forward at least some of its traffic to both services).
  • Service E is expected, in turn, to forward 50% of its traffic to service C, which is expected to forward only 30% of its traffic to service D.
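As a rough illustration (not code from the patent), the DAG 400 can be represented as a weighted adjacency list, from which the paths between services can be enumerated:

```python
# Weighted adjacency list for the example DAG of FIG. 4: dag[u] maps
# each downstream service v to the direct path coefficient on u -> v.
dag = {
    "A": {"B": 0.7},
    "B": {"C": 0.5, "E": 0.8},
    "E": {"C": 0.5},
    "C": {"D": 0.3},
    "D": {},
}

def paths(dag, src, dst, prefix=None):
    """Yield every path from src to dst as a list of node names."""
    prefix = (prefix or []) + [src]
    if src == dst:
        yield prefix
    for nxt in dag[src]:
        yield from paths(dag, nxt, dst, prefix)

# Service C is reachable over two paths.
assert sorted(paths(dag, "A", "C")) == [["A", "B", "C"],
                                        ["A", "B", "E", "C"]]
```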
  • With the graph defined, the process 300 generates scaling factors for each of the services in the service chain. Different embodiments use different specific calculations to compute these scaling factors, though they reduce to the same computation. For instance, some embodiments traverse the graph starting from the beginning of the service chain and compute scaling factors for nodes that build on the computations for previous nodes in the graph. Other embodiments, as in the process 300, compute the scaling factor for each service separately.
  • the process 300 selects (at 315 ) a service in the service chain.
  • the process 300 begins with the first node in the directed graph and then proceeds to select nodes using a breadth-first traversal of the graph.
  • Other embodiments select the nodes randomly.
  • embodiments that compute the scaling factors for later services in the chain by building on previous computations cannot use a random selection.
  • the process 300 uses (at 320 ) the graph to identify paths from the first service in the service chain to the selected service in the service chain. Assuming a single ingress point for the service chain, when the first service is selected, there is no path discovery required (and no scaling factor computation needed, as the scaling factor is always equal to 1).
  • Service B has a single path (from service A).
  • Service C on the other hand has two paths (one from Service A to Service B to Service C and another from Service A to Service B to Service E to Service C).
  • Service E only has a single path (Service A to Service B to Service E), while Service D also has two paths (one from Service A to Service B to Service C to Service D and another from Service A to Service B to Service E to Service C to Service D).
  • the example graph shown in FIG. 4 includes a single ingress node (Service A) and a single egress node (Service D, which does not forward traffic to any other service).
  • other service chains may have other structures.
  • the process 300 (or other processes) can be expanded to accommodate these multiple entry points to the graph.
  • the scaling factors for the ingress services will depend on whether the front-end load balancer provides traffic metrics indicating the number of requests forwarded to each different ingress service (in which case the scaling factors are all equal to 1) or only the total number of requests provided to the service chain as a whole (in which case each ingress service has its own scaling factor).
  • scaling factors relative to each ingress service are calculated for each service in the service chain, and in real time the number of requests predicted for each service is a weighted sum over the requests provided to each ingress service.
  • Multiple egress services do not require a change to the computations of the scaling factors in some embodiments.
  • the process 300 computes (at 325 ) an estimated percentage of the traffic received at the first service that arrives at the selected service via each path by using the direct path coefficients.
  • the process 300 then sums (at 330 ) these percentages from the various different paths to the selected service in order to compute the scaling factor for the selected service.
  • the estimated percentage of traffic is computed by multiplying all of the direct path coefficients along that path.
  • FIG. 5 conceptually illustrates a table 500 showing these computations for each of the services shown in the graph 400 (relative to x, the ingress traffic).
  • Service A the ingress service
  • the scaling factor is simply equal to 1.
  • Service B there is only a single path represented by a single graph edge having a coefficient of 0.7, so the scaling factor of 0.7 is arrived at by simply using this single coefficient.
  • the computation for Service C is more complicated.
  • Service C two paths are identified; each of these includes the path from Service A to Service B, so the 0.7 coefficient can be factored out of the computation.
  • the path from Service B to Service C has a coefficient of 0.5 while the path from Service B to Service E to Service C has a coefficient of 0.5 multiplied by 0.8.
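Operations 320-330 reduce to summing, over every identified path, the product of the direct path coefficients along that path. A minimal sketch with the FIG. 4 coefficients (the function name is an assumption, not from the patent):

```python
# Adjacency list with the direct path coefficients from FIG. 4.
dag = {
    "A": {"B": 0.7},
    "B": {"C": 0.5, "E": 0.8},
    "E": {"C": 0.5},
    "C": {"D": 0.3},
    "D": {},
}

def scaling_factor(dag, ingress, service):
    """Sum, over all paths from ingress to service, the product of
    the direct path coefficients along each path."""
    if ingress == service:
        return 1.0                       # the ingress service itself
    total = 0.0
    stack = [(ingress, 1.0)]             # (node, coefficient product so far)
    while stack:
        node, product = stack.pop()
        for nxt, edge in dag[node].items():
            if nxt == service:
                total += product * edge  # a path reaching the service
            else:
                stack.append((nxt, product * edge))
    return total

# Service C: 0.7*0.5 + 0.7*0.8*0.5 = 0.35 + 0.28 = 0.63
assert round(scaling_factor(dag, "A", "C"), 2) == 0.63
assert round(scaling_factor(dag, "A", "D"), 3) == 0.189
```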
  • in this example, all of the scaling factors are less than or equal to 1, as is common.
  • however, a service may actually receive more traffic than the front-end load balancer receives if multiple different paths exist to the service.
  • a logging application might receive log data from most or all applications in a cluster such that it receives multiple times the traffic that enters the cluster.
  • the process 300 determines (at 335 ) whether more services remain in the service chain. If this is the case, the process returns to 315 to select the next service. On the other hand, once all of the scaling factors have been computed, the process 300 ends. It should be understood that the process 300 is a conceptual process. Not only might the scaling factor computations be computed slightly differently but some embodiments compute all or some of the scaling factors in parallel rather than using the serial process shown in the figure.
  • the inputs are a DAG represented with an adjacency list (DG), a coefficient for each directed edge in the graph, and a starting point in the graph (S).
  • the first step of the ModifiedBFS algorithm is to define the incoming edge graph (IE) for a node.
  • the algorithm traverses the keys of the adjacency list DG and identifies the next set of vertices for each key. For each such node N, it appends the key to the IE(N) list.
  • the start node S is added to a queue and a modified version of breadth first search is performed.
  • the node is dequeued in V.
  • the neighbors of V are fetched.
  • for each neighbor N, its scaling factor contribution is calculated and the incoming edge from V to N is deleted. Once no other incoming edges to N remain, N is enqueued. This ensures that a node is not enqueued until all of the incoming edges to that node are exhausted, because the scaling factor is only complete once all of the incoming edges have been visited.
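The ModifiedBFS traversal described above might be sketched as follows. This is a reconstruction from the prose (the patent does not provide a listing), and the identifiers are assumptions:

```python
from collections import deque

def modified_bfs(dg, coeff, start):
    """Compute scaling factors, enqueueing a node only once all of its
    incoming edges have been visited (so its factor is complete)."""
    # Incoming-edge lists: ie[n] holds each node with an edge into n.
    ie = {n: [] for n in dg}
    for u in dg:
        for v in dg[u]:
            ie[v].append(u)
    sf = {n: 0.0 for n in dg}
    sf[start] = 1.0
    queue = deque([start])
    while queue:
        v = queue.popleft()              # dequeue the next node V
        for n in dg[v]:                  # fetch the neighbors of V
            sf[n] += sf[v] * coeff[(v, n)]
            ie[n].remove(v)              # the edge V -> N is exhausted
            if not ie[n]:                # no incoming edges remain
                queue.append(n)
    return sf

# The example graph of FIG. 4.
dg = {"A": ["B"], "B": ["C", "E"], "E": ["C"], "C": ["D"], "D": []}
coeff = {("A", "B"): 0.7, ("B", "C"): 0.5, ("B", "E"): 0.8,
         ("E", "C"): 0.5, ("C", "D"): 0.3}
sf = modified_bfs(dg, coeff, "A")
assert round(sf["C"], 2) == 0.63 and round(sf["D"], 3) == 0.189
```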
  • the scaling factors may be computed once or on a regular basis, depending on whether the direct path coefficients are fixed or the scaler receives statistics with which to determine those coefficients.
  • the direct path coefficients are calculated dynamically in some embodiments, either because the user does not have the information to set these values or because the values are not constant and change over time. Such a heuristics-based approach allows for more accurate auto-scaling calculations, especially when these values change over time.
  • some embodiments use traffic metrics from the services. This information may be retrieved from the services themselves or from load balancers interposed between the services in a service chain. Some embodiments send traffic to the front-end load balancer for inter-service load balancing while other embodiments use other load balancers to handle traffic between one service and the next.
  • FS(e, t) represents flow data for a period of time on a given direct path
  • time range (TR) is the period of time over which a sliding window average is calculated
  • total flow (TF) is the sum of all incoming flow data for a node (e.g., a service or other application) in the given time range TR
  • average flow (AvgF) is the average value of the incoming flow data for a node, which is obtained by dividing the total flow TF by the time range TR.
  • the above algorithm determines the value of the average flow over a time period using a sliding window method to determine the direct path coefficient for an edge in the graph.
  • An array Flow at time t stores the flow data for all of the edges and is stored in FlowStore (FS).
  • the algorithm finds the total flow (TF) for an edge by adding all of the flows for the edge from the current time t to (t-TR).
  • the total flow for an edge is calculated by adding the latest flow for the edge and subtracting the value of the flow at the time t-TR.
  • the average flow for an edge is calculated by dividing the total flow for the edge by the time range.
  • FIG. 6 conceptually illustrates a table 600 providing an example of flow data for a particular path between two applications.
  • data points are retrieved every 12 minutes, and a time range value of 5 provides a one-hour time period. This allows, at 9:48, the total flow to be calculated as 540 (100+110+120+90+120), for an average flow value of 108.
  • the new flow value to add is 60 and so the total flow is equal to 500 (540 − 100 + 60), for an average flow value of 100.
  • These flow values can then be used to calculate the direct path coefficient (i.e., by identifying what percentage of traffic sent to a particular service is forwarded to the next service in the service chain), and subsequently to calculate the scaling factors.
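Using the sample data of table 600, the sliding-window bookkeeping can be sketched as below (the class name is hypothetical; the patent describes the method only in prose):

```python
from collections import deque

class FlowWindow:
    """Sliding-window total and average of flow samples for one edge."""
    def __init__(self, time_range):
        self.samples = deque(maxlen=time_range)
        self.total = 0
    def add(self, flow):
        if len(self.samples) == self.samples.maxlen:
            self.total -= self.samples[0]   # drop the flow at time t - TR
        self.samples.append(flow)
        self.total += flow                  # add the latest flow
    @property
    def average(self):
        return self.total / len(self.samples)

w = FlowWindow(time_range=5)
for f in (100, 110, 120, 90, 120):          # samples through 9:48
    w.add(f)
assert w.total == 540 and w.average == 108
w.add(60)                                   # next sample: 540 - 100 + 60
assert w.total == 500 and w.average == 100
```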
  • the scaler module determines in real-time whether each of the services needs to be scaled. Based on the traffic being received at the first service in the service chain (from the load balancer), some embodiments calculate the traffic expected at each service and determine whether the current deployment for that service has the capacity required to handle that traffic without dropping traffic or imposing longer latency. If the expected traffic is larger than current capacity for a given service, the scaler initiates deployment of additional instances (e.g., additional Pods) for that service.
  • FIG. 7 conceptually illustrates a process 700 of some embodiments for determining whether scaling of services in a service chain is required based on traffic expected to arrive at those services and initiating that scaling if needed.
  • the process 700 is performed by a scaler module (e.g., the auto-scaling and deployment module 215 of the scaler shown in FIG. 2 ).
  • the process 700 is performed at regular intervals or as metrics are retrieved from the front-end load balancer.
  • FIGS. 8 - 12 illustrate examples of scaling the services in a service chain.
  • FIG. 8 conceptually illustrates an example of a service chain 800 as deployed.
  • the service chain 800 includes a first service (A) for which two Pods are instantiated, a second service (B) for which three Pods are instantiated, a third service (C) for which two Pods are deployed, and a fourth service (D) for which a single Pod is deployed.
  • This figure also shows the direct path coefficients for each of the connections in the service chain 800 .
  • Service A receives data messages directly from the front-end load balancer 805 then forwards 40% of its traffic to Service B and 50% of its traffic to Service C.
  • Service B sends 80% of its traffic to Service D and Service C sends 90% of its traffic to Service D.
  • the process 700 begins by receiving (at 705 ) traffic measurements at the ingress of a service chain, corresponding to the traffic at the first service in the service chain.
  • the front-end load balancer of some embodiments generates these metrics, which are retrievable by the scaler module (e.g., using API calls in the load balancer schema).
  • the received metrics provide a measure of incoming traffic to the first service. This may be measured as an absolute number of requests, a rate of requests (e.g., requests per second), a latency measure (which can be assumed to scale linearly with the request rate), or other metrics.
  • the scaler is then able to use these received metrics to scale each of the services in the service chain.
  • the scaler determines, for each service, the expected traffic to reach that service (based on the scaling factor) and whether the current deployment for the service will have adequate capacity to handle that expected traffic. If the current deployment is inadequate, the scaler initiates deployment of one or more additional instances; if the current deployment should be reduced, the scaler initiates deletion of one or more existing instances.
  • FIG. 9 conceptually illustrates scaling the deployment of the service chain 800 over two stages 905 - 910 in response to receiving a first traffic measurement.
  • the first stage 905 of FIG. 9 shows the service chain 800 as deployed in FIG. 8
  • the second stage 910 shows this service chain 800 after being scaled up in response to the traffic measurement.
  • FIG. 10 illustrates a table 1000 showing the computations to arrive at the scaling decisions for the example shown in FIG. 9 .
  • each of the services in the service chain 800 has an associated scaling factor (1 for Service A, 0.4 for Service B, 0.5 for Service C, and 0.77 for Service D) computed as described above based on the direct path coefficients shown in FIG. 8 .
  • the process 700 selects (at 710 ) a service in the service chain. Some embodiments select the services by traversing the graph of the service chain topology (as with the scaling factor calculation), while other embodiments select services using different techniques (e.g., randomly). It should be noted that the process 700 is a conceptual process and other processes might perform slightly different operations or perform the operations in a different order. For instance, although the process 700 shows each service being evaluated serially, other embodiments might evaluate whether to scale each of the services in parallel.
  • the process 700 determines (at 715 ) the current capacity of the selected service.
  • each service has a different capacity per instance (e.g., per Pod). This capacity may vary based on (i) the physical resources allocated to each instance for the service and (ii) the type of processing performed by the service.
  • different administrators for the different services set their own configurations for the physical resources allocated to each Pod, which may vary from service to service as a result.
  • different types of data message processing can require different amounts of memory and/or processing power.
  • a service that performs layer 7 processing might require more resources per data message than a service that only performs layer 2 or layer 3 processing (e.g., an L2/L3 firewall service).
  • the current capacity for a given service is the per-instance capacity multiplied by the currently-deployed number of instances.
  • each of the services has a different per-Pod capacity.
  • Each Pod for Service A can handle 1000 requests/second
  • each Pod for Service B can handle 300 requests/second
  • each Pod for Service C can handle 600 requests/second
  • each Pod for Service D can handle 1250 requests/second.
  • Service A can handle 2000 requests/second
  • Service B can handle 900 requests/second
  • Service C can handle 1200 requests/second
  • Service D can handle 1250 requests/second.
  • the process 700 computes (at 720 ) the traffic expected to be received at the selected service based on the scaling factor for the service and the traffic at the first service. Some embodiments compute this expected traffic for the service by simply multiplying the traffic seen at the first service (i.e., from the front-end load balancer) by the scaling factor for the service. As shown in FIG. 10 , the first service (Service A) is expected to receive 3200 requests/second, the traffic flow retrieved from the load balancer.
  • Service B is expected to receive 1280 requests/second (multiplying 3200 by the scaling factor of 0.4)
  • Service C is expected to receive 1600 requests/second (multiplying 3200 by the scaling factor of 0.5)
  • Service D is expected to receive 2464 requests/second (multiplying 3200 by the scaling factor of 0.77).
  • the process 700 computes (at 725 ) the required number of instances for that service.
  • This value can be computed by dividing the expected traffic by the per-instance capacity and applying the ceiling function that rounds a decimal value up to the next integer (i.e., 3.1 and 3.9 are both rounded to 4).
  • Some embodiments also add a safety factor to the value before applying the ceiling function (e.g., adding 0.1 so that 2.95 becomes 3.05, which rounds to 4) in case the traffic to a service increases more than expected (e.g., based on more traffic than expected being forwarded by one or more of the services in the chain).
  • the expected traffic for Service A is 3200 requests/second, which requires 4 Pods when divided by the per-Pod capacity of 1000 requests/second (with 3.2 rounding up to 4).
  • the expected traffic for Service B is 1280 requests/second, which requires 5 Pods when divided by the per-Pod capacity of 300 requests/second (with 4.27 rounding up to 5).
  • the expected traffic for Service C is 1600 requests/second, which requires 3 Pods when divided by the per-Pod capacity of 600 requests/second (with 2.67 rounding up to 3).
  • the expected traffic for Service D is 2464 requests/second, which requires 2 Pods when divided by the per-Pod capacity of 1250 requests/second (with 1.97 rounding up to 2). However, if a small safety factor were applied, 3 Pods would be required for Service D.
  • the process 700 determines (at 730 ) whether to scale the selected service. If the service should be scaled, the process determines (at 735 ) the scaling action for the selected service. As discussed, some embodiments scale the services predictively so that the additional instances are deployed before the existing instances for the service are overloaded by the incoming traffic. In the example shown in the table 1000 of FIG. 10 , Services A and B should each be scaled by adding 2 Pods while Services C and D should each be scaled by adding 1 Pod. Owing to (1) the different per-instance capacities of the different applications (in this case, the services) and (2) the different expected traffic flow reaching each application based on the computed scaling factors, the same change in ingress traffic will in many cases require different numbers of additional instances for different applications.
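The per-service arithmetic behind table 1000 can be reproduced as follows (a sketch using the example's numbers; the function name is an assumption, and the optional safety factor is left at zero to match the table):

```python
from math import ceil

ingress = 3200                # requests/second measured at Service A
services = {
    # name: (scaling factor, per-Pod capacity in requests/s, current Pods)
    "A": (1.0,  1000, 2),
    "B": (0.4,   300, 3),
    "C": (0.5,   600, 2),
    "D": (0.77, 1250, 1),
}

def scaling_actions(ingress, services, safety=0.0):
    """Return the number of Pods to add (negative: remove) per service."""
    actions = {}
    for name, (factor, per_pod, current) in services.items():
        expected = ingress * factor               # expected requests/s here
        required = ceil(expected / per_pod + safety)
        actions[name] = required - current
    return actions

# Matches table 1000: add 2 Pods each to A and B, 1 each to C and D.
assert scaling_actions(ingress, services) == {"A": 2, "B": 2, "C": 1, "D": 1}
```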
  • the process 700 determines (at 740 ) whether any more services in the service chain remain for evaluation. If additional services remain, the process 700 returns to 710 to select the next service and determine whether to scale that service. As mentioned, in some embodiments the various services are evaluated in parallel to determine whether to scale each of the services, rather than in a serial loop as shown in the figure.
  • the process 700 initiates (at 745 ) deployment of additional instances or removal of instances based on the determined scaling actions (i.e., the actions determined at 735 for each service), then ends.
  • the scaler module itself modifies the deployment to change the number of instances for the different services.
  • the scaler module edits one or more configuration objects that define the deployment to change the number of Pods implementing each service in some embodiments.
  • the scaler module provides the deployment edits to another component (e.g., an auto-scaling application in the Kubernetes control plane) that handles the modification to the deployment.
  • the second stage 910 of FIG. 9 shows that, as a result of the calculations made by the scaler module 915 (as shown in the table 1000 ), the scaler has modified the deployment of the service chain 800 .
  • Service A now has 4 Pods deployed
  • Service B now has 5 Pods deployed
  • Service C now has 3 Pods deployed
  • Service D now has 2 Pods deployed.
  • while this figure shows the scaler 915 directly modifying the services, as noted above, in some embodiments the scaler either modifies configuration objects for the services directly or hands off the deployment to a separate component (e.g., an auto-scaling application) that modifies these configuration objects.
  • the scaler module can also determine when to scale down one or more applications. That is, if the number of incoming requests drops, the scaler can perform similar calculations to determine that fewer instances of at least some of the applications are required. For scaling up due to increased traffic, the predictiveness helps avoid increases in latency and dropped packets. For scaling down, the predictiveness is not as crucial but can help free up resources more quickly when those resources are not needed for the current traffic levels.
  • FIG. 11 conceptually illustrates scaling down the deployment of the service chain 800 over two stages 1105 - 1110 in response to receiving a second traffic measurement.
  • the first stage 1105 of FIG. 11 shows the service chain 800 as deployed in the second stage 910 of FIG. 9
  • the second stage 1110 shows this service chain 800 after some of the services have been scaled down in response to the second traffic measurement.
  • FIG. 12 illustrates a table 1200 showing the computations to arrive at the scaling decisions for the example shown in FIG. 11 .
  • the scaling factors and per-Pod capacity for these services are the same as shown in the table 1000 of FIG. 10 .
  • Service A is expected to receive 2500 requests/second (the traffic flow retrieved from the load balancer), a drop in traffic as compared to the measurement shown in FIG. 10 that caused all of the services to be scaled up.
  • Service B is expected to receive 1000 requests/second (multiplying 2500 by the scaling factor of 0.4)
  • Service C is expected to receive 1250 requests/second (multiplying 2500 by the scaling factor of 0.5)
  • Service D is expected to receive 1925 requests/second (multiplying 2500 by the scaling factor of 0.77).
  • the expected traffic for Service A of 2500 requests/second requires 3 Pods when divided by the per-Pod capacity of 1000 requests/second (with 2.5 rounding up to 3).
  • the expected traffic for Service B of 1000 requests/second requires 4 Pods when divided by the per-Pod capacity of 300 requests/second (with 3.33 rounding up to 4).
  • the expected traffic for Service C of 1250 requests/second requires 3 Pods when divided by the per-Pod capacity of 600 requests/second (with 2.08 rounding up to 3).
  • the expected traffic for Service D of 1925 requests/second requires 2 Pods when divided by the per-Pod capacity of 1250 requests/second (with 1.54 rounding up to 2).
  • the scaler 915 determines that one Pod should be removed from the deployments of Services A and B, but the deployments of Services C and D do not require updating.
  • the second stage 1110 of FIG. 11 shows that, as a result of the calculations made by the scaler module 915 (as shown in the table 1200 ), the scaler 915 has modified the deployment of the service chain 800 .
  • Service A now has only 3 Pods deployed
  • Service B now has 4 Pods deployed
  • Service C and Service D are unchanged.
  • FIG. 13 conceptually illustrates an electronic system 1300 with which some embodiments of the invention are implemented.
  • the electronic system 1300 may be a computer (e.g., a desktop computer, personal computer, tablet computer, server computer, mainframe, blade computer, etc.), phone, PDA, or any other sort of electronic device.
  • Such an electronic system includes various types of computer readable media and interfaces for various other types of computer readable media.
  • Electronic system 1300 includes a bus 1305 , processing unit(s) 1310 , a system memory 1325 , a read-only memory 1330 , a permanent storage device 1335 , input devices 1340 , and output devices 1345 .
  • the bus 1305 collectively represents all system, peripheral, and chipset buses that communicatively connect the numerous internal devices of the electronic system 1300 .
  • the bus 1305 communicatively connects the processing unit(s) 1310 with the read-only memory 1330 , the system memory 1325 , and the permanent storage device 1335 .
  • the processing unit(s) 1310 retrieve instructions to execute and data to process in order to execute the processes of the invention.
  • the processing unit(s) may be a single processor or a multi-core processor in different embodiments.
  • the read-only-memory (ROM) 1330 stores static data and instructions that are needed by the processing unit(s) 1310 and other modules of the electronic system.
  • the permanent storage device 1335 is a read-and-write memory device. This device is a non-volatile memory unit that stores instructions and data even when the electronic system 1300 is off. Some embodiments of the invention use a mass-storage device (such as a magnetic or optical disk and its corresponding disk drive) as the permanent storage device 1335 .
  • the system memory 1325 is a read-and-write memory device. However, unlike storage device 1335 , the system memory is a volatile read-and-write memory, such as random-access memory.
  • the system memory stores some of the instructions and data that the processor needs at runtime.
  • the invention's processes are stored in the system memory 1325 , the permanent storage device 1335 , and/or the read-only memory 1330 . From these various memory units, the processing unit(s) 1310 retrieve instructions to execute and data to process in order to execute the processes of some embodiments.
  • the bus 1305 also connects to the input and output devices 1340 and 1345 .
  • the input devices enable the user to communicate information and select commands to the electronic system.
  • the input devices 1340 include alphanumeric keyboards and pointing devices (also called “cursor control devices”).
  • the output devices 1345 display images generated by the electronic system.
  • the output devices include printers and display devices, such as cathode ray tubes (CRT) or liquid crystal displays (LCD). Some embodiments include devices such as a touchscreen that function as both input and output devices.
  • bus 1305 also couples electronic system 1300 to a network 1365 through a network adapter (not shown).
  • the computer can be a part of a network of computers (such as a local area network (“LAN”), a wide area network (“WAN”), or an Intranet), or a network of networks, such as the Internet. Any or all components of electronic system 1300 may be used in conjunction with the invention.
  • Some embodiments include electronic components, such as microprocessors, storage and memory that store computer program instructions in a machine-readable or computer-readable medium (alternatively referred to as computer-readable storage media, machine-readable media, or machine-readable storage media).
  • computer-readable media include RAM, ROM, read-only compact discs (CD-ROM), recordable compact discs (CD-R), rewritable compact discs (CD-RW), read-only digital versatile discs (e.g., DVD-ROM, dual-layer DVD-ROM), a variety of recordable/rewritable DVDs (e.g., DVD-RAM, DVD−RW, DVD+RW, etc.), flash memory (e.g., SD cards, mini-SD cards, micro-SD cards, etc.), magnetic and/or solid state hard drives, read-only and recordable Blu-Ray® discs, ultra-density optical discs, any other optical or magnetic media, and floppy disks.
  • the computer-readable media may store a computer program that is executable by at least one processing unit and includes sets of instructions for performing various operations.
  • Examples of computer programs or computer code include machine code, such as is produced by a compiler, and files including higher-level code that are executed by a computer, an electronic component, or a microprocessor using an interpreter.
  • some embodiments are performed by one or more integrated circuits, such as application specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs). In some embodiments, such integrated circuits execute instructions that are stored on the circuit itself.
  • the terms “computer”, “server”, “processor”, and “memory” all refer to electronic or other technological devices. These terms exclude people or groups of people.
  • display or displaying means displaying on an electronic device.
  • the terms “computer readable medium,” “computer readable media,” and “machine readable medium” are entirely restricted to tangible, physical objects that store information in a form that is readable by a computer. These terms exclude any wireless signals, wired download signals, and any other ephemeral signals.
  • data compute nodes (DCNs) or addressable nodes may include non-virtualized physical hosts, virtual machines, containers that run on top of a host operating system without the need for a hypervisor or separate operating system, and hypervisor kernel network interface modules.
  • VMs in some embodiments, operate with their own guest operating systems on a host using resources of the host virtualized by virtualization software (e.g., a hypervisor, virtual machine monitor, etc.).
  • the tenant (i.e., the owner of the VM) can choose which applications to operate on top of the guest operating system.
  • Some containers are constructs that run on top of a host operating system without the need for a hypervisor or separate guest operating system.
  • the host operating system uses name spaces to isolate the containers from each other and therefore provides operating-system level segregation of the different groups of applications that operate within different containers.
  • This segregation is akin to the VM segregation that is offered in hypervisor-virtualized environments that virtualize system hardware, and thus can be viewed as a form of virtualization that isolates different groups of applications that operate in different containers.
  • Such containers are more lightweight than VMs.
  • a hypervisor kernel network interface module, in some embodiments, is a non-VM DCN that includes a network stack with a hypervisor kernel network interface and receive/transmit threads.
  • an example of a hypervisor kernel network interface module is the vmknic module that is part of the ESXi™ hypervisor of VMware, Inc.
  • while the specification refers to VMs, the examples given could be any type of DCNs, including physical hosts, VMs, non-VM containers, and hypervisor kernel network interface modules.
  • the example networks could include combinations of different types of DCNs in some embodiments.
  • FIGS. 3 and 7 conceptually illustrate processes. The specific operations of these processes may not be performed in the exact order shown and described. The specific operations may not be performed in one continuous series of operations, and different specific operations may be performed in different embodiments. Furthermore, the process could be implemented using several sub-processes, or as part of a larger macro process. Thus, one of ordinary skill in the art would understand that the invention is not to be limited by the foregoing illustrative details, but rather is to be defined by the appended claims.

Abstract

Some embodiments provide a method for scaling a service chain that includes multiple services, each of which is provided by one or more instances of the service. The method identifies that a first service in the service chain has received a number of requests. For each service in the service chain, the method (i) identifies a scaling factor that estimates a portion of requests received at the first service that will be subsequently received at the service and (ii) deploys a number of additional instances of the service based on the identified scaling factor for the service and the number of requests received at the first service.

Description

    BACKGROUND
  • The increase in traffic diversity and accelerated capacity demand in mobile networks have pushed the design of innovative architectural solutions and cost-effective paradigms for 5G evolution. Network Functions Virtualization (NFV) is an emerging trend in networking that involves migration of network functions (NFs) into virtualized environments, which leads to reduced capital investment. Traditionally, NFs are embedded on dedicated hardware devices (middleboxes or network appliances), but service providers and operators now decouple NFs from their underlying hardware and run them on commodity servers. This has given birth to NFV technology, which converts NFs into virtualized network functions (VNFs) hosted in virtual machines or containers. Network policies often require these VNFs to be stitched together as service chains to deliver various services or network functionality. These service chains define a sequence of services (network functions) through which traffic is steered.
  • Microservices allow parts of applications (or different services in a service chain) to react independently to user input. Kubernetes is widely considered the most popular platform for orchestrating microservices. Within Kubernetes, auto-scaling is a mechanism by which applications can be scaled in or scaled out based on triggers. These triggers are typically based on observing an individual service and scaling that service as necessary. However, such reactive measures can cause issues for latency-sensitive workloads, such as those implemented in a 5G service chain. As such, better techniques for auto-scaling these workloads would be useful.
  • BRIEF SUMMARY
  • Some embodiments provide a method for pre-emptively scaling resources allocated to one application based on identifying an amount of traffic received at another, related application. The method identifies a first number of requests received at a first application and, based on this first number of requests, determines that a second application that processes at least a subset of the requests after processing by the first application requires additional resources to handle a second number of requests that will be received at the second application. The method increases the number of resources available to the second application prior to the second application receiving this second number of requests, in order to avoid processing delays and/or dropped requests at the second application.
  • In some embodiments, the first and second applications are services in a service chain (e.g., for a 5G or other telecommunications network) that includes at least two services applied to the requests (e.g., audio and/or video calls). For each respective service in the service chain, the method uses a respective scaling factor that estimates a percentage of the requests received at the first service that will subsequently be received at the respective service in order to deploy additional resources to the respective service.
  • The services of the service chain, in some embodiments, are implemented as virtualized network functions. For instance, some embodiments deploy each service as one or more Pods in a Kubernetes cluster for the service chain. Each Pod is allocated a particular amount of resources (e.g., memory and processing capability) that enable the Pod to perform its respective service on a particular number of requests in a given time period (e.g., requests/second). In this environment, a front-end load balancer is configured to receive the requests and measure the number of requests received in real-time or near-real-time. Specifically, a data plane component of the load balancer receives and load balances the traffic at least among instances of the first service in the service chain in addition to providing information about the processed traffic to a control plane component of the load balancer. The control plane component uses this traffic information to measure the number of requests and provide that information to a scaler module that also operates (e.g., as a Pod or set of Pods) in the Kubernetes cluster.
  • The scaler module, in some embodiments, (i) computes the scaling factors for each service in the service chain and (ii) handles auto-scaling the services based on these scaling factors, traffic measurements from the load balancer (e.g., a number of requests forwarded by the load balancer to the first service), and data indicating the processing capabilities of each Pod for each service (enabling a judgment as to when the number of Pods should be increased or decreased for a given service). The scaler module computes the scaling factors either once at initial setup of the cluster or on a regular basis (e.g., depending on whether the inputs to the scaling factors use predefined or real-world data).
  • To compute the scaling factors, the scaler module of some embodiments generates a graph (e.g., a directed acyclic graph) of the service chain. In this graph, each service is represented as a node and each direct path from one service to another is represented as an edge. Each edge from a first node to a second node has an associated coefficient that specifies an estimate of the percentage of requests received at the service represented by the first node that are forwarded to the service represented by the second node (as opposed to being dropped, blocked, or forwarded to a different service). These coefficients may be specified by a user (e.g., a network administrator) or based on real-world measurement of the number of requests received at each of the services (e.g., over a given time period).
  • For each service, the scaler module uses the graph to identify each path through the service chain from the first service in the service chain to that service. For each such path, the scaler module multiplies the coefficients along the path to compute a factor for the path. The scaling factor for a given service is then the sum of the computed factors for each of the paths from the first service to that service, representing the estimated percentage of the requests received by the first service that will need to be processed by that service (the scaling factor for the first service will be 1). Other embodiments use a different but equivalent computation that performs the component calculations in a different order to reduce the number of multiplications.
  • In real-time, as the load balancer provides measurements to the scaler module, the scaler module determines whether each of the services needs to be scaled (e.g., whether additional Pods should be instantiated for each service). Specifically, for one or more metrics (e.g., total requests, requests per second, latency (which is correlated with the rate of incoming traffic), etc.), the capacity of each Pod is specified for each service. A current value for each metric (based on the metrics from the load balancer and the scaling factor for the service) is divided by the Pod capacity for a given service to determine the number of Pods that will be required for the service. If the actual number of Pods is less than the required number of Pods, then the scaler module manages the deployment of additional Pods for the service. In this manner, if a large increase in traffic is detected at the load balancer, all of the services can be scaled up to meet this demand prior to the receipt of all of those requests at the services.
  • The preceding Summary is intended to serve as a brief introduction to some embodiments of the invention. It is not meant to be an introduction or overview of all inventive subject matter disclosed in this document. The Detailed Description that follows and the Drawings that are referred to in the Detailed Description will further describe the embodiments described in the Summary as well as other embodiments. Accordingly, to understand all the embodiments described by this document, a full review of the Summary, Detailed Description and the Drawings is needed. Moreover, the claimed subject matters are not to be limited by the illustrative details in the Summary, Detailed Description and the Drawings, but rather are to be defined by the appended claims, because the claimed subject matters can be embodied in other specific forms without departing from the spirit of the subject matters.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The novel features of the invention are set forth in the appended claims. However, for purpose of explanation, several embodiments of the invention are set forth in the following figures.
  • FIG. 1 conceptually illustrates the architecture of a service chain deployment of some embodiments.
  • FIG. 2 conceptually illustrates the architecture of the scaler module of some embodiments.
  • FIG. 3 conceptually illustrates a process of some embodiments for computing scaling factors for a set of services in a service chain.
  • FIG. 4 conceptually illustrates an example of a directed acyclic graph for a service chain that includes five services.
  • FIG. 5 illustrates a table showing scaling factor computations for each of the services shown in the graph of FIG. 4 .
  • FIG. 6 illustrates a table providing an example of flow data over a time period for a particular path between two applications.
  • FIG. 7 conceptually illustrates a process of some embodiments for determining whether scaling of services in a service chain is required based on traffic expected to arrive at those services and initiating that scaling if needed.
  • FIG. 8 conceptually illustrates an example of a service chain as deployed.
  • FIG. 9 conceptually illustrates scaling up the deployment of the service chain of FIG. 8 in response to receiving a first traffic measurement.
  • FIG. 10 illustrates a table showing the computations to arrive at the scaling decisions for the example shown in FIG. 9 .
  • FIG. 11 conceptually illustrates scaling down the deployment of the service chain of FIG. 8 in response to receiving a second traffic measurement.
  • FIG. 12 illustrates a table showing the computations to arrive at the scaling decisions for the example shown in FIG. 11 .
  • FIG. 13 conceptually illustrates an electronic system with which some embodiments of the invention are implemented.
  • DETAILED DESCRIPTION
  • In the following detailed description of the invention, numerous details, examples, and embodiments of the invention are set forth and described. However, it will be clear and apparent to one skilled in the art that the invention is not limited to the embodiments set forth and that the invention may be practiced without some of the specific details and examples discussed.
  • Some embodiments provide a method for pre-emptively scaling resources allocated to one application based on identifying an amount of traffic received at another, related application. The method identifies a first number of requests received at a first application and, based on this first number of requests, determines that a second application (which processes at least a subset of the requests after processing by the first application) requires additional resources to handle a second number of requests that will be received at the second application. The method increases the number of resources available to the second application prior to the second application receiving this second number of requests, in order to avoid processing delays and/or dropped requests at the second application.
  • In some embodiments, the first and second applications are services in a service chain (e.g., for a 5G or other telecommunications network) that includes at least two services applied to the requests (e.g., audio and/or video calls). For each respective service in the service chain, the method uses a respective scaling factor that estimates a percentage of the requests received at the first service that will subsequently be received at the respective service in order to deploy additional resources to the respective service.
  • The services of the service chain, in some embodiments, are implemented as virtualized network functions. For instance, some embodiments deploy each service as one or more Pods in a Kubernetes cluster for the service chain. FIG. 1 conceptually illustrates the architecture of such a service chain deployment 100 of some embodiments. As shown, the deployment 100 includes a Kubernetes cluster 105 as well as a front-end load balancer. The Kubernetes cluster 105 includes an ingress controller 120 for the load balancer, a Kubernetes ingress object 125, a scaler module 130, and the service chain 135 (which, as described below, includes multiple services).
  • The front-end load balancer, in some embodiments, includes both a data plane 110 and a controller 115. In some embodiments, front-end load balancers such as Avi Vantage can be configured to define virtual services as a front-end for Kubernetes-based applications, generally via ingress controllers. Each virtual service maps to back-end server pools, serviced by application pods in the Kubernetes cluster. In some embodiments, these virtual services are the ingress point for incoming traffic when used at the edge of the cluster, as in the deployment 100. That is, all traffic sent to the cluster passes initially through the front-end load balancer. In the example of a 5G or other telecommunication network, this traffic may include audio and/or video calls, potentially in addition to other types of traffic.
  • In some embodiments, an initial data message or set of data messages for a call (which can be referred to as a request) is sent to the service chain via the front-end load balancer (e.g., for the service chain to perform authentication and other tasks for the call), while subsequent traffic (carrying audio and/or video data) does not need to be processed through the service chain. In other embodiments, all traffic for the network passes through the front-end load balancer and service chain.
  • In some embodiments, the load balancer data plane 110 receives the incoming requests and load balances this ingressing traffic (possibly in addition to providing additional services, such as web application firewall). The load balancer data plane 110 may be implemented by a single appliance, a centralized cluster of appliances or virtualized data compute nodes (e.g., bare metal computers, virtual machines, containers, etc.), or a distributed set of appliances or virtualized data compute nodes. In some embodiments, the load balancer data plane 110 may load balance traffic between different Pods implementing the first service in the service chain and then forward the traffic to the selected Pods. In other embodiments, the load balancer data plane 110 performs additional service chaining features, such as defining a path through the service chain for an incoming data message (e.g., by selecting Pods for each of the services and embedding this selection in a header of that data message). In addition, while forwarding the incoming requests to the service chain 135 in the cluster 105, the load balancer data plane 110 gathers information about the incoming requests and provides this traffic information to the load balancer controller 115.
  • The load balancer controller 115 is a centralized control plane that manages one or more instances of the load balancer data plane 110. The load balancer controller 115 defines traffic rules (e.g., based on administrator input) and configures the data plane 110 to enforce these traffic rules. In addition, the controller 115 gathers various traffic metrics from the data plane 110 (and aggregates these metrics in the case of a distributed data plane). The controller 115 also makes these aggregated metrics accessible (e.g., to an administrator, the scaler module 130, etc.). In some embodiments, the metrics are accessible (to administrators, other modules, etc.) via application programming interfaces (APIs) such as representational state transfer (REST) APIs.
  • The ingress controller 120 for the load balancer, in some embodiments, handles conversion of data between the Kubernetes cluster and the front-end load balancer. In some embodiments, the ingress controller 120 is implemented on one or more Pods in the Kubernetes cluster 105. This ingress controller 120 listens to a Kubernetes API server and translates Kubernetes data (e.g., the ingress object 125, service objects, etc.) into the data model used by the front-end load balancer. The ingress controller 120 communicates the translated information to the load balancer controller 115 via API calls in some embodiments to automate the implementation of this configuration by the load balancer data plane 110.
  • The ingress object 125 is a Kubernetes object that defines external access to the Kubernetes cluster 105. The ingress object 125 exposes routes to services within the cluster (e.g., the first service in the service chain 135) and can define how load balancing should be performed. As noted, the ingress controller 120 is responsible for translating this ingress object into configuration data to provide to the front-end load balancer controller 115.
  • The scaler module 130, in some embodiments, also operates on one or more Pods within the Kubernetes cluster 105. The scaler module 130 is responsible for (i) computing the scaling factors for each service in the service chain 135 and (ii) initiating pre-emptive auto-scaling of the services based on these computed scaling factors, traffic measurements from the front-end load balancer controller 115, and data indicating the processing capabilities of each service. The scaling factor for each respective service, as mentioned, estimates the percentage of the traffic received at the front-end load balancer (and thus received at the first service in the service chain) that will be subsequently received at the respective service. Depending on whether input coefficients to the scaling factor computation use user-defined or real-world observation data, the scaling factors can either be computed once (at initial setup of the cluster) or on a regular basis. The pre-emptive auto-scaling decisions output by the scaler module 130 specify, in some embodiments, when the number of Pods should be increased or decreased for a given service in the service chain. The operation of the scaler module 130 is described in additional detail below by reference to FIG. 2.
  • The service chain 135 is a set of services deployed in a specific topology. In the Kubernetes context of some embodiments, each of these services is implemented as a “micro-service” and is deployed as a set of one or more Pods. However, it should be understood that in other embodiments the service chain may be implemented as a set of virtual machines (VMs) or other data compute nodes (DCNs) rather than as Pods in a Kubernetes cluster. In this case, a front-end load balancer can still be configured to measure incoming traffic and provide this data to a scaler module (executing, e.g., on a different VM) that performs similar auto-scaling operations outside of the Kubernetes context.
  • In this example, the service chain 135 includes three services 140-150. The first service 140 receives traffic directly from the load balancer data plane 110 (potentially via Kubernetes ingress modules) and sends portions of this traffic (after performing processing) to both the second service 145 and the third service 150. The second service 145 also sends a portion of its traffic (after it has performed its processing) to the third service 150. The first service 140 is implemented using two Pods, the second service 145 is implemented using three Pods, and the third service 150 is implemented using a single Pod. Each Pod is allocated a particular amount of resources (e.g., memory and processing capability) that enable the Pod to perform its respective service on a particular number of requests in a given time period (e.g., requests/second). This per-Pod capacity can vary from one service to the next based on the physical resources allocated to each Pod for the service as well as the resources required to process an individual request by the service. In various embodiments, the services can include firewalls, forwarding elements (e.g., routers and/or switches), load balancers, VPN edges, intrusion detection and prevention services, logging functions, network address translation (NAT) functions, telecommunications network specific gateway functions, and other services, depending on the needs of the network.
  • FIG. 2 conceptually illustrates a more detailed view of the architecture of the scaler 200 of some embodiments (e.g., the scaler module 130). As shown, the scaler 200 includes a modeler 205, a metrics receiver 210, and an auto-scaling and deployment module 215. The modeler 205 receives the service chain topology as well as direct path coefficients and uses this information to (i) define a graph representing the service chain and (ii) compute the scaling factors for each of the services in the service chain (i.e., the scaling factors for estimating the percentage of traffic received at the first service that will be subsequently received at the other services). The modeler stores (e.g., in memory) the graph and scaling factors 220 for use by the auto-scaling and deployment module 215.
  • In some embodiments, the modeler 205 receives the service chain topology information from the services themselves or from other Kubernetes constructs. This topology indicates the direct paths between services in the service chain. The direct path coefficients specify, for each direct path in the service chain topology from a first service to a second service, the portion of traffic received at the first service that is forwarded on to the second service. For the example service chain 135 shown in FIG. 1 , the paths from the first service 140 to the second service 145, from the first service 140 to the third service 150, and from the second service 145 to the third service 150 each have their own associated direct path coefficients. In different embodiments, the direct path coefficients may be administrator-specified or be based on recent observation of the service chain traffic. In the latter case, traffic information from the services in the service chain is provided to the modeler 205, which regularly determines the ratios of traffic forwarded from one service to the next. The operations of the modeler 205 to define the service chain graph and compute the scaling factors are described further below by reference to FIG. 3 .
  • The metrics receiver 210 receives traffic metrics, including those indicating the amount of requests received at the first service in the service chain. In some embodiments, the metrics receiver 210 receives API schema information for the front-end load balancer from the ingress controller and uses this API information to retrieve the traffic metrics from the load balancer controller via API calls. The specific metrics received can include a total number of requests, requests per unit time (e.g., requests per second or millisecond), etc. As these metrics are retrieved, the metrics receiver 210 provides the metrics to the auto-scaling and deployment module 215.
  • The auto-scaling and deployment module 215 uses the scaling factors computed by the modeler 205 to determine, in real-time, whether any of the services in the service chain need to be scaled (e.g., either instantiation of additional Pods or removal of Pods) based on the traffic metrics. The capacity of each Pod is specified for each service (the capacity can vary between services) for one or more metrics (e.g., requests per unit time) and provided to the auto-scaling and deployment module 215 (e.g., as an administrator-provided variable or based on observation). The current value for this metric (as received from the load balancer controller and multiplied by the scaling factor for a given service) is divided by the Pod capacity for the service to determine the number of Pods that will be required for the service. If the actual number of Pods is less than the required number of Pods, then the auto-scaling and deployment module 215 manages the deployment of additional Pods for the service. In this manner, if a large increase in traffic is detected at the load balancer, all of the services can be scaled up to meet this demand prior to the receipt of all of those requests at the services. On the other hand, if the actual number of Pods deployed is greater than the required amount for the service, the auto-scaling and deployment module 215 manages the deletion of one or more Pods for the service. In different embodiments, the auto-scaling and deployment module 215 either handles the deployment/deletion operations directly or provides the necessary instructions to a Kubernetes control plane module that handles these operations for the cluster. The operations of the auto-scaling and deployment module 215 to predictively auto-scale the services of a service chain will be described in detail below by reference to FIG. 7 .
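The Pod-count check described above amounts to a ceiling division. The following is a minimal Python sketch of that logic; the function names and the floor of one Pod per service are illustrative assumptions, not part of the described embodiments:

```python
import math

def required_pods(requests_at_first, scaling_factor, pod_capacity):
    """Pods needed by a service: the metric measured at the first service,
    scaled by this service's scaling factor, divided by the per-Pod
    capacity and rounded up. A floor of one Pod (an assumption here)
    keeps the service deployed even with no traffic."""
    expected_load = requests_at_first * scaling_factor
    return max(1, math.ceil(expected_load / pod_capacity))

def scaling_decision(current_pods, requests_at_first, scaling_factor, pod_capacity):
    """Positive result: deploy that many additional Pods; negative: delete Pods."""
    return required_pods(requests_at_first, scaling_factor, pod_capacity) - current_pods

# With 1,000 requests/second measured at the load balancer, a scaling
# factor of 0.63, and Pods that each handle 100 requests/second, a
# service needs ceil(630 / 100) = 7 Pods; if 3 are deployed, 4 are added.
```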
  • FIG. 3 conceptually illustrates a process 300 of some embodiments for computing scaling factors for a set of services in a service chain. In some embodiments, the process 300 is performed by a scaler module (e.g., the modeler 205 of the scaler module 200 shown in FIG. 2 ). In different embodiments, this process 300 (or a similar process) may be performed once at initial configuration of the service chain or at regular intervals (if the direct path coefficients change over time). The process 300 will be described by reference to FIGS. 4 and 5 , which illustrate an example of the calculation of scaling factors for a service chain.
  • As shown, the process 300 begins by receiving (at 305) a service chain topology and a set of direct path coefficients. The service chain topology indicates which services forward traffic directly to other services (i.e., indicates direct paths between services in the service chain). As noted above, the service chain topology information can be received from the services themselves or from other Kubernetes constructs. The direct path coefficients specify, for each direct path in the service chain topology from a first service to a second service, the portion of traffic received at the first service that is forwarded on to the second service. In different embodiments, the direct path coefficients may be administrator-specified or be based on recent observation of the service chain traffic. In the latter case, traffic information from the services in the service chain is provided to the scaler, which regularly determines the ratios of traffic forwarded from one service to the next (e.g., on an hourly basis based on the past hour of traffic).
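When the direct path coefficients are derived from observation rather than specified by an administrator, each coefficient reduces to a ratio over the measurement window. A minimal sketch follows, assuming a simple per-service counter layout (the data shapes and names are hypothetical):

```python
def estimate_coefficients(observed):
    """Estimate each direct path coefficient as the fraction of requests a
    service received during the window that it forwarded along that edge.

    `observed` maps service -> (requests_received, {next_service: forwarded}).
    """
    coefficients = {}
    for svc, (received, forwarded) in observed.items():
        for nxt, count in forwarded.items():
            # Guard against an idle service to avoid division by zero.
            coefficients[(svc, nxt)] = count / received if received else 0.0
    return coefficients

# Hypothetical past hour: the first service received 10,000 requests and
# forwarded 7,000 to the second service and 2,000 to the third.
observed = {"service1": (10_000, {"service2": 7_000, "service3": 2_000})}
# estimate_coefficients(observed)
#   -> {("service1", "service2"): 0.7, ("service1", "service3"): 0.2}
```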
  • The service chain topology, in some embodiments, is user-specified information that defines connections between services. Specifically, in the Kubernetes context, the service chain topology (referred to as NetworkServiceTopology) is a cluster-scoped construct capable of chaining services from different namespaces together in the cluster, thereby allowing service administrators to operate in their own namespaces while handing the job of service chaining to the infrastructure administrator. The following provides an example declaration for a connection between a first service (serviceA) in red namespace and a second service (serviceB) in blue namespace, with a direct path coefficient of 0.7:
  • kind: NetworkServiceTopology
    metadata:
      name: conn-svcA-svcB
    spec:
      sourceVertex: red/serviceA
      destinationVertex: blue/serviceB
      edgeWeight: 0.7
  • Next, the process 300 defines (at 310) a graph for the service chain. Specifically, some embodiments define a directed acyclic graph (DAG) based on the service chain topology (e.g., based on each user-specified connection). In this graph, each service is represented as a node and each direct path from one service to another is represented as an edge. Each edge from a first node to a second node has an associated coefficient (i.e., the direct path coefficient for the connection represented by that edge) that specifies an estimate of the percentage of requests received at the service represented by the first node that are forwarded to the service represented by the second node (as opposed to being dropped, blocked, or forwarded to a different service).
  • FIG. 4 conceptually illustrates an example of a DAG 400 for a service chain that includes five services. Each service (A-E) is represented by a node in the DAG 400, with each of the edges having an associated direct path coefficient. As shown, service A is expected to forward 70% of its traffic to service B, which is expected to forward 50% of its traffic to service C and 80% of its traffic to service E (meaning that service B is expected to forward at least some of its traffic to both services). Service E is expected, in turn, to forward 50% of its traffic to service C, which is expected to forward only 30% of its traffic to service D.
  • With the graph defined, the process generates scaling factors for each of the services in the service chain. Different embodiments use different specific calculations to compute these scaling factors, though they reduce to the same computation. For instance, some embodiments traverse through the graph starting from the beginning of the service chain and compute scaling factors for nodes that build on the computations for previous nodes in the graph. Other embodiments, as in the process 300, compute the scaling factor for each service separately.
  • As shown, the process 300 selects (at 315) a service in the service chain. In some embodiments, the process 300 begins with the first node in the directed graph and then proceeds to select nodes using a breadth-first traversal of the graph. Other embodiments select the nodes randomly. However, embodiments that compute the scaling factors for later services in the chain by building on previous computations cannot use a random selection.
  • The process 300 uses (at 320) the graph to identify paths from the first service in the service chain to the selected service in the service chain. Assuming a single ingress point for the service chain, when the first service is selected, there is no path discovery required (and no scaling factor computation needed, as the scaling factor is always equal to 1). In the example graph 400 shown in FIG. 4, Service B has a single path (from Service A). Service C, on the other hand, has two paths (one from Service A to Service B to Service C and another from Service A to Service B to Service E to Service C). Service E only has a single path (Service A to Service B to Service E), while Service D also has two paths (one from Service A to Service B to Service C to Service D and another from Service A to Service B to Service E to Service C to Service D).
  • It should be noted that the example graph shown in FIG. 4 includes a single ingress node (Service A) and a single egress node (Service D, which does not forward traffic to any other service). However, other service chains may have other structures. In the case of multiple ingress services, the process 300 (or other processes) can be expanded to accommodate these multiple entry points to the graph. The scaling factors for the ingress services will depend on whether the front-end load balancer provides traffic metrics indicating the number of requests forwarded to each different ingress service (in which case the scaling factors are all equal to 1) or the total number of requests provided to the service chain as a whole (in which case each ingress service has its own scaling factor). In addition, in the former case, scaling factors relative to each ingress service are calculated for each service in the service chain and, in real-time, the number of requests predicted for each service is a weighted sum over the requests being provided to each ingress service. Multiple egress services do not require a change to the computations of the scaling factors in some embodiments.
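For the multiple-ingress case with per-ingress traffic metrics, the weighted sum mentioned above can be sketched as follows (the function and variable names are illustrative):

```python
def predicted_requests(requests_per_ingress, factors_per_ingress):
    """Predicted load on one service when the chain has several ingress
    services: for each ingress, multiply the requests arriving there by
    this service's scaling factor relative to that ingress, then sum."""
    return sum(requests_per_ingress[ingress] * factors_per_ingress[ingress]
               for ingress in requests_per_ingress)

# A service reachable from two (hypothetical) ingress services:
requests = {"ingressA": 1000, "ingressB": 400}
factors = {"ingressA": 0.63, "ingressB": 0.5}
# predicted_requests(requests, factors): 1000*0.63 + 400*0.5 = 830
```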
  • Next, for the selected service, the process 300 computes (at 325) an estimated percentage of the traffic received at the first service that arrives at the selected service via each path by using the direct path coefficients. The process 300 then sums (at 330) these percentages from the various different paths to the selected service in order to compute the scaling factor for the selected service. For a single path, the estimated percentage of traffic is computed by multiplying all of the direct path coefficients along that path.
  • FIG. 5 conceptually illustrates a table 500 showing these computations for each of the services shown in the graph 400 (relative to x, the ingress traffic). For Service A (the ingress service), the scaling factor is simply equal to 1. For Service B, there is only a single path represented by a single graph edge having a coefficient of 0.7, so the scaling factor of 0.7 is arrived at by simply using this single coefficient. The computation for Service C is more complicated. For Service C, two paths are identified; each of these includes the path from Service A to Service B, so the 0.7 coefficient can be factored out of the computation. The path from Service B to Service C has a coefficient of 0.5 while the path from Service B to Service E to Service C has a coefficient of 0.5 multiplied by 0.8. This results in a scaling factor of 0.63 for Service C, as shown in the table. The scaling factor for Service D is simply that of Service C multiplied by the coefficient 0.3 for the path from Service C to Service D, equal to 0.189. Finally, there is a single path to reach Service E and the multiplication of the coefficients 0.7 and 0.8 results in a scaling factor of 0.56.
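The path-enumeration computation of process 300 can be reproduced for the graph 400 with a short sketch; the adjacency-list and per-edge weight representations below are assumptions chosen for illustration:

```python
def paths(adjacency, src, dst, prefix=None):
    """Yield every path in the DAG from src to dst as a list of nodes."""
    prefix = (prefix or []) + [src]
    if src == dst:
        yield prefix
        return
    for nxt in adjacency.get(src, []):
        yield from paths(adjacency, nxt, dst, prefix)

def scaling_factor(adjacency, weights, first, service):
    """Sum, over every path from the chain's first service to `service`,
    of the product of the direct path coefficients along the path."""
    total = 0.0
    for path in paths(adjacency, first, service):
        product = 1.0
        for v, n in zip(path, path[1:]):
            product *= weights[(v, n)]
        total += product
    return total

# The DAG of FIG. 4, with its direct path coefficients.
adjacency = {"A": ["B"], "B": ["C", "E"], "E": ["C"], "C": ["D"]}
weights = {("A", "B"): 0.7, ("B", "C"): 0.5, ("B", "E"): 0.8,
           ("E", "C"): 0.5, ("C", "D"): 0.3}
# Reproduces the factors of table 500: A=1, B=0.7, C=0.63, D=0.189, E=0.56.
```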
  • In the above example, all of the scaling factors are less than 1, as is common. However, in certain cases, a service may actually receive more traffic than enters at the front-end load balancer if multiple different paths exist to the service. For instance, a logging application might receive log data from most or all applications in a cluster such that it receives multiple times the traffic that enters the cluster.
  • Returning to FIG. 3, after computing the scaling factor for the selected service, the process 300 determines (at 335) whether more services remain in the service chain. If this is the case, the process returns to 315 to select the next service. On the other hand, once all of the scaling factors have been computed, the process 300 ends. It should be understood that the process 300 is a conceptual process. Not only might the scaling factors be computed slightly differently, but some embodiments compute all or some of the scaling factors in parallel rather than using the serial process shown in the figure.
  • As mentioned, some embodiments use slightly different processes to compute the scaling factors based on the directed acyclic graph for the service chain. Specifically, for the following algorithm, the inputs are a DAG represented with an adjacency list (DG), a coefficient for each directed edge in the graph, and a starting point in the graph (S). The following pseudocode describes the algorithm:
  • BuildIncomingEdgeGraph(DG AdjacencyList)
     (IE IncomingEdges):
      for K in DG.keys():
       NodeList := DG(K)
       for N in NodeList:
        if N not in IE.keys():
         IE(N) := []
        append(IE(N), K)
      return IE
    ModifiedBFS(DG AdjacencyList, S Node):
     IE = BuildIncomingEdgeGraph(DG)
     ScaleFactor(N) = 0 for every node N
     ScaleFactor(S) = 1
     Queue Q
     Enqueue(Q, S)
     while Q is not empty:
      V = Dequeue(Q)
      for N in DG(V):
       ScaleFactor(N) = ScaleFactor(N) + ScaleFactor(V) * Weight(V,N)
       delete(IE(N), V)
       if IE(N) is empty:
        Enqueue(Q, N)
  • In the above, the first step of the ModifiedBFS algorithm is to build the incoming edge graph (IE). The incoming edge graph for a particular node lists all of the nodes that have an edge to that particular node. For example, in the graph 400 shown in FIG. 4, IE(B)==[A]. To compute the IE for a node, the algorithm traverses the keys of the adjacency list DG and finds the set of next vertices for each key. For each such node N, it appends the key to the IE(N) list.
  • Once the IE map is computed for all of the nodes, the start node S is added to a queue and a modified version of breadth-first search is performed. At each iteration, a node is dequeued into V and the neighbors of V are fetched. For each neighbor N, its scaling factor is updated and the incoming edge from V is deleted from IE(N). Once no incoming edges to N remain, N is enqueued. This ensures that a node is not enqueued until all of the incoming edges to that node are exhausted, because the scaling factor for a node is only complete once all of its incoming edges have been visited.
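A runnable sketch of the ModifiedBFS algorithm follows, using the FIG. 4 example coefficients. The function and variable names are illustrative; this is a hedged reconstruction of the pseudocode above, not the actual implementation:

```python
from collections import defaultdict, deque

def modified_bfs(graph, weight, start):
    """Kahn-style traversal of the service-chain DAG: a node's scaling
    factor accumulates, over its incoming edges, the source node's
    factor times the edge coefficient; a node is enqueued only once
    all of its incoming edges have been processed."""
    # Incoming-edge map: incoming[n] = set of nodes with an edge into n.
    incoming = defaultdict(set)
    for src, dsts in graph.items():
        for dst in dsts:
            incoming[dst].add(src)
    factor = defaultdict(float)
    factor[start] = 1.0
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for n in graph.get(v, ()):
            factor[n] += factor[v] * weight[(v, n)]
            incoming[n].discard(v)
            if not incoming[n]:      # all contributions accumulated
                queue.append(n)
    return dict(factor)

# FIG. 4 example: A->B (0.7); B->C (0.5); B->E (0.8); E->C (0.5); C->D (0.3)
graph = {"A": ["B"], "B": ["C", "E"], "C": ["D"], "E": ["C"]}
weight = {("A", "B"): 0.7, ("B", "C"): 0.5, ("B", "E"): 0.8,
          ("E", "C"): 0.5, ("C", "D"): 0.3}
factors = modified_bfs(graph, weight, "A")
```

Because each node waits until its incoming edges are exhausted, the traversal visits nodes in topological order and reproduces the table 500 values in a single pass, without enumerating paths.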
  • As described above, the scaling factors may be computed once or on a regular basis, depending on whether the direct path coefficients are fixed or the scaler receives statistics with which to determine those coefficients. In the latter case, the direct path coefficients are calculated dynamically in some embodiments, either because the user does not have the information to set these values or because the values are not constant and change over time. Such a heuristics-based approach allows for more accurate auto-scaling calculations, especially when these values change over time.
  • For input into the dynamic calculations, some embodiments use traffic metrics from the services. This information may be retrieved from the services themselves or from load balancers interposed between the services in a service chain. Some embodiments send traffic to the front-end load balancer for inter-service load balancing while other embodiments use other load balancers to handle traffic between one service and the next. The following pseudocode for an algorithm CalculateEdgeWeight uses the following variables: FS(e, t) represents flow data for a period of time on a given direct path, time range (TR) is the period of time over which a sliding window average is calculated, total flow (TF) is the sum of all incoming flow data for a node (e.g., a service or other application) in the given time range TR, and average flow (AvgF) is the average value of the incoming flow data for a node, which is obtained by dividing the total flow TF by the time range TR.
  • Initial condition for all edges: TF ← 0
    CalculateEdgeWeight(Flow, G, TF) (AvgF):
     for all edges e in graph G(V,E):
      FS(e, t) ← Flow(e)
      TF(e) ← TF(e) + FS(e, t)
      if t > TR:
       TF(e) ← TF(e) − FS(e, t−TR)
      AvgF(e) ← TF(e)/TR
     return AvgF
  • The above algorithm uses a sliding window method to determine the average flow over a time period, which is used to derive the direct path coefficient for an edge in the graph. An array Flow stores the flow data for all of the edges at time t; this data is kept in the FlowStore (FS). To calculate the average weight, the algorithm finds the total flow (TF) for an edge by adding all of the flows for the edge from the current time t back to (t−TR). At later times, the total flow for an edge is updated by adding the latest flow for the edge and subtracting the value of the flow at time (t−TR). The average flow for an edge is then calculated by dividing the total flow for the edge by the time range.
  • FIG. 6 conceptually illustrates a table 600 providing an example of flow data for a particular path between two applications. In this example, data points are retrieved every 12 minutes, and a time range value of 5 provides a one-hour time period. This allows, at 9:48, the total flow to be calculated as 540 (100+110+120+90+120) for an average flow value of 108. At 10:00, the new flow value to add is 60, so the total flow is equal to 500 (540−100+60), for an average flow value of 100. These flow values can then be used to calculate the direct path coefficient (i.e., by identifying what percentage of traffic sent to a particular service is forwarded to the next service in the service chain), and subsequently to calculate the scaling factors.
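The sliding-window computation in table 600 can be reproduced with a short sketch. The sample values come from the FIG. 6 example; the class and method names are hypothetical:

```python
from collections import deque

class SlidingWindowFlow:
    """Sliding-window average of per-edge flow samples; TR is the
    window length in samples (5 samples at 12-minute intervals
    covers one hour)."""
    def __init__(self, tr):
        self.tr = tr
        self.samples = deque()
        self.total = 0

    def add(self, flow):
        """Record one flow sample and return the windowed average."""
        self.samples.append(flow)
        self.total += flow
        if len(self.samples) > self.tr:
            self.total -= self.samples.popleft()  # drop the sample at t-TR
        return self.total / self.tr

# FIG. 6 example: samples every 12 minutes with TR = 5.
window = SlidingWindowFlow(tr=5)
for sample in (100, 110, 120, 90, 120):
    avg = window.add(sample)   # at 9:48: total 540, average 108
later = window.add(60)         # at 10:00: total 500 (540-100+60), average 100
```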
  • With the scaling factors determined, the scaler module determines in real-time whether each of the services needs to be scaled. Based on the traffic being received at the first service in the service chain (from the load balancer), some embodiments calculate the traffic expected at each service and determine whether the current deployment for that service has the capacity required to handle that traffic without dropping traffic or imposing longer latency. If the expected traffic is larger than current capacity for a given service, the scaler initiates deployment of additional instances (e.g., additional Pods) for that service.
  • FIG. 7 conceptually illustrates a process 700 of some embodiments for determining whether scaling of services in a service chain is required based on traffic expected to arrive at those services and initiating that scaling if needed. In some embodiments, the process 700 is performed by a scaler module (e.g., the auto-scaling and deployment module 215 of the scaler shown in FIG. 2 ). In some embodiments, the process 700 is performed at regular intervals or as metrics are retrieved from the front-end load balancer. The process 700 will be described by reference to FIGS. 8-12 , which illustrate examples of scaling the services in a service chain.
  • Specifically, FIG. 8 conceptually illustrates an example of a service chain 800 as deployed. The service chain 800 includes a first service (A) for which two Pods are instantiated, a second service (B) for which three Pods are instantiated, a third service (C) for which two Pods are deployed, and a fourth service (D) for which a single Pod is deployed. This figure also shows the direct path coefficients for each of the connections in the service chain 800. Service A receives data messages directly from the front-end load balancer 805, then forwards 40% of its traffic to Service B and 50% of its traffic to Service C. Service B sends 80% of its traffic to Service D and Service C sends 90% of its traffic to Service D.
  • As shown, the process 700 begins by receiving (at 705) traffic measurements at the ingress of a service chain, corresponding to the traffic at the first service in the service chain. As described, the front-end load balancer of some embodiments generates these metrics, which are retrievable by the scaler module (e.g., using API calls in the load balancer schema). The received metrics provide a measure of incoming traffic to the first service. This may be measured in an absolute number of requests, a rate of requests (e.g., requests per second), a latency measure (which can be assumed to scale linearly with the request rate), or other metrics.
  • The scaler is then able to use these received metrics to scale each of the services in the service chain. The scaler determines, for each service, the expected traffic to reach that service (based on the scaling factor) and whether the current deployment for the service will have adequate capacity to handle that expected traffic. If the current deployment is inadequate, the scaler initiates deployment of one or more additional instances; if the current deployment should be reduced, the scaler initiates deletion of one or more existing instances.
  • FIG. 9 conceptually illustrates scaling the deployment of the service chain 800 over two stages 905-910 in response to receiving a first traffic measurement. The first stage 905 of FIG. 9 shows the service chain 800 as deployed in FIG. 8 , while the second stage 910 shows this service chain 800 after being scaled up in response to the traffic measurement.
  • FIG. 10 illustrates a table 1000 showing the computations to arrive at the scaling decisions for the example shown in FIG. 9 . As shown, each of the services in the service chain 800 has an associated scaling factor (1 for Service A, 0.4 for Service B, 0.5 for Service C, and 0.77 for Service D) computed as described above based on the direct path coefficients shown in FIG. 8 .
  • Returning to FIG. 7 , the process 700 selects (at 710) a service in the service chain. Some embodiments select the services by traversing the graph of the service chain topology (as with the scaling factor calculation), while other embodiments select services using different techniques (e.g., randomly). It should be noted that the process 700 is a conceptual process and other processes might perform slightly different operations or perform the operations in a different order. For instance, although the process 700 shows each service being evaluated serially, other embodiments might evaluate whether to scale each of the services in parallel.
  • The process 700 then determines (at 715) the current capacity of the selected service. In some embodiments, each service has a different capacity per instance (e.g., per Pod). This capacity may vary based on (i) the physical resources allocated to each instance for the service and (ii) the type of processing performed by the service. In some embodiments, different administrators for the different services set their own configurations for the physical resources allocated to each Pod, which may vary from service to service as a result. In addition, different types of data message processing can require different amounts of memory and/or processing power. For instance, a service that performs layer 7 processing (e.g., a deep packet inspection service) might require more resources per data message than a service that only performs layer 2 or layer 3 processing (e.g., an L2/L3 firewall service). Typically, the current capacity for a given service is the per-instance capacity multiplied by the currently-deployed number of instances.
  • In the example of FIG. 10 , each of the services has a different per-Pod capacity. Each Pod for Service A can handle 1000 requests/second, each Pod for Service B can handle 300 requests/second, each Pod for Service C can handle 600 requests/second, and each Pod for Service D can handle 1250 requests/second. Given the current deployment, Service A can handle 2000 requests/second, Service B can handle 900 requests/second, Service C can handle 1200 requests/second, and Service D can handle 1250 requests/second.
  • Next, the process 700 computes (at 720) the traffic expected to be received at the selected service based on the scaling factor for the service and the traffic at the first service. Some embodiments compute this expected traffic for the service by simply multiplying the traffic seen at the first service (i.e., from the front-end load balancer) by the scaling factor for the service. As shown in FIG. 10 , the first service (Service A) is expected to receive 3200 requests/second, the traffic flow retrieved from the load balancer. Service B is expected to receive 1280 requests/second (multiplying 3200 by the scaling factor of 0.4), Service C is expected to receive 1600 requests/second (multiplying 3200 by the scaling factor of 0.5), and Service D is expected to receive 2464 requests/second (multiplying 3200 by the scaling factor of 0.77).
  • Based on the per-instance capacity and the expected traffic for the selected service, the process 700 computes (at 725) the required number of instances for that service. This value can be computed by dividing the expected traffic by the per-instance capacity and applying the ceiling function that rounds a decimal value up to the next integer (i.e., 3.1 and 3.9 are both rounded to 4). Some embodiments also add a safety factor to the value before applying the ceiling function (e.g., adding 0.1 so that 2.95 becomes 3.05, which rounds to 4) in case the traffic to a service increases more than expected (e.g., based on more traffic than expected being forwarded by one or more of the services in the chain).
  • In the example shown in FIG. 10 , the expected traffic for Service A is 3200 requests/second, which requires 4 Pods when divided by the per-Pod capacity of 1000 requests/second (with 3.2 rounding up to 4). The expected traffic for Service B is 1280 requests/second, which requires 5 Pods when divided by the per-Pod capacity of 300 requests/second (with 4.27 rounding up to 5). The expected traffic for Service C is 1600 requests/second, which requires 3 Pods when divided by the per-Pod capacity of 600 requests/second (with 2.67 rounding up to 3). Lastly, the expected traffic for Service D is 2464 requests/second, which requires 2 Pods when divided by the per-Pod capacity of 1250 requests/second (with 1.97 rounding up to 2). However, if a small safety factor were applied, 3 Pods would be required for Service D.
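The ceiling-with-safety-factor computation in operations 720-725 can be sketched as follows, using the FIG. 10 values. The function name and safety parameter are illustrative:

```python
import math

def required_instances(expected_rps, per_instance_rps, safety=0.0):
    """Ceiling of expected load over per-instance capacity, with an
    optional safety margin added before rounding up."""
    return math.ceil(expected_rps / per_instance_rps + safety)

# FIG. 10 example: 3200 requests/second measured at the ingress.
scaling = {"A": 1.0, "B": 0.4, "C": 0.5, "D": 0.77}
capacity = {"A": 1000, "B": 300, "C": 600, "D": 1250}  # per-Pod req/s
pods = {s: required_instances(3200 * scaling[s], capacity[s])
        for s in scaling}
# pods == {"A": 4, "B": 5, "C": 3, "D": 2}
```

The same function handles scale-down: recomputing with the FIG. 12 ingress rate of 2500 requests/second yields 3, 4, 3, and 2 Pods for Services A through D, and adding a safety factor of 0.1 to the FIG. 10 computation pushes Service D from 2 Pods to 3, as described above.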
  • The process 700 then determines (at 730) whether to scale the selected service. If the service should be scaled, the process determines (at 735) the scaling action for the selected service. As discussed, some embodiments scale the services predictively so that the additional instances are deployed before the existing instances for the service are overloaded by the incoming traffic. In the example shown in the table 1000 of FIG. 10, Services A and B should each be scaled by adding 2 Pods while Services C and D should each be scaled by adding 1 Pod. Owing to (1) the different per-instance capacities of the different applications (in this case, the services) and (2) the different expected traffic reaching each application based on the computed scaling factors, the same change in ingress traffic will in many cases require different numbers of new instances for different applications.
  • Next, the process 700 determines (at 740) whether any more services in the service chain remain for evaluation. If additional services remain, the process 700 returns to 710 to select the next service and determine whether to scale that service. As mentioned, in some embodiments the various services are evaluated in parallel to determine whether to scale each of the services, rather than in a serial loop as shown in the figure.
  • Once all of the services have been evaluated, the process 700 initiates (at 745) deployment of additional instances or removal of instances based on the determined scaling actions (i.e., the actions determined at 735 for each service), then ends. In some embodiments, the scaler module itself modifies the deployment to change the number of instances for the different services. In the Kubernetes context, the scaler module edits one or more configuration objects that define the deployment to change the number of Pods implementing each service in some embodiments. In other embodiments, the scaler module provides the deployment edits to another component (e.g., an auto-scaling application in the Kubernetes control plane) that handles the modification to the deployment.
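In the Kubernetes context, the scaler's edit amounts to changing the replica count on each service's Deployment. A minimal sketch of building the patch bodies that a scaler could send to the Kubernetes API (e.g., via the official Python client's AppsV1Api); the Deployment names are hypothetical:

```python
def replica_patch(replicas):
    """JSON-merge-patch body that resizes a Deployment's replica count,
    suitable for a patch on the Deployment's scale subresource."""
    return {"spec": {"replicas": replicas}}

# Desired Pod counts from the FIG. 9 scale-up.
desired = {"service-a": 4, "service-b": 5, "service-c": 3, "service-d": 2}
patches = {name: replica_patch(n) for name, n in desired.items()}
```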
  • The second stage 910 of FIG. 9 shows that, as a result of the calculations made by the scaler module 915 (as shown in the table 1000), the scaler has modified the deployment of the service chain 800. Service A now has 4 Pods deployed, Service B now has 5 Pods deployed, Service C now has 3 Pods deployed, and Service D now has 2 Pods deployed. While the figure conceptually shows the scaler 915 directly modifying the services, as noted above in some embodiments the scaler either modifies configuration objects for the services directly or hands off the deployment to a separate component (e.g., an auto-scaling application) that modifies these configuration objects.
  • As noted, in addition to scaling up a set of applications (e.g., the services in a service chain, as shown in FIG. 9 ), in some embodiments the scaler module can also determine when to scale down one or more applications. That is, if the number of incoming requests drops, the scaler can perform similar calculations to determine that fewer instances of at least some of the applications are required. For scaling up due to increased traffic, the predictiveness helps avoid increases in latency and dropped packets. For scaling down, the predictiveness is not as crucial but can help free up resources more quickly when those resources are not needed for the current traffic levels.
  • FIG. 11 conceptually illustrates scaling down the deployment of the service chain 800 over two stages 1105-1110 in response to receiving a second traffic measurement. The first stage 1105 of FIG. 11 shows the service chain 800 as deployed in the second stage 910 of FIG. 9 , while the second stage 1110 shows this service chain 800 after some of the services have been scaled down in response to the second traffic measurement.
  • FIG. 12 illustrates a table 1200 showing the computations to arrive at the scaling decisions for the example shown in FIG. 11 . The scaling factors and per-Pod capacity for these services are the same as shown in the table 1000 of FIG. 10 . In this case, Service A is expected to receive 2500 requests/second (the traffic flow retrieved from the load balancer), a drop in traffic as compared to the measurement shown in FIG. 10 that caused all of the services to be scaled up. Service B is expected to receive 1000 requests/second (multiplying 2500 by the scaling factor of 0.4), Service C is expected to receive 1250 requests/second (multiplying 2500 by the scaling factor of 0.5), and Service D is expected to receive 1925 requests/second (multiplying 2500 by the scaling factor of 0.77).
  • The expected traffic for Service A of 2500 requests/second requires 3 Pods when divided by the per-Pod capacity of 1000 requests/second (with 2.5 rounding up to 3). The expected traffic for Service B of 1000 requests/second requires 4 Pods when divided by the per-Pod capacity of 300 requests/second (with 3.33 rounding up to 4). The expected traffic for Service C of 1250 requests/second requires 3 Pods when divided by the per-Pod capacity of 600 requests/second (with 2.08 rounding up to 3). Lastly, the expected traffic for Service D of 1925 requests/second requires 2 Pods when divided by the per-Pod capacity of 1250 requests/second (with 1.54 rounding up to 2).
  • As a result of these calculations, the scaler 915 determines that one Pod should be removed from the deployments of Services A and B, but the deployments of Services C and D do not require updating. As such, the second stage 1110 of FIG. 11 shows that, as a result of the calculations made by the scaler module 915 (as shown in the table 1200), the scaler 915 has modified the deployment of the service chain 800. Service A now has only 3 Pods deployed, Service B now has 4 Pods deployed, while Service C and Service D are unchanged.
  • FIG. 13 conceptually illustrates an electronic system 1300 with which some embodiments of the invention are implemented. The electronic system 1300 may be a computer (e.g., a desktop computer, personal computer, tablet computer, server computer, mainframe, blade computer, etc.), phone, PDA, or any other sort of electronic device. Such an electronic system includes various types of computer readable media and interfaces for various other types of computer readable media. Electronic system 1300 includes a bus 1305, processing unit(s) 1310, a system memory 1325, a read-only memory 1330, a permanent storage device 1335, input devices 1340, and output devices 1345.
  • The bus 1305 collectively represents all system, peripheral, and chipset buses that communicatively connect the numerous internal devices of the electronic system 1300. For instance, the bus 1305 communicatively connects the processing unit(s) 1310 with the read-only memory 1330, the system memory 1325, and the permanent storage device 1335.
  • From these various memory units, the processing unit(s) 1310 retrieve instructions to execute and data to process in order to execute the processes of the invention. The processing unit(s) may be a single processor or a multi-core processor in different embodiments.
  • The read-only-memory (ROM) 1330 stores static data and instructions that are needed by the processing unit(s) 1310 and other modules of the electronic system. The permanent storage device 1335, on the other hand, is a read-and-write memory device. This device is a non-volatile memory unit that stores instructions and data even when the electronic system 1300 is off. Some embodiments of the invention use a mass-storage device (such as a magnetic or optical disk and its corresponding disk drive) as the permanent storage device 1335.
  • Other embodiments use a removable storage device (such as a floppy disk, flash drive, etc.) as the permanent storage device. Like the permanent storage device 1335, the system memory 1325 is a read-and-write memory device. However, unlike storage device 1335, the system memory is a volatile read-and-write memory, such as random-access memory. The system memory stores some of the instructions and data that the processor needs at runtime. In some embodiments, the invention's processes are stored in the system memory 1325, the permanent storage device 1335, and/or the read-only memory 1330. From these various memory units, the processing unit(s) 1310 retrieve instructions to execute and data to process in order to execute the processes of some embodiments.
  • The bus 1305 also connects to the input and output devices 1340 and 1345. The input devices enable the user to communicate information and select commands to the electronic system. The input devices 1340 include alphanumeric keyboards and pointing devices (also called “cursor control devices”). The output devices 1345 display images generated by the electronic system. The output devices include printers and display devices, such as cathode ray tubes (CRT) or liquid crystal displays (LCD). Some embodiments include devices such as a touchscreen that function as both input and output devices.
  • Finally, as shown in FIG. 13, bus 1305 also couples electronic system 1300 to a network 1365 through a network adapter (not shown). In this manner, the computer can be a part of a network of computers (such as a local area network (“LAN”), a wide area network (“WAN”), an Intranet, or a network of networks, such as the Internet). Any or all components of electronic system 1300 may be used in conjunction with the invention.
  • Some embodiments include electronic components, such as microprocessors, storage and memory that store computer program instructions in a machine-readable or computer-readable medium (alternatively referred to as computer-readable storage media, machine-readable media, or machine-readable storage media). Some examples of such computer-readable media include RAM, ROM, read-only compact discs (CD-ROM), recordable compact discs (CD-R), rewritable compact discs (CD-RW), read-only digital versatile discs (e.g., DVD-ROM, dual-layer DVD-ROM), a variety of recordable/rewritable DVDs (e.g., DVD-RAM, DVD−RW, DVD+RW, etc.), flash memory (e.g., SD cards, mini-SD cards, micro-SD cards, etc.), magnetic and/or solid state hard drives, read-only and recordable Blu-Ray® discs, ultra-density optical discs, any other optical or magnetic media, and floppy disks. The computer-readable media may store a computer program that is executable by at least one processing unit and includes sets of instructions for performing various operations. Examples of computer programs or computer code include machine code, such as is produced by a compiler, and files including higher-level code that are executed by a computer, an electronic component, or a microprocessor using an interpreter.
  • While the above discussion primarily refers to microprocessor or multi-core processors that execute software, some embodiments are performed by one or more integrated circuits, such as application specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs). In some embodiments, such integrated circuits execute instructions that are stored on the circuit itself.
  • As used in this specification, the terms “computer”, “server”, “processor”, and “memory” all refer to electronic or other technological devices. These terms exclude people or groups of people. For the purposes of the specification, the terms display or displaying means displaying on an electronic device. As used in this specification, the terms “computer readable medium,” “computer readable media,” and “machine readable medium” are entirely restricted to tangible, physical objects that store information in a form that is readable by a computer. These terms exclude any wireless signals, wired download signals, and any other ephemeral signals.
  • This specification refers throughout to computational and network environments that include virtual machines (VMs). However, virtual machines are merely one example of data compute nodes (DCNs) or data compute end nodes, also referred to as addressable nodes. DCNs may include non-virtualized physical hosts, virtual machines, containers that run on top of a host operating system without the need for a hypervisor or separate operating system, and hypervisor kernel network interface modules.
  • VMs, in some embodiments, operate with their own guest operating systems on a host using resources of the host virtualized by virtualization software (e.g., a hypervisor, virtual machine monitor, etc.). The tenant (i.e., the owner of the VM) can choose which applications to operate on top of the guest operating system. Some containers, on the other hand, are constructs that run on top of a host operating system without the need for a hypervisor or separate guest operating system. In some embodiments, the host operating system uses name spaces to isolate the containers from each other and therefore provides operating-system level segregation of the different groups of applications that operate within different containers. This segregation is akin to the VM segregation that is offered in hypervisor-virtualized environments that virtualize system hardware, and thus can be viewed as a form of virtualization that isolates different groups of applications that operate in different containers. Such containers are more lightweight than VMs.
  • A hypervisor kernel network interface module, in some embodiments, is a non-VM DCN that includes a network stack with a hypervisor kernel network interface and receive/transmit threads. One example of a hypervisor kernel network interface module is the vmknic module that is part of the ESXi™ hypervisor of VMware, Inc.
  • It should be understood that while the specification refers to VMs, the examples given could be any type of DCNs, including physical hosts, VMs, non-VM containers, and hypervisor kernel network interface modules. In fact, the example networks could include combinations of different types of DCNs in some embodiments.
  • While the invention has been described with reference to numerous specific details, one of ordinary skill in the art will recognize that the invention can be embodied in other specific forms without departing from the spirit of the invention. In addition, a number of the figures (including FIGS. 3 and 7 ) conceptually illustrate processes. The specific operations of these processes may not be performed in the exact order shown and described. The specific operations may not be performed in one continuous series of operations, and different specific operations may be performed in different embodiments. Furthermore, the process could be implemented using several sub-processes, or as part of a larger macro process. Thus, one of ordinary skill in the art would understand that the invention is not to be limited by the foregoing illustrative details, but rather is to be defined by the appended claims.

Claims (21)

1. A method for scaling a service chain that comprises a plurality of services, each service provided by one or more instances of the service, the method comprising:
identifying that a first service in the service chain has received a number of requests;
for each service in the service chain:
identifying a scaling factor that estimates a portion of requests received at the first service that will be subsequently received at the service; and
deploying a number of additional instances of the service based on the identified scaling factor for the service and the number of requests received at the first service.
2. The method of claim 1 further comprising, prior to identifying that the first service has received the number of requests, computing the scaling factor for each service in the service chain.
3. The method of claim 2, wherein:
computing the scaling factors comprises defining a graph of the service chain; and
each service in the service chain is represented as a node of the graph and each direct path from a particular service to another particular service in the service chain is represented as an edge of the graph from the node representing the particular service to the node representing the other particular service.
4. The method of claim 3, wherein each edge representing a direct path from a particular service to another particular service has an associated coefficient that specifies an estimate of a percentage of requests received at the particular service that are sent to the other particular service.
5. The method of claim 4, wherein computing the scaling factor for each respective service comprises:
identifying each path through the service chain from the first service to the respective service;
for each identified path to the respective service, multiplying each coefficient along the identified path to compute a factor for the path; and
adding together the factors for all of the identified paths to the respective service to compute the scaling factor for the respective service.
6. The method of claim 4, wherein the associated coefficients are user-specified.
7. The method of claim 4, wherein the associated coefficients are calculated based on measurements from the services in the service chain.
8. The method of claim 1, wherein identifying that the first service has received the number of requests comprises receiving ingress metrics measured by a load balancer that processes incoming traffic and forwards the incoming traffic to the first service in the service chain.
9. The method of claim 1, wherein the scaling factors are based on percentages of the requests being dropped or blocked by the services in the service chain.
10. The method of claim 1, wherein the services in the service chain are virtualized network functions (VNFs) in a telecommunications network.
11. The method of claim 10, wherein at least a subset of the requests are audio calls and video calls.
12. The method of claim 10, wherein the services comprise at least one of a firewall, an intrusion detection and prevention system, a load balancer, a forwarding element, and a virtual private network (VPN) edge.
13. The method of claim 10, wherein at least a subset of the VNFs are container network functions.
14. A non-transitory machine-readable medium storing a program which when executed by at least one processing unit scales a service chain that comprises a plurality of services, each service provided by one or more instances of the service, the program comprising sets of instructions for:
identifying that a first service in the service chain has received a number of requests; and
for each service in the service chain:
identifying a scaling factor that estimates a portion of requests received at the first service that will be subsequently received at the service; and
deploying a number of additional instances of the service based on the identified scaling factor for the service and the number of requests received at the first service.
15. The non-transitory machine-readable medium of claim 14, wherein the program further comprises a set of instructions for computing the scaling factor for each service in the service chain prior to identifying that the first service has received the number of requests.
16. The non-transitory machine-readable medium of claim 15, wherein:
the set of instructions for computing the scaling factors comprises a set of instructions for defining a graph of the service chain;
each service in the service chain is represented as a node of the graph and each direct path from a particular service to another particular service in the service chain is represented as an edge of the graph from the node representing the particular service to the node representing the other particular service; and
each edge representing a direct path from a particular service to another particular service has an associated coefficient that specifies an estimate of a percentage of requests received at the particular service that are sent to the other particular service.
17. The non-transitory machine-readable medium of claim 16, wherein the set of instructions for computing the scaling factor for each respective service comprises sets of instructions for:
identifying each path through the service chain from the first service to the respective service;
for each identified path to the respective service, multiplying each coefficient along the identified path to compute a factor for the path; and
adding together the factors for all of the identified paths to the respective service to compute the scaling factor for the respective service.
18. The non-transitory machine-readable medium of claim 14, wherein the set of instructions for identifying that the first service has received the number of requests comprises a set of instructions for receiving ingress metrics measured by a load balancer that processes incoming traffic and forwards the incoming traffic to the first service in the service chain.
19. The non-transitory machine-readable medium of claim 14, wherein the scaling factors are based on percentages of the requests being dropped or blocked by the services in the service chain.
20. The non-transitory machine-readable medium of claim 14, wherein the services in the service chain are virtualized network functions (VNFs) in a telecommunications network.
21. The non-transitory machine-readable medium of claim 20, wherein at least a subset of the VNFs are container network functions.
US17/729,776 2022-01-19 2022-04-26 Collective scaling of applications Pending US20230232195A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2022/039025 WO2023140895A1 (en) 2022-01-19 2022-08-01 Predictive scaling of application based on traffic at another application

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
IN202241003041 2022-01-19
IN202241003045 2022-01-19

Publications (1)

Publication Number Publication Date
US20230232195A1 true US20230232195A1 (en) 2023-07-20

Family

ID=87161473

Family Applications (2)

Application Number Title Priority Date Filing Date
US17/729,776 Pending US20230232195A1 (en) 2022-01-19 2022-04-26 Collective scaling of applications
US17/729,774 Active US11800335B2 (en) 2022-01-19 2022-04-26 Predictive scaling of application based on traffic at another application

Family Applications After (1)

Application Number Title Priority Date Filing Date
US17/729,774 Active US11800335B2 (en) 2022-01-19 2022-04-26 Predictive scaling of application based on traffic at another application

Country Status (1)

Country Link
US (2) US20230232195A1 (en)

Family Cites Families (74)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8776050B2 (en) 2003-08-20 2014-07-08 Oracle International Corporation Distributed virtual machine monitor for managing multiple virtual resources across multiple physical nodes
US8555287B2 (en) 2006-08-31 2013-10-08 Bmc Software, Inc. Automated capacity provisioning method using historical performance data
US9239996B2 (en) 2010-08-24 2016-01-19 Solano Labs, Inc. Method and apparatus for clearing cloud compute demand
US8499066B1 (en) * 2010-11-19 2013-07-30 Amazon Technologies, Inc. Predicting long-term computing resource usage
US10740353B2 (en) 2010-12-23 2020-08-11 Mongodb, Inc. Systems and methods for managing distributed database deployments
JP5843459B2 (en) 2011-03-30 2016-01-13 インターナショナル・ビジネス・マシーンズ・コーポレーションInternational Business Machines Corporation Information processing system, information processing apparatus, scaling method, program, and recording medium
US8806018B2 (en) * 2011-04-01 2014-08-12 Carnegie Mellon University Dynamic capacity management of multiple parallel-connected computing resources
US9329904B2 (en) 2011-10-04 2016-05-03 Tier 3, Inc. Predictive two-dimensional autoscaling
US8856797B1 (en) 2011-10-05 2014-10-07 Amazon Technologies, Inc. Reactive auto-scaling of capacity
US9372735B2 (en) 2012-01-09 2016-06-21 Microsoft Technology Licensing, Llc Auto-scaling of pool of virtual machines based on auto-scaling rules of user associated with the pool
US9170849B2 (en) 2012-01-09 2015-10-27 Microsoft Technology Licensing, Llc Migration of task to different pool of resources based on task retry count during task lease
US20130179289A1 (en) 2012-01-09 2013-07-11 Microsoft Corporation Pricing of resources in virtual machine pools
US20130179894A1 (en) 2012-01-09 2013-07-11 Microsoft Corporation Platform as a service job scheduling
US20210349749A1 (en) 2012-02-14 2021-11-11 Aloke Guha Systems and methods for dynamic provisioning of resources for virtualized
US8918510B2 (en) 2012-04-27 2014-12-23 Hewlett-Packard Development Company, L.P. Evaluation of cloud computing services
US9329915B1 (en) 2012-05-08 2016-05-03 Amazon Technologies, Inc. System and method for testing in a production environment
US9161064B2 (en) * 2012-08-23 2015-10-13 Adobe Systems Incorporated Auto-scaling management of web content
US9256452B1 (en) 2012-11-14 2016-02-09 Amazon Technologies, Inc. Providing an instance availability estimate
US9032078B2 (en) 2013-01-02 2015-05-12 International Business Machines Corporation Predictive scaling for clusters
US9331952B2 (en) 2013-01-02 2016-05-03 International Business Machines Corporation Modifying an assignment of nodes to roles in a computing environment
WO2014116888A1 (en) 2013-01-25 2014-07-31 REMTCS Inc. Network security system, method, and apparatus
US9817699B2 (en) 2013-03-13 2017-11-14 Elasticbox Inc. Adaptive autoscaling for virtualized applications
US9596299B2 (en) 2013-04-06 2017-03-14 Citrix Systems, Inc. Systems and methods for dynamically expanding load balancing pool
US9459980B1 (en) 2013-04-17 2016-10-04 Amazon Technologies, Inc. Varying cluster sizes in a predictive test load while testing a productive system
US9491063B2 (en) 2013-05-15 2016-11-08 Telefonaktiebolaget Lm Ericsson (Publ) Method and apparatus for providing network services orchestration
US9288193B1 (en) 2013-06-25 2016-03-15 Intuit Inc. Authenticating cloud services
US9412075B2 (en) 2013-08-23 2016-08-09 Vmware, Inc. Automated scaling of multi-tier applications using reinforced learning
US9386086B2 (en) 2013-09-11 2016-07-05 Cisco Technology Inc. Dynamic scaling for multi-tiered distributed systems using payoff optimization of application classes
US10489807B1 (en) 2013-11-11 2019-11-26 Amazon Technologies, Inc. Non-deterministic load-testing
JP6248560B2 (en) 2013-11-13 2017-12-20 富士通株式会社 Management program, management method, and management apparatus
US9300552B2 (en) 2013-12-16 2016-03-29 International Business Machines Corporation Scaling a cloud infrastructure
KR20150083713A (en) 2014-01-10 2015-07-20 삼성전자주식회사 Electronic device and method for managing resource
US10237135B1 (en) 2014-03-04 2019-03-19 Amazon Technologies, Inc. Computing optimization
WO2015132945A1 (en) 2014-03-07 2015-09-11 株式会社日立製作所 Performance evaluation method and information processing device
US10003550B1 (en) 2014-03-14 2018-06-19 Amazon Technologies Smart autoscaling of a cluster for processing a work queue in a distributed system
US9842039B2 (en) 2014-03-31 2017-12-12 Microsoft Technology Licensing, Llc Predictive load scaling for services
US9979617B1 (en) 2014-05-15 2018-05-22 Amazon Technologies, Inc. Techniques for controlling scaling behavior of resources
US9674302B1 (en) 2014-06-13 2017-06-06 Amazon Technologies, Inc. Computing resource transition notification and pending state
KR102295966B1 (en) 2014-08-27 2021-09-01 삼성전자주식회사 Method of Fabricating Semiconductor Devices Using Nanowires
US9935829B1 (en) 2014-09-24 2018-04-03 Amazon Technologies, Inc. Scalable packet processing service
US10467036B2 (en) * 2014-09-30 2019-11-05 International Business Machines Corporation Dynamic metering adjustment for service management of computing platform
US9825881B2 (en) 2014-09-30 2017-11-21 Sony Interactive Entertainment America Llc Methods and systems for portably deploying applications on one or more cloud systems
US10171371B2 (en) 2014-09-30 2019-01-01 International Business Machines Corporation Scalable metering for cloud service management based on cost-awareness
US9547534B2 (en) 2014-10-10 2017-01-17 International Business Machines Corporation Autoscaling applications in shared cloud resources
US9613120B1 (en) 2014-11-11 2017-04-04 Amazon Technologies, Inc. Replicated database startup for common database storage
US10355934B2 (en) 2014-12-03 2019-07-16 Amazon Technologies, Inc. Vertical scaling of computing instances
US10432734B2 (en) 2014-12-12 2019-10-01 Hewlett Packard Enterprise Development Lp Cloud service tuning
JP6520959B2 (en) 2015-01-30 2019-05-29 日本電気株式会社 Node system, server device, scaling control method and program
US9825875B2 (en) 2015-03-31 2017-11-21 Alcatel Lucent Method and apparatus for provisioning resources using clustering
US10412020B2 (en) * 2015-04-30 2019-09-10 Amazon Technologies, Inc. Background processes in update load balancers of an auto scaling group
US9848041B2 (en) 2015-05-01 2017-12-19 Amazon Technologies, Inc. Automatic scaling of resource instance groups within compute clusters
US10789542B2 (en) 2015-06-05 2020-09-29 Apple Inc. System and method for predicting changes in network quality
US9910755B2 (en) 2015-06-26 2018-03-06 Amazon Technologies, Inc. Retrieval of authoritative measurement data from in-memory datastores
US9959188B1 (en) 2015-07-02 2018-05-01 Amazon Technologies, Inc. Managing processor usage of a physical host configured for hosting computing instances
US11115273B2 (en) 2015-07-13 2021-09-07 Telefonaktiebolaget Lm Ericsson (Publ) Analytics-driven dynamic network design and configuration
US10594562B1 (en) 2015-08-25 2020-03-17 Vmware, Inc. Intelligent autoscale of services
US10419530B2 (en) * 2015-11-02 2019-09-17 Telefonaktiebolaget Lm Ericsson (Publ) System and methods for intelligent service function placement and autoscale based on machine learning
US10089135B2 (en) 2016-08-09 2018-10-02 International Business Machines Corporation Expediting the provisioning of virtual machines based on cached repeated portions of a template
US11886922B2 (en) * 2016-09-07 2024-01-30 Pure Storage, Inc. Scheduling input/output operations for a storage system
US20220261164A1 (en) * 2016-10-20 2022-08-18 Pure Storage, Inc. Configuring Storage Systems Based On Storage Utilization Patterns
CN108366082B (en) * 2017-01-26 2020-03-10 华为技术有限公司 Capacity expansion method and capacity expansion device
US11349708B2 (en) * 2017-03-09 2022-05-31 Telefonaktiebolaget L M Ericsson (Publ) Configuration generation for virtual network functions (VNFs) with requested service availability
US10873541B2 (en) 2017-04-17 2020-12-22 Microsoft Technology Licensing, Llc Systems and methods for proactively and reactively allocating resources in cloud-based networks
US10887380B2 (en) * 2019-04-01 2021-01-05 Google Llc Multi-cluster ingress
US11388054B2 (en) 2019-04-30 2022-07-12 Intel Corporation Modular I/O configurations for edge computing using disaggregated chiplets
WO2021040584A1 (en) * 2019-08-26 2021-03-04 Telefonaktiebolaget Lm Ericsson (Publ) Entity and method performed therein for handling computational resources
US20210126871A1 (en) * 2019-10-25 2021-04-29 Red Hat, Inc. Outlier event autoscaling in a cloud computing system
US11842214B2 (en) * 2021-03-31 2023-12-12 International Business Machines Corporation Full-dimensional scheduling and scaling for microservice applications
EP4086764A1 (en) * 2021-05-06 2022-11-09 Ateme Method for dynamic resources allocation and apparatus for implementing the same
US11625319B2 (en) * 2021-06-14 2023-04-11 Intuit Inc. Systems and methods for workflow based application testing in cloud computing environments
US11838206B2 (en) * 2021-07-23 2023-12-05 Vmware, Inc. Edge node with datapath split between pods
US20230028837A1 (en) * 2021-07-23 2023-01-26 Vmware, Inc. Scaling for split-networking datapath
US20230073891A1 (en) * 2021-09-09 2023-03-09 Beijing Bytedance Network Technology Co., Ltd. Multifunctional application gateway for security and privacy
KR20230069483A (en) * 2021-11-12 2023-05-19 한국전자기술연구원 Method for optimal resource selection based on available GPU resource analysis in a large-scale container platform

Also Published As

Publication number Publication date
US11800335B2 (en) 2023-10-24
US20230231933A1 (en) 2023-07-20

Similar Documents

Publication Publication Date Title
US10491688B2 (en) Virtualized network function placements
US9749402B2 (en) Workload deployment with real-time consideration of global network congestion
US20230185630A1 (en) Monitoring and optimizing interhost network traffic
US20230144041A1 (en) Determining an end user experience score based on client device, network, server device, and application metrics
US9756121B2 (en) Optimizing routing and load balancing in an SDN-enabled cloud during enterprise data center migration
US9465630B1 (en) Assigning dynamic weighted variables to cluster resources for virtual machine provisioning
CN107689925A (en) Load balance optimization method and device based on cloud monitoring
Shah et al. Load balancing in cloud computing: Methodological survey on different types of algorithm
US20140347998A1 (en) Ensuring predictable and quantifiable networking performance
US10725810B2 (en) Migrating virtualized computing instances that implement a logical multi-node application
US11138047B2 (en) Efficient network services with performance lag prediction and prevention
US11936563B2 (en) Enhanced network stack
Zinner et al. A discrete-time model for optimizing the processing time of virtualized network functions
US11800335B2 (en) Predictive scaling of application based on traffic at another application
WO2023140895A1 (en) Predictive scaling of application based on traffic at another application
US20240039813A1 (en) Health analytics for easier health monitoring of a network
US20150347174A1 (en) Method, Apparatus, and System for Migrating Virtual Machine
US20230007839A1 (en) Measuring performance of virtual desktop event redirection
Gouareb et al. Placement and routing of vnfs for horizontal scaling
US20140379899A1 (en) Automatic adjustment of application launch endpoints
US11558775B2 (en) Determining rate differential weighted fair output queue scheduling for a network device
Na et al. Optimal service placement using pseudo service chaining mechanism for cloud-based multimedia services
US9461933B2 (en) Virtual server system, management server device, and system managing method
JP5482052B2 (en) Observation analysis apparatus and observation analysis method
US20240073110A1 (en) Multi-cloud recommendation engine for customer workloads

Legal Events

Date Code Title Description
AS Assignment

Owner name: VMWARE, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BISWAS, SUDIPTA;DAS, MONOTOSH;SHAW, HEMANT KUMAR;AND OTHERS;SIGNING DATES FROM 20220401 TO 20220423;REEL/FRAME:059731/0741

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

AS Assignment

Owner name: VMWARE LLC, CALIFORNIA

Free format text: CHANGE OF NAME;ASSIGNOR:VMWARE, INC.;REEL/FRAME:066692/0103

Effective date: 20231121