US20220300344A1 - Flexible credential supported software service provisioning - Google Patents
- Publication number
- US20220300344A1 (U.S. application Ser. No. 17/619,062)
- Authority
- US
- United States
- Prior art keywords
- load
- engine
- convergent
- metrics
- deployments
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/505—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/3006—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/302—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3409—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3409—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
- G06F11/3433—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment for load management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3447—Performance evaluation by modeling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/60—Software deployment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/02—Details
- H04L12/12—Arrangements for remote connection or disconnection of substations or of equipment thereof
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/28—Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
- H04L12/40—Bus networks
- H04L12/40006—Architecture of a communication node
- H04L12/40039—Details regarding the setting of the power status of a node according to activity on the bus
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/5019—Workload prediction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/28—Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
- H04L12/40—Bus networks
- H04L2012/40267—Bus for use in transportation systems
- H04L2012/40273—Bus for use in transportation systems the transportation system being a vehicle
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/50—Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate
Definitions
- Capacity planning and cost optimization for software operations are areas of ongoing research and development. Over-provisioning leads to resource waste and extra cost, yet the industry standard is an average of 80-93% over-provisioned. Under-provisioning causes performance degradation and violation of SLAs. Research shows that performance degradation in web applications can result in up to 75% increase in churn. “Preliminary results . . . of cloud service availability show an average of 7.738 hours unavailable per year or 99.91% availability . . . The cost of these failures amounts for almost 285 million USDs based on hourly costs accepted in industry.” Downtime Statistics of Current Cloud Solutions (Updated version—March 2014) by Cérin et al.
- the system operates on software deployment orchestration platforms such as Kubernetes that expose application and resource metrics as well as provide standard scaling and resourcing mechanisms. Users declare performance objectives and the system learns application behavior and load profile to determine minimum cost resourcing to meet the declared performance objectives.
- FIG. 1 depicts a diagram of an example of a predictive autoscaling and resource optimization system.
- FIG. 2 depicts a graph that compares resource provisioning pursuant to the recommendations of a reactive recommendation engine with resource provisioning pursuant to the recommendations of a predicted recommendation engine.
- FIG. 3 is a diagram of a total forecasted load versus total actual load chart and associated code display.
- FIG. 4 depicts a flowchart of an example of a method of predictive autoscaling and resource optimization.
- FIG. 5 depicts a flowchart of an example of generating predictive autoscaling and resource optimization results in association with a machine learning process.
- FIG. 6 depicts a diagram of an example of a system for generating minimum cost optimization parameters.
- FIG. 1 depicts a diagram 100 of an example of a predictive autoscaling and resource optimization system.
- resources can be characterized as central processing unit (CPU), memory, network input/output (I/O), disk I/O, graphics processing unit (GPU), and/or other applicable resources.
- the diagram 100 includes a computer-readable medium (CRM) 102 , a service level agreement (SLA) metric datastore 104 coupled to the CRM 102 , a feedforward control system for a software orchestration platform 106 coupled to the CRM 102 , convergent deployments 122 coupled to the CRM 102 , and a load distribution and metrics engine 124 coupled to the CRM 102 .
- the feedforward control system for a software orchestration platform 106 includes a declarative performance interface engine 108 , a predictive autoscaling and resource optimization operator engine 110 , a dynamics estimation engine 112 , an application load forecasting engine 114 , a minimum cost optimization engine 116 , an optimal configuration for scale resources actuator engine 118 , and a convergent deployment, resource, and application level metrics collection engine 120 .
- a predictive autoscaling and resource optimization system can be fully implemented; implemented as a staged integration (e.g., into a Kubernetes cluster), with customers having control over whether changes are made live and how much change is allowed; or implemented with a sample of platform data to provide a cost savings and/or performance improvement report.
- the CRM 102 is intended to represent a computer system or network of computer systems.
- a “computer system,” as used herein, may include or be implemented as a specific purpose computer system for carrying out the functionalities described in this paper.
- a computer system will include a processor, memory, non-volatile storage, and an interface.
- a typical computer system will usually include at least a processor, memory, and a device (e.g., a bus) coupling the memory to the processor.
- the processor can be, for example, a general-purpose central processing unit (CPU), such as a microprocessor, or a special-purpose processor, such as a microcontroller.
- Memory of a computer system includes, by way of example but not limitation, random access memory (RAM), such as dynamic RAM (DRAM) and static RAM (SRAM).
- the memory can be local, remote, or distributed.
- Non-volatile storage is often a magnetic floppy or hard disk, a magnetic-optical disk, an optical disk, a read-only memory (ROM), such as a CD-ROM, EPROM, or EEPROM, a magnetic or optical card, or another form of storage for large amounts of data.
- Non-volatile storage can be local, remote, or distributed, but is optional because systems can be created with all applicable data available in memory.
- Software in a computer system is typically stored in non-volatile storage. Indeed, for large programs, it may not even be possible to store the entire program in memory. For software to run, if necessary, it is moved to a computer-readable location appropriate for processing, and for illustrative purposes in this paper, that location is referred to as memory. Even when software is moved to memory for execution, a processor will typically make use of hardware registers to store values associated with the software, and a local cache that, ideally, serves to speed up execution.
- a software program is assumed to be stored at an applicable known or convenient location (from non-volatile storage to hardware registers) when the software program is referred to as “implemented in a computer-readable storage medium.”
- a processor is considered “configured to execute a program” when at least one value associated with the program is stored in a register readable by the processor.
- a computer system can be controlled by operating system software, which is a software program that includes a file management system, such as a disk operating system.
- file management system is typically stored in the non-volatile storage and causes the processor to execute the various acts required by the operating system to input and output data and to store data in the memory, including storing files on the non-volatile storage.
- the bus of a computer system can couple a processor to an interface.
- Interfaces facilitate the coupling of devices and computer systems.
- Interfaces can be for input and/or output (I/O) devices, modems, or networks.
- I/O devices can include, by way of example but not limitation, a keyboard, a mouse or other pointing device, disk drives, printers, a scanner, and other I/O devices, including a display device.
- Display devices can include, by way of example but not limitation, a cathode ray tube (CRT), liquid crystal display (LCD), or some other applicable known or convenient display device.
- Modems can include, by way of example but not limitation, an analog modem, an ISDN modem, a cable modem, and other modems.
- Network interfaces can include, by way of example but not limitation, a token ring interface, a satellite transmission interface (e.g. “direct PC”), or other network interface for coupling a first computer system to a second computer system.
- An interface can be considered part of a device or computer system.
- Computer systems can be compatible with or implemented as part of or through a cloud-based computing system.
- a cloud-based computing system is a system that provides virtualized computing resources, software and/or information to client devices.
- the computing resources, software and/or information can be virtualized by maintaining centralized services and resources that the edge devices can access over a communication interface, such as a network.
- Cloud may be a marketing term and for the purposes of this paper can include any of the networks described herein.
- the cloud-based computing system can involve a subscription for services or use a utility pricing model. Users can access the protocols of the cloud-based computing system through a web browser or other container application located on their client device.
- a computer system can be implemented as an engine, as part of an engine, or through multiple engines.
- an engine includes at least two components: 1) a dedicated or shared processor or a portion thereof; 2) hardware, firmware, and/or software modules executed by the processor.
- a portion of one or more processors can include some portion of hardware less than all of the hardware comprising any given one or more processors, such as a subset of registers, the portion of the processor dedicated to one or more threads of a multi-threaded processor, a time slice during which the processor is wholly or partially dedicated to carrying out part of the engine's functionality, or the like.
- a first engine and a second engine can have one or more dedicated processors, or a first engine and a second engine can share one or more processors with one another or other engines.
- an engine can be centralized or its functionality distributed.
- An engine can include hardware, firmware, or software embodied in a computer-readable medium for execution by the processor.
- the processor transforms data into new data using implemented data structures and methods, such as is described with reference to the figures in this paper.
- Engines described in this paper, or the engines through which the systems and devices described in this paper can be implemented, can be cloud-based engines.
- a cloud-based engine is an engine that can run applications and/or functionalities using a cloud-based computing system. All or portions of the applications and/or functionalities can be distributed across multiple computing devices, and need not be restricted to only one computing device.
- the cloud-based engines can execute functionalities and/or modules that end users access through a web browser or container application without having the functionalities and/or modules installed locally on the end-users' computing devices.
- datastores are intended to include repositories having any applicable organization of data, including tables, comma-separated values (CSV) files, traditional databases (e.g., SQL), or other applicable known or convenient organizational formats.
- Datastores can be implemented, for example, as software embodied in a physical computer-readable medium on a general- or specific-purpose machine, in firmware, in hardware, in a combination thereof, or in an applicable known or convenient device or system.
- Datastore-associated components such as database interfaces, can be considered “part of” a datastore, part of some other system component, or a combination thereof, though the physical location and other characteristics of datastore-associated components is not critical for an understanding of the techniques described in this paper.
- Datastores can include data structures.
- a data structure is associated with a particular way of storing and organizing data in a computer so that it can be used efficiently within a given context.
- Data structures are generally based on the ability of a computer to fetch and store data at any place in its memory, specified by an address, a bit string that can be itself stored in memory and manipulated by the program.
- Some data structures are based on computing the addresses of data items with arithmetic operations; while other data structures are based on storing addresses of data items within the structure itself.
- Many data structures use both principles, sometimes combined in non-trivial ways.
- the implementation of a data structure usually entails writing a set of procedures that create and manipulate instances of that structure.
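- By way of a non-limiting illustration, the following sketch contrasts the two addressing principles just described; the Python names are illustrative only.

```python
# Illustrative contrast of the two addressing principles described above:
# an array locates elements by address arithmetic (base + index * size),
# while a linked structure stores the address of the next item in each node.
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node  # stored reference to the next item

array = [10, 20, 30]                   # addresses computed arithmetically
linked = Node(10, Node(20, Node(30)))  # addresses stored within the structure

print(array[1])           # 20, found by index arithmetic
print(linked.next.value)  # 20, found by following a stored reference
```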
- the datastores described in this paper can be cloud-based datastores.
- a cloud based datastore is a datastore that is compatible with cloud-based computing systems and engines.
- the network can be an applicable communications network, such as the Internet or an infrastructure network.
- the term “Internet” as used in this paper refers to a network of networks that use certain protocols, such as the TCP/IP protocol, and possibly other protocols, such as the hypertext transfer protocol (HTTP) for hypertext markup language (HTML) documents that make up the World Wide Web (“the web”).
- a network can include, for example, a wide area network (WAN), metropolitan area network (MAN), campus area network (CAN), or local area network (LAN), but the network could at least theoretically be of an applicable size or characterized in some other fashion (e.g., personal area network (PAN) or home area network (HAN), to name a couple of alternatives).
- Networks can include enterprise private networks and virtual private networks (collectively, private networks). As the name suggests, private networks are under the control of a single entity. Private networks can include a head office and optional regional offices (collectively, offices). Many offices enable remote users to connect to the private network offices via some other network, such as the Internet.
- the SLA metric datastore 104 is intended to represent a datastore that includes data structures representing declared performance objectives for software deployments.
- the SLA metric datastore 104 can include a service level indicator (SLI) data structure.
- an SLI is a measure of a service level provided by a service provider to a customer.
- SLIs form the basis of a service level objective (SLO), which in turn forms the basis of an SLA.
- the SLA metric datastore 104 can instead or in addition include an SLO data structure.
- one or both of an SLI and an SLO can be treated as an SLA metric.
- a combination of SLIs and/or SLOs could be formed into an SLA data structure, stored in the SLA metric datastore 104 , and converted into an aggregated declarative performance target.
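- As a minimal sketch of how such data structures might be combined (the field names and aggregation rule below are assumptions for illustration, not the schema of the SLA metric datastore 104):

```python
from dataclasses import dataclass

@dataclass
class SLI:
    """A measured service level indicator, e.g., p95 request duration."""
    name: str
    unit: str
    measured: float

@dataclass
class SLO:
    """An objective constraining one indicator."""
    sli_name: str
    comparator: str  # "<=" or ">="
    target: float

def to_declarative_target(slos: list[SLO]) -> dict:
    """Collapse a set of SLOs into one aggregated declarative performance
    target, keyed by indicator name, for the control system to consume."""
    return {slo.sli_name: (slo.comparator, slo.target) for slo in slos}

slos = [SLO("request_duration_p95_ms", "<=", 250.0),
        SLO("success_ratio", ">=", 0.999)]
print(to_declarative_target(slos))
# {'request_duration_p95_ms': ('<=', 250.0), 'success_ratio': ('>=', 0.999)}
```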
- the declared performance objectives are on behalf of a service consumer responsible, at least in part, for a software deployment.
- Service consumers can include entities that build, provide, or host software products that require inclusion of one or more services (e.g., compute resources) for desired function and operation.
- a service consumer can include a company, an organization, an institution, a venture, a group, a person, or some other applicable entity or group of entities.
- the feedforward control system for a software orchestration platform 106 is intended to represent a system that includes engines and datastores used for proactive scaling to prepare for predicted load ahead of time, thus mitigating provisioning delay for software deployments. Proactive scaling of applications and application resources enables provisioning of resources to meet future load.
- the feedforward control system for a software orchestration platform 106 reduces the cost of running applications by reducing resource consumption; improves performance of applications by constantly resourcing and scaling applications so they can meet performance objectives both for current load and for predicted load; allows users to be confident in the SLOs they have set by having a resourcing and scaling mechanism go out and meet it; and/or reduces the manual time and effort involved analyzing applications for the purposes of resourcing and scaling them correctly.
- the feedforward control system for a software orchestration platform 106 forecasts both random and regular workloads with up to 90% accuracy; preemptive resourcing results in an average of 10 times fewer SLA violations. Preemptive resourcing for an application means under-provisioning is at least ameliorated and, ideally, eliminated. In a specific implementation, the feedforward control system for a software orchestration platform 106 can learn from decisions made in order to improve forecasting, modeling, resource estimation, and other applicable decisions.
- the declarative performance interface engine 108 is intended to represent an engine used in coordination with a control system type technology (described in association with other engines of the feedforward control system for a software orchestration platform 106 ) to consume target performance metrics associated with a software deployment from the SLA metric datastore 104 and provide a declarative performance target to the control system type technology to devise an actuation program to achieve the target.
- the declarative performance interface engine 108 defines references or targets, the declaration of which can be characterized as a “declarative performance” and the definition of which can be characterized as a “declarative performance objective” or a “declarative performance target.”
- the declarative performance interface engine 108 enables a human or artificial agent of a service provider (or service consumer) to define an SLA metric, such as an SLI or an SLO, for storage in the SLA metric datastore 104 as targets for which the feedforward control system for a software orchestration platform 106 resources applications.
- the declarative performance interface engine 108 allows a service provider to rely upon a declared SLO (or an SLO derived from one or more declared SLIs) for an SLA offered to a service consumer.
- implementing the declarative performance interface engine 108 and control system type technology as described in this paper resulted in an average 70% increase in SLA compliance.
- the predictive autoscaling and resource optimization operator engine 110 is intended to represent an engine that autoscales in response to predicted resource needs without over-provisioning. Resource optimization is intended to mean provisioning the minimum resources needed to meet declared performance objectives. In a specific implementation, the predictive autoscaling and resource optimization operator engine 110 makes changes automatically, recommends more cost effective SLOs, and sends alerts regarding potential performance degradation. In a specific implementation, the predictive autoscaling and resource optimization operator engine 110 is robust against both seasonal and random application load and resource signatures by using a deep learning approach sensitive to trends and seasonality, and is trained to be sensitive to leading indicators of random bursts.
- the predictive autoscaling and resource optimization operator engine 110 is easy to install with support for interchangeable metrics collection and load balancers; is able to operate on a cloud or on-prem; and can make recommendations in as little as 5 minutes (in coordination with engines and datastores of the feedforward control system for a software orchestration platform 106 ).
- the predictive autoscaling and resource optimization operator engine 110 does not interfere with a Kubernetes scheduler.
- Kubernetes is an open-source container-orchestration system for automating application deployment, scaling, and management maintained by the Cloud Native Computing Foundation. It aims to provide a “platform for automating deployment, scaling, and operations of application containers across clusters of hosts”. It works with a range of container tools, including Docker.
- Many cloud services offer a Kubernetes-based platform or infrastructure as a service (PaaS or IaaS) on which Kubernetes can be deployed as a platform-providing service.
- Many vendors also provide their own branded Kubernetes distributions.
- the document entitled “The Kubernetes Architectural Roadmap” by Brian Grant, Tim Hockin, and Clayton Coleman, last updated Apr. 20, 2017, is incorporated herein by reference.
- the predictive autoscaling and resource optimization operator engine 110 works well with horizontal and cluster autoscalers.
- a Kubernetes horizontal pod autoscaler automatically scales the number of pods in a replication controller, deployment or replicaset based on observed CPU utilization.
- Oracle Cloud Platform allows server instances to automatically scale a cluster in or out by defining an auto-scaling rule based on CPU and/or memory utilization to determine when to add or remove nodes.
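- For reference, the documented Kubernetes horizontal pod autoscaler scaling rule is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue); the sketch below restates it in Python as the reactive baseline alongside which the predictive operator works.

```python
import math

def hpa_desired_replicas(current_replicas: int,
                         current_metric: float,
                         target_metric: float) -> int:
    """Kubernetes HPA rule: scale proportionally to the ratio of the
    observed metric to its target, rounding up."""
    return math.ceil(current_replicas * current_metric / target_metric)

# Example: 4 pods averaging 90% CPU against a 60% target scale out to 6 pods.
print(hpa_desired_replicas(4, 90.0, 60.0))  # 6
```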
- the predictive autoscaling and resource optimization operator engine 110 receives as input 1) measured outputs from the convergent deployment, resource, and application level metrics collection engine 120 and 2) predicted outputs from the application load forecasting engine 114 . In a specific implementation, the predictive autoscaling and resource optimization operator engine 110 determines measured error from performance objectives, which is provided to the dynamics estimation engine 112 .
- the dynamics estimation engine 112 is intended to represent an engine that estimates a minimum amount of resources needed to meet service level objectives under predicted and current load.
- the dynamics estimation engine 112 models application behavior in terms of resource utilization and models performance as a response to load. By modeling an application's response and resource utilization under load, it becomes possible to make estimations for vertical and horizontal autoscaling and, based on the modeling, to estimate how many resources will be used under a certain load, along with corresponding service indicators such as time to process a request; a deployment request is set accordingly.
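- A minimal sketch of such dynamics estimation follows, assuming a simple linear response (the engine described above learns its models from feedback; the fitted form and numbers here are illustrative):

```python
import numpy as np

# Observed per-replica behavior under load (illustrative numbers).
load = np.array([100.0, 200.0, 400.0, 800.0])   # requests/sec per replica
cpu = np.array([0.12, 0.22, 0.43, 0.85])        # CPU cores used
latency = np.array([40.0, 55.0, 90.0, 170.0])   # p95 latency in ms

cpu_slope, cpu_intercept = np.polyfit(load, cpu, 1)
lat_slope, lat_intercept = np.polyfit(load, latency, 1)

def estimate(load_per_replica: float) -> tuple[float, float]:
    """Predicted (cpu_cores, p95_latency_ms) for one replica at a load."""
    return (cpu_slope * load_per_replica + cpu_intercept,
            lat_slope * load_per_replica + lat_intercept)

def max_load_per_replica(latency_slo_ms: float) -> float:
    """Invert the performance model: the highest per-replica load that
    still meets the latency objective, which sizes a deployment request."""
    return (latency_slo_ms - lat_intercept) / lat_slope

print(estimate(500.0))              # resources and latency at 500 req/s
print(max_load_per_replica(120.0))  # per-replica ceiling for a 120 ms SLO
```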
- the application load forecasting engine 114 provides a load estimate out to a future time. Depending upon implementation-specific factors, the future time is configurable to be as little as a minute or as much as an hour into the future. In a specific implementation, the application load forecasting engine 114 provided 91% accurate load forecasting.
- the application load forecasting engine 114 can forecast seasonal, trendy, bursty, and random load with a high degree of accuracy (from at least 83% to over 95%).
- signature limits can be set on a deployment according to a utilization pattern (e.g., bursty resulting in a higher limit vs. stable resulting in a lower limit).
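- The engine described above uses a learned (deep learning) forecaster; as a simple stand-in, the sketch below produces a seasonal-naive forecast with a provisioning-insurance margin (the two-standard-deviation heuristic discussed later in this paper):

```python
import numpy as np

def seasonal_naive_forecast(history: np.ndarray, season: int, horizon: int,
                            insurance_stddevs: float = 2.0) -> np.ndarray:
    """Repeat the last observed season out to the horizon, then add a
    provisioning-insurance margin scaled by the standard deviation of
    the seasonal-naive model's own residuals. A toy stand-in for the
    learned forecaster described above."""
    base = np.resize(history[-season:], horizon)
    residuals = history[season:] - history[:-season]
    return base + insurance_stddevs * residuals.std()

# Hourly load with a daily (24-point) season.
t = np.arange(24 * 7)
rng = np.random.default_rng(0)
history = 100 + 40 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 5, t.size)
print(seasonal_naive_forecast(history, season=24, horizon=12).round(1))
```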
- the minimum cost optimization engine 116 uses current load, the forecasted load from the application load forecasting engine 114 , the application behavior model from the dynamics estimation engine 112 , and declared objectives from the declarative performance interface engine 108 to find a minimum cost to run the modeled application at a performance appropriate for the declared objectives, which is a tradeoff between replicas of an application (horizontal scale) and resources (vertical scale).
- a focus on optimization for performance objectives results in cost optimization.
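- A hedged sketch of the horizontal/vertical tradeoff follows; the capacity model and price are assumptions for illustration, whereas the engine described above uses the learned behavior and performance models:

```python
COST_PER_CORE_HOUR = 0.04  # assumed unit price

def capacity(cpu_cores: float) -> float:
    """Assumed per-replica throughput (req/s) as a function of CPU size;
    in the described system this comes from the dynamics estimation engine."""
    return 900.0 * cpu_cores

def min_cost_config(forecasted_load: float,
                    replica_options=range(1, 21),
                    cpu_options=(0.25, 0.5, 1.0, 2.0, 4.0)):
    """Exhaustively search replica counts (horizontal scale) and per-replica
    CPU sizes (vertical scale) for the cheapest configuration that still
    serves the forecasted load."""
    best = None
    for replicas in replica_options:
        for cpu in cpu_options:
            if replicas * capacity(cpu) >= forecasted_load:
                cost = replicas * cpu * COST_PER_CORE_HOUR
                if best is None or cost < best[0]:
                    best = (cost, replicas, cpu)
    return best  # (hourly cost, replicas, cores per replica)

print(min_cost_config(2500.0))  # e.g., (0.12, 3, 1.0) under these assumptions
```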
- the optimal configuration for scale resources actuator engine 118 is intended to represent an actuator that is unique to a problem space.
- a forecasting model is made robust against different time series profiles, in contrast to parametric time series models, which are by their nature tuned to one type of time series profile. These profiles include on-off workloads, bursty workloads, workloads with various trends, and workloads with different seasonality components (seconds, minutes, hours, etc.). Robustness can be accomplished by training a model offline for forecasting against many different time series profiles; a recurrent neural network can be utilized for this purpose. The offline model is then deployed in the system, and the optimal configuration for scale resources actuator engine 118 can be characterized as unique to a problem space associated with one type of time series profile.
- the optimal configuration for scale resources actuator engine 118 executes the optimal configuration for scale resources, such as number of replicas, size of resource requests, and quality of service (as defined by limits with which to kill or throttle application resource usage).
- the optimal configuration for scale resources actuator engine 118 uses heuristics unique to the scale resource such as maximum allocatable resources and oscillation damping through consensus-based recommendations.
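- The sketch below illustrates those two heuristics under stated assumptions; the window size and the max-of-window consensus rule are illustrative choices, not the mechanism described above:

```python
from collections import deque

class DampedActuator:
    """Clamp recommendations to the maximum allocatable resources and damp
    oscillation by acting on the consensus of a window of recent
    recommendations rather than on each one individually."""
    def __init__(self, max_allocatable: int, window: int = 5):
        self.max_allocatable = max_allocatable
        self.recent = deque(maxlen=window)
        self.current = 1

    def recommend(self, replicas: int) -> int:
        self.recent.append(min(replicas, self.max_allocatable))
        # Scale up immediately to the window max; scale down only once the
        # whole window agrees, so brief dips do not cause flapping.
        self.current = max(self.recent)
        return self.current

act = DampedActuator(max_allocatable=50)
for r in [10, 12, 9, 9, 9, 9, 9]:
    print(act.recommend(r))  # holds at 12 until it ages out of the window
```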
- the optimal configuration for scale resources actuator engine 118 causes an application to be executed as one of the convergent deployments 122 .
- the convergent deployment, resource, and application level metrics collection engine 120 is intended to represent an engine that measures feedback and feedforward (forecasting) based on current performance and predicted load.
- the feedback can come in the form of system output from the optimal configuration for scale resources actuator engine 118 or the convergent deployments 122 , or in the form of other data associated with the relevant convergent deployment of the convergent deployments 122 provided through or observed on the CRM 102 .
- the feedback and feedforward are used by the predictive autoscaling and resource optimization operator engine 110 to adjust recommendations.
- the convergent deployment, resource, and application level metrics collection engine 120 monitors performance indicators and resource usage, including SLIs such as request count and request duration, and resource utilization metrics such as memory, CPU, disk I/O, and network I/O per container and pod.
- the convergent deployments 122 are intended to represent engines executing applications with a convergent configuration.
- a convergent configuration is one that is executed by the optimal configuration for scale resources actuator engine 118 to incorporate predictive autoscaling and resource optimization.
- the load distribution and metrics engine 124 is intended to represent an engine that designates how application load metrics are collected and configured to be distributed. In a specific implementation, the load distribution and metrics engine 124 performs load balancing on traffic to (or from) the convergent deployments 122 . Load balancing improves the distribution of workloads across multiple computing resources, such as computers, a computer cluster, network links, CPUs, or disk drives. Load balancing aims to optimize resource use, maximize throughput, minimize response time, and avoid overload of any single resource. Using multiple components with load balancing instead of a single component may increase reliability and availability through redundancy. Load balancing usually involves dedicated software or hardware, such as a multilayer switch or a Domain Name System server process.
- Load balancing differs from channel bonding in that load balancing divides traffic between network interfaces on a network socket (OSI model layer 4) basis, while channel bonding implies a division of traffic between physical interfaces at a lower level, either per packet (OSI model Layer 3) or on a data link (OSI model Layer 2) basis with a protocol such as shortest path bridging.
- a proxy takes on the load distribution and metrics functionality in lieu of what would likely be referred to as a “load balancer.”
- load distribution and metrics engine is a more general term for application load metrics collection and configuration for distribution than load balancer, proxy, or other applicable specific application load distribution and metrics system.
- an application load metrics collection and distribution engine is used without a load balancer.
- the load distribution and metrics engine 124 can collect metrics from and configure an application proxy such as Envoy, an L7 proxy and communication bus designed for large modern service oriented architectures.
- the load distribution and metrics engine 124 can make use of other load systems, such as message queues.
- the load distribution and metrics engine 124 can be used across different workloads (instead of or in addition to network-based workloads to which a load balance caters).
- the load distribution and metrics engine 124 is informed by the optimal configuration for scale resources actuator engine 118 to balance traffic in a manner appropriate for the convergent deployments 122 .
- the convergent deployment, resource, and application level metrics collection engine 120 can also collect data from the load distribution and metrics engine 124 .
- a human or artificial agent of a service provider uses the declarative performance interface engine 108 to store an SLA metric, such as an SLI or an SLO, in the SLA metric datastore 104 .
- the agent can store an SLA metric in the SLA metric datastore 104 through an SLA metric datastore interface (not shown).
- the SLA metric datastore 104 could be referred to as an SLI datastore, an SLO datastore, or an SLA datastore.
- the declarative performance interface engine 108 converts the data structures into declarative performance objectives for consumption by the predictive autoscaling and resource optimization operator engine 110 .
- the dynamics estimation engine 112 models application behavior in terms of resource utilization and models performance as a response to load, and the application load forecasting engine 114 provides a load estimate out to a future time.
- the minimum cost optimization engine 116 uses the forecasted load from the application load forecasting engine 114 , the application behavior model from the dynamics estimation engine 112 , and declared objectives from the declarative performance interface engine 108 , to find a minimum cost to run the modeled application at a performance appropriate for the declared objectives.
- the predictive autoscaling and resource optimization operator engine 110 provides the minimum cost optimization parameters to the optimal configuration for scale resources actuator engine 118 , which executes the convergent deployments 122 and configures the load distribution and metrics engine 124 in accordance with the minimum cost optimization parameters.
- configuring the load distribution and metrics engine 124 involves making provisioned resources known to the load distribution and metrics engine 124 , which may occur as a matter of course.
- the convergent deployment, resource, and application level metrics collection engine 120 monitors channels and other resources associated with the convergent deployments 122 , which can be processed (to generate, e.g., measured outputs) and provided as feedback to the predictive autoscaling and resource optimization operator engine 110 , the dynamics estimation engine 112 , and the application load forecasting engine 114 .
- the feedback can be used to provide an initial data set or to improve upon modeling and recommendations over time.
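- Taken together, the operation described above amounts to a feedforward control loop; the following sketch is hypothetical glue, with each function standing in for the correspondingly numbered engine rather than reflecting an actual interface:

```python
import time

def control_loop(collect_metrics, forecast_load, estimate_dynamics,
                 minimize_cost, actuate, objectives, interval_s=60):
    """Hypothetical outer loop: feedback (measured metrics) and feedforward
    (forecasted load) both drive the minimum cost optimization, whose
    parameters the actuator applies to the convergent deployments."""
    while True:
        metrics = collect_metrics()                             # engine 120
        predicted = forecast_load(metrics)                      # engine 114
        models = estimate_dynamics(metrics)                     # engine 112
        params = minimize_cost(predicted, models, objectives)   # engine 116
        actuate(params)                                         # engines 118/124
        time.sleep(interval_s)                                  # runs until stopped
```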
- FIG. 2 depicts a graph 200 that compares resource provisioning pursuant to the recommendations of a reactive recommendation engine with resource provisioning pursuant to the recommendations of a predicted recommendation engine.
- the graph 200 includes a resource consumption curve 202 , a predictive provisioning curve 204 , a reactive provisioning curve 206 , a performance degradation area 208 , and a wasted cost area 210 .
- the resource consumption curve 202 is intended to represent amount of resources used (y axis) over time (x axis).
- the predictive provisioning curve 204 is intended to represent resources provisioned pursuant to recommendations of a predictive recommendation engine.
- the reactive provisioning curve 206 is intended to represent resources allocated pursuant to recommendations of a reactive recommendation engine, as an alternative to a predictive recommendation engine.
- the performance degradation area 208 is greater for the reactive provisioning curve 206 than it is for the predictive provisioning curve 204 .
- the predictive provisioning curve 204 matches or slightly exceeds the resource consumption curve 202 over the measured time period, which means there is no performance degradation for the system utilizing the predictive recommendation engine. It may be noted that performance degradation occurs when a provisioning curve falls below the resource consumption curve 202 , which means under-provisioning has occurred.
- the wasted cost area 210 is greater for the reactive provisioning curve 206 than it is for the predictive provisioning curve 204 . While the predictive provisioning curve 204 exceeds the resource consumption curve 202 at most points of the graph 200 , the amount of wasted cost is substantially less than that associated with the reactive provisioning curve 206 .
- a reactive provisioning system cannot achieve correct provisioning within 5 minutes of reacting to load because, while there are scale-up events before 5 minutes (e.g., with a 1-2 minute reaction time), following the curve downwards is difficult and a reactive algorithm degrades over time. In a specific implementation, correct provisioning (with provisioning insurance) takes less than 5 minutes after load.
- because a reactive provisioning system cannot achieve correct provisioning before it receives metrics, calculates requests, and actuates them, it is impossible for a reactive system to act within a minute of load, which is well within the capabilities of the specific implementation.
- correct provisioning can be achieved x minutes ahead of load (or ahead of the resource being required or consumed), with x being a configurable look-ahead time greater than 1 minute and less than 1 hour.
- Wasted cost occurs when a provisioning curve is more than the resource consumption curve 202 plus provisioning insurance.
- provisioning insurance can be defined as the x% likelihood that a resource value will be under a provisioned amount, computed as a 95% likelihood (2 standard deviations above the mean value) and/or as a peak-to-mean ratio (crest factor). In a specific implementation, both of these heuristics are used.
- the 95% likelihood a resource value will be under a provisioned amount is used for requests (e.g., how many resources to request) while the crest factor is used for determining limits (e.g., how many resources an application is allowed to consume beyond the request before killing, throttling, or compressing the resource usage).
- the difference between requests and limits in a software orchestration platform can be referred to as quality of service (QoS), which defines whether you always guarantee resources are available (i.e., requests and limits are the same) or you allow software to burst above its request as necessary when resources are available (i.e., limits are above requests).
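- The two heuristics and the resulting QoS distinction can be sketched as follows; the field names mimic a Kubernetes resources block, but the code is illustrative rather than the described implementation:

```python
import numpy as np

def requests_limits_and_qos(cpu_samples: np.ndarray) -> dict:
    """Size the request so ~95% of observed usage falls under it
    (mean + 2 standard deviations) and derive the limit from the
    peak-to-mean ratio (crest factor), per the heuristics above."""
    mean, std, peak = cpu_samples.mean(), cpu_samples.std(), cpu_samples.max()
    request = mean + 2.0 * std
    limit = (peak / mean) * request  # crest factor scales the limit
    qos = "Guaranteed" if np.isclose(limit, request) else "Burstable"
    return {"requests": {"cpu": round(float(request), 3)},
            "limits": {"cpu": round(float(limit), 3)},
            "qos": qos}

usage = np.array([0.30, 0.35, 0.32, 0.90, 0.33, 0.31])  # cores, one burst
print(requests_limits_and_qos(usage))
```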
- FIG. 3 is a diagram 300 of a total forecasted load versus total actual load chart 302 and associated code display 304 .
- the total forecasted load versus total actual load chart 302 has an x axis of seconds of a timestamp and a y axis of count per second of load. As can be seen, the predicted load curve always exceeds the request count curve by a relatively small margin (the provisioning insurance margin).
- the associated code display 304 indicates the resources include limits and requests, which were described in the preceding paragraph.
- FIG. 4 depicts a flowchart 400 of an example of a method of predictive autoscaling and resource optimization.
- the flowchart 400 starts at module 402 with converting an SLA metric data structure into a declarative performance objective.
- the SLA metric data structure can be stored in an SLA metric datastore, such as the SLA metric datastore 104 described in association with FIG. 1 .
- a declarative performance interface engine such as the declarative performance interface engine 108 described in association with FIG. 1 , can convert the SLA metric data structure into a declarative performance objective.
- the flowchart 400 continues to module 404 with estimating a load forecast out to a future time.
- the load forecast may be of limited use because it amounts to little more than a guess based upon known data regarding the deployment, without the benefit of feedback related to resource utilization and performance post-deployment. It typically takes a few minutes to receive and process such feedback, at which point the load forecast can become a much more predictive estimate. Accordingly, the module 404 could be skipped as unnecessary until such time as data becomes useful for making accurate predictions.
- An application load forecasting engine such as the application load forecasting engine 114 described in association with FIG. 1 , can estimate a load forecast out to a future time.
- the flowchart 400 continues to module 406 with using the load forecast and the declarative performance objective to generate minimum cost optimization parameters.
- the forecast may be of limited use for accurately generating minimum cost optimization parameters.
- although a model estimate could be provided in lieu of a performance model and an application behavior model generated in response to feedback associated with a deployment, such models are of limited value.
- the module 406 can also use the performance model and the application behavior model to generate minimum cost optimization parameters. (See the description associated with the modules 416 and 418 below.)
- a minimum cost optimization engine such as the minimum cost optimization engine 116 described in association with FIG. 1 , can use the load forecast, the performance model (if applicable), the application behavior model (if applicable), and the declarative performance objective to generate minimum cost optimization parameters.
- the flowchart 400 continues to module 408 with executing convergent deployments in accordance with the minimum cost optimization parameters.
- An optimal configuration for scale resources actuator engine such as the optimal configuration for scale resources actuator engine 118 described in association with FIG. 1 , can execute convergent deployments in accordance with the minimum cost optimization parameters.
- the flowchart 400 continues in parallel to module 410 with configuring a load distribution and metrics engine in accordance with the minimum cost optimization parameters.
- although any module can be configured for parallel execution with another, the modules 408 and 410 are relatively likely to be carried out in parallel, so the illustration makes the parallelism explicit. Of course, the modules could be rearranged for serial processing.
- An optimal configuration for scale resources actuator engine such as the optimal configuration for scale resources actuator engine 118 described in association with FIG. 1 , can configure a load distribution and metrics engine in accordance with the minimum cost optimization parameters.
- the flowchart 400 continues to module 412 with monitoring resources (including channels) associated with the convergent deployments; a convergent deployment and resource metrics collection engine can perform the monitoring.
- the flowchart 400 continues to module 414 with providing feedback associated with the convergent deployments.
- a convergent deployment and resource metrics collection engine such as the convergent deployment, resource, and application level metrics collection engine 120 described in association with FIG. 1 , can provide feedback associated with the convergent deployments.
- the flowchart 400 returns to module 404 and continues as described previously and also (in parallel) continues to module 416 with modeling application behavior in terms of resource utilization.
- the modeling of application behavior requires a combined total of up to approximately 5 minutes to receive, process, and perform machine learning on feedback from module 414 .
- the module 416 is not introduced as quickly as the module 404 , though a model “stand-in” could be used.
- the flowchart 400 could loop multiple times through other modules before module 416 completes.
- a dynamics estimation engine such as the dynamics estimation engine 112 described in association with FIG. 1 , can model application behavior in terms of resource utilization. From module 416 , the flowchart 400 returns to module 406 and continues as described previously.
- the flowchart 400 also continues to module 418 from module 414 with modeling performance as a response to load.
- the modeling of performance requires a combined total of up to approximately 5 minutes to receive, process, and perform machine learning on feedback from module 414 .
- the module 418 is not introduced as quickly as the module 404 , though a model “stand-in” could be used.
- the flowchart 400 could loop multiple times through other modules before module 418 completes.
- a dynamics estimation engine such as the dynamics estimation engine 112 described in association with FIG. 1 , can model performance as a response to load. From module 418 , the flowchart 400 returns to module 406 and continues as described previously.
- modules 404 , 416 , and 418 can be processed in parallel, though one or more of the modules can, for practical purposes, be skipped in a second loop from module 414 to modules 404 , 416 , and 418 if no updates to a model or forecast are made relative to the model or forecast from a first loop.
- the modules 404 , 416 , and 418 could also be rearranged for serial processing.
- the modules 404 , 416 , and 418 could conceivably come before the module 402 if a deployment is made without SLA metric data, which is provided later in the process.
- the module 402 could be repeated if declarative performance objectives change (not shown).
- FIG. 5 depicts a flowchart 500 of an example of generating predictive autoscaling and resource optimization results in association with a machine learning process.
- the flowchart 500 starts at module 502 with monitoring performance indicators and resource usage.
- a human or artificial agent of a service provider (or service consumer) can provide new performance indicators following, for example, a review of convergent deployment performance.
- a declarative performance interface engine such as the declarative performance interface engine 108 described in association with FIG. 1 , can monitor performance indicators.
- a convergent deployment and resource metrics collection engine such as the convergent deployment, resource, and application level metrics collection engine 120 described in association with FIG. 1 , can monitor resource usage.
- the flowchart 500 continues to module 504 with forecasting application load and seasonality.
- Seasonality can be illustrated in association with a use case, which is, in this example, a shoe company e-commerce deployment.
- Successful shoe company e-commerce deployments typically have stable traffic with some seasonality at Black Friday and during the holidays, plus some seemingly random spikes (e.g., when new shoes are released).
- reactive autoscaling is suboptimal at these times.
- Systems engineers will manually over-provision so SLOs are met.
- a predictive autoscaling and resource optimization system, as described in this paper, is able to learn these seasonalities and provision a correct amount of resources (with provisioning insurance) for these events without manual intervention.
- An application load forecasting engine such as the application load forecasting engine 114 described in association with FIG. 1 , can forecast application load and seasonality.
- the flowchart 500 continues to module 506 with learning a behavior function of an application under load.
- a dynamics estimation engine such as the dynamics estimation engine 112 described in association with FIG. 1 , can learn a behavior function of an application under load.
- the flowchart 500 continues to module 508 with estimating resources used at forecasted demand for resource requests.
- An application load forecasting engine such as the application load forecasting engine 114 described in association with FIG. 1 , can estimate resources used at forecasted demand for resource requests.
- the flowchart 500 continues to module 510 with estimating a forecast pattern for setting resource limits.
- An application load forecasting engine such as the application load forecasting engine 114 described in association with FIG. 1 , can estimate a forecast pattern for setting resource limits.
- the flowchart 500 continues to module 512 with minimizing resources needed to meet the forecasted demand.
- a minimum cost optimization engine such as the minimum cost optimization engine 116 described in association with FIG. 1 , can minimize resources needed to meet the forecasted demand.
- the flowchart 500 ends at module 514 with learning from decisions made in order to improve forecasting and resource estimation.
- An optimal configuration for scale resources actuator engine such as the optimal configuration for scale resources actuator engine 118 described in association with FIG. 1 , can benefit in convergent deployment from learning from decisions made in order to improve forecasting and resource estimation.
- FIG. 6 depicts a diagram 600 of an example of a system for generating minimum cost optimization parameters.
- the diagram 600 includes an SLA metric datastore 604, which may be implemented as the SLA metric datastore 104 described in association with FIG. 1; a declarative performance interface engine 608 coupled to the SLA metric datastore 604 and which may be implemented as the declarative performance interface engine 108 described in association with FIG. 1; a dynamics estimation engine 612, which may be implemented as the dynamics estimation engine 112 described in association with FIG. 1; an application load forecasting engine 614, which may be implemented as the application load forecasting engine 114 described in association with FIG. 1; a minimum cost optimization engine 616, which may be implemented as the minimum cost optimization engine 116 described in association with FIG. 1; a declarative performance datastore 626 coupled to the declarative performance interface engine 608 and the minimum cost optimization engine 616; a behavior model datastore 628 coupled to the dynamics estimation engine 612 and the minimum cost optimization engine 616; a performance model datastore 630 coupled to the dynamics estimation engine 612 and the minimum cost optimization engine 616; a convergent deployment and resource metrics datastore 632 coupled to the dynamics estimation engine 612 and the application load forecasting engine 614; a utilization pattern learning engine 634 coupled to the convergent deployment and resource metrics datastore 632; a forecasting model datastore 636 coupled to the application load forecasting engine 614 and the utilization pattern learning engine 634; a forecasted load datastore 638 coupled to the application load forecasting engine 614 and the minimum cost optimization engine 616; and a minimum cost optimization parameters datastore 640 coupled to the minimum cost optimization engine 616.
- the declarative performance interface engine 608 converts SLA metrics from the SLA metric datastore 604 to declarative performance data structures represented by the declarative performance datastore 626 .
- the declarative performance interface engine 608 may or may not receive instructions from a human or artificial agent of a service provider (or service consumer) to populate the SLA metric datastore 604 . If the SLA metric datastore 604 is modified, the declarative performance interface engine 608 converts the modification so as to match an intended SLO represented in the SLA metric datastore 604 with a declarative performance data structure in the declarative performance datastore 626 .
- the dynamics estimation engine 612 uses machine learning techniques, such as deep learning, to generate a behavior model, which is represented by the behavior model datastore 628 , and to generate a performance model, which is represented by the performance model datastore 630 .
- the models can be improved with feedback associated with applicable convergent deployments. Such feedback is represented by the convergent deployment and resource metrics datastore 632 .
- the convergent deployment and resource metric datastore 632 can be populated by a convergent deployment and resource metrics collection engine (not shown), which may be implemented as the convergent deployment, resource, and application level metrics collection engine 120 described in association with FIG. 1 .
- the utilization pattern learning engine 634 uses deep learning to understand workload to generate models for seasonal load, trendy load, bursty load, and random load. Based on a request and an understanding of workload, signature limits can be set on a deployment according to a utilization pattern (e.g., bursty resulting in a higher limit vs. stable resulting in a lower limit).
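- As a crude stand-in for that deep learning (the thresholds below are assumptions for illustration), a load series can be labeled using simple statistics, with bursty patterns mapping to higher signature limits and stable patterns to lower ones:

```python
import numpy as np

def classify_utilization(series: np.ndarray, season: int = 24) -> str:
    """Label a load series as bursty, seasonal, trendy, or random using
    crest factor, lag autocorrelation, and a fitted slope."""
    crest = float(series.max() / (series.mean() + 1e-9))
    x = (series - series.mean()) / (series.std() + 1e-9)
    autocorr = float(np.corrcoef(x[:-season], x[season:])[0, 1])
    slope = float(np.polyfit(np.arange(series.size), series, 1)[0])
    if crest > 2.5:
        return "bursty"    # utilization pattern warranting a higher limit
    if autocorr > 0.6:
        return "seasonal"
    if abs(slope) * series.size > series.std():
        return "trendy"
    return "random"

t = np.arange(24 * 14)
rng = np.random.default_rng(1)
daily = 100 + 30 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 3, t.size)
print(classify_utilization(daily))  # seasonal
```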
- the result of the deep learning is a forecasting model represented by the forecasting model datastore 636 .
- the forecasting model datastore 636 can be improved with feedback associated with applicable convergent deployments. Such feedback is represented by the convergent deployment and resource metrics datastore 632 .
- the application load forecasting engine 614 uses one or more forecasting models from the forecasting model datastore 636 and feedback from the convergent deployment and resource metrics datastore 632 to estimate resource usage at a future time; this forecasted load is represented by the forecasted load datastore 638 .
- the minimum cost optimization engine 616 uses the declarative performance datastore 626 , the behavior model datastore 628 , the performance model datastore 630 , and the forecasted load datastore 638 to generate minimum cost optimization parameters, which are represented by the minimum cost optimization parameters datastore 640 .
- the minimum cost optimization parameters can be used by a software deployment platform that can include, for example, an optimal configuration for scale resources actuator engine (not shown), which may be implemented as the optimal configuration for scale resources actuator engine 118 described in association with FIG. 1 .
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Quality & Reliability (AREA)
- Computing Systems (AREA)
- Computer Hardware Design (AREA)
- Mathematical Physics (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Debugging And Monitoring (AREA)
Abstract
Techniques for predictive autoscaling and resource optimization of software deployments. In an implementation, users declare performance objectives, and machine learning of application behavior and load profile is used to determine minimum cost resourcing to meet the declared performance objectives. In an embodiment, convergent deployments are monitored and related feedback is provided to improve forecasting, behavior modeling, and resource estimation over time.
Description
- Capacity planning and cost optimization for software operations are areas of ongoing research and development. Over-provisioning wastes resources and adds cost, yet industry deployments are, on average, 80-93% over-provisioned. Under-provisioning causes performance degradation and violation of SLAs. Research shows that performance degradation in web applications can result in up to a 75% increase in churn. "Preliminary results . . . of cloud service availability show an average of 7.738 hours unavailable per year or 99.91% availability . . . The cost of these failures amounts for almost 285 million USDs based on hourly costs accepted in industry." Downtime Statistics of Current Cloud Solutions (Updated version—March 2014) by Cérin et al.
- Manually determining capacity is practically always wrong due to the dynamic nature of resource utilization and application load. Reactive autoscaling, by definition, fails to meet load ahead of time. Threshold-based autoscaling requires significant work and fails to align with defined service level objectives even when using custom application-level metrics; at best, an 80% utilization threshold leaves 20% of capacity unused. Research shows that threshold-based autoscalers fail to adapt to changing workloads.
- Techniques to address these and other deficiencies associated with capacity planning and cost optimization are desirable.
- Disclosed is a cost and performance management solution for software resourcing and scaling. In a specific implementation, the system operates on software deployment orchestration platforms such as Kubernetes that expose application and resource metrics as well as provide standard scaling and resourcing mechanisms. Users declare performance objectives and the system learns application behavior and load profile to determine minimum cost resourcing to meet the declared performance objectives.
- FIG. 1 depicts a diagram of an example of a predictive autoscaling and resource optimization system.
- FIG. 2 depicts a graph that compares resource provisioning pursuant to the recommendations of a reactive recommendation engine with resource provisioning pursuant to the recommendations of a predictive recommendation engine.
- FIG. 3 is a diagram of a total forecasted load versus total actual load chart and associated code display.
- FIG. 4 depicts a flowchart of an example of a method of predictive autoscaling and resource optimization.
- FIG. 5 depicts a flowchart of an example of generating predictive autoscaling and resource optimization results in association with a machine learning process.
- FIG. 6 depicts a diagram of an example of a system for generating minimum cost optimization parameters.
- FIG. 1 depicts a diagram 100 of an example of a predictive autoscaling and resource optimization system. As used in this paper, resources can be characterized as central processing unit (CPU), memory, network input/output (I/O), disk I/O, graphics processing unit (GPU), and/or other applicable resources. The diagram 100 includes a computer-readable medium (CRM) 102, a service level agreement (SLA) metric datastore 104 coupled to the CRM 102, a feedforward control system for a software orchestration platform 106 coupled to the CRM 102, convergent deployments 122 coupled to the CRM 102, and a load distribution and metrics engine 124 coupled to the CRM 102. The feedforward control system for a software orchestration platform 106 includes a declarative performance interface engine 108, a predictive autoscaling and resource optimization operator engine 110, a dynamics estimation engine 112, an application load forecasting engine 114, a minimum cost optimization engine 116, an optimal configuration for scale resources actuator engine 118, and a convergent deployment, resource, and application level metrics collection engine 120. A predictive autoscaling and resource optimization system can be fully implemented; implemented as a staged integration (e.g., into a Kubernetes cluster), with customers having control over whether changes are made live and how much change is allowed; or implemented with a sample of platform data to provide a cost savings and/or performance improvement report.
- The CRM 102 is intended to represent a computer system or network of computer systems. A "computer system," as used herein, may include or be implemented as a specific purpose computer system for carrying out the functionalities described in this paper. In general, a computer system will include a processor, memory, non-volatile storage, and an interface. A typical computer system will usually include at least a processor, memory, and a device (e.g., a bus) coupling the memory to the processor. The processor can be, for example, a general-purpose central processing unit (CPU), such as a microprocessor, or a special-purpose processor, such as a microcontroller.
- Memory of a computer system includes, by way of example but not limitation, random access memory (RAM), such as dynamic RAM (DRAM) and static RAM (SRAM). The memory can be local, remote, or distributed. Non-volatile storage is often a magnetic floppy or hard disk, a magnetic-optical disk, an optical disk, a read-only memory (ROM), such as a CD-ROM, EPROM, or EEPROM, a magnetic or optical card, or another form of storage for large amounts of data. During execution of software, some of this data is often written, by a direct memory access process, into memory by way of a bus coupled to non-volatile storage. Non-volatile storage can be local, remote, or distributed, but is optional because systems can be created with all applicable data available in memory.
- Software in a computer system is typically stored in non-volatile storage. Indeed, for large programs, it may not even be possible to store the entire program in memory. For software to run, if necessary, it is moved to a computer-readable location appropriate for processing, and for illustrative purposes in this paper, that location is referred to as memory. Even when software is moved to memory for execution, a processor will typically make use of hardware registers to store values associated with the software, and a local cache that, ideally, serves to speed up execution. As used herein, a software program is assumed to be stored at an applicable known or convenient location (from non-volatile storage to hardware registers) when the software program is referred to as “implemented in a computer-readable storage medium.” A processor is considered “configured to execute a program” when at least one value associated with the program is stored in a register readable by the processor.
- In one example of operation, a computer system can be controlled by operating system software, which is a software program that includes a file management system, such as a disk operating system. One example of operating system software with associated file management system software is the family of operating systems known as Windows from Microsoft Corporation of Redmond, Wash., and their associated file management systems. Another example of operating system software with its associated file management system software is the Linux operating system and its associated file management system. The file management system is typically stored in the non-volatile storage and causes the processor to execute the various acts required by the operating system to input and output data and to store data in the memory, including storing files on the non-volatile storage.
- The bus of a computer system can couple a processor to an interface. Interfaces facilitate the coupling of devices and computer systems. Interfaces can be for input and/or output (I/O) devices, modems, or networks. I/O devices can include, by way of example but not limitation, a keyboard, a mouse or other pointing device, disk drives, printers, a scanner, and other I/O devices, including a display device. Display devices can include, by way of example but not limitation, a cathode ray tube (CRT), liquid crystal display (LCD), or some other applicable known or convenient display device. Modems can include, by way of example but not limitation, an analog modem, an ISDN modem, a cable modem, and other modems. Network interfaces can include, by way of example but not limitation, a token ring interface, a satellite transmission interface (e.g., "direct PC"), or other network interface for coupling a first computer system to a second computer system. An interface can be considered part of a device or computer system.
- Computer systems can be compatible with or implemented as part of or through a cloud-based computing system. As used in this paper, a cloud-based computing system is a system that provides virtualized computing resources, software and/or information to client devices. The computing resources, software and/or information can be virtualized by maintaining centralized services and resources that the edge devices can access over a communication interface, such as a network. “Cloud” may be a marketing term and for the purposes of this paper can include any of the networks described herein. The cloud-based computing system can involve a subscription for services or use a utility pricing model. Users can access the protocols of the cloud-based computing system through a web browser or other container application located on their client device.
- A computer system can be implemented as an engine, as part of an engine, or through multiple engines. As used in this paper, an engine includes at least two components: 1) a dedicated or shared processor or a portion thereof; 2) hardware, firmware, and/or software modules executed by the processor. A portion of one or more processors can include some portion of hardware less than all of the hardware comprising any given one or more processors, such as a subset of registers, the portion of the processor dedicated to one or more threads of a multi-threaded processor, a time slice during which the processor is wholly or partially dedicated to carrying out part of the engine's functionality, or the like. As such, a first engine and a second engine can have one or more dedicated processors, or a first engine and a second engine can share one or more processors with one another or other engines. Depending upon implementation-specific or other considerations, an engine can be centralized or its functionality distributed. An engine can include hardware, firmware, or software embodied in a computer-readable medium for execution by the processor. The processor transforms data into new data using implemented data structures and methods, such as is described with reference to the figures in this paper.
- Engines described in this paper, or the engines through which the systems and devices described in this paper can be implemented, can be cloud-based engines. As used in this paper, a cloud-based engine is an engine that can run applications and/or functionalities using a cloud-based computing system. All or portions of the applications and/or functionalities can be distributed across multiple computing devices, and need not be restricted to only one computing device. In some embodiments, the cloud-based engines can execute functionalities and/or modules that end users access through a web browser or container application without having the functionalities and/or modules installed locally on the end-users' computing devices.
- As used in this paper, datastores are intended to include repositories having any applicable organization of data, including tables, comma-separated values (CSV) files, traditional databases (e.g., SQL), or other applicable known or convenient organizational formats. Datastores can be implemented, for example, as software embodied in a physical computer-readable medium on a general- or specific-purpose machine, in firmware, in hardware, in a combination thereof, or in an applicable known or convenient device or system. Datastore-associated components, such as database interfaces, can be considered “part of” a datastore, part of some other system component, or a combination thereof, though the physical location and other characteristics of datastore-associated components is not critical for an understanding of the techniques described in this paper.
- Datastores can include data structures. As used in this paper, a data structure is associated with a particular way of storing and organizing data in a computer so that it can be used efficiently within a given context. Data structures are generally based on the ability of a computer to fetch and store data at any place in its memory, specified by an address, a bit string that can be itself stored in memory and manipulated by the program. Thus, some data structures are based on computing the addresses of data items with arithmetic operations, while other data structures are based on storing addresses of data items within the structure itself. Many data structures use both principles, sometimes combined in non-trivial ways. The implementation of a data structure usually entails writing a set of procedures that create and manipulate instances of that structure. The datastores described in this paper can be cloud-based datastores. A cloud-based datastore is a datastore that is compatible with cloud-based computing systems and engines.
- Assuming a CRM includes a network, the network can be an applicable communications network, such as the Internet or an infrastructure network. The term “Internet” as used in this paper refers to a network of networks that use certain protocols, such as the TCP/IP protocol, and possibly other protocols, such as the hypertext transfer protocol (HTTP) for hypertext markup language (HTML) documents that make up the World Wide Web (“the web”). More generally, a network can include, for example, a wide area network (WAN), metropolitan area network (MAN), campus area network (CAN), or local area network (LAN), but the network could at least theoretically be of an applicable size or characterized in some other fashion (e.g., personal area network (PAN) or home area network (HAN), to name a couple of alternatives). Networks can include enterprise private networks and virtual private networks (collectively, private networks). As the name suggests, private networks are under the control of a single entity. Private networks can include a head office and optional regional offices (collectively, offices). Many offices enable remote users to connect to the private network offices via some other network, such as the Internet.
- Referring once again to the example of FIG. 1, the SLA metric datastore 104 is intended to represent a datastore that includes data structures representing declared performance objectives for software deployments. The SLA metric datastore 104 can include a service level indicator (SLI) data structure. In information technology, an SLI is a measure of a service level provided by a service provider to a customer. SLIs form the basis of a service level objective (SLO), which in turn forms the basis of an SLA. The SLA metric datastore 104 can instead or in addition include an SLO data structure. As such, as used in this paper, one or both of an SLI and an SLO can be treated as an SLA metric. Although it is assumed for illustrative purposes that granular SLIs or SLOs are converted into declarative performance targets, a combination of SLIs and/or SLOs could be formed into an SLA data structure, stored in the SLA metric datastore 104, and converted into an aggregated declarative performance target. In a specific implementation, the declared performance objectives are on behalf of a service consumer responsible, at least in part, for a software deployment. Service consumers can include entities that build, provide, or host software products that require inclusion of one or more services (e.g., compute resources) for desired function and operation. For example, a service consumer can include a company, an organization, an institution, a venture, a group, a person, or some other applicable entity or group of entities.
- The feedforward control system for a software orchestration platform 106 is intended to represent a system that includes engines and datastores used for proactive scaling to prepare for predicted load ahead of time, thus mitigating provisioning delay for software deployments. Proactive scaling of applications and application resources enables provisioning of resources to meet future load. In various embodiments, the feedforward control system for a software orchestration platform 106 reduces the cost of running applications by reducing resource consumption; improves performance of applications by constantly resourcing and scaling applications so they can meet performance objectives both for current load and for predicted load; allows users to be confident in the SLOs they have set by having a resourcing and scaling mechanism go out and meet them; and/or reduces the manual time and effort involved in analyzing applications for the purposes of resourcing and scaling them correctly. In a specific implementation, the feedforward control system for a software orchestration platform 106 forecasts both random and regular workloads with up to 90% accuracy; preemptive resourcing results in an average of 10 times fewer SLA violations. Preemptive resourcing for an application means under-provisioning is at least ameliorated and, ideally, eliminated. In a specific implementation, the feedforward control system for a software orchestration platform 106 can learn from decisions made in order to improve forecasting, modeling, resource estimation, and other applicable decisions.
- The declarative performance interface engine 108 is intended to represent an engine used in coordination with a control system type technology (described in association with other engines of the feedforward control system for a software orchestration platform 106) to consume target performance metrics associated with a software deployment from the SLA metric datastore 104 and provide a declarative performance target to the control system type technology to devise an actuation program to achieve the target. In a specific implementation, the declarative performance interface engine 108 defines references or targets, the declaration of which can be characterized as a "declarative performance" and the definition of which can be characterized as a "declarative performance objective" or a "declarative performance target." The declarative performance interface engine 108 enables a human or artificial agent of a service provider (or service consumer) to define an SLA metric, such as an SLI or an SLO, for storage in the SLA metric datastore 104 as a target for which the feedforward control system for a software orchestration platform 106 resources applications. By declaring performance, human agents are not required to exert effort manually analyzing their applications, configuring their applications, and resourcing their applications each time a software or application load changes; and artificial agents need not be AIs. Advantageously, the declarative performance interface engine 108 allows a service provider to rely upon a declared SLO (or an SLO derived from one or more declared SLIs) for an SLA offered to a service consumer. In a specific implementation, implementing the declarative performance interface engine 108 and control system type technology as described in this paper resulted in an average 70% increase in SLA compliance.
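- By way of a non-limiting illustration, the conversion from an SLA metric to a declarative performance target might be sketched as follows. The data structure names (SLAMetric, DeclarativeTarget) and field choices are hypothetical and are not drawn from this paper; the sketch only shows an SLO-style metric becoming the reference signal a control system resources applications against.

```python
# Illustrative sketch, not the implementation described in this paper:
# an SLO-style SLA metric becomes a declarative performance target.
from dataclasses import dataclass

@dataclass
class SLAMetric:
    indicator: str    # SLI name, e.g., "request_duration_p95" (hypothetical)
    objective: float  # SLO target value, e.g., 0.2 seconds
    comparison: str   # how the SLI is compared with the objective

@dataclass
class DeclarativeTarget:
    metric: str
    reference: float  # reference value the control system tries to meet

def to_declarative_target(slo: SLAMetric) -> DeclarativeTarget:
    """Map a declared SLO onto the reference a feedforward control
    system resources applications against."""
    return DeclarativeTarget(metric=slo.indicator, reference=slo.objective)

target = to_declarative_target(
    SLAMetric(indicator="request_duration_p95", objective=0.2, comparison="<="))
```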
- The predictive autoscaling and resource optimization operator engine 110 is intended to represent an engine that autoscales in response to predicted resource needs without over-provisioning. Resource optimization is intended to mean provisioning the minimum resources needed to meet declared performance objectives. In a specific implementation, the predictive autoscaling and resource optimization operator engine 110 makes changes automatically, recommends more cost effective SLOs, and sends alerts regarding potential performance degradation. In a specific implementation, the predictive autoscaling and resource optimization operator engine 110 is robust against both seasonal and random application load and resource signatures by using a deep learning approach sensitive to trends and seasonality, and is trained to be sensitive to leading indicators of random bursts. In a specific implementation, the predictive autoscaling and resource optimization operator engine 110 is easy to install with support for interchangeable metrics collection and load balancers; is able to operate on a cloud or on-prem; and can make recommendations in as little as 5 minutes (in coordination with engines and datastores of the feedforward control system for a software orchestration platform 106).
- In a specific implementation, the predictive autoscaling and resource optimization operator engine 110 does not interfere with a Kubernetes scheduler. Kubernetes is an open-source container-orchestration system for automating application deployment, scaling, and management maintained by the Cloud Native Computing Foundation. It aims to provide a "platform for automating deployment, scaling, and operations of application containers across clusters of hosts". It works with a range of container tools, including Docker. Many cloud services offer a Kubernetes-based platform or infrastructure as a service (PaaS or IaaS) on which Kubernetes can be deployed as a platform-providing service. Many vendors also provide their own branded Kubernetes distributions. The document entitled "The Kubernetes Architectural Roadmap" by Brian Grant, Tim Hockin, and Clayton Coleman, last updated Apr. 20, 2017, is incorporated herein by reference.
- In a specific implementation, the predictive autoscaling and resource optimization operator engine 110 works well with horizontal and cluster autoscalers. For example, a Kubernetes horizontal pod autoscaler automatically scales the number of pods in a replication controller, deployment, or replica set based on observed CPU utilization. As another example, Oracle Cloud Platform allows server instances to automatically scale a cluster in or out by defining an auto-scaling rule based on CPU and/or memory utilization to determine when to add or remove nodes.
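- For context, the reactive scaling rule documented for the Kubernetes horizontal pod autoscaler can be sketched as below; the predictive autoscaling and resource optimization operator engine 110 works alongside such autoscalers rather than replacing them. The example values are illustrative.

```python
# Sketch of the Kubernetes horizontal pod autoscaler rule (per the Kubernetes
# documentation): desired replicas scale with the ratio of the observed metric
# to its target. A predictive operator would feed forecasted rather than
# observed utilization into such a mechanism.
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float) -> int:
    """Reactive HPA rule: scale replica count by the ratio of the observed
    metric (e.g., average CPU utilization) to its target."""
    return math.ceil(current_replicas * current_metric / target_metric)

# A reactive autoscaler at 90% observed CPU against an 80% target:
print(desired_replicas(current_replicas=4, current_metric=0.9, target_metric=0.8))  # 5
```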
- In a specific implementation, the predictive autoscaling and resource optimization operator engine 110 receives as input 1) measured outputs from the convergent deployment, resource, and application level metrics collection engine 120 and 2) predicted outputs from the application load forecasting engine 114. In a specific implementation, the predictive autoscaling and resource optimization operator engine 110 determines measured error from performance objectives, which is provided to the dynamics estimation engine 112.
- The dynamics estimation engine 112 is intended to represent an engine that estimates a minimum amount of resources needed to meet service level objectives under predicted and current load. In a specific implementation, the dynamics estimation engine 112 models application behavior in terms of resource utilization and models performance as a response to load. By modeling an application's response and resource utilization under load, it becomes possible to enable estimations for vertical and horizontal autoscaling and, based on the modeling, estimate how many resources will be used under certain load and corresponding service indicators such as time to process a request; a deployment request is set accordingly.
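- A minimal sketch of dynamics estimation follows, under the assumption that simple least-squares fits stand in for the learned models; this paper contemplates machine learning such as deep learning for the behavior and performance models, so the linear fits and sample numbers here are illustrative only.

```python
# A minimal sketch, assuming linear fits as stand-ins for learned models:
# a behavior model (load -> resource usage) and a performance model
# (load -> latency) estimated from observations of a deployment.
import numpy as np

load = np.array([100, 200, 400, 800])          # requests/second observed
cpu = np.array([0.5, 0.9, 1.8, 3.5])           # CPU cores consumed
latency = np.array([0.05, 0.06, 0.09, 0.15])   # seconds per request

cpu_model = np.polyfit(load, cpu, 1)           # behavior model coefficients
latency_model = np.polyfit(load, latency, 1)   # performance model coefficients

def estimate_cpu(load_rps: float) -> float:
    """Estimate cores needed at a given load (behavior model)."""
    return float(np.polyval(cpu_model, load_rps))

def estimate_latency(load_rps: float) -> float:
    """Estimate request latency at a given load (performance model)."""
    return float(np.polyval(latency_model, load_rps))

# Size a deployment request for a forecasted 600 requests/second:
print(estimate_cpu(600.0), estimate_latency(600.0))
```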
- The application load forecasting engine 114 provides a load estimate out to a future time. Depending upon implementation-specific factors, the future time is configurable to be as little as a minute or as much as an hour into the future. In a specific implementation, the application load forecasting engine 114 provided 91% accurate load forecasting. Advantageously, by predicting incoming application load in advance, it is possible to scale up before load events happen, even across a wide variety of workloads. For example, using a deep learning approach that has been proven to generalize across a wide variety of workloads, the application load forecasting engine 114 can forecast seasonal, trendy, bursty, and random load with a high degree of accuracy (at least 83% and as high as over 95%). Based on a request and an understanding of workload, signature limits can be set on a deployment according to a utilization pattern (e.g., bursty resulting in a higher limit vs. stable resulting in a lower limit).
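- The forecasting interface can be sketched as follows under stated assumptions: a seasonal-naive rule stands in for the deep learning model described above, purely to show history in, a load estimate at a configurable look-ahead out. All numbers are illustrative.

```python
# A simplified forecasting sketch. The engine described above uses a deep
# learning model; this stand-in repeats the value observed one season earlier
# (seasonal-naive baseline) only to show the shape of the interface.
from typing import Sequence

def forecast_load(history: Sequence[float],
                  lookahead_steps: int,
                  season_length: int) -> float:
    """Predict load `lookahead_steps` past the last observation by repeating
    the value observed one season earlier, falling back to the last value."""
    target = len(history) - 1 + lookahead_steps
    index = target - season_length
    return history[index] if 0 <= index < len(history) else history[-1]

# One-hour look-ahead at one-minute resolution with a daily season (1440 min):
minutes_of_history = [100.0 + (m % 1440) / 10.0 for m in range(3 * 1440)]
print(forecast_load(minutes_of_history, lookahead_steps=60, season_length=1440))
```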
- The minimum cost optimization engine 116 uses current load, the forecasted load from the application load forecasting engine 114, the application behavior model from the dynamics estimation engine 112, and declared objectives from the declarative performance interface engine 108 to find a minimum cost to run the modeled application at a performance appropriate for the declared objectives, which is a tradeoff between replicas of an application (horizontal scale) and resources (vertical scale). A focus on optimization for performance objectives results in cost optimization. Advantageously, in a specific implementation, this results in an average of up to 80% cost savings compared to systems without such focus.
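- A minimal sketch of the replicas-versus-resources tradeoff follows. The grid search and the stand-in capacity model are assumptions of the sketch, not the optimization method of this paper; they only illustrate selecting the cheapest configuration that a performance model predicts will meet the declared objectives at the forecasted load.

```python
# A minimal sketch, assuming a grid of candidate configurations and a toy
# capacity model: pick the cheapest (replicas, cores) pair predicted to meet
# the performance objective at the forecasted load. All numbers hypothetical.
from itertools import product

CPU_SIZES = [0.5, 1.0, 2.0, 4.0]      # cores per replica (vertical scale)
REPLICA_COUNTS = range(1, 21)         # horizontal scale
COST_PER_CORE_HOUR = 0.04             # hypothetical price

def meets_slo(replicas: int, cores: float, forecast_rps: float) -> bool:
    """Stand-in performance model: assume each core serves ~250 req/s
    before the latency objective is violated."""
    return forecast_rps / replicas <= 250.0 * cores

def minimum_cost_config(forecast_rps: float):
    """Cheapest feasible configuration; raises ValueError if none fits."""
    feasible = ((r, c) for r, c in product(REPLICA_COUNTS, CPU_SIZES)
                if meets_slo(r, c, forecast_rps))
    return min(feasible, key=lambda rc: rc[0] * rc[1] * COST_PER_CORE_HOUR)

print(minimum_cost_config(forecast_rps=3000.0))  # -> cheapest (replicas, cores)
```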
- The optimal configuration for scale resources actuator engine 118 is intended to represent an actuator that is unique to a problem space. In a specific implementation, a forecasting model is robust against different time series profiles, whereas parametric time series models are, by their nature, tuned to one type of time series profile. These profiles include on-off workloads, bursty workloads, workloads with various trends, and workloads with different seasonality components (seconds, minutes, hours, etc.). This can be accomplished by training a model offline for forecasting against many different time series profiles; a recurrent neural network can be utilized for this purpose. The offline model is then deployed in the system, and the optimal configuration for scale resources actuator engine 118 can be characterized as unique to a problem space associated with one type of time series profile.
- The optimal configuration for scale resources actuator engine 118 executes the optimal configuration for scale resources, such as number of replicas, size of resource requests, and quality of service (as defined by limits with which to kill or throttle application resource usage). By understanding an application under load, the number of replicas and virtual machine (VM) instance types (in the case of network and disk bound applications) selected for meeting forecasted demand according to performance objectives at minimum cost can be optimized, thus minimizing resources needed to meet forecasted demand.
- In a specific implementation, the optimal configuration for scale resources actuator engine 118 uses heuristics unique to the scale resource, such as maximum allocatable resources and oscillation damping through consensus-based recommendations. The optimal configuration for scale resources actuator engine 118 causes an application to be executed as one of the convergent deployments 122.
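- On a Kubernetes-style platform, actuation might be sketched as rendering the minimum cost optimization parameters into a Deployment patch carrying replicas, requests, and limits. The patch body below follows the Kubernetes Deployment schema; the helper name and all parameter values are hypothetical, and applying the patch (via kubectl or a client library) is left out of the sketch.

```python
# Illustrative sketch: optimization parameters rendered as a Kubernetes-style
# Deployment spec patch. Requests below limits yield burstable quality of
# service; requests equal to limits yield guaranteed QoS.
def build_scale_patch(replicas: int, cpu_request: str, cpu_limit: str,
                      mem_request: str, mem_limit: str, container: str) -> dict:
    """Render replicas, resource requests, and limits as a patch body."""
    return {
        "spec": {
            "replicas": replicas,
            "template": {"spec": {"containers": [{
                "name": container,
                "resources": {
                    "requests": {"cpu": cpu_request, "memory": mem_request},
                    "limits": {"cpu": cpu_limit, "memory": mem_limit},
                },
            }]}},
        }
    }

patch = build_scale_patch(6, "500m", "1", "256Mi", "512Mi", container="web")
```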
- The convergent deployment, resource, and application level metrics collection engine 120 is intended to represent an engine that measures feedback and feedforward (forecasting) based on current performance and predicted load. The feedback can come in the form of system output from the optimal configuration for scale resources actuator engine 118 or the convergent deployments 122, or in the form of other data associated with the relevant convergent deployment of the convergent deployments 122 provided through or observed on the CRM 102. The feedback and feedforward are used by the predictive autoscaling and resource optimization operator engine 110 to adjust recommendations. In a specific implementation, the convergent deployment, resource, and application level metrics collection engine 120 monitors performance indicators and resource usage, including SLIs such as request count and request duration, and resource utilization metrics such as memory, CPU, disk I/O, and network I/O per container and pod.
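- A sketch of metrics collection follows under the assumption that a Prometheus-style endpoint exposes the SLIs and resource metrics named above; the endpoint address and metric names are examples and vary by environment and exporter.

```python
# Sketch assuming a Prometheus-style HTTP API exposes SLIs and resource
# metrics. The address and metric names below are examples only.
import requests

PROMETHEUS_URL = "http://prometheus:9090/api/v1/query"  # hypothetical address

def query_metric(promql: str) -> list:
    """Run an instant PromQL query and return the result vector."""
    response = requests.get(PROMETHEUS_URL, params={"query": promql}, timeout=10)
    response.raise_for_status()
    return response.json()["data"]["result"]

# SLI: request rate; resource metric: per-pod CPU usage (example names).
request_rate = query_metric('sum(rate(http_requests_total[1m]))')
cpu_by_pod = query_metric(
    'sum by (pod) (rate(container_cpu_usage_seconds_total[1m]))')
```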
- The convergent deployments 122 are intended to represent engines executing applications with a convergent configuration. A convergent configuration is one that is executed by the optimal configuration for scale resources actuator engine 118 to incorporate predictive autoscaling and resource optimization.
- The load distribution and metrics engine 124 is intended to represent an engine that designates how application load metrics are collected and configured to be distributed. In a specific implementation, the load distribution and metrics engine 124 performs load balancing on traffic to (or from) the convergent deployments 122. Load balancing improves the distribution of workloads across multiple computing resources, such as computers, a computer cluster, network links, CPUs, or disk drives. Load balancing aims to optimize resource use, maximize throughput, minimize response time, and avoid overload of any single resource. Using multiple components with load balancing instead of a single component may increase reliability and availability through redundancy. Load balancing usually involves dedicated software or hardware, such as a multilayer switch or a Domain Name System server process. Load balancing differs from channel bonding in that load balancing divides traffic between network interfaces on a network socket (OSI model Layer 4) basis, while channel bonding implies a division of traffic between physical interfaces at a lower level, either per packet (OSI model Layer 3) or on a data link (OSI model Layer 2) basis with a protocol such as shortest path bridging. However, channel bonding is treated as load balancing in this paper. In an alternative, a proxy takes on the load distribution and metrics functionality in lieu of what would likely be referred to as a "load balancer."
- Where it matters for the purpose of distinction in this paper, "load distribution and metrics engine" is a more general term for application load metrics collection and configuration for distribution than load balancer, proxy, or other applicable specific application load distribution and metrics system. Indeed, in a specific implementation, an application load metrics collection and distribution engine is used without a load balancer. For example, the load distribution and metrics engine 124 can collect metrics from and configure an application proxy such as Envoy, an L7 proxy and communication bus designed for large modern service oriented architectures. As another example, the load distribution and metrics engine 124 can make use of other load systems, such as message queues. In general, the load distribution and metrics engine 124 can be used across different workloads (instead of or in addition to network-based workloads to which a load balancer caters).
- The load distribution and metrics engine 124 is informed by the optimal configuration for scale resources actuator engine 118 to balance traffic in a manner appropriate for the convergent deployments 122. The convergent deployment, resource, and application level metrics collection engine 120 can also collect data from the load distribution and metrics engine 124.
- In an example of operation, a human or artificial agent of a service provider (or service consumer) uses the declarative performance interface engine 108 to store an SLA metric, such as an SLI or an SLO, in the SLA metric datastore 104. In an alternative, the agent can store an SLA metric in the SLA metric datastore 104 through an SLA metric datastore interface (not shown). Depending upon what is stored, the SLA metric datastore 104 could be referred to as an SLI datastore, an SLO datastore, or an SLA datastore.
- Continuing this example of operation, the declarative performance interface engine 108 converts the data structures into declarative performance objectives for consumption by the predictive autoscaling and resource optimization operator engine 110. When applicable data becomes available from the convergent deployment, resource, and application level metrics collection engine 120, the dynamics estimation engine 112 models application behavior in terms of resource utilization and models performance as a response to load, and the application load forecasting engine 114 provides a load estimate out to a future time.
- Continuing this example of operation, the minimum cost optimization engine 116 uses the forecasted load from the application load forecasting engine 114, the application behavior model from the dynamics estimation engine 112, and declared objectives from the declarative performance interface engine 108 to find a minimum cost to run the modeled application at a performance appropriate for the declared objectives. The predictive autoscaling and resource optimization operator engine 110 provides the minimum cost optimization parameters to the optimal configuration for scale resources actuator engine 118, which executes the convergent deployments 122 and configures the load distribution and metrics engine 124 in accordance with the minimum cost optimization parameters. In a specific implementation, configuring the load distribution and metrics engine 124 involves making provisioned resources known to the load distribution and metrics engine 124, which may occur as a matter of course.
- Continuing this example of operation, the convergent deployment, resource, and application level metrics collection engine 120 monitors channels and other resources associated with the convergent deployments 122, which can be processed (to generate, e.g., measured outputs) and provided as feedback to the predictive autoscaling and resource optimization operator engine 110, the dynamics estimation engine 112, and the application load forecasting engine 114. The feedback can be used to provide an initial data set or to improve upon modeling and recommendations over time.
- FIG. 2 depicts a graph 200 that compares resource provisioning pursuant to the recommendations of a reactive recommendation engine with resource provisioning pursuant to the recommendations of a predictive recommendation engine. The graph 200 includes a resource consumption curve 202, a predictive provisioning curve 204, a reactive provisioning curve 206, a performance degradation area 208, and a wasted cost area 210. The resource consumption curve 202 is intended to represent the amount of resources used (y axis) over time (x axis). The predictive provisioning curve 204 is intended to represent resources provisioned pursuant to recommendations of a predictive recommendation engine. The reactive provisioning curve 206 is intended to represent resources allocated pursuant to recommendations of a reactive recommendation engine, as an alternative to a predictive recommendation engine.
- As the graph 200 illustrates, the performance degradation area 208 is greater for the reactive provisioning curve 206 than it is for the predictive provisioning curve 204. Indeed, the predictive provisioning curve 204 matches or slightly exceeds the resource consumption curve 202 over the measured time period, which means there is no performance degradation for the system utilizing the predictive recommendation engine. It may be noted that performance degradation occurs when a provisioning curve is less than the resource consumption curve 202, which means under-provisioning has occurred.
- As the graph 200 illustrates, the wasted cost area 210 is greater for the reactive provisioning curve 206 than it is for the predictive provisioning curve 204. While the predictive provisioning curve 204 exceeds the resource consumption curve 202 at most points of the graph 200, the amount of wasted cost is substantially less than that associated with the reactive provisioning curve 206. A reactive provisioning system cannot achieve correct provisioning in under 5 minutes of reacting to load because, while there are scale up events before 5 minutes (e.g., with a 1-2 minute reaction time), following the curve downwards is difficult and a reactive algorithm degrades over time. In a specific implementation, correct provisioning (with provisioning insurance) takes less than 5 minutes after load. Because a reactive provisioning system cannot achieve correct provisioning before it receives metrics, calculates requests, and actuates those requests, it is impossible for a reactive system to act within a minute of load, which is well within the capabilities of the specific implementation. Advantageously, in this specific implementation, which has a correctly configured predictive system, correct provisioning can be achieved within x minutes ahead of load or the resource being required or consumed, with x being a configurable look ahead time greater than 1 minute and less than 1 hour.
- Wasted cost occurs when a provisioning curve is more than the resource consumption curve 202 plus provisioning insurance. In simplistic terms, wasted cost of less than x % of resource consumption over a minute can be referred to as provisioning insurance, which is desirable in many instances to ensure under-provisioning does not occur. Provisioning insurance can be defined as the x % likelihood a resource value will be under a provisioned amount, either as a 95% likelihood defined as 2 standard deviations from the mean value and/or a peak-to-mean ratio (crest factor). In a specific implementation, both of these heuristics are used. The 95% likelihood that a resource value will be under a provisioned amount is used for requests (e.g., how many resources to request), while the crest factor is used for determining limits (e.g., how many resources an application is allowed to consume beyond the request before killing, throttling, or compressing the resource usage). The difference between requests and limits in a software orchestration platform can be referred to as quality of service (QoS), which defines whether you always guarantee resources are available (i.e., requests and limits are the same) or you allow software to burst above its request as necessary when resources are available (i.e., limits are above requests).
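- The two provisioning insurance heuristics can be made concrete with a short worked sketch: requests sized at the mean plus two standard deviations (approximately a 95% likelihood a resource value falls under the provisioned amount), and limits derived from the peak-to-mean ratio (crest factor). How the crest factor combines with the request to produce a limit is an assumption of the sketch, as are the sample numbers.

```python
# Worked sketch of provisioning insurance: request = mean + 2 standard
# deviations; limit derived from the crest factor (the exact combination of
# request and crest factor into a limit is an assumption of this sketch).
import statistics

def provisioning_insurance(samples: list[float]) -> tuple[float, float]:
    mean = statistics.fmean(samples)
    stdev = statistics.pstdev(samples)
    request = mean + 2.0 * stdev        # ~95% of values fall under this amount
    crest_factor = max(samples) / mean  # peak-to-mean ratio
    limit = request * crest_factor      # headroom before throttling/killing
    return request, limit

cpu_samples = [0.42, 0.45, 0.50, 0.48, 0.61, 0.44, 0.47, 0.90]
request, limit = provisioning_insurance(cpu_samples)
print(request, limit)
```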
- FIG. 3 is a diagram 300 of a total forecasted load versus total actual load chart 302 and an associated code display 304. The total forecasted load versus total actual load chart 302 has an x axis of seconds of a timestamp and a y axis of count per second of load. As can be seen, the predicted load curve always exceeds the request count curve by a relatively small margin (the provisioning insurance margin). The associated code display 304 indicates the resources include limits and requests, which were described in the preceding paragraph.
- FIG. 4 depicts a flowchart 400 of an example of a method of predictive autoscaling and resource optimization. The flowchart 400 starts at module 402 with converting an SLA metric data structure into a declarative performance objective. The SLA metric data structure can be stored in an SLA metric datastore, such as the SLA metric datastore 104 described in association with FIG. 1. A declarative performance interface engine, such as the declarative performance interface engine 108 described in association with FIG. 1, can convert the SLA metric data structure into a declarative performance objective.
- The flowchart 400 continues to module 404 with estimating a load forecast out to a future time. For a new deployment, it should be noted the load forecast may be of limited use because it amounts to little more than a guess based upon known data regarding the deployment, without the benefit of feedback related to resource utilization and performance post-deployment. It typically takes a few minutes to receive and process such feedback, at which point the load forecast can become a much more predictive estimate. Accordingly, the module 404 could be skipped as unnecessary until such time as data becomes useful for making accurate predictions. An application load forecasting engine, such as the application load forecasting engine 114 described in association with FIG. 1, can estimate a load forecast out to a future time.
- The flowchart 400 continues to module 406 with using the load forecast and the declarative performance objective to generate minimum cost optimization parameters. As was noted in the preceding paragraph, the forecast may be of limited use for accurately generating minimum cost optimization parameters. Moreover, although a model estimate could be provided in lieu of a performance model and an application behavior model generated in response to feedback associated with a deployment, such models are of limited value. After receiving feedback, the module 406 can also use the performance model and the application behavior model to generate minimum cost optimization parameters. (See the description associated with the modules 416 and 418.) A minimum cost optimization engine, such as the minimum cost optimization engine 116 described in association with FIG. 1, can use the load forecast, the performance model (if applicable), the application behavior model (if applicable), and the declarative performance objective to generate minimum cost optimization parameters.
- The flowchart 400 continues to module 408 with executing convergent deployments in accordance with the minimum cost optimization parameters. An optimal configuration for scale resources actuator engine, such as the optimal configuration for scale resources actuator engine 118 described in association with FIG. 1, can execute convergent deployments in accordance with the minimum cost optimization parameters.
- The flowchart 400 continues in parallel to module 410 with configuring a load distribution and metrics engine in accordance with the minimum cost optimization parameters. Although any applicable module can be configured for parallel execution with another, the modules 408 and 410 are depicted as being performed in parallel in this example. An optimal configuration for scale resources actuator engine, such as the optimal configuration for scale resources actuator engine 118 described in association with FIG. 1, can configure a load distribution and metrics engine in accordance with the minimum cost optimization parameters.
- Following modules 408 and 410, the flowchart 400 continues to module 412 with monitoring resources associated with the convergent deployments. A convergent deployment and resource metrics collection engine, such as the convergent deployment, resource, and application level metrics collection engine 120 described in association with FIG. 1, can monitor resources (including channels) associated with the convergent deployments.
- The flowchart 400 continues to module 414 with providing feedback associated with the convergent deployments. A convergent deployment and resource metrics collection engine, such as the convergent deployment, resource, and application level metrics collection engine 120 described in association with FIG. 1, can provide feedback associated with the convergent deployments.
- The flowchart 400 returns to module 404 and continues as described previously and also (in parallel) continues to module 416 with modeling application behavior in terms of resource utilization. In a specific implementation, the modeling of application behavior requires a combined total of up to approximately 5 minutes to receive, process, and perform machine learning on feedback from module 414. Accordingly, in this description of the example of FIG. 4, the module 416 is not introduced as quickly as module 404, though a model "stand-in" could be used. Moreover, the flowchart 400 could loop multiple times through other modules before module 416 completes. A dynamics estimation engine, such as the dynamics estimation engine 112 described in association with FIG. 1, can model application behavior in terms of resource utilization. From module 416, the flowchart 400 returns to module 406 and continues as described previously.
- The flowchart 400 also continues to module 418 from module 414 with modeling performance as a response to load. In a specific implementation, the modeling of performance requires a combined total of up to approximately 5 minutes to receive, process, and perform machine learning on feedback from module 414. Accordingly, in this description of the example of FIG. 4, the module 418 is not introduced as quickly as module 404, though a model "stand-in" could be used. Moreover, the flowchart 400 could loop multiple times through other modules before module 418 completes. A dynamics estimation engine, such as the dynamics estimation engine 112 described in association with FIG. 1, can model performance as a response to load. From module 418, the flowchart 400 returns to module 406 and continues as described previously.
- It may be noted that the modules from module 414 to module 418 can be skipped on a later loop if no updates to a model or forecast are made relative to the model or forecast from a first loop. Of course, the modules could precede module 402 if a deployment is made without SLA metric data, which is provided later in the process. Finally, it may be noted that the module 402 could be repeated if declarative performance objectives change (not shown).
- FIG. 5 depicts a flowchart 500 of an example of generating predictive autoscaling and resource optimization results in association with a machine learning process. The flowchart 500 starts at module 502 with monitoring performance indicators and resource usage. A human or artificial agent of a service provider (or service consumer) can provide new performance indicators following, for example, a review of convergent deployment performance. A declarative performance interface engine, such as the declarative performance interface engine 108 described in association with FIG. 1, can monitor performance indicators. A convergent deployment and resource metrics collection engine, such as the convergent deployment, resource, and application level metrics collection engine 120 described in association with FIG. 1, can monitor resource usage.
- The flowchart 500 continues to module 504 with forecasting application load and seasonality. Seasonality can be illustrated in association with a use case, which is, in this example, a shoe company e-commerce deployment. Successful shoe company e-commerce deployments typically have stable traffic with some seasonality at Black Friday and during the holidays, plus some seemingly random spikes (e.g., when new shoes are released). Because of the importance of performance on revenue during these events, reactive autoscaling is suboptimal at these times. Systems engineers will manually over-provision so SLOs are met. A predictive autoscaling and resource optimization system, as described in this paper, is able to learn these seasonalities and provision a correct amount of resources (with provisioning insurance) for these events without manual intervention. Moreover, during the year, new shoes are released and system engineers are often unprepared for the massive load during the release. By the time they scale up reactively to meet demand, the shoe is selling on eBay for 10× the price. A predictive autoscaling and resource optimization system, as described in this paper, is able to predict these seemingly random spikes in traffic and scale accordingly so SLOs are met, and maximum revenue and customer satisfaction are achieved. Advantageously, money is not wasted on over-provisioning just to be prepared for these events. An application load forecasting engine, such as the application load forecasting engine 114 described in association with FIG. 1, can forecast application load and seasonality.
- The flowchart 500 continues to module 506 with learning a behavior function of an application under load. A dynamics estimation engine, such as the dynamics estimation engine 112 described in association with FIG. 1, can learn a behavior function of an application under load.
- The flowchart 500 continues to module 508 with estimating resources used at forecasted demand for resource requests. An application load forecasting engine, such as the application load forecasting engine 114 described in association with FIG. 1, can estimate resources used at forecasted demand for resource requests.
- The flowchart 500 continues to module 510 with estimating a forecast pattern for setting resource limits. An application load forecasting engine, such as the application load forecasting engine 114 described in association with FIG. 1, can estimate a forecast pattern for setting resource limits.
- The flowchart 500 continues to module 512 with minimizing resources needed to meet the forecasted demand. A minimum cost optimization engine, such as the minimum cost optimization engine 116 described in association with FIG. 1, can minimize resources needed to meet the forecasted demand.
- The flowchart 500 ends at module 514 with learning from decisions made in order to improve forecasting and resource estimation. An optimal configuration for scale resources actuator engine, such as the optimal configuration for scale resources actuator engine 118 described in association with FIG. 1, can benefit in convergent deployment from learning from decisions made in order to improve forecasting and resource estimation.
- FIG. 6 depicts a diagram 600 of an example of a system for generating minimum cost optimization parameters. The diagram 600 includes an SLA metrics datastore 604, which may be implemented as the SLA metric datastore 104 described in association with FIG. 1; a declarative performance interface engine 608 coupled to the SLA metrics datastore 604, which may be implemented as the declarative performance interface engine 108 described in association with FIG. 1; a dynamics estimation engine 612, which may be implemented as the dynamics estimation engine 112 described in association with FIG. 1; an application load forecasting engine 614, which may be implemented as the application load forecasting engine 114 described in association with FIG. 1; a minimum cost optimization engine 616, which may be implemented as the minimum cost optimization engine 116 described in association with FIG. 1; a declarative performance datastore 626 coupled to the declarative performance interface engine 608 and the minimum cost optimization engine 616; a behavior model datastore 628 coupled to the dynamics estimation engine 612 and the minimum cost optimization engine 616; a performance model datastore 630 coupled to the dynamics estimation engine 612 and the minimum cost optimization engine 616; a convergent deployment and resource metrics datastore 632 coupled to the dynamics estimation engine 612 and the application load forecasting engine 614; a utilization pattern learning engine 634 coupled to the convergent deployment and resource metrics datastore 632; a forecasting model datastore 636 coupled to the application load forecasting engine 614 and the utilization pattern learning engine 634; a forecasted load datastore 638 coupled to the application load forecasting engine 614 and the minimum cost optimization engine 616; and a minimum cost optimization parameters datastore 640 coupled to the minimum cost optimization engine 616.
- The declarative performance interface engine 608 converts SLA metrics from the SLA metrics datastore 604 to declarative performance data structures represented by the declarative performance datastore 626. The declarative performance interface engine 608 may or may not receive instructions from a human or artificial agent of a service provider (or service consumer) to populate the SLA metrics datastore 604. If the SLA metrics datastore 604 is modified, the declarative performance interface engine 608 converts the modification so as to match an intended SLO represented in the SLA metrics datastore 604 with a declarative performance data structure in the declarative performance datastore 626.
- The dynamics estimation engine 612 uses machine learning techniques, such as deep learning, to generate a behavior model, which is represented by the behavior model datastore 628, and to generate a performance model, which is represented by the performance model datastore 630. The models can be improved with feedback associated with applicable convergent deployments. Such feedback is represented by the convergent deployment and resource metrics datastore 632. The convergent deployment and resource metrics datastore 632 can be populated by a convergent deployment and resource metrics collection engine (not shown), which may be implemented as the convergent deployment, resource, and application level metrics collection engine 120 described in association with FIG. 1.
- The utilization pattern learning engine 634 uses deep learning to understand workload and to generate models for seasonal load, trendy load, bursty load, and random load. Based on a request and an understanding of workload, signature limits can be set on a deployment according to a utilization pattern (e.g., bursty resulting in a higher limit vs. stable resulting in a lower limit). The result of the deep learning is a forecasting model represented by the forecasting model datastore 636. The forecasting model datastore 636 can be improved with feedback associated with applicable convergent deployments. Such feedback is represented by the convergent deployment and resource metrics datastore 632.
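- Pattern-dependent limit setting might be sketched as below: a workload's utilization signature is classified by its variability, and burstier patterns map to higher limit multipliers, per the bursty-versus-stable example above. The thresholds and multipliers are hypothetical, and a simple coefficient of variation stands in for the deep learning classification described in this paper.

```python
# Illustrative sketch: classify a utilization signature by variability and map
# burstier patterns to higher limit multipliers. Thresholds are hypothetical.
import statistics

def limit_multiplier(samples: list[float]) -> float:
    """Higher coefficient of variation -> burstier pattern -> higher limit."""
    cv = statistics.pstdev(samples) / statistics.fmean(samples)
    if cv < 0.1:
        return 1.2   # stable: modest headroom above the request
    if cv < 0.5:
        return 1.5   # trendy/seasonal: moderate headroom
    return 2.5       # bursty/random: generous headroom

print(limit_multiplier([0.40, 0.42, 0.41, 0.43]))  # stable -> 1.2
```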
- The application load forecasting engine 614 uses one or more forecasting models from the forecasting model datastore 636 and feedback from the convergent deployment and resource metrics datastore 632 to estimate resource usage at a future time; this forecasted load is represented by the forecasted load datastore 638.
- The minimum cost optimization engine 616 uses the declarative performance datastore 626, the behavior model datastore 628, the performance model datastore 630, and the forecasted load datastore 638 to generate minimum cost optimization parameters, which are represented by the minimum cost optimization parameters datastore 640. The minimum cost optimization parameters can be used by a software deployment platform that can include, for example, an optimal configuration for scale resources actuator engine (not shown), which may be implemented as the optimal configuration for scale resources actuator engine 118 described in association with FIG. 1.
Claims (20)
1. A system comprising:
a declarative performance interface engine configured to:
receive service level agreement (SLA) metrics;
convert the SLA metrics to declarative performance data structures, the data structures representing declared performance objectives for software deployment;
monitor performance indicators of deployed software;
a convergent deployment, resource, and application level metrics collection engine configured to monitor resource usage of deployed software;
a dynamics estimation engine configured to generate an application behavior model, the application behavior model based on performance indicators and resource usage as a function of load;
an application load forecasting engine configured to forecast load at a future time;
a minimum cost optimization engine configured to generate minimum cost optimization parameters based on the declarative performance data structures, the application behavior model, and the forecasted load;
a predictive autoscaling and resource optimization operator engine configured to provide the minimum cost optimization parameters to an optimal configuration for scale resources actuator engine;
the optimal configuration for scale resources actuator engine configured to execute convergent deployments;
a load distribution and metrics engine configured to perform load balancing on one or more of traffic to the convergent deployments and traffic from the convergent deployments, the load distribution and metrics engine being configured by the optimal configuration for scale resources actuator engine in accordance with the minimum cost optimization parameters;
the convergent deployment, resource, and application level metrics collection engine further configured to:
monitor resources associated with the convergent deployments;
provide feedback associated with the convergent deployments to the dynamics estimation engine;
the dynamics estimation engine further configured to generate an updated application behavior model based on the feedback.
2. The system of claim 1, wherein the SLA metrics comprise one or more of service level indicator (SLI) metrics and service level objective (SLO) metrics.
3. The system of claim 1, wherein the SLA metrics are defined by a human agent.
4. The system of claim 1, wherein the SLA metrics are defined by an artificial agent.
5. The system of claim 1, wherein the performance indicators comprise one or more of request count and request duration.
6. The system of claim 1, wherein the resource usage comprises usage of one or more of memory, CPU power, disk I/O, and network I/O.
7. The system of claim 1, wherein the application load forecasting engine is configured to forecast one or more of seasonal load, trendy load, bursty load, and random load.
8. The system of claim 1, wherein the application load forecasting engine is configured to estimate a forecast pattern for use in setting resource limits.
9. The system of claim 1, wherein the dynamics estimation engine generates the application behavior model using deep learning.
10. The system of claim 1, further comprising a predictive autoscaling and resource optimization operator engine configured to determine measured error from the declared performance objectives.
11. A method comprising:
receiving service level agreement (SLA) metrics;
converting the SLA metrics to declarative performance data structures, the data structures representing declared performance objectives for software deployment;
monitoring performance indicators and resource usage of deployed software;
generating an application behavior model based on performance indicators and resource usage as a function of load;
forecasting load at a future time;
generating minimum cost optimization parameters based on the declarative performance data structures, the application behavior model, and the forecasted load;
executing convergent deployments;
performing load balancing on one or more of traffic to the convergent deployments and traffic from the convergent deployments in accordance with the minimum cost optimization parameters;
monitoring resources associated with the convergent deployments;
providing feedback associated with the convergent deployments;
generating an updated application behavior model based on the feedback.
12. The method of claim 11, wherein the SLA metrics comprise one or more of service level indicator (SLI) metrics and service level objective (SLO) metrics.
13. The method of claim 11, wherein the SLA metrics are defined by an artificial agent.
14. The method of claim 11, wherein the performance indicators comprise one or more of request count and request duration.
15. The method of claim 11, wherein the resource usage comprises usage of one or more of memory, CPU power, disk I/O, and network I/O.
16. The method of claim 11, wherein the forecasted load comprises one or more of seasonal load, trendy load, bursty load, and random load.
17. The method of claim 11, further comprising estimating a forecast pattern for use in setting resource limits.
18. The method of claim 11, wherein the application behavior model is generated using deep learning.
19. The method of claim 11, further comprising determining measured error from the declared performance objectives.
20. A system comprising:
means for receiving service level agreement (SLA) metrics;
means for converting the SLA metrics to declarative performance data structures, the data structures representing declared performance objectives for software deployment;
means for monitoring performance indicators and resource usage of deployed software;
means for generating an application behavior model based on performance indicators and resource usage as a function of load;
means for forecasting load at a future time;
means for generating minimum cost optimization parameters based on the declarative performance data structures, the application behavior model, and the forecasted load;
means for executing convergent deployments;
means for performing load balancing on one or more of traffic to the convergent deployments and traffic from the convergent deployments in accordance with the minimum cost optimization parameters;
means for monitoring resources associated with the convergent deployments;
means for providing feedback associated with the convergent deployments;
means for generating an updated application behavior model based on the feedback.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/619,062 US20220300344A1 (en) | 2019-06-12 | 2020-06-12 | Flexible credential supported software service provisioning |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201962860740P | 2019-06-12 | 2019-06-12 | |
US201962864476P | 2019-06-20 | 2019-06-20 | |
PCT/US2020/037598 WO2020252390A1 (en) | 2019-06-12 | 2020-06-12 | Predictive autoscaling and resource optimization |
US17/619,062 US20220300344A1 (en) | 2019-06-12 | 2020-06-12 | Flexible credential supported software service provisioning |
Publications (1)
Publication Number | Publication Date |
---|---|
US20220300344A1 true US20220300344A1 (en) | 2022-09-22 |
Family
ID=73781083
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/619,062 Abandoned US20220300344A1 (en) | 2019-06-12 | 2020-06-12 | Flexible credential supported software service provisioning |
US17/618,621 Active 2040-07-09 US11966788B2 (en) | 2019-06-12 | 2020-06-12 | Predictive autoscaling and resource optimization |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/618,621 Active 2040-07-09 US11966788B2 (en) | 2019-06-12 | 2020-06-12 | Predictive autoscaling and resource optimization |
Country Status (4)
Country | Link |
---|---|
US (2) | US20220300344A1 (en) |
EP (1) | EP3983894B1 (en) |
CN (1) | CN114930293A (en) |
WO (1) | WO2020252390A1 (en) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021001958A1 (en) * | 2019-07-03 | 2021-01-07 | 日本電信電話株式会社 | Quality of service control device, quality of service control method, and program |
US11757982B2 (en) | 2020-08-05 | 2023-09-12 | Avesha, Inc. | Performing load balancing self adjustment within an application environment |
US11552886B2 (en) | 2021-03-09 | 2023-01-10 | Cisco Technology, Inc. | Topology optimization in SD-WANs with path downgrading |
CN113225211B (en) * | 2021-04-27 | 2022-09-02 | 中国人民解放军空军工程大学 | Fine-grained service function chain extension method |
US20220350675A1 (en) | 2021-05-03 | 2022-11-03 | Avesha, Inc. | Distributed computing system with multi tenancy based on application slices |
WO2022235624A1 (en) * | 2021-05-03 | 2022-11-10 | Avesha, Inc. | Controlling placement of workloads of an application within an application environment |
CN113127042A (en) * | 2021-05-08 | 2021-07-16 | 中山大学 | Intelligent contract recommendation method, equipment and storage medium |
US11893614B2 (en) | 2021-08-23 | 2024-02-06 | Shopify Inc. | Systems and methods for balancing online stores across servers |
US11880874B2 (en) | 2021-08-23 | 2024-01-23 | Shopify Inc. | Systems and methods for server load balancing based on correlated events |
US20230053818A1 (en) * | 2021-08-23 | 2023-02-23 | Shopify Inc. | Systems and methods for modifying online stores |
KR102621092B1 (en) * | 2021-11-23 | 2024-01-05 | (주)글루시스 | Resource provisioning method for kubernetes pods |
WO2023177709A1 (en) * | 2022-03-15 | 2023-09-21 | Liveperson, Inc. | Methods and systems for ai-based load balancing of processing resources in distributed environments |
US20230333912A1 (en) * | 2022-04-15 | 2023-10-19 | Dell Products L.P. | Method and system for managing a distributed multi-tiered computing environment based on load predictions |
CN116149843A (en) * | 2022-11-28 | 2023-05-23 | 中国科学院深圳先进技术研究院 | Dynamic programming-based resource allocation method, network, storage medium and processor |
CN116501594B (en) * | 2023-06-27 | 2023-09-08 | 上海燧原科技有限公司 | System modeling evaluation method and device, electronic equipment and storage medium |
US12132621B1 (en) | 2023-09-29 | 2024-10-29 | Hewlett Packard Enterprise Development Lp | Managing network service level thresholds |
CN118363734B (en) * | 2024-05-13 | 2024-10-25 | 中国人民解放军军事科学院战争研究院 | Dynamic optimization scheduling method and system for cloud environment high-density service |
Family Cites Families (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7173910B2 (en) * | 2001-05-14 | 2007-02-06 | Level 3 Communications, Inc. | Service level agreements based on objective voice quality testing for voice over IP (VOIP) networks |
US8346909B2 (en) * | 2004-01-22 | 2013-01-01 | International Business Machines Corporation | Method for supporting transaction and parallel application workloads across multiple domains based on service level agreements |
US7571120B2 (en) * | 2005-01-12 | 2009-08-04 | International Business Machines Corporation | Computer implemented method for estimating future grid job costs by classifying grid jobs and storing results of processing grid job microcosms |
US8677353B2 (en) * | 2007-01-11 | 2014-03-18 | Nec Corporation | Provisioning a standby virtual machine based on the prediction of a provisioning request being generated |
US8918496B2 (en) * | 2007-04-30 | 2014-12-23 | Hewlett-Packard Development Company, L.P. | System and method for generating synthetic workload traces |
US8397088B1 (en) * | 2009-07-21 | 2013-03-12 | The Research Foundation Of State University Of New York | Apparatus and method for efficient estimation of the energy dissipation of processor based systems |
US8645529B2 (en) * | 2010-10-06 | 2014-02-04 | Infosys Limited | Automated service level management of applications in cloud computing environment |
US8621058B2 (en) * | 2010-10-28 | 2013-12-31 | Hewlett-Packard Development Company, L.P. | Providing cloud-based computing services |
US8756609B2 (en) * | 2011-12-30 | 2014-06-17 | International Business Machines Corporation | Dynamically scaling multi-tier applications vertically and horizontally in a cloud environment |
US9967159B2 (en) | 2012-01-31 | 2018-05-08 | Infosys Limited | Systems and methods for providing decision time brokerage in a hybrid cloud ecosystem |
US20150032495A1 (en) * | 2013-07-25 | 2015-01-29 | Futurewei Technologies, Inc. | System and Method for User Controlled Cost Based Network and Path Selection across Multiple Networks |
WO2015138678A1 (en) * | 2014-03-13 | 2015-09-17 | Jpmorgan Chase Bank, N.A. | Systems and methods for intelligent workload routing |
US10445134B2 (en) * | 2014-06-03 | 2019-10-15 | Amazon Technologies, Inc. | Identifying candidate workloads for migration |
US10439870B2 (en) * | 2015-11-24 | 2019-10-08 | International Business Machines Corporation | Assessment and dynamic provisioning of computing resources for multi-tiered application |
US10887176B2 (en) * | 2017-03-30 | 2021-01-05 | Hewlett Packard Enterprise Development Lp | Predicting resource demand in computing environments |
US11294726B2 (en) * | 2017-05-04 | 2022-04-05 | Salesforce.Com, Inc. | Systems, methods, and apparatuses for implementing a scalable scheduler with heterogeneous resource allocation of large competing workloads types using QoS |
US11126927B2 (en) | 2017-11-24 | 2021-09-21 | Amazon Technologies, Inc. | Auto-scaling hosted machine learning models for production inference |
US11256548B2 (en) * | 2018-05-03 | 2022-02-22 | LGS Innovations LLC | Systems and methods for cloud computing data processing |
US20190342380A1 (en) * | 2018-05-07 | 2019-11-07 | Microsoft Technology Licensing, Llc | Adaptive resource-governed services for performance-compliant distributed workloads |
US10977083B2 (en) * | 2018-08-30 | 2021-04-13 | Intuit Inc. | Cost optimized dynamic resource allocation in a cloud infrastructure |
US10789089B2 (en) * | 2018-09-13 | 2020-09-29 | Intuit Inc. | Dynamic application migration between cloud providers |
EP3983894B1 (en) | 2019-06-12 | 2024-10-30 | Arigato Machine, Inc., dba Manifold | Predictive autoscaling and resource optimization |
US11144443B2 (en) * | 2019-09-09 | 2021-10-12 | Microsoft Technology Licensing, Llc | Optimization of workloads based on constraints |
US11561836B2 (en) * | 2019-12-11 | 2023-01-24 | Sap Se | Optimizing distribution of heterogeneous software process workloads |
US11593180B2 (en) * | 2020-12-15 | 2023-02-28 | Kyndryl, Inc. | Cluster selection for workload deployment |
US12020070B2 (en) * | 2021-04-02 | 2024-06-25 | Red Hat, Inc. | Managing computer workloads across distributed computing clusters |
US20220114251A1 (en) * | 2021-11-16 | 2022-04-14 | Francesc Guim Bernat | Reputation management and intent-based security mechanisms |
2020
- 2020-06-12 EP EP20823465.8A patent/EP3983894B1/en active Active
- 2020-06-12 CN CN202080056503.4A patent/CN114930293A/en active Pending
- 2020-06-12 US US17/619,062 patent/US20220300344A1/en not_active Abandoned
- 2020-06-12 US US17/618,621 patent/US11966788B2/en active Active
- 2020-06-12 WO PCT/US2020/037598 patent/WO2020252390A1/en unknown
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11966788B2 (en) | 2019-06-12 | 2024-04-23 | Snyk Limited | Predictive autoscaling and resource optimization |
US20220308539A1 (en) * | 2021-03-25 | 2022-09-29 | Robert Bosch Gmbh | Method and device for controlling a driving function |
US12093006B2 (en) * | 2021-03-25 | 2024-09-17 | Robert Bosch Gmbh | Method and device for controlling a driving function |
US20230353464A1 (en) * | 2022-05-02 | 2023-11-02 | Jpmorgan Chase Bank, N.A. | Systems and methods for site reliability engineering |
Also Published As
Publication number | Publication date |
---|---|
EP3983894B1 (en) | 2024-10-30 |
US11966788B2 (en) | 2024-04-23 |
WO2020252390A1 (en) | 2020-12-17 |
US20220244993A1 (en) | 2022-08-04 |
EP3983894A1 (en) | 2022-04-20 |
EP3983894A4 (en) | 2023-06-21 |
CN114930293A (en) | 2022-08-19 |
Legal Events
Date | Code | Title | Description
---|---|---|---|
| STPP | Information on status: patent application and granting procedure in general | Free format text: APPLICATION UNDERGOING PREEXAM PROCESSING |
| STCB | Information on status: application discontinuation | Free format text: ABANDONED -- INCOMPLETE APPLICATION (PRE-EXAMINATION) |